These are representative sample records from Science.gov related to your search topic.
For comprehensive and current results, perform a real-time search at Science.gov.
1

Speech, Speech!  

ERIC Educational Resources Information Center

Discussion focuses on the nature of computer-generated speech and voice synthesis today. State-of-the-art devices for home computers are called text-to-speech (TTS) systems. Details about the operation and use of TTS synthesizers are provided, and the time saving in programming over previous methods is emphasized. (MP)

McComb, Gordon

1982-01-01

2

Sparkling Speeches  

NSDL National Science Digital Library

Sparkling is the word! In this lesson, students will investigate transforming an exciting student-created expository essay into an engaging, quality speech using resources from the classroom and the school media center. Students will listen to a remarkable Martin Luther King speech provided by YouTube, confer with classmates on speech construction, and use a variety of easy-to-access materials (included with this lesson) during the construction of their speech. The lesson allows for in-depth trials and experiments with expository writing and speech writing. In one exciting option, students may use a "Speech Forum" to safely practice their unique speeches in front of a small, non-assessing audience of fellow students. The final assessments for this memorable lesson are a student speech demonstrating a complete exploration and comprehension of introductions, main ideas with supporting details, and an engaging conclusion, together with a written exam.

2012-12-14

3

Speech Aids  

NASA Technical Reports Server (NTRS)

Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values which are displayed for comparison.

1987-01-01

4

Speech Communication.  

ERIC Educational Resources Information Center

The communications approach to teaching speech to high school students views speech as the study of the communication process in order to develop an awareness of and a sensitivity to the variables that affect human interaction. In using this approach the student is encouraged to try out as many types of messages using as many techniques and…

Anderson, Betty

5

Speech Development  

MedlinePLUS

... two most common speech surgeries are 1) pharyngeal flap and 2) sphincter pharyngoplasty. (The surgeon may also ... his/her nose.” Otitis Media — Ear infection. Pharyngeal Flap — Surgical procedure designed to minimize hypernasality. A flap ...

6

Speech Problems  

MedlinePLUS

... If you're in your teens and still stuttering, though, you may not feel like it's so ... million Americans have the speech disorder known as stuttering (or stammering, as it's known in Britain). It's ...

7

Great American Speeches  

NSDL National Science Digital Library

Watch the video presentations of each of these speeches: the Gettysburg Address; Martin Luther King's "I Have a Dream"; "Freedom of Speech" by Mario Savio; a Mario Savio speech; and FDR's "New Worker Plan" speech. For manuscripts, audio, and video of many other modern and past speeches, follow the link below: American Speech Bank ...

Ms. Olsen

2006-11-14

8

Japanese speech databases for robust speech recognition  

Microsoft Academic Search

At ATR, a next-generation speech translation system is under development toward natural trans-language communication. To cope with the various requirements that the new system places on speech recognition technology, further research efforts should emphasize robustness for large vocabularies, speaking variations often found in fast spontaneous speech, and speaker variance. These are key problems to be solved not only for speech

Atsushi Nakamura; Shoichi Matsunaga; Tohru Shimizu; Masahiro Tonomura; Yoshinori Sagisaka

1996-01-01

9

Subvocal Speech  

NSDL National Science Digital Library

Every word you say is controlled by electrical nerve signals from your brain, which tell your lips, throat, and tongue exactly how to say it. This Science Update lesson deals with how scientists are trying to tap into those silent speech commands.

Science Update

2004-07-26

10

Speech Improvement.  

ERIC Educational Resources Information Center

This book serves as a guide for the native and non-native speaker of English in overcoming various problems in articulation, rhythm, and intonation. It is also useful in group therapy speech programs. Forty-five practice chapters offer drill materials for all the vowels, diphthongs, and consonants of American English plus English stress and…

Gordon, Morton J.

11

Speech communications in noise  

NASA Technical Reports Server (NTRS)

The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.

1984-01-01

12

78 FR 49693 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...  

Federal Register 2010, 2011, 2012, 2013

...FCC 13-101] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...Commission's Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...378-3160, fax: (202) 488-5563, Internet: www.bcpiweb.com. Document...

2013-08-15

13

Speech and Language Disorders  

MedlinePLUS

... This information in Spanish ( en español ) Speech and language disorders More information on speech and language disorders ... Return to top More information on Speech and language disorders Explore other publications and websites Aphasia - This ...

14

Speech impairment (adult)  

MedlinePLUS

Language impairment; Impairment of speech; Inability to speak; Aphasia; Dysarthria; Slurred speech; Dysphonia voice disorders ... Common speech and language disorders include: APHASIA Aphasia is ... understand or express spoken or written language. It commonly ...

15

Speech recognition and understanding  

SciTech Connect

This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech is presented.

Vintsyuk, T.K.

1983-05-01
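
The composition dynamic programming approach described above is the ancestor of what is now called dynamic time warping (DTW). As a loose illustration only (Python with NumPy; the feature vectors and stored templates below are random stand-ins, not Vintsyuk's actual signal classes), whole-word recognition by comparing a presented signal against all stored reference signals looks like this:

    import numpy as np

    def dtw_distance(a, b):
        # Dynamic-programming alignment cost between feature sequences
        # a (m x d) and b (n x d), using Euclidean local distances.
        m, n = len(a), len(b)
        D = np.full((m + 1, n + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[m, n] / (m + n)  # normalize by a bound on path length

    def recognize(utterance, templates):
        # Element-by-element recognition: return the label of the stored
        # reference signal with the lowest alignment cost.
        return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

    # Toy usage with random 13-dimensional feature frames.
    rng = np.random.default_rng(0)
    templates = {w: rng.normal(size=(30, 13)) for w in ("yes", "no")}
    noisy = templates["yes"] + 0.01 * rng.normal(size=(30, 13))
    print(recognize(noisy, templates))  # -> "yes"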

16

Speech research  

NASA Astrophysics Data System (ADS)

Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

1992-06-01

17

Careers in Speech Communication.  

ERIC Educational Resources Information Center

Brief discussions in this pamphlet suggest educational and career opportunities in the following fields of speech communication: rhetoric, public address, and communication; theatre, drama, and oral interpretation; radio, television, and film; speech pathology and audiology; speech science, phonetics, and linguistics; and speech education.…

Speech Communication Association, New York, NY.

18

Delayed Speech or Language Development  

MedlinePLUS

... your child is right on schedule. Normal Speech & Language Development It's important to discuss early speech and ... for example). Continue The Difference Between Speech and Language Speech and language are often confused, but there ...

19

SPEECH PARAMETERIZATION FOR AUTOMATIC SPEECH RECOGNITION IN NOISY CONDITIONS  

E-print Network

This work addresses the robustness of automatic speech recognition (ASR) systems against additive background noise by finding speech parameterizations that allow the recognizer to learn characteristic speech patterns from a large speech database with accompanying transcriptions. (Bojana Gajić)
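
For context, one standard parameterization in this line of work is mel-frequency cepstral coefficients (MFCCs) followed by cepstral mean and variance normalization, a common step toward robustness against channel and additive-noise mismatch. A minimal sketch, assuming the librosa library and a hypothetical file utterance.wav (a generic baseline, not necessarily the parameterization this paper proposes):

    import numpy as np
    import librosa

    # Load audio and compute 13 MFCCs per frame.
    y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical file
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

    # Cepstral mean and variance normalization (CMVN) over the utterance.
    cmvn = (mfcc - mfcc.mean(axis=1, keepdims=True)) / (mfcc.std(axis=1, keepdims=True) + 1e-8)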

20

Speech disorders - children  

MedlinePLUS

... person has problems creating or forming the speech sounds needed to communicate with others. Three common speech ... are disorders in which a person repeats a sound, word, or phrase. Stuttering may be the most ...

21

Silence, speech, and responsibility  

E-print Network

Pornography deserves special protections, it is often said, because it qualifies as speech; therefore, no matter what we think of it, we must afford it the protections that we extend to most speech, but don't extend to ...

Maitra, Ishani, 1974-

2002-01-01

22

Perceptual speech modeling for noisy speech recognition  

Microsoft Academic Search

This paper proposes a perceptual modeling approach with two-stage recognition to deal with the issue of recognition degradation in noisy environments. The auditory masking effect is used for speech enhancement and acoustic modeling in order to overcome the model inconsistencies between training speech and noisy input. In the two-stage recognition, the maximum a posteriori (MAP) based adaptation algorithm is

Chung-Hsien Wu; Yu-Hsien Chiu; Huigan Lim

2002-01-01

23

Speech Compression by Polynomial Approximation  

Microsoft Academic Search

Methods for speech compression aim at reducing the transmission bit rate while preserving the quality and intelligibility of speech. These objectives are antipodal in nature since higher compression presupposes preserving less information about the original speech signal. This paper presents a method for compressing speech based on polynomial approximations of the trajectories in time of various speech features (i.e., spectrum,

Sorin Dusan; James L. Flanagan; Amod Karve; Mridul Balaraman

2007-01-01
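
To make the compression idea concrete, here is a toy sketch (NumPy; the trajectory is invented data): each feature trajectory over a segment is replaced by the coefficients of a low-order polynomial fit, and decompression simply re-evaluates the polynomial:

    import numpy as np

    def compress_trajectory(x, order=3):
        # Fit a low-order polynomial to one feature trajectory x(t);
        # the (order + 1) coefficients are the compressed representation.
        t = np.linspace(0.0, 1.0, len(x))
        return np.polyfit(t, x, order)

    def decompress_trajectory(coeffs, n):
        t = np.linspace(0.0, 1.0, n)
        return np.polyval(coeffs, t)

    # Example: 50 frames of one spectral feature reduced to 4 numbers.
    rng = np.random.default_rng(0)
    x = np.cumsum(0.1 * rng.normal(size=50))
    x_hat = decompress_trajectory(compress_trajectory(x), len(x))
    print("rms error:", np.sqrt(np.mean((x - x_hat) ** 2)))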

24

RAPID DEVELOPMENT OF SPEECH-TO-SPEECH TRANSLATION SYSTEMS

Microsoft Academic Search

This paper describes the building of the basic components, particularly speech recognition and synthesis, of a speech-to-speech translation system. This work is described within the framework of the…

Alan W Black; Ralf D. Brown; Robert Frederking; Kevin Lenzo; John Moody; Alexander Rudnicky; Rita Singh; Eric Steinbrecher

2002-01-01

25

Early recognition of speech  

PubMed Central

Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; is fast, unlearned, nonsymbolic, and indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

Remez, Robert E; Thomas, Emily F

2013-01-01

26

Analyzing a Famous Speech  

NSDL National Science Digital Library

After gaining skill through analyzing a historic and contemporary speech as a class, students will select a famous speech from a list compiled from several resources and write an essay that identifies and explains the rhetorical strategies that the author deliberately chose while crafting the text to make an effective argument. Their analysis will consider questions such as: What makes the speech an argument? How did the author's rhetoric evoke a response from the audience? Why are the words still venerated today?

Noel, Melissa W.

2012-08-01

27

Speech and Language  

Microsoft Academic Search

The speech/language area of cognitive functioning can be broken down into several subareas: fluency, repetition, naming, auditory comprehension, oral reading, reading comprehension, writing, and spelling. Verbal fluency is production of an uninterrupted, flowing series of speech sounds. Prosody is the rhythmic intonation or melody of speech. Word fluency is production of words representative of a category (e.g., animals, foods, words that

Robert M. Anderson

28

Speech Sound Disorders: Articulation and Phonological Processes  

MedlinePLUS

Speech Sound Disorders: Articulation and Phonological Processes What are speech sound disorders ? Can adults have speech sound disorders ? What ... individuals with speech sound disorders ? What are speech sound disorders? Most children make some mistakes as they ...

29

Free Speech Yearbook: 1971.  

ERIC Educational Resources Information Center

This publication of ten scholarly articles provides perspectives on problems and forces that inhibit freedom of speech. 1) "Freedom of Speech and Change in American Education" suggests that a more communicative society and increasing academic freedoms help schools adapt to social change; 2) "Syllabus and Bibliography for 'Issues in Freedom of…

Tedford, Thomas L., Editor

30

Free Speech Yearbook 1980.  

ERIC Educational Resources Information Center

The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…

Kane, Peter E., Ed.

31

Cued Speech and Speechreading.  

ERIC Educational Resources Information Center

Cued speech is presented as a system of phonemes and mouthshapes which can supplement speechreading. Research findings are presented on cue reception, cue comprehension, and development of sensory aids for cue presentation. Also discussed are research needs, and applications of cued speech for hearing-impaired speechreaders and for hearing…

Kipila, Elizabeth L.; Williams-Scott, Barbara

1988-01-01

32

Free Speech Yearbook 1977.  

ERIC Educational Resources Information Center

The eleven articles in this collection explore various aspects of freedom of speech. Topics include the lack of knowledge on the part of many judges regarding the complex act of communication; the legislatures and free speech in colonial Connecticut and Rhode Island; contributions of sixteenth century Anabaptist heretics to First Amendment…

Phifer, Gregg, Ed.

33

STUDENTS AND FREE SPEECH  

NSDL National Science Digital Library

Free speech is a constitutional right, correct? What about in school? The US Constitution protects everyone, young or old, big or small. As Horton said, "A person is a person no matter how small." Yet does that mean people can say whatever they want, whenever they want? Does the right to free speech give ...

Amsden

2013-04-22

34

Speech and Language Impairments  

MedlinePLUS

... help Educational considerations Tips for teachers Tips for parents Resources of more info A Day in the Life of an SLP Christina is a speech-language pathologist. She works with children and adults who have impairments in their speech, voice, or language skills. ...

35

Free Speech Yearbook 1976.  

ERIC Educational Resources Information Center

The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976" (William A. Linsley), "'Arnett v.…

Phifer, Gregg, Ed.

36

Free Speech. No. 38.  

ERIC Educational Resources Information Center

This issue of "Free Speech" contains the following articles: "Daniel Schorr Relieved of Reporting Duties" by Laurence Stern, "The Sellout at CBS" by Michael Harrington, "Defending Dan Schorr" by Tom Wicker, "Speech to the Washington Press Club, February 25, 1976" by Daniel Schorr, "Funds Voted For Schorr Inquiry" by Richard Lyons, "Erosion of the…

Kane, Peter E., Ed.

37

Improving Alaryngeal Speech Intelligibility.  

ERIC Educational Resources Information Center

Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the production…

Christensen, John M.; Dwyer, Patricia E.

1990-01-01

38

Illustrated Speech Anatomy.  

ERIC Educational Resources Information Center

Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

Shearer, William M.

39

Speech Production and Speech Discrimination by Hearing-Impaired Children.  

ERIC Educational Resources Information Center

Seven hearing impaired children (five to seven years old) assigned to the Speakers group made highly significant gains in speech production and auditory discrimination of speech, while Listeners made only slight speech production gains and no gains in auditory discrimination. Combined speech and auditory training was more effective than auditory…

Novelli-Olmstead, Tina; Ling, Daniel

1984-01-01

40

ROBUST SPEECH RECOGNITION USING MULTIPLE PRIOR MODELS FOR SPEECH RECONSTRUCTION  

E-print Network

Prior models of speech can be used in robust speech recognition to enhance noisy speech. Typically, a single prior model is trained by pooling the entire training data. In this paper we propose to train multiple prior models of speech instead of a single prior model. The prior models can be trained based on distinct characteristics of speech

Wang, DeLiang "Leon"

41

Speech, Language & Hearing Association  

NSDL National Science Digital Library

The American Speech-Language-Hearing Association’s (ASHA) mission statement is to “promote the interests of and provide the highest quality services for professionals in audiology, speech-language pathology, and speech and hearing science.” Their website is designed to help ASHA accomplish this task, and is a valuable resource for anyone involved in this industry. The ASHA has been around for 79 years and in that time has created resources for students and the general public, in order to educate people about speech and communication disorders and diseases. The site includes detailed explanations on many diseases and disorders and provides additional resources for those who want to learn more. For students, there are sections with information on various speech, language, and hearing professions; a guide to academic programs; and a useful guide to the Praxis exam required for many of these professions.

42

SpeechDat(E) - Eastern European Telephone Speech Databases  

Microsoft Academic Search

This paper describes the creation of five new telephony speech databases for Central and Eastern European languages within the SpeechDat(E) project. The five languages concerned are Czech, Polish, Slovak, Hungarian, and Russian. The databases follow SpeechDat-II specifications with some language-specific adaptation. The present paper describes the differences between SpeechDat(E) and earlier SpeechDat projects with regard to database items such as generation

Petr Pollak; J. Cernocky; J. Boudy; K. Choukri; H. van den Heuvel; K. Vicsi; A. Virag; R. Siemund; W. Majewski; J. Sadowski; P. Staroniewicz; H. S. Tropf; J. Kochanina; A. Ostrouchov; M. Rusko; M. Trnka

2000-01-01

43

Speech Perception Dominic W. Massaro  

E-print Network

This psychological account of speech perception holds, on the basis of research and theory, that speech perception is a form of pattern recognition influenced by multiple sources of information. (Dominic W. Massaro, University of California, Santa Cruz, Santa Cruz, California 95060, USA; Massaro@ucsc.edu)

Massaro, Dominic

44

Voice and Speech after Laryngectomy  

ERIC Educational Resources Information Center

The aim of the investigation is to compare voice and speech quality in alaryngeal patients using esophageal speech (ESOP, eight subjects), electroacoustical speech aid (EACA, six subjects) and tracheoesophageal voice prosthesis (TEVP, three subjects). The subjects reading a short story were recorded in the sound-proof booth and the speech samples…

Stajner-Katusic, Smiljka; Horga, Damir; Musura, Maja; Globlek, Dubravka

2006-01-01

45

Maynard Dixon: "Free Speech."  

ERIC Educational Resources Information Center

Based on Maynard Dixon's oil painting, "Free Speech," this lesson attempts to expand high school students' understanding of art as a social commentary and the use of works of art to convey ideas and ideals. (JDH)

Day, Michael

1987-01-01

46

Great American Speeches  

NSDL National Science Digital Library

This new companion site from PBS offers an excellent collection of speeches, some with audio and video clips, from many of the nation's "most influential and poignant speakers of the recorded age." In the Speech Archives, users will find a timeline of significant 20th-century events interspersed with the texts of over 90 speeches, some of which also offer background and audio or video clips. Additional sections of the site include numerous activities for students: two quizzes in the American History Challenge, Pop-Up Trivia, A Wordsmith Challenge, Critics' Corner and Could You be a Politician? which allows visitors to try their hand at reading a speech off of a teleprompter.

47

Research in speech communication.  

PubMed Central

Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806

Flanagan, J

1995-01-01

48

Speech Recognition Via Phonetically Featured Syllables   

E-print Network

We describe a speech recogniser which uses a speech production-motivated phonetic-feature description of speech. We argue that this is a natural way to describe the speech signal and offers an efficient intermediate ...

King, Simon; Stephenson, Todd; Isard, Stephen; Taylor, Paul; Strachan, Alex

49

Speech processing using maximum likelihood continuity mapping  

DOEpatents

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

Hogden, John E. (Santa Fe, NM)

2000-01-01
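
The patent text stays abstract, so the following is only a toy illustration of the continuity idea, not the patented algorithm: given per-frame log-likelihoods over discretized pseudo-articulator positions, a Viterbi-style dynamic program picks a high-likelihood position sequence while penalizing large jumps, yielding a smooth trajectory (NumPy; the quadratic penalty form is an assumption):

    import numpy as np

    def smoothest_path(loglik, smoothness=4.0):
        # loglik[t, k]: log-likelihood of position bin k for frame t.
        T, K = loglik.shape
        idx = np.arange(K)
        penalty = smoothness * ((idx[:, None] - idx[None, :]) / K) ** 2
        score = loglik[0].copy()
        back = np.zeros((T, K), dtype=int)
        for t in range(1, T):
            cand = score[:, None] - penalty       # prev bin -> current bin
            back[t] = np.argmax(cand, axis=0)     # best predecessor per bin
            score = cand[back[t], idx] + loglik[t]
        path = [int(np.argmax(score))]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return np.array(path[::-1])

    rng = np.random.default_rng(1)
    print(smoothest_path(rng.normal(size=(40, 16))))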

50

Towards speech as a knowledge resource  

Microsoft Academic Search

Speech is a tantalizing mode of human communication. On the one hand, humans understand speech with ease and use speech to express complex ideas, information, and knowledge. On the other hand, automatic speech recognition with computers is still very hard, and extracting knowledge from speech is even harder. In this paper we motivate the study of speech as a knowledge

Eric W. Brown; Savitha Srinivasan; Anni Coden; Dulce B. Ponceleon; James W. Cooper; Arnon Amir; Jan Pieper

2001-01-01

51

Functional Anatomy of Speech Perception and Speech Production: Psycholinguistic Implications  

Microsoft Academic Search

This paper presents evidence for a new model of the functional anatomy of speech/language (Hickok & Poeppel, 2000) which has, at its core, three central claims: (1) neural systems supporting the perception of sublexical aspects of speech are essentially bilaterally organized in posterior superior temporal lobe regions; (2) neural systems supporting the production of phonemic aspects of speech

Gregory Hickok

2001-01-01

52

Constructing emotional speech synthesizers with limited speech database  

Microsoft Academic Search

This paper describes an emotional speech synthesis system based on HMMs and related modeling techniques. For concatenative speech synthesis, we require all of the concatenation units that will be used to be recorded beforehand and made available at synthesis time. To adopt this approach for synthesizing the wide variety of human emotions possible in speech implies that

Heiga Zen; Tadashi Kitamura; Murtaza Bulut; Shrikanth Narayanan; Ryosuke Tsuzuki; Keiichi Tokuda

2004-01-01

53

Audio-Visual Speech Modeling for Continuous Speech Recognition  

Microsoft Academic Search

This paper describes a speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments. The system consists of three components: a visual module; an acoustic module; and a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This task is

Stéphane Dupont; Juergen Luettin

2000-01-01
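
The sensor-fusion module in the paper is more elaborate, but the simplest baseline in audio-visual recognition is early (feature-level) fusion, sketched below with invented array shapes: resample the slower visual lip-feature stream to the audio frame rate, then concatenate the per-frame vectors before recognition:

    import numpy as np

    def early_fusion(audio_feats, visual_feats):
        # audio_feats: (T_a, d_a) acoustic frames; visual_feats: (T_v, d_v)
        # lip features at a lower frame rate. Upsample the visual stream by
        # nearest-neighbor indexing, then concatenate frame by frame.
        T = audio_feats.shape[0]
        idx = np.linspace(0, len(visual_feats) - 1, T).round().astype(int)
        return np.concatenate([audio_feats, visual_feats[idx]], axis=1)

    fused = early_fusion(np.zeros((100, 13)), np.zeros((25, 6)))
    print(fused.shape)  # (100, 19)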

54

Development of a speech autocuer  

NASA Technical Reports Server (NTRS)

A wearable, visually based prosthesis for the deaf based upon the proven method for removing lipreading ambiguity known as cued speech was fabricated and tested. Both software and hardware developments are described, including a microcomputer, display, and speech preprocessor.

Bedles, R. L.; Kizakvich, P. N.; Lawson, D. T.; Mccartney, M. L.

1980-01-01

55

Thesis Seminar Articulatory Speech Processing  

E-print Network

Thesis seminar: Articulatory Speech Processing, Sam Roweis (Computation and Neural Systems). The work uses real speech production data from a database containing simultaneous audio and mouth movement recordings, together with a recognition or pattern completion module. In the case of human speech perception and production, the models

Roweis, Sam

56

Audio-Visual Speech Recognition  

Microsoft Academic Search

We have made significant progress in automatic speech recognition (ASR) for well-defined applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, for ASR to approach human levels of performance and for speech to become a truly pervasive user interface, we need novel, nontraditional approaches that have the potential of yielding dramatic ASR improvements. Visual speech

Chalapathy Neti; Gerasimos Potamianos; Juergen Luettin

2000-01-01

57

Speech Perception in the Classroom.  

ERIC Educational Resources Information Center

This article discusses how poor room acoustics can make speech inaudible and presents a speech-perception model demonstrating the linkage between the adequacy of classroom acoustics and the development of speech and language systems. It argues that both aspects must be considered when evaluating barriers to listening and learning in a classroom.…

Smaldino, Joseph J.; Crandell, Carl C.

1999-01-01

58

The Festival Speech Synthesis System  

Microsoft Academic Search

This document provides a user manual for the Festival Speech Synthesis System, version 1.1.1. Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text-to-speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library,

Alan W Black; Paul Taylor

1999-01-01

59

Castro Speech Databases  

NSDL National Science Digital Library

The Latin American Network Information Center at the University of Texas provides access to a searchable and browsable database of speeches by Cuban Leader Fidel Castro. It contains "full text of English translations of speeches, interviews, and press conferences by Castro, based upon the records of the Foreign Broadcast Information Service (FBIS), a US government agency responsible for monitoring broadcast and print media in countries throughout the world." Users should note that the search interface, while allowing searching on any of nine types of documents, as well as keyword and date, lacks user guidance. Documents are organized by date. While this is not a repository of all of Castro's speeches, the amount of material at the site makes it valuable to researchers.

60

Speech spectrogram expert  

SciTech Connect

Various authors have pointed out that humans can become quite adept at deriving phonetic transcriptions from speech spectrograms (as good as 90 percent accuracy at the phoneme level). The authors describe an expert system which attempts to simulate this performance. The speech spectrogram expert (spex) is actually a society made up of three experts: a 2-dimensional vision expert, an acoustic-phonetic expert, and a phonetics expert. The visual reasoning expert finds important visual features of the spectrogram. The acoustic-phonetic expert reasons about how visual features relate to phonemes, and about how phonemes change visually in different contexts. The phonetics expert reasons about allowable phoneme sequences and transformations, and deduces an English spelling for phoneme strings. The speech spectrogram expert is highly interactive, allowing users to investigate hypotheses and edit rules. 10 references.

Johannsen, J.; Macallister, J.; Michalek, T.; Ross, S.

1983-01-01

61

Concept-to-Speech Synthesis by Phonological Structure Matching  

E-print Network

This paper addresses the speech generation problem in a concept-to-speech system: off-line, a database of recorded speech is prepared, and synthesis then proceeds by matching phonological structure against that database. (P. A. Taylor, Centre for Speech Technology Research; Figure 1 of the paper contrasts text-to-speech and concept-to-speech generation for a database query.)

Edinburgh, University of

62

Speech transmission index from running speech: A neural network approach  

NASA Astrophysics Data System (ADS)

Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.

Li, F. F.; Cox, T. J.

2003-04-01
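
A rough sketch of the regression setup described above, using scikit-learn (the features and STI labels below are random stand-ins; the paper trains on transmitted speech examples whose channel STIs are known):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 24))        # stand-in spectral features of received speech
    y = rng.uniform(0.2, 0.9, size=500)   # stand-in STI targets in [0, 1]

    # Small feed-forward network mapping features to a scalar STI estimate.
    net = MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0)
    net.fit(X, y)
    print("predicted STI for first 3 examples:", net.predict(X[:3]))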

63

Development and application of multilingual speech translation  

Microsoft Academic Search

This paper describes the latest version of a handheld speech-to-speech translation system developed by the National Institute of Information and Communications Technology (NICT). As the entire set of speech-to-speech translation functions is implemented in one terminal, it realizes real-time, location-free speech-to-speech translation service for many language pairs. A new noise-suppression technique notably improves speech recognition performance. Corpus-based approaches of recognition, translation, and

Satoshi Nakamura

2009-01-01

64

Interactive Speech Understanding  

Microsoft Academic Search

This paper introduces a robust interactive method for speech understanding. The generalized LR parsing is enhanced in this approach. Parsing proceeds from left to right correcting minor errors. When a very noisy portion is detected, the parser skips that portion using a nonterminal symbol. The unidentified portion is resolved by re-utterance of that portion which is parsed very efficiently by

Hiroaki Saito

1992-01-01

65

Women, Speech and Experience  

Microsoft Academic Search

As a harbinger of new First Amendment doctrine, antipornography feminism offers a puzzling and fruitful study. Pitted against their erstwhile political allies and allied with otherwise adversaries, its adherents do not easily fall into any ideological map. Against civil libertarianism they point to the inequality of speakers and subjects. Against the postmodern discursive appropriation and subversion of speech,

Kathleen S. Sullivan

2005-01-01

66

Black History Speech  

ERIC Educational Resources Information Center

The author argues in this speech that one cannot expect students in the school system to know and understand the genius of Black history if the curriculum is Eurocentric, which is a residue of racism. He states that his comments are designed for the enlightenment of those who suffer from a school system that "hypocritically manipulates Black…

Noldon, Carl

2007-01-01

67

Speech intelligibility in hospitals.  

PubMed

Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75), and several locations were found to have "poor" intelligibility. The study characterizes speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research. PMID:23862833

Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

2013-07-01

68

Free Speech Yearbook 1973.  

ERIC Educational Resources Information Center

The first article in this collection examines civil disobedience and the protections offered by the First Amendment. The second article discusses a study on antagonistic expressions in a free society. The third essay deals with attitudes toward free speech and treatment of the United States flag. There are two articles on media; the first examines…

Barbour, Alton, Ed.

69

Interlocutor Informative Speech  

ERIC Educational Resources Information Center

Sharing information orally is an important skill that public speaking classes teach well. However, the author's students report that they do not often see informative lectures, demonstrations, presentations, or discussions that follow the structures and formats of an informative speech as it is discussed in their textbooks. As a result, the author…

Gray, Jonathan M.

2005-01-01

70

Free Speech Yearbook, 1974.  

ERIC Educational Resources Information Center

A collection of essays on free speech and communication is contained in this book. The essays include "From Fairness to Access and Back Again: Some Dimensions of Free Expression in Broadcasting"; "Local Option on the First Amendment?"; "A Look at the Fire Symbol Before and After May 4, 1970"; "Freedom to Teach, to Learn, and to Speak: Rhetorical…

Barbour, Alton, Ed.

71

Speech After Banquet  

NASA Astrophysics Data System (ADS)

I am usually not so short of words, but the previous speeches have rendered me really speechless. I have known and admired the eloquence of Freeman Dyson, but I did not know that there is a hidden eloquence in my colleague George Sterman...

Yang, Chen Ning

2013-05-01

72

Hearing speech in music.  

PubMed

The masking effect of a piano composition, played at different speeds and in different octaves, on speech-perception thresholds was investigated in 15 normal-hearing and 14 moderately-hearing-impaired subjects. Running speech (just follow conversation, JFC) testing and use of hearing aids increased the everyday validity of the findings. A comparison was made with standard audiometric noises [International Collegium of Rehabilitative Audiology (ICRA) noise and speech spectrum-filtered noise (SPN)]. All masking sounds, music or noise, were presented at the same equivalent sound level (50 dBA). The results showed a significant effect of piano performance speed and octave (P<.01). Low octave and fast tempo had the largest effect; and high octave and slow tempo, the smallest. Music had a lower masking effect than did ICRA noise with two or six speakers at normal vocal effort (P<.01) and SPN (P<.05). Subjects with hearing loss had higher masked thresholds than the normal-hearing subjects (P<.01), but there were smaller differences between masking conditions (P<.01). It is pointed out that music offers an interesting opportunity for studying masking under realistic conditions, where spectral and temporal features can be varied independently. The results have implications for composing music with vocal parts, designing acoustic environments and creating a balance between speech perception and privacy in social settings. PMID:21768731

Ekström, Seth-Reino; Borg, Erik

2011-01-01

73

Figures of Speech  

ERIC Educational Resources Information Center

In this article, the authors report that almost one in three adults in the UK have experience of learning a language as an adult, but only four percent are currently doing so--one percent less than in 1999, equivalent to a drop of half a million adults learning languages. Figures of speech, NIACE's UK-wide survey of language learning, also found a…

Dutton, Yanina; Meyer, Sue

2007-01-01

74

From the Speech Files  

ERIC Educational Resources Information Center

In a speech, "Looking Ahead in Vocational Education," to a group of Hamilton educators, D.O. Davis, Vice-President, Engineering, Dominion Foundries and Steel Limited, Hamilton, Ontario, spoke of the challenge of change and what educators and industry must do to help the future of vocational education. (Editor)

Can Vocat J, 1970

1970-01-01

75

Apraxia of Speech  

MedlinePLUS

... of organizations that can answer questions and provide printed or electronic information on apraxia of speech. Please see the list of organizations at www.nidcd.nih.gov/directory . Use the following ... a printed list of organizations, contact: NIDCD Information Clearinghouse 1 ...

76

System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech  

DOEpatents

Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

2002-01-01

77

Speech recognition from GSM codec parameters  

Microsoft Academic Search

Speech coding affects speech recognition performance, with recognition accuracy deteriorating as the coded bit rate decreases. Virtually all systems that recognize coded speech reconstruct the speech waveform from the coded parameters, and then perform recognition (after possible noise and/or channel compensation) using conventional techniques. In this paper we compare the recognition accuracy of coded speech obtained by reconstructing

Juan M. Huerta; Richard M. Stern

1998-01-01

78

TEACHER'S GUIDE TO HIGH SCHOOL SPEECH.  

ERIC Educational Resources Information Center

THIS GUIDE TO HIGH SCHOOL SPEECH FOCUSES ON SPEECH AS ORAL COMPOSITION, STRESSING THE IMPORTANCE OF CLEAR THINKING AND COMMUNICATION. THE PROPOSED 1-SEMESTER BASIC COURSE IN SPEECH ATTEMPTS TO IMPROVE THE STUDENT'S ABILITY TO COMPOSE AND DELIVER SPEECHES, TO THINK AND LISTEN CRITICALLY, AND TO UNDERSTAND THE SOCIAL FUNCTION OF SPEECH. IN ADDITION…

JENKINSON, EDWARD B., ED.

79

The role of speech production system in audiovisual speech perception.  

PubMed

Seeing the articulatory gestures of the speaker significantly enhances speech perception. Findings from recent neuroimaging studies suggest that activation of the speech motor system during lipreading enhances speech perception by tuning, in a top-down fashion, speech-sound processing in the superior aspects of the posterior temporal lobe. Anatomically, the superior-posterior temporal lobe areas receive connections from the auditory, visual, and speech motor cortical areas. Thus, it is possible that neuronal receptive fields are shaped during development to respond to speech-sound features that coincide with visual and motor speech cues, in contrast with the anterior/lateral temporal lobe areas that might process speech sounds predominantly based on acoustic cues. The superior-posterior temporal lobe areas have also been consistently associated with auditory spatial processing. Thus, the involvement of these areas in audiovisual speech perception might partly be explained by the spatial processing requirements when associating sounds, seen articulations, and one's own motor movements. Tentatively, it is possible that the anterior "what" and posterior "where / how" auditory cortical processing pathways are parts of an interacting network, the instantaneous state of which determines what one ultimately perceives, as potentially reflected in the dynamics of oscillatory activity. PMID:20922046

Jääskeläinen, Iiro P

2010-01-01

80

The NESPOLE! Speech-to-Speech Translation System  

Microsoft Academic Search

NESPOLE! is a speech-to-speech machine translation research system designed to provide fully functional speech-to-speech capabilities within real-world settings of common users involved in e-commerce applications. The project is funded jointly by the European Commission and the US NSF. The NESPOLE! system uses a client-server architecture to allow a common user, who is browsing web-pages on the internet, to connect seamlessly

Alon Lavie; Lori S. Levin; Robert E. Frederking; Fabio Pianesi

2002-01-01

81

Concept-to-Speech Synthesis by Phonological Structure Matching

E-print Network

This paper addresses the speech generation problem in a concept-to-speech system: off-line, a database of recorded speech is prepared, and synthesis then proceeds by matching phonological structure against that database. (P. A. Taylor, Centre for Speech Technology Research; Figure 1 of the paper contrasts text-to-speech and concept-to-speech generation for a database query.)

Edinburgh, University of

82

SpeechBot  

NSDL National Science Digital Library

This new experimental search engine from Compaq indexes over 2,500 hours of content from 20 popular American radio shows. Using its speech recognition software, Compaq creates "a time-aligned 'transcript' of the program and build[s] an index of the words spoken during the program." Users can then search the index by keyword or advanced search. Search returns include the text of the clip, a link to a longer transcript, the relevant audio clip in RealPlayer format, the entire program in RealPlayer format, and a link to the radio show's Website. The index is updated daily. Please note that, while SpeechBot worked fine on Windows/NT machines, the Scout Project was unable to access the audio clips using Macs.

83

Articulatory Features for Robust Visual Speech Recognition  

E-print Network

Keywords: multimodal interfaces, audio-visual speech recognition, speechreading, visual features, experimentation. The first approach, speechreading, uses just the visual input to recognize speech. The second, Audio-Visual Speech Recognition (AVSR

84

Real-time speech animation system  

E-print Network

We optimize the synthesis procedure of a videorealistic speech animation system [7] to achieve real-time speech animation synthesis. A synthesis rate must be high enough for real-time video streaming for speech animation ...

Fu, Jieyun

2011-01-01

85

X-RAY MICROBEAM SPEECH PRODUCTION DATABASE  

E-print Network

X-Ray Microbeam Speech Production Database User's Handbook, Version 1.0 (June 1994), prepared by John… Chapter Three (Speech & Task Sample) draws on Physiology of Speech Production, the now-classic cineradiographic account of thirteen disyllables spoken

86

Predicting confusions and intelligibility of noisy speech  

E-print Network

Current predictors of speech intelligibility are inadequate for making predictions of speech confusions caused by acoustic interference. This thesis is inspired by the need for a capability to understand and predict speech ...

Messing, David P. (David Patrick), 1979-

2007-01-01

87

An Articulatory Speech-Prosthesis System  

E-print Network

We investigate speech-coding strategies for brain-machine-interface (BMI) based speech prostheses. We present an articulatory speech-synthesis system using an experimental integrated-circuit vocal tract that models the ...

Wee, Keng Hoong

88

SSML: A speech synthesis markup language.   

E-print Network

This paper describes the Speech Synthesis Markup Language, SSML, which has been designed as a platform independent interface standard for speech synthesis systems. The paper discusses the need for standardisation in speech ...

Taylor, Paul A; Isard, Amy

1997-01-01

89

Evaluation of NASA speech encoder  

NASA Technical Reports Server (NTRS)

Techniques developed by NASA for spaceflight instrumentation were used in the design of a quantizer for speech-decoding. Computer simulation of the actions of the quantizer was tested with synthesized and real speech signals. Results were evaluated by a phonetician. Topics discussed include the relationship between the number of quantizer levels and the required sampling rate; reconstruction of signals; digital filtering; and speech recording, sampling, storage, and processing results.

1976-01-01
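
The trade-off the abstract mentions, between the number of quantizer levels and reconstruction fidelity, can be demonstrated with a plain uniform quantizer (NumPy; illustrative only, not NASA's coder design):

    import numpy as np

    def quantize(x, levels):
        # Uniform quantizer over [-1, 1]: map each sample to an integer
        # code, then reconstruct at the center of its quantization cell.
        step = 2.0 / levels
        codes = np.clip(np.floor((x + 1.0) / step), 0, levels - 1).astype(int)
        return codes, (codes + 0.5) * step - 1.0

    t = np.linspace(0.0, 1.0, 8000)
    x = 0.8 * np.sin(2 * np.pi * 440 * t)   # a 440 Hz test tone
    for levels in (8, 64, 256):
        _, x_hat = quantize(x, levels)
        print(levels, "levels -> rms error", np.sqrt(np.mean((x - x_hat) ** 2)))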

90

Disorders of Speech and Voice  

Microsoft Academic Search

Speech is a learned behavior that requires rapid coordination of respiratory, phonatory, and articulatory systems coupled with intact language, cognition, and hearing functions. Speech is often divided into sub-domains that include speech sound production (articulation), fluency, resonance, and voice quality. Children develop control of each of these sub-domains over a period of years, often raising questions for parents and pediatricians

Helen M. Sharp; Stephen M. Tasko

91

Semantic Interpretation for Speech Recognition  

NSDL National Science Digital Library

The first working draft of the World Wide Web Consortium's (W3C) Semantic Interpretation for Speech Recognition is now available. The document "defines the process of Semantic Interpretation for Speech Recognition and the syntax and semantics of semantic interpretation tags that can be added to speech recognition grammars." The document is a draft, open for suggestions from W3C members and other interested users.

Lernout & Hauspie Speech Products.; Tichelen, Luc V.

2001-01-01

92

Somatosensory basis of speech production.  

PubMed

The hypothesis that speech goals are defined acoustically and maintained by auditory feedback is a central idea in speech production research. An alternative proposal is that speech production is organized in terms of control signals that subserve movements and associated vocal-tract configurations. Indeed, the capacity for intelligible speech by deaf speakers suggests that somatosensory inputs related to movement play a role in speech production, but studies that might have documented a somatosensory component have been equivocal. For example, mechanical perturbations that have altered somatosensory feedback have simultaneously altered acoustics. Hence, any adaptation observed under these conditions may have been a consequence of acoustic change. Here we show that somatosensory information on its own is fundamental to the achievement of speech movements. This demonstration involves a dissociation of somatosensory and auditory feedback during speech production. Over time, subjects correct for the effects of a complex mechanical load that alters jaw movements (and hence somatosensory feedback), but which has no measurable or perceptible effect on acoustic output. The findings indicate that the positions of speech articulators and associated somatosensory inputs constitute a goal of speech movements that is wholly separate from the sounds produced. PMID:12815431

Tremblay, Stéphanie; Shiller, Douglas M; Ostry, David J

2003-06-19

93

Learning Speech Translation from Interpretation.  

E-print Network

This thesis introduces methods to directly train speech translation systems on audio recordings of interpreter-mediated communication. By employing unsupervised and lightly supervised training techniques, the…

Paulik, Matthias

2010-01-01

94

Speech-in-Speech Recognition: A Training Study  

ERIC Educational Resources Information Center

This study aims to identify aspects of speech-in-noise recognition that are susceptible to training, focusing on whether listeners can learn to adapt to target talkers ("tune in") and learn to better cope with various maskers ("tune out") after short-term training. Listeners received training on English sentence recognition in speech-shaped noise…

Van Engen, Kristin J.

2012-01-01

95

Microphone array speech recognition: experiments on overlapping speech in meetings  

Microsoft Academic Search

This paper investigates the use of microphone arrays to acquire and recognise speech in meetings. Meetings pose several interesting problems for speech processing, as they consist of multiple competing speakers within a small space, typically around a table. Due to their ability to provide hands-free acquisition and directional discrimination, microphone arrays present a potential alternative to close-talking microphones in such

Darren C. Moore; Iain A. McCowan

2003-01-01

96

Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity  

ERIC Educational Resources Information Center

In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

Opt, Susan

2012-01-01

97

Neural Network Speech Enhancement for Noise Robust Speech Recognition  

Microsoft Academic Search

We have developed neural net equalizers that compensate for the effects of mismatched acoustics in the training and operational environments of speech recognizers. We show that neural nets can be used to significantly boost recognition accuracy, without retraining the speech recognizer.

Bert de Vries; Chi Wei Che; Roger Crane; Jim Flanagan; Qiguang Lin; John Pearson

98

Recognizing articulatory gestures from speech for robust speech recognition.  

PubMed

Studies have shown that supplementary articulatory information can help to improve the recognition rate of automatic speech recognition systems. Unfortunately, articulatory information is not directly observable, necessitating its estimation from the speech signal. This study describes a system that recognizes articulatory gestures from speech, and uses the recognized gestures in a speech recognition system. Recognizing gestures for a given utterance involves recovering the set of underlying gestural activations and their associated dynamic parameters. This paper proposes a neural network architecture for recognizing articulatory gestures from speech and presents ways to incorporate articulatory gestures for a digit recognition task. The lack of a natural speech database containing gestural information prompted us to use three stages of evaluation. First, the proposed gestural annotation architecture was tested on a synthetic speech dataset, which showed that the use of estimated tract-variable-time-functions improved gesture recognition performance. In the second stage, gesture-recognition models were applied to natural speech waveforms and word recognition experiments revealed that the recognized gestures can improve the noise-robustness of a word recognition system. In the final stage, a gesture-based Dynamic Bayesian Network was trained and the results indicate that incorporating gestural information can improve word recognition performance compared to acoustic-only systems. PMID:22423722

Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol; Saltzman, Elliot; Goldstein, Louis

2012-03-01

99

Robust speech recognition by integrating speech separation and hypothesis testing  

E-print Network

Such methods require a binary mask to label speech-dominant T-F regions of a noisy speech signal as reliable and the rest as unreliable. Current methods for computing the mask are based mainly on bottom-up cues. This work describes a two-stage recognition system that combines bottom-up and top-down cues in order to simultaneously improve both mask

Wang, DeLiang "Leon"

100

ROBUST SPEECH RECOGNITION BY INTEGRATING SPEECH SEPARATION AND HYPOTHESIS TESTING  

E-print Network

…in the time-frequency domain. Such methods require a binary mask which labels time-frequency regions of a noisy speech signal. Current methods for estimating the mask are based mainly on bottom-up speech separation cues such as harmonicity. This work combines these with top-down cues from the recognition system in order to improve mask estimation and produce better recognition results. First, an n

Wang, DeLiang "Leon"

101

Logitboost weka classifier speech segmentation  

Microsoft Academic Search

Segmenting speech signals on the basis of time-frequency analysis is the most natural approach: boundaries are located in places where the energy of some frequency subband rapidly changes. A speech segmentation method based on the discrete wavelet transform, the resulting power spectrum, and its derivatives is presented. This information allows the boundaries of phonemes to be located. A statistical classification method…
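
A minimal sketch of the idea, assuming Python with NumPy and PyWavelets; the wavelet choice, hop size, and threshold are illustrative, not the authors' settings.

    import numpy as np
    import pywt

    def wavelet_boundaries(signal, wavelet="db4", levels=5, hop=160, thresh=2.0):
        # Multilevel DWT gives one approximation plus `levels` detail subbands.
        coeffs = pywt.wavedec(signal, wavelet, level=levels)
        scores = []
        for band in coeffs:
            # Frame-wise power of each subband, on a common frame grid.
            frames = np.array_split(band ** 2, max(len(signal) // hop, 1))
            scores.append(np.array([f.sum() for f in frames]))
        power = np.vstack(scores)                 # (subbands, frames)
        # A rapid energy change in any subband suggests a phoneme boundary.
        change = np.abs(np.diff(power, axis=1)).sum(axis=0)
        return np.where(change > thresh * change.mean())[0] * hop

The returned sample indices are boundary candidates; the statistical classifier mentioned in the abstract would then accept or reject them.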

Bartosz Ziólko; Suresh Manandhar; Richard C. Wilson; Mariusz Ziólko

2008-01-01

102

Audiovisual Speech Recalibration in Children  

ERIC Educational Resources Information Center

In order to examine whether children adjust their phonetic speech categories, children of two age groups, five-year-olds and eight-year-olds, were exposed to a video of a face saying /aba/ or /ada/ accompanied by an auditory ambiguous speech sound halfway between /b/ and /d/. The effect of exposure to these audiovisual stimuli was measured on…

van Linden, Sabine; Vroomen, Jean

2008-01-01

103

Methods of Teaching Speech Recognition  

ERIC Educational Resources Information Center

Objective: This article introduces the history and development of speech recognition, addresses its role in the business curriculum, outlines related national and state standards, describes instructional strategies, and discusses the assessment of student achievement in speech recognition classes. Methods: Research methods included a synthesis of…

Rader, Martha H.; Bailey, Glenn A.

2010-01-01

104

Speech Prosody in Cerebellar Ataxia  

ERIC Educational Resources Information Center

Persons with cerebellar ataxia exhibit changes in physical coordination and speech and voice production. Previously, these alterations of speech and voice production were described primarily via perceptual coordinates. In this study, the spatial-temporal properties of syllable production were examined in 12 speakers, six of whom were healthy…

Casper, Maureen A.; Raphael, Lawrence J.; Harris, Katherine S.; Geibel, Jennifer M.

2007-01-01

105

Speech Communication and Multimodal Interfaces  

Microsoft Academic Search

Within the area of advanced man-machine interaction, speech communication has played a major role for several decades. The idea of replacing conventional input devices such as buttons and keyboards with voice control, thereby considerably increasing comfort and input speed, seems so attractive that even the quite slow progress of speech technology during…

Björn Schuller; Markus Ablaßmeier; Ronald Müller; Stefan Reifinger; Tony Poitschke; Gerhard Rigoll

106

Deictic Reference in Children's Speech.  

ERIC Educational Resources Information Center

The purpose of this paper is to examine the status of deictic reference in the speech of 19 three-year-old Black children. The deictic verbs of motion are examined with reference to other aspects of the deictic system. The data for this study are approximately eight hours of spontaneous speech collected in a pre-school classroom. The hypothesis to…

Keller-Cohen, Deborah

107

SILENT SPEECH DURING SILENT READING.  

ERIC Educational Resources Information Center

EFFORTS WERE MADE IN THIS STUDY TO (1) RELATE THE AMOUNT OF SILENT SPEECH DURING SILENT READING TO LEVEL OF READING PROFICIENCY, INTELLIGENCE, AGE, AND GRADE PLACEMENT OF SUBJECTS, AND (2) DETERMINE WHETHER THE AMOUNT OF SILENT SPEECH DURING SILENT READING IS AFFECTED BY THE LEVEL OF DIFFICULTY OF PROSE READ AND BY THE READING OF A FOREIGN…

MCGUIGAN, FRANK J.

108

Speech Restoration: An Interactive Process  

ERIC Educational Resources Information Center

Purpose: This study investigates the ability to understand degraded speech signals and explores the correlation between this capacity and the functional characteristics of the peripheral auditory system. Method: The authors evaluated the capability of 50 normal-hearing native French speakers to restore time-reversed speech. The task required them…

Grataloup, Claire; Hoen, Michael; Veuillet, Evelyne; Collet, Lionel; Pellegrino, Francois; Meunier, Fanny

2009-01-01

109

"The Speech Teacher": Early Years.  

ERIC Educational Resources Information Center

Discusses the role of "The Speech Teacher" journal (since 1977 "Communication Education") from its start in the 1930s from the point of view of the author of the lead article in the premier issue. Notes many changes the journal has undergone while the field of speech communication transitioned from beginner to expert. (SG)

Reid, Loren

2002-01-01

110

Retention of Rural Speech Pathologists.  

ERIC Educational Resources Information Center

Stresses the need for research examining the critical shortage of speech-language pathologists (SLPs) in rural Canada. Results of a survey of 87 speech language pathologists in rural British Columbia and Saskatchewan (Canada) were similar to earlier American studies in that employment practices, employment benefits, lifestyle, and other personal…

Foster, Felicity; Harvey, Brian

1996-01-01

111

Alignment to visual speech information.  

PubMed

Speech alignment is the tendency for interlocutors to unconsciously imitate one another's speaking style. Alignment also occurs when a talker is asked to shadow recorded words (e.g., Shockley, Sabadini, & Fowler, 2004). In two experiments, we examined whether alignment could be induced with visual (lipread) speech and with auditory speech. In Experiment 1, we asked subjects to lipread and shadow out loud a model silently uttering words. The results indicate that shadowed utterances sounded more similar to the model's utterances than did subjects' nonshadowed read utterances. This suggests that speech alignment can be based on visual speech. In Experiment 2, we tested whether raters could perceive alignment across modalities. Raters were asked to judge the relative similarity between a model's visual (silent video) utterance and subjects' audio utterances. The subjects' shadowed utterances were again judged as more similar to the model's than were read utterances, suggesting that raters are sensitive to cross-modal similarity between aligned words. PMID:20675805

Miller, Rachel M; Sanchez, Kauyumari; Rosenblum, Lawrence D

2010-08-01

112

Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder  

ERIC Educational Resources Information Center

Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech

Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

2006-01-01

113

PINPOINTING PRONUNCIATION ERRORS IN CHILDREN'S SPEECH: EXAMINING THE ROLE OF THE SPEECH RECOGNIZER  

E-print Network

Speech from children differs from that of adults in several ways. For all of these reasons we have chosen to collect a database of children's speech and then train…

Eskenazi, Maxine

114

Speech Perception Within an Auditory Cognitive Science  

E-print Network

Although the perception of speech begins with auditory processing, investigation of speech perception has progressed mostly independently of the study of general auditory processing. This article relates the study of general auditory processing and speech perception, showing that the latter is constrained…

Holt, Lori L.

115

Phonetic Recalibration Only Occurs in Speech Mode  

ERIC Educational Resources Information Center

Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds…

Vroomen, Jean; Baart, Martijn

2009-01-01

116

Norwegian Speech Recognition for Telephone Applications  

E-print Network

We present a Norwegian telephone speech database, TABU.0. We discuss the database design specification and some experiences with a recogniser trained on a subset of the database. Automatic speech recognition has now reached a level where telephone applications are feasible, but performance depends on the speech database used to train the recogniser. Therefore, a Norwegian speech database is necessary.

Amdal, Ingunn

117

Analysis of False Starts in Spontaneous Speech.  

ERIC Educational Resources Information Center

A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…

O'Shaughnessy, Douglas

118

Emerging Technologies Speech Tools and Technologies  

ERIC Educational Resources Information Center

Using computers to recognize and analyze human speech goes back at least to the 1970's. Developed initially to help the hearing or speech impaired, speech recognition was also used early on experimentally in language learning. Since the 1990's, advances in the scientific understanding of speech as well as significant enhancements in software and…

Godwin-Jones, Robert

2009-01-01

119

SPEECH CODING WITH LINEAR PREDICTIVE CODING  

Microsoft Academic Search

Speech coding has been, and still is, a major issue in digital speech processing: compression is needed for storing digital voice within a fixed amount of available memory, and it makes it possible to store longer messages. Several techniques of speech coding, such as Linear Predictive Coding (LPC), Waveform Coding and Sub-band
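
A minimal sketch of the analysis step common to LPC coders (the autocorrelation method via the Yule-Walker equations), assuming NumPy/SciPy; the frame, window, and order are illustrative choices, not this paper's configuration.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_coefficients(frame, order=10):
        # Window the frame, compute its autocorrelation, then solve the
        # Yule-Walker normal equations for the predictor coefficients.
        w = frame * np.hamming(len(frame))
        r = np.correlate(w, w, mode="full")[len(w) - 1:]
        a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
        return np.concatenate(([1.0], -a))   # A(z), the analysis filter

    def lpc_residual(frame, a):
        # The encoder transmits coefficients, gain and pitch rather than
        # the waveform; the residual drives the decoder's synthesis filter.
        return np.convolve(frame, a)[:len(frame)]

The compression comes from transmitting only the filter parameters and excitation description per frame instead of the raw samples.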

G. CHENCHAMMA; P. L. CHOWDARY; K. SHALINI KATYAYANI

120

ANNUAL SPEECH PATHOLOGY HONOURS RESEARCH MINICONFERENCE 2012  

E-print Network

Every year the Speech Pathology programme holds its Honours Research Miniconference. All interested are welcome. Telephone +61 8 9266 7984 or visit psych.curtin.edu.au. School of Psychology & Speech Pathology. Monday 15th…

121

Undergraduate Student Handbook: SPEECH-LANGUAGE PATHOLOGY PROGRAM  

E-print Network

Undergraduate Student Handbook, Speech-Language Pathology Program, Stephen F. Austin State University. Contents range from the history of the SFASU Speech-Language Pathology and Audiology Program to the American Speech-Language-Hearing Association…

Long, Nicholas

122

The "Checkers" Speech and Televised Political Communication.  

ERIC Educational Resources Information Center

Richard Nixon's 1952 "Checkers" speech was an innovative use of television for political communication. Like television news itself, the campaign fund crisis behind the speech can be thought of in the same terms as other television melodrama, with the speech serving as its climactic episode. The speech adapted well to television because it was…

Flaningam, Carl

123

ON THE NATURE OF SPEECH SCIENCE.  

ERIC Educational Resources Information Center

IN THIS ARTICLE THE NATURE OF THE DISCIPLINE OF SPEECH SCIENCE IS CONSIDERED AND THE VARIOUS BASIC AND APPLIED AREAS OF THE DISCIPLINE ARE DISCUSSED. THE BASIC AREAS ENCOMPASS THE VARIOUS PROCESSES OF THE PHYSIOLOGY OF SPEECH PRODUCTION, THE ACOUSTICAL CHARACTERISTICS OF SPEECH, INCLUDING THE SPEECH WAVE TYPES AND THE INFORMATION-BEARING ACOUSTIC…

PETERSON, GORDON E.

124

Volume perception in parkinsonian speech.  

PubMed

This study contrasted the volume level of speech production with perceived volume. Fifteen idiopathic patients with Parkinson's disease who have hypophonic dysarthria and 15 healthy age- and sex-matched control subjects participated in this study. Testing took place in a sound-proof room. Ability to regulate volume was tested at three instructional levels of loudness: participants were given no instructions regarding volume (to elicit normal default volume) or were asked to read loudly or quietly. Two types of volume-perception judgments were made. First, an estimate of one's own volume, immediately after speaking (that is, immediate perception), and secondly, an estimation of reading volume after hearing one's own voice played back (that is, playback perception). These perceptual ratings were compared with actual speech volume produced in reading and conversation tasks. It was found that there was less of a difference between patients' production and perception of speech volume compared with that of the control subjects. While patients spoke more quietly than control subjects, they nevertheless perceived (immediate and playback perception) their own speech to be louder than did the control subjects. Patients overestimated the volume of their speech during both reading and conversation. The findings raise the question as to whether impaired speech production is driven by a basic perceptual fault or whether perception is abnormal as a consequence of impaired mechanisms involved in the generation of quiet speech. PMID:11104195

Ho, A K; Bradshaw, J L; Iansek, T

2000-11-01

125

Adding Speech, Language, and Hearing Benefits to Your Policy  

MedlinePLUS

American Speech-Language-Hearing Association: making effective communication, a human right, accessible and achievable for all. Information for the public on adding speech, language, and hearing benefits to your policy…

126

ACQUIRING VARIABLE LENGTH SPEECH BASES FOR FACTORISATION-BASED NOISE ROBUST SPEECH RECOGNITION  

E-print Network

, speech recogni- tion, noise robustness 1. INTRODUCTION Speech contains phonetic units of varying lengths in variable length words [5]. Phonetic segmentation of speech has been dis- cussed and demonstrated

Virtanen, Tuomas

127

Is Private Speech Really Private?   

E-print Network

This study sought to answer the question “is private speech really private?” by assessing if participants spoke more to themselves when in the company of the experimenter or when they were alone. The similarity between ...

Smith, Ashley

2011-01-01

128

Compressed Speech: Capabilities and Uses  

ERIC Educational Resources Information Center

A brief look at compressed speech, which offers opportunities for circumventing the information explosion and effecting economy in time. It also permits training in listening and encourages development of powers of concentration. (Author/HB)

Silverstone, David M.

1974-01-01

129

Contingent categorization in speech perception.  

PubMed

The speech signal is notoriously variable, with the same phoneme realized differently depending on factors like talker and phonetic context. Variance in the speech signal has led to a proliferation of theories of how listeners recognize speech. A promising approach, supported by computational modeling studies, is contingent categorization, wherein incoming acoustic cues are computed relative to expectations. We tested contingent encoding empirically. Listeners were asked to categorize fricatives in CV syllables constructed by splicing the fricative from one CV syllable with the vowel from another CV syllable. The two spliced syllables always contained the same fricative, providing consistent bottom-up cues; however on some trials, the vowel and/or talker mismatched between these syllables, giving conflicting contextual information. Listeners were less accurate and slower at identifying the fricatives in mismatching splices. This suggests that listeners rely on context information beyond bottom-up acoustic cues during speech perception, providing support for contingent categorization. PMID:25157376

Apfelbaum, Keith S; Bullock-Rest, Natasha; Rhone, Ariane E; Jongman, Allard; McMurray, Bob

2014-01-01

130

AUTOMATIC SPEECHREADING OF IMPAIRED SPEECH  

Microsoft Academic Search

We investigate the use of visual, mouth-region information in improving automatic speech recognition (ASR) of the speech impaired. Given the video of an utterance by such a subject, we first extract appearance-based visual features from the mouth region-of-interest, and we use a feature fu- sion method to combine them with the subject's audio fea- tures into bimodal observations. Subsequently, we

Gerasimos Potamianos; Chalapathy Neti

131

Neural pathways for visual speech perception.  

PubMed

This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

Bernstein, Lynne E; Liebenthal, Einat

2014-01-01

132

Neural pathways for visual speech perception  

PubMed Central

This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611

Bernstein, Lynne E.; Liebenthal, Einat

2014-01-01

133

System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech  

DOEpatents

The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

2006-08-08

134

The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech  

ERIC Educational Resources Information Center

Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…

Wayne, Rachel V.; Johnsrude, Ingrid S.

2012-01-01

135

An Evaluation of Visual Speech Features for the Tasks of Speech and Speaker Recognition  

E-print Network

of speech recognition. Results and discus- sion are presented on the M2VTS database for the tasksAn Evaluation of Visual Speech Features for the Tasks of Speech and Speaker Recognition Simon Lucey University Pittsburgh PA 15213, USA slucey@ieee.org Abstract. In this paper an evaluation of visual speech

Chen, Tsuhan

136

Data Warehouse for Speech Perception and Model Testing  

E-print Network

The goal of this work is to provide a comprehensive but user-friendly database of results from speech perception experiments, for model testing. Theories of speech perception, like most theories, have tended to be qualitative rather than…

Massaro, Dominic

137

Development and Evaluation of Polish Speech Corpus for Unit Selection Speech Synthesis Systems  

E-print Network

Given the segmental and suprasegmental features to be covered, the size of databases for speech technology purposes is expected to be substantial. We examine the influence of database structure on the quality of the resulting synthesised speech. For the Polish language, we have decided to use various speech units from different mixed databases, as follows: Base A…

Möbius, Bernd

138

ASSESSMENT AND CORRECTION OF VOICE QUALITY VARIABILITIES IN LARGE SPEECH DATABASES FOR CONCATENATIVE SPEECH SYNTHESIS  

E-print Network

In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable to have varied prosodic and spectral characteristics in the database, it is not desirable to have variable voice quality. Compared with a limited inventory of controlled units (e.g. diphones), the availability of more units taken from large speech databases seems…

Greenberg, Albert

139

IMPROVING THE UNDERSTANDABILITY OF SPEECH SYNTHESIS BY MODELING SPEECH IN NOISE  

E-print Network

We used the recording setup that produced the CMU SIN database for speech synthesis [3] to record a small (30-sentence) database of speech in noise. Two recording sessions are required to build such a database, with the in-noise and not-in-noise conditions reversed for each session, giving us an identical database of plain speech…

Eskenazi, Maxine

140

EFFECT OF SPEECH AND NOISE CROSS CORRELATION ON AMFCC SPEECH RECOGNITION FEATURES  

E-print Network

In speech recognition feature extraction algorithms, it is common to assume that the noise and speech signal are uncorrelated, so that their cross correlation can be neglected. Evaluations were performed using the AURORA II database. From these evaluations, we show that the assumption…
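
As a worked illustration of the assumption at issue (standard textbook algebra, not this paper's derivation): for additive noise, y[n] = s[n] + v[n], the short-time power spectrum contains a cross term,

    $|Y(\omega)|^2 = |S(\omega)|^2 + |V(\omega)|^2 + 2\,|S(\omega)|\,|V(\omega)|\cos\big(\theta_S(\omega) - \theta_V(\omega)\big)$

Front ends that treat noise as additive in the power domain set the cosine term to zero; the AURORA II evaluation above probes how costly that simplification is.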

141

Rigid Head Motion in Expressive Speech  

E-print Network

The analyses are derived from an audiovisual database comprising synchronized facial gestures and speech, which revealed characteristic patterns in emotional head motion sequences. Head motion patterns with neutral speech…

Busso, Carlos

142

ASSESSMENT AND CORRECTION OF VOICE QUALITY VARIABILITIES IN LARGE SPEECH DATABASES FOR CONCATENATIVE SPEECH SYNTHESIS  

E-print Network

In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable to have varied prosodic and spectral characteristics in the database, it is not desirable to have variable voice quality. Compared with a limited inventory of controlled units (e.g. diphones), the availability of more units taken from large speech databases seems…

Greenberg, Albert

143

Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis  

Microsoft Academic Search

In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable to have varied prosodic and spectral characteristics in the database, it is not desirable to have variable voice quality. We present an automatic method for voice quality assessment and correction, whenever necessary, of large speech databases for concatenative speech

Yannis Stylianou

1999-01-01

144

UNIT SELECTION IN A CONCATENATIVE SPEECH SYNTHESIS SYSTEM USING A LARGE SPEECH DATABASE  

E-print Network

One approach to speech synthesis is to concatenate the waveforms of units selected from large, single-speaker speech databases. This approach to waveform synthesis permits training from natural speech: two methods for selecting units from the database are presented. The primary…
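
A minimal sketch of the dynamic-programming search that unit-selection synthesisers of this kind typically use, assuming Python/NumPy; target_cost and join_cost are placeholder callables standing in for the system's trained cost functions.

    import numpy as np

    def select_units(candidates, target_cost, join_cost):
        # Viterbi search over candidate units: a path's total cost is the
        # target cost of each chosen unit plus the join cost between
        # consecutive units. candidates[t] lists the database units that
        # could realise target segment t.
        T = len(candidates)
        best = [np.array([target_cost(u, 0) for u in candidates[0]])]
        back = []
        for t in range(1, T):
            costs = np.array([[best[-1][i] + join_cost(p, u) + target_cost(u, t)
                               for i, p in enumerate(candidates[t - 1])]
                              for u in candidates[t]])
            back.append(costs.argmin(axis=1))   # best predecessor per unit
            best.append(costs.min(axis=1))
        # Trace the cheapest path backwards, then restore forward order.
        path = [int(np.argmin(best[-1]))]
        for bp in reversed(back):
            path.append(int(bp[path[-1]]))
        return [candidates[t][i] for t, i in enumerate(reversed(path))]

Here target_cost(u, t) scores how well unit u matches target segment t, and join_cost(p, u) penalises spectral or prosodic discontinuity at the concatenation point.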

Black, Alan W

145

The design of Polish Speech Corpus for Unit Selection Speech Synthesis  

E-print Network

The idea is to select at run-time, from a large recorded speech database, the longest available strings of phonetic segments that match the target, thereby improving the naturalness of synthetic speech. In a speech database comprising several hours of recordings, it is likely that a selected unit will be longer than a segment or a diphone. Defining the optimal speech database for unit selection is a crucial, yet difficult…

Möbius, Bernd

146

Perceived Liveliness and Speech Comprehensibility in Aphasia: The Effects of Direct Speech in Auditory Narratives  

ERIC Educational Resources Information Center

Background: Previous studies have shown that in semi-spontaneous speech, individuals with Broca's and anomic aphasia produce relatively many direct speech constructions. It has been claimed that in "healthy" communication direct speech constructions contribute to the liveliness, and indirectly to the comprehensibility, of speech.…

Groenewold, Rimke; Bastiaanse, Roelien; Nickels, Lyndsey; Huiskes, Mike

2014-01-01

147

Common neural substrates support speech and non-speech vocal tract gestures  

PubMed Central

The issue of whether speech is supported by the same neural substrates as non-speech vocal-tract gestures has been contentious. In this fMRI study we tested whether producing non-speech vocal tract gestures in humans shares the same functional neuroanatomy as nonsense speech syllables. Production of non-speech vocal tract gestures, devoid of phonological content but similar to speech in that they had familiar acoustic and somatosensory targets, were compared to the production of speech syllables without meaning. Brain activation related to overt production was captured with BOLD fMRI using a sparse sampling design for both conditions. Speech and non-speech were compared using voxel-wise whole brain analyses, and ROI analyses focused on frontal and temporoparietal structures previously reported to support speech production. Results showed substantial activation overlap between speech and non-speech function in these regions. Although non-speech gesture production showed greater extent and amplitude of activation in the regions examined, both speech and non-speech showed comparable left laterality in activation for both target perception and production. These findings posit a more general role of the previously proposed “auditory dorsal stream” in the left hemisphere – to support the production of vocal tract gestures that are not limited to speech processing. PMID:19327400

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Poletto, Christopher J.; Ludlow, Christy L.

2009-01-01

148

Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy  

ERIC Educational Resources Information Center

Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

2014-01-01

149

Acoustic and perceptual studies of Lombard speech: application to isolated-words automatic speech recognition  

Microsoft Academic Search

The purpose of this study was (1) to determine the acoustic-phonetic differences between speech produced in quiet and speech produced in noise (Lombard speech) and (2) to evaluate the influence of these differences on human listeners and automatic speech recognizers. The acoustical analyses, done at the phonetic level on about 40 parameters, showed significant differences in variability for…

Jean-Claude Junqua; Yolande Anglade

1990-01-01

150

Microphone array processing for distance speech capture: A probe study on whisper speech detection  

Microsoft Academic Search

In this study, we develop a probe system for whisper-island detection in distance speech capture using a microphone array technique. The corpus developed for this study consists of distance speech produced with neutral vocal effort, embedded with whisper speech, recorded at different distances. The microphone array beamforming technique is used to enhance the distance speech before being processed…

Chi Zhang; Tao Yu; J. H. L. Hansen

2010-01-01

151

Dual-Mode Wideband Speech Recovery from Narrowband Speech  

E-print Network

Speech transmitted in current public telephone networks is bandpass-filtered to 300-3400 Hz. One task at the interface between newer wideband systems and conventional narrowband systems is to generate wideband speech…

Kabal, Peter

152

Speech coding, reconstruction and recognition using acoustics and electromagnetic waves  

DOEpatents

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
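
A minimal sketch of frame-wise deconvolution by spectral division, assuming NumPy; it illustrates the general idea of recovering a per-frame transfer function from an excitation estimate and the acoustic output, not the patent's specific procedure.

    import numpy as np

    def frame_transfer_function(excitation, speech_out, eps=1e-8):
        # Deconvolution for one time frame: H(f) ~= Y(f) / X(f), where X is
        # the (EM-derived) excitation estimate and Y the recorded acoustic
        # output. A small regulariser avoids division by zero.
        n = len(speech_out)
        X = np.fft.rfft(excitation, n)
        Y = np.fft.rfft(speech_out, n)
        return Y / (X + eps)

Stored per-frame transfer functions of this kind are what make the resynthesis and noise-removal applications described above possible.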

Holzrichter, J.F.; Ng, L.C.

1998-03-17

153

Speech coding, reconstruction and recognition using acoustics and electromagnetic waves  

DOEpatents

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

1998-01-01

154

Determining the threshold for usable speech within co-channel speech with the SPHINX automated speech recognition system  

NASA Astrophysics Data System (ADS)

Much research has been, and continues to be, done on separating the original utterances of two speakers from co-channel speech. This is very important for automated speech recognition (ASR), where the current state of technology is not nearly as accurate as human listeners when the speech is co-channel. It is desired to determine for what types of speech (voiced, unvoiced, and silence) and at what target-to-interference ratio (TIR) two speakers can speak at the same time without reducing the speech intelligibility of the target speaker (referred to as usable speech). Knowing which segments of co-channel speech are usable in ASR can be used to improve the reconstruction of single-speaker speech. Tests were performed using the SPHINX ASR software and the TIDIGITS database. It was found that interfering voiced speech with a TIR of 6 dB or greater (on a per-frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech. It was further found that interfering unvoiced speech with a TIR of 18 dB or greater (on a per-frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech.
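
A minimal sketch of the per-frame TIR labelling idea, assuming NumPy and access to the premixed target and interferer signals; the frame length is illustrative, and the 6 dB default is the voiced-speech threshold reported in the abstract.

    import numpy as np

    def usable_frames(target, interferer, frame=320, tir_db=6.0):
        # Label each frame usable when the target-to-interference ratio
        # meets the threshold (6 dB for voiced, 18 dB for unvoiced
        # interference per the findings above).
        n = min(len(target), len(interferer)) // frame
        labels = []
        for i in range(n):
            t = target[i * frame:(i + 1) * frame]
            v = interferer[i * frame:(i + 1) * frame]
            tir = 10 * np.log10((np.sum(t ** 2) + 1e-12) /
                                (np.sum(v ** 2) + 1e-12))
            labels.append(tir >= tir_db)
        return np.array(labels)

In a deployed system the premixed signals are unknown, so the practical problem is estimating these labels from the mixture itself.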

Hicks, William T.; Yantorno, Robert E.

2004-10-01

155

In this paper we compare speech recognition accuracy for high-quality speech recorded under controlled conditions with speech  

E-print Network

In this paper we compare speech recognition accuracy for high-quality speech recorded under controlled conditions with speech transmitted over long-distance telephone lines. The TIMIT database is a continuous, speaker-independent, phonetically-balanced and phonetically-labelled speech database. Among the conditions compared are speech filtered by the frequency response of a typical telephone channel [2] and speech from the NTIMIT database. We note that…

Stern, Richard

156

Speech and Language Problems in Children  

MedlinePLUS

Children vary in their development of speech and language skills. Health professionals have milestones for what's normal. ... it may be due to a speech or language disorder. Language disorders can mean that the child ...

157

Childhood Apraxia of Speech Family Start Guide  

MedlinePLUS

158

Articulatory features for robust visual speech recognition  

E-print Network

This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic ...

Saenko, Ekaterina, 1976-

2004-01-01

159

Speech Recognition by Machine, A Review  

E-print Network

This paper presents a brief survey on Automatic Speech Recognition and discusses the major themes and advances made in the past 60 years of research, so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. After years of research and development the accuracy of automatic speech recognition remains one of the important research challenges (e.g., variations of the context, speakers, and environment).The design of Speech Recognition system requires careful attentions to the following issues: Definition of various types of speech classes, speech representation, feature extraction techniques, speech classifiers, database and performance evaluation. The problems that are existing in ASR and the various techniques to solve these problems constructed by various research workers have been presented in a chronological order. Hence authors hope that this work shall be a contribution in the area of speech recog...

Anusuya, M A

2010-01-01

160

Speech for People with Tracheostomies or Ventilators  

MedlinePLUS

... the team may consist of the following: physicians, nurses, respiratory therapists, dietitians, and speech-language pathologists (SLPs). ... life enriching. The Preferred Practice Patterns for the Profession of Speech-Language Pathology outline the common practices ...

161

Phonological Models in Automatic Speech Recognition  

E-print Network

What is so difficult about conversational speech? Nonspeech events (e.g. laughter, sighs) and variable pronunciation. Simulated-data experiments show the potential benefit of a good pronunciation model [McAllaster et al. '98].

Livescu, Karen

162

Speech perception and production in severe environments  

NASA Astrophysics Data System (ADS)

The goal was to acquire new knowledge about speech perception and production in severe environments such as high masking noise, increased cognitive load or sustained attentional demands. Changes were examined in speech production under these adverse conditions through acoustic analysis techniques. One set of studies focused on the effects of noise on speech production. The experiments in this group were designed to generate a database of speech obtained in noise and in quiet. A second set of experiments was designed to examine the effects of cognitive load on the acoustic-phonetic properties of speech. Talkers were required to carry out a demanding perceptual motor task while they read lists of test words. A final set of experiments explored the effects of vocal fatigue on the acoustic-phonetic properties of speech. Both cognitive load and vocal fatigue are present in many applications where speech recognition technology is used, yet their influence on speech production is poorly understood.

Pisoni, David B.

1990-09-01

163

Speech synthesis by phonological structure matching.   

E-print Network

This paper presents a new technique for speech synthesis by unit selection. The technique works by specifying the synthesis target and the speech database as phonological trees, and using a selection algorithm which ...

Taylor, Paul; Black, Alan W

1999-01-01

164

President Kennedy's Speech at Rice University  

NASA Technical Reports Server (NTRS)

This video tape presents unedited film footage of President John F. Kennedy's speech at Rice University, Houston, Texas, September 12, 1962. The speech expresses the commitment of the United States to landing an astronaut on the Moon.

1988-01-01

165

Topic Learning in Text and Conversational Speech  

E-print Network

A dissertation on topic learning in text and conversational speech (Chair of Supervisory Committee: Professor Mari Ostendorf, Electrical Engineering). Extracting topics from large collections of data…

Washington at Seattle, University of

166

European speech databases for telephone applications  

Microsoft Academic Search

The SpeechDat project aims to produce speech databases for all official languages of the European Union and some major dialectal variants and minority languages resulting in 28 speech databases. They will be recorded over fixed and mobile telephone networks. This will provide a realistic basis for training and assessment of both isolated and continuous-speech utterances, employing whole-word or subword approaches,

H. Hoge; H. S. Tropf; R. Winski; H. van den Heuvel; R. Haeb-Umbach; K. Choukri

1997-01-01

167

Inner speech as a forward model?  

PubMed

Pickering & Garrod (P&G) consider the possibility that inner speech might be a product of forward production models. Here I consider the idea of inner speech as a forward model in light of empirical work from the past few decades, concluding that, while forward models could contribute to it, inner speech nonetheless requires activity from the implementers. PMID:23789938

Oppenheim, Gary M

2013-08-01

168

Interventions for Speech Sound Disorders in Children  

ERIC Educational Resources Information Center

With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

2010-01-01

169

Social Dialect and Speech Communication Proficiency.  

ERIC Educational Resources Information Center

In Hawaii today, many persons find it a disadvantage to speak only the social dialect of their home speech communities. For those young adults who enter a University, the problem may be especially acute. The Speech Communication Center of the University of Hawaii is developing a measure of speech-communication proficiency that predicts the…

Harms, L.S.

170

Speech-Song Interface of Chinese Speakers  

ERIC Educational Resources Information Center

Pitch is a psychoacoustic construct crucial in the production and perception of speech and songs. This article is an exploration of the interface of speech and song performance of Chinese speakers. Although parallels might be drawn from the prosodic and sound structures of the linguistic and musical systems, perceiving and producing speech and…

Mang, Esther

2007-01-01

171

Somatosensory function in speech perception  

E-print Network

Somatosensory signals accompany speech production. We find that when we stretch the facial skin while people listen to words, their perception of those words changes; the effect of speech-like patterns of skin stretch indicates that somatosensory inputs affect the neural processing of speech…

Malfait, Nicole

172

Linguistic Resources for Speech Parsing  

E-print Network

This paper describes an approach to annotating metadata, speech effects and syntactic structure in English conversational speech, and describes the resulting corpus. Word error rate (WER) has been the metric minimized by ASR systems; although WER is dropping, ASR systems do…

Liu, Yang

173

The African Speech Technology Project: An Assessment  

Microsoft Academic Search

This paper reflects on the recently completed African Speech Technology (AST) Project. The AST Project successfully developed eleven annotated telephone speech databases for five languages spoken in South Africa i.e. Xhosa, Southern Sotho, Zulu, English and Afrikaans. These databases were used to train and test speech recognition systems applied in a multilingual telephone-based prototype hotel booking system. An overview is

JC Roux; PH Louw

174

Using interactive objects for speech intervention  

Microsoft Academic Search

Technological advances in physical computing and automatic speech recognition (ASR) have made the development of novel solutions for speech intervention possible. I plan to combine an ASR engine with programmable microcontrollers to develop exercises and activities based on interaction with smart objects for helping with speech therapy intervention for children.

Foad Hamidi

2010-01-01

175

A Signing Deaf Child's Use of Speech.  

ERIC Educational Resources Information Center

Longitudinal study of a deaf child's (with deaf signing and speaking parents) speech functions revealed that the child, before age three, rarely attempted speech imitation. By age five, the child had acquired new words through speechreading and had adjusted language modes to listener needs for flexible communication, and speech behavior assumed…

Maxwell, Madeline M.

1989-01-01

176

Robust Speech Recognition Using Articulatory Information  

E-print Network

Two databases are used: the first contains telephone-bandwidth speech, with continuous numbers as the recognition domain; the second is a German database of studio-quality speech. Current automatic speech…

Kirchhoff, Katrin

177

Hyperspeech: navigating in speech-only hypermedia  

Microsoft Academic Search

Most hypermedia systems emphasize the integration of graphics, images, video, and audio into a traditional hypertext framework. The hyperspeech system described in this paper, a speech-only hypermedia application, explores issues of navigation and system architecture in an audio environment without a visual display. The system under development uses speech recognition to maneuver in a database of digitally recorded speech segments;

Barry Arons

1991-01-01

178

Audiovisual Asynchrony Detection in Human Speech  

ERIC Educational Resources Information Center

Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…

Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta

2011-01-01

179

Graduate Student Handbook: SPEECH-LANGUAGE PATHOLOGY PROGRAM  

E-print Network

Graduate Student Handbook, Speech-Language Pathology Program, Stephen F. Austin State University. Dear prospective student: Welcome to the program website for Speech-Language Pathology and Audiology at Stephen F. Austin State University. The field of Speech-Language Pathology and Audiology is concerned…

Long, Nicholas

180

Syllable Structure in Dysfunctional Portuguese Children's Speech  

ERIC Educational Resources Information Center

The goal of this work is to investigate whether children with speech dysfunctions (SD) show a deficit in planning some Portuguese syllable structures (PSS) in continuous speech production. Knowledge of which aspects of speech production are affected by SD is necessary for efficient improvement in the therapy techniques. The case-study is focused…

Candeias, Sara; Perdigao, Fernando

2010-01-01

181

Acoustics of Clear Speech: Effect of Instruction  

ERIC Educational Resources Information Center

Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

Lam, Jennifer; Tjaden, Kris; Wilding, Greg

2012-01-01

182

The Dynamic Nature of Speech Perception  

ERIC Educational Resources Information Center

The speech perception system must be flexible in responding to the variability in speech sounds caused by differences among speakers and by language change over the lifespan of the listener. Indeed, listeners use lexical knowledge to retune perception of novel speech (Norris, McQueen, & Cutler, 2003). In that study, Dutch listeners made lexical…

McQueen, James M.; Norris, Dennis; Cutler, Anne

2006-01-01

183

SSML: A speech synthesis markup language  

Microsoft Academic Search

This paper describes the Speech Synthesis Markup Language, SSML, which has been designed as a platform-independent interface standard for speech synthesis systems. The paper discusses the need for standardisation in speech synthesizers and how this will help builders of systems make better use of synthesis. The SGML-based markup language is then discussed, and details of the Edinburgh SSML interpreter are given as

Paul Taylor; Amy Isard

1997-01-01

184

Advances in speech and audio compression  

Microsoft Academic Search

Speech and audio compression has advanced rapidly in recent years spurred on by cost-effective digital technology and diverse commercial applications. Recent activity in speech compression is dominated by research and development of a family of techniques commonly described as code-excited linear prediction (CELP) coding. These algorithms exploit models of speech production and auditory perception and offer a quality versus bit

ALLEN GERSHO

1994-01-01

185

Speech entrainment enables patients with Broca's aphasia to produce fluent speech.  

PubMed

A distinguishing feature of Broca's aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect 'speech entrainment' and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca's aphasia. In Experiment 1, 13 patients with Broca's aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca's area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca's aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca's aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889

Fridriksson, Julius; Hubbard, H Isabel; Hudspeth, Sarah Grace; Holland, Audrey L; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-12-01

186

Using Speech-Specific Characteristics for Automatic Speech Summarization  

E-print Network

Speech summarization can be treated as text summarization with a noisy transcript. We begin by investigating which term-weighting metrics are effective for summarization of meeting speech, with the inclusion of two novel metrics designed specifically

Murray, Gabriel

187

NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database  

Microsoft Academic Search

The creation of the network TIMIT (NTIMIT) database, which is the result of transmitting the TIMIT database over the telephone network, is described. A brief description of the TIMIT database is given, including characteristics useful for speech analysis and recognition. The hardware and software required for the transmission of the database is described. The geographic distribution of the TIMIT utterances

C. Jankowski; A. Kalyanswamy; S. Basson; J. Spitz

1990-01-01

188

Automatic speech recognition and speech variability: A review  

Microsoft Academic Search

Major progress is being recorded regularly on both the technology and exploitation of automatic speech recognition (ASR) and spoken language systems. However, there are still technological barriers to flexible solutions and user satisfaction under some circumstances. This is related to several factors, such as the sensitivity to the environment (background noise), or the weak representation of grammatical and semantic

M. Benzeghiba; Renato De Mori; O. Deroo; Stéphane Dupont; T. Erbes; Denis Jouvet; Luciano Fissore; Pietro Laface; Alfred Mertins; Christophe Ris; R. Rose; Vivek Tyagi; Christian Wellekens

2007-01-01

189

PSEUDO-WIDEBAND SPEECH RECONSTRUCTION FROM TELEPHONE SPEECH  

E-print Network

The highband excitation can be derived in a number of ways from the narrowband residual of an LP (Linear Prediction) filter, and a Line Spectral Frequency (LSF) VQ codebook mapping from the narrowband speech can estimate the high-frequency components. Without such expansion, the extra quality of wideband systems will be lost in the connection to the existing narrowband PSTN; pseudo-wideband expansion…
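
A minimal sketch of one classic way to generate a highband excitation from the narrowband residual (spectral folding), assuming NumPy; this names a standard technique consistent with the abstract, not necessarily the authors' choice.

    import numpy as np

    def highband_excitation(nb_residual):
        # Spectral folding: zero-insertion upsampling from 8 kHz to 16 kHz
        # without an interpolation lowpass filter mirrors the 0-4 kHz
        # residual spectrum into 4-8 kHz, giving a highband excitation
        # that an LSF-mapped spectral envelope can then shape.
        up = np.zeros(2 * len(nb_residual))
        up[::2] = nb_residual
        return up

The deliberate aliasing is the point here: the spectral images above 4 kHz stand in for the missing highband source signal.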

Kabal, Peter

190

Speech recognition with amplitude and frequency modulations  

PubMed Central

Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived slowly varying AM and FM from speech sounds and conducted listening tests using stimuli with different modulations in normal-hearing and cochlear-implant subjects. We found that although AM from a limited number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances speech recognition in noise, as well as speaker and tone recognition. Additional speech reception threshold measures revealed that FM is particularly critical for speech recognition with a competing voice and is independent of spectral resolution and similarity. These results suggest that AM and FM provide independent yet complementary contributions to support robust speech recognition under realistic listening situations. Encoding FM may improve auditory scene analysis, cochlear-implant, and audiocoding performance. PMID:15677723
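
A minimal sketch of one common way to derive slowly varying AM and FM from a single analysis band (Hilbert-envelope demodulation), assuming NumPy/SciPy; the band edges and filter order are illustrative, and the paper's own extraction details may differ.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def band_am_fm(x, fs, lo, hi):
        # Bandpass one analysis band, then split it into a slowly varying
        # envelope (AM) and an instantaneous-frequency track (FM).
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        analytic = hilbert(band)
        am = np.abs(analytic)                     # envelope
        phase = np.unwrap(np.angle(analytic))
        fm = np.diff(phase) * fs / (2 * np.pi)    # instantaneous freq., Hz
        return am, fm

Applying this per band and lowpass-filtering the AM and FM tracks yields the slowly varying modulations whose separate contributions the study compares.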

Zeng, Fan-Gang; Nie, Kaibao; Stickney, Ginger S.; Kong, Ying-Yee; Vongphoe, Michael; Bhargave, Ashish; Wei, Chaogang; Cao, Keli

2005-01-01

191

Adaptive Redundant Speech Transmission over Wireless Multimedia Sensor Networks Based on Estimation of Perceived Speech Quality  

PubMed Central

An adaptive redundant speech transmission (ARST) approach to improve the perceived speech quality (PSQ) of speech streaming applications over wireless multimedia sensor networks (WMSNs) is proposed in this paper. The proposed approach estimates the PSQ as well as the packet loss rate (PLR) from the received speech data. Subsequently, it decides whether the transmission of redundant speech data (RSD) is required in order to assist a speech decoder to reconstruct lost speech signals for high PLRs. According to the decision, the proposed ARST approach controls the RSD transmission, then it optimizes the bitrate of speech coding to encode the current speech data (CSD) and RSD bitstream in order to maintain the speech quality under packet loss conditions. The effectiveness of the proposed ARST approach is then demonstrated using the adaptive multirate-narrowband (AMR-NB) speech codec and ITU-T Recommendation P.563 as a scalable speech codec and the PSQ estimation, respectively. It is shown from the experiments that a speech streaming application employing the proposed ARST approach significantly improves speech quality under packet loss conditions in WMSNs. PMID:22164086

Kang, Jin Ah; Kim, Hong Kook

2011-01-01
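
The control loop the abstract describes can be summarized as: estimate quality and loss, decide on redundancy, then rebalance the coding bitrate. The sketch below is a schematic of that logic only; the thresholds are invented placeholders, and the bitrates are simply real AMR-NB modes used for illustration.

```python
from dataclasses import dataclass

@dataclass
class ChannelReport:
    psq: float   # estimated perceived speech quality (P.563-style MOS)
    plr: float   # estimated packet loss rate, 0.0-1.0

def plan_transmission(report: ChannelReport):
    """Decide whether to send redundant speech data and at what bitrates."""
    send_rsd = report.plr > 0.05 or report.psq < 3.0   # invented thresholds
    # With redundancy on, lower the primary bitrate so the combined
    # CSD + RSD payload stays near the original bit budget.
    csd_bitrate = 7950 if send_rsd else 12200          # AMR-NB modes, bit/s
    rsd_bitrate = 4750 if send_rsd else 0
    return send_rsd, csd_bitrate, rsd_bitrate

print(plan_transmission(ChannelReport(psq=2.6, plr=0.12)))  # -> (True, 7950, 4750)
```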

192

Perception of the speech code  

Microsoft Academic Search

Man could not perceive speech well if each phoneme were cued by a unit sound. In fact, many phonemes are encoded so that a single acoustic cue carries information in parallel about successive phonemic segments. This reduces the rate at which discrete sounds must be perceived, but at the price of a complex relation between cue and phoneme: cues vary

A. M. Liberman; F. S. Cooper; D. P. Shankweiler; M. Studdert-Kennedy

1967-01-01

193

Speech Recognition in Mobile Environments  

E-print Network

[Only front matter survives in this record: acknowledgments to current and past members of the SPHINX and Robust Speech Recognition groups at CMU for the intellectual infrastructure behind this thesis.]

Stern, Richard

194

Cued Speech: An Evaluative Study  

ERIC Educational Resources Information Center

To evaluate the effects of Cued Speech (visual symbols) as a supplement to speechreading, cued and non-cued sentences and phrases were presented in a live situation at normal and at slow rates to 12 hearing-impaired subjects (7-to 11-years-old). (Author/LS)

Ling, Daniel; Clarke, Bryan R.

1975-01-01

195

Linguistic aspects of speech synthesis.  

PubMed Central

The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized. PMID:7479807

Allen, J

1995-01-01
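
The first analysis stage described above, expanding abbreviations and digit strings into pronounceable words before letter-to-sound rules apply, can be sketched in a few lines. Everything here (the tiny lexicons, the digit-by-digit number rule) is a toy stand-in for a real text-normalization front end.

```python
import re

ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "etc.": "et cetera"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def expand_number(token):
    """Spell out a digit string digit by digit (a deliberately crude rule)."""
    return " ".join(DIGITS[int(d)] for d in token)

def normalize(text):
    """Expand abbreviations and numbers into normal words."""
    out = []
    for token in text.split():
        if token in ABBREVIATIONS:
            out.append(ABBREVIATIONS[token])
        elif re.fullmatch(r"\d+", token):
            out.append(expand_number(token))
        else:
            out.append(token)
    return " ".join(out)

print(normalize("Dr. Smith lives at 221 Baker St."))
# -> "Doctor Smith lives at two two one Baker Street"
```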

196

Multilingual Speech Databases at LDC  

Microsoft Academic Search

As multilingual products and technology grow in importance, the Linguistic Data Consortium (LDC) intends to provide the resources needed for research and development activities, especially in telephone-based, small-vocabulary recognition applications; language identification research; and large vocabulary continuous speech recognition research. The POLYPHONE corpora, a multilingual

John J. Godfrey

1994-01-01

197

Speech and Language Developmental Milestones  

MedlinePLUS

... What are the milestones for speech and language development? The first signs of communication occur when an infant learns that a cry will bring food, comfort, and companionship. Newborns also begin to recognize important sounds in their environment, such as the voice of their mother or ...

198

Embedding speech into virtual realities  

NASA Technical Reports Server (NTRS)

In this work a speaker-independent speech recognition system is presented which is suitable for implementation in Virtual Reality applications. The use of an artificial neural network, in connection with a special compression of the acoustic input, leads to a system that is robust, fast, and easy to use, and that needs no additional hardware besides common VR equipment.

Bohn, Christian-Arved; Krueger, Wolfgang

1993-01-01

199

Turbo Processing for Speech Recognition  

PubMed

Speech recognition is a classic example of a human/machine interface, typifying many of the difficulties and opportunities of human/machine interaction. In this paper, speech recognition is used as an example of applying turbo processing principles to the general problem of human/machine interface. Speech recognizers frequently involve a model representing phonemic information at a local level, followed by a language model representing information at a nonlocal level. This structure is analogous to the local (e.g., equalizer) and nonlocal (e.g., error correction decoding) elements common in digital communications. Drawing from the analogy of turbo processing for digital communications, turbo speech processing iteratively feeds back the output of the language model to be used as prior probabilities for the phonemic model. This analogy is developed here, and the performance of this turbo model is characterized by using an artificial language model. Using turbo processing, the relative error rate improves significantly, especially in high-noise settings. PMID:23757535

Moon, Todd K; Gunther, Jacob H; Broadus, Cortnie; Hou, Wendy; Nelson, Nils

2013-04-01
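
Schematically, the iteration works as follows: combine per-frame acoustic likelihoods with the current priors, then let the language model refresh those priors from the resulting posterior, and repeat. The sketch below uses a toy neighbour-smoothing function as a stand-in for the language model; nothing here reproduces the paper's actual models.

```python
import numpy as np

def turbo_decode(acoustic_likelihoods, lm_smooth, n_iters=5):
    """Iteratively feed LM output back as priors for the phonemic model.

    acoustic_likelihoods: (T, K) per-frame likelihoods over K phone classes.
    lm_smooth: maps a (T, K) posterior to LM-refined priors.
    """
    T, K = acoustic_likelihoods.shape
    prior = np.full((T, K), 1.0 / K)              # start from uniform priors
    post = prior
    for _ in range(n_iters):
        post = acoustic_likelihoods * prior        # evidence x prior
        post /= post.sum(axis=1, keepdims=True)    # normalize per frame
        prior = lm_smooth(post)                    # LM posterior -> next prior
    return post

def smooth_lm(post):
    """Toy 'language model': average each frame with its neighbours."""
    padded = np.pad(post, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
```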

200

The Ontogenesis of Speech Acts  

ERIC Educational Resources Information Center

A speech act approach to the transition from pre-linguistic to linguistic communication is adopted in order to consider language in relation to behavior and to allow for an emphasis on the use, rather than the form, of language. A pilot study of mothers and infants is discussed. (Author/RM)

Bruner, Jerome S.

1975-01-01

201

SPEECH-LANGUAGE-HEARING CLINIC  

E-print Network

The clinic is committed to providing quality services regardless of your ability to pay. Faculty offer assessment and therapy services for a variety of speech, language and hearing disorders. Most patients enrolled in the clinic are seen for individual therapy. The number of therapy sessions

Veiga, Pedro Manuel Barbosa

202

Speech Motor Skill and Stuttering  

Microsoft Academic Search

The authors review converging lines of evidence from behavioral, kinematic, and neuroimaging data that point to limitations in speech motor skills in people who stutter (PWS). From their review, they conclude that PWS differ from those who do not in terms of their ability to improve with practice and retain practiced changes in the long term, and that they are

Aravind Kumar Namasivayam; Pascal van Lieshout

2011-01-01

203

Temporal characteristics of speech: the effect of age and speech style.  

PubMed

Aging affects the temporal characteristics of speech. It is still an open question how these changes appear in different speech styles that require different cognitive skills. In this paper the speech rate, articulation rate, and pauses of 20 young and 20 old speakers are analyzed in four speech styles: spontaneous narrative, narrative recalls, a three-participant conversation, and reading aloud. Results show that age has a significant effect only on speech rate, articulation rate, and frequency of pauses. Speech style has a greater effect on temporal parameters than speakers' age. PMID:25096134

Bóna, Judit

2014-08-01

204

Perception of intersensory synchrony in audiovisual speech: not that special.  

PubMed

Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made temporal order judgments (TOJ) and simultaneity judgments (SJ) about sine-wave speech (SWS) replicas of pseudowords and the corresponding video of the face. Listeners in speech and non-speech mode were equally sensitive judging audiovisual temporal order. Yet, using the McGurk effect, we could demonstrate that the sound was more likely integrated with lipread speech if heard as speech than non-speech. Judging temporal order in audiovisual speech is thus unaffected by whether the auditory and visual streams are paired. Conceivably, previously found differences between speech and non-speech stimuli are not due to the putative "special" nature of speech, but rather reflect low-level stimulus differences. PMID:21035795

Vroomen, Jean; Stekelenburg, Jeroen J

2011-01-01

205

A method for measuring the intelligibility of uninterrupted, continuous speech.  

PubMed

Speech-in-noise tests commonly use short, discrete sentences as representative samples of everyday speech. These tests cannot, however, fully represent the added demands of understanding ongoing, linguistically complex speech. Using a new monitoring method to measure the intelligibility of continuous speech and a standard trial-by-trial speech-in-noise test, the effects of target duration and linguistic complexity were examined. For a group of older hearing-impaired listeners, significantly higher speech reception thresholds were found for continuous, complex speech targets than for syntactically simple sentences. The results highlight the need to sample speech intelligibility in a variety of everyday speech-in-noise scenarios. PMID:24606245

MacPherson, Alexandra; Akeroyd, Michael A

2014-03-01

206

RATE-OF-SPEECH MODELING FOR LARGE VOCABULARY CONVERSATIONAL SPEECH RECOGNITION  

E-print Network

Variations in rate of speech (ROS) produce changes in features affected by speech rate, such as delta and delta-delta features, and affect some pronunciation phenomena.

Stolcke, Andreas

207

Development of robust speech recognition middleware on microprocessor  

Microsoft Academic Search

We have developed speech recognition middleware on a RISC microprocessor which has robust processing functions against environmental noise and speaker differences. The speech recognition middleware enables developers and users to use a speech recognition process for many possible speech applications, such as car navigation systems and handheld PCs. We report on implementation issues of the speech recognition process in microprocessor middleware

N. Hataoka; H. Kokubo; Y. Obuchi; A. Amano

1998-01-01

208

THE COMPREHENSION OF RAPID SPEECH BY THE BLIND, PART III.  

ERIC Educational Resources Information Center

A REVIEW OF THE RESEARCH ON THE COMPREHENSION OF RAPID SPEECH BY THE BLIND IDENTIFIES FIVE METHODS OF SPEECH COMPRESSION--SPEECH CHANGING, ELECTROMECHANICAL SAMPLING, COMPUTER SAMPLING, SPEECH SYNTHESIS, AND FREQUENCY DIVIDING WITH THE HARMONIC COMPRESSOR. THE SPEECH CHANGING AND ELECTROMECHANICAL SAMPLING METHODS AND THE NECESSARY APPARATUS HAVE…

FOULKE, EMERSON

209

Segmenting Words from Natural Speech: Subsegmental Variation in Segmental Cues  

ERIC Educational Resources Information Center

Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We…

Rytting, C. Anton; Brew, Chris; Fosler-Lussier, Eric

2010-01-01

210

Exploring speech therapy games with children on the autism spectrum  

Microsoft Academic Search

Individuals on the autism spectrum often have difficulties producing intelligible speech with either high or low speech rate, and atypical pitch and/or amplitude affect. In this study, we present a novel intervention towards customizing speech-enabled games to help them produce intelligible speech. In this approach, we clinically and computationally identify the areas of speech production difficulties of our participants.

Mohammed E. Hoque; Rana El Kaliouby; Matthew S. Goodwin; Rosalind W. Picard

2009-01-01

211

Speech and language delay in children.  

PubMed

Speech and language delay in children is associated with increased difficulty with reading, writing, attention, and socialization. Although physicians should be alert to parental concerns and to whether children are meeting expected developmental milestones, there currently is insufficient evidence to recommend for or against routine use of formal screening instruments in primary care to detect speech and language delay. In children not meeting the expected milestones for speech and language, a comprehensive developmental evaluation is essential, because atypical language development can be a secondary characteristic of other physical and developmental problems that may first manifest as language problems. Types of primary speech and language delay include developmental speech and language delay, expressive language disorder, and receptive language disorder. Secondary speech and language delays are attributable to another condition such as hearing loss, intellectual disability, autism spectrum disorder, physical speech problems, or selective mutism. When speech and language delay is suspected, the primary care physician should discuss this concern with the parents and recommend referral to a speech-language pathologist and an audiologist. There is good evidence that speech-language therapy is helpful, particularly for children with expressive language disorder. PMID:21568252

McLaughlin, Maura R

2011-05-15

212

Loss tolerant speech decoder for telecommunications  

NASA Technical Reports Server (NTRS)

A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.

Prieto, Jr., Jaime L. (Inventor)

1999-01-01
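
The concealment flow can be pictured with a small state machine: decode and buffer parameters while frames arrive, and extrapolate from the buffer when one is lost. The patent trains an FIR feed-forward network for the one-step extrapolation; in the sketch below a linear extrapolation of the last two frames stands in for that network, and all names are this example's own.

```python
from collections import deque
import numpy as np

class ParameterConcealer:
    """Conceal lost frames by extrapolating buffered codec parameters."""

    def __init__(self, history_len=8):
        self.history = deque(maxlen=history_len)   # past-signal-history buffer

    def on_frame(self, params, lost):
        """params: array of SCA parameters for this frame, or None if lost."""
        if lost:
            if len(self.history) >= 2:
                prev, last = self.history[-2], self.history[-1]
                params = last + (last - prev)      # one-step linear extrapolation
            elif self.history:
                params = self.history[-1].copy()   # repeat the last good frame
            else:
                raise RuntimeError("no history to extrapolate from")
        self.history.append(np.asarray(params, dtype=float))
        return self.history[-1]                    # handed on to the decoder
```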

213

Gifts of Speech: Women's Speeches from Around the World  

NSDL National Science Digital Library

The Gifts of Speech site brings together speeches given by women from all around the world. The site is under the direction of Liz Linton Kent Leon, the electronic resources librarian at Sweet Briar College. First-time users may wish to click on the How To area to learn how to navigate the site. Of course, the FAQ area is a great way to learn about the site as well, and it should not be missed as it tells the origin story of the site. In the Collections area, visitors can listen to all of the Nobel Lectures delivered by female recipients and look at a list of the top 100 speeches in American history as determined by a group of researchers at the University of Wisconsin-Madison and Texas A&M University. Users will also want to use the Browse area to look over talks by women from Robin Abrams to Begum Khaleda Zia, the former prime minister of the People's Republic of Bangladesh.

Leon, Liz K.

2012-09-13

214

The Levels of Speech Usage Rating Scale: Comparison of Client Self-Ratings with Speech Pathologist Ratings  

ERIC Educational Resources Information Center

Background: The term "speech usage" refers to what people want or need to do with their speech to fulfil the communication demands in their life roles. Speech-language pathologists (SLPs) need to know about clients' speech usage to plan appropriate interventions to meet their life participation goals. The Levels of Speech Usage is a categorical…

Gray, Christina; Baylor, Carolyn; Eadie, Tanya; Kendall, Diane; Yorkston, Kathryn

2012-01-01

215

Unit selection in a concatenative speech synthesis system using a large speech database   

E-print Network

One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme...

Hunt, Andrew; Black, Alan W

216

The development of Malay speech audiometry.  

PubMed

Speech audiometry is a method for assessing the ability of the auditory system using speech sounds as stimuli. A list of phonemically balanced bisyllabic consonant-vowel-consonant-vowel (c-v-c-v) Malay words was produced. All the bisyllabic words (c-v-c-v) thought to be commonly used in everyday conversations were listed from the Dewan Bahasa dictionary and their suitability assessed. The chosen words were divided into 25 groups containing 10 words each. The list was then recorded by a professional male newscaster in a sound proof studio. A normal speech audiometry curve was obtained by testing 60 normal hearing subjects using the prerecorded speech material. The result of the study showed that the normal Malay speech audiometry curve was comparable to those of English and Arabic speech audiometry, in which it was sigmoidal with the optimum discrimination score of 40 dB and half peak level of 17.5 dB. PMID:1839923

Mukari, S Z; Said, H

1991-09-01

217

Primary Progressive Aphasia and Apraxia of Speech  

PubMed Central

Primary progressive aphasia is a neurodegenerative syndrome characterized by progressive language dysfunction. The majority of primary progressive aphasia cases can be classified into three subtypes: non-fluent/agrammatic, semantic, and logopenic variants of primary progressive aphasia. Each variant presents with unique clinical features, and is associated with distinctive underlying pathology and neuroimaging findings. Unlike primary progressive aphasia, apraxia of speech is a disorder that involves inaccurate production of sounds secondary to impaired planning or programming of speech movements. Primary progressive apraxia of speech is a neurodegenerative form of apraxia of speech, and it should be distinguished from primary progressive aphasia given its discrete clinicopathological presentation. Recently, there have been substantial advances in our understanding of these speech and language disorders. Here, we review clinical, neuroimaging, and histopathological features of primary progressive aphasia and apraxia of speech. The distinctions among these disorders will be crucial since accurate diagnosis will be important from a prognostic and therapeutic standpoint. PMID:24234355

Jung, Youngsin; Duffy, Joseph R.; Josephs, Keith A.

2014-01-01

218

Speechalator: Two-Way Speech-to-Speech Translation in Your Hand  

Microsoft Academic Search

This demonstration involves two-way automatic speech-to-speech translation on a consumer off-the-shelf PDA. This work was done as part of the DARPA-funded Babylon project, investigating better speech-to-speech translation systems for communication in the field. The development of the Speechalator software-based translation system required addressing a number of hard issues, including a new language for the team (Egyptian Arabic), close integration on

Alex Waibel; Ahmed Badran; Alan W. Black; Robert E. Frederking; Donna Gates; Alon Lavie; Lori S. Levin; Kevin Lenzo; Laura Mayfield Tomokiyo; Juergen Reichert; Tanja Schultz; Dorcas Wallace; Monika Woszczyna; Jing Zhang

2003-01-01

219

Prediction and imitation in speech.  

PubMed

It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles and Coupland, 1991; Giles et al., 1991). We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT), and higher-level accounts (like CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering and Garrod, 2013). Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers' utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e., the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker's and listener's social identities, their conversational roles, the listener's intention to imitate). PMID:23801971

Gambi, Chiara; Pickering, Martin J

2013-01-01

220

Speech production units among bilinguals  

Microsoft Academic Search

This paper discusses processing units in speech production among bilinguals. First, naturally occurring intrasentential code-switching (switching languages within a sentence) was examined to show that bilinguals switch at syntactically definable constituent boundaries. Next, a code-switching elicitation experiment on Japanese\\/English bilinguals was conducted in which subjects were asked to speak about topics given and asked to switch to another language upon

Shoji Azuma

1996-01-01

221

Speech and Language Disorders in the School Setting  

MedlinePLUS

What types of speech and language disorders affect school-age children? Do speech-language ...

222

Applications of broad class knowledge for noise robust speech recognition  

E-print Network

This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. ...

Sainath, Tara N

2009-01-01

223

Development of auditory-visual speech perception in young children.  

E-print Network

Unlike auditory-only speech perception, little is known about the development of auditory-visual speech perception. Recent studies show that pre-linguistic infants perceive auditory-visual speech phonetically in…

Erdener, Vahit Dogu

2007-01-01

224

Concept-to-speech synthesis by phonological structure matching   

E-print Network

This paper presents a new way of generating synthetic-speech waveforms from a linguistic description. The algorithm is presented as a proposed solution to the speech-generation problem in a concept-to-speech system. Off-line, ...

Taylor, Paul

2000-04-15

225

History and Development of Speech Recognition  

Microsoft Academic Search

Speech is the primary means of communication between humans. For reasons ranging from technological curiosity about the mechanisms for mechanical realization of human speech capabilities to the desire to automate simple tasks which necessitate human–machine interactions, research in automatic speech recognition by machines has attracted a great deal of attention for five decades.

Sadaoki Furui

226

GSM enhanced full rate speech codec  

Microsoft Academic Search

This paper describes the GSM enhanced full rate (EFR) speech codec that has been standardised for the GSM mobile communication system. The GSM EFR codec has been jointly developed by Nokia and the University of Sherbrooke. It provides speech quality at least equivalent to that of a wireline telephony reference (32 kbit/s ADPCM). The EFR codec uses 12.2 kbit/s for speech

K. Jarvinen; J. Vainio; P. Kapanen; T. Honkanen; P. Haavisto; R. Salami; C. Laflamme; J.-P. Adoul

1997-01-01

227

The neural processing of masked speech.  

PubMed

Spoken language is rarely heard in silence, and a great deal of interest in psychoacoustics has focused on the ways that the perception of speech is affected by properties of masking noise. In this review we first briefly outline the neuroanatomy of speech perception. We then summarise the neurobiological aspects of the perception of masked speech, and investigate this as a function of masker type, masker level and task. This article is part of a Special Issue entitled "Annual Reviews 2013". PMID:23685149

Scott, Sophie K; McGettigan, Carolyn

2013-09-01

228

Automatic fingersign-to-speech translation system  

Microsoft Academic Search

The aim of this paper is to help the communication of two people, one hearing impaired and one visually impaired, by converting speech to fingerspelling and fingerspelling to speech. Fingerspelling is a subset of sign language, and uses finger signs to spell letters of the spoken or written language. We aim to convert finger-spelled words to speech and vice versa.

Marek Hrúz; Pavel Campr; Erinç Dikici; Ahmet Alp K?nd?ro?lu; Zden?k Kr?oul; Alexander Ronzhin; Ha?im Sak; Daniel Schorno; Hülya Yalç?n; Lale Akarun; Oya Aran; Alexey Karpov; Murat Saraçlar; Milos Železný

2011-01-01

229

Subband based classification of speech under stress  

Microsoft Academic Search

This study proposes a new set of feature parameters based on subband analysis of the speech signal for classification of speech under stress. The new speech features are scale energy (SE), autocorrelation-scale-energy (ACSE), subband based cepstral parameters (SC), and autocorrelation-SC (ACSC). The parameters' ability to capture different stress types is compared to widely used mel-scale cepstrum based representations: mel-frequency cepstral

Ruhi Sarikaya; John N. Gowdy

1998-01-01

230

Strategic Importance of Speech Technology for NGNs  

E-print Network

We advocate the adoption of speech technology as a strategic component of Next Generation Networks (NGNs), and urge the development and implementation of regulations and public policy for world-wide deployment of the technology embedded in the devices accessing the NGNs. We highlight several key benefits brought by embedded speech technology to all nations of the world and describe how they fulfill important policy objectives cherished by the ITU and its member nations.

Stephen Rondel

231

The DEMOSTHeNES Speech Composer  

Microsoft Academic Search

In this paper we present the design and development of a modular and scalable speech composer named DEMOSTHeNES. It has been designed for converting plain or formatted text (e.g. HTML) to a combination of speech and audio signals. DEMOSTHeNES' architecture constitutes an extension to current Text-to-Speech systems' structure that enables an open set of module-defined functions to interact with the

Gerasimos Xydas; Georgios Kouroupetroglou

2001-01-01

232

Phonetically sensitive discriminants for improved speech recognition  

Microsoft Academic Search

A phonetically sensitive transformation of speech features has yielded significant improvement in speech-recognition performance. This (linear) transformation of the speech feature vector is designed to discriminate against out-of-class confusion data and is a function of phonetic state. Evaluation of the technique on the TI\\/NBS connected digit database demonstrates word (sentence) error rates of 0.5% (1.5%) for unknown-length strings and 0.2%

G. R. Doddington

1989-01-01

233

AM-DEMODULATION OF SPEECH SPECTRA AND ITS APPLICATION TO NOISE ROBUST SPEECH RECOGNITION  

E-print Network

In this paper, a novel algorithm that resembles amplitude demodulation in the frequency domain is introduced, so that the underlying speech spectrum can be recovered by amplitude demodulation. Amplitude demodulation of the speech spectrum is achieved by a novel

Alwan, Abeer

234

DELAYED SPEECH AND LANGUAGE DEVELOPMENT, PRENTICE-HALL FOUNDATIONS OF SPEECH PATHOLOGY SERIES.  

ERIC Educational Resources Information Center

WRITTEN FOR SPEECH PATHOLOGY STUDENTS AND PROFESSIONAL WORKERS, THE BOOK BEGINS BY DEFINING LANGUAGE AND SPEECH AND TRACING THE DEVELOPMENT OF SPEECH AND LANGUAGE FROM THE INFANT THROUGH THE 4-YEAR OLD. CAUSAL FACTORS OF DELAYED DEVELOPMENT ARE GIVEN, INCLUDING CENTRAL NERVOUS SYSTEM IMPAIRMENT AND ASSOCIATED BEHAVIORAL CLUES AND LANGUAGE…

WOOD, NANCY E.

235

Combining Missing-Feature Theory, Speech Enhancement and Speaker-Dependent/-Independent Modeling for Speech Separation  

E-print Network

The approach combines missing-feature theory, speech enhancement, and speaker-dependent/-independent modeling to separate the speech of different talkers, and is evaluated on the database published on the ICSLP'2006 website for two-talker speech separation.

Ji Ming; Timothy J. Hazen; James R. Glass

236

L2 rated speech corpus 1 Running head: CONSTRUCTION OF A RATED L2 SPEECH CORPUS  

E-print Network

This work reports on the construction of a rated database of spontaneous speech produced by second language (L2) learners of English. The database will be released to the public in the near future.

Hasegawa-Johnson, Mark

237

Fast Adaptation of Speech and Speaker Characteristics for Enhanced Speech Recognition in Adverse Intelligent Environments  

Microsoft Academic Search

In this paper we present a technique for fast adaptation of speech and speaker related information. Fast learning is particularly useful for automatic personalization of speech-controlled devices. Such a personalization of human-computer interfaces to be used in intelligent environments represents an important research issue. Speech recognition is enhanced by speaker specific profiles which are continuously adapted. A fast but robust

Tobias Herbig; Franz Gerl; Wolfgang Minker

2010-01-01

238

SPEECH ENHANCEMENT BASED ON RAYLEIGH MIXTURE MODELING OF SPEECH SPECTRAL AMPLITUDE DISTRIBUTIONS  

Microsoft Academic Search

DFT-based speech enhancement algorithms typically rely on a statistical model of the spectral amplitudes of the noise-free speech signal. It has been shown in the literature recently that the speech spectral amplitude distributions, conditional on estimated a priori SNR, may differ significantly from the traditional Gaussian model and are better described by super-Gaussian probability density functions. We show

J. S. Erkelens; J. Jensen; R. Heusdens

2007-01-01

239

Visible Speech Improves Human Language Understanding: Implications for Speech Processing Systems  

Microsoft Academic Search

Evidence from the study of human language understanding is presented suggesting that our ability to perceive visible speech can greatly influence our ability to understand and remember spoken language. A view of the speaker's face can greatly aid in the perception of ambiguous or noisy speech and can aid cognitive processing of speech leading to better understanding and recall. Some

Laura A. Thompson; William C. Ogden

1995-01-01

240

Speech and Language Skills of Parents of Children with Speech Sound Disorders  

ERIC Educational Resources Information Center

Purpose: This study compared parents with histories of speech sound disorders (SSD) to parents without known histories on measures of speech sound production, phonological processing, language, reading, and spelling. Familial aggregation for speech and language disorders was also examined. Method: The participants were 147 parents of children with…

Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy J.; Miscimarra, Lara; Iyengar, Sudha K.; Taylor, H. Gerry

2007-01-01

241

SUBTRACTION OF ADDITIVE NOISE FROM CORRUPTED SPEECH FOR ROBUST SPEECH RECOGNITION  

E-print Network

This work applies spectral subtraction to MFCC-like feature coefficients. We report on experiments on DARPA speech in noisy environments. One technique, called spectral subtraction [1], has proved to be an important strategy to cope

242

Spotlight on Speech Codes 2007: The State of Free Speech on Our Nation's Campuses  

ERIC Educational Resources Information Center

Last year, the Foundation for Individual Rights in Education (FIRE) conducted its first-ever comprehensive study of restrictions on speech at America's colleges and universities, "Spotlight on Speech Codes 2006: The State of Free Speech on our Nation's Campuses." In light of the essentiality of free expression to a truly liberal education, its…

Foundation for Individual Rights in Education (NJ1), 2007

2007-01-01

243

Spotlight on Speech Codes 2012: The State of Free Speech on Our Nation's Campuses  

ERIC Educational Resources Information Center

The U.S. Supreme Court has called America's colleges and universities "vital centers for the Nation's intellectual life," but the reality today is that many of these institutions severely restrict free speech and open debate. Speech codes--policies prohibiting student and faculty speech that would, outside the bounds of campus, be protected by the…

Foundation for Individual Rights in Education (NJ1), 2012

2012-01-01

244

Stability and Composition of Functional Synergies for Speech Movements in Children with Developmental Speech Disorders  

ERIC Educational Resources Information Center

The aim of this study was to investigate the consistency and composition of functional synergies for speech movements in children with developmental speech disorders. Kinematic data were collected on the reiterated productions of the syllables spa (/spa[image omitted]/) and paas (/pa[image omitted]s/) by ten 6- to 9-year-olds with developmental speech

Terband, H.; Maassen, B.; van Lieshout, P.; Nijland, L.

2011-01-01

245

Robust blind dereverberation of speech signals based on characteristics of short-time speech segments  

Microsoft Academic Search

This paper addresses blind dereverberation techniques based on the inherent characteristics of speech signals. Two challenging issues for speech dereverberation involve decomposing reverberant observed signals into colored sources and room transfer functions (RTFs), and making the inverse filtering robust as regards acoustic and system noise. We show that short-time speech characteristics are very important for this task, and that multi-channel

Tomohiro Nakatani; Takafumi Hikichi; Keisuke Kinoshita; Takuya Yoshioka; Marc Delcroix; Masato Miyoshi; Biing-hwang Juang

2007-01-01

246

Speech Enhancement based on Compressive Sensing Algorithm  

NASA Astrophysics Data System (ADS)

Various methods of speech enhancement have been proposed over the years; accurate methods focus mainly on quality and intelligibility. Speech enhancement using compressive sensing (CS) builds on a new paradigm for acquiring signals, fundamentally different from uniform-rate digitization followed by compression, which is often used for transmission or storage. CS reduces the number of degrees of freedom of a sparse or compressible signal by permitting only certain configurations of large and zero/small coefficients and structured sparsity models. CS therefore provides a way of reconstructing a compressed version of the speech in the original signal from only a small number of linear, non-adaptive measurements. The performance of the overall algorithm is evaluated on speech quality using an informal listening test and the Perceptual Evaluation of Speech Quality (PESQ). Experimental results show that the CS algorithm performs very well over a wide range of speech tests, providing good noise suppression over conventional approaches without obvious degradation of speech quality.

Sulong, Amart; Gunawan, Teddy S.; Khalifa, Othman O.; Chebil, Jalel

2013-12-01
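
The CS machinery the abstract leans on can be demonstrated end to end on a synthetic frame: draw a few random measurements of a signal that is sparse in the DCT domain, then recover it with orthogonal matching pursuit. This is a generic CS reconstruction sketch, not the paper's enhancement algorithm; the sizes and random seed are arbitrary.

```python
import numpy as np
from scipy.fftpack import idct

def omp(A, y, sparsity):
    """Orthogonal matching pursuit: recover sparse x from y = A @ x."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(sparsity):
        support.append(int(np.argmax(np.abs(A.T @ residual))))   # best new atom
        sol, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # refit support
        residual = y - A[:, support] @ sol
    x[support] = sol
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 80, 10                        # frame length, measurements, sparsity
Psi = idct(np.eye(n), axis=0, norm="ortho")  # DCT synthesis dictionary
coeffs = np.zeros(n)
coeffs[rng.choice(n, k, replace=False)] = rng.normal(size=k)
frame = Psi @ coeffs                         # synthetic frame, sparse in DCT
Phi = rng.normal(size=(m, n)) / np.sqrt(m)   # random measurement matrix
y = Phi @ frame                              # m << n compressive measurements
frame_hat = Psi @ omp(Phi @ Psi, y, k)       # reconstructed frame
print(np.linalg.norm(frame - frame_hat))     # near zero on successful recovery
```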

247

Is Automatic Speech Recognition Ready for Non-Native Speech? (Presented at the 1998 ESCA Conference on Speech Technology in Language Learning, Marholmen, Sweden)  

E-print Network

The non-native-English database covers two different types of speech, wide-band recordings of read speech and four-channel recordings, collected for all subjects. Initial experiments suggest that the speech in this database is significantly more

Byrne, William

248

Speech Planning Happens before Speech Execution: Online Reaction Time Methods in the Study of Apraxia of Speech  

ERIC Educational Resources Information Center

Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…

Maas, Edwin; Mailend, Marja-Liisa

2012-01-01

249

Speech perception as an active cognitive process  

PubMed Central

One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID:24672438

Heald, Shannon L. M.; Nusbaum, Howard C.

2014-01-01

250

Hypnosis and the Reduction of Speech Anxiety.  

ERIC Educational Resources Information Center

The purposes of this paper are (1) to review the background and nature of hypnosis, (2) to synthesize research on hypnosis related to speech communication, and (3) to delineate and compare two potential techniques for reducing speech anxiety--hypnosis and systematic desensitization. Hypnosis has been defined as a mental state characterised by…

Barker, Larry L.; And Others

251

Humanistic Speech Education to Create Leadership Models.  

ERIC Educational Resources Information Center

A theoretical framework based primarily on the humanistic psychology of Abraham Maslow is used in developing a humanistic approach to speech education. The holistic view of human learning and behavior, inherent in this approach, is seen to be compatible with a model of effective leadership. Specific applications of this approach to speech

Oka, Beverley Jeanne

252

The motor theory of speech perception revised  

Microsoft Academic Search

A motor theory of speech perception, initially proposed to account for results of early experiments with synthetic speech, is now extensively revised to accommodate recent findings, and to relate the assumptions of the theory to those that might be made about other perceptual modes. According to the revised theory, phonetic information is perceived in a biologically distinct system, a

ALVIN M. LIBERMAN; IGNATIUS G. MATTINGLY

1985-01-01

253

Distributed speech processing over wireless mesh networks  

Microsoft Academic Search

In this paper, we propose a new framework for distributed speech processing over wireless mesh networks (WMNs). State-of-the-art distributed speech processing systems address issues of sending information over centralized networks, namely cellular networks and wireless LANs, whose main functionality is to switch and route packets from one location to another. Here we propose a method

Rajesh M. Hegde; B. S. Manoj

2011-01-01

254

A Speech After a Circle Dance  

E-print Network

A lca provides a speech given upon the completion of a circle dance. Genre or type: speech.

Bkra shis bzang po

2009-01-01

255

The Neural Substrates of Infant Speech Perception  

ERIC Educational Resources Information Center

Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…

Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro

2014-01-01

256

Improvements in children's speech recognition performance  

Microsoft Academic Search

There are several reasons why conventional speech recognition systems modeled on adult data fail to perform satisfactorily on children's speech input. For instance, children's vocal characteristics differ significantly from those of adults. In addition, their choices of vocabulary and sentence construction modalities usually do not conform to adult patterns. We describe comparative studies demonstrating the performance gain realized by adopting

Subrata Das; Don Nix; Michael Picheny

1998-01-01

257

General-Purpose Monitoring during Speech Production  

ERIC Educational Resources Information Center

The concept of "monitoring" refers to our ability to control our actions on-line. Monitoring involved in speech production is often described in psycholinguistic models as an inherent part of the language system. We probed the specificity of speech monitoring in two psycholinguistic experiments where electroencephalographic activities were…

Ries, Stephanie; Janssen, Niels; Dufau, Stephane; Alario, F.-Xavier; Burle, Boris

2011-01-01

258

Learning the Hidden Structure of Speech.  

ERIC Educational Resources Information Center

The back-propagation neural network learning procedure was applied to the analysis and recognition of speech. Because this learning procedure requires only examples of input-output pairs, it is not necessary to provide it with any initial description of speech features. Rather, the network develops its own set of representational features…

Elman, Jeffery Locke; Zipser, David

259

Interactive Speech Translation in the DIPLOMAT Project  

Microsoft Academic Search

The DIPLOMAT rapid-deployment speech translation system is intended to allow naive users to communicate across a language barrier, without strong domain restrictions, despite the error-prone nature of current speech and translation technologies. Achieving this ambitious goal depends in large part on allowing the users to interactively correct recognition and translation errors. We briefly present the Multi-Engine

Robert Frederking; Alexander Rudnicky; Christopher Hogan

1997-01-01

260

Acoustic characteristics of listener-constrained speech  

NASA Astrophysics Data System (ADS)

Relatively little is known about the acoustical modifications speakers employ to meet the various constraints (auditory, linguistic, and otherwise) of their listeners. Similarly, the manner by which perceived listener constraints interact with speakers' adoption of specialized speech registers is poorly understood. Hyper- and Hypospeech (H&H) theory offers a framework for examining the relationship between speech production and output-oriented goals for communication, suggesting that under certain circumstances speakers may attempt to minimize phonetic ambiguity by employing a ``hyperarticulated'' speaking style (Lindblom, 1990). It remains unclear, however, what the acoustic correlates of hyperarticulated speech are, and how, if at all, we might expect phonetic properties to change with respect to different listener-constrained conditions. This paper is part of a preliminary investigation comparing the prosodic characteristics of speech produced across a range of listener constraints. Analyses are drawn from a corpus of read hyperarticulated speech data comprising eight adult, female speakers of English. Specialized registers include speech to foreigners, infant-directed speech, speech produced under noisy conditions, and human-machine interaction. The authors gratefully acknowledge the financial support of the Irish Higher Education Authority, allocated to Fred Cummins for collaborative work with Media Lab Europe.

Ashby, Simone; Cummins, Fred

2003-04-01

261

MULTILINGUAL PHONE RECOGNITION OF SPONTANEOUS TELEPHONE SPEECH  

E-print Network

In this paper we report on experiments with phone recognition of spontaneous telephone speech. Phone recognizers were trained and assessed on IDEAL, a multilingual corpus containing

C. Corredor-Ardoy; L. Lamel; M. Adda

262

Speech-Communication: Theory and Models.  

ERIC Educational Resources Information Center

This volume, designed for advanced undergraduate and graduate students, addresses the research needs of behavioral scientists interested in various facets of speech-communication and provides a theoretical point of departure for investigating speech as a behavioral science. Chapters 1-7 provide the background and strategy for the formulation of…

Smith, Raymond G.

263

Building Searchable Collections of Enterprise Speech Data.  

ERIC Educational Resources Information Center

The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…

Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

264

Creation of two children's speech databases  

Microsoft Academic Search

Two sets of speech recordings were made from children talkers ranging in age from 5 to 18 years. One set was recorded via telephone channels (TEL) and the other using high-fidelity recording equipment (MIC). Special considerations and techniques required for the recording of speech from children are discussed. Also presented are (1) a description of the recording environment including ambient

J. D. Miller; Sungbok Lee; R. M. Uchanski; A. F. Heidbreder; B. B. Richman; J. Tadlock

1996-01-01

265

Emotion Modulates Early Auditory Response to Speech  

E-print Network

To investigate how emotion modulates the brain's physiological response to speech, subjects looked at emotion-evoking pictures while 32-channel EEG evoked responses were recorded. The pictures were drawn from the International Affective Picture System database, and were rated by participants

Jade Wang; Trent Nicol; Erika Skoe; Mikko …

266

The Development of the Otago Speech Database  

Microsoft Academic Search

A collection of digits and words, spoken with a New Zealand English accent, has been systematically and formally collected. This collection along with the beginning and end points of the realised phonemes from within the words, comprise the Otago Speech Corpora. A relational database management system has been developed to house the speech data. This system provides much more usability,

S. J. Sinclair; C. I. Watson

1995-01-01

267

Speech Intelligibility in Severe Adductor Spasmodic Dysphonia  

ERIC Educational Resources Information Center

This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…

Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.

2004-01-01

268

Anatomy and Physiology of the Speech Mechanism.  

ERIC Educational Resources Information Center

This monograph on the anatomical and physiological aspects of the speech mechanism stresses the importance of a general understanding of the process of verbal communication. Contents include "Positions of the Body," "Basic Concepts Linked with the Speech Mechanism," "The Nervous System," "The Respiratory System--Sound-Power Source," "The…

Sheets, Boyd V.

269

Language and Legal Speech Acts: Decisions.  

ERIC Educational Resources Information Center

The first part of this essay argues specifically that legal speech acts are not statements but question/answer constructions. The focus in this section is on the underlying interrogative structure of the legal decision. The second part of the paper touches on significant topics related to the concept of legal speech acts, including the philosophic…

Kevelson, Roberta

270

The Modulation Transfer Function for Speech Intelligibility  

PubMed Central

We systematically determined which spectrotemporal modulations in speech are necessary for comprehension by human listeners. Speech comprehension has been shown to be robust to spectral and temporal degradations, but the specific relevance of particular degradations is arguable due to the complexity of the joint spectral and temporal information in the speech signal. We applied a novel modulation filtering technique to recorded sentences to restrict acoustic information quantitatively and to obtain a joint spectrotemporal modulation transfer function for speech comprehension, the speech MTF. For American English, the speech MTF showed the criticality of low modulation frequencies in both time and frequency. Comprehension was significantly impaired when temporal modulations <12 Hz or spectral modulations <4 cycles/kHz were removed. More specifically, the MTF was bandpass in temporal modulations and low-pass in spectral modulations: temporal modulations from 1 to 7 Hz and spectral modulations <1 cycles/kHz were the most important. We evaluated the importance of spectrotemporal modulations for vocal gender identification and found a different region of interest: removing spectral modulations between 3 and 7 cycles/kHz significantly increases gender misidentifications of female speakers. The determination of the speech MTF furnishes an additional method for producing speech signals with reduced bandwidth but high intelligibility. Such compression could be used for audio applications such as file compression or noise removal and for clinical applications such as signal processing for cochlear implants. PMID:19266016

Elliott, Taffeta M.; Theunissen, Frédéric E.

2009-01-01
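
Modulation filtering of the kind used here can be sketched by taking a 2-D Fourier transform of the log-spectrogram, whose axes then index temporal modulation (Hz) and spectral modulation (cycles/kHz), and masking it. The cutoffs below are the ones the abstract names; inverting the filtered spectrogram back to audio, as the paper does, would additionally require phase reconstruction, which this sketch omits.

```python
import numpy as np
from scipy.signal import stft

def modulation_lowpass(speech, fs, t_cut_hz=12.0, s_cut_cyc_per_khz=4.0):
    """Low-pass the joint spectrotemporal modulations of a log-spectrogram."""
    f, t, Z = stft(speech, fs=fs, nperseg=256, noverlap=192)
    log_spec = np.log(np.abs(Z) + 1e-8)
    M = np.fft.fft2(log_spec)
    # Modulation axes: cycles/s along time, cycles/kHz along frequency.
    wt = np.fft.fftfreq(log_spec.shape[1], d=t[1] - t[0])
    wf = np.fft.fftfreq(log_spec.shape[0], d=(f[1] - f[0]) / 1000.0)
    mask = (np.abs(wf)[:, None] <= s_cut_cyc_per_khz) & (np.abs(wt)[None, :] <= t_cut_hz)
    return np.real(np.fft.ifft2(M * mask))   # filtered log-spectrogram
```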

271

Crossed Apraxia of Speech: A Case Report  

ERIC Educational Resources Information Center

The present study reports on the first case of crossed apraxia of speech (CAS) in a 69-year-old right-handed female (SE). The possibility of occurrence of apraxia of speech (AOS) following right hemisphere lesion is discussed in the context of known occurrences of ideomotor apraxias and acquired neurogenic stuttering in several cases with right…

Balasubramanian, Venu; Max, Ludo

2004-01-01

272

An Acquired Deficit of Audiovisual Speech Processing  

ERIC Educational Resources Information Center

We report a 53-year-old patient (AWF) who has an acquired deficit of audiovisual speech integration, characterized by a perceived temporal mismatch between speech sounds and the sight of moving lips. AWF was less accurate on an auditory digit span task with vision of a speaker's face as compared to a condition in which no visual information from…

Hamilton, Roy H.; Shenton, Jeffrey T.; Coslett, H. Branch

2006-01-01

273

Analog Acoustic Expression in Speech Communication  

ERIC Educational Resources Information Center

We present the first experimental evidence of a phenomenon in speech communication we call "analog acoustic expression." Speech is generally thought of as conveying information in two distinct ways: discrete linguistic-symbolic units such as words and sentences represent linguistic meaning, and continuous prosodic forms convey information about…

Shintel, Hadas; Nusbaum, Howard C.; Okrent, Arika

2006-01-01

274

Temporal recalibration during asynchronous audiovisual speech perception.  

PubMed

We investigated the consequences of monitoring an asynchronous audiovisual speech stream on the temporal perception of simultaneously presented vowel-consonant-vowel (VCV) audiovisual speech video clips. Participants made temporal order judgments (TOJs) regarding whether the speech-sound or the visual-speech gesture occurred first, for video clips presented at various different stimulus onset asynchronies. Throughout the experiment, half of the participants also monitored a continuous stream of words presented audiovisually, superimposed over the VCV video clips. The continuous (adapting) speech stream could either be presented in synchrony, or else with the auditory stream lagging by 300 ms. A significant shift (13 ms in the direction of the adapting stimulus in the point of subjective simultaneity) was observed in the TOJ task when participants monitored the asynchronous speech stream. This result suggests that the consequences of adapting to asynchronous speech extends beyond the case of simple audiovisual stimuli (as has recently been demonstrated by Navarra et al. in Cogn Brain Res 25:499-507, 2005) and can even affect the perception of more complex speech stimuli. PMID:17431598

Vatakis, Argiro; Navarra, Jordi; Soto-Faraco, Salvador; Spence, Charles

2007-07-01

275

Enhancement of speech corrupted by acoustic noise  

Microsoft Academic Search

This paper describes a method for enhancing speech corrupted by broadband noise. The method is based on the spectral noise subtraction method. The original method entails subtracting an estimate of the noise power spectrum from the speech power spectrum, setting negative differences to zero, recombining the new power spectrum with the original phase, and then reconstructing the time waveform. While
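
The four steps named in this abstract map almost directly onto code. A minimal sketch, assuming the first few STFT frames of the recording contain noise only; the frame size and noise-estimation strategy are placeholders, not those of the paper:

```python
# Sketch of basic spectral subtraction as described above, assuming the
# leading frames of the recording are noise-only.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(x, fs, noise_frames=10):
    f, t, S = stft(x, fs=fs, nperseg=256)
    power, phase = np.abs(S) ** 2, np.angle(S)

    # Estimate the noise power spectrum from leading noise-only frames.
    noise_power = power[:, :noise_frames].mean(axis=1, keepdims=True)

    # Subtract, setting negative differences to zero.
    clean_power = np.maximum(power - noise_power, 0.0)

    # Recombine the new power spectrum with the original phase and
    # reconstruct the time waveform.
    S_clean = np.sqrt(clean_power) * np.exp(1j * phase)
    _, y = istft(S_clean, fs=fs, nperseg=256)
    return y
```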

M. Berouti; R. Schwartz; J. Makhoul

1979-01-01

276

Enhancing Speech Discrimination through Stimulus Repetition  

ERIC Educational Resources Information Center

Purpose: To evaluate the effects of sequential and alternating repetition on speech-sound discrimination. Method: Typically hearing adults' discrimination of 3 pairs of speech-sound contrasts was assessed at 3 signal-to-noise ratios using the change/no-change procedure. On change trials, the standard and comparison stimuli differ; on no-change…

Holt, Rachael Frush

2011-01-01

277

Speech masking and cancelling and voice obscuration  

DOEpatents

A non-acoustic sensor is used to measure a user's speech; an obscuring acoustic signal is then broadcast, diminishing the user's vocal acoustic output intensity and/or distorting the voice sounds, making them unintelligible to persons nearby. The non-acoustic sensor is positioned proximate to or contacting a user's neck or head skin tissue for sensing speech production information.

Holzrichter, John F.

2013-09-10

278

Why Impromptu Speech Is Easy To Understand.  

ERIC Educational Resources Information Center

Impromptu speech is characterized by the simultaneous processes of ideation (the elaboration and structuring of reasoning by the speaker as he improvises) and expression in the speaker. Other elements accompany this characteristic: division of speech flow into short segments, acoustic relief in the form of word stress following a pause, and both…

Le Feal, K. Dejean

279

A Gaze and Speech Multimodal Interface  

Microsoft Academic Search

Eyesight and speech are two channels that humans naturally use to communicate with each other. However, both the eye tracking and the speech recognition techniques available today are still far from perfect. Our goal is to find how to make effective use of this error-prone information from both modes, in order to use one mode to correct the errors of another mode,

Qiaohui Zhang; Atsumi Imamiya; Kentaro Go; Xiaoyang Mao

2004-01-01

280

The Need for a Speech Corpus  

ERIC Educational Resources Information Center

This paper outlines the ongoing construction of a speech corpus for use by applied linguists and advanced EFL/ESL students. In the first part, sections 1-4, the need for improvements in the teaching of listening skills and pronunciation practice for EFL/ESL students is noted. It is argued that the use of authentic native-to-native speech is…

Campbell, Dermot F.; McDonnell, Ciaran; Meinardi, Marti; Richardson, Bunny

2007-01-01

281

Speech vs. singing: infants choose happier sounds  

PubMed Central

Infants prefer speech to non-vocal sounds and to non-human vocalizations, and they prefer happy-sounding speech to neutral speech. They also exhibit an interest in singing, but there is little knowledge of their relative interest in speech and singing. The present study explored infants' attention to unfamiliar audio samples of speech and singing. In Experiment 1, infants 4–13 months of age were exposed to happy-sounding infant-directed speech vs. hummed lullabies by the same woman. They listened significantly longer to the speech, which had considerably greater acoustic variability and expressiveness, than to the lullabies. In Experiment 2, infants of comparable age who heard the lyrics of a Turkish children's song spoken vs. sung in a joyful/happy manner did not exhibit differential listening. Infants in Experiment 3 heard the happily sung lyrics of the Turkish children's song vs. a version that was spoken in an adult-directed or affectively neutral manner. They listened significantly longer to the sung version. Overall, happy voice quality rather than vocal mode (speech or singing) was the principal contributor to infant attention, regardless of age. PMID:23805119

Corbeil, Marieve; Trehub, Sandra E.; Peretz, Isabelle

2013-01-01

282

Scaffolded-Language Intervention: Speech Production Outcomes  

ERIC Educational Resources Information Center

This study investigated the effects of a scaffolded-language intervention using cloze procedures, semantically contingent expansions, contrastive word pairs, and direct models on speech abilities in two preschoolers with speech and language impairment speaking African American English. Effects of the lexical and phonological characteristics (i.e.,…

Bellon-Harn, Monica L.; Credeur-Pampolina, Maggie E.; LeBoeuf, Lexie

2013-01-01

283

Voice Modulations in German Ironic Speech  

ERIC Educational Resources Information Center

Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German "ironic criticism"…

Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

2011-01-01

284

Gesture When There Is No Speech Model.  

ERIC Educational Resources Information Center

Summarizes research on structure of gesture produced in absence of speech. Finds that gestures of both hearing individuals who have been asked not to speak and deaf individuals who depend solely on gesture to communicate (including homesigners) exhibit characteristics typically associated with speech; gestures are segmented and linear rather than…

Morford, Jill P.

1998-01-01

285

Hidden Dynamic Models for Speech Processing Applications  

E-print Network

In an effort to seek internal dynamics of human speech that can reflect the continuous shape change of the vocal tract and benefit current speech technology, the second part of the thesis turns to a study of vocal-tract-resonance (VTR) dynamics, built upon the insights and experiences gained from studying

Chaudhuri, Surajit

286

The Lombard Effect on Alaryngeal Speech.  

ERIC Educational Resources Information Center

The study investigated the Lombard effect (evoking increased speech intensity by applying masking noise to ears of talker) on the speech of esophageal talkers, artificial larynx users, and normal speakers. The noise condition produced the highest intensity increase in the esophageal speakers. (Author/DB)

Zeine, Lina; Brandt, John F.

1988-01-01

287

Pronunciation Modeling for Large Vocabulary Speech Recognition  

ERIC Educational Resources Information Center

The large pronunciation variability of words in conversational speech is one of the major causes of low accuracy in automatic speech recognition (ASR). Many pronunciation modeling approaches have been developed to address this problem. Some explicitly manipulate the pronunciation dictionary as well as the set of the units used to define the…

Kantor, Arthur

2010-01-01

288

Speech Recognition with Primarily Temporal Cues  

Microsoft Academic Search

Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants,
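
A minimal sketch of this envelope-on-noise manipulation, using a Hilbert envelope as a stand-in for the low-pass-filtered rectified envelopes typically used in such studies; the band edges and filter order are illustrative, not those of the experiment:

```python
# Sketch of noise vocoding: band-pass the speech, extract each band's
# temporal envelope, and use it to modulate noise of the same bandwidth.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, fs, edges=(100, 500, 1500, 4000)):
    rng = np.random.default_rng(0)
    y = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        envelope = np.abs(hilbert(band))     # temporal envelope cue kept
        carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
        y += envelope * carrier              # same-bandwidth noise carrier
    return y / np.max(np.abs(y))
```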

Robert V. Shannon; Fan-Gang Zeng; Vivek Kamath; John Wygonski; Michael Ekelid

1995-01-01

289

Hate Speech: A Call to Principles.  

ERIC Educational Resources Information Center

Reviews the history of First Amendment rulings as they relate to speech codes and of other regulations directed at the content of speech. A case study, based on an experience at Trenton State College, details the legal constraints, principles, and practices that Student Affairs administrators should be aware of regarding such situations.…

Klepper, William M.; Bakken, Timothy

1997-01-01

290

Pulmonic Ingressive Speech in Shetland English  

ERIC Educational Resources Information Center

This paper presents a study of pulmonic ingressive speech, a severely understudied phenomenon within varieties of English. While ingressive speech has been reported for several parts of the British Isles, New England, and eastern Canada, thus far Newfoundland appears to be the only locality where researchers have managed to provide substantial…

Sundkvist, Peter

2012-01-01

291

Speech neglect: A strange educational blind spot  

NASA Astrophysics Data System (ADS)

Speaking is universally acknowledged as an important human talent, yet as a topic of educated common knowledge, it is peculiarly neglected. Partly, this is a consequence of the relatively recent growth of research on speech perception, production, and development, but also a function of the way that information is sliced up by undergraduate colleges. Although the basic acoustic mechanism of vowel production was known to Helmholtz, the ability to view speech production as a physiological event is evolving even now with such techniques as fMRI. Intensive research on speech perception emerged only in the early 1930s as Fletcher and the engineers at Bell Telephone Laboratories developed the transmission of speech over telephone lines. The study of speech development was revolutionized by the papers of Eimas and his colleagues on speech perception in infants in the 1970s. Dissemination of knowledge in these fields is the responsibility of no single academic discipline. It forms a center for two departments, Linguistics, and Speech and Hearing, but in the former, there is a heavy emphasis on other aspects of language than speech and, in the latter, a focus on clinical practice. For psychologists, it is a rather minor component of a very diverse assembly of topics. I will focus on these three fields in proposing possible remedies.

Harris, Katherine Safford

2005-09-01

292

The evolution of speech: vision, rhythm, cooperation  

E-print Network

A full account of human speech evolution must consider its multisensory nature [9]. Each of these factors may have played an important role in the evolution of human communication in a piecemeal fashion. As such, determining the many substrates required for the evolution of human speech

Ghazanfar, Asif

293

SPEECH LEVELS IN VARIOUS NOISE ENVIRONMENTS  

EPA Science Inventory

The goal of this study was to determine average speech levels used by people when conversing in different levels of background noise. The non-laboratory environments where speech was recorded were: high school classrooms, homes, hospitals, department stores, trains and commercial...

294

Speech synthesis by phonological structure matching  

Microsoft Academic Search

This paper presents a new technique for speech synthesis by unit selection. The technique works by specifying the synthesis target and the speech database as phonological trees, and using a selection algorithm which finds the largest parts of trees in the database which match parts of the target tree. The technique avoids many of the errors made by

Paul Taylor; Alan W. Black

1999-01-01

295

Speech Fluency in Fragile X Syndrome  

ERIC Educational Resources Information Center

The present study investigated the dysfluencies in the speech of nine French speaking individuals with fragile X syndrome. Type, number, and loci of dysfluencies were analysed. The study confirms that dysfluencies are a common feature of the speech of individuals with fragile X syndrome but also indicates that the dysfluency pattern displayed is…

Van Borsel, John; Dor, Orianne; Rondal, Jean

2008-01-01

296

Disfluencies in the Analysis of Speech Data.  

ERIC Educational Resources Information Center

Discusses a study of concord phenomena in spoken Brazilian Portuguese. Findings indicate the presence of disfluencies, including apparent corrections, in about 15% of the relevant tokens in the corpus of recorded speech data. It is concluded that speech is not overly laden with errors, and there is nothing in the data to mislead the language…

Naro, Anthony Julius; Scherre, Maria Marta Pereira

1996-01-01

297

Speech entrainment enables patients with Broca’s aphasia to produce fluent speech  

PubMed Central

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

298

Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech  

NASA Technical Reports Server (NTRS)

An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them, and responds appropriately while ignoring the rest. This paper outlines a simple method for achieving this goal, which involves trading a slightly higher false rejection rate of in-domain utterances for a higher correct rejection rate of Out of Domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar which is specialized by Explanation Based Learning (EBL), so that it only uses rules which are seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system only recognizes utterances which reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good Out of Domain discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop, and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system which is being tested on the International Space Station.
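
The gating logic described here can be summarized in a few lines. A hypothetical sketch, with `recognize` and `parse_to_logical_form` as placeholder functions standing in for the class N-gram recognizer and the EBL-specialized unification grammar (neither name comes from the paper):

```python
# Hypothetical sketch of open-microphone gating: act only on utterances
# whose best recognition hypothesis reduces to a valid logical form;
# ignore the rest as Out of Domain.
def handle_utterance(audio, recognize, parse_to_logical_form):
    hypothesis = recognize(audio)                     # recognizer output
    logical_form = parse_to_logical_form(hypothesis)  # grammar-based parse
    if logical_form is None:
        return None          # no valid logical form: treat as OOD, ignore
    return logical_form     # in-domain utterance: respond to its meaning
```

This trades some false rejections of in-domain speech for a much higher correct rejection rate of OOD speech, as the abstract describes.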

Hieronymus, James; Aist, Greg; Dowding, John

2006-01-01

299

Nonlinear processing of phase vocoded speech  

NASA Astrophysics Data System (ADS)

Although adaptive coding is in widespread use, the availability of very large scale integrated digital signal processing chips makes filterbank analysis and synthesis of speech signals very economical. Such filterbank processing resembles ear modeling and simplifies the application of the masking properties of the ear. Experiments were conducted to determine the number of filterbank outputs needed to reconstruct speech signals using a logarithmic bandpass filterbank consisting of an analysis and a synthesis filterbank. With eight outputs, the original speech sentence is found to be very clear; adding 32 more outputs restores the breathiness of the sentence. A surprising degree of intelligibility is retained even with one output. The bandwidth of the speech signals is limited. The use of these filterbanks for speech enhancement and modest-bitrate transmission appears favorable.
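
A minimal sketch of the analysis/synthesis idea, under the simplifying assumptions that the bands are log-spaced Butterworth filters and that resynthesis is a plain sum over a chosen subset of band outputs (the study's actual filterbank design is not given in the abstract):

```python
# Sketch of logarithmic filterbank analysis/synthesis; band count and
# edges are illustrative placeholders.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def log_filterbank(x, fs, n_bands=8, f_lo=100.0, f_hi=4000.0):
    """Split x into n_bands log-spaced bands; return the band signals."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(sosfiltfilt(sos, x))
    return bands

def synthesize(bands, keep=None):
    """Reconstruct from a subset of band outputs (all bands by default),
    e.g. keep=[0] to hear the one-output case mentioned above."""
    keep = range(len(bands)) if keep is None else keep
    return sum(bands[i] for i in keep)
```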

Gagnon, Luc; McGee, W. F.

300

Voice Quality Modelling for Expressive Speech Synthesis  

PubMed Central

This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameter modelling, along with the well-known prosodic parameters (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated an improvement in the obtained expressive speech styles when VoQ modelling was used along with prosodic characteristics. PMID:24587738

Socoró, Joan Claudi

2014-01-01

301

Speech cues contribute to audiovisual spatial integration.  

PubMed

Speech is the most important form of human communication, but ambient sounds and competing talkers often degrade its acoustics. Fortunately, the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral 'what' and dorsal 'where' pathways. PMID:21909378

Bishop, Christopher W; Miller, Lee M

2011-01-01

302

Speech recognition technology: a critique.  

PubMed Central

This paper introduces the session on advanced speech recognition technology. The two papers comprising this session argue that current technology yields a performance that is only an order of magnitude in error rate away from human performance and that incremental improvements will bring us to that desired level. I argue that, to the contrary, present performance is far removed from human performance and a revolution in our thinking is required to achieve the goal. It is further asserted that to bring about the revolution more effort should be expended on basic research and less on trying to prematurely commercialize a deficient technology. PMID:7479808

Levinson, S E

1995-01-01

303

Speech activity detection using accelerometer.  

PubMed

The level of social activity is linked to overall wellbeing and to various disorders, including stress. In this regard, a myriad of automatic solutions for monitoring social interactions have been proposed, usually including audio data analysis. Such approaches often face legal and ethical issues, and they may also raise privacy concerns in monitored subjects, thus affecting their natural behaviour. In this paper we present an accelerometer-based speech detector which does not require capturing sensitive data while being an easily applicable and cost-effective solution. PMID:23366338
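
The abstract does not describe the detector itself, so the following is only a hypothetical sketch of the general idea: frame the accelerometer signal and flag frames whose energy stands well above the background level. The frame length and threshold rule are assumptions.

```python
# Hypothetical sketch of accelerometer-based speech activity detection by
# frame-energy thresholding; not the paper's actual features or classifier.
import numpy as np

def detect_speech(acc, fs, frame_s=0.05, k=3.0):
    """Flag frames whose energy exceeds k times the median frame energy.
    `acc` is a 1-D accelerometer signal (e.g., throat/chest vibration)."""
    n = int(frame_s * fs)
    frames = acc[: len(acc) // n * n].reshape(-1, n)
    energy = (frames.astype(float) ** 2).mean(axis=1)
    return energy > k * np.median(energy)    # True = speech-like frame
```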

Matic, Aleksandar; Osmani, Venet; Mayora, Oscar

2012-01-01

304

Parallel systems in the control of speech.  

PubMed

Modern neuroimaging techniques have advanced our understanding of the distributed anatomy of speech production, beyond that inferred from clinico-pathological correlations. However, much remains unknown about functional interactions between anatomically distinct components of this speech production network. One reason for this is the need to separate spatially overlapping neural signals supporting diverse cortical functions. We took three separate human functional magnetic resonance imaging (fMRI) datasets (two speech production, one "rest"). In each we decomposed the neural activity within the left posterior perisylvian speech region into discrete components. This decomposition robustly identified two overlapping spatio-temporal components, one centered on the left posterior superior temporal gyrus (pSTG), the other on the adjacent ventral anterior parietal lobe (vAPL). The pSTG was functionally connected with bilateral superior temporal and inferior frontal regions, whereas the vAPL was connected with other parietal regions, lateral and medial. Surprisingly, the components displayed spatial anti-correlation, in which the negative functional connectivity of each component overlapped with the other component's positive functional connectivity, suggesting that these two systems operate separately and possibly in competition. The speech tasks reliably modulated activity in both pSTG and vAPL suggesting they are involved in speech production, but their activity patterns dissociate in response to different speech demands. These components were also identified in subjects at "rest" and not engaged in overt speech production. These findings indicate that the neural architecture underlying speech production involves parallel distinct components that converge within posterior peri-sylvian cortex, explaining, in part, why this region is so important for speech production. PMID:23723184
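
The abstract does not name its decomposition algorithm; as a generic stand-in (assuming scikit-learn is available), the sketch below separates a synthetic time-by-voxel matrix into two spatially independent components with FastICA and checks the spatial correlation between their maps, mirroring the anti-correlation analysis described above.

```python
# Generic spatial-ICA sketch on placeholder fMRI-like data; the paper's
# actual decomposition method and data are not specified in the abstract.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
data = rng.standard_normal((200, 5000))     # placeholder: 200 volumes x 5000 voxels

ica = FastICA(n_components=2, random_state=0)
spatial_maps = ica.fit_transform(data.T).T  # 2 components x 5000 voxels
time_courses = ica.mixing_                  # 200 volumes x 2 components

# Spatial correlation between the two component maps.
r = np.corrcoef(spatial_maps[0], spatial_maps[1])[0, 1]
print(f"component spatial correlation: {r:.2f}")
```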

Simmonds, Anna J; Wise, Richard J S; Collins, Catherine; Redjep, Ozlem; Sharp, David J; Iverson, Paul; Leech, Robert

2014-05-01

305

Statistical modeling of speech Poincaré sections in combination of frequency analysis to improve speech recognition performance  

NASA Astrophysics Data System (ADS)

This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to simultaneously benefit from some features obtained from Poincaré sections applied to the speech reconstructed phase space (RPS) and typical Mel frequency cepstral coefficients (MFCCs), which have a proven role in the speech recognition field. With an appropriate dimension, the reconstructed phase space of the speech signal is assured to be topologically equivalent to the dynamics of the speech production system, and could therefore include information that may be absent in linear analysis approaches. Moreover, complicated systems such as the speech production system can present cyclic and oscillatory patterns, and Poincaré sections can be used as an effective tool in the analysis of such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to Poincaré sections of the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the parameters of the GMM model and MFCC-based features. A hidden Markov model-based speech recognition system and the TIMIT speech database are used to evaluate the performance of the proposed feature set by conducting isolated and continuous speech recognition experiments. With the proposed feature set, a 5.7% absolute improvement in isolated phoneme recognition is obtained over MFCC-based features alone.
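
A hedged sketch of the feature pipeline described here: delay-embed the signal into a reconstructed phase space, collect the crossings of a Poincaré section, and fit a GMM whose parameters become features to concatenate with MFCCs. The embedding dimension, lag, section plane, and mixture count are all illustrative assumptions.

```python
# Sketch of RPS/Poincare-section features; parameter choices are
# placeholders, not those of the paper.
import numpy as np
from sklearn.mixture import GaussianMixture

def poincare_features(x, dim=3, lag=5, n_mix=4):
    # Delay embedding: rows are points [x(t), x(t+lag), ..., x(t+(dim-1)lag)].
    n = len(x) - (dim - 1) * lag
    rps = np.stack([x[i * lag : i * lag + n] for i in range(dim)], axis=1)

    # Poincare section: points where the first coordinate crosses zero upward.
    crossing = (rps[:-1, 0] < 0) & (rps[1:, 0] >= 0)
    section = rps[1:][crossing][:, 1:]      # remaining coordinates

    gmm = GaussianMixture(n_components=n_mix, covariance_type="diag",
                          random_state=0).fit(section)
    # Flatten GMM parameters into a feature vector, to be pruned and
    # concatenated with MFCCs downstream.
    return np.concatenate([gmm.weights_, gmm.means_.ravel(),
                           gmm.covariances_.ravel()])
```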

Jafari, Ayyoob; Almasganj, Farshad; Bidhendi, Maryam Nabi

2010-09-01

306

An articulatorily constrained, maximum entropy approach to speech recognition and speech coding  

SciTech Connect

Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMMs, but that should more accurately capture the statistical properties of real speech samples, presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMMs. This will allow him to highlight the similarities and differences between HMMs and the proposed technique.

Hogden, J.

1996-12-31

307

Audiovisual Cues and Perceptual Learning of Spectrally Distorted Speech  

ERIC Educational Resources Information Center

Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties…

Pilling, Michael; Thomas, Sharon

2011-01-01

308

Listening to talking faces: motor cortical activation during speech perception  

E-print Network

Results showed that audiovisual speech perception activated a network of brain regions that included cortical motor areas, suggesting that the speech perception process involves a network of multimodal brain regions associated with speech

Coulson, Seana

309

The Speech Discipline in Crisis - - A Cause for Hope.  

ERIC Educational Resources Information Center

Speech communication is a distinct discipline, but one in a healthy state of conflict between theory and practice. The crisis in the speech discipline (and in academia generally) exists because speech does not present itself as a consumable value; quality program decisions are not made; speech is often conceived as only one subject matter; general…

Lanigan, Richard L.

310

Monkey Lipsmacking Develops Like the Human Speech Rhythm  

ERIC Educational Resources Information Center

Across all languages studied to date, audiovisual speech exhibits a consistent rhythmic structure. This rhythm is critical to speech perception. Some have suggested that the speech rhythm evolved "de novo" in humans. An alternative account--the one we explored here--is that the rhythm of speech evolved through the modification of rhythmic facial…

Morrill, Ryan J.; Paukner, Annika; Ferrari, Pier F.; Ghazanfar, Asif A.

2012-01-01

311

On the Dynamics of Casual and Careful Speech.  

ERIC Educational Resources Information Center

Comparative statistical data are presented on speech dynamic (as contrasted with lexical and rhetorical) aspects of major speech styles. Representative samples of story retelling, lectures, speeches, sermons, interviews, and panel discussions serve to determine posited differences between casual and careful speech. Data are drawn from 15,393…

Hieke, A. E.

312

Subtyping Children With Speech Sound Disorders by Endophenotypes  

PubMed Central

Purpose The present study examined associations of 5 endophenotypes (i.e., measurable skills that are closely associated with speech sound disorders and are useful in detecting genetic influences on speech sound production), oral motor skills, phonological memory, phonological awareness, vocabulary, and speeded naming, with 3 clinical criteria for classifying speech sound disorders: severity of speech sound disorders, our previously reported clinical subtypes (speech sound disorders alone, speech sound disorders with language impairment, and childhood apraxia of speech), and the comorbid condition of reading disorders. Participants and Method Children with speech sound disorders and their siblings were assessed at early childhood (ages 4–7 years) on measures of the 5 endophenotypes. Severity of speech sound disorders was determined using the z score for Percent Consonants Correct—Revised (developed by Shriberg, Austin, Lewis, McSweeny, & Wilson, 1997). Analyses of variance were employed to determine how these endophenotypes differed among the clinical subtypes of speech sound disorders. Results and Conclusions Phonological memory was related to all 3 clinical classifications of speech sound disorders. Our previous subtypes of speech sound disorders and comorbid conditions of language impairment and reading disorder were associated with phonological awareness, while severity of speech sound disorders was weakly associated with this endophenotype. Vocabulary was associated with mild versus moderate speech sound disorders, as well as comorbid conditions of language impairment and reading disorder. These 3 endophenotypes proved useful in differentiating subtypes of speech sound disorders and in validating current clinical classifications of speech sound disorders. PMID:22844175

Lewis, Barbara A.; Avrich, Allison A.; Freebairn, Lisa A.; Taylor, H. Gerry; Iyengar, Sudha K.; Stein, Catherine M.

2012-01-01

313

Auditory-Visual Speech Processing 2005 (AVSP'05)  

E-print Network

Proceedings of AVSP'05, British Columbia, Canada, July 24-27, 2005 (ISCA Archive: http://www.isca-speech.org/archive). Includes "An Agent-based Framework for Auditory-Visual Speech…", which involves multiple tiers of visual speech gestures, phonemes, and syllable boundaries; the CUAVE database [6] provided

Reyle, Uwe

314

THE CMU ARCTIC SPEECH DATABASES John Kominek, Alan W Black  

E-print Network

The CMU ARCTIC speech databases were constructed for the purpose of speech synthesis research. These single-speaker speech databases have been carefully recorded. In addition to wavefiles, the databases provide complete support for the Festival Speech Synthesis System

Black, Alan W

315

Cues for Hesitation in Speech Synthesis Rolf Carlson1  

E-print Network

The paper reports a sequence of experiments using Swedish speech synthesis. A background for our effort is a database developed

Carlson, Rolf

316

The IIIT-H Indic Speech Databases Kishore Prahallad1  

E-print Network

This paper discusses the efforts in collecting speech databases for Indian languages, including Bengali, the issues involved in collecting these databases, and demonstrates their usage in speech synthesis. By releasing these speech

Black, Alan W

317

Speechdat multilingual speech databases for teleservices: across the finish line  

Microsoft Academic Search

The goal of the SpeechDat project is to develop spoken language resources for speech recognisers suited to realise voice-driven teleservices. SpeechDat created speech databases for all official languages of the European Union and some major dialectal varieties and minority languages. The size of the databases ranges between 500 and 5000 speakers. In total, 20 databases are recorded

Harald Höge; Christoph Draxler; Henk van den Heuvel; Finn Tore Johansen; Eric Sanders; Herbert S. Tropf

1999-01-01

318

Optimising selection of units from speech databases for concatenative synthesis  

Microsoft Academic Search

Concatenating units of natural speech is one method of speech synthesis. Most such systems use an inventory of fixed-length units, typically diphones or triphones with one instance of each type. An alternative is to use more varied, non-uniform units extracted from large speech databases containing multiple instances of each. The greater variability in such natural speech segments allows closer modeling of naturalness and differences in speaking styles,

Alan W. Black; Nick Campbell

1995-01-01

319

Design and collection of Czech Lombard speech database  

Microsoft Academic Search

In this paper, the design, collection, and parameters of the newly proposed Czech Lombard Speech Database (CLSD) are presented. The database focuses on the analysis and modeling of the Lombard effect with the aim of improving robust speech recognition. The CLSD consists of neutral speech and speech produced in various types of simulated noisy background. In comparison to available databases dealing with the Lombard effect, an extensive

Hynek Boril; Petr Pollák

2005-01-01

320

ROBUST RECOGNITION OF SMALL VOCABULARY TELEPHONE - QUALITY SPEECH  

Microsoft Academic Search

Considerable progress has been made in the field of automatic speech recognition in recent years, especially for high-quality (full bandwidth and noise-free) speech. However, good recognition accuracy is difficult to achieve when the incoming speech is passed through a telephone channel. At the same time, the task of speech recognition over telephone lines is growing in importance, as the number

Dragos BURILEANU; Mihai SIMA; Cristian NEGRESCU; Victor CROITORU

2003-01-01

321

Reference-free automatic quality assessment of tracheoesophageal speech  

Microsoft Academic Search

Evaluation of the quality of tracheoesophageal (TE) speech using machines instead of human experts can enhance the voice rehabilitation process for patients who have undergone total laryngectomy and voice restoration. Towards the goal of devising a reference-free TE speech quality estimation algorithm, we investigate the efficacy of speech signal features that are used in standard telephone-speech quality assessment algorithms, in

Andy Huang; Tiago H. Falk; Wai-Yip Chan; Vijay Parsa; Philip Doyle

2009-01-01

322

An Improved Algorithm for the Automatic Segmentation of Speech Corpora  

Microsoft Academic Search

In this paper we describe an improved algorithm for the automatic segmentation of speech corpora. Apart from their usefulness in several speech technology domains, segmentations provide easy access to speech corpora by using time stamps to couple the orthographic transcription to the speech signal. The segmentation tool we propose is based on the Forward-Backward algorithm. The Forward-Backward method not

Tom Laureys; Kris Demuynck; Jacques Duchateau; Patrick Wambacq

323

Discriminative approach to dynamic variance adaptation for noisy speech recognition  

Microsoft Academic Search

The performance of automatic speech recognition suffers from severe degradation in the presence of noise or reverberation. One conventional approach for handling such acoustic distortions is to use a speech enhancement technique prior to recognition. However, most speech enhancement techniques introduce artifacts that create a mismatch between the enhanced speech features and the acoustic model used for recognition, therefore limiting

Marc Delcroix; Shinji Watanabe; Tomohiro Nakatani; Atsushi Nakamura

2011-01-01

324

A Comparison of LBG and ADPCM Speech Compression Techniques  

Microsoft Academic Search

Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. In all speech there is a degree of predictability and speech coding techniques exploit this to reduce bit rates yet still maintain a suitable level of quality. This paper is a study
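
For concreteness, here is a standard LBG (generalized Lloyd) codebook trainer of the kind such a comparison would rely on; the split perturbation and iteration counts are illustrative, and the codebook size is assumed to be a power of two:

```python
# Sketch of LBG codebook training for vector quantization: start from the
# global mean, repeatedly split each codeword, and refine with
# k-means-style (Lloyd) updates.
import numpy as np

def lbg(train, n_codewords=16, eps=0.01, iters=20):
    codebook = train.mean(axis=0, keepdims=True)
    while len(codebook) < n_codewords:
        # Split every codeword into a +/- perturbed pair.
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):                # Lloyd refinement
            d = ((train[:, None, :] - codebook[None]) ** 2).sum(-1)
            nearest = d.argmin(axis=1)
            for k in range(len(codebook)):
                members = train[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook
```

Encoding then replaces each vector by the index of its nearest codeword; the decoder substitutes the codeword back, producing the close approximation of the original signal that the abstract mentions.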

Rajesh G. Bachu; Jignasa Patel; Buket D. Barkana

2010-01-01

325

Contemporary Reflections on Speech-Based Language Learning  

ERIC Educational Resources Information Center

In "The Relation of Language to Mental Development and of Speech to Language Teaching," S.G. Davidson displayed several timeless insights into the role of speech in developing language and reasons for using speech as the basis for instruction for children who are deaf and hard of hearing. His understanding that speech includes more than merely…

Gustafson, Marianne

2009-01-01

326

Review: The speech corpus and database of Japanese dialects  

Microsoft Academic Search

The Speech Corpus and Database of Japanese Dialects (SCDJD) is a recording of readings of words, phrases, sentences, and texts in Japanese dialects. The focus of the speech material is on prosody, in particular on accentual variations, and to a lesser extent on intonation. In addition to the dialectal materials, SCDJD contains speech of the minority language Ainu, Japanese traditional singing, school children's speech, and speech by the foreign

Yasuko Nagano-madsen

327

Prosodic Features and Speech Naturalness in Individuals with Dysarthria  

ERIC Educational Resources Information Center

Despite the importance of speech naturalness to treatment outcomes, little research has been done on what constitutes speech naturalness and how to best maximize naturalness in relationship to other treatment goals like intelligibility. In addition, previous literature alludes to the relationship between prosodic aspects of speech and speech

Klopfenstein, Marie I.

2012-01-01

328

Experiment in Learning to Discriminate Frequency Transposed Speech.  

ERIC Educational Resources Information Center

In order to improve speech perception by transposing the speech signals to lower frequencies, to determine which aspects of the information in the acoustic speech signals were influenced by transposition, and to compare two different methods of training speech perception, 44 subjects were trained to discriminate between transposed words or…

Ahlstrom, K.G.; And Others

329

The Effectiveness of Clear Speech as a Masker  

ERIC Educational Resources Information Center

Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

2010-01-01

330

Analysis and classification of speech mode: whispered through shouted  

Microsoft Academic Search

Variation in vocal effort represents one of the most challenging problems in maintaining speech system performance for coding, speech recognition, and speaker recognition. Changes in vocal effort (or mode) result in a fundamental change in speech production which is not simply a change in volume. This is the first study to collectively consider the five speech modes: whispered, soft, neutral, loud, and shouted.

Chi Zhang; John H. L. Hansen

2007-01-01

331

Computational Differences between Whispered and Non-Whispered Speech  

ERIC Educational Resources Information Center

Whispering is a common type of speech which is not often studied in speech technology. Perceptual and physiological studies show us that whispered speech is subtly different from phonated speech, and is surprisingly able to carry a tremendous amount of information. In this dissertation we consider the question: What makes whispering a good form of…

Lim, Boon Pang

2011-01-01

332

Compressed Speech Technology: Implications for Learning and Instruction.  

ERIC Educational Resources Information Center

This paper first traces the historical development of speech compression technology, which has made it possible to alter the spoken rate of a pre-recorded message without excessive distortion. Terms used to describe techniques employed as the technology evolved are discussed, including rapid speech, rate altered speech, cut-and-spliced speech, and…

Sullivan, LeRoy L.

333

Tracking Change in Children with Severe and Persisting Speech Difficulties  

ERIC Educational Resources Information Center

Standardised tests of whole-word accuracy are popular in the speech pathology and developmental psychology literature as measures of children's speech performance. However, they may not be sensitive enough to measure changes in speech output in children with severe and persisting speech difficulties (SPSD). To identify the best ways of doing this,…

Newbold, Elisabeth Joy; Stackhouse, Joy; Wells, Bill

2013-01-01

334

Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications  

ERIC Educational Resources Information Center

Background: Speech perception is often considered specific to the auditory modality, despite convincing evidence that speech processing is bimodal. The theoretical and clinical roles of speech-reading for speech perception, however, have received little attention in speech-language therapy. Aims: The role of speech-read information for speech

Woodhouse, Lynn; Hickson, Louise; Dodd, Barbara

2009-01-01

335

Free Speech Movement Digital Archive  

NSDL National Science Digital Library

The Free Speech Movement that began on the Berkeley campus of the University of California in 1964 sparked a groundswell of student protests and campus-based social activism that would later spread across the United States for the remainder of the decade. With a substantial gift from Stephen M. Silberstein in the late 1990s, the University of California Berkeley Library began an ambitious program to document the role of those students and other participants who gave a coherent and organized voice to the Free Speech Movement. The primary documents provided here are quite extensive and include transcriptions of legal defense documents, leaflets passed out by members of the movement, letters from administrators and faculty members regarding the movement and student unrest, and oral histories. The site also provides a detailed bibliography of material dealing with the movement and a chronology of key events within its early history. Perhaps the most engaging part of the site is the Social Activism Sound Recording Project, which features numerous audio clips of faculty and academic senate debates, student protests, and discussions that were recorded during this period.

1998-01-01

336

Speech and hearing acoustics at Bell Labs  

NASA Astrophysics Data System (ADS)

A. G. Bell's interest in basic research of speech and hearing was one of the keys to the Bell Lab culture. When the first network circuits were built, speech quality was very low. Research was needed on speech articulation (the probability correct for nonsense speech sounds). George Campbell, a mathematician and ultimate engineer, and expert on Heaviside, extended work of Lord Rayleigh. In 1910 Campbell was the first to generate consonant identification confusion matrices, and show sound grouping (features). Crandall took up this work and attempted (but failed) to define the articulation density over frequency. By 1921 Fletcher had solved Crandall's problem, with the the Articulation Index theory, based on the idea of independent feature perception, across frequency and time. In 1929 he wrote his first book, Speech and Hearing, which sold over 5000 copies. His second book, Speech and Hearing in Communications, was first released in 1953, after his retirement. Other key people that worked closely with Fletcher were J. C. Steinberg, Munson, French, Galt, Hartley, Kingsbury, Nyquist, Sivian, White, and Wegel. I will try to introduce each of these people and describe their contributions to the speech and hearing field.

Allen, Jont

2001-05-01

337

SYNTHETIC VISUAL SPEECH DRIVEN FROM AUDITORY SPEECH Eva Agelfors, Jonas Beskow, Bjrn Granstrm, Magnus Lundeberg, Giampiero Salvi,  

E-print Network

Hidden Markov models (HMMs) were trained on a phonetically transcribed telephone speech database. The output of the HMMs, as well as phoneme strings from a database, served as input to the visual speech synthesis. The two methods were evaluated

Beskow, Jonas

338

Empathy, Ways of Knowing, and Interdependence as Mediators of Gender Differences in Attitudes toward Hate Speech and Freedom of Speech  

ERIC Educational Resources Information Center

Women are more intolerant of hate speech than men. This study examined relationality measures as mediators of gender differences in the perception of the harm of hate speech and the importance of freedom of speech. Participants were 107 male and 123 female college students. Questionnaires assessed the perceived harm of hate speech, the importance…

Cowan, Gloria; Khatchadourian, Desiree

2003-01-01

339

Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

NASA Astrophysics Data System (ADS)

It is a tedious and subjective task to measure the severity of dysarthria by manually evaluating a speaker's speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a certain word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the speech recognition rate for an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with ones predicted by the standard methods, called the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
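
The abstract defines the index informally through consistency and distinction but gives no formula. The sketch below assumes one plausible formalization, the ratio of mean between-word distance to mean within-word distance over feature sequences under a plain DTW distance, purely as an illustration rather than the authors' definition:

```python
# Hypothetical clarity measure: repetitions of the same word should be
# close (consistency), different words far apart (distinction).
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between (frames x dims) sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def clarity_index(utterances):
    """`utterances` maps each word to a list of feature sequences
    (at least two repetitions per word). Higher is better."""
    words = list(utterances)
    within = [dtw(u, v) for w in words
              for i, u in enumerate(utterances[w])
              for v in utterances[w][i + 1:]]
    between = [dtw(u, v) for i, w1 in enumerate(words) for w2 in words[i + 1:]
               for u in utterances[w1] for v in utterances[w2]]
    return np.mean(between) / (np.mean(within) + 1e-9)
```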

Kayasith, Prakasith; Theeramunkong, Thanaruk

340

The sensorimotor and social sides of the architecture of speech.  

PubMed

Speech is a complex skill to master. In addition to sophisticated phono-articulatory abilities, speech acquisition requires neuronal systems configured for vocal learning, with adaptable sensorimotor maps that couple heard speech sounds with motor programs for speech production; imitation and self-imitation mechanisms that can train the sensorimotor maps to reproduce heard speech sounds; and a "pedagogical" learning environment that supports tutor learning. PMID:25514959

Pezzulo, Giovanni; Barca, Laura; D'Ausilio, Alessando

2014-12-01

341

Modulation Domain Improved Adaptive Gain Equalizer for Single Channel Speech Enhancement.  

E-print Network

Human speech is the main method for personal communication. During communication, speech may become impaired by ubiquitous noise. Enduring interfering noise decreases speech intelligibility and…

Shaheen, Shakira

2013-01-01

342

Federal Reserve Board: Speeches and Testimonies  

NSDL National Science Digital Library

Some feel that every time US Federal Reserve Board Chairman Allan Greenspan speaks, the US stock market shudders. He gave testimony before the Senate Banking Committee on February 26, 1997 and the Dow Jones industrial average plunged over 55 points that day (after a rebound from a 122 point loss). You can read the chairman's testimony and his recent speeches at the Federal Reserve Board site (the speeches and testimony of other officials are also available). Read the speeches and testimony, watch the market, and judge for yourself the power of one man in the US economy.

343

The Asian network-based speech-to-speech translation system  

Microsoft Academic Search

This paper outlines the first Asian network-based speech-to-speech translation system developed by the Asian Speech Translation Advanced Research (A-STAR) consortium. The system was designed to translate common spoken utterances of travel conversations from a certain source language into multiple target languages in order to facilitate multiparty travel conversations between people speaking different Asian languages. Each A-STAR member contributes one or

Sakriani Sakti; Noriyuki Kimura; Michael Paul; Chiori Hori; Eiichiro Sumita; Satoshi Nakamura; Jun Park; Chai Wutiwiwatchai; Bo Xu; Hammam Riza; Karunesh Arora; Chi Mai Luong; Haizhou Li

2009-01-01

344

Children use visual speech to compensate for non-intact auditory speech.  

PubMed

We investigated whether visual speech fills in non-intact auditory speech (excised consonant onsets) in typically developing children from 4 to 14 years of age. Stimuli with the excised auditory onsets were presented in the audiovisual (AV) and auditory-only (AO) modes. A visual speech fill-in effect occurs when listeners experience hearing the same non-intact auditory stimulus (e.g., /-b/ag) as different depending on the presence/absence of visual speech such as hearing /bag/ in the AV mode but hearing /ag/ in the AO mode. We quantified the visual speech fill-in effect by the difference in the number of correct consonant onset responses between the modes. We found that easy visual speech cues /b/ provided greater filling in than difficult cues /g/. Only older children benefited from difficult visual speech cues, whereas all children benefited from easy visual speech cues, although 4- and 5-year-olds did not benefit as much as older children. To explore task demands, we compared results on our new task with those on the McGurk task. The influence of visual speech was uniquely associated with age and vocabulary abilities for the visual speech fill-in effect but was uniquely associated with speechreading skills for the McGurk effect. This dissociation implies that visual speech, as processed by children, is a complicated and multifaceted phenomenon underpinned by heterogeneous abilities. These results emphasize that children perceive a speaker's utterance rather than the auditory stimulus per se. In children, as in adults, there is more to speech perception than meets the ear. PMID:24974346

Jerger, Susan; Damian, Markus F; Tye-Murray, Nancy; Abdi, Hervé

2014-10-01

345

Towards every-citizen's speech interface: an application generator for speech interfaces to databases

Microsoft Academic Search

One of the acknowledged impediments to the widespread use of speech interfaces is the portability problem, namely the considerable amount of labor, data and expertise needed to develop such interfaces in new domains. Under the Universal Speech Interface (USI) project, we have designed unified look-and-feel speech interfaces that employ semi-structured interaction and thus obviate the need for data

Arthur R. Toth; Thomas K. Harris; James Sanders; Stefanie Shriver; Roni Rosenfeld

2002-01-01

346

SpeechDat-Car: Towards a collection of speech databases for automotive environments  

Microsoft Academic Search

The SpeechDat-Car project is a 4th Framework EC project in the Language Engineering programme. It aims at collecting a set of nine speech databases to support training and testing of robust multilingual speech recognition for in-car applications. The consortium participants are car manufacturers, telephone communications providers, and universities. This paper describes the background of the project, its organisation,

Henk van den Heuvel; Antonio Bonafonte; Jerome Boudy; S. Dufour; Ph. Lockwood; A. Moreno; G. Richard

1999-01-01

347

Acquisition of Ultrasound, Video and Acoustic Speech Data for a Silent-Speech Interface Application  

Microsoft Academic Search

This article addresses synchronous acquisition of high-speed multimodal speech data, composed of ultrasound and optical images of the vocal tract together with the acoustic speech signal, for a silent speech interface. Built around a laptop-based portable ultrasound machine (Terason T3000) and an industrial camera, an acquisition setup is described together with its acquisition software called Ultraspeech. The system is currently

T. Hueber; G. Chollet; B. Denby; M. Stone

2008-01-01

348

Real-time lexical competitions during speech-in-speech comprehension  

Microsoft Academic Search

This study aimed at characterizing the cognitive processes that come into play during speech-in-speech comprehension by examining lexical competitions between target speech and concurrent multi-talker babble. We investigated the effects of number of simultaneous talkers (2, 4, 6 or 8) and of the token frequency of the words that compose the babble (high or low) on lexical decision to target

Véronique Boulenger; Michel Hoen; Emmanuel Ferragne; François Pellegrino; Fanny Meunier

2010-01-01

349

Acoustic differences among casual, conversational, and read speech  

NASA Astrophysics Data System (ADS)

Speech is a complex behavior that allows speakers to use many variations to satisfy the demands connected with multiple speaking environments. Speech research typically obtains speech samples in a controlled laboratory setting using read material, yet anecdotal observations of such speech, particularly from talkers with a speech and language impairment, have identified a "performance" effect in the produced speech which masks the characteristics of impaired speech outside of the lab (Goberman, Recker, & Parveen, 2010). The aim of the current study was to investigate acoustic differences among laboratory read, laboratory conversational, and casual speech through well-defined speech tasks in the laboratory and in talkers' natural environments. Eleven healthy research participants performed lab recording tasks (19 read sentences and a dialogue about their life) and collected natural-environment recordings of themselves over 3-day periods using portable recorders. Segments were analyzed for articulatory, voice, and prosodic acoustic characteristics using computer software and hand counting. The current study results indicate that lab-read speech was significantly different from casual speech: greater articulation range, improved voice quality measures, lower speech rate, and lower mean pitch. One implication of the results is that different laboratory techniques may be beneficial in obtaining speech samples that are more like casual speech, thus making it easier to correctly analyze abnormal speech characteristics with fewer errors.

Pinnow, DeAnna
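
Record 349 compares articulatory, voice, and prosodic measures across recording conditions. As a concrete illustration, the sketch below estimates mean pitch with a plain frame-wise autocorrelation tracker; it is a minimal stand-in for the study's analysis software, and the frame size, search range, and voicing threshold are illustrative assumptions.

```python
# Hypothetical sketch: estimating mean pitch from an audio signal with a
# frame-wise autocorrelation tracker, one of the prosodic measures the
# abstract compares across read, conversational, and casual speech.
import numpy as np

def mean_pitch_autocorr(x, sr, fmin=75.0, fmax=400.0, frame=0.04, hop=0.01):
    """Average autocorrelation pitch estimate over crudely voiced frames."""
    n, h = int(frame * sr), int(hop * sr)
    lo, hi = int(sr / fmax), int(sr / fmin)
    pitches = []
    for start in range(0, len(x) - n, h):
        seg = x[start:start + n] * np.hanning(n)
        ac = np.correlate(seg, seg, mode="full")[n - 1:]
        if ac[0] <= 0:
            continue
        lag = lo + np.argmax(ac[lo:hi])
        if ac[lag] / ac[0] > 0.3:            # crude voicing threshold
            pitches.append(sr / lag)
    return float(np.mean(pitches)) if pitches else float("nan")

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 120 * t)              # toy 120 Hz "voice"
print(f"estimated mean pitch: {mean_pitch_autocorr(x, sr):.1f} Hz")
```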

350

Converging toward a common speech code: imitative and perceptuo-motor recalibration processes in speech production  

PubMed Central

Auditory and somatosensory systems play a key role in speech motor control. In the act of speaking, segmental speech movements are programmed to reach phonemic sensory goals, which in turn are used to estimate actual sensory feedback in order to further control production. The adult's tendency to automatically imitate a number of acoustic-phonetic characteristics in another speaker's speech however suggests that speech production not only relies on the intended phonemic sensory goals and actual sensory feedback but also on the processing of external speech inputs. These online adaptive changes in speech production, or phonetic convergence effects, are thought to facilitate conversational exchange by contributing to setting a common perceptuo-motor ground between the speaker and the listener. In line with previous studies on phonetic convergence, we here demonstrate, in a non-interactive situation of communication, online unintentional and voluntary imitative changes in relevant acoustic features of acoustic vowel targets (fundamental and first formant frequencies) during speech production and imitation. In addition, perceptuo-motor recalibration processes, or after-effects, occurred not only after vowel production and imitation but also after auditory categorization of the acoustic vowel targets. Altogether, these findings demonstrate adaptive plasticity of phonemic sensory-motor goals and suggest that, apart from sensory-motor knowledge, speech production continuously draws on perceptual learning from the external speech environment. PMID:23874316

Sato, Marc; Grabski, Krystyna; Garnier, Maëva; Granjon, Lionel; Schwartz, Jean-Luc; Nguyen, Noël

2013-01-01

351

How visual timing and form information affect speech and non-speech processing.  

PubMed

Auditory speech processing is facilitated when the talker's face/head movements are seen. This effect is typically explained in terms of visual speech providing form and/or timing information. We determined the effect of both types of information on a speech/non-speech task (non-speech stimuli were spectrally rotated speech). All stimuli were presented paired with the talker's static or moving face. Two types of moving face stimuli were used: full-face versions (both spoken form and timing information available) and modified face versions (only timing information provided by peri-oral motion available). The results showed that the peri-oral timing information facilitated response time for speech and non-speech stimuli compared to a static face. An additional facilitatory effect was found for full-face versions compared to the timing condition; this effect only occurred for speech stimuli. We propose the timing effect was due to cross-modal phase resetting; the form effect to cross-modal priming. PMID:25190328

Kim, Jeesun; Davis, Chris

2014-10-01

352

Speech information retrieval: a review  

SciTech Connect

Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio fingerprinting. The information that can be extracted from tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough background for someone new in the field to quickly gain high-level understanding and to provide direction for further study.

Hafen, Ryan P.; Henry, Michael J.

2012-11-01

353

Electrophysiological evidence for speech-specific audiovisual integration.  

PubMed

Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. PMID:24291340

Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

2014-01-01

354

Perceptual Evaluation of Video-Realistic Speech  

E-print Network

With many visual speech animation techniques now available, there is a clear need for systematic perceptual evaluation schemes. We describe here our scheme and its application to a new video-realistic ...

Geiger, Gadi

2003-02-28

355

Photo annotation and retrieval through speech  

E-print Network

In this thesis I describe the development of a speech-based annotation and retrieval system for digital photographs. The system uses a client/server architecture which allows photographs to be captured and annotated on ...

Sherry, Brennan P

2007-01-01

356

Visualizations: Speech, Language & Autistic Spectrum Disorder  

E-print Network

children, including those with Autistic Spectrum Disorder (ASD), have explicit difficulty developing speech. General terms: Experimentation, Human Factors. Keywords: Accessibility, Visualization, Autism, Children, Speech, Vocalization. Children with ASD are not a homogeneous group. As a "spectrum" disorder, individuals who are considered low

Karahalios, Karrie G.

357

Making speech recognition work on the web  

E-print Network

We present an improved Audio Controller for Web-Accessible Multimodal Interface toolkit -- a system that provides a simple way for developers to add speech recognition to web pages. Our improved system offers increased ...

Varenhorst, Christopher J

2011-01-01

358

commanimation: Creating and managing animations via speech  

E-print Network

A speech-controlled animation system is both a useful application program and a laboratory in which to investigate context-aware applications and error control. The user need not have prior knowledge or ...

Kim, Hana

359

Automated Speech Recognition in air traffic control  

E-print Network

Over the past few years, the technology and performance of Automated Speech Recognition (ASR) systems has been improving steadily. This has resulted in their successful use in a number of industrial applications. Motivated ...

Trikas, Thanassis

1987-01-01

360

Speech therapy and voice recognition instrument  

NASA Technical Reports Server (NTRS)

Characteristics of an electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determining voice patterns and pitch changes are described. Operation of the circuit is discussed, and a circuit diagram is provided.

Cohen, J.; Babcock, M. L.

1972-01-01

361

Pronunciation learning for automatic speech recognition  

E-print Network

In many ways, the lexicon remains the Achilles heel of modern automatic speech recognizers (ASRs). Unlike stochastic acoustic and language models that learn the values of their parameters from training data, the baseform ...

Badr, Ibrahim

2011-01-01

362

Dialog act modelling for conversational speech   

E-print Network

We describe an integrated approach for statistical modeling of discourse structure for natural conversational speech. Our model is based on 42 "dialog acts" (e.g., Statement, Question, Backchannel, Agreement, Disagreement, ...

Stolcke, Andreas; Shriberg, Elizabeth; Bates, Rebecca; Coccaro, Noah; Jurafsky, Daniel; Martin, Rachel; Meteer, Marie; Ries, Klaus; Taylor, Paul; Van Ess-Dykema, Carol

1998-01-01

363

CHATR: A generic speech synthesis system   

E-print Network

This paper describes a generic speech synthesis system called CHATR which is being developed at ATR. CHATR is designed in a modular way so that module parameters and even which modules are actually used may be set and ...

Black, Alan W; Taylor, Paul A

1994-01-01

364

Speech recognition via phonetically-featured syllables   

E-print Network

We describe recent work on two new automatic speech recognition systems. The first part of this paper describes the components of a system based on phonological features (which we call Espresso-P) in which the values of ...

King, Simon; Taylor, Paul; Frankel, Joe; Richmond, Korin

2000-01-01

365

Integrating Speech Recognition and Machine Translation  

Microsoft Academic Search

This paper presents a set of experiments that we conducted in order to optimize the performance of an Arabic\\/English machine translation system on broadcast news and conversational speech data. Proper integration of speech-to-text (STT) and machine translation (MT) requires special attention to issues such as sentence boundary detection, punctuation, STT accuracy, tokenization, conversion of spoken numbers and dates to written

Spyros Matsoukas; Ivan Bulyko; Bing Xiang; Kham Nguyen; Richard Schwartz; John Makhoul

2007-01-01

366

Interactive Speech Translation in the Diplomat Project  

Microsoft Academic Search

The Diplomat rapid-deployment speech-translation system is intended to allow naïve users to communicate across a language barrier, without strong domain restrictions, despite the error-prone nature of current speech and translation technologies. In addition, it should be deployable for new languages an order of magnitude more quickly than traditional technologies. Achieving this ambitious set of goals depends in large part on allowing the users to correct

Robert E. Frederking; Alexander I. Rudnicky; Christopher Hogan; Kevin Lenzo

2000-01-01

367

Speech and neurology-chemical impairment correlates  

Microsoft Academic Search

Speech correlates of alcohol/drug impairment and their neurological basis are presented, with suggestions for further research on impairment from poly drug/medicine/inhalant/chew use/abuse and on prediagnosis of many neuro- and endocrine-related disorders. Nerve cells all over the body detect chemical entry by smoking, injection, drinking, chewing, or skin absorption, and transmit neurosignals to their corresponding cerebral subsystems, which in turn affect speech

Harb S. Hayre

2002-01-01

368

Signal modeling techniques in speech recognition  

Microsoft Academic Search

A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used. The four basic operations of signal modeling, i.e. spectral shaping, spectral analysis, parametric transformation, and statistical modeling, are discussed. Three important trends that have developed in the last five years in speech recognition are examined. First, heterogeneous parameter sets that mix absolute

JOSEPH W. PICONE; Texas Instruments

1993-01-01
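
The tutorial's four basic operations map naturally onto a mel-cepstral front end. The sketch below illustrates the first three (spectral shaping via pre-emphasis, spectral analysis via a windowed FFT, and parametric transformation via a log mel filterbank plus DCT); the filterbank sizes and constants are generic textbook choices, not values from the paper.

```python
# Hedged sketch of a classic signal-modeling front end. Statistical
# modeling, the fourth operation, would consume the resulting frames.
import numpy as np

def mel(f):  return 2595.0 * np.log10(1.0 + f / 700.0)
def imel(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_like(x, sr, n_fft=512, hop=160, n_filt=24, n_ceps=13):
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])             # spectral shaping
    win = np.hamming(n_fft)
    spec = np.array([np.abs(np.fft.rfft(x[s:s + n_fft] * win)) ** 2
                     for s in range(0, len(x) - n_fft, hop)])  # spectral analysis
    edges = imel(np.linspace(mel(0), mel(sr / 2), n_filt + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_filt, n_fft // 2 + 1))
    for i in range(n_filt):                                # triangular filters
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logE = np.log(spec @ fbank.T + 1e-10)
    # DCT-II of the log filterbank energies (parametric transformation)
    n = np.arange(n_filt)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filt)))
    return logE @ dct.T

feats = mfcc_like(np.random.randn(16000), 16000)
print(feats.shape)   # (frames, 13)
```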

369

CONCATENATIVE SPEECH SYNTHESIS FOR EUROPEAN PORTUGUESE  

Microsoft Academic Search

This paper describes our on-going work in the area of text-to-speech synthesis, specifically on concatenative techniques. Our preliminary work consisted in investigating the current trends in concatenative synthesis and the problems that could arise when we apply the existing state-of-the-art solutions to the specific case of European Portuguese. Our ultimate goal is to develop a text-to-speech system that

Pedro M. Carvalho; Luís C. Oliveira; Isabel M. Trancoso; M. Céu Viana

1998-01-01

370

A speech recognizer with selectable model parameters  

Microsoft Academic Search

This paper presents the design and simulation results of a hidden Markov model (HMM) based isolated word recognizer IC. The new design can handle any combination of states and mixtures (up to 16 states and 8 mixtures). The speech IC has been verified with 353 test speech samples. The recognition accuracy is 93.8% (48-bit) with no truncation and 88.9% with

Wei Han; Cheong-fat Chan; Chiu-sing Choy; Kong-pang Pun

2005-01-01
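
Behind such a recognizer IC is standard HMM scoring. Below is a log-domain Viterbi sketch for an isolated-word, left-to-right HMM; the state count, transition values, and random frame scores are placeholders echoing the selectable-parameter idea, not the chip's actual design.

```python
# Illustrative log-domain Viterbi scoring for an isolated-word HMM.
import numpy as np

def viterbi_score(log_b, log_A, log_pi):
    """log_b: (T, S) frame log-likelihoods; returns best-path log-prob."""
    T, S = log_b.shape
    delta = log_pi + log_b[0]
    for t in range(1, T):
        delta = np.max(delta[:, None] + log_A, axis=0) + log_b[t]
    return float(np.max(delta))

S, T = 5, 40
rng = np.random.default_rng(0)
# left-to-right topology: self-loop 0.6, forward transition 0.4
log_A = np.log(np.eye(S) * 0.6 + np.eye(S, k=1) * 0.4 + 1e-12)
log_pi = np.log(np.r_[1.0, np.zeros(S - 1)] + 1e-12)
log_b = rng.normal(size=(T, S))          # stand-in for GMM frame scores
print(viterbi_score(log_b, log_A, log_pi))
```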

371

Speech earthquakes: scaling and universality in human voice  

E-print Network

Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech the energy is unevenly released and power-law distributed, reporting a universal robust Gutenberg-Richter-like law in speech. We further show that such earthquakes in speech show temporal correlations, as the interevent statistics are again power-law distributed. Since this feature takes place in the intra-phoneme range, we conjecture that the mechanism responsible for this complex phenomenon is not cognitive but resides in the physiological speech production mechanism. Moreover, we show that these waiting time distributions are scale invariant under a renormalisation group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are put in contrast with current paradigms in speech ...

Luque, Jordi; Lacasa, Lucas

2014-01-01
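
The kind of analysis the abstract describes can be approximated in a few lines: threshold the framed signal energy into discrete release events, then fit a power-law exponent to the event sizes with the maximum-likelihood (Hill) estimator. The event definition, threshold, and toy heavy-tailed signal below are assumptions for illustration, not the authors' pipeline.

```python
# Sketch of a "speech earthquakes" style analysis: energy-release events
# and a power-law exponent fit via the Hill (maximum-likelihood) estimator.
import numpy as np

def event_energies(x, sr, frame=0.005, thresh_db=-30.0):
    """Sum frame energies over contiguous supra-threshold runs ("events")."""
    n = int(frame * sr)
    e = np.array([np.sum(x[i:i + n] ** 2) for i in range(0, len(x) - n, n)])
    on = e > e.max() * 10 ** (thresh_db / 10)
    events, acc = [], 0.0
    for active, ei in zip(on, e):
        if active:
            acc += ei
        elif acc > 0:
            events.append(acc); acc = 0.0
    return np.array(events + ([acc] if acc > 0 else []))

def hill_exponent(sizes, s_min):
    s = sizes[sizes >= s_min]
    return 1.0 + len(s) / np.sum(np.log(s / s_min))

rng = np.random.default_rng(1)
x = rng.normal(size=16000 * 10) * rng.pareto(2.0, 16000 * 10)  # heavy-tailed toy
ev = event_energies(x, 16000)
print("events:", len(ev), "alpha:", hill_exponent(ev, np.median(ev)))
```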

372

Utility of TMS to understand the neurobiology of speech  

PubMed Central

According to a traditional view, speech perception and production are processed largely separately in sensory and motor brain areas. Recent psycholinguistic and neuroimaging studies provide novel evidence that the sensory and motor systems dynamically interact in speech processing, by demonstrating that speech perception and imitation share regional brain activations. However, the exact nature and mechanisms of these sensorimotor interactions are not completely understood yet. Transcranial magnetic stimulation (TMS) has often been used in the cognitive neurosciences, including speech research, as a complementary technique to behavioral and neuroimaging studies. Here we provide an up-to-date review focusing on TMS studies that explored speech perception and imitation. Single-pulse TMS of the primary motor cortex (M1) demonstrated a speech specific and somatotopically specific increase of excitability of the M1 lip area during speech perception (listening to speech or lip reading). A paired-coil TMS approach showed increases in effective connectivity from brain regions that are involved in speech processing to the M1 lip area when listening to speech. TMS in virtual lesion mode applied to speech processing areas modulated performance of phonological recognition and imitation of perceived speech. In summary, TMS is an innovative tool to investigate processing of speech perception and imitation. TMS studies have provided strong evidence that the sensory system is critically involved in mapping sensory input onto motor output and that the motor system plays an important role in speech perception. PMID:23874322

Murakami, Takenobu; Ugawa, Yoshikazu; Ziemann, Ulf

2013-01-01

373

Emil Kraepelin's dream speech: a psychoanalytic interpretation.  

PubMed

Freud's contemporary fellow psychiatrist Emil Kraepelin collected over the course of several decades some 700 specimens of speech in dreams, mostly his own, along with various concomitant data. These generally exhibit far more obvious primary-process influence than do the dream speech specimens found in Freud's corpus; but Kraepelin eschewed any depth-psychology interpretation. In this paper the authors first explore the respective orientations of Freud and Kraepelin to mind and brain, and normal and pathological phenomena, particularly as these relate to speech and dreaming. They then proceed, with the help of biographical sources, to analyze a selection of Kraepelin's deviant dream speech in the manner that was pioneered by Freud, most notably in his 'Autodidasker' dream. They find that Kraepelin's particular concern with the preservation of his rather uncommon family name--and with the preservation of his medical nomenclature, which lent prestige to that name--appears to provide a key link in a chain of associations for elucidating his dream speech specimens. They further suggest, more generally, that one's proper name, as a minimal characteristic of the ego during sleep, may prove to be a key in interpreting the dream speech of others as well. PMID:14633430

Engels, Huub; Heynick, Frank; van der Staak, Cees

2003-10-01

374

A comprehensive vowel space for whispered speech.  

PubMed

Whispered speech is a relatively common form of communications, used primarily to selectively exclude or include potential listeners from hearing a spoken message. Despite the everyday nature of whispering, and its undoubted usefulness in vocal communications, whispers have received relatively little research effort to date, apart from some studies analyzing the main whispered vowels and some quite general estimations of whispered speech characteristics. In particular, a classic vowel space determination has been lacking for whispers. For voiced speech, this type of information has played an important role in the development and testing of recognition and processing theories over the past few decades and can be expected to be equally useful for whisper-mode communications and recognition systems. This article aims to redress the shortfall by presenting a vowel formant space for whispered speech and comparing the results with corresponding phonated samples. In addition, because the study was conducted using speakers from Birmingham, the analysis extends to discuss the effect of the common British West Midlands accent in comparison with Standard English (Received Pronunciation). Thus, the article presents the analysis of formant data showing differences between normal and whispered speech while also considering an accentual effect on whispered speech. PMID:21550772

Sharifzadeh, Hamid Reza; McLoughlin, Ian V; Russell, Martin J

2012-03-01
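
Vowel spaces such as the one this record derives are conventionally built from F1/F2 estimates. The sketch below uses the textbook LPC root-finding method (Levinson-Durbin recursion, then polynomial roots filtered by bandwidth); this is a generic approach, not necessarily the paper's procedure, and the order and thresholds are illustrative.

```python
# Minimal LPC root-finding formant estimator of the kind typically used
# to build F1/F2 vowel spaces.
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via Levinson-Durbin."""
    r = np.correlate(x, x, "full")[len(x) - 1:len(x) + order]
    a, err = np.array([1.0]), r[0]
    for i in range(1, order + 1):
        k = -(r[i] + a[1:] @ r[1:i][::-1]) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= (1.0 - k * k)
    return a

def formants(x, sr, order=12):
    roots = np.roots(lpc(x * np.hamming(len(x)), order))
    roots = roots[np.imag(roots) > 0]             # keep one of each pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * sr / np.pi     # pole bandwidths (Hz)
    return np.sort(freqs[(freqs > 90) & (bws < 400)])

sr = 8000
t = np.arange(int(0.03 * sr)) / sr
# toy "vowel": two damped resonances near 700 and 1200 Hz
x = np.exp(-50 * t) * np.sin(2 * np.pi * 700 * t) \
    + np.exp(-60 * t) * np.sin(2 * np.pi * 1200 * t)
print(formants(x, sr)[:2])   # rough F1, F2 estimates
```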

375

Effects of human fatigue on speech signals  

NASA Astrophysics Data System (ADS)

Cognitive performance may be significantly affected by fatigue. In the case of critical personnel, such as pilots, monitoring human fatigue is essential to ensure safety and success of a given operation. One of the modalities that may be used for this purpose is speech, which is sensitive to respiratory changes and increased muscle tension of vocal cords, induced by fatigue. Age, gender, vocal tract length, physical and emotional state may significantly alter speech intensity, duration, rhythm, and spectral characteristics. In addition to changes in speech rhythm, fatigue may also affect the quality of speech, such as articulation. In a noisy environment, detecting fatigue-related changes in speech signals, particularly subtle changes at the onset of fatigue, may be difficult. Therefore, in a performance-monitoring system, speech parameters which are significantly affected by fatigue need to be identified and extracted from input signals. For this purpose, a series of experiments was performed under slowly varying cognitive load conditions and at different times of the day. The results of the data analysis are presented here.

Stamoulis, Catherine

2001-05-01

376

Development of the Cantonese speech intelligibility index.  

PubMed

A Speech Intelligibility Index (SII) for the sentences in the Cantonese version of the Hearing In Noise Test (CHINT) was derived using conventional procedures described previously in studies such as Studebaker and Sherbecoe [J. Speech Hear. Res. 34, 427-438 (1991)]. Two studies were conducted to determine the signal-to-noise ratios and high- and low-pass filtering conditions that should be used and to measure speech intelligibility in these conditions. Normal hearing subjects listened to the sentences presented in speech-spectrum shaped noise. Compared to other English speech assessment materials such as the English Hearing In Noise Test [Nilsson et al., J. Acoust. Soc. Am. 95, 1085-1099 (1994)], the frequency importance function of the CHINT suggests that low-frequency information is more important for Cantonese speech understanding. The difference in frequency importance weight in Chinese, compared to English, was attributed to the redundancy of test material, tonal nature of the Cantonese language, or a combination of these factors. PMID:17471747

Wong, Lena L N; Ho, Amy H S; Chua, Elizabeth W W; Soli, Sigfrid D

2007-04-01
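
At its core, an SII is an importance-weighted sum of per-band audibility, SII = sum_i I_i * A_i, with audibility clipped to [0, 1] from band SNR. The sketch below shows how a low-frequency-heavy importance function (the abstract's finding for Cantonese) changes the index relative to flat weights; the band SNRs and weights are invented placeholders, not the CHINT importance function.

```python
# Hedged sketch of the core SII computation (ANSI S3.5-style audibility
# mapping from band SNR); weights below are illustrative only.
import numpy as np

def sii(band_snr_db, importance):
    """band_snr_db: per-band speech-to-noise ratio (dB); importance sums to 1."""
    audibility = np.clip((np.asarray(band_snr_db) + 15.0) / 30.0, 0.0, 1.0)
    return float(np.asarray(importance) @ audibility)

snr = [12, 6, 3, -3, -9, -15]            # six octave bands, 250 Hz - 8 kHz
w_flat = np.ones(6) / 6
w_low = np.array([0.30, 0.25, 0.20, 0.12, 0.08, 0.05])  # low-frequency-heavy
print(sii(snr, w_flat), sii(snr, w_low))  # low-heavy weights raise the index here
```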

377

42 CFR 485.715 - Condition of participation: Speech pathology services.  

...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

2014-10-01

378

Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation  

E-print Network

This study attempted to isolate the effects of energetic masking in speech-on-speech masking, using ideal time-frequency segregation and acoustically similar masker sounds. The results suggest that energetic masking plays a relatively small role in the overall masking that occurs when

Wang, DeLiang "Leon"
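
Ideal time-frequency segregation is typically realized as an ideal binary mask: with the target and masker known separately, keep each spectrogram cell where the local SNR exceeds a criterion and zero the rest. A minimal STFT-based sketch follows; the window parameters and the 0 dB criterion are illustrative assumptions.

```python
# Sketch of ideal time-frequency segregation via an ideal binary mask.
import numpy as np

def stft(x, n=256, hop=128):
    w = np.hanning(n)
    return np.array([np.fft.rfft(x[i:i + n] * w)
                     for i in range(0, len(x) - n, hop)])

def ideal_binary_mask(target, masker, lc_db=0.0):
    T, M = stft(target), stft(masker)
    snr = 20 * np.log10(np.abs(T) + 1e-12) - 20 * np.log10(np.abs(M) + 1e-12)
    mask = snr > lc_db                       # keep target-dominated cells
    return mask * stft(target + masker)      # masked mixture spectrogram

rng = np.random.default_rng(2)
tgt, msk = rng.normal(size=8000), rng.normal(size=8000)
print(ideal_binary_mask(tgt, msk).shape)
```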

379

Modeling Speech Disfluency to Predict Conceptual Misalignment in Speech Survey Interfaces  

ERIC Educational Resources Information Center

Computer-based interviewing systems could use models of respondent disfluency behaviors to predict a need for clarification of terms in survey questions. This study compares simulated speech interfaces that use two such models--a generic model and a stereotyped model that distinguishes between the speech of younger and older speakers--to several…

Ehlen, Patrick; Schober, Michael F.; Conrad, Frederick G.

2007-01-01

380

Spotlight on Speech Codes 2011: The State of Free Speech on Our Nation's Campuses  

ERIC Educational Resources Information Center

Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and accompanying report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…

Foundation for Individual Rights in Education (NJ1), 2011

2011-01-01

381

TONGUES: RAPID DEVELOPMENT OF A SPEECH-TO-SPEECH TRANSLATION SYSTEM  

Microsoft Academic Search

We carried out a one-year project to build a portable speech-to-speech translation system in a new language that could run on a small portable computer. Croatian was chosen as the target language. The resulting system was tested with real users on a trip to Croatia in the spring of 2001. We describe its basic components, the methods we

Alan W Black; Ralf D. Brown; Robert Frederking; Rita Singh; John Moody; Eric Steinbrecher

2002-01-01

382

Hands-free speech recognition challenge for real-world speech dialogue systems  

Microsoft Academic Search

In this paper, we describe and review our recent development of hands-free speech dialogue system which is used for railway station guidance. In the application at the real railway station, robustness against reverberation and noise is the most essential issue for the dialogue system. To address the problem, we introduce two key techniques in our proposed hands-free system; (a) speech

Hiroshi Saruwatari; Hiromichi Kawanami; Shota Takeuchi; Yu. Takahashi; Tobias Cincarek; Kiyohiro Shikano

2009-01-01

383

Assessment of signal subspace based speech enhancement for noise robust speech recognition  

Microsoft Academic Search

Subspace filtering is an extensively studied technique that has proven very effective for speech enhancement and improving speech intelligibility. In this paper, we review different subspace estimation techniques (minimum variance, least squares, singular value adaptation, time domain constrained and spectral domain constrained) in a modified singular value decomposition (SVD) framework, and investigate their capability to

Kris Hermus; Patrick Wambacq

2004-01-01
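
A minimal example of the family of methods reviewed here (closest in spirit to the least-squares variant): embed a noisy frame in a Hankel matrix, truncate the small singular values, and reconstruct the signal by anti-diagonal averaging. The embedding dimension and retained rank below are placeholder choices.

```python
# Minimal sketch of SVD-based subspace speech enhancement.
import numpy as np

def subspace_enhance(frame, L=32, r=8):
    K = len(frame) - L + 1
    H = np.array([frame[i:i + L] for i in range(K)])     # K x L Hankel matrix
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    s[r:] = 0.0                                          # drop the noise subspace
    Hr = (U * s) @ Vt
    out, cnt = np.zeros(len(frame)), np.zeros(len(frame))
    for i in range(K):                                   # anti-diagonal averaging
        out[i:i + L] += Hr[i]; cnt[i:i + L] += 1
    return out / cnt

t = np.arange(256) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.3 * np.random.default_rng(3).normal(size=t.size)
den = subspace_enhance(noisy)
print(float(np.mean((noisy - clean) ** 2)), float(np.mean((den - clean) ** 2)))
```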

384

Speech recognizer-based microphone array processing for robust hands-free speech recognition  

Microsoft Academic Search

We present a new array processing algorithm for microphone array speech recognition. Conventionally, the goal of array processing is to take distorted signals captured by the array and generate a cleaner output waveform. However, speech recognition systems operate on a set of features derived from the waveform, rather than the waveform itself. The goal of an array processor used in

Michael L. Seltzer; Bhiksha Raj; Richard M. Stern

2002-01-01

385

A High-Dimensional Subband Speech Representation and SVM Framework for Robust Speech Recognition  

E-print Network

This work proposes a novel support vector machine (SVM) based robust automatic speech recognition framework built on a high-dimensional subband speech representation. Index terms: subbands, support vector machines. Automatic speech recognition (ASR) systems suffer in extremely adverse conditions. The central premise behind the design of state-of-the-art ASR systems

Sollich, Peter

386

Autonomic and Emotional Responses of Graduate Student Clinicians in Speech-Language Pathology to Stuttered Speech  

ERIC Educational Resources Information Center

Background: Fluent speakers and people who stutter manifest alterations in autonomic and emotional responses as they view stuttered relative to fluent speech samples. These reactions are indicative of an aroused autonomic state and are hypothesized to be triggered by the abrupt breakdown in fluency exemplified in stuttered speech. Furthermore,…

Guntupalli, Vijaya K.; Nanjundeswaran, Chayadevie; Dayalu, Vikram N.; Kalinowski, Joseph

2012-01-01

387

Spotlight on Speech Codes 2009: The State of Free Speech on Our Nation's Campuses  

ERIC Educational Resources Information Center

Each year, the Foundation for Individual Rights in Education (FIRE) conducts a wide, detailed survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their obligations to uphold students' and faculty members' rights to freedom of speech, freedom of…

Foundation for Individual Rights in Education (NJ1), 2009

2009-01-01

388

Automatic Speech Activity Detection, Source Localization, and Speech Recognition on the Chil Seminar Corpus  

Microsoft Academic Search

To realize the long-term goal of ubiquitous computing, technological advances in multi-channel acoustic analysis are needed in order to solve several basic problems, including speaker localization and tracking, speech activity detection (SAD) and distant-talking automatic speech recognition (ASR). The European Commission integrated project CHIL,

Dusan Macho; Jaume Padrell; Alberto Abad; Climent Nadeu; Javier Hernando; John W. Mcdonough; Matthias Wölfel; Ulrich Klee; Maurizio Omologo; Alessio Brutti; Piergiorgio Svaizer; Gerasimos Potamianos; Stephen M. Chu

2005-01-01

389

Cued speech for enhancing speech perception and first language development of children with cochlear implants.  

PubMed

Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

Leybaert, Jacqueline; LaSasso, Carol J

2010-06-01

390

Dramatic Effects of Speech Task on Motor and Linguistic Planning in Severely Dysfluent Parkinsonian Speech  

ERIC Educational Resources Information Center

In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency and voice emerge more saliently in conversation than in repetition, reading or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have…

Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.

2012-01-01

391

Speech Motor Programming in Apraxia of Speech: Evidence from a Delayed Picture-Word Interference Task  

ERIC Educational Resources Information Center

Purpose: Apraxia of speech (AOS) is considered a speech motor programming impairment, but the specific nature of the impairment remains a matter of debate. This study investigated 2 hypotheses about the underlying impairment in AOS framed within the Directions Into Velocities of Articulators (DIVA; Guenther, Ghosh, & Tourville, 2006) model: The…

Mailend, Marja-Liisa; Maas, Edwin

2013-01-01

392

A Clinician Survey of Speech and Non-Speech Characteristics of Neurogenic Stuttering  

ERIC Educational Resources Information Center

This study presents survey data on 58 Dutch-speaking patients with neurogenic stuttering following various neurological injuries. Stroke was the most prevalent cause of stuttering in our patients, followed by traumatic brain injury, neurodegenerative diseases, and other causes. Speech and non-speech characteristics were analyzed separately for…

Theys, Catherine; van Wieringen, Astrid; De Nil, Luc F.

2008-01-01

393

A Motor Speech Assessment for Children with Severe Speech Disorders: Reliability and Validity Evidence  

ERIC Educational Resources Information Center

Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Method: Participants were 81 children between 36 and 79 months of age who were referred to the…

Strand, Edythe A.; McCauley, Rebecca J.; Weigand, Stephen D.; Stoeckel, Ruth E.; Baas, Becky S.

2013-01-01

394

Plasticity in the Human Speech Motor System Drives Changes in Speech Perception  

PubMed Central

Recent studies of human speech motor learning suggest that learning is accompanied by changes in auditory perception. But what drives the perceptual change? Is it a consequence of changes in the motor system? Or is it a result of sensory inflow during learning? Here, subjects participated in a speech motor-learning task involving adaptation to altered auditory feedback and they were subsequently tested for perceptual change. In two separate experiments, involving two different auditory perceptual continua, we show that changes in the speech motor system that accompany learning drive changes in auditory speech perception. Specifically, we obtained changes in speech perception when adaptation to altered auditory feedback led to speech production that fell into the phonetic range of the speech perceptual tests. However, a similar change in perception was not observed when the auditory feedback that subjects received during learning fell into the phonetic range of the perceptual tests. This indicates that the central motor outflow associated with vocal sensorimotor adaptation drives changes to the perceptual classification of speech sounds. PMID:25080594

Lametti, Daniel R.; Rochet-Capellan, Amélie; Neufeld, Emily; Shiller, Douglas M.

2014-01-01

395

Spotlight on Speech Codes 2010: The State of Free Speech on Our Nation's Campuses  

ERIC Educational Resources Information Center

Each year, the Foundation for Individual Rights in Education (FIRE) conducts a rigorous survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their legal and moral obligations to uphold students' and faculty members' rights to freedom of speech,…

Foundation for Individual Rights in Education (NJ1), 2010

2010-01-01

396

[Evaluation of speech aid's function with air flow pressure techniques].

PubMed

OBJECTIVE: To evaluate speech aid function with air flow and pressure techniques. METHODS: The study group contained 12 patients with cleft palate, submucous cleft palate and congenital velopharyngeal insufficiency who had been found to have VPI (velopharyngeal insufficiency) and hypernasal speech and were scheduled for a speech aid. RESULTS: The study assessed oral-nasal resonance with and without the speech aid. CONCLUSION: The speech aid is able to improve velopharyngeal function in some VPI patients, and the results are clinically useful for the speech therapist. PMID:15071670

Wang, G M; Yuan, W H; Warren, D W

1998-06-01

397

The Role of Phase-locking to the Temporal Envelope of Speech in Auditory Perception and Speech Intelligibility.  

PubMed

The temporal envelope of speech is important for speech intelligibility. Entrainment of cortical oscillations to the speech temporal envelope is a putative mechanism underlying speech intelligibility. Here we used magnetoencephalography (MEG) to test the hypothesis that phase-locking to the speech temporal envelope is enhanced for intelligible compared with unintelligible speech sentences. Perceptual "pop-out" was used to change the percept of physically identical tone-vocoded speech sentences from unintelligible to intelligible. The use of pop-out dissociates changes in phase-locking to the speech temporal envelope arising from acoustical differences between un/intelligible speech from changes in speech intelligibility itself. Novel and bespoke whole-head beamforming analyses, based on significant cross-correlation between the temporal envelopes of the speech stimuli and phase-locked neural activity, were used to localize neural sources that track the speech temporal envelope of both intelligible and unintelligible speech. Location-of-interest analyses were carried out in a priori defined locations to measure the representation of the speech temporal envelope for both un/intelligible speech in both the time domain (cross-correlation) and frequency domain (coherence). Whole-brain beamforming analyses identified neural sources phase-locked to the temporal envelopes of both unintelligible and intelligible speech sentences. Crucially there was no difference in phase-locking to the temporal envelope of speech in the pop-out condition in either the whole-brain or location-of-interest analyses, demonstrating that phase-locking to the speech temporal envelope is not enhanced by linguistic information. PMID:25244119

Millman, Rebecca E; Johnson, Sam R; Prendergast, Garreth

2015-03-01
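
The paper's time-domain measure is cross-correlation between the speech temporal envelope and phase-locked neural activity. The sketch below computes a Hilbert envelope and scans lags for the correlation peak; the "neural" series is a toy delayed, noisy copy of the envelope, standing in for MEG source estimates, and the sampling rate and filter settings are illustrative.

```python
# Sketch of an envelope-tracking measure: lagged cross-correlation between
# a low-pass Hilbert envelope and a (toy) neural time series.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def envelope(x, sr, cutoff=10.0):
    b, a = butter(4, cutoff / (sr / 2))
    return filtfilt(b, a, np.abs(hilbert(x)))

def xcorr_peak(env, neural, sr, max_lag_s=0.3):
    env, neural = env - env.mean(), neural - neural.mean()
    lags = np.arange(int(max_lag_s * sr))
    c = [np.corrcoef(env[:len(env) - l], neural[l:])[0, 1] for l in lags]
    best = int(np.argmax(c))
    return lags[best] / sr, c[best]

sr = 200                                   # toy sampling rate (Hz)
rng = np.random.default_rng(4)
speech = rng.normal(size=sr * 10) * (1 + np.sin(2 * np.pi * 4 * np.arange(sr * 10) / sr))
env = envelope(speech, sr)
neural = np.r_[np.zeros(20), env[:-20]] + 0.5 * rng.normal(size=env.size)
print(xcorr_peak(env, neural, sr))         # peak lag near 0.1 s expected
```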

398

Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications  

NASA Astrophysics Data System (ADS)

This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate?; Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

Liberman, A. M.

1982-03-01

399

Tuned to the Signal: The Privileged Status of Speech for Young Infants  

ERIC Educational Resources Information Center

Do young infants treat speech as a special signal, compared with structurally similar non-speech sounds? We presented 2- to 7-month-old infants with nonsense speech sounds and complex non-speech analogues. The non-speech analogues retain many of the spectral and temporal properties of the speech signal, including the pitch contour information…

Vouloumanos, Athena; Werker, Janet F.

2004-01-01

400

Electrophysiological Evidence for a Multisensory Speech-Specific Mode of Perception  

ERIC Educational Resources Information Center

We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave replicas (SWS) of natural speech were presented to listeners who were either in "speech mode" or "non-speech mode". At the behavioral level, incongruent lipread…

Stekelenburg, Jeroen J.; Vroomen, Jean

2012-01-01

401

Highband Spectrum Envelope Estimation of Telephone Speech Using Hard/Soft-Classification  

E-print Network

highband signals are combined to form a wideband speech signal. The recovery system is shown in schematic form in Fig. 1. Narrowband speech is input to the wideband speech recovery system. This allows us to compare the actual wideband speech with the syn

Kabal, Peter

402

Cortical entrainment to continuous speech: functional roles and interpretations.  

PubMed

Auditory cortical activity is entrained to the temporal envelope of speech, which corresponds to the syllabic rhythm of speech. Such entrained cortical activity can be measured from subjects naturally listening to sentences or spoken passages, providing a reliable neural marker of online speech processing. A central question still remains to be answered about whether cortical entrained activity is more closely related to speech perception or non-speech-specific auditory encoding. Here, we review a few hypotheses about the functional roles of cortical entrainment to speech, e.g., encoding acoustic features, parsing syllabic boundaries, and selecting sensory information in complex listening environments. It is likely that speech entrainment is not a homogeneous response and these hypotheses apply separately for speech entrainment generated from different neural sources. The relationship between entrained activity and speech intelligibility is also discussed. A tentative conclusion is that theta-band entrainment (4-8 Hz) encodes speech features critical for intelligibility while delta-band entrainment (1-4 Hz) is related to the perceived, non-speech-specific acoustic rhythm. To further understand the functional properties of speech entrainment, a splitter's approach will be needed to investigate (1) not just the temporal envelope but what specific acoustic features are encoded and (2) not just speech intelligibility but what specific psycholinguistic processes are encoded by entrained cortical activity. Similarly, the anatomical and spectro-temporal details of entrained activity need to be taken into account when investigating its functional properties. PMID:24904354

Ding, Nai; Simon, Jonathan Z

2014-01-01

403

Auditory-perceptual learning improves speech motor adaptation in children.  

PubMed

Auditory feedback plays an important role in children's speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback; however, it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5- to 7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children's ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation. PMID:24842067

Shiller, Douglas M; Rochon, Marie-Lyne

2014-08-01

404

The logic of indirect speech.  

PubMed

When people speak, they often insinuate their intent indirectly rather than stating it as a bald proposition. Examples include sexual come-ons, veiled threats, polite requests, and concealed bribes. We propose a three-part theory of indirect speech, based on the idea that human communication involves a mixture of cooperation and conflict. First, indirect requests allow for plausible deniability, in which a cooperative listener can accept the request, but an uncooperative one cannot react adversarially to it. This intuition is supported by a game-theoretic model that predicts the costs and benefits to a speaker of direct and indirect requests. Second, language has two functions: to convey information and to negotiate the type of relationship holding between speaker and hearer (in particular, dominance, communality, or reciprocity). The emotional costs of a mismatch in the assumed relationship type can create a need for plausible deniability and, thereby, select for indirectness even when there are no tangible costs. Third, people perceive language as a digital medium, which allows a sentence to generate common knowledge, to propagate a message with high fidelity, and to serve as a reference point in coordination games. This feature makes an indirect request qualitatively different from a direct one even when the speaker and listener can infer each other's intentions with high confidence. PMID:18199841

Pinker, Steven; Nowak, Martin A; Lee, James J

2008-01-22

405

Learning curve of speech recognition.  

PubMed

Speech recognition (SR) speeds patient care processes by reducing report turnaround times. However, concerns have emerged about prolonged training and an added secretarial burden for radiologists. We assessed how much proofing radiologists who have years of experience with SR and radiologists new to SR must perform, and estimated how quickly the new users become as skilled as the experienced users. We studied SR log entries for 0.25 million reports from 154 radiologists and after careful exclusions, defined a group of 11 experienced radiologists and 71 radiologists new to SR (24,833 and 122,093 reports, respectively). Data were analyzed for sound file and report lengths, character-based error rates, and words unknown to the SR's dictionary. Experienced radiologists corrected 6 characters per report; new users corrected 11. Some users presented a very unfavorable learning curve, with error rates not declining as expected. New users' reports were longer, and data for the experienced users indicates that their reports, initially equally lengthy, shortened over a period of several years. For most radiologists, only minor corrections of dictated reports were necessary. While new users adopted SR quickly, with a subset outperforming experienced users from the start, identification of users struggling with SR will help facilitate troubleshooting and support. PMID:23779151

Kauppinen, Tomi A; Kaipio, Johanna; Koivikko, Mika P

2013-12-01
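
The character-based error rates in this record reduce to an edit-distance computation between the recognizer's draft and the signed report. A plain Levenshtein sketch with a made-up report pair follows; the study's exact counting rules are not given in the abstract, so this is only the generic metric.

```python
# Character-level Levenshtein edit distance, the usual basis for figures
# like "6 corrected characters per report".
def char_edits(draft: str, final: str) -> int:
    m, n = len(draft), len(final)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if draft[i - 1] == final[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n]

draft = "no acute intracranial hemorrage"     # hypothetical recognizer output
final = "no acute intracranial hemorrhage"    # hypothetical signed report
d = char_edits(draft, final)
print(d, f"{d / len(final):.3%}")             # 1 edit, about 3% character error
```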

406

What Is Voice? What Is Speech? What Is Language?  

MedlinePLUS

... may occur in children who have developmental disabilities. Language is the expression of human communication through ...

407

Optimization of acoustic feature extraction from dysarthric speech  

E-print Network

Dysarthria is a motor speech disorder characterized by weak or uncoordinated movements of the speech musculature. While unfamiliar listeners struggle to understand speakers with severe dysarthria, familiar listeners are ...

DiCicco, Thomas M., Jr. (Thomas Minotti)

2010-01-01

408

Multimodal speech interfaces for map-based applications  

E-print Network

This thesis presents the development of multimodal speech interfaces for mobile and vehicle systems. Multimodal interfaces have been shown to increase input efficiency in comparison with their purely speech or text-based ...

Liu, Sean (Sean Y.)

2010-01-01

409

TOWARDS RAPID LANGUAGE PORTABILITY OF SPEECH PROCESSING SYSTEMS Tanja Schultz  

E-print Network

In the project SPICE (Speech Processing: Interactive Creation and Evaluation toolkit), we will tackle the gap between language and technology expertise. This will be implemented by providing innovative methods and tools

Schultz, Tanja

410

Encouraging Speech and Vocalization in Children with Autistic Spectrum Disorder  

E-print Network

General terms: Human Factors. Keywords: Accessibility, Autism, Children, Speech, Vocalization. For most children, speech develops without explicit effort by parents, practitioners, or the community. However, some children, such as those on the autism spectrum, struggle to develop speech usable in the "real world". Without speech, these children have difficulty expressing their desires, emotions

Karahalios, Karrie G.

411

Exploring speech therapy games with children on the autism spectrum  

E-print Network

Individuals on the autism spectrum often have difficulties producing intelligible speech with either high or low speech rate, and atypical pitch and/or amplitude affect. In this study, we present a novel intervention towards ...

Picard, Rosalind W.

412

Understanding speech in interactive narratives with crowd sourced data  

E-print Network

Speech recognition failures and limited vocabulary coverage pose challenges for speech interactions with characters in games. We describe an end-to-end system for automating characters from a large corpus of recorded human ...

Orkin, Jeff

413

Multi-level acoustic modeling for automatic speech recognition  

E-print Network

Context-dependent acoustic modeling is commonly used in large-vocabulary Automatic Speech Recognition (ASR) systems as a way to model coarticulatory variations that occur during speech production. Typically, the local ...

Chang, Hung-An, Ph. D. Massachusetts Institute of Technology

2012-01-01

414

Speech prosody, reward, and the corticobulbar system: an integrative perspective.  

PubMed

Speech prosody is essential for verbal communication. In this commentary I provide an integrative overview, arguing that speech prosody is subserved by the same anatomical and neurochemical mechanisms involved in the processing of reward/affective outcomes. PMID:25514963

Vicario, Carmelo M

2014-12-01

415

EEG-BASED SPEECH RECOGNITION Impact of Temporal Effects  

E-print Network

Keywords: Electroencephalography; Speech Recognition; Unspoken Speech. In this paper, we investigate the use of EEG for the recognition of unspoken speech. Electroencephalography (EEG) has proven to be useful for a multitude of new

Schultz, Tanja

416

An annotation scheme for concept-to-speech synthesis.   

E-print Network

The SOLE concept-to-speech system uses linguistic information provided by an NLG component to improve the intonation of synthetic speech. As the text is generated, the system automatically annotates the text with linguistic ...

Hitzeman, Janet; Black, Alan W; Taylor, Paul; Mellish, Chris; Oberlander, Jon

1999-01-01

417

Automatically clustering similar units for unit selection in speech synthesis.   

E-print Network

This paper describes a new method for synthesizing speech by concatenating sub-word units from a database of labelled speech. A large unit inventory is created by automatically clustering units of the same phone class ...

Black, Alan W; Taylor, Paul A

1997-01-01

418

Assigning phrase breaks from part-of-speech sequences   

E-print Network

This paper presents an algorithm for automatically assigning phrase breaks to unrestricted text for use in a text-to-speech synthesizer. Text is first converted into a sequence of part-of-speech tags. Next a Markov model ...

Taylor, Paul; Black, Alan W
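
A toy version of this approach: each word juncture gets a break likelihood from the flanking part-of-speech pair, a first-order Markov prior discourages adjacent breaks, and Viterbi decoding picks the best break string. All probabilities below are invented placeholders, not the paper's trained model.

```python
# Toy sketch: assigning phrase breaks from POS sequences with a Markov prior.
import math

P_BREAK = {("NN", "CC"): 0.6, ("NN", "IN"): 0.4, ("VB", "DT"): 0.05}  # hypothetical
TRANS = {  # P(state_t | state_{t-1}); states: B = break, N = no break
    ("B", "B"): 0.05, ("B", "N"): 0.95,
    ("N", "B"): 0.30, ("N", "N"): 0.70,
}

def assign_breaks(pos_tags):
    junctures = list(zip(pos_tags, pos_tags[1:]))
    p0 = P_BREAK.get(junctures[0], 0.1)
    score = {"B": math.log(p0), "N": math.log(1 - p0)}
    back = []
    for pair in junctures[1:]:
        p = P_BREAK.get(pair, 0.1)
        emit = {"B": math.log(p), "N": math.log(1 - p)}
        new, ptr = {}, {}
        for s in ("B", "N"):
            cands = {q: score[q] + math.log(TRANS[(q, s)]) for q in ("B", "N")}
            q_best = max(cands, key=cands.get)
            new[s], ptr[s] = cands[q_best] + emit[s], q_best
        score, back = new, back + [ptr]
    s = max(score, key=score.get)
    path = [s]
    for ptr in reversed(back):       # trace the best state sequence backwards
        s = ptr[s]; path.append(s)
    return list(reversed(path))

print(assign_breaks(["DT", "NN", "CC", "DT", "NN", "VB", "DT", "NN"]))
```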

419

Using intonation to constrain language models in speech recognition.   

E-print Network

This paper describes a method for using intonation to reduce word error rate in a speech recognition system designed to recognise spontaneous dialogue speech. We use a form of dialogue analysis based on the theory of ...

Taylor, Paul A; King, Simon; Isard, Stephen; Wright, Helen; Kowtko, Jacqueline C

1997-01-01

420

COMBINATION AND JOINT TRAINING OF ACOUSTIC CLASSIFIERS FOR SPEECH RECOGNITION  

E-print Network

This paper addresses classifier combination in speech recognition systems. We present new techniques that generalize previously

Bilmes, Jeff

422

Auditory perception bias in speech imitation  

PubMed Central

In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies, may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with a full speech signal and one with high-pass filtered speech above 300 Hz. The results showed that perception bias toward fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent. PMID:24204361

Postma-Nilsenová, Marie; Postma, Eric

2013-01-01

423

Prior listening in rooms improves speech intelligibility.  

PubMed

Although results from previous studies have demonstrated that the acoustic effects of a single reflection are perceptually suppressed after repeated exposure to a particular configuration of source and reflection, the extent to which this dynamic echo suppression might generalize to speech understanding in room environments with multiple reflections and reverberation is largely unknown. Here speech intelligibility was measured using the coordinate response measure corpus both with and without prior listening exposure to a reverberant room environment, which was simulated using virtual auditory space techniques. Prior room listening exposure was manipulated by presenting either a two-sentence carrier phrase that preceded the target speech, or no carrier phrase within the room environment. Results from 14 listeners indicate that with prior room exposure, masked speech reception thresholds were on average 2.7 dB lower than thresholds without exposure, an improvement in intelligibility of over 18 percentage points on average. This effect, which is shown to be absent in anechoic space and greatly reduced under monaural listening conditions, demonstrates that prior binaural exposure to reverberant rooms can improve speech intelligibility, perhaps due to a process of perceptual adaptation to the acoustics of the listening room. PMID:20649224

Brandewie, Eugene; Zahorik, Pavel

2010-07-01

424

Music and speech prosody: a common rhythm  

PubMed Central

Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Särkämö, Teppo

2013-01-01

425

Speech Evoked Auditory Brainstem Response in Stuttering  

PubMed Central

Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS) at the higher level of the central auditory system using speech stimuli. Recently, the potential usefulness of speech evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable /da/, with a duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed deficient neural timing in the early stages of the auditory pathway, consistent with temporal processing deficits, and this abnormal timing may underlie their disfluency. PMID:25215262

Tahaei, Ali Akbar; Ashayeri, Hassan; Pourbakht, Akram; Kamali, Mohammad

2014-01-01

426

AN ADAPTIVE EQUALIZER FOR ANALYSIS-BY-SYNTHESIS SPEECH CODERS  

Microsoft Academic Search

An equalizer to enhance the quality of reconstructed speech from an analysis-by-synthesis speech coder, e.g., CELP coder, is described. The equalizer makes use of the set of short-term predictor parameters normally transmitted from the speech encoder to the decoder. In addition, the equalizer computes a matching set of parameters from the reconstructed speech. The function of the equalizer is

Mark Jasiuk; Tenkasi Ramabadran

2006-01-01
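The abstract is truncated, but the mechanism it names can be pictured: fit short-term (LPC) predictors on both sides and equalize with the ratio of their magnitude envelopes. The following is only a hedged sketch under that reading, not the paper's actual algorithm; the Levinson-Durbin helper and FFT size are generic choices:

    import numpy as np

    def lpc(frame, order):
        # LPC coefficients via autocorrelation + Levinson-Durbin recursion.
        r = np.correlate(frame, frame, mode='full')[len(frame) - 1:][:order + 1]
        a = np.zeros(order + 1)
        a[0], err = 1.0, r[0]
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
            prev = a.copy()
            for j in range(1, i + 1):
                a[j] = prev[j] + k * prev[i - j]
            err *= 1.0 - k * k
        return a

    def equalizer_gain(a_transmitted, frame_reconstructed, nfft=512):
        # Compare the envelope implied by the transmitted predictor with the
        # envelope measured on the reconstructed speech; the ratio gives a
        # per-bin magnitude correction for the reconstructed frame.
        a_rec = lpc(frame_reconstructed, len(a_transmitted) - 1)
        env_tx = 1.0 / np.abs(np.fft.rfft(a_transmitted, nfft))
        env_rec = 1.0 / np.abs(np.fft.rfft(a_rec, nfft))
        return env_tx / env_rec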

427

A Comparison of LBG and ADPCM Speech Compression Techniques  

Microsoft Academic Search

Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. In all speech there is a degree of predictability and speech coding techniques exploit this to reduce bit rates yet still maintain a suitable level of quality. This paper is a study

Rajesh G. Bachu; Jignasa Patel; Buket D. Barkana

2008-01-01
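For reference, the LBG codebook design named in the title can be sketched in a few lines: start from the global mean, split each codeword, and refine with k-means-style iterations. A toy version (frame length, codebook size, and split factor are arbitrary choices, not the paper's):

    import numpy as np

    def lbg(vectors, codebook_size, eps=0.01, n_iter=20):
        # Grow a VQ codebook by repeated splitting + nearest-neighbor refit.
        codebook = vectors.mean(axis=0, keepdims=True)
        while len(codebook) < codebook_size:
            codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
            for _ in range(n_iter):
                dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
                nearest = dists.argmin(axis=1)
                for c in range(len(codebook)):
                    members = vectors[nearest == c]
                    if len(members):
                        codebook[c] = members.mean(axis=0)
        return codebook

    # Usage sketch: quantize 8-sample speech frames with a 64-entry codebook.
    # frames = speech[:len(speech) // 8 * 8].reshape(-1, 8)
    # cb = lbg(frames, 64)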

428

Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music  

PubMed Central

This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past 3 years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech. PMID:25147539

Lee, Hweeling; Noppeney, Uta

2014-01-01

429

Statistical Speech Segmentation and Word Learning in Parallel: Scaffolding from Child-Directed Speech  

PubMed Central

In order to acquire their native languages, children must learn richly structured systems with regularities at multiple levels. While structure at different levels could be learned serially, e.g., speech segmentation coming before word-object mapping, redundancies across levels make parallel learning more efficient. For instance, a series of syllables is likely to be a word not only because of high transitional probabilities, but also because of a consistently co-occurring object. But additional statistics require additional processing, and thus might not be useful to cognitively constrained learners. We show that the structure of child-directed speech makes simultaneous speech segmentation and word learning tractable for human learners. First, a corpus of child-directed speech was recorded from parents and children engaged in a naturalistic free-play task. Analyses revealed two consistent regularities in the sentence structure of naming events. These regularities were subsequently encoded in an artificial language to which adult participants were exposed in the context of simultaneous statistical speech segmentation and word learning. Either regularity was independently sufficient to support successful learning, but no learning occurred in the absence of both regularities. Thus, the structure of child-directed speech plays an important role in scaffolding speech segmentation and word learning in parallel. PMID:23162487

Yurovsky, Daniel; Yu, Chen; Smith, Linda B.

2012-01-01
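The transitional-probability statistic underlying this kind of segmentation study is easy to state in code. A toy sketch (hypothetical three-syllable "words"; within-word transitions are certain, across-word transitions are not, so TP dips mark candidate boundaries):

    import random
    from collections import Counter

    def transitional_probabilities(syllables):
        # TP(A -> B) = count(AB) / count(A), over an unsegmented stream.
        pairs = Counter(zip(syllables, syllables[1:]))
        firsts = Counter(syllables[:-1])
        return {p: n / firsts[p[0]] for p, n in pairs.items()}

    words = [['pa', 'do', 'ti'], ['go', 'la', 'bu'], ['da', 'ro', 'pi']]
    stream = [syl for _ in range(200) for syl in random.choice(words)]
    tps = transitional_probabilities(stream)
    # Low-TP transitions (about 1/3 here) fall at word boundaries:
    boundaries = sorted(p for p, tp in tps.items() if tp < 0.5)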

430

Neural Encoding of Speech and Music: Implications for Hearing Speech in Noise  

PubMed Central

Understanding speech in a background of competing noise is challenging, especially for individuals with hearing loss or deficits in auditory processing ability. The ability to hear in background noise cannot be predicted from the audiogram, an assessment of peripheral hearing ability; therefore, it is important to consider the impact of central and cognitive factors on speech-in-noise perception. Auditory processing in complex environments is reflected in neural encoding of pitch, timing, and timbre, the crucial elements of speech and music. Musical expertise in processing pitch, timing, and timbre may transfer to enhancements in speech-in-noise perception due to shared neural pathways for speech and music. Through cognitive-sensory interactions, musicians develop skills enabling them to selectively listen to relevant signals embedded in a network of melodies and harmonies, and this experience leads in turn to enhanced ability to focus on one voice in a background of other voices. Here we review recent work examining the biological mechanisms of speech and music perception and the potential for musical experience to ameliorate speech-in-noise listening difficulties. PMID:24748717

Anderson, Samira; Kraus, Nina

2013-01-01

431

6th ISCA Workshop on Speech Synthesis, Bonn, Germany, August 22-24, 2007: The HMM-based Speech Synthesis System (HTS) Version 2.0

E-print Network

technique is unit selection [1-3], where appropriate sub-word units are selected from large speech databases ... of the speech recorded in the database. As we require speech which is more varied in voice characteristics ... context-dependent HMMs are trained from databases of natural speech, and we can generate speech waveforms from the HMMs

Black, Alan W

432

Changes in breathing while listening to read speech: the effect of reader and speech mode  

PubMed Central

The current paper extends previous work on breathing during speech perception and provides supplementary material regarding the hypothesis that adaptation of breathing during perception “could be a basis for understanding and imitating actions performed by other people” (Paccalin and Jeannerod, 2000). The experiments were designed to test how the differences in reader breathing due to speaker-specific characteristics, or differences induced by changes in loudness level or speech rate influence the listener breathing. Two readers (a male and a female) were pre-recorded while reading short texts with normal and then loud speech (both readers) or slow speech (female only). These recordings were then played back to 48 female listeners. The movements of the rib cage and abdomen were analyzed for both the readers and the listeners. Breathing profiles were characterized by the movement expansion due to inhalation and the duration of the breathing cycle. We found that both loudness and speech rate affected each reader’s breathing in different ways. Listener breathing was different when listening to the male or the female reader and to the different speech modes. However, differences in listener breathing were not systematically in the same direction as reader differences. The breathing of listeners was strongly sensitive to the order of presentation of speech mode and displayed some adaptation in the time course of the experiment in some conditions. In contrast to specific alignments of breathing previously observed in face-to-face dialog, no clear evidence for a listener–reader alignment in breathing was found in this purely auditory speech perception task. The results and methods are relevant to the question of the involvement of physiological adaptations in speech perception and to the basic mechanisms of listener–speaker coupling. PMID:24367344

Rochet-Capellan, Amélie; Fuchs, Susanne

2013-01-01

433

75 FR 29914 - Telecommunications Relay Services, Speech-to-Speech Services, E911 Requirements for IP-Enabled...  

Federal Register 2010, 2011, 2012, 2013

...03-123; WC Docket No. 05-196; FCC 08-275] Telecommunications Relay Services, Speech-to-Speech Services, E911...collection requirements associated with the Commission's Telecommunications Relay Services...

2010-05-28

434

System And Method For Characterizing Voiced Excitations Of Speech And Acoustic Signals, Removing Acoustic Noise From Speech, And Synthesizing…

DOEpatents

The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

2006-04-25
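One way to picture the transfer-function step the patent describes: with time-aligned speech and excitation frames, a regularized frequency-domain division recovers H(f). This is only a generic signal-processing sketch of that idea, not the patented method itself; all names are illustrative:

    import numpy as np

    def transfer_function(speech_frame, excitation_frame, eps=1e-8):
        # Regularized estimate of the vocal-tract transfer function H(f)
        # from a speech frame and an EM-sensor-derived excitation frame.
        w = np.hanning(len(speech_frame))
        S = np.fft.rfft(speech_frame * w)
        E = np.fft.rfft(excitation_frame * w)
        return S * np.conj(E) / (np.abs(E) ** 2 + eps)

    # Resynthesis sketch: drive a stored H with a new or modified excitation e
    # of the same frame length: s_new = np.fft.irfft(np.fft.rfft(e) * H)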

435

Speech Rates of Turkish Prelingually Hearing-Impaired Children  

ERIC Educational Resources Information Center

The aim of training children with hearing impairment in the auditory oral approach is to develop good speaking abilities. However, children with profound hearing impairment show a wide range of spoken language abilities, some having highly intelligible speech while others have unintelligible speech. This is due to errors in speech production…

Girgin, M. Cem

2008-01-01

436

Neural specialization for speech in the first months of life  

PubMed Central

How does the brain’s response to speech change over the first months of life? Although behavioral findings indicate that neonates’ listening biases are sharpened over the first months of life, with a species-specific preference for speech emerging by 3 months, the neural substrates underlying this developmental change are unknown. We examined neural responses to speech compared with biological non-speech sounds in 1- to 4-month-old infants using fMRI. Infants heard speech and biological non-speech sounds, including heterospecific vocalizations and human non-speech. We observed a left-lateralized response in temporal cortex for speech compared to biological non-speech sounds, indicating that this region is highly selective for speech by the first month of life. Specifically, this brain region becomes increasingly selective for speech over the next 3 months as neural substrates become less responsive to non-speech sounds. These results reveal specific changes in neural responses during a developmental period characterized by rapid behavioral changes. PMID:24576182

Shultz, Sarah; Vouloumanos, Athena; Bennett, Randi H; Pelphrey, Kevin

2014-01-01

437

Developmental Apraxia of Speech: II. Toward a Diagnostic Marker.  

ERIC Educational Resources Information Center

Discusses a study that compared speech and prosody-voice profiles of children (ages 4-14) with suspected developmental apraxia of speech (DAS) to profiles of 73 children with speech delay. Also describes a second study of 20 children (ages 3-9) that investigated whether stress was a diagnostic marker of DAS. (Author/CR)

Shriberg, Lawrence D.; And Others

1997-01-01

438

Profiles of a Secondary School Speech Teacher: The Teacher's View.  

ERIC Educational Resources Information Center

A survey of Illinois secondary school speech teachers was conducted in 1970 so that a profile of the secondary school speech teacher could be constructed from the perspective of how teachers view themselves. Replies were received from 55 teachers representing 37% of the sample and 12% of Illinois schools with speech activity programs. Among the…

Mesner, Linda; Tuttle, George

439

Acoustical and Environmental Robustness in Automatic Speech Recognition  

E-print Network

a higher degree of integration within SPHINX, the Carnegie Mellon speech recognition system ... In this dissertation we describe several algorithms, including SNR-Dependent Cepstral Normalization ... of SPHINX when trained on speech recorded with a close-talking microphone and tested on speech recorded

Stern, Richard

440

TUNING SPHINX TO OUTPERFORM GOOGLE'S SPEECH RECOGNITION API  

E-print Network

whether the open-source speech recognizer Sphinx can be tuned to outperform Google's cloud-based speech recognition API ... corpus and tuning a number of Sphinx's parameters, we achieve a WER of 51.2%. This result

Suendermann, David
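The WER figure quoted above is the standard Levenshtein-based measure; for reference, a minimal implementation:

    def wer(reference, hypothesis):
        # Word error rate: (substitutions + insertions + deletions) / ref length.
        ref, hyp = reference.split(), hypothesis.split()
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i                      # all deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j                      # all insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

    print(wer("turn the lights off", "turn lights of"))  # 0.5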

441

Environment Mismatch Compensation using Average Eigenspace for Speech Recognition  

Microsoft Academic Search

The performance of speech recognition systems is adversely affected by mismatch in training and testing environmental conditions. In addition to test data from noisy environments, there are scenarios where the training data itself is noisy. Speech enhancement techniques which solely focus on finding a clean speech estimate from the noisy signal are not effective here. Model adaptation

Abhishek Kumar; John H. L. Hansen

2008-01-01

442

Auditory Long Latency Responses to Tonal and Speech Stimuli  

ERIC Educational Resources Information Center

Purpose: The effects of type of stimuli (i.e., nonspeech vs. speech), speech (i.e., natural vs. synthetic), gender of speaker and listener, speaker (i.e., self vs. other), and frequency alteration in self-produced speech on the late auditory cortical evoked potential were examined. Method: Young adult men (n = 15) and women (n = 15), all with…

Swink, Shannon; Stuart, Andrew

2012-01-01

443

Perception of Intersensory Synchrony in Audiovisual Speech: Not that Special  

ERIC Educational Resources Information Center

Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made…

Vroomen, Jean; Stekelenburg, Jeroen J.

2011-01-01

444

Development of speechreading supplements based on automatic speech recognition  

Microsoft Academic Search

In manual-cued speech (MCS) a speaker produces hand gestures to resolve ambiguities among speech elements that are often confused by speechreaders. The shape of the hand distinguishes among consonants; the position of the hand relative to the face distinguishes among vowels. Experienced receivers of MCS achieve nearly perfect reception of everyday connected speech. MCS has been taught to very young

Paul Duchnowski; David S. Lum; Jean C. Krause; Matthew G. Sexton; Maroula S. Bratakos; Louis D. Braida

2000-01-01

445

Priority discarding of speech in integrated packet networks  

Microsoft Academic Search

The authors discuss the control of short-term congestion, which is referred to as overload, in integrated packet networks (IPNs) containing a mix of data, speech, and possibly other types of signals. A system model is proposed that assigns a delivery priority to each packet (speech or otherwise) at the transmitter and discards speech packets according to delivery priority at any

D. W. Petr; V. S. Frost

1989-01-01
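A toy sketch of the discard mechanism described above: packets leave in arrival order, but when the queue overflows under overload, the least important speech packet (lowest delivery priority) is discarded rather than the newest arrival. The class and priority scale are illustrative, not from the paper:

    from collections import deque

    class PriorityDropQueue:
        # Bounded FIFO queue with priority-based discard on overflow.
        def __init__(self, capacity):
            self.capacity = capacity
            self.q = deque()                 # items are (priority, packet)

        def enqueue(self, packet, priority):
            self.q.append((priority, packet))
            if len(self.q) <= self.capacity:
                return None
            # Overload: discard the lowest-priority packet in the queue.
            victim = min(range(len(self.q)), key=lambda i: self.q[i][0])
            dropped = self.q[victim]
            del self.q[victim]
            return dropped                   # None, or the discarded packet

        def dequeue(self):
            return self.q.popleft() if self.q else None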

446

Further Research on Speeded Speech as an Educational Medium.  

ERIC Educational Resources Information Center

The practicality of using speeded speech as an educational medium was explored in an Immersion study, in a Criterion study, and in Retention studies. Tapes of novels were used for listening; compression was achieved by a device that removed small segments of the tape-recorded speech sounds and, then, abutted the remainder of the speech record…

Friedman, Herbert L.; And Others

447

Infants’ brain responses to speech suggest Analysis by Synthesis  

PubMed Central

Historic theories of speech perception (Motor Theory and Analysis by Synthesis) invoked listeners’ knowledge of speech production to explain speech perception. Neuroimaging data show that adult listeners activate motor brain areas during speech perception. In two experiments using magnetoencephalography (MEG), we investigated motor brain activation, as well as auditory brain activation, during discrimination of native and nonnative syllables in infants at two ages that straddle the developmental transition from language-universal to language-specific speech perception. Adults are also tested in Exp. 1. MEG data revealed that 7-mo-old infants activate auditory (superior temporal) as well as motor brain areas (Broca’s area, cerebellum) in response to speech, and equivalently for native and nonnative syllables. However, in 11- and 12-mo-old infants, native speech activates auditory brain areas to a greater degree than nonnative, whereas nonnative speech activates motor brain areas to a greater degree than native speech. This double dissociation in 11- to 12-mo-old infants matches the pattern of results obtained in adult listeners. Our infant data are consistent with Analysis by Synthesis: auditory analysis of speech is coupled with synthesis of the motor plans necessary to produce the speech signal. The findings have implications for: (i) perception-action theories of speech perception, (ii) the impact of “motherese” on early language learning, and (iii) the “social-gating” hypothesis and humans’ development of social understanding. PMID:25024207

Kuhl, Patricia K.; Ramírez, Rey R.; Bosseler, Alexis; Lin, Jo-Fu Lotus; Imada, Toshiaki

2014-01-01

448

Rhythmic Priming Enhances the Phonological Processing of Speech  

ERIC Educational Resources Information Center

While natural speech does not possess the same degree of temporal regularity found in music, there is recent evidence to suggest that temporal regularity enhances speech processing. The aim of this experiment was to examine whether speech processing would be enhanced by the prior presentation of a rhythmical prime. We recorded electrophysiological…

Cason, Nia; Schon, Daniele

2012-01-01

449

Do 6-Month-Olds Understand That Speech Can Communicate?  

ERIC Educational Resources Information Center

Adults and 12-month-old infants recognize that even unfamiliar speech can communicate information between third parties, suggesting that they can separate the communicative function of speech from its lexical content. But do infants recognize that speech can communicate due to their experience understanding and producing language, or do they…

Vouloumanos, Athena; Martin, Alia; Onishi, Kristine H.

2014-01-01

450

Research on Speech Perception. Progress Report No. 12.  

ERIC Educational Resources Information Center

Summarizing research activities in 1986, this is the twelfth annual report of research on speech perception, analysis, synthesis, and recognition conducted in the Speech Research Laboratory of the Department of Psychology at Indiana University. The report contains the following 23 articles: "Comprehension of Digitally Encoded Natural Speech Using…

Pisoni, David B.; And Others

451

SOME PERSPECTIVES ON SPEECH DATABASE DEVELOPMENT Lori F. Lamel  

E-print Network

The article, "Speech Database Development: Design and Analysis of the Acoustic-Phonetic Corpus" ... Below are a few comments related to the design of speech databases, based on the development

452

ACCURATE SPECTRAL ENVELOPE ESTIMATION FOR ARTICULATION-TO-SPEECH SYNTHESIS  

E-print Network

to synthesise speech from articulator positions based on the search of a database composed of pairs ...

Edinburgh, University of

453

The Influence of Bilingualism on Speech Production: A Systematic Review  

ERIC Educational Resources Information Center

Background: Children who are bilingual and have speech sound disorder are likely to be under-referred, possibly due to confusion about typical speech acquisition in bilingual children. Aims: To investigate what is known about the impact of bilingualism on children's acquisition of speech in English to facilitate the identification and treatment of…

Hambly, Helen; Wren, Yvonne; McLeod, Sharynne; Roulstone, Sue

2013-01-01

454

Blind Model Selection for Automatic Speech Recognition in Reverberant Environments  

E-print Network

out of a library of models trained on artificially reverberated speech databases corresponding to various reverberant conditions ... the best acoustic model, i.e., the model trained on a speech database most closely matching the estimated conditions ... to be trained on large speech databases. Unfortunately, the performance of ASR systems degrades dramatically

Dupont, Stéphane

455

A WELSH SPEECH DATABASE: PRELIMINARY RESULTS Briony Williams  

E-print Network

A speech database for Welsh was recorded in a studio from read text by a few speakers. The purpose is ... a labelled speech database. This is also needed in basic phonetic research into the characteristics

Edinburgh, University of

456

A PCM/VCR speech database exchange format

Microsoft Academic Search

The use of PCM/VCR technology is described for use as a storage and exchange medium for speech databases. In order to provide a limited amount of digital data, use is made of a recorded modem signal for ASCII character string headers associated with the speech tokens. This format can be used to store field recordings of speech material for subsequent

David S. Pallett

1986-01-01

457

Blind Model Selection for Automatic Speech Recognition in Reverberant Environments  

Microsoft Academic Search

This communication presents a new method for automatic speech recognition in reverberant environments. Our approach consists in the selection of the best acoustic model out of a library of models trained on artificially reverberated speech databases corresponding to various reverberant conditions. Given a speech utterance recorded within a reverberant room, a Maximum Likelihood estimate of the fullband room reverberation

Laurent Couvreur; Christophe Couvreur

2004-01-01
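The selection rule itself is compact: score the utterance's feature frames under every model in the library of reverberation conditions and keep the best-scoring one. A toy sketch in which a diagonal-Gaussian frame model stands in for the paper's acoustic models (names and library layout are hypothetical):

    import numpy as np

    def select_model(features, library):
        # features: (frames, dims); library: name -> (mean, var) per dimension.
        def avg_loglik(mean, var):
            ll = -0.5 * (np.log(2 * np.pi * var) + (features - mean) ** 2 / var)
            return ll.sum(axis=1).mean()     # mean per-frame log-likelihood
        scores = {name: avg_loglik(m, v) for name, (m, v) in library.items()}
        return max(scores, key=scores.get)

    # library = {'T60=0.2s': (mu_a, var_a), 'T60=0.6s': (mu_b, var_b)}
    # best = select_model(mfcc_frames, library)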

458

Telephone speech recognition using simulated data from clean database  

Microsoft Academic Search

Speech recognition over telephone lines forms an integral part of various applications of large vocabulary continuous speech recognition (LVCSR). This paper describes a system, implemented entirely in software, that produces simulated telephone data starting from clean databases. Filters adopted in this system are designed to simulate the frequency properties of analogue transmission equipment in telephone connections. A speech recognizer was

Guoyu Zuo; Wenju Liu; Xiaogang Ruan

2003-01-01
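The core of such a simulation is band-limiting wideband speech to the roughly 300-3400 Hz telephone passband and moving to an 8 kHz rate. A crude sketch, assuming a 16 kHz clean corpus; the paper's actual filters, which model specific analogue equipment, are not reproduced here:

    import numpy as np
    from scipy.signal import butter, sosfilt, resample_poly

    def simulate_telephone(speech, fs=16000):
        # 300-3400 Hz bandpass, then downsample to the 8 kHz telephone rate.
        sos = butter(6, [300, 3400], btype='bandpass', fs=fs, output='sos')
        narrowband = sosfilt(sos, speech)
        return resample_poly(narrowband, 8000, fs)

    # telephone_like = simulate_telephone(clean_utterance)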

459

IITKGP-SESC: Speech Database for Emotion Analysis  

Microsoft Academic Search

In this paper, we introduce a speech database for analyzing the emotions present in speech signals. The proposed database is recorded in the Telugu language using professional artists from All India Radio (AIR), Vijayawada, India. The speech corpus is collected by simulating eight different emotions using neutral (emotion-free) statements. The database is named the Indian Institute of

Shashidhar G. Koolagudi; Sudhamay Maity; Vuppala Anil Kumar; Saswat Chakrabarti; K. Sreenivasa Rao

2009-01-01

460

Cries and Whispers Classification of Vocal Effort in Expressive Speech  

E-print Network

The expansion of the video games industry raises innovative ... speech processing and speech recognition systems in the context of video games post-production and voice ... for video games. Changes in vocal effort lead to substantial modifications

461

Speech Acts with Institutional Effects in Agent Societies  

Microsoft Academic Search

A general logical framework is presented to represent speech acts that have institutional effects. It is based on the concepts of the Speech Act Theory and takes the form of the FIPA Agent Communication Language. The most important feature is that the illocutionary force of all of these speech acts is declarative. The formal language that is proposed to

Robert Demolombe; Vincent Louis

2006-01-01

462

Student Speech and the Internet: A Legal Analysis  

ERIC Educational Resources Information Center

This article lays the foundation of American First Amendment jurisprudence in public schools and examines recent cases relating to student Internet speech. Particular emphasis is placed on the ability of schools to regulate student off-campus Internet speech. School authorities who wish to regulate nonthreatening off-campus speech in the…

Graca, Thomas J.; Stader, David L.

2007-01-01

463

Automated Discovery of Speech Act Categories in Educational Games  

ERIC Educational Resources Information Center

In this paper we address the important task of automated discovery of speech act categories in dialogue-based, multi-party educational games. Speech acts are important in dialogue-based educational systems because they help infer the student speaker's intentions (the task of speech act classification) which in turn is crucial to providing adequate…

Rus, Vasile; Moldovan, Cristian; Niraula, Nobal; Graesser, Arthur C.

2012-01-01

464

Amodal Processing of Visual Speech as Revealed by Priming  

ERIC Educational Resources Information Center

This study investigated the linguistic processing of visual speech (video of a talker's utterance without audio) by determining if such has the capacity to prime subsequently presented word and nonword targets. The priming procedure is well suited for the investigation of whether speech perception is amodal since visual speech primes can be used…

Kim, Jeesun; Davis, Chris; Krins, Phil

2004-01-01

465

RESEARCH Open Access Subcortical processing of speech regularities  

E-print Network

with musical skill, relates to the brainstem processing of speech regularities is unknown. An association ... brainstem responses to the same speech sound presented in predictable and variable speech streams. Results ... established in the auditory cortex [1,3] and was recently documented at and below the level of the brainstem

466

TWO AUTOMATIC APPROACHES FOR ANALYZING CONNECTED SPEECH PROCESSES IN DUTCH  

E-print Network

indications of which CSPs are present in the material can be found. These indications can be used to generate ... by many complex factors such as speech style, speech rate, word frequency, information load, dialectal ... of the type of processes that might occur in the speech material and that can further be tested by means

Edinburgh, University of

467

Elements of a Plan-Based Theory of Speech Acts  

Microsoft Academic Search

This paper explores the truism that people think about what they say. It proposes that, to satisfy their own goals, people often plan their speech acts to affect their listeners' beliefs, goals, and emotional states. Such language use can be modelled by viewing speech acts as operators in a planning system, thus allowing both physical and speech acts to

Philip R. Cohen; C. Raymond Perrault

2003-01-01

468

Recent advances in the automatic recognition of audiovisual speech  

Microsoft Academic Search

Visual speech information from the speaker's mouth region has been successfully shown to improve noise robustness of automatic speech recognizers, thus promising to extend their usability in the human computer interface. In this paper, we review the main components of audiovisual automatic speech recognition (ASR) and present novel contributions in two main areas: first, the visual front-end design, based on

GERASIMOS POTAMIANOS; CHALAPATHY NETI; GUILLAUME GRAVIER; ASHUTOSH GARG; ANDREW W. SENIOR

2003-01-01

469

Developing a Weighted Measure of Speech Sound Accuracy  

ERIC Educational Resources Information Center

Purpose: To develop a system for numerically quantifying a speaker's phonetic accuracy through transcription-based measures. With a focus on normal and disordered speech in children, the authors describe a system for differentially weighting speech sound errors on the basis of various levels of phonetic accuracy using a Weighted Speech Sound…

Preston, Jonathan L.; Ramsdell, Heather L.; Oller, D. Kimbrough; Edwards, Mary Louise; Tobin, Stephen J.

2011-01-01

470

Index to NASA news releases and speeches, 1993  

NASA Technical Reports Server (NTRS)

This issue of the Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1993. The index is arranged in six sections: subject index, personal names index, news release number index, accession number index, speeches, and news releases.

1994-01-01

471

CONSTRAINED SPECTRUM NORMALIZATION FOR ROBUST SPEECH RECOGNITION IN NOISE  

Microsoft Academic Search

This paper presents a new approach to robust speech recognition in noise based on spectral subtraction. A conventional spectral subtraction technique leads to nonlinear distortions of the normalized speech signals and resulting degradation of speech recognition accuracy. A new method is proposed to constrain spectral subtraction by imposing upper bounds on the estimates of the noise spectra.

Filipp Korkmazskiy; Frank K. Soong; Olivier Siohan
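A sketch of the constraint idea named in the abstract: ordinary magnitude-domain spectral subtraction, but with the noise estimate clipped to an upper bound before subtraction, so an over-estimate cannot carve nonlinear distortions into the speech spectrum. The bound, over-subtraction factor, and floor are illustrative values, not the paper's:

    import numpy as np

    def constrained_spectral_subtraction(mag, noise, alpha=1.0, cap=2.0, floor=0.1):
        # mag: (frames, bins) STFT magnitudes of noisy speech.
        # noise: per-frame (or single-row) noise magnitude estimates.
        bounded = np.minimum(noise, cap * noise.min(axis=0, keepdims=True))
        clean = mag - alpha * bounded
        return np.maximum(clean, floor * mag)   # spectral floor vs. musical noise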

472

Dysarthric Speech Database for Universal Access Research Heejin Kim1  

E-print Network

of dysarthric speech produced by 19 speakers with cerebral palsy. Speech materials consist of 765 isolated words ... secure ftp upon request. ... passage produced by each of six individuals with cerebral palsy. This database includes one normal

Hasegawa-Johnson, Mark

473

SWITCHBOARD: telephone speech corpus for research and development  

Microsoft Academic Search

SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition. About 2500 conversations by 500 speakers from around the US were collected automatically over T1 lines at Texas Instruments. Designed for training and testing of a variety of speech processing algorithms, especially in speaker

John J. Godfrey; Edward C. Holliman; Jane McDaniel

1992-01-01

474

Automated Assessment of Speech Fluency for L2 English Learners  

ERIC Educational Resources Information Center

This dissertation provides an automated scoring method of speech fluency for second language learners of English (L2 learners) that uses speech recognition technology. Non-standard pronunciation, frequent disfluencies, faulty grammar, and inappropriate lexical choices are crucial characteristics of L2 learners' speech. Due to the ease of…

Yoon, Su-Youn

2009-01-01

475

Breathing-Impaired Speech after Brain Haemorrhage: A Case Study  

ERIC Educational Resources Information Center

Results are presented from an auditory and acoustic analysis of the speech of an adult male with impaired prosody and articulation due to brain haemorrhage. They show marked effects on phonation, speech rate and articulator velocity, and a speech rhythm disrupted by "intrusive" stresses. These effects are discussed in relation to the speaker's…

Heselwood, Barry

2007-01-01

476

Using on-line altered auditory feedback treating Parkinsonian speech  

NASA Astrophysics Data System (ADS)

Patients with advanced Parkinson's disease tend to have dysarthric speech that is hesitant, accelerated, and repetitive, and that is often resistant to behavioral speech therapy. In this pilot study, the speech disturbances were treated using on-line altered feedback (AF) provided by SpeechEasy (SE), an in-the-ear device registered with the FDA for use in humans to treat chronic stuttering. Eight PD patients participated in the study. All had moderate to severe speech disturbances. In addition, two patients had moderate recurring stuttering at the onset of PD after long remission since adolescence, two had bilateral STN DBS, and two bilateral pallidal DBS. An effective combination of delayed auditory feedback and frequency-altered feedback was selected for each subject and provided via SE worn in one ear. All subjects produced speech samples (structured-monologue and reading) under three conditions: baseline, with SE without feedback, and with SE with feedback. The speech samples were randomly presented and rated for speech intelligibility using UPDRS-III item 18, and for speaking rate. The results indicated that SpeechEasy is well tolerated and AF can improve speech intelligibility in spontaneous speech. Further investigational use of this device for treating speech disorders in PD is warranted [Work partially supported by Janus Dev. Group, Inc.].

Wang, Emily; Verhagen, Leo; de Vries, Meinou H.

2005-09-01
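Of the two feedback alterations combined in the device, the delayed-auditory-feedback half reduces to a ring-buffer delay line; the frequency-altered half (a pitch shift) is omitted here because it cannot be done well in a few lines. A sketch with illustrative delay and sampling rate:

    import numpy as np

    class DelayedFeedback:
        # Delayed auditory feedback: return the talker's voice delay_ms late.
        def __init__(self, delay_ms=60.0, fs=16000):
            self.buf = np.zeros(int(fs * delay_ms / 1000.0))
            self.pos = 0

        def process(self, block):
            # Feed microphone samples in, get delayed samples out, block by block.
            out = np.empty_like(block)
            for i, x in enumerate(block):
                out[i] = self.buf[self.pos]   # written one buffer length ago
                self.buf[self.pos] = x
                self.pos = (self.pos + 1) % len(self.buf)
            return out

    # daf = DelayedFeedback(delay_ms=60)
    # earphone_block = daf.process(mic_block)   # routed back to the talker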

477

Cortical activity patterns predict robust speech discrimination ability in noise  

PubMed Central

The neural mechanisms that support speech discrimination in noisy conditions are poorly understood. In quiet conditions, spike timing information appears to be used in the discrimination of speech sounds. In this study, we evaluated the hypothesis that spike timing is also used to distinguish between speech sounds in noisy conditions that significantly degrade neural responses to speech sounds. We tested speech sound discrimination in rats and recorded primary auditory cortex (A1) responses to speech sounds in background noise of different intensities and spectral compositions. Our behavioral results indicate that rats, like humans, are able to accurately discriminate consonant sounds even in the presence of background noise that is as loud as the speech signal. Our neural recordings confirm that speech sounds evoke degraded but detectable responses in noise. Finally, we developed a novel neural classifier that mimics behavioral discrimination. The classifier discriminates between speech sounds by comparing the A1 spatiotemporal activity patterns evoked on single trials with the average spatiotemporal patterns evoked by known sounds. Unlike classifiers in most previous studies, this classifier is not provided with the stimulus onset time. Neural activity analyzed with the use of relative spike timing was well correlated with behavioral speech discrimination in quiet and in noise. Spike timing information integrated over longer intervals was required to accurately predict rat behavioral speech discrimination in noisy conditions. The similarity of neural and behavioral discrimination of speech in noise suggests that humans and rats may employ similar brain mechanisms to solve this problem. PMID:22098331

Shetake, Jai A.; Wolf, Jordan T.; Cheung, Ryan J.; Engineer, Crystal T.; Ram, Satyananda K.; Kilgard, Michael P.

2012-01-01
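The classifier's comparison step can be pictured as template matching over spatiotemporal response matrices. A simplified sketch: this toy version assumes the trial and templates are already time-aligned, whereas the paper's classifier pointedly does not use the stimulus onset time:

    import numpy as np

    def classify_trial(trial, templates):
        # trial: (neurons, time bins) single-trial A1 activity.
        # templates: sound name -> average (neurons, time bins) pattern.
        dists = {name: np.linalg.norm(trial - t) for name, t in templates.items()}
        return min(dists, key=dists.get)     # closest average pattern wins

    # templates = {'da': mean_da, 'ga': mean_ga}
    # label = classify_trial(single_trial, templates)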

478

The Prevalence of Speech Disorders among University Students in Jordan  

ERIC Educational Resources Information Center

Problem: There are no available studies on the prevalence, and distribution of speech disorders among Arabic speaking undergraduate students in Jordan. Method: A convenience sample of 400 undergraduate students at the University of Jordan was screened for speech disorders. Two spontaneous speech samples and an oral reading of a passage were…

Alaraifi, Jehad Ahmad; Amayreh, Mousa Mohammad; Saleh, Mohammad Yusef

2014-01-01

479

Neural speech enhancement using dual extended Kalman filtering  

Microsoft Academic Search

The removal of noise from speech signals has applications ranging from speech enhancement for cellular communications, to front ends for speech recognition systems. Spectral techniques are commonly used in these applications, but frequently result in audible distortion of the signal. A nonlinear time-domain method called dual extended Kalman filtering (DEKF) is presented that demonstrates significant advantages for removing nonstationary and

A. T. Nelson; E. A. Wan

1997-01-01
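The dual-estimation idea can be shown in the scalar case: one Kalman filter tracks the clean signal under an AR(1) model while a second tracks the AR coefficient, each using the other's latest estimate. This is a toy linear sketch; the actual DEKF uses extended Kalman filters over much higher-order neural/AR models:

    import numpy as np

    def dual_kalman(y, q=1e-3, r=1e-1, qa=1e-5):
        # Observations y[k] = x[k] + v[k]; hidden state x[k] = a * x[k-1] + w[k].
        x, p = 0.0, 1.0          # state estimate and its variance
        a, pa = 0.9, 1.0         # coefficient estimate and its variance
        xs, cs = [], []
        for yk in y:
            x_prev = x
            # State filter (coefficient frozen at current estimate a):
            x_pred, p_pred = a * x, a * a * p + q
            k = p_pred / (p_pred + r)
            x = x_pred + k * (yk - x_pred)
            p = (1.0 - k) * p_pred
            # Weight filter (state frozen at previous estimate x_prev):
            h = x_prev                        # sensitivity of a * x_prev w.r.t. a
            pa += qa
            ka = pa * h / (h * h * pa + r)
            a += ka * (yk - a * x_prev)
            pa = (1.0 - ka * h) * pa
            xs.append(x); cs.append(a)
        return np.array(xs), np.array(cs)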

480

Neural Speech Enhancement Using Dual Extended Kalman Filtering  

Microsoft Academic Search

The removal of noise from speech signals has applications ranging from speech enhancement for cellular communications, to front ends for speech recognition systems. Spectral techniques are commonly used in these applications, but frequently result in audible distortion of the signal. A nonlinear time-domain method called dual extended Kalman filtering (DEKF) is presented that demonstrates

Alex T. Nelson; Eric A. Wan

1997-01-01

481

Multilingual Speech Processing Activities in Quaero: Application to Multimedia  

E-print Network

transcription, speaker diarization, language recognition, spoken language translation, and the detection ... [Figure 1: a speech processing chain covering the signal, speaker diarization, language identification, transcription, and speech translation.] ... research aims to substantially improve the state-of-the-art in speech-to-text transcription, speaker

482

Human phoneme recognition depending on speech-intrinsic variability

E-print Network

for a variety of speaking rates, different regional accents, and different vocal effort of the received speech ... speaker, gender, speech rate, vocal effort, regional accents, and speaking style. Various methods ... The influence of different sources of speech-intrinsic variation (speaking rate, effort

Meyer, Bernd T.

483

The Self-Organization of Speech Sounds

E-print Network

that has properties similar to the human speech code. This result relies on the self-organizing properties ... how self-organization might have helped natural selection to find speech. Key words: origins

484

Motor Profile of Children With Developmental Speech and Language Disorders  

Microsoft Academic Search

OBJECTIVES. The purpose of this study was to investigate the motor profile of 125 children with developmental speech and language disorders and to test for differences, if any, in motor profile among subgroups of children with developmental speech and language disorders. METHODS. The participants were 125 children with developmental speech and language disorders aged 6 to 9 years from 2

Chris Visscher; Suzanne Houwen; Erik J. A. Scherder; Ben Moolenaar; Esther Hartman

2010-01-01

485

Use of Computer Speech Technologies To Enhance Learning.  

ERIC Educational Resources Information Center

Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)

Ferrell, Joe

1999-01-01

486

Index to NASA news releases and speeches, 1990  

NASA Technical Reports Server (NTRS)

This issue of the annual Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of Headquarters staff during 1990. The index is arranged in six sections: Subject Index, Personal Names Index, News Release Number Index, Accession Number Index, Speeches, and News Releases.

1991-01-01

487

The Tuning of Human Neonates' Preference for Speech  

ERIC Educational Resources Information Center

Human neonates prefer listening to speech compared to many nonspeech sounds, suggesting that humans are born with a bias for speech. However, neonates' preference may derive from properties of speech that are not unique but instead are shared with the vocalizations of other species. To test this, thirty neonates and sixteen 3-month-olds were…

Vouloumanos, Athena; Hauser, Marc D.; Werker, Janet F.; Martin, Alia

2010-01-01

488

Speech-Perception-in-Noise Deficits in Dyslexia  

ERIC Educational Resources Information Center

Speech perception deficits in developmental dyslexia were investigated in quiet and various noise conditions. Dyslexics exhibited clear speech perception deficits in noise but not in silence. "Place-of-articulation" was more affected than "voicing" or "manner-of-articulation." Speech-perception-in-noise deficits persisted when performance of…

Ziegler, Johannes C.; Pech-Georgel, Catherine; George, Florence; Lorenzi, Christian

2009-01-01

489

When speech sounds like music.  

PubMed

Repetition can boost memory and perception. However, repeating the same stimulus several times in immediate succession also induces intriguing perceptual transformations and illusions. Here, we investigate the Speech to Song Transformation (S2ST), a massed repetition effect in the auditory modality, which crosses the boundaries between language and music. In the S2ST, a phrase repeated several times shifts to being heard as sung. To better understand this unique cross-domain transformation, we examined the perceptual determinants of the S2ST, in particular the role of acoustics. In 2 experiments, the effects of 2 pitch properties and 3 rhythmic properties on the probability and speed of occurrence of the transformation were examined. Results showed that both pitch and rhythmic properties are key features fostering the transformation. However, some properties proved to be more conducive to the S2ST than others. Stable tonal targets that allowed for the perception of a musical melody led more often and quickly to the S2ST than scalar intervals. Recurring durational contrasts arising from segmental grouping favoring a metrical interpretation of the stimulus also facilitated the S2ST. This was, however, not the case for a regular beat structure within and across repetitions. In addition, individual perceptual abilities made it possible to predict the likelihood of the S2ST. Overall, the study demonstrated that repetition enables listeners to reinterpret specific prosodic features of spoken utterances in terms of musical structures. The findings underline a tight link between language and music, but they also reveal important differences in communicative functions of prosodic structure in the 2 domains. PMID:24911013

Falk, Simone; Rathcke, Tamara; Dalla Bella, Simone

2014-08-01

490

Template based low data rate speech encoder  

NASA Astrophysics Data System (ADS)

The 2400-b/s linear predictive coder (LPC) is currently being widely deployed to support tactical voice communication over narrowband channels. However, there is a need for lower-data-rate voice encoders for special applications: improved performance in high bit-error conditions, low-probability-of-intercept (LPI) voice communication, and narrowband integrated voice/data systems. An 800-b/s voice encoding algorithm is presented which is an extension of the 2400-b/s LPC. To construct template tables, speech samples of 420 speakers uttering 8 sentences each were excerpted from the Texas Instruments/Massachusetts Institute of Technology (TIMIT) Acoustic-Phonetic Speech Data Base. Speech intelligibility of the 800-b/s voice encoding algorithm measured by the diagnostic rhyme test (DRT) is 91.5 for three male speakers. This score compares favorably with the 2400-b/s LPC of a few years ago.

Fransen, Lawrence

1993-09-01

491

Predictive trellis coded quantization of speech  

NASA Astrophysics Data System (ADS)

Trellis-coded quantization (TCQ) is incorporated into a predictive coding structure for encoding sampled speech. The modest complexity of the resulting structure is seen to be a direct consequence of the TCQ formulation. Simulation results are presented for systems using fixed-prediction/fixed-residual encoding, fixed-prediction/adaptive-residual encoding, and adaptive-prediction/adaptive-residual encoding. The performance of predictive TCQ (PTCQ) is compared to that of other waveform coders, and the effects of channel errors on PTCQ performance are discussed. For a fully adaptive 16-kb/s speech coding system, segmental signal-to-noise ratios in the range of 19.1-21.9 dB are obtained for a variety of speakers and test sentences. Reconstructed speech obtained from this system is of excellent communication quality.

Marcellin, Michael W.; Fischer, Thomas R.; Gibson, Jerry D.

1990-01-01

492

Rehabilitation of impaired speech function (dysarthria, dysglossia)  

PubMed Central

Speech disorders can result (1) from sensorimotor impairments of articulatory movements = dysarthria, or (2) from structural changes of the speech organs, in adults particularly after surgical and radiochemical treatment of tumors = dysglossia. The decrease of intelligibility, a reduced vocal stamina, the stigmatization of a conspicuous voice and manner of speech, the reduction of emotional expressivity all mean greatly diminished quality of life, restricted career opportunities and diminished social contacts. Intensive therapy based on the pathophysiological facts is absolutely essential: Functional exercise therapy plays a central role; according to symptoms and their progression it can be complemented with prosthetic and surgical approaches. In severe cases communicational aids have to be used. All rehabilitation measures have to take account of frequently associated disorders of body motor control and/or impairment of cognition and behaviour. PMID:22073063

Schröter-Morasch, Heidrun; Ziegler, Wolfram

2005-01-01

493

Speech recognition in advanced rotorcraft - Using speech controls to reduce manual control overload  

NASA Technical Reports Server (NTRS)

An experiment has been conducted to ascertain the usefulness of helicopter pilot speech controls and their effect on time-sharing performance, under the impetus of multiple-resource theories of attention which predict that time-sharing should be more efficient with mixed manual and speech controls than with all-manual ones. The test simulation involved an advanced, single-pilot scout/attack helicopter. Performance and subjective workload levels obtained supported the claimed utility of speech recognition-based controls; specifically, time-sharing performance was improved while preparing a data-burst transmission of information during helicopter hover.

Vidulich, Michael A.; Bortolussi, Michael R.

1988-01-01

494

Significance of past statements: speech act theory.  

PubMed

In W v M, a judge concluded that M's past statements should not be given weight in a best interests assessment. Several commentators in the ethics literature have argued this approach ignored M's autonomy. In this short article I demonstrate how the basic tenets of speech act theory can be used to challenge the inherent assumption that past statements represent an individual's beliefs, choices or decisions. I conclude that speech act theory, as a conceptual tool, has a valuable contribution to make to this debate. PMID:23632009

Gordon, Joanne

2013-09-01

495

Longitudinal Study of Speech Perception, Speech, and Language for Children with Hearing Loss in an Auditory-Verbal Therapy Program  

ERIC Educational Resources Information Center

This study examined the speech perception, speech, and language developmental progress of 25 children with hearing loss (mean Pure-Tone Average [PTA] 79.37 dB HL) in an auditory verbal therapy program. Children were tested initially and then 21 months later on a battery of assessments. The speech and language results over time were compared with…

Dornan, Dimity; Hickson, Louise; Murdoch, Bruce; Houston, Todd

2009-01-01

496

The effects of hearing loss on the contribution of high- and low-frequency speech information to speech understanding  

Microsoft Academic Search

The speech understanding of persons with "flat" hearing loss (HI) was compared to a normal-hearing (NH) control group to examine how hearing loss affects the contribution of speech information in various frequency regions. Speech understanding in noise was assessed at multiple low- and high-pass filter cutoff frequencies. Noise levels were chosen to ensure that the noise, rather than quiet thresholds,

Benjamin W. Y. Hornsby; Todd A. Ricketts

2003-01-01

497

Improved Speech-to-Text Translation with the Fisher and Callhome SpanishEnglish Speech Translation Corpus  

E-print Network

of speech-to-text transcription and text-to-text translation varying wildly across a number of dimensions ... speech-to-text translation, assembling a four-way parallel dataset of audio, transcriptions, ASR output, and translations

Lopez, Adam

498

Effects of interior aircraft noise on speech intelligibility and annoyance  

NASA Technical Reports Server (NTRS)

Recordings of the aircraft ambiance from ten different types of aircraft were used in conjunction with four distinct speech interference tests as stimuli to determine the effects of interior aircraft background levels and speech intelligibility on perceived annoyance in 36 subjects. Both speech intelligibility and background level significantly affected judged annoyance. However, the interaction between the two variables showed that above an 85 dB background level the speech intelligibility results had a minimal effect on annoyance ratings. Below this level, people rated the background as less annoying if there was adequate speech intelligibility.

Pearsons, K. S.; Bennett, R. L.

1977-01-01

499

Enhancing the magnitude spectrum of speech features for robust speech recognition  

NASA Astrophysics Data System (ADS)

In this article, we present an effective compensation scheme to improve noise robustness for the spectra of speech signals. In this compensation scheme, called magnitude spectrum enhancement (MSE), a voice activity detection (VAD) process is performed on the frame sequence of the utterance. The magnitude spectra of non-speech frames are then reduced while those of speech frames are amplified. In experiments conducted on the Aurora-2 noisy digits database, MSE achieves an error reduction rate of nearly 42% relative to baseline processing. This method outperforms well-known spectral-domain speech enhancement techniques, including spectral subtraction (SS) and Wiener filtering (WF). In addition, the proposed MSE can be integrated with cepstral-domain robustness methods, such as mean and variance normalization (MVN) and histogram normalization (HEQ), to achieve further improvements in recognition accuracy under noise-corrupted environments.

Hung, Jeih-weih; Fan, Hao-teng; Tu, Wen-hsiang

2012-12-01
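A rough sketch of the MSE scheme as summarized above: a frame-energy VAD splits the utterance into speech and non-speech frames, whose magnitude spectra are then amplified or attenuated before resynthesis. The VAD rule, gains, and frame size are illustrative placeholders for the paper's actual choices:

    import numpy as np
    from scipy.signal import stft, istft

    def magnitude_spectrum_enhance(x, fs=8000, boost=1.2, cut=0.5):
        f, t, X = stft(x, fs=fs, nperseg=256)
        mag, phase = np.abs(X), np.angle(X)
        energy = (mag ** 2).sum(axis=0)
        is_speech = energy > energy.mean()       # crude energy-threshold VAD
        gain = np.where(is_speech, boost, cut)   # amplify speech, reduce the rest
        Y = mag * gain[None, :] * np.exp(1j * phase)
        _, y = istft(Y, fs=fs, nperseg=256)
        return y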

500

Real-time audiovisual speech capture and motion tracking for speech-driven facial animation  

E-print Network

Currently, some methods for implementing facial animation systems are based on a direct subphonemic mapping of speech acoustics onto orofacial motion. Although these systems provide all of the necessary components for the detection of facial...

Jablonski, Karl Adam

2013-02-22