These are representative sample records from Science.gov related to your search topic.
For comprehensive and current results, perform a real-time search at Science.gov.
1

Symbolic Speech  

ERIC Educational Resources Information Center

The concept of symbolic speech emanates from the 1968 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in a different stratum from verbal communication. (LBH)

Podgor, Ellen S.

1976-01-01

2

Speech coding  

SciTech Connect

Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal being corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because the digital signal can be faithfully regenerated at each repeater purely on the basis of a binary decision; the end-to-end performance of the digital link thus becomes essentially independent of the length and operating frequency bands of the link. From a transmission point of view, digital transmission has therefore been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service-provision point of view as well. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding; it is more generic in the sense that the coding techniques are equally applicable to any voice signal, whether or not it carries any intelligible information, as the term speech implies. Other terms that are commonly used are speech compression and voice compression, since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or equivalently the bandwidth) and/or reduce storage requirements. In this document the terms speech and voice are used interchangeably.

Ravishankar, C., Hughes Network Systems, Germantown, MD

1998-05-08
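
The waveform-versus-parametric distinction this abstract draws can be made concrete with a small example. The sketch below shows the simplest waveform-coding approach, mu-law companding as used in 64 kbit/s G.711 telephony; it is an illustration of the general idea, not anything from the record itself.

```python
# Minimal sketch of waveform speech coding via mu-law companding,
# the principle behind G.711 telephony codecs. Illustrative only.
import numpy as np

MU = 255.0  # standard mu-law parameter for 8-bit telephony

def mulaw_encode(x: np.ndarray) -> np.ndarray:
    """Compress samples in [-1, 1] to 8-bit codes."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1) / 2 * 255).astype(np.uint8)

def mulaw_decode(codes: np.ndarray) -> np.ndarray:
    """Expand 8-bit codes back to samples in [-1, 1]."""
    y = codes.astype(np.float64) / 255 * 2 - 1
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

# A 100 Hz tone sampled at 8 kHz, coded at 8 bits per sample (64 kbit/s).
t = np.arange(8000) / 8000.0
signal = 0.5 * np.sin(2 * np.pi * 100 * t)
restored = mulaw_decode(mulaw_encode(signal))
print("max reconstruction error:", np.max(np.abs(signal - restored)))
```

Parametric coders, by contrast, transmit analysis parameters (for example the LPC coefficients sketched under record 70 below) rather than the waveform itself.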

3

Speech Technologies for Language Learning.  

ERIC Educational Resources Information Center

Provides a description of the following speech technologies for language learning: recorded speech (from analog to digital); speech recognition; speech synthesis; multilingual speech to speech; speech on the Web. A resource list is also provided. (Author/VWL)

Goodwin-Jones, Bob

2000-01-01

4

Speech Problems  

MedlinePLUS

... If you're in your teens and still stuttering, though, you may not feel like it's so ... million Americans have the speech disorder known as stuttering (or stammering, as it's known in Britain). It's ...

5

Speech and Language Impairments  

MedlinePLUS

... speech or language impairment will need speech-language pathology services . This related service is defined by IDEA as follows: (15) Speech-language pathology services includes— (i) Identification of children with speech ...

6

Does Freedom of Speech Include Hate Speech?  

Microsoft Academic Search

I take it that liberal justice recognises special protections against the restriction of speech and expression; this is what I call the Free Speech Principle. I ask if this Principle includes speech acts which might broadly be termed 'hate speech', where 'includes' is sensitive to the distinction between coverage and protection, and between speech that is regulable and speech that

Caleb Yong

7

Free Speech Yearbook: 1972.  

ERIC Educational Resources Information Center

This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…

Tedford, Thomas L., Ed.

8

Great American Speeches  

NSDL National Science Digital Library

Watch the video presentations of each of these speeches: the Gettysburg Address; Martin Luther King's "I Have a Dream"; "Freedom of Speech" by Mario Savio; and FDR's "new worker plan" speech. For manuscripts, audio, and video of many other modern and past speeches, follow the link below: American Speech Bank ...

Olsen, Ms.

2006-11-14

9

Speech Research  

NASA Astrophysics Data System (ADS)

Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.

10

Speech analyzer  

NASA Technical Reports Server (NTRS)

A speech signal is analyzed by applying the signal to formant filters which derive first, second, and third signals respectively representing the frequency of the speech waveform in the first, second, and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.

Lokerson, D. C. (inventor)

1977-01-01
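
As a rough illustration of the zero-crossing principle described in the patent abstract above, the sketch below band-limits a signal to a formant region and converts the zero-crossing count of the band signal into a frequency estimate. The band edges and the synthetic test signal are assumptions made for illustration, not the patent's actual filter design.

```python
# Estimate a formant frequency from zero crossings of a band-limited
# signal: the dominant component produces two crossings per cycle.
import numpy as np
from scipy.signal import butter, lfilter

FS = 8000  # sample rate in Hz

def formant_freq_by_zero_crossings(x, lo, hi):
    """Dominant frequency of x inside the band [lo, hi] Hz."""
    b, a = butter(4, [lo / (FS / 2), hi / (FS / 2)], btype="band")
    band = lfilter(b, a, x)
    crossings = np.sum(np.signbit(band[:-1]) != np.signbit(band[1:]))
    return crossings / (2 * (len(x) / FS))  # crossings per second / 2

# A synthetic "vowel" with components near typical F1 and F2 values.
t = np.arange(FS) / FS
x = np.sin(2 * np.pi * 700 * t) + 0.6 * np.sin(2 * np.pi * 1200 * t)
print(formant_freq_by_zero_crossings(x, 300, 900))   # close to 700
print(formant_freq_by_zero_crossings(x, 900, 1800))  # close to 1200
```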

11

78 FR 49717 - Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services...  

Federal Register 2010, 2011, 2012, 2013

...FCC 13-101] Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech...may be filed electronically using the Internet by accessing the Commission's Electronic...Commission's Speech-to-Speech and Internet Protocol (IP)...

2013-08-15

12

Speech Intelligibility  

NASA Astrophysics Data System (ADS)

Speech intelligibility (SI) is important for different fields of research, engineering, and diagnostics as a way to quantify very different phenomena: the quality of recording, communication, and playback devices; the reverberation of auditoria; the characteristics of hearing impairment; the benefit of using hearing aids; or combinations of these.

Brand, Thomas

13

Japanese speech databases for robust speech recognition  

Microsoft Academic Search

At ATR, a next-generation speech translation system is under development, aimed at natural trans-language communication. To cope with the various requirements the new system places on speech recognition technology, further research efforts should emphasize robustness to large vocabularies, to the speaking variations often found in fast spontaneous speech, and to speaker variance. These are key problems to be solved not only for speech

Atsushi Nakamura; Shoichi Matsunaga; Tohru Shimizu; Masahiro Tonomura; Yoshinori Sagisaka

1996-01-01

14

Indirect Speech Acts  

Microsoft Academic Search

In this paper, we address several puzzles concerning speech acts, particularly indirect speech acts. We show how a formal semantic theory of discourse interpretation can be used to define speech acts and to avoid murky issues concerning the metaphysics of action. We provide a formally precise definition of indirect speech acts, including the subclass of so-called conventionalized indirect speech acts.

Nicholas Asher; Alex Lascarides

15

SPEECH ENHANCEMENT FOR NOISE-ROBUST SPEECH RECOGNITION  

E-print Network

Vikramjit Mitra and Carol Y. Espy-Wilson. Poster on speech enhancement for noise-robust speech recognition and automated transcription, reporting speech recognition accuracy for speech corrupted by speech-shaped noise; the proposed scheme is found to increase accuracy.

Shapiro, Benjamin

16

SPEECH PARAMETERIZATION FOR AUTOMATIC SPEECH RECOGNITION IN NOISY CONDITIONS  

E-print Network

Bojana Gajić. Addresses improving the robustness of automatic speech recognition (ASR) systems against additive background noise by finding speech parameters that are insensitive to such noises. From the introduction: state-of-the-art automatic speech recognition (ASR) systems are capable

17

Speech disorders - children  

MedlinePLUS

Articulation deficiency; Voice disorders; Vocal disorders; Disfluency; Communication disorder - speech disorder ... together to create speech. See also: Phonological disorders Voice disorders are caused by problems when air passes ...

18

Speech research  

NASA Astrophysics Data System (ADS)

Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.

1992-06-01

19

Speech Recognition Technology for Dysarthric Speech  

E-print Network

Abstract: The initial results of investigations into the use of current commercial automatic speech recognition (ASR) software by people with a speech disability (dysarthria) are presented, together with a brief summary of the history of the development of ASR and its applications for the disabled. Results confirm the viability of dysarthric use, and identify areas of further investigation for improved recognition performance and for the development of a clinical tool for speech measurement. Key words: software, speech, recognition, dysarthria

Peter E. Roberts, CEng

20

Speech recognition and understanding  

SciTech Connect

This article discusses the automatic processing of speech signals with the aim of finding a sequence of words (speech recognition) or a concept (speech understanding) being transmitted by the speech signal. The goal of the research is to develop an automatic typewriter that will automatically edit and type text under voice control. A dynamic programming method is proposed in which all possible class signals are stored, after which the presented signal is compared to all the stored signals during the recognition phase. Topics considered include element-by-element recognition of words of speech, learning speech recognition, phoneme-by-phoneme speech recognition, the recognition of connected speech, understanding connected speech, and prospects for designing speech recognition and understanding systems. An application of the composition dynamic programming method for the solution of basic problems in the recognition and understanding of speech is presented.

Vintsyuk, T.K.

1983-05-01
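
The dynamic programming method this record describes, comparing a presented signal against stored class signals, survives today as dynamic time warping (DTW) template matching. Below is a minimal sketch of that stored-template comparison with placeholder feature frames; it makes no claim to reproduce Vintsyuk's actual composition method.

```python
# Dynamic time warping (DTW) template matching: the presented utterance
# is compared to every stored reference, and the closest reference wins.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Length-normalized DTW alignment cost between two frame sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def recognize(utterance, templates):
    """Label of the stored template closest to the utterance under DTW."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

# Toy vocabulary: random frames stand in for real acoustic features.
rng = np.random.default_rng(0)
templates = {"yes": rng.normal(size=(30, 12)), "no": rng.normal(size=(25, 12))}
utterance = templates["yes"] + 0.1 * rng.normal(size=(30, 12))
print(recognize(utterance, templates))  # -> "yes"
```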

21

Monaural speech segregation using synthetic speech signals.  

PubMed

When listening to natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues that listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features that listeners normally use to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). The listeners performed worse for all three types of synthetic signals than they did with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. Of the three synthetic signals, the results indicate that SWS signals preserve more of the voice characteristics used for speech segregation than MNB and MSB signals. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and thus are likely to have difficulty understanding speech in cocktail-party listening environments. PMID:16642846

Brungart, Douglas S; Iyer, Nandini; Simpson, Brian D

2006-04-01
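
Sine-wave speech (SWS), the first of the three synthetic signals tested above, reduces an utterance to a few sinusoids that follow the formant tracks. A sketch of the synthesis step follows; the formant tracks here are invented for illustration, whereas real SWS derives them from natural speech.

```python
# Sine-wave-speech-style synthesis: sum a few sinusoids whose
# frequencies follow (here, fabricated) formant tracks.
import numpy as np

FS = 16000
t = np.arange(int(0.5 * FS)) / FS  # half a second

# Placeholder formant tracks in Hz: a vowel-like glide for F1 and F2.
f1 = np.linspace(300, 700, t.size)
f2 = np.linspace(2200, 1100, t.size)
f3 = np.full(t.size, 2600.0)

def tone(freqs, amp):
    # Integrate instantaneous frequency to obtain a smooth phase.
    phase = 2 * np.pi * np.cumsum(freqs) / FS
    return amp * np.sin(phase)

sws = tone(f1, 1.0) + tone(f2, 0.5) + tone(f3, 0.25)
sws /= np.max(np.abs(sws))  # normalize before playback or saving
```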

22

Speech Recognition in Machines

E-print Network

Over the past several decades, machines have been developed to recognize human speech (speech recognition systems). We concentrate on speech recognition systems in this section. Speech recognition by machine refers to the capability of a machine to convert human speech to a textual

Liebling, Michael

23

Challenges in Speech Synthesis  

Microsoft Academic Search

Similar to other speech- and language-processing disciplines such as speech recognition or machine translation, speech synthesis, the artificial production of human-like speech, has become very powerful over the last 10 years.

David Suendermann; Harald Höge; Alan Black

24

Models of speech synthesis.  

PubMed Central

The term "speech synthesis" has been used for diverse technical approaches. In this paper, some of the approaches used to generate synthetic speech in a text-to-speech system are reviewed, and some of the basic motivations for choosing one method over another are discussed. It is important to keep in mind, however, that speech synthesis models are needed not just for speech generation but to help us understand how speech is created, or even how articulation can explain language structure. General issues such as the synthesis of different voices, accents, and multiple languages are discussed as special challenges facing the speech synthesis community. PMID:7479805

Carlson, R

1995-01-01

25

Intelligibility of children's speech in digitized speech.  

PubMed

The current investigation examined the intelligibility of digitized speech recorded from typically developing child speakers, ages 4, 5, 6, and 7 years, and reproduced on an augmentative and alternative communication (AAC) device with digitized speech output. The study used a between group design. Forty adults were asked to transcribe 120 words spoken by child speakers in one of the age groups, and presented on an AAC device with digitized speech output. The dependent variable was intelligibility (percent of words correctly identified) of the children's speech. Overall, the intelligibility of children's speech increases with the age of the child speaker. However, there was a lot of individual variation in the intelligibility of children's voices. There was no clear cut-off age, although the speech of some young children may not be sufficiently intelligible on an AAC device that uses digitized speech. Clinicians and parents choosing child speakers for AAC devices with digitized speech are cautioned to carefully consider the speakers used for recording digitized speech output and the characteristics of the speech of the individual speaker. Future research directions are discussed. PMID:22946993

Drager, Kathryn D R; Finke, Erinn H

2012-09-01

26

Speech research directions  

SciTech Connect

This paper presents an overview of the current activities in speech research. The authors discuss the state of the art in speech coding, text-to-speech synthesis, speech recognition, and speaker recognition. In the speech coding area, current algorithms perform well at bit rates down to 9.6 kb/s, and the research is directed at bringing the rate for high-quality speech coding down to 2.4 kb/s. In text-to-speech synthesis, what we currently are able to produce is very intelligible but not yet completely natural. Current research aims at providing higher quality and intelligibility to the synthetic speech that these systems produce. Finally, today's systems for speech and speaker recognition provide excellent performance on limited tasks; i.e., limited vocabulary, modest syntax, small talker populations, constrained inputs, etc.

Atal, B.S.; Rabiner, L.R.

1986-09-01

27

Cochlear implant speech recognition with speech maskers  

Microsoft Academic Search

Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners,

Ginger S. Stickney; Fan-Gang Zeng; Ruth Litovsky; Peter Assmann

2004-01-01

28

ROBUST SPEECH RECOGNITION USING MULTIPLE PRIOR MODELS FOR SPEECH RECONSTRUCTION  

E-print Network

ROBUST SPEECH RECOGNITION USING MULTIPLE PRIOR MODELS FOR SPEECH RECONSTRUCTION Arun Narayanan speech recognition to enhance noisy speech. Typically, a single prior model is trained by pooling to reconstruct noisy speech. Significant improvements are obtained on the Aurora-4 robust speech recognition task

Wang, DeLiang "Leon"

29

Silence, speech, and responsibility  

E-print Network

Pornography deserves special protections, it is often said, because it qualifies as speech; therefore, no matter what we think of it, we must afford it the protections that we extend to most speech, but don't extend to ...

Maitra, Ishani, 1974-

2002-01-01

30

Speech Intelligibility After Glossectomy and Speech Rehabilitation  

Microsoft Academic Search

Background: Oral tumor resections cause articulation deficiencies, depending on the site, extent of resection, type of reconstruction, and tongue stump mobility. Objectives: To evaluate the speech intelligibility of patients undergoing total, subtotal, or partial glossectomy, before and after speech therapy. Patients and Methods: Twenty-seven patients (24 men and 3 women), aged 34 to 77 years (mean age, 56.5

Cristina L. B. Furia; Luiz P. Kowalski; Maria R. D. O. Latorre; Elisabete C. Angelis; Nivia M. S. Martins; Ana P. B. Barros; Karina C. B. Ribeiro

2001-01-01

31

Automaticity in Speech Perception: Some Speech/Nonspeech Comparisons  

Microsoft Academic Search

Three experiments used sine-wave replicas of speech sounds to explore some differences between speech perception and general auditory perception. The experiments compared patterns of behavior in categorization and discrimination tasks for listeners reporting either speech or nonspeech percepts of sine-wave replicas of speech. We hypothesized that the perception of speech sounds is automatized, while the perception of less familiar sounds

Keith Johnson; James V. Ralston

1994-01-01

32

EFFECT OF SPEECH CODERS ON SPEECH RECOGNITION PERFORMANCE  

E-print Network

B. T. Lilly and K. K. Paliwal. This paper presents the results of a study examining the effects of speech coders, at rates up to 40 kbit/s, on speech recognition performance, using two different speech recognition systems: 1) isolated word recognition and 2)

33

Electronic commerce and free speech  

Microsoft Academic Search

For commercial purveyors of digital speech, information and entertainment, the biggest threat posed by the Internet isn't the threat of piracy, but the threat posed by free speech -- speech that doesn't cost any money. Free speech has the potential to squeeze out expensive speech. A glut of high quality free stuff has the potential to run companies in the

Jessica Litman

1999-01-01

34

Speech Recognition: A General Overview.  

ERIC Educational Resources Information Center

Speech recognition is one of five main areas in the field of speech processing. Difficulties in speech recognition include variability in sound within and across speakers, in channel, in background noise, and of speech production. Speech recognition can be used in a variety of situations: to perform query operations and phone call transfers; for…

de Sopena, Luis

35

Early recognition of speech  

PubMed Central

Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; is fast, unlearned, and nonsymbolic; is indifferent to short-term auditory properties; and requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454

Remez, Robert E; Thomas, Emily F

2013-01-01

36

Multimodal Speech Synthesis  

Microsoft Academic Search

Speech output generation in the SmartKom system is realized by a corpus-based unit selection strategy that preserves many properties of the human voice. When the system's avatar "Smartakus" is present on the screen, the synthetic speech signal is temporally synchronized with Smartakus's visible speech gestures and prosodically adjusted to his pointing gestures to enhance multimodal communication. The unit selection voice

Antje Schweitzer; Norbert Braunschweiler; Grzegorz Dogil; Tanja Klankert; Bernd Möbius; Gregor Möhler; Edmilson Morais; Bettina Säuberlich; Matthias Thomae

37

Speech input and output  

NASA Astrophysics Data System (ADS)

Possibilities for acoustical dialogs with electronic data processing equipment were investigated. Speech recognition is posed as recognizing word groups. An economical, multistage classifier for word string segmentation is presented and its reliability in dealing with continuous speech (problems of temporal normalization and context) is discussed. Speech synthesis is considered in terms of German linguistics and phonetics. Preprocessing algorithms for total synthesis of written texts were developed. A macrolanguage, MUSTER, is used to implement this processing in an acoustic data information system (ADES).

Class, F.; Mangold, H.; Stall, D.; Zelinski, R.

1981-12-01

38

Analyzing a Famous Speech  

NSDL National Science Digital Library

After gaining skill through analyzing a historic and contemporary speech as a class, students will select a famous speech from a list compiled from several resources and write an essay that identifies and explains the rhetorical strategies that the author deliberately chose while crafting the text to make an effective argument. Their analysis will consider questions such as: What makes the speech an argument?, How did the author's rhetoric evoke a response from the audience?, and Why are the words still venerated today?

Noel, Melissa W.

2012-08-01

39

STUDENTS AND FREE SPEECH  

NSDL National Science Digital Library

Free speech is a constitutional right, correct? What about in school? The US Constitution protects everyone, young or old, big or small. As Horton said, "A person is a person no matter how small." Yet does that mean people can say whatever they want, whenever they want? Does the right to free speech give ...

Amsden

2013-04-22

40

Chief Seattle's Speech Revisited  

ERIC Educational Resources Information Center

Indian orators have been saying good-bye for more than three hundred years. John Eliot's "Dying Speeches of Several Indians" (1685), as David Murray notes, inaugurates a long textual history in which "Indians... are most useful dying," or, as in a number of speeches, bidding the world farewell as they embrace an undesired but apparently inevitable…

Krupat, Arnold

2011-01-01

41

Regulation of Hate Speech  

Microsoft Academic Search

Facing an increase in hate speech incidents on campus and in society at large, egalitarians have made great efforts to advocate hate speech regulation (where there is no regulation) or to defend it (where there is regulation). Meanwhile, civil libertarians have counter-argued forcefully. This paper is designed to offer an internal critique of various egalitarian arguments. Part I is the introduction.

Haiping Deng

2004-01-01

42

Private Speech in Ballet  

ERIC Educational Resources Information Center

Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…

Johnston, Dale

2006-01-01

43

Speech Acts On Trial  

Microsoft Academic Search

In this document we discuss the general applicability of the speech act theory, as a theoretical foundation in the design of information technology (IT). We pay special attention to the acclimatization that speech act theory has undergone when applied in the IT-field. One of the questions we address concerns what happens when we import passive descriptive theories from other disciplines

Jan Ljungberg; Peter Holm

1996-01-01

44

Pauses in Deceptive Speech  

Microsoft Academic Search

We use a corpus of spontaneous interview speech to investigate the relationship between the distributional and prosodic characteristics of silent and filled pauses and the intent of an interviewee to deceive an interviewer. Our data suggest that the use of pauses correlates more with truthful than with deceptive speech, and that prosodic features extracted from filled pauses themselves as well

Stefan Benus; Frank Enos; Julia Hirschberg; Elizabeth Shriberg

2006-01-01

45

Combined Gesture-Speech Analysis and Speech Driven Gesture Synthesis  

Microsoft Academic Search

Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state of the art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modeling of head,

Mehmet Emre Sargin; Oya Aran; Alexey Karpov; Ferda Ofli; Yelena Yasinnik; Stephen Wilson; Engin Erzin; Yücel Yemez; A. Murat Tekalp

2006-01-01

46

Towards Efficient Human Machine Speech Communication: The Speech Graffiti Project  

E-print Network

Stefanie Tomko. This research investigates the design and performance of the Speech Graffiti interface for spoken interaction with simple machines. Speech Graffiti is a standardized interface designed to address issues

Rosenfeld, Roni

47

Robust speech recognition by integrating speech separation and hypothesis testing  

E-print Network

Soundararajan Srinivasan. Integrates speech separation and hypothesis testing to improve both estimation and recognition accuracy. First, an n-best lattice consistent with a speech separation mask is generated; a significant improvement in recognition performance is obtained compared to using speech separation alone. © 2009

Wang, DeLiang "Leon"

48

ROBUST SPEECH RECOGNITION USING SINGULAR VALUE DECOMPOSITION BASED SPEECH ENHANCEMENT  

E-print Network

B. T. Lilly and K. K. Paliwal, Brisbane, QLD 4111, Australia. Abstract: singular value decomposition based speech enhancement is used as a preprocessor for recognising speech in the presence of noise. It was found to improve the recognition

49

THE EFFECT OF SPEECH AND AUDIO COMPRESSION ON SPEECH RECOGNITION  

E-print Network

L. Besacier et al. examine the effect of speech and audio compression on the performance of a continuous speech recognition engine. GSM full-rate, G.711, G.723.1, and MPEG coders are investigated. It is shown that MPEG transcoding degrades speech recognition performance at low bitrates

Boyer, Edmond

50

SpeechDat(E) - Eastern European Telephone Speech Databases  

Microsoft Academic Search

This paper describes the creation of five new telephony speech databases for Central and Eastern European languages within the SpeechDat(E) project. The five languages concerned are Czech, Polish, Slovak, Hungarian, and Russian. The databases follow SpeechDat-II specifications with some language-specific adaptation. The present paper describes the differences between SpeechDat(E) and earlier SpeechDat projects with regard to database items such as generation

Petr Pollak; J. Cernocky; J. Boudy; K. Choukri; H. van den Heuvel; K. Vicsi; A. Virag; R. Siemund; W. Majewski; J. Sadowski; P. Staroniewicz; H. S. Tropf; J. Kochanina; A. Ostrouchov; M. Rusko; M. Trnka

2000-01-01

51

Speech Technology Applied to Children with Speech Disorders  

Microsoft Academic Search

This paper introduces an informatics application for speech therapy in the Spanish language based on the use of speech technology. The objective of this work is to help children with different speech impairments to improve their communication skills. Speech technology provides methods which can help children who suffer from speech disorders to develop pre-language and language. For pre-language development, the informatic

William Ricardo Rodríguez Dueñas; Carlos Vaquero; Oscar Saz; Eduardo Lleida

52

Emotive qualities in robot speech  

Microsoft Academic Search

This paper explores the expression of emotion in synthesized speech for an anthropomorphic robot. We adapted several key emotional correlates of human speech to the robot speech synthesizer to allow the robot to speak in either an angry, calm, disgusted, fearful, happy, sad, or surprised manner. We evaluated our approach through an acoustic analysis of the speech patterns for each

Cynthia Breazeal

2001-01-01

53

On speech recognition during anaesthesia  

E-print Network

Alexandre Alapetite. On speech recognition during anaesthesia. PhD thesis, Roskilde University, November 2007 (revision 2007-10-30).

54

Speech-Language Pathologists  

MedlinePLUS

... Offices of physical, occupational and speech therapists, and audiologists 17 Hospitals; state, local, and private 13 Nursing ... and emotional well-being. Bachelor’s degree $42,280 Audiologists Audiologists diagnose and treat a patient’s hearing and ...

55

Maynard Dixon: "Free Speech."  

ERIC Educational Resources Information Center

Based on Maynard Dixon's oil painting, "Free Speech," this lesson attempts to expand high school students' understanding of art as a social commentary and the use of works of art to convey ideas and ideals. (JDH)

Day, Michael

1987-01-01

56

Speech Culture and Personality.  

ERIC Educational Resources Information Center

Psychological and pedagogical benefits of studying and promoting speech culture are discussed, particularly as they affect the formation of personality in young people. Ethical and moral implications are mentioned. (15 references) (LB)

Vasic, Smiljka

1991-01-01

57

Robust Speech Recognition Under Noisy Ambient Conditions  

E-print Network

Book chapter by Kuldip K. Paliwal: Robust Speech Recognition Under Noisy Ambient Conditions (Chapter 6). Contents include a speech recognition overview and robust speech recognition techniques.

58

Speech-Recognition Interfaces for Music Information Retrieval: 'Speech Completion' and 'Speech Spotter  

Microsoft Academic Search

This paper describes music information retrieval (MIR) systems featuring automatic speech recognition. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. We propose two different speech-recognition interfaces for MIR, speech completion and speech spotter, and describe two MIR-based hands-free jukebox systems that enable a user to

Masataka Goto; Katunobu Itou; Koji Kitayama; Tetsunori Kobayashi

2004-01-01

59

Identifying Residual Speech Sound Disorders in Bilingual Children  

E-print Network

AJSLP Clinical Focus. Addresses the identification of residual speech sound disorders (SSDs) in bilingual children by distinguishing speech patterns associated with second-language … understanding of a client's strengths and needs. Key words: bilingualism, articulation, residual speech sound

60

Convolutional Neural Networks for Distant Speech Recognition  

E-print Network

Pawel Swietojanski, Student Member, IEEE, et al. investigate convolutional neural networks (CNNs) for large-vocabulary distant speech recognition. Index terms: speech, convolutional neural networks, meetings, AMI corpus. From the introduction: distant speech recognition (DSR) [1

Edinburgh, University of

61

Phonetic recalibration only occurs in speech mode.  

PubMed

Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds are perceived as non-speech. In contrast, selective speech adaptation occurred irrespective of whether listeners were in speech or non-speech mode. These results provide new evidence for the distinction between a speech and non-speech processing mode, and they demonstrate that different mechanisms underlie recalibration and selective speech adaptation. PMID:19059584

Vroomen, Jean; Baart, Martijn

2009-02-01

62

Speech processing using maximum likelihood continuity mapping  

SciTech Connect

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.

Hogden, J.E.

2000-04-18
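
The patent's central step, turning per-frame sound-to-position probabilities into a smooth pseudo-articulator trajectory, resembles a Viterbi-style dynamic program over candidate positions. The sketch below illustrates that general idea under stated assumptions; it is not the patented maximum likelihood continuity mapping algorithm.

```python
# Choose a position sequence that trades per-frame likelihood against
# smoothness (penalizing squared jumps between consecutive positions).
import numpy as np

def smooth_path(log_probs: np.ndarray, positions: np.ndarray, lam: float):
    """log_probs: (T, K) log-likelihood of each candidate position per frame.
    positions: (K,) candidate position values.  lam: smoothness weight."""
    T, K = log_probs.shape
    jump = lam * (positions[None, :] - positions[:, None]) ** 2  # (K, K)
    score = log_probs[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        trans = score[:, None] - jump            # prev state k -> next j
        back[t] = np.argmax(trans, axis=0)
        score = trans[back[t], np.arange(K)] + log_probs[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return positions[np.array(path[::-1])]

# Three frames, five candidate positions: a large lam keeps the path smooth.
pos = np.linspace(0.0, 1.0, 5)
lp = np.log(np.array([[.70, .10, .10, .05, .05],
                      [.05, .10, .10, .05, .70],
                      [.60, .20, .10, .05, .05]]))
print(smooth_path(lp, pos, lam=5.0))
```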

63

Audio-Visual Speech Modeling for Continuous Speech Recognition  

Microsoft Academic Search

This paper describes a speech recognition system that uses both acoustic and visual speech information to improve recognition performance in noisy environments. The system consists of three components: a visual module; an acoustic module; and a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This task is

Stéphane Dupont; Juergen Luettin

2000-01-01

64

Constructing emotional speech synthesizers with limited speech database  

Microsoft Academic Search

This paper describes an emotional speech synthesis system based on HMMs and related modeling techniques. For concatenative speech synthesis, we require all of the concatenation units that will be used to be recorded beforehand and made available at synthesis time. To adopt this approach for synthesizing the wide variety of human emotions possible in speech implies that

Heiga Zen; Tadashi Kitamura; Murtaza Bulut; Shrikanth Narayanan; Ryosuke Tsuzuki; Keiichi Tokuda

2004-01-01

65

Towards speech as a knowledge resource  

Microsoft Academic Search

Speech is a tantalizing mode of human communication. On the one hand, humans understand speech with ease and use speech to express complex ideas, information, and knowledge. On the other hand, automatic speech recognition with computers is still very hard, and extracting knowledge from speech is even harder. In this paper we motivate the study of speech as a knowledge

Eric W. Brown; Savitha Srinivasan; Anni Coden; Dulce B. Ponceleon; James W. Cooper; Arnon Amir; Jan Pieper

2001-01-01

66

Recent Advances in Automatic Speech Summarization  

Microsoft Academic Search

Speech summarization technology, which extracts important information and removes irrelevant information from speech, is expected to play an important role in building speech archives and improving the efficiency of spoken document retrieval. However, speech summarization has a number of significant challenges that distinguish it from general text summarization. Fundamental problems with speech summarization include speech recognition errors, disfluencies, and difficulties

Sadaoki Furui

2007-01-01

67

RECENT ADVANCES IN AUTOMATIC SPEECH SUMMARIZATION  

Microsoft Academic Search

Speech summarization technology, which extracts important information and removes irrelevant information from speech, is expected to play an important role in building speech archives and improving the efficiency of spoken document retrieval. However, speech summarization has a number of significant challenges that distinguish it from general text summarization. Fundamental problems with speech summarization include speech recognition errors, disfluencies, and difficulties

Sadaoki Furui

2006-01-01

68

A virtual vocabulary speech recognizer  

E-print Network

A system for the automatic recognition of human speech is described. A commercially available speech recognizer sees its recognition vocabulary increased through the use of virtual memory management techniques. Central to ...

Pathe, Peter D

1983-01-01

69

Development of a speech autocuer  

NASA Technical Reports Server (NTRS)

A wearable, visually based prosthesis for the deaf based upon the proven method for removing lipreading ambiguity known as cued speech was fabricated and tested. Both software and hardware developments are described, including a microcomputer, display, and speech preprocessor.

Bedles, R. L.; Kizakvich, P. N.; Lawson, D. T.; Mccartney, M. L.

1980-01-01

70

Speech coding: a tutorial review  

Microsoft Academic Search

The past decade has witnessed substantial progress towards the application of low-rate speech coders to civilian and military communications as well as computer-related voice applications. Central to this progress has been the development of new speech coders capable of producing high-quality speech at low data rates. Most of these coders incorporate mechanisms to: represent the spectral properties of speech, provide

ANDREAS S. SPANIAS

1994-01-01
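
One concrete instance of representing "the spectral properties of speech", the mechanism this survey names, is linear predictive coding (LPC), whose coefficients most low-rate coders transmit in some form. A minimal Levinson-Durbin sketch follows; it is illustrative only, and the survey itself covers many coder families beyond LPC.

```python
# LPC analysis: fit an all-pole model to a frame by solving the
# autocorrelation normal equations with the Levinson-Durbin recursion.
import numpy as np

def lpc(frame: np.ndarray, order: int = 10) -> np.ndarray:
    """Return [1, a1, ..., ap], the LPC polynomial of a windowed frame."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= 1.0 - k * k                  # shrink the prediction error
    return a

# Ten coefficients summarize the spectral envelope of a 30 ms frame.
frame = np.hanning(240) * np.random.randn(240)  # placeholder for speech
print(lpc(frame))
```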

71

Speech Communication and Math  

NSDL National Science Digital Library

Patty Amick, Cheryl Hawkins, and Lori Trumbo of Greenville Technical College created this resource to connect the art of public speaking with the task of demographic data collection. This course will help students create and interpret charts and graphs using mean, median, mode, and percentages. It will also allow students to recognize flawed surveys and design their own in order to produce valid data, all while writing a persuasive speech to incorporate their findings. This is a great website for educators looking to combine speech communication and math in a very hands-on way.

Amick, Patty; Hawkins, Cheryl; Trumbo, Lori

2008-05-02

72

Speed of speech and persuasion  

Microsoft Academic Search

The relationship between speaking rate and attitude change was investigated in two field experiments with 449 subjects. Manipulations of speech rate were crossed with (a) credibility of the speaker and (b) complexity of the spoken message. Results suggest that speech rate functions as a general cue that augments credibility; rapid speech enhances persuasion, and therefore argues against information-processing interpretations of

Norman Miller; Geoffrey Maruyama; Rex J. Beaber; Keith Valone

1976-01-01

73

SPEECH-LANGUAGE-HEARING CLINIC  

E-print Network

The OSU-Tulsa Speech-Language-Hearing Clinic, staffed by OSU faculty, provides assessment and therapy services for a variety of speech, language, and hearing disorders, including phonology, voice, hearing loss, receptive and expressive language, resonance, aphasia, and reading.

Veiga, Pedro Manuel Barbosa

74

[Focus: Speech in the Classroom].  

ERIC Educational Resources Information Center

This journal is devoted to the art of teaching in the field of speech communication. Articles collected in this issue address topics in the development of speech communication courses at the higher-education level and include the following titles: "Classroom as Process: A Dramatistic Observational Model for Speech Communication Teachers," "Taking…

Williams, Donald E., Ed.; Taylor, K. Phillip, Ed.

1974-01-01

75

Signed Soliloquy: Visible Private Speech  

ERIC Educational Resources Information Center

Talking to oneself can be silent (inner speech) or vocalized for others to hear (private speech, or soliloquy). We investigated these two types of self-communication in 28 deaf signers and 28 hearing adults. With a questionnaire specifically developed for this study, we established the visible analog of vocalized private speech in deaf signers.…

Zimmermann, Kathrin; Brugger, Peter

2013-01-01

76

Thesis Seminar Articulatory Speech Processing  

E-print Network

Sam Roweis, Computation and Neural Systems. Thesis seminar on articulatory speech processing: uses real speech production data from a database containing simultaneous audio and mouth movement recordings, together with a recognition or pattern completion module. In the case of human speech perception and production, the models

Roweis, Sam

77

Speech therapy for Parkinson's disease  

Microsoft Academic Search

Twenty-six patients with the speech disorder of Parkinson's disease received daily speech therapy (prosodic exercises) at home for 2 to 3 weeks. There were significant improvements in speech, as assessed by scores for prosodic abnormality and intelligibility, and these were maintained in part for up to 3 months. The degree of improvement was clinically and psychologically important, and relatives commented

S Scott; F I Caird

1983-01-01

78

Interlocutor Informative Speech  

ERIC Educational Resources Information Center

Sharing information orally is an important skill that public speaking classes teach well. However, the author's students report that they do not often see informative lectures, demonstrations, presentations, or discussions that follow the structures and formats of an informative speech as it is discussed in their textbooks. As a result, the author…

Gray, Jonathan M.

2005-01-01

79

Microprocessor for speech recognition  

SciTech Connect

A new single-chip microprocessor for speech recognition has been developed utilizing a multi-processor architecture and pipelined structure. Using a DP-matching algorithm, the processor recognizes up to 340 isolated words or 40 connected words in real time. 6 references.

Ishizuka, H.; Watari, M.; Sakoe, H.; Chiba, S.; Iwata, T.; Matsuki, T.; Kawakami, Y.

1983-01-01

80

Media Criticism Group Speech  

ERIC Educational Resources Information Center

Objective: To integrate speaking practice with rhetorical theory. Type of speech: Persuasive. Point value: 100 points (i.e., 30 points based on peer evaluations, 30 points based on individual performance, 40 points based on the group presentation), which is 25% of course grade. Requirements: (a) References: 7-10; (b) Length: 20-30 minutes; (c)…

Ramsey, E. Michele

2004-01-01

81

Concept-to-Speech Synthesis by Phonological Structure Matching  

E-print Network

P. A. Taylor, Centre for Speech Technology Research. Concept-to-speech synthesis by phonological structure matching: addresses the speech generation problem in a concept-to-speech system. Off-line, a database of recorded speech … (Figure 1 contrasts text-to-speech and concept-to-speech generation from a database query.)

Edinburgh, University of

82

System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech  

DOEpatents

Low power EM waves are used to detect motions of vocal tract tissues of the human speech system before, during, and after voiced speech. A voiced excitation function is derived. The excitation function provides speech production information to enhance speech characterization and to enable noise removal from human speech.

Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

2002-01-01

83

Ultraspeech-tools: Acquisition, processing and visualization of ultrasound speech data for phonetics and speech therapy  

E-print Network

Thomas Hueber, CNRS/GIPSA-lab, Grenoble, France. Ultraspeech-tools support the acquisition, processing, and visualization of ultrasound speech data, for phonetics and speech therapy, recorded using Ultraspeech; Ultraspeech-player is designed for speech therapy

Edinburgh, University of

84

Headphone localization of speech  

NASA Technical Reports Server (NTRS)

Three-dimensional acoustic display systems have recently been developed that synthesize virtual sound sources over headphones based on filtering by head-related transfer functions (HRTFs), the direction-dependent spectral changes caused primarily by the pinnae. In this study, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with nonindividualized HRTFs. About half of the subjects 'pulled' their judgments toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgments; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by nonindividualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

Begault, Durand R.; Wenzel, Elizabeth M.

1993-01-01
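
The synthesis technique tested in this study, filtering a source signal with head-related transfer functions before headphone playback, reduces in code to one convolution per ear. A sketch with crude stand-in HRIRs follows; a real system would load measured, direction-dependent responses rather than the delay-and-gain placeholders used here.

```python
# Binaural rendering: convolve a mono source with left/right
# head-related impulse responses (HRIRs) to get a stereo signal.
import numpy as np

FS = 44100

def fake_hrir(delay_samples: int, gain: float, n: int = 128) -> np.ndarray:
    h = np.zeros(n)
    h[delay_samples] = gain  # crude stand-in for a measured pinna filter
    return h

def binaural(mono: np.ndarray, hrir_l: np.ndarray, hrir_r: np.ndarray):
    left = np.convolve(mono, hrir_l)
    right = np.convolve(mono, hrir_r)
    return np.stack([left, right], axis=1)  # shape (samples, 2)

# A source off to the right: it reaches the right ear sooner and louder.
mono = np.random.randn(FS)  # placeholder for a speech recording
stereo = binaural(mono, fake_hrir(30, 0.6), fake_hrir(5, 1.0))
```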

85

SpeechBot  

NSDL National Science Digital Library

This new experimental search engine from Compaq indexes over 2,500 hours of content from 20 popular American radio shows. Using its speech recognition software, Compaq creates "a time-aligned 'transcript' of the program and build[s] an index of the words spoken during the program." Users can then search the index by keyword or advanced search. Search returns include the text of the clip, a link to a longer transcript, the relevant audio clip in RealPlayer format, the entire program in RealPlayer format, and a link to the radio show's Website. The index is updated daily. Please note that, while SpeechBot worked fine on Windows/NT machines, the Scout Project was unable to access the audio clips using Macs.

86

Hate Speech: Power in the Marketplace.  

ERIC Educational Resources Information Center

A discussion of hate speech and freedom of speech on college campuses examines what distinguishes hate speech from normal, objectionable interpersonal comments and looks at Supreme Court decisions on the limits of student free speech. Two cases specifically concerning regulation of hate speech on campus are considered: Chaplinsky v. New…

Harrison, Jack B.

1994-01-01

87

Speech characteristics in the Kabuki syndrome  

Microsoft Academic Search

Six children with Kabuki syndrome were studied to investigate speech patterns associated with the syndrome. Each child's speech was characterized with regard to articulation (types of errors and intelligibility), pitch (high or low), loudness (volume of speech), and prosody (general quality of speech that combines rate and inflection). All six children had a history of delayed speech and

Sheila Upton; Carmella S. Stadter; Pat Landis; Eric A. Wulfsberg

2003-01-01

88

Off-Campus, Harmful Online Student Speech  

Microsoft Academic Search

This article discusses issues related to off-campus, harmful student speech on the Internet. The article explores the characteristics of this harmful speech, including (a) the speech originates off-campus. It has been created and disseminated without the use of school computers or Internet capabilities; (b) the speech has a school nexus. The speech relates to the district, the school, board members,

Nancy Willard

2003-01-01

89

Limited connected speech experiment  

NASA Astrophysics Data System (ADS)

The purpose of this contract was to demonstrate that connected speech recognition (CSR) can be performed in real time on a vocabulary of one hundred words and to test the performance of the CSR system for twenty-five male and twenty-five female speakers. This report describes the contractor's real-time laboratory CSR system, the database and training software developed in accordance with the contract, and the results of the performance tests.

Landell, P. B.

1983-03-01

90

Automatic Speech Recognition  

Microsoft Academic Search

Automatic speech recognition (ASR) is a critical component for CHIL services. For example, it provides the input to higher-level technologies, such as summarization and question answering, as discussed in Chapter 8. In the spirit of ubiquitous computing, the goal of ASR in CHIL is to achieve high performance using far-field sensors (networks of microphone arrays and distributed far-field microphones).

Gerasimos Potamianos; Lori Lamel; Matthias Wölfel; Jing Huang; Etienne Marcheret; Claude Barras; Xuan Zhu; John McDonough; Javier Hernando; Dusan Macho; Climent Nadeu

2009-01-01

91

Neurophysiology of speech differences in childhood apraxia of speech.  

PubMed

Event-related potentials (ERPs) were recorded during a picture naming task of simple and complex words in children with typical speech and with childhood apraxia of speech (CAS). Results reveal reduced amplitude prior to speaking complex (multisyllabic) words relative to simple (monosyllabic) words for the CAS group over the right hemisphere during a time window thought to reflect phonological encoding of word forms. Group differences were also observed prior to production of spoken tokens regardless of word complexity during a time window just prior to speech onset (thought to reflect motor planning/programming). Results suggest differences in pre-speech neurolinguistic processes. PMID:25090016

Preston, Jonathan L; Molfese, Peter J; Gumkowski, Nina; Sorcinelli, Andrea; Harwood, Vanessa; Irwin, Julia R; Landi, Nicole

2014-01-01

92

Applications for Subvocal Speech  

NASA Technical Reports Server (NTRS)

A research and development effort now underway is directed toward the use of subvocal speech for communication in settings in which (1) acoustic noise could interfere excessively with ordinary vocal communication and/or (2) acoustic silence or secrecy of communication is required. By "subvocal speech" is meant sub-audible electromyographic (EMG) signals, associated with speech, that are acquired from the surface of the larynx and lingual areas of the throat. Topics addressed in this effort include recognition of the sub-vocal EMG signals that represent specific original words or phrases; transformation (including encoding and/or enciphering) of the signals into forms that are less vulnerable to distortion, degradation, and/or interception; and reconstruction of the original words or phrases at the receiving end of a communication link. Potential applications include ordinary verbal communications among hazardous- material-cleanup workers in protective suits, workers in noisy environments, divers, and firefighters, and secret communications among law-enforcement officers and military personnel in combat and other confrontational situations.

Jorgensen, Charles; Betts, Bradley

2007-01-01

93

Speech Accent Archive  

NSDL National Science Digital Library

The George Mason University archive of speech accents is a tool for linguists, speech pathologists, phoneticians, engineers who train speech recognition machines, and even interested laypeople. Volunteers who are native English speakers and non-native speakers were asked to read an elicitation paragraph in English that "uses common English words, but contains a variety of difficult English sounds and sound sequences." Visitors will quickly have the paragraph memorized while exploring different accents. There are several ways for visitors to find accents to listen to, one of which is by clicking on a map of the world, labeled "atlas/regions", or by language, labeled "language/speakers". Once visitors have chosen a region or language, the gender, and birthplace of the speaker will appear. Age and other data, such as "other languages" and "age of English onset", are provided to visitors when the link to a speaker is chosen. The "Generalizations" section contains "general rules that describe a speaker's accent", and they are based on General American English (GAE).

94

Speech endpoint detection with non-language speech sounds for generic speech processing applications  

NASA Astrophysics Data System (ADS)

Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS such as filled pauses will require future research.

McClain, Matthew; Romanowski, Brian

2009-05-01
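
As a concrete illustration of the frame-classification setup the abstract above describes, the sketch below trains one Gaussian HMM per class (LSS and NLSS) and labels a segment by comparing log-likelihoods. It is a minimal sketch only: hmmlearn and librosa are assumed as stand-ins for whatever tooling the authors used, and plain MFCCs stand in for their language-agnostic feature set.

    # Minimal sketch of HMM-based LSS/NLSS segment classification.
    # Assumes labelled training segments; MFCCs are a placeholder for
    # the paper's language-agnostic acoustic features.
    import numpy as np
    import librosa
    from hmmlearn.hmm import GaussianHMM

    def mfcc_frames(y, sr):
        """Return 13 MFCCs per frame, shape (n_frames, 13)."""
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    def train_class_hmm(frame_sequences, n_states=3):
        """Fit one HMM per class (LSS or NLSS) on a list of frame sequences."""
        X = np.vstack(frame_sequences)
        lengths = [len(s) for s in frame_sequences]
        model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=25)
        model.fit(X, lengths)
        return model

    def classify_segment(frames, lss_hmm, nlss_hmm):
        """Label a segment by the higher per-class log-likelihood."""
        return "LSS" if lss_hmm.score(frames) > nlss_hmm.score(frames) else "NLSS"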

95

Disorders of Speech and Voice  

Microsoft Academic Search

Speech is a learned behavior that requires rapid coordination of respiratory, phonatory, and articulatory systems coupled with intact language, cognition, and hearing functions. Speech is often divided into sub-domains that include speech sound production (articulation), fluency, resonance, and voice quality. Children develop control of each of these sub-domains over a period of years, often raising questions for parents and pediatricians

Helen M. Sharp; Stephen M. Tasko

96

Real-time speech animation system  

E-print Network

We optimize the synthesis procedure of a videorealistic speech animation system [7] to achieve real-time speech animation synthesis. A synthesis rate must be high enough for real-time video streaming for speech animation ...

Fu, Jieyun

2011-01-01

97

Predicting confusions and intelligibility of noisy speech  

E-print Network

Current predictors of speech intelligibility are inadequate for making predictions of speech confusions caused by acoustic interference. This thesis is inspired by the need for a capability to understand and predict speech ...

Messing, David P. (David Patrick), 1979-

2007-01-01

98

American Speech-Language-Hearing Association  

MedlinePLUS

... more than 173,000 members and affiliates who are audiologists; speech-language pathologists; and speech, language, and hearing scientists; ...

99

An Articulatory Speech-Prosthesis System  

E-print Network

We investigate speech-coding strategies for brain-machine-interface (BMI) based speech prostheses. We present an articulatory speech-synthesis system using an experimental integrated-circuit vocal tract that models the ...

Wee, Keng Hoong

100

Speech Recognition: How Do We Teach It?  

ERIC Educational Resources Information Center

States that growing use of speech recognition software has made voice writing an essential computer skill. Describes how to present the topic, develop basic speech recognition skills, and teach speech recognition outlining, writing, proofreading, and editing. (Contains 14 references.) (SK)

Barksdale, Karl

2002-01-01

101

Crowdsourcing Correction of Speech Recognition Captioning Errors  

E-print Network

M. Wald, University of Southampton, discusses crowdsourcing correction of speech recognition captioning errors to provide a sustainable method of captioning: captions can be produced using speech recognition technologies, but this results in many recognition errors requiring manual correction.

Southampton, University of

102

Large Vocabulary, Multilingual Speech Recognition: Session Overview  

E-print Network

Speech corpora developed for a given language provide crucial input to speech recognition technology world-wide; however, there is a need to share knowledge on speaker-independent, large vocabulary, continuous speech recognition technology among

103

X-RAY MICROBEAM SPEECH PRODUCTION DATABASE  

E-print Network

X-Ray Microbeam Speech Production Database User's Handbook, Version 1.0 (June 1994). Contents include a chapter on the speech and task sample, with reference to Physiology of Speech Production, the now-classic cineradiographic account of thirteen disyllables spoken

104

Concept-to-Speech Synthesis by Phonological Structure Matching  

E-print Network

Addresses the speech generation problem in a concept-to-speech system: off-line, a database of recorded speech is prepared. (Figure 1 of the paper contrasts text-to-speech and concept-to-speech generation from a database query.) P. A. Taylor, Centre for Speech Technology Research.

Edinburgh, University of

105

Enhancing Peer Feedback and Speech Preparation: The Speech Video Activity  

ERIC Educational Resources Information Center

In the typical public speaking course, instructors or assistants videotape or digitally record at least one of the term's speeches in class or lab to offer students additional presentation feedback. Students often watch and self-critique their speeches on their own. Peers often give only written feedback on classroom presentations or completed…

Opt, Susan

2012-01-01

106

Speech-in-Speech Recognition: A Training Study  

ERIC Educational Resources Information Center

This study aims to identify aspects of speech-in-noise recognition that are susceptible to training, focusing on whether listeners can learn to adapt to target talkers ("tune in") and learn to better cope with various maskers ("tune out") after short-term training. Listeners received training on English sentence recognition in speech-shaped noise…

Van Engen, Kristin J.

2012-01-01

107

Statistical models for noise-robust speech recognition  

E-print Network

A standard way of improving the robustness of speech recognition systems to noise is model compensation. This replaces a speech recogniser's distributions over clean speech by ones over noise-corrupted speech. For each clean speech component, model...

van Dalen, Rogier Christiaan

2011-11-08
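
The core idea of model compensation, replacing clean-speech distributions with noise-corrupted ones, can be illustrated with the common zeroth-order log-add approximation. This is a hedged sketch, not the thesis's actual scheme: it compensates only the Gaussian means of log-spectral features and assumes a stationary noise estimate.

    # Crude log-add model compensation: each clean-speech mean mu_x is
    # replaced by log(exp(mu_x) + exp(mu_n)) given a noise mean mu_n.
    # Variances are left untouched in this simplified variant.
    import numpy as np

    def log_add_compensate(mu_x, mu_n):
        """Element-wise mu_y = log(exp(mu_x) + exp(mu_n)) per filterbank bin."""
        return np.logaddexp(mu_x, mu_n)

    # Example: compensate every component mean of a clean-speech GMM.
    clean_means = np.random.randn(8, 24)   # 8 components, 24 log-spectral bins
    noise_mean = np.full(24, -2.0)         # stationary noise estimate (illustrative)
    noisy_means = log_add_compensate(clean_means, noise_mean)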

108

Detecting and Correcting Speech Repairs  

Microsoft Academic Search

Interactive spoken dialog provides many new challenges for spoken language systems. One of the most critical is the prevalence of speech repairs. This paper presents an algorithm that detects and corrects speech repairs based on finding the repair pattern. The repair pattern is built by finding word matches and word replacements, and identifying fragments and editing terms. Rather

Peter Heeman; James Allen

1993-01-01

109

Speech Analysis Systems: An Evaluation.  

ERIC Educational Resources Information Center

Performance characteristics are reviewed for seven computerized systems marketed for acoustic speech analysis: CSpeech, CSRE, ILS-PC, Kay Elemetrics model 550 Sona-Graph, MacSpeech Lab II, MSL, and Signalyze. Characteristics reviewed include system components, basic capabilities, documentation, user interface, data formats and journaling, and…

Read, Charles; And Others

1992-01-01

110

SILENT SPEECH DURING SILENT READING.  

ERIC Educational Resources Information Center

Efforts were made in this study to (1) relate the amount of silent speech during silent reading to level of reading proficiency, intelligence, age, and grade placement of subjects, and (2) determine whether the amount of silent speech during silent reading is affected by the level of difficulty of prose read and by the reading of a foreign…

McGuigan, Frank J.

111

Continuous speech dictation in French  

Microsoft Academic Search

A major research activity at LIMSI is multilingual, speaker-independent, large vocabulary speech dictation. In this paper we report on efforts in large vocabulary, speaker-independent continuous speech recognition of French using the BREF corpus. Recognition experiments were carried out with vocabularies containing up to 20k words. The recognizer makes use of continuous density HMM with Gaussian

Jean-Luc Gauvain; Lori Lamel; Gilles Adda; Martine Adda-Decker

1994-01-01

112

Speech Communication and Multimodal Interfaces  

Microsoft Academic Search

Within the area of advanced man-machine interaction, speech communication has played a major role for several decades. The idea of replacing conventional input devices such as buttons and keyboards by voice control, thus increasing comfort and input speed considerably, is so attractive that even the quite slow progress of speech technology during

Björn Schuller; Markus Ablaßmeier; Ronald Müller; Stefan Reifinger; Tony Poitschke; Gerhard Rigoll

113

Speech Recognition of Malayalam Numbers  

Microsoft Academic Search

Digit speech recognition is important in many applications such as automatic data entry, PIN entry, voice dialing telephones, automated banking systems, etc. This paper presents a speaker-independent speech recognition system for Malayalam digits. The system employs Mel frequency cepstrum coefficients (MFCC) as features for signal processing and hidden Markov models (HMM) for recognition. The system is trained with 21 male

Cini Kurian; Kannan Balakrishnan

2009-01-01
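
The MFCC-plus-HMM pipeline named here is standard enough to sketch. Below is a minimal, hedged illustration using librosa for MFCCs and hmmlearn for per-digit Gaussian HMMs; both libraries, the state counts, and the file layout are assumptions for illustration, not details from the paper.

    # Sketch of an isolated-digit recogniser: one GaussianHMM per digit,
    # trained on MFCC sequences; an utterance is assigned to the digit
    # whose model yields the highest log-likelihood.
    import numpy as np
    import librosa
    from hmmlearn.hmm import GaussianHMM

    def features(path):
        y, sr = librosa.load(path, sr=16000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

    def train_digit_models(training, n_states=5):
        """training: dict mapping digit label -> list of wav paths (hypothetical layout)."""
        models = {}
        for digit, paths in training.items():
            seqs = [features(p) for p in paths]
            X, lengths = np.vstack(seqs), [len(s) for s in seqs]
            m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
            m.fit(X, lengths)
            models[digit] = m
        return models

    def recognise(path, models):
        obs = features(path)
        return max(models, key=lambda d: models[d].score(obs))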

114

Taking a Stand for Speech.  

ERIC Educational Resources Information Center

Asserts that freedom of speech issues were among the first major confrontations in U.S. constitutional law. Maintains that lessons from the controversies surrounding the Sedition Act of 1798 have continuing practical relevance. Describes and discusses the significance of freedom of speech to the U.S. political system. (CFR)

Moore, Wayne D.

1995-01-01

115

SPEECH--MAN'S NATURAL COMMUNICATION.  

ERIC Educational Resources Information Center

Session 63 of the 1967 Institute of Electrical and Electronic Engineers International Convention brought together seven distinguished men working in fields relevant to language. Their topics included origin and evolution of speech and language, language and culture, man's physiological mechanisms for speech, linguistics, and technology and…

Dudley, Homer; And Others

116

Modeling disfluencies in conversational speech  

Microsoft Academic Search

Conversational speech is notably different from read speech in several ways, particularly in the presence of disfluencies but also in the frequent use of a small set of words that mark the flow of the discourse. Disfluencies are sometimes viewed as a 'problem' in language modeling, where most previous work has focused on written text. In this paper, we take

Man-Hung Siu; Mari Ostendorf

1996-01-01

117

Free Speech and Hostile Environments  

Microsoft Academic Search

One major concern about sexual harassment law is that employers will restrict employee speech in order to avoid hostile environment liability, thus violating free speech principles. In this Essay, Professor Balkin argues that this “collateral censorship” is constitutionally permissible when there are good grounds for vicarious liability. Because employers actively control workplace culture, and because they are better able to

Jack M Balkin

1999-01-01

118

Hate Speech or Free Speech: Can Broad Campus Speech Regulations Survive Current Judicial Reasoning?  

ERIC Educational Resources Information Center

Federal courts have found speech regulations overbroad in suits against the University of Michigan and the University of Wisconsin System. Attempts to assess the theoretical justification and probable fate of broad speech regulations that have not been explicitly rejected by the courts. Concludes that strong arguments for broader regulation will…

Heiser, Gregory M.; Rossow, Lawrence F.

1993-01-01

119

Speech Anxiety: The Importance of Identification in the Basic Speech Course.  

ERIC Educational Resources Information Center

A study investigated speech anxiety in the basic speech course by means of pre and post essays. Subjects, 73 students in 3 classes in the basic speech course at a southwestern multiuniversity, wrote a two-page essay on their perceptions of their speech anxiety before the first speaking project. Students discussed speech anxiety in class and were…

Mandeville, Mary Y.

120

Hate Speech/Free Speech: Using Feminist Perspectives To Foster On-Campus Dialogue.  

ERIC Educational Resources Information Center

Explores the complex issues inherent in the tension between hate speech and free speech, focusing on the phenomenon of hate speech on college campuses. Describes the challenges to hate speech made by critical race theorists and explains how a feminist critique can reorient the parameters of hate speech. (SLD)

Cornwell, Nancy; Orbe, Mark P.; Warren, Kiesha

1999-01-01

121

Speech Perception and Short-Term Memory Deficits in Persistent Developmental Speech Disorder  

ERIC Educational Resources Information Center

Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech

Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

2006-01-01

122

AM-DEMODULATION OF SPEECH SPECTRA AND ITS APPLICATION TO NOISE ROBUST SPEECH RECOGNITION  

E-print Network

AM-demodulation of speech spectra and its application to automatic speech recognition (ASR) is studied. Speech production can be regarded as modulation of the vocal tract transfer function (VTTF) by the voicing source or pitch. For example, the VTTF is often used in feature extraction for ASR

Alwan, Abeer

123

SUBTRACTION OF ADDITIVE NOISE FROM CORRUPTED SPEECH FOR ROBUST SPEECH RECOGNITION  

E-print Network

Additive noise degrades the performance of speech recognition systems. For many speech recognition applications the most important source of acoustical distortion is additive noise, and much research effort in robust speech recognition has been
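
Since this record is about subtracting additive noise from corrupted speech, a minimal magnitude spectral subtraction sketch may help fix ideas. It assumes a noise-only lead-in for the noise estimate and uses scipy's STFT; the flooring constant and frame sizes are illustrative, and this is a textbook baseline, not the method of the cited paper.

    # Sketch of magnitude spectral subtraction: estimate the noise
    # magnitude spectrum from a leading noise-only stretch, subtract it
    # from each frame, floor the result, and resynthesise with the
    # noisy phase.
    import numpy as np
    from scipy.signal import stft, istft

    def spectral_subtract(x, fs, noise_seconds=0.3, floor=0.02):
        f, t, X = stft(x, fs, nperseg=512)               # hop is nperseg // 2 = 256
        mag, phase = np.abs(X), np.angle(X)
        n_noise = max(1, int(noise_seconds * fs / 256))  # frames in the lead-in
        noise_mag = mag[:, :n_noise].mean(axis=1, keepdims=True)
        clean_mag = np.maximum(mag - noise_mag, floor * noise_mag)
        _, y = istft(clean_mag * np.exp(1j * phase), fs, nperseg=512)
        return y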

124

An open source speech synthesis module for a visual-speech recognition system  

E-print Network

A silent speech interface (SSI) is a technology that permits speech communication without vocalization. The visual-speech recognition engine gives the user the opportunity to speak with his/her original voice. The visual-speech recognition engine of the SSI outputs

Paris-Sud XI, Université de

125

An Evaluation of Visual Speech Features for the Tasks of Speech and Speaker Recognition  

E-print Network

An evaluation of visual speech features is performed specifically for the tasks of speech and speaker recognition. Unlike acoustic speech processing, we demonstrate that the features employed for effective speech and speaker recognition

Chen, Tsuhan

126

Multifractal nature of unvoiced speech signals  

NASA Astrophysics Data System (ADS)

A refinement is made in the nonlinear dynamic modeling of speech signals. Previous research successfully characterized speech signals as chaotic. Here, we analyze fricative speech signals using multifractal measures to determine various fractal regimes present in their chaotic attractors. Results support the hypothesis that speech signals have multifractal measures.

Adeyemi, Olufemi A.; Hartt, K.; Boudreaux-Bartels, G. F.

1996-06-01

127

Phonetic Recalibration Only Occurs in Speech Mode  

ERIC Educational Resources Information Center

Upon hearing an ambiguous speech sound dubbed onto lipread speech, listeners adjust their phonetic categories in accordance with the lipread information (recalibration) that tells what the phoneme should be. Here we used sine wave speech (SWS) to show that this tuning effect occurs if the SWS sounds are perceived as speech, but not if the sounds…

Vroomen, Jean; Baart, Martijn

2009-01-01

128

Graduate Student SPEECH-LANGUAGE PATHOLOGY  

E-print Network

Graduate Student Handbook, Speech-Language Pathology Program, Stephen F. Austin State University. The scope of the field is wide, covering a variety of speech, language, and hearing disorders from birth to old age. Welcome to the program website for Speech-Language Pathology and Audiology

Long, Nicholas

129

On the Relationship of Speech to Language.  

ERIC Educational Resources Information Center

A framework which considers speech and language as separate entities in a symbiotic relationship is presented, and basic questions are raised concerning how speech and language function together and what their reciprocal effects are. Based on the notion that speech and language are independent, various examples of speech without language and of…

Cutting, James E.; Kavanagh, James F.

1976-01-01

130

Emerging Technologies Speech Tools and Technologies  

ERIC Educational Resources Information Center

Using computers to recognize and analyze human speech goes back at least to the 1970's. Developed initially to help the hearing or speech impaired, speech recognition was also used early on experimentally in language learning. Since the 1990's, advances in the scientific understanding of speech as well as significant enhancements in software and…

Godwin-Jones, Robert

2009-01-01

131

POLYPHASE SPEECH RECOGNITION Hui Lin, Jeff Bilmes  

E-print Network

We propose an approach to speech recognition that consists of multiple semi-synchronized recognizers operating on a polyphase decomposition of the signal. This is motivated by a property of many speech recognition systems, i.e., that speech modulation energy is most important below

Bilmes, Jeff

132

ROBUST SPEECH RECOGNITION K.K. Paliwal  

E-print Network

The aim of robust speech recognition is to overcome the mismatch problem so as to result in only moderate degradation in performance. We outline the components of an automatic speech recognition system and describe sources of speech variability that cause mismatch between

133

Analysis of False Starts in Spontaneous Speech.  

ERIC Educational Resources Information Center

A primary difference between spontaneous speech and read speech concerns the use of false starts, where a speaker interrupts the flow of speech to restart his or her utterance. A study examined the acoustic aspects of such restarts in a widely-used speech database, examining approximately 1000 utterances, about 10% of which contained a restart.…

O'Shaughnessy, Douglas

134

Norwegian Speech Recognition for Telephone Applications  

E-print Network

We describe a Norwegian telephone speech database, TABU.0. We discuss the database design specification and some experiences with a recogniser trained on a subset of the database. The performance of automatic speech recognition is closely tied to the speech database used to train the recogniser; therefore, a Norwegian speech database is necessary

Amdal, Ingunn

135

Bachelor of Speech and Language Pathology  

E-print Network

This is a relatively new area of study in speech and language therapy, so research is required to provide evidence about the most effective methods of rehabilitation. (Profile: Phoebe Macrae, PhD in Speech and Language Therapy.)

Hickman, Mark

136

Audio-visual speech perception is special.  

PubMed

In face-to-face conversation speech is perceived by ear and eye. We studied the prerequisites of audio-visual speech perception by using perceptually ambiguous sine wave replicas of natural speech as auditory stimuli. When the subjects were not aware that the auditory stimuli were speech, they showed only negligible integration of auditory and visual stimuli. When the same subjects learned to perceive the same auditory stimuli as speech, they integrated the auditory and visual stimuli in a similar manner as natural speech. These results demonstrate the existence of a multisensory speech-specific mode of perception. PMID:15833302

Tuomainen, Jyrki; Andersen, Tobias S; Tiippana, Kaisa; Sams, Mikko

2005-05-01

137

Speech recovery device  

SciTech Connect

There is provided an apparatus and method for assisting speech recovery in people with inability to speak due to aphasia, apraxia or another condition with similar effect. A hollow, rigid, thin-walled tube with semi-circular or semi-elliptical cut out shapes at each open end is positioned such that one end mates with the throat/voice box area of the neck of the assistor and the other end mates with the throat/voice box area of the assisted. The speaking person (assistor) makes sounds that produce standing wave vibrations at the same frequency in the vocal cords of the assisted person. Driving the assisted person's vocal cords with the assisted person being able to hear the correct tone enables the assisted person to speak by simply amplifying the vibration of membranes in their throat.

Frankle, Christen M.

2000-10-19

138

Temporally nonadjacent nonlinguistic sounds affect speech categorization.  

PubMed

Speech perception is an ecologically important example of the highly context-dependent nature of perception; adjacent speech, and even nonspeech, sounds influence how listeners categorize speech. Some theories emphasize linguistic or articulation-based processes in speech-elicited context effects and peripheral (cochlear) auditory perceptual interactions in non-speech-elicited context effects. The present studies challenge this division. Results of three experiments indicate that acoustic histories composed of sine-wave tones drawn from spectral distributions with different mean frequencies robustly affect speech categorization. These context effects were observed even when the acoustic context temporally adjacent to the speech stimulus was held constant and when more than a second of silence or multiple intervening sounds separated the nonlinguistic acoustic context and speech targets. These experiments indicate that speech categorization is sensitive to statistical distributions of spectral information, even if the distributions are composed of nonlinguistic elements. Acoustic context need be neither linguistic nor local to influence speech perception. PMID:15828978

Holt, Lori L

2005-04-01

139

Bayesian Discriminative Adaptation for Speech Recognition Bayesian Discriminative Adaptation for Speech  

E-print Network

Bayesian Discriminative Adaptation for Speech Recognition. C. K. Raut, Kai Yu and Mark Gales, Cambridge University Engineering Department, 12 April 2007. Overview: adaptation and adaptive training; speech recognition in varying acoustic conditions.

de Gispert, Adrià

140

Speech in the Marxist State.  

ERIC Educational Resources Information Center

Describes the field of speech communication in East Germany with emphasis on the influence of the ideology of Marxism upon its nature and status in academic settings. Contrasts the East German system with the American. (JMF)

McGuire, Michael; Berger, Lothar

1979-01-01

141

If I Had - Slurred Speech  

MedlinePLUS Videos and Cool Tools

If I Had - Slurred Speech - Dr. Michel Melanson, MD (October 17, 2007, Insidermedicine). Welcome to Insidermedicine's If I Had, ...

142

Speech processing: An evolving technology  

SciTech Connect

As we enter the information age, speech processing is emerging as an important technology for making machines easier and more convenient for humans to use. It is both an old and a new technology - dating back to the invention of the telephone and forward, at least in aspirations, to the capabilities of HAL in 2001. Explosive advances in microelectronics now make it possible to implement economical real-time hardware for sophisticated speech processing - processing that formerly could be demonstrated only in simulations on main-frame computers. As a result, fundamentally new product concepts - as well as new features and functions in existing products - are becoming possible and are being explored in the marketplace. As the introductory piece to this issue, the authors draw a brief perspective on the evolving field of speech processing and assess the technology in the three constituent sectors: speech coding, synthesis, and recognition.

Crochiere, R.E.; Flanagan, J.L.

1986-09-01

143

Speech Convergence with Animated Personas  

Microsoft Academic Search

A new dimension of speaker stylistic variation was identified during human-computer communication: the convergence of users' speech with the text-to-speech (TTS) heard from an animated software partner. Twenty-four 7-to-10-year-old children conversed with digital fish that embodied different TTS voices as they learned about marine biology. An analysis of children's amplitude, durational features, and dialogue response latencies confirmed that they spontaneously

Sharon Oviatt; Courtney Darves; Rachel Coulston; Matt Wesson

144

The contribution of sensitivity to speech rhythm and non-speech rhythm to early reading development  

Microsoft Academic Search

Both sensitivity to speech rhythm and non-speech rhythm have been associated with successful phonological awareness and reading development in separate studies. However, the extent to which speech rhythm, non-speech rhythm and literacy skills are interrelated has not been examined. As a result, five- to seven-year-old English-speaking children were assessed on measures of speech rhythm sensitivity, non-speech rhythm sensitivity (both receptive

Andrew J. Holliman; Clare Wood; Kieron Sheehy

2010-01-01

145

Elderly perception of speech from a computer  

NASA Astrophysics Data System (ADS)

An aging population still needs to access information, such as bus schedules, and it is evident that they will be doing so using computers, especially interfaces using speech input and output. This is a preliminary study of the use of synthetic speech for the elderly. In it, twenty persons between the ages of 60 and 80 were asked to listen to speech emitted by a robot (CMU's VIKIA) and to write down what they heard. All of the speech was natural prerecorded speech (not synthetic) read by one female speaker. There were four listening conditions: (a) only speech emitted, (b) robot moves before emitting speech, (c) face has lip movement during speech, (d) both (b) and (c). There were very few errors for conditions (b), (c), and (d), but errors existed for condition (a). The presentation will discuss experimental conditions, show actual figures, and try to draw conclusions for speech communication between computers and the elderly.

Black, Alan; Eskenazi, Maxine; Simmons, Reid

2002-05-01

146

Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.  

PubMed

During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse. PMID:16521772

Larm, Petra; Hongisto, Valtteri

2006-02-01
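
All three metrics compared above share one computational core: a per-band apparent speech-to-noise ratio is clipped, mapped to a 0-1 transmission value, and combined with band-importance weights. The sketch below shows that shared skeleton only; the clipping range follows the common plus/minus 15 dB convention, but the weights are placeholders, not the standardised STI or SII values.

    # Sketch of a band-weighted intelligibility index: clip per-band SNRs
    # to [-15, +15] dB, map to [0, 1], and average with importance weights.
    import numpy as np

    def band_weighted_index(snr_db, weights):
        snr = np.clip(np.asarray(snr_db, float), -15.0, 15.0)
        transmission = (snr + 15.0) / 30.0   # 0 = unintelligible, 1 = perfect
        w = np.asarray(weights, float)
        return float(np.sum(w * transmission) / np.sum(w))

    # Seven octave bands (125 Hz to 8 kHz) with made-up weights:
    print(band_weighted_index([12, 9, 6, 3, 0, -3, -6], [1, 2, 3, 4, 4, 3, 2]))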

147

Temporally Nonadjacent Nonlinguistic Sounds Affect Speech Categorization  

Microsoft Academic Search

Speech perception is an ecologically important example of the highly context-dependent nature of perception; adjacent speech, and even nonspeech, sounds influence how listeners categorize speech. Some theories emphasize linguistic or articulation-based processes in speech-elicited context effects and peripheral (cochlear) auditory perceptual interactions in non-speech-elicited context effects. The present studies challenge this division. Results of three experiments indicate that acoustic histories composed of sine-wave tones drawn from spectral distributions with different mean frequencies robustly affect speech categorization. These

Lori L. Holt

2005-01-01

148

Discriminative pronunciation modeling for dialectal speech recognition Maider Lehr1  

E-print Network

Index terms: speech recognition, dialectal speech recognition, pronunciation modeling, discriminative training. Speech recognition technology is increasingly ubiquitous in everyday life. Automatic speech recognition

Cortes, Corinna

149

Contextual variability during speech-in-speech recognition.  

PubMed

This study examined the influence of background language variation on speech recognition. English listeners performed an English sentence recognition task in either "pure" background conditions in which all trials had either English or Dutch background babble or in mixed background conditions in which the background language varied across trials (i.e., a mix of English and Dutch or one of these background languages mixed with quiet trials). This design allowed the authors to compare performance on identical trials across pure and mixed conditions. The data reveal that speech-in-speech recognition is sensitive to contextual variation in terms of the target-background language (mis)match depending on the relative ease/difficulty of the test trials in relation to the surrounding trials. PMID:24993234

Brouwer, Susanne; Bradlow, Ann R

2014-07-01

150

System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech  

DOEpatents

The present invention is a system and method for characterizing human (or animate) speech voiced excitation functions and acoustic signals, for removing unwanted acoustic noise which often occurs when a speaker uses a microphone in common environments, and for synthesizing personalized or modified human (or other animate) speech upon command from a controller. A low power EM sensor is used to detect the motions of windpipe tissues in the glottal region of the human speech system before, during, and after voiced speech is produced by a user. From these tissue motion measurements, a voiced excitation function can be derived. Further, the excitation function provides speech production information to enhance noise removal from human speech and it enables accurate transfer functions of speech to be obtained. Previously stored excitation and transfer functions can be used for synthesizing personalized or modified human speech. Configurations of EM sensor and acoustic microphone systems are described to enhance noise cancellation and to enable multiple articulator measurements.

Burnett, Greg C. (Livermore, CA); Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

2006-08-08

151

Determining the threshold for usable speech within co-channel speech with the SPHINX automated speech recognition system  

NASA Astrophysics Data System (ADS)

Much research has been and is continuing to be done in the area of separating the original utterances of two speakers from co-channel speech. This is very important in the area of automated speech recognition (ASR), where the current state of technology is not nearly as accurate as human listeners when the speech is co-channel. It is desired to determine what types of speech (voiced, unvoiced, and silence) and at what target to interference ratio (TIR) two speakers can speak at the same time and not reduce speech intelligibility of the target speaker (referred to as usable speech). Knowing which segments of co-channel speech are usable in ASR can be used to improve the reconstruction of single speaker speech. Tests were performed using the SPHINX ASR software and the TIDIGITS database. It was found that interfering voiced speech with a TIR of 6 dB or greater (on a per frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech. It was further found that interfering unvoiced speech with a TIR of 18 dB or greater (on a per frame basis) did not significantly reduce the intelligibility of the target speaker in co-channel speech.

Hicks, William T.; Yantorno, Robert E.

2004-10-01
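
The per-frame TIR the study thresholds at 6 dB (voiced) and 18 dB (unvoiced) is straightforward to compute when the target and interferer signals are available separately before mixing. A minimal sketch, with illustrative frame sizes:

    # Sketch of per-frame target-to-interference ratio (TIR) in dB,
    # computed from aligned frames of the separate target and
    # interferer signals (available here because the mixture is synthetic).
    import numpy as np

    def frame_tir_db(target, interferer, frame_len=320, hop=160):
        tirs = []
        n = min(len(target), len(interferer)) - frame_len
        for start in range(0, n, hop):
            pt = np.sum(target[start:start + frame_len] ** 2) + 1e-12
            pi = np.sum(interferer[start:start + frame_len] ** 2) + 1e-12
            tirs.append(10.0 * np.log10(pt / pi))
        return np.array(tirs)

    # Frames at or above 6 dB (voiced interference) or 18 dB (unvoiced)
    # would count as usable under the thresholds reported above.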

152

The Effect of Speech Rate on Stuttering Frequency, Phonated Intervals, Speech Effort, and Speech Naturalness during Chorus Reading  

ERIC Educational Resources Information Center

Purpose: This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also…

Davidow, Jason H.; Ingham, Roger J.

2013-01-01

153

In this paper we compare speech recognition accuracy for high-quality speech recorded under controlled conditions with speech  

E-print Network

We compare speech recognition accuracy for high-quality speech recorded under controlled conditions with speech transmitted over long-distance telephone lines. The TIMIT database is a continuous, speaker-independent, phonetically-balanced and phonetically-labelled speech database. Test conditions include speech filtered with the response of a typical telephone channel [2] and speech from the NTIMIT database. We note that

Stern, Richard

154

Bangla Speech Recognition System Using LPC and ANN  

Microsoft Academic Search

This paper presents a Bangla speech recognition system. The system is divided mainly into two major parts: the first part is speech signal processing and the second part is the speech pattern recognition technique. The speech processing stage consists of speech starting- and end-point detection, windowing, filtering, calculating the linear predictive coding (LPC) and cepstral coefficients, and finally

Anup Kumar Paul; Dipankar Das

2009-01-01
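
The LPC and cepstral front end named in this abstract can be sketched compactly. The version below leans on librosa for the LPC analysis and then applies the standard LPC-to-cepstrum recursion; the order and the use of librosa are assumptions for illustration, not the authors' implementation.

    # Sketch of an LPC / LPC-cepstrum front end: per-frame LPC via
    # librosa, then the usual recursion turning predictor coefficients
    # into cepstral coefficients. `frame` is a windowed float array.
    import numpy as np
    import librosa

    def lpc_cepstrum(frame, order=12):
        a = librosa.lpc(frame, order=order)  # [1, a1, ..., ap], error-filter polynomial
        b = -a[1:]                           # predictor coefficients
        c = np.zeros(order)
        for n in range(1, order + 1):
            c[n - 1] = b[n - 1] + sum(
                (k / n) * c[k - 1] * b[n - k - 1] for k in range(1, n)
            )
        return c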

155

Articulatory features for robust visual speech recognition  

E-print Network

This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic ...

Saenko, Ekaterina, 1976-

2004-01-01

156

Multimodal speech recognition with ultrasonic sensors  

E-print Network

Ultrasonic sensing of articulator movement is an area of multimodal speech recognition that has not been researched extensively. The widely-researched audio-visual speech recognition (AVSR), which relies upon video data, ...

Zhu, Bo, M. Eng. Massachusetts Institute of Technology

2008-01-01

157

Speech Recognition: Its Place in Business Education.  

ERIC Educational Resources Information Center

Suggests uses of speech recognition devices in the classroom for students with disabilities. Compares speech recognition software packages and provides guidelines for selection and teaching. (Contains 14 references.) (SK)

Szul, Linda F.; Bouder, Michele

2003-01-01

158

Speech Recognition by Machine, A Review  

E-print Network

This paper presents a brief survey of automatic speech recognition and discusses the major themes and advances made in the past 60 years of research, so as to provide a technological perspective and an appreciation of the fundamental progress that has been accomplished in this important area of speech communication. After years of research and development, the accuracy of automatic speech recognition remains one of the important research challenges (e.g., variations of context, speakers, and environment). The design of a speech recognition system requires careful attention to the following issues: definition of various types of speech classes, speech representation, feature extraction techniques, speech classifiers, databases, and performance evaluation. The problems that exist in ASR and the various techniques developed by various researchers to solve them are presented in chronological order. Hence the authors hope that this work shall be a contribution in the area of speech recog...

Anusuya, M A

2010-01-01

159

Unified Theory of Speech : Speech Processing for the Hearing Impaired and Beyond  

Microsoft Academic Search

This paper presents some speech processing algorithms that were originally developed for hearing aid applications. However these algorithms are also applicable for other speech and audio applications. Considering that the basic properties of speech remain invariant across applications, it is logical to consider these algorithms under the broader umbrella of 'unified theory of speech.' These algorithms have been implemented on

N. Magotra; F. Livingston; S. Savadatti

2000-01-01

160

Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy  

ERIC Educational Resources Information Center

Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

2014-01-01

161

SPEECHALATOR: TWO-WAY SPEECH-TO-SPEECH TRANSLATION IN YOUR HAND Alex Waibel  

E-print Network

This demonstration involves two-way automatic speech-to-speech translation on a consumer off-the-shelf PDA. Earlier automatic voice translation systems were one-way; the Phrasalator, for example, is a one-way device that can recognize

Schultz, Tanja

162

THIRD-ORDER MOMENTS OF FILTERED SPEECH SIGNALS FOR ROBUST SPEECH RECOGNITION  

E-print Network

Third-order moments of filtered speech signals are introduced and studied as features for robust speech recognition. These features have the potential to capture nonlinear characteristics of the speech signal. Spectral-based acoustic features have been the standard in speech recognition for many years, even

Johnson, Michael T.
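
The feature family this abstract names, third-order moments of subband-filtered speech, is easy to illustrate. The sketch below computes the normalised third moment (skewness) per subband; the Butterworth filterbank and band edges are stand-ins for whatever filters the paper actually uses.

    # Sketch of third-order-moment features: band-pass filter a frame
    # into subbands and take the skewness of each subband signal.
    import numpy as np
    from scipy.signal import butter, lfilter
    from scipy.stats import skew

    def third_order_features(frame, fs, bands=((100, 1000), (1000, 3000))):
        feats = []
        for lo, hi in bands:
            b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
            feats.append(skew(lfilter(b, a, frame)))
        return np.array(feats)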

163

A High-Dimensional Subband Speech Representation and SVM Framework for Robust Speech Recognition  

E-print Network

A high-dimensional subband speech representation and SVM framework for robust speech recognition is presented, with performance evaluated against the individual front-ends across the full range of noise levels. Index terms: speech recognition, robustness, subbands, support vector machines. Automatic speech recognition (ASR) systems suffer

Sollich, Peter

164

What can Visual Speech Synthesis tell Visual Speech Recognition? Michael M. Cohen and Dominic W. Massaro  

E-print Network

We consider the problem of speech recognition given visual and auditory information, and discuss, among other things, the use of speech production models to help guide automatic speech recognition. Finally, we

Massaro, Dominic

165

A Novel Visual Speech Representation and HMM Classification for Visual Speech Recognition  

E-print Network

This paper presents the development of a novel visual speech recognition (VSR) system based on a new visual speech representation and HMM classification, noting that standard representations are problematic when applied to continuous visual speech recognition. To circumvent

Whelan, Paul F.

166

The Role of Visual Speech Information in Supporting Perceptual Learning of Degraded Speech  

ERIC Educational Resources Information Center

Following cochlear implantation, hearing-impaired listeners must adapt to speech as heard through their prosthesis. Visual speech information (VSI; the lip and facial movements of speech) is typically available in everyday conversation. Here, we investigate whether learning to understand a popular auditory simulation of speech as transduced by a…

Wayne, Rachel V.; Johnsrude, Ingrid S.

2012-01-01

167

IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING 1 Rigid Head Motion in Expressive Speech  

E-print Network

Head motion models are derived from an audiovisual database, comprising synchronized facial gestures and speech, which revealed characteristic patterns in emotional head motion sequences. Head motion patterns with neutral speech

Busso, Carlos

168

ASSESSMENT AND CORRECTION OF VOICE QUALITY VARIABILITIES IN LARGE SPEECH DATABASES FOR CONCATENATIVE SPEECH SYNTHESIS  

E-print Network

In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable to have varied prosodic and spectral characteristics in the database, it is not desirable to have variable voice quality. Compared with a limited set of controlled units (e.g. diphones), the availability of more units taken from large speech databases seems

Greenberg, Albert

169

EFFECT OF SPEECH AND NOISE CROSS CORRELATION ON AMFCC SPEECH RECOGNITION FEATURES  

E-print Network

In speech recognition feature extraction algorithms, it is common to assume that the noise and speech signal are uncorrelated. Evaluations were performed using the AURORA II database; from these evaluations, we show that the assumption

170

Data Warehouse for Speech Perception and Model Testing  

E-print Network

The goal is to provide a comprehensive but user-friendly database of speech perception experiments. Theories of speech perception, like most theories, have tended to be qualitative rather than

Massaro, Dominic

171

Development and Evaluation of Polish Speech Corpus for Unit Selection Speech Synthesis Systems  

E-print Network

Given the segmental and suprasegmental features to be covered, the size of databases for speech technology purposes is expected to be substantial, and the database structure influences the quality of the resulting synthesised speech. For the Polish language, we have decided to use various speech units from different mixed databases, as follows: Base A

Möbius, Bernd

172

The design of Polish Speech Corpus for Unit Selection Speech Synthesis  

E-print Network

The idea is to select at run-time, from a large recorded speech database, the longest available strings of phonetic segments, which improves the naturalness of synthetic speech. In a speech database comprising several hours of recordings, it is likely that suitable units will be longer than a segment or a diphone. Defining the optimal speech database for unit selection is a crucial, yet difficult

Möbius, Bernd

173

IMPROVING THE UNDERSTANDABILITY OF SPEECH SYNTHESIS BY MODELING SPEECH IN NOISE  

E-print Network

We used the recording setup that produced the CMU SIN database for speech synthesis [3] to record a small (30 sentence) database of speech in noise. Two recording sessions are required to build a database of speech in noise, with the in-noise and not-in-noise conditions reversed for each session, giving us an identical database of plain speech

Eskenazi, Maxine

174

ASSESSMENT AND CORRECTION OF VOICE QUALITY VARIABILITIES IN LARGE SPEECH DATABASES FOR CONCATENATIVE SPEECH SYNTHESIS  

E-print Network

In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable to have varied prosodic and spectral characteristics in the database, it is not desirable to have variable voice quality. Compared with a limited set of controlled units (e.g. diphones), the availability of more units taken from large speech databases seems

Greenberg, Albert

175

Assessment and correction of voice quality variabilities in large speech databases for concatenative speech synthesis  

Microsoft Academic Search

In an effort to increase the naturalness of concatenative speech synthesis, large speech databases may be recorded. While it is desirable to have varied prosodic and spectral characteristics in the database, it is not desirable to have variable voice quality. We present an automatic method for voice quality assessment and correction, whenever necessary, of large speech databases for concatenative speech

Yannis Stylianou

1999-01-01

176

UNIT SELECTION IN A CONCATENATIVE SPEECH SYNTHESIS SYSTEM USING A LARGE SPEECH DATABASE  

E-print Network

Speech is synthesised by concatenating the waveforms of units selected from large, single-speaker speech databases. This approach to waveform synthesis permits training from natural speech; two methods of unit selection from the database are presented. The primary

Black, Alan W
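
Unit selection of the kind described above is usually cast as a lattice search: each target position has candidate units, and the chosen sequence minimises summed target costs plus join costs. A minimal Viterbi-style sketch, with the cost functions left abstract because the paper's actual target and join costs are not reproduced here:

    # Sketch of unit selection as dynamic programming over candidate
    # units per target position. target_cost(t, u) and join_cost(u1, u2)
    # are placeholders for real spectral/prosodic cost functions.
    def select_units(candidates, target_cost, join_cost):
        """candidates: list (one per target position) of candidate units."""
        best = [(target_cost(0, u), [u]) for u in candidates[0]]
        for t in range(1, len(candidates)):
            new_best = []
            for u in candidates[t]:
                cost, path = min(
                    ((c + join_cost(p[-1], u), p) for c, p in best),
                    key=lambda cp: cp[0],
                )
                new_best.append((cost + target_cost(t, u), path + [u]))
            best = new_best
        return min(best, key=lambda cp: cp[0])[1]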

177

An Investigation of Audio-Visual Speech Recognition as Applied to Multimedia Speech Therapy Applications  

Microsoft Academic Search

A multimedia speech therapy system should be able to be used for customized speech therapy for different problems and for different ages. The speech recognition must be designed to work with high inter- and intra-speaker variability. In addition to displaying text on a screen, recording the voice reading the text, analyzing the recorded spoken signal and performing speech recognition which

Voula C. Georgopoulos

1999-01-01

178

Effects of speech therapy and pharmacologic and surgical treatments on voice and speech in parkinson's disease  

Microsoft Academic Search

The purpose of this review was to examine the different treatment approaches for persons with Parkinson's Disease (PD) and to examine the effects of these treatments on speech. Treatment methods reviewed include speech therapy, pharmacological, and surgical. Research from the 1950s through the 1970s had not demonstrated significant improvements following speech therapy. Recent research has shown that speech therapy (when

GERALYN M. SCHULZ; MEGAN K. GRANT

2000-01-01

179

Nonlinear Statistical Modeling of Speech  

NASA Astrophysics Data System (ADS)

Contemporary approaches to speech and speaker recognition decompose the problem into four components: feature extraction, acoustic modeling, language modeling and search. Statistical signal processing is an integral part of each of these components, and Bayes Rule is used to merge these components into a single optimal choice. Acoustic models typically use hidden Markov models based on Gaussian mixture models for state output probabilities. This popular approach suffers from an inherent assumption of linearity in speech signal dynamics. Language models often employ a variety of maximum entropy techniques, but can employ many of the same statistical techniques used for acoustic models. In this paper, we focus on introducing nonlinear statistical models to the feature extraction and acoustic modeling problems as a first step towards speech and speaker recognition systems based on notions of chaos and strange attractors. Our goal in this work is to improve the generalization and robustness properties of a speech recognition system. Three nonlinear invariants are proposed for feature extraction: Lyapunov exponents, correlation fractal dimension, and correlation entropy. We demonstrate an 11% relative improvement on speech recorded under noise-free conditions, but show a comparable degradation occurs for mismatched training conditions on noisy speech. We conjecture that the degradation is due to difficulties in estimating invariants reliably from noisy data. To circumvent these problems, we introduce two dynamic models to the acoustic modeling problem: (1) a linear dynamic model (LDM) that uses a state space-like formulation to explicitly model the evolution of hidden states using an autoregressive process, and (2) a data-dependent mixture of autoregressive (MixAR) models. Results show that LDM and MixAR models can achieve comparable performance with HMM systems while using significantly fewer parameters. Currently we are developing Bayesian parameter estimation and discriminative training algorithms for these new models to improve noise robustness.

Srinivasan, S.; Ma, T.; May, D.; Lazarou, G.; Picone, J.

2009-12-01
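
Of the three invariants listed above, the correlation fractal dimension is the simplest to sketch: delay-embed the signal, compute the Grassberger-Procaccia correlation sum C(r), and estimate the dimension as the slope of log C(r) versus log r. Embedding parameters below are illustrative, not those used in the paper.

    # Sketch of the correlation-sum computation behind the correlation
    # fractal dimension feature. O(n^2) in the number of embedded points,
    # so intended for short frames only.
    import numpy as np

    def delay_embed(x, dim=5, tau=4):
        n = len(x) - (dim - 1) * tau
        return np.stack([x[i * tau:i * tau + n] for i in range(dim)], axis=1)

    def correlation_sum(points, r):
        d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        n = len(points)
        return (np.sum(d < r) - n) / (n * (n - 1))  # exclude self-pairs

    # The dimension estimate is the slope of log C(r) vs log r over a
    # scaling range of r values.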

180

Text Independent Methods for Speech Segmentation  

Microsoft Academic Search

This paper describes several text-independent speech segmentation methods. State-of-the-art applications and the prospective use of automatic speech segmentation techniques are presented, including the direct applicability of automatic segmentation in recognition, coding, and speech corpora annotation, which is a central issue in today's speech technology. Moreover, a novel parametric segmentation algorithm is presented and its performance evaluated by

Anna Esposito; Guido Aversano

2004-01-01

181

European speech databases for telephone applications  

Microsoft Academic Search

The SpeechDat project aims to produce speech databases for all official languages of the European Union and some major dialectal variants and minority languages resulting in 28 speech databases. They will be recorded over fixed and mobile telephone networks. This will provide a realistic basis for training and assessment of both isolated and continuous-speech utterances, employing whole-word or subword approaches,

H. Hoge; H. S. Tropf; R. Winski; H. van den Heuvel; R. Haeb-Umbach; K. Choukri

1997-01-01

182

Speech coding, reconstruction and recognition using acoustics and electromagnetic waves  

DOEpatents

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

Holzrichter, J.F.; Ng, L.C.

1998-03-17
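
The deconvolution step the patent describes, recovering a per-frame transfer function from the EM-derived excitation and the acoustic output, reduces in the frequency domain to a regularised division. A hedged sketch of that one step, with the regularisation constant as an assumption:

    # Sketch of per-frame frequency-domain deconvolution: with an
    # excitation estimate e[n] and the acoustic speech s[n] for one
    # frame, the transfer function is roughly H = S / E, regularised
    # so near-zero excitation bins do not blow up.
    import numpy as np

    def transfer_function(excitation, speech, eps=1e-6):
        E = np.fft.rfft(excitation)
        S = np.fft.rfft(speech)
        return S * np.conj(E) / (np.abs(E) ** 2 + eps)  # regularised S / E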

183

Speech coding, reconstruction and recognition using acoustics and electromagnetic waves  

DOEpatents

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

Holzrichter, John F. (Berkeley, CA); Ng, Lawrence C. (Danville, CA)

1998-01-01

184

Perceiving identical sounds as speech or non-speech modulates activity in the left posterior superior temporal sulcus  

Microsoft Academic Search

The left superior temporal cortex shows greater responsiveness to speech than to non-speech sounds according to previous neuroimaging studies, suggesting that this brain region has a special role in speech processing. However, since speech sounds differ acoustically from the non-speech sounds, it is possible that this region is not involved in speech perception per se, but rather in processing of

Riikka Möttönen; Gemma A. Calvert; Iiro P. Jääskeläinen; Paul M. Matthews; Thomas Thesen; Jyrki Tuomainen; Mikko Sams

2006-01-01

185

Unified framework for single channel speech enhancement  

Microsoft Academic Search

In this paper we describe a generic architecture for single-channel speech enhancement. We assume processing in the frequency domain and suppression-based speech enhancement methods. The framework consists of a two-stage voice activity detector, a noise variance estimator, a suppression rule, and a modifier for uncertain presence of the speech signal. The evaluation corpus is a synthetic mixture of a clean

Ivan Tashev; Andrew Lovitt; Alex Acero

2009-01-01
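
The architecture described above (VAD, noise variance estimator, suppression rule) can be reduced to a few lines if the VAD is collapsed into a crude leading-noise assumption. The sketch below applies a Wiener-type suppression gain per frequency bin; it is a simplified stand-in for the paper's two-stage detector and speech-presence modifier.

    # Sketch of suppression-based enhancement: estimate noise variance
    # from leading frames, form a Wiener-type gain per bin, apply, invert.
    import numpy as np
    from scipy.signal import stft, istft

    def wiener_enhance(x, fs, noise_frames=20):
        f, t, X = stft(x, fs, nperseg=512)
        power = np.abs(X) ** 2
        noise_var = power[:, :noise_frames].mean(axis=1, keepdims=True)
        snr = np.maximum(power / noise_var - 1.0, 0.0)  # crude SNR estimate
        gain = snr / (snr + 1.0)                        # Wiener gain per bin
        _, y = istft(gain * X, fs, nperseg=512)
        return y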

186

Audiovisual Speech Integration and Lipreading in Autism  

ERIC Educational Resources Information Center

Background: During speech perception, the ability to integrate auditory and visual information causes speech to sound louder and be more intelligible, and leads to quicker processing. This integration is important in early language development, and also continues to affect speech comprehension throughout the lifespan. Previous research shows that…

Smith, Elizabeth G.; Bennetto, Loisa

2007-01-01

187

Towards continuous speech recognition using surface electromyography  

Microsoft Academic Search

We present our research on continuous speech recognition of the surface electromyographic signals that are generated by the human articulatory muscles. Previous research on electromyographic speech recognition was limited to isolated word recognition because it was very difficult to train phoneme-based acoustic models for the electromyographic speech recognizer. In this paper, we demonstrate how to train the

Szu-Chen Stan Jou; Tanja Schultz; Matthias Walliczek; Florian Kraft; Alex Waibel

2006-01-01

188

Automatic labeling schemes for concatenative speech synthesis  

Microsoft Academic Search

This article discusses problems and solutions related to the labeling of speech that is subsequently used in speech synthesis. Although there are several synthesis methods, here we focus on concatenative speech synthesis, which is especially sensitive to labeling errors. As it uses a huge amount of data, the labeling must be performed automatically. It is well accepted that the

J. Kacur; J. Cepko; Andrej Páleník

2008-01-01

189

Bimodal Emotion Recognition from Speech and Text  

Microsoft Academic Search

This paper presents an approach to emotion recognition from speech signals and textual content. In the analysis of speech signals, thirty-seven acoustic features are extracted from the speech input. Two different classifiers, Support Vector Machines (SVMs) and a BP neural network, are adopted to classify the emotional states. In text analysis, we use a two-step classification method to recognize the emotional

Weilin Ye; Xinghua Fan

2014-01-01
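
The acoustic branch of the system, fixed-length features per utterance fed to an SVM, is sketched below. Simple MFCC statistics stand in for the paper's thirty-seven features, and scikit-learn's SVC stands in for whatever SVM implementation the authors used.

    # Sketch of the acoustic branch: per-utterance feature vector
    # (MFCC means and standard deviations) classified with an SVM.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def utterance_features(path):
        y, sr = librosa.load(path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # 26-dim

    def train_emotion_svm(paths, labels):
        X = np.vstack([utterance_features(p) for p in paths])
        clf = SVC(kernel="rbf", C=10.0)
        clf.fit(X, labels)
        return clf

    # Prediction: clf.predict(utterance_features(path).reshape(1, -1))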

190

Liberalism, Speech Codes, and Related Problems.  

ERIC Educational Resources Information Center

It is argued that universities are pervasively and necessarily engaged in regulation of speech, which complicates many existing claims about hate speech codes on campus. The ultimate test is whether the restriction on speech is a legitimate part of the institution's mission, commitment to liberal education. (MSE)

Sunstein, Cass R.

1993-01-01

191

Hate Speech on Campus: A Practical Approach.  

ERIC Educational Resources Information Center

Looks at arguments concerning hate speech and speech codes on college campuses, arguing that speech codes are likely to be of limited value in achieving civil rights objectives, and that there are alternatives less harmful to civil liberties and more successful in promoting civil rights. Identifies specific goals, and considers how restriction of…

Hogan, Patrick

1997-01-01

192

Speech training devices for profoundly deaf children  

Microsoft Academic Search

Prelingually, profoundly deaf children have great difficulty achieving intelligible speech. Even after intensive therapy, their speech is deficient in voice pitch, rhythm, stress and intonation, as well as segmental phonetic characteristics. To facilitate the speech training of these children, we are developing two interrelated personal computer (PC) based systems: a school system and a home system. In the school system,

Lynne E. Bernstein; James B. Ferguson

1986-01-01

193

GENERALIZED OPTIMIZATION ALGORITHM FOR SPEECH RECOGNITION TRANSDUCERS  

E-print Network

Weighted automata and transducers provide a common representation for the components of a speech recognition system. In previous work, we described optimization algorithms for these representations, such as determinization. However, not all weighted automata and transducers used in large-vocabulary speech recognition

Allauzen, Cyril

195

Algorithmic Aspects in Speech Recognition: An Introduction  

E-print Network

Algorithmic Aspects in Speech Recognition: An Introduction. Adam L. Buchsbaum, AT&T Labs, Florham Park. Speech recognition is an area ... This paper presents the field of speech recognition and describes some of its major open problems

Buchsbaum, Adam

196

Speech Recognition Experiments with Silicon Auditory Models  

E-print Network

Speech Recognition Experiments with Silicon Auditory Models. John Lazzaro and John Wawrzynek. ... the performance of this speech recognition system on a speaker-independent 13-word recognition task ... Current engineering applications of auditory models under study include speech recognition

Lazzaro, John

197

Speech recognition using noise-adaptive prototypes  

Microsoft Academic Search

A probabilistic mixture model is described for a frame (the short-term spectrum) of speech to be used in speech recognition. Each component of the mixture is regarded as a prototype for the labeling phase of a hidden Markov model based speech recognition system. Since the ambient noise during recognition can differ from that present in the training data, the

ARTHUR NADAS; DAVID NAHAMOO; MICHAEL A. PICHENY

1989-01-01
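
The labeling phase the abstract refers to reduces to picking, for each frame, the prototype with the highest likelihood under the mixture. A minimal sketch, assuming Gaussian prototypes with invented parameters rather than the paper's trained models:

```python
# Minimal sketch: assign a frame (short-term spectrum) to the most likely
# Gaussian prototype. All prototype parameters are invented for illustration.
import numpy as np
from scipy.stats import multivariate_normal

def label_frame(frame, means, covs, weights):
    """Index of the most likely prototype for one frame."""
    scores = [np.log(w) + multivariate_normal.logpdf(frame, mean=m, cov=c)
              for w, m, c in zip(weights, means, covs)]
    return int(np.argmax(scores))

dim, k = 8, 3
rng = np.random.default_rng(1)
means = rng.normal(size=(k, dim))
covs = [np.eye(dim)] * k
weights = np.full(k, 1.0 / k)
frame = rng.normal(size=dim)
print("frame label:", label_frame(frame, means, covs, weights))
```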

198

Speech recognition with amplitude and frequency modulations  

E-print Network

Speech recognition with amplitude and frequency modulations. Fan-Gang Zeng, Kaibao Nie, Ginger S. ... but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived ... number of spectral bands may be sufficient for speech recognition in quiet, FM significantly enhances

Allen, Jont

199

GRAPHICAL MODELS AND AUTOMATIC SPEECH RECOGNITION  

E-print Network

GRAPHICAL MODELS AND AUTOMATIC SPEECH RECOGNITION. JEFFREY A. BILMES. Abstract. Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition ... as part of a speech recognition system can be described by a graph; this includes Gaussian dis

Bilmes, Jeff

200

Robust Speech Recognition Using Articulatory Information  

E-print Network

Robust Speech Recognition Using Articulatory Information. Der Technischen Fakultät der Universität ... acoustic modeling component in a speech recognition system. The second focus point of this thesis ... different speech recognition tasks. The first of these is an American English corpus of telephone

Kirchhoff, Katrin

201

HIDDEN-ARTICULATOR MARKOV MODELS FOR SPEECH RECOGNITION  

E-print Network

HIDDEN-ARTICULATOR MARKOV MODELS FOR SPEECH RECOGNITION. Matt Richardson, Jeff Bilmes and Chris ... speech recognition using Hidden Markov Models (HMMs), each state represents an acoustic portion ... assist speech recognition. We demonstrate this by showing that our mapping of articulatory configurations

Bilmes, Jeff

202

Regularizing Linear Discriminant Analysis for Speech Recognition  

E-print Network

Regularizing Linear Discriminant Analysis for Speech Recognition. Hakan Erdogan, Faculty of Engineering. ... in a pattern recognition system is the feature extractor. Feature extraction is an important step for speech recognition since the time-domain speech signal is highly variable, thus complex linear and nonlinear

Erdogan, Hakan

203

Speech Perception in Individuals with Auditory Neuropathy  

ERIC Educational Resources Information Center

Purpose: Speech perception in participants with auditory neuropathy (AN) was systematically studied to answer the following 2 questions: Does noise present a particular problem for people with AN? Can clear speech and cochlear implants alleviate this problem? Method: The researchers evaluated the advantage in intelligibility of clear speech over…

Zeng, Fan-Gang; Liu, Sheng

2006-01-01

204

The First Amendment and Commercial Speech  

Microsoft Academic Search

After a quick summary of constitutional treatment of commercial speech, this essay outlines four reasons why commercial speech should be denied First Amendment protection. Working from the claim that the primary rationale for constitutional protection of speech is the mandate that government respect individual freedom or autonomy, the essay argues: 1) that the individual does not choose, but rather the

C. Edwin Baker

2008-01-01

205

Speech-Song Interface of Chinese Speakers  

ERIC Educational Resources Information Center

Pitch is a psychoacoustic construct crucial in the production and perception of speech and songs. This article is an exploration of the interface of speech and song performance of Chinese speakers. Although parallels might be drawn from the prosodic and sound structures of the linguistic and musical systems, perceiving and producing speech and…

Mang, Esther

2007-01-01

206

Interventions for Speech Sound Disorders in Children  

ERIC Educational Resources Information Center

With detailed discussion and invaluable video footage of 23 treatment interventions for speech sound disorders (SSDs) in children, this textbook and DVD set should be part of every speech-language pathologist's professional preparation. Focusing on children with functional or motor-based speech disorders from early childhood through the early…

Williams, A. Lynn, Ed.; McLeod, Sharynne, Ed.; McCauley, Rebecca J., Ed.

2010-01-01

207

Speech & Hearing Clinic College of Science  

E-print Network

· the problems the children have · how speech sounds are linked to written words · the therapy offered ... phonological awareness therapy can help. The therapy teaches children to: · use clearer speech · become aware of how sounds make up words · see the link between speech sounds and written words. The therapy will help

Hickman, Mark

208

Hate Speech and the First Amendment.  

ERIC Educational Resources Information Center

This document is comprised of California state statutes, federal legislation, and court litigation pertaining to hate speech and the First Amendment. The document provides an overview of California education code sections relating to the regulation of speech; basic principles of the First Amendment; government efforts to regulate hate speech,…

Rainey, Susan J.; Kinsler, Waren S.; Kannarr, Tina L.; Reaves, Asa E.

209

Speech Perception in Children with Speech Output Disorders  

ERIC Educational Resources Information Center

Research in the field of speech production pathology is dominated by describing deficits in output. However, perceptual problems might underlie, precede, or interact with production disorders. The present study hypothesizes that the level of the production disorders is linked to level of perception disorders, thus lower-order production problems…

Nijland, Lian

2009-01-01

210

Spectral integration in speech and non-speech sounds  

NASA Astrophysics Data System (ADS)

Spectral integration (or formant averaging) was proposed in vowel perception research to account for the observation that a reduction of the intensity of one of two closely spaced formants (as in /u/) produced a predictable shift in vowel quality [Delattre et al., Word 8, 195-210 (1952)]. A related observation was reported in psychoacoustics, indicating that when the components of a two-tone periodic complex differ in amplitude and frequency, its perceived pitch is shifted toward that of the more intense tone [Helmholtz, App. XIV (1875/1948)]. Subsequent research in both fields focused on the frequency interval that separates these two spectral components, in an attempt to determine the size of the bandwidth for spectral integration to occur. This talk will review the accumulated evidence for and against spectral integration within the hypothesized limit of 3.5 Bark for static and dynamic signals in speech perception and psychoacoustics. Based on similarities in the processing of speech and non-speech sounds, it is suggested that spectral integration may reflect a general property of the auditory system. A larger frequency bandwidth, possibly close to 3.5 Bark, may be utilized in integrating acoustic information, including speech, complex signals, or sound quality of a violin.

Jacewicz, Ewa

2005-04-01
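
The 3.5 Bark criterion is easy to operationalize. The sketch below uses the Zwicker and Terhardt (1980) Hz-to-Bark approximation (one of several published conversions, and not necessarily the one used in the studies reviewed) to test whether two spectral components fall within the hypothesized integration band; the example frequencies are arbitrary.

```python
# Sketch: do two spectral components fall within the hypothesized 3.5 Bark
# integration band? Uses the Zwicker & Terhardt (1980) Hz-to-Bark
# approximation; the test frequencies below are arbitrary examples.
import math

def hz_to_bark(f_hz: float) -> float:
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def within_integration_band(f1_hz: float, f2_hz: float, limit_bark: float = 3.5) -> bool:
    return abs(hz_to_bark(f1_hz) - hz_to_bark(f2_hz)) <= limit_bark

print(within_integration_band(300.0, 600.0))    # True: roughly 2.7 Bark apart
print(within_integration_band(280.0, 2250.0))   # False: about 11 Bark apart
```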

211

Application of Malay speech technology in Malay Speech Therapy Assistance Tools  

Microsoft Academic Search

Malay Speech Therapy Assistance Tools (MSTAT) is a system which assists the therapist in diagnosing children for language disorders and in training children with stuttering problems. The main engine behind it is speech technology, consisting of a speech recognition system, a Malay text-to-speech system, and a Malay talking head by Tan, T.S. (2003). In this project, the speech recognition system utilizes the hidden

Tian-Swee Tan; Helbin-Liboh; A. K. Ariff; Chee-Ming Ting; S.-H. Salleh

2007-01-01

212

Speech Motor Skill and Stuttering  

Microsoft Academic Search

The authors review converging lines of evidence from behavioral, kinematic, and neuroimaging data that point to limitations in speech motor skills in people who stutter (PWS). From their review, they conclude that PWS differ from those who do not stutter in terms of their ability to improve with practice and retain practiced changes in the long term, and that they are

Aravind Kumar Namasivayam; Pascal van Lieshout

2011-01-01

213

Speech and Language Developmental Milestones  

MedlinePLUS

… exploring the role this genetic variant may also play in dyslexia, autism, and speech-sound disorders. A checklist asks, for example, whether the child talks about activities at daycare, preschool, or friends' homes; understands most of what is said at home and in school; and uses sentences that give many details.

214

Laterality in Visual Speech Perception  

Microsoft Academic Search

The lateralization of visual speech perception was examined in 3 experiments. Participants were presented with a realistic computer-animated face articulating 1 of 4 consonant–vowel syllables without sound. The face appeared at 1 of 5 locations in the visual field. The participants' task was to identify each test syllable. To prevent eye movement during the presentation of the face, participants had

Paula M. T. Smeele; Dominic W. Massaro; Michael M. Cohen; Anne C. Sittig

1998-01-01

215

Motor Speech Disorders in Neurodevelopmental Syndromes  

E-print Network

Motor Speech Disorders in Neurodevelopmental Syndromes. Shelley Velleman. Acknowledgments: Cure Autism Now; U.S. Department of Education/OSEP; the Fulbright Program; the children & their parents. Primary distinction: dysarthria vs. apraxia; distinguished by: area of (presumed) neurological

Shoubridge, Eric

216

Prosodic Contrasts in Ironic Speech  

ERIC Educational Resources Information Center

Prosodic features in spontaneous speech help disambiguate implied meaning not explicit in linguistic surface structure, but little research has examined how these signals manifest themselves in real conversations. Spontaneously produced verbal irony utterances generated between familiar speakers in conversational dyads were acoustically analyzed…

Bryant, Gregory A.

2010-01-01

217

Speech Processing Lab 1 Familiarisation  

E-print Network

… of a sine wave vary over time? What about the square and pulse waveforms? How do they differ from the sine? Summary: setting up accounts and familiarisation with the lab; examining sounds with Wavesurfer; looking at the same signal in both the frequency and time domains; inspecting various types of sound, including speech

Edinburgh, University of

218

Acoustic Analysis of PD Speech  

PubMed Central

According to the U.S. National Institutes of Health, approximately 500,000 Americans have Parkinson's disease (PD), with roughly another 50,000 receiving new diagnoses each year. 70%–90% of these people also have the hypokinetic dysarthria associated with PD. Deep brain stimulation (DBS) substantially relieves motor symptoms in advanced-stage patients for whom medication produces disabling dyskinesias. This study investigated speech changes as a result of DBS settings chosen to maximize motor performance. The speech of 10 PD patients and 12 normal controls was analyzed for syllable rate and variability, syllable length patterning, vowel fraction, voice-onset time variability, and spirantization. These were normalized by the controls' standard deviation to represent distance from normal and combined into a composite measure. Results show that DBS settings relieving motor symptoms can improve speech, making it up to three standard deviations closer to normal. However, the clinically motivated settings evaluated here show greater capacity to impair, rather than improve, speech. A feedback device developed from these findings could be useful to clinicians adjusting DBS parameters, as a means for ensuring they do not unwittingly choose DBS settings which impair patients' communication. PMID:21977333

Chenausky, Karen; MacAuslan, Joel; Goldhor, Richard

2011-01-01
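
The normalization behind the composite measure (each speech measure expressed as a distance from the control mean, in units of the controls' standard deviation) can be sketched in a few lines. All numbers below are invented, and taking the mean absolute z-score is just one plausible reading of "combined into a composite measure".

```python
# Sketch of the composite "distance from normal": z-score each speech
# measure against the control group, then combine. Values are invented.
import numpy as np

def composite_distance(patient, control_means, control_sds):
    """Mean absolute z-score across speech measures (one possible combination)."""
    z = (np.asarray(patient) - np.asarray(control_means)) / np.asarray(control_sds)
    return float(np.mean(np.abs(z)))

# hypothetical values for syllable rate, vowel fraction, VOT variability
patient = [3.1, 0.52, 18.0]
controls_mean = [5.0, 0.45, 10.0]
controls_sd = [0.8, 0.05, 4.0]
print("distance from normal (SD units):",
      composite_distance(patient, controls_mean, controls_sd))
```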

219

Why is Speech Recognition Difficult?  

Microsoft Academic Search

In this paper we will elaborate on some of the difficulties with Automatic Speech Recognition (ASR). We will argue that the main motivation for ASR is efficient interfaces to computers, and for the interfaces to be truly useful, it should provide coverage for a large group of users. We will discuss some of the issues that make the recognition of

Markus Forsberg

220

Linguistic aspects of speech synthesis.  

PubMed Central

The conversion of text to speech is seen as an analysis of the input text to obtain a common underlying linguistic description, followed by a synthesis of the output speech waveform from this fundamental specification. Hence, the comprehensive linguistic structure serving as the substrate for an utterance must be discovered by analysis from the text. The pronunciation of individual words in unrestricted text is determined by morphological analysis or letter-to-sound conversion, followed by specification of the word-level stress contour. In addition, many text character strings, such as titles, numbers, and acronyms, are abbreviations for normal words, which must be derived. To further refine these pronunciations and to discover the prosodic structure of the utterance, word part of speech must be computed, followed by a phrase-level parsing. From this structure the prosodic structure of the utterance can be determined, which is needed in order to specify the durational framework and fundamental frequency contour of the utterance. In discourse contexts, several factors such as the specification of new and old information, contrast, and pronominal reference can be used to further modify the prosodic specification. When the prosodic correlates have been computed and the segmental sequence is assembled, a complete input suitable for speech synthesis has been determined. Lastly, multilingual systems utilizing rule frameworks are mentioned, and future directions are characterized. PMID:7479807

Allen, J

1995-01-01
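
The stages described (normalization of abbreviations, pronunciation derivation, prosodic specification) compose naturally into a pipeline. The skeleton below is a deliberately toy illustration of that flow; the abbreviation table, lexicon, and letter-by-letter fallback are hypothetical stand-ins for the morphological analysis, letter-to-sound rules, and parsing a real system would use.

```python
# Toy skeleton of the text-analysis pipeline outlined above: normalize
# abbreviations, derive pronunciations, attach crude prosodic structure.
# Every rule and table entry here is a hypothetical placeholder.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street"}           # toy normalization table
LEXICON = {"doctor": "D AA K T ER", "street": "S T R IY T"}  # toy lexicon

def normalize(text: str) -> list[str]:
    return [ABBREVIATIONS.get(w, w) for w in text.split()]

def to_phonemes(word: str) -> str:
    # lexicon lookup with a trivial letter-by-letter fallback
    return LEXICON.get(word.lower(), " ".join(word.lower()))

def analyze(text: str) -> list[dict]:
    words = normalize(text)
    return [{"word": w, "phonemes": to_phonemes(w),
             "phrase_final": i == len(words) - 1}   # crude prosodic structure
            for i, w in enumerate(words)]

for unit in analyze("Dr. Smith lives on Elm St."):
    print(unit)
```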

221

“Eigenlips” for robust speech recognition  

Microsoft Academic Search

We improve the performance of a hybrid connectionist speech recognition system by incorporating visual information about the corresponding lip movements. Specifically, we investigate the benefits of adding visual features in the presence of additive noise and crosstalk (cocktail party effect). Our study extends our previous experiments by using a new visual front end, and an alternative architecture for combining the

Christoph Bregler; Yochai Konig

1994-01-01

222

Gender Differences in Speech Behavior  

Microsoft Academic Search

Men and women behave differently in applying the Politeness Principle. This can be seen in the use of slang, humor, approbation, sympathy, and euphemism. By comparing the gender-related differences in discourse along the four factors above, the paper focuses on the chief differences between men and women in speech behavior, and interprets the possible causes for the existence of such

LI Xi

2007-01-01

223

Multilingual Speech Databases at LDC  

Microsoft Academic Search

As multilingual products and technology grow in importance, the Linguistic Data Consortium (LDC) intends to provide the resources needed for research and development activities, especially in telephone-based, small-vocabulary recognition applications; language identification research; and large vocabulary continuous speech recognition research. The POLYPHONE corpora, a multilingual

John J. Godfrey

1994-01-01

224

Embedding speech into virtual realities  

NASA Technical Reports Server (NTRS)

In this work a speaker-independent speech recognition system is presented, which is suitable for implementation in Virtual Reality applications. The use of an artificial neural network in connection with a special compression of the acoustic input leads to a system which is robust, fast, easy to use, and needs no additional hardware besides common VR equipment.

Bohn, Christian-Arved; Krueger, Wolfgang

1993-01-01

225

Acceptance speech doctor honoris causa  

E-print Network

in the present scene it was very practical and very much enjoyed. I thank you and, of course, if there are women professors who wish to enjoy an embrace, we can make time after. I should say I was asked to write a speech ... because all those people have worked three years, four years, maybe many more years, night

226

Dysarthric Speech Database for Universal Access Research  

E-print Network

This paper describes a database of dysarthric speech produced by 19 speakers with cerebral palsy. Speech materials consist of 765 isolated words per speaker: 300 distinct uncommon words and 3 repetitions of digits, computer commands, radio alphabet and common words. Data is recorded through an 8-microphone array and one digital video camera. Our database provides a fundamental resource for automatic speech recognition development for people with neuromotor disability. Research on articulation errors in dysarthria will benefit clinical treatments and contribute to our knowledge of neuromotor mechanisms in speech production. Data files are available via secure ftp upon request. Index Terms: speech recognition, dysarthria, cerebral palsy

Heejin Kim; Mark Hasegawa-johnson; Adrienne Perlman; Jon Gunderson; Thomas Huang; Kenneth Watkin; Simone Frame

227

Adaptive Redundant Speech Transmission over Wireless Multimedia Sensor Networks Based on Estimation of Perceived Speech Quality  

PubMed Central

An adaptive redundant speech transmission (ARST) approach to improve the perceived speech quality (PSQ) of speech streaming applications over wireless multimedia sensor networks (WMSNs) is proposed in this paper. The proposed approach estimates the PSQ as well as the packet loss rate (PLR) from the received speech data. Subsequently, it decides whether the transmission of redundant speech data (RSD) is required in order to assist a speech decoder to reconstruct lost speech signals for high PLRs. According to the decision, the proposed ARST approach controls the RSD transmission, then it optimizes the bitrate of speech coding to encode the current speech data (CSD) and RSD bitstream in order to maintain the speech quality under packet loss conditions. The effectiveness of the proposed ARST approach is then demonstrated using the adaptive multirate-narrowband (AMR-NB) speech codec and ITU-T Recommendation P.563 as a scalable speech codec and the PSQ estimation, respectively. It is shown from the experiments that a speech streaming application employing the proposed ARST approach significantly improves speech quality under packet loss conditions in WMSNs. PMID:22164086

Kang, Jin Ah; Kim, Hong Kook

2011-01-01
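
The control loop in the abstract reduces to a decision rule: when estimated PSQ or PLR crosses a threshold, enable redundant speech data and re-split the bit budget. The sketch below is an invented illustration of that rule; the thresholds, the 60/40 split, and the 12.2 kbps budget (an AMR-NB mode) are placeholders, not values from the paper.

```python
# Invented sketch of the ARST control decision: enable redundant speech
# data (RSD) and re-split the bit budget when estimated perceived speech
# quality (PSQ) drops or packet loss rate (PLR) rises past thresholds.
from dataclasses import dataclass

@dataclass
class ArstDecision:
    send_rsd: bool
    csd_bitrate_kbps: float   # bitrate for current speech data
    rsd_bitrate_kbps: float   # bitrate for redundant speech data

def decide(psq_mos: float, plr: float,
           total_kbps: float = 12.2,       # placeholder budget (an AMR-NB mode)
           psq_threshold: float = 3.5,     # placeholder MOS threshold
           plr_threshold: float = 0.05) -> ArstDecision:
    if psq_mos < psq_threshold or plr > plr_threshold:
        # split the budget so the decoder can reconstruct lost frames
        return ArstDecision(True, total_kbps * 0.6, total_kbps * 0.4)
    return ArstDecision(False, total_kbps, 0.0)

print(decide(psq_mos=3.1, plr=0.08))  # degraded channel -> redundancy on
print(decide(psq_mos=4.2, plr=0.01))  # clean channel -> all bits to CSD
```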

228

Gifts of Speech: Women's Speeches from Around the World  

NSDL National Science Digital Library

The Gifts of Speech site brings together speeches given by women from all around the world. The site is under the direction of Liz Linton Kent Leon, who is the electronic resources librarian at Sweet Briar College. First-time users may wish to click on the How To area to learn how to navigate the site. Of course, the FAQ area is a great way to learn about the site as well, and it should not be missed as it tells the origin story of the site. In the Collections area, visitors can listen to all of the Nobel Lectures delivered by female recipients and look at a list of the top 100 speeches in American history as determined by a group of researchers at the University of Wisconsin-Madison and Texas A & M University. Users will also want to use the Browse area to look over talks by women from Robin Abrams to Begum Khaleda Zia, the former prime minister of the People's Republic of Bangladesh.

Leon, Liz K.

2012-09-13

229

Speech and language delay in children.  

PubMed

Speech and language delay in children is associated with increased difficulty with reading, writing, attention, and socialization. Although physicians should be alert to parental concerns and to whether children are meeting expected developmental milestones, there currently is insufficient evidence to recommend for or against routine use of formal screening instruments in primary care to detect speech and language delay. In children not meeting the expected milestones for speech and language, a comprehensive developmental evaluation is essential, because atypical language development can be a secondary characteristic of other physical and developmental problems that may first manifest as language problems. Types of primary speech and language delay include developmental speech and language delay, expressive language disorder, and receptive language disorder. Secondary speech and language delays are attributable to another condition such as hearing loss, intellectual disability, autism spectrum disorder, physical speech problems, or selective mutism. When speech and language delay is suspected, the primary care physician should discuss this concern with the parents and recommend referral to a speech-language pathologist and an audiologist. There is good evidence that speech-language therapy is helpful, particularly for children with expressive language disorder. PMID:21568252

McLaughlin, Maura R

2011-05-15

230

Some articulatory details of emotional speech  

NASA Astrophysics Data System (ADS)

Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness, are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech but articulatory activity is rather comparable to, or less than, neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]

Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth

2005-09-01

231

Loss tolerant speech decoder for telecommunications  

NASA Technical Reports Server (NTRS)

A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.

Prieto, Jr., Jaime L. (Inventor)

1999-01-01
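
The concealment scheme keeps a history buffer of decoded SCA parameters and substitutes a one-step extrapolation when a frame is lost. The patent trains an FIR feed-forward neural network for that prediction; the sketch below swaps in a trivial linear extrapolator just to show the buffering-and-substitution flow.

```python
# Sketch of the concealment idea: buffer past speech compression algorithm
# (SCA) parameters and fill a lost frame with a one-step extrapolation.
# A trivial linear predictor stands in for the patent's trained FIR network.
import numpy as np
from collections import deque

class ParameterExtrapolator:
    def __init__(self, history_len: int = 4):
        self.history = deque(maxlen=history_len)

    def observe(self, params: np.ndarray) -> None:
        """Store the parameters of a correctly received frame."""
        self.history.append(np.asarray(params, dtype=float))

    def extrapolate(self) -> np.ndarray:
        """Predict the missing frame's parameters from past history."""
        if len(self.history) >= 2:
            prev, last = self.history[-2], self.history[-1]
            return last + (last - prev)        # linear one-step extrapolation
        return self.history[-1].copy()

ex = ParameterExtrapolator()
for frame in ([1.0, 0.5], [1.1, 0.45], [1.2, 0.40]):
    ex.observe(np.array(frame))
print("replacement for lost frame:", ex.extrapolate())  # -> [1.3, 0.35]
```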

232

Perception of intersensory synchrony in audiovisual speech: not that special.  

PubMed

Perception of intersensory temporal order is particularly difficult for (continuous) audiovisual speech, as perceivers may find it difficult to notice substantial timing differences between speech sounds and lip movements. Here we tested whether this occurs because audiovisual speech is strongly paired ("unity assumption"). Participants made temporal order judgments (TOJ) and simultaneity judgments (SJ) about sine-wave speech (SWS) replicas of pseudowords and the corresponding video of the face. Listeners in speech and non-speech mode were equally sensitive judging audiovisual temporal order. Yet, using the McGurk effect, we could demonstrate that the sound was more likely integrated with lipread speech if heard as speech than non-speech. Judging temporal order in audiovisual speech is thus unaffected by whether the auditory and visual streams are paired. Conceivably, previously found differences between speech and non-speech stimuli are not due to the putative "special" nature of speech, but rather reflect low-level stimulus differences. PMID:21035795

Vroomen, Jean; Stekelenburg, Jeroen J

2011-01-01

233

Brain activation abnormalities during speech and non-speech in stuttering speakers  

PubMed Central

Although stuttering is regarded as a speech-specific disorder, there is a growing body of evidence suggesting that subtle abnormalities in the motor planning and execution of non-speech gestures exist in stuttering individuals. We hypothesized that people who stutter (PWS) would differ from fluent controls in their neural responses during motor planning and execution of both speech and non-speech gestures that had auditory targets. Using fMRI with sparse sampling, separate BOLD responses were measured for perception, planning, and fluent production of speech and non-speech vocal tract gestures. During both speech and non-speech perception and planning, PWS had less activation in the frontal and temporoparietal regions relative to controls. During speech and non-speech production, PWS had less activation than the controls in the left superior temporal gyrus (STG) and the left pre-motor areas (BA 6) but greater activation in the right STG, bilateral Heschl’s gyrus (HG), insula, putamen, and precentral motor regions (BA 4). Differences in brain activation patterns between PWS and controls were greatest in the females and less apparent in males. In conclusion, similar differences in PWS from the controls were found during speech and non-speech; during perception and planning they had reduced activation while during production they had increased activity in the auditory area on the right and decreased activation in the left sensorimotor regions. These results demonstrated that neural activation differences in PWS are not speech-specific. PMID:19401143

Chang, Soo-Eun; Kenney, Mary Kay; Loucks, Torrey M.J.; Ludlow, Christy L.

2009-01-01

234

Headphone localization of speech stimuli  

NASA Technical Reports Server (NTRS)

Recently, three dimensional acoustic display systems have been developed that synthesize virtual sound sources over headphones based on filtering by Head-Related Transfer Functions (HRTFs), the direction-dependent spectral changes caused primarily by the outer ears. Here, 11 inexperienced subjects judged the apparent spatial location of headphone-presented speech stimuli filtered with non-individualized HRTFs. About half of the subjects 'pulled' their judgements toward either the median or the lateral-vertical planes, and estimates were almost always elevated. Individual differences were pronounced for the distance judgements; 15 to 46 percent of stimuli were heard inside the head, with the shortest estimates near the median plane. The results suggest that most listeners can obtain useful azimuth information from speech stimuli filtered by non-individualized HRTFs. Measurements of localization error and reversal rates are comparable with a previous study that used broadband noise stimuli.

Begault, Durand R.; Wenzel, Elizabeth M.

1991-01-01
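
The synthesis step described here is, at its core, a pair of convolutions: the monaural signal is filtered with a left-ear and a right-ear head-related impulse response. A minimal sketch with toy impulse responses (real systems use measured, direction-dependent HRIRs):

```python
# Minimal sketch of headphone spatialization: convolve a mono signal with
# left- and right-ear head-related impulse responses (HRIRs). The two toy
# HRIRs below are invented placeholders, not measured responses.
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    n = len(mono) + max(len(hrir_left), len(hrir_right)) - 1
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    left = np.pad(left, (0, n - len(left)))     # equalize lengths
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right], axis=1)      # (samples, 2) stereo buffer

rng = np.random.default_rng(0)
speech = rng.normal(size=1000)                  # stand-in for a speech waveform
# toy HRIRs: the right ear gets a delayed, attenuated copy (source on the left)
hrir_l = np.array([1.0, 0.3, 0.1])
hrir_r = np.concatenate([np.zeros(8), [0.6, 0.2, 0.05]])
stereo = render_binaural(speech, hrir_l, hrir_r)
print(stereo.shape)
```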

235

Lombard speech: Auditory (A), Visual (V) and AV effects  

Microsoft Academic Search

This study examined Auditory (A) and Visual (V) speech (speech-related head and face movement) as a function of noise environment. Measures of AV speech were recorded for 3 males and 1 female for 10 sentences spoken in quiet as well as four styles of background noise (Lombard speech). Auditory speech was analyzed in terms of overall intensity, duration, spectral tilt

Chris Davis; Jeesun Kim; Katja Grauwinkel; Hansjörg Mixdorff

2006-01-01

236

Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech  

E-print Network

Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech. Kinfe ... by an automatic speech recognition system (ASR) and naïve adult human listeners. The results show that despite ... recognition, dysarthric speech, intelligibility. Dysarthria is a neurogenic motor speech

Stevenson, Suzanne

237

Noise adaptive speech recognition based on sequential noise parameter estimation  

E-print Network

Noise adaptive speech recognition based on sequential noise parameter estimation. Kaisheng Yao. In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech which ... and they can be trained from noisy speech. The approach can be applied to perform continuous speech recognition

238

Rethinking Speech Recognition on Mobile Devices  

E-print Network

Rethinking Speech Recognition on Mobile Devices. Anuj Kumar, Anuj Tewari, Seth Horrigan. ... for automatic speech recognition (ASR) systems on mobile devices that are currently used: embedded speech recognition, speech recognition in the cloud, and distributed speech recognition; evaluate their advantages

Kam, Matthew

239

Exploring speech therapy games with children on the autism spectrum  

Microsoft Academic Search

Individuals on the autism spectrum often have difficulties producing intelligible speech with either a high or low speech rate, and atypical pitch and/or amplitude affect. In this study, we present a novel intervention towards customizing speech enabled games to help them produce intelligible speech. In this approach, we clinically and computationally identify the areas of speech production difficulties of our participants.

Mohammed E. Hoque; Rana El Kaliouby; Matthew S. Goodwin; Rosalind W. Picard

2009-01-01

240

Prediction and imitation in speech  

PubMed Central

It has been suggested that intra- and inter-speaker variability in speech are correlated. Interlocutors have been shown to converge on various phonetic dimensions. In addition, speakers imitate the phonetic properties of voices they are exposed to in shadowing, repetition, and even passive listening tasks. We review three theoretical accounts of speech imitation and convergence phenomena: (i) the Episodic Theory (ET) of speech perception and production (Goldinger, 1998); (ii) the Motor Theory (MT) of speech perception (Liberman and Whalen, 2000; Galantucci et al., 2006); (iii) Communication Accommodation Theory (CAT; Giles and Coupland, 1991; Giles et al., 1991). We argue that no account is able to explain all the available evidence. In particular, there is a need to integrate low-level, mechanistic accounts (like ET and MT), and higher-level accounts (like CAT). We propose that this is possible within the framework of an integrated theory of production and comprehension (Pickering and Garrod, 2013). Similarly to both ET and MT, this theory assumes parity between production and perception. Uniquely, however, it posits that listeners simulate speakers' utterances by computing forward-model predictions at many different levels, which are then compared to the incoming phonetic input. In our account phonetic imitation can be achieved via the same mechanism that is responsible for sensorimotor adaptation; i.e., the correction of prediction errors. In addition, the model assumes that the degree to which sensory prediction errors lead to motor adjustments is context-dependent. The notion of context subsumes both the preceding linguistic input and non-linguistic attributes of the situation (e.g., the speaker's and listener's social identities, their conversational roles, the listener's intention to imitate). PMID:23801971

Gambi, Chiara; Pickering, Martin J.

2013-01-01

241

Nonlinear Speech Enhancement: An Overview  

Microsoft Academic Search

This paper deals with the problem of enhancing the quality of speech signals, which has received growing attention in the last few decades. Many different approaches have been proposed in the literature under various configurations and operating hypotheses. The aim of this paper is to give an overview of the main classes of noise reduction algorithms proposed to date, focusing on

Amir Hussain; Mohamed Chetouani; Stefano Squartini; Alessandro Bastari; Francesco Piazza

2005-01-01

242

78 FR 63152 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...  

Federal Register 2010, 2011, 2012, 2013

...over (HCO), speech-to-speech, ASCII/Baudot-compatible...With HCO, a person who has a speech disability, but who is able...be drawn for service in low bandwidth environments. The Commission...exempted IP CTS providers. 21. Speech-to-Speech....

2013-10-23

243

The Levels of Speech Usage Rating Scale: Comparison of Client Self-Ratings with Speech Pathologist Ratings  

ERIC Educational Resources Information Center

Background: The term "speech usage" refers to what people want or need to do with their speech to fulfil the communication demands in their life roles. Speech-language pathologists (SLPs) need to know about clients' speech usage to plan appropriate interventions to meet their life participation goals. The Levels of Speech Usage is a categorical…

Gray, Christina; Baylor, Carolyn; Eadie, Tanya; Kendall, Diane; Yorkston, Kathryn

2012-01-01

244

The Norwegian part of SpeechDat: A European Speech Database for Creation of Voice Driven Teleservices  

E-print Network

The Norwegian part of SpeechDat: A European Speech Database for Creation of Voice Driven Teleservices. ABSTRACT. In this paper we describe the Norwegian part of a European telephone speech database, SpeechDat ... recognizer on the database. 1. PROJECT OVERVIEW. The development of automatic speech recognition is highly

Amdal, Ingunn

245

Using Hybrid HMM-Based Speech Segmentation to Improve Synthetic Speech Quality  

Microsoft Academic Search

The automatic phonetic time-alignment of speech databases is essential for the development cycle of a text-to-speech (TTS) system. Furthermore, the quality of the synthesized speech signals is strongly related to the precision of the produced alignment. In the present work we study the performance of a new HMM-based speech segmentation method. The method is based on hybrid embedded and isolated-unit

Iosif Mporas; Alexandros Lazaridis; Todor Ganchev; Nikos Fakotakis

2009-01-01

246

The application of naturalistic conversation training to speech production in children with speech disabilities.  

PubMed Central

The purpose of this experiment was to test the effectiveness of including speech production into naturalistic conversation training for 2 children with speech production disabilities. A multiple baseline design across behaviors (target phonemes) and across subjects (for the same phoneme) indicated that naturalistic conversation training resulted in improved spontaneous speech production. The implications of these findings are discussed relative to existing models of speech production training and other aspects of communication disorders. PMID:8331014

Camarata, S

1993-01-01

247

Lexical bias revisited: Detecting, rejecting and repairing speech errors in inner speech  

Microsoft Academic Search

This paper confirms and exploits the observation that early overt self-interruptions and repairs of phonological speech errors are very likely reactions to inner speech, not to overt speech. In an experiment eliciting word-word and nonword-nonword phonological spoonerisms, it is found that self-interruptions and repairs come in two classes: one class of reactions to inner speech, another of reactions to overt

Sieb G. Nooteboom

2005-01-01

248

Acoustic Packaging: Maternal Speech and Action Synchrony  

Microsoft Academic Search

The current study addressed the degree to which maternal speech and action are synchronous in interactions with infants. English-speaking mothers demonstrated the function of two toys, stacking rings and nesting cups to younger infants (6-9.5 months) and older infants (9.5-13 months). Action and speech units were identified, and speech units were coded as being ongoing action descriptions or nonaction descriptions

Meredith Meyer; Bridgette Hard; Rebecca J. Brand; Molly McGarvey; Dare A. Baldwin

2011-01-01

249

Hydromorphone effects on human conversational speech  

Microsoft Academic Search

The present study provides an objective assessment of the increased talkativeness associated with acute opiate drug administration. Speech of five methadone-maintenance subjects was recorded continuously for 1 h following the injection of 0, 10, 14, or 18 mg hydromorphone. Dose-related increases in subjects' speech were observed, while no systematic changes were seen in speech of an undrugged partner. Dose-related increases

Maxine L. Stitzer; Mary E. McCaul; George E. Bigelow; Ira A. Liebson

1984-01-01

250

Post-laryngectomy speech respiration patterns  

PubMed Central

Objectives The goal of this study was to determine if speech breathing changes over time in laryngectomy patients using an electrolarynx, and to explore the potential of using respiratory signals to control an artificial voice source. Methods Respiratory patterns during serial speech tasks (counting, days of the week) with an electrolarynx were prospectively studied in six individuals across their first 1–2 years after total laryngectomy, as well as in an additional eight individuals at least 1 year post-laryngectomy using inductance plethysmography. Results In contrast to normal speech that is only produced during exhalation, all individuals were found to engage in inhalation during speech production, with those studied longitudinally displaying increased occurrences of inhalation during speech production with time post-laryngectomy. These trends appear to be stronger for individuals who used an electrolarynx as their primary means of oral communication rather than tracheoesophageal (TE) speech, possibly due to continued dependence on respiratory support for the production of TE speech. Conclusions Our results indicate that there are post-laryngectomy changes in electrolarynx speech breathing behaviors. This has implications for designing improved electrolarynx communication systems, which could use signals derived from respiratory function as one of many potential physiologically based sources for more natural control of electrolarynx speech. PMID:18771069

Stepp, Cara E.; Heaton, James T.; Hillman, Robert E.

2012-01-01

251

Auditory-Motor Processing of Speech Sounds  

PubMed Central

The motor regions that control movements of the articulators activate during listening to speech and contribute to performance in demanding speech recognition and discrimination tasks. Whether the articulatory motor cortex modulates auditory processing of speech sounds is unknown. Here, we aimed to determine whether the articulatory motor cortex affects the auditory mechanisms underlying discrimination of speech sounds in the absence of demanding speech tasks. Using electroencephalography, we recorded responses to changes in sound sequences, while participants watched a silent video. We also disrupted the lip or the hand representation in left motor cortex using transcranial magnetic stimulation. Disruption of the lip representation suppressed responses to changes in speech sounds, but not piano tones. In contrast, disruption of the hand representation had no effect on responses to changes in speech sounds. These findings show that disruptions within, but not outside, the articulatory motor cortex impair automatic auditory discrimination of speech sounds. The findings provide evidence for the importance of auditory-motor processes in efficient neural analysis of speech sounds. PMID:22581846

Mottonen, Riikka; Dutton, Rebekah; Watkins, Kate E.

2013-01-01

252

Speech evaluation for patients with cleft palate.  

PubMed

Children with cleft palate are at risk for speech problems, particularly those caused by velopharyngeal insufficiency. There may be an additional risk of speech problems caused by malocclusion. This article describes the speech evaluation for children with cleft palate and how the results of the evaluation are used to make treatment decisions. Instrumental procedures that provide objective data regarding the function of the velopharyngeal valve, and the 2 most common methods of velopharyngeal imaging, are also described. Because many readers are not familiar with phonetic symbols for speech phonemes, Standard English letters are used for clarity. PMID:24607192

Kummer, Ann W

2014-04-01

253

Statistical modeling of infant-directed versus adult-directed speech: Insights from speech recognition  

NASA Astrophysics Data System (ADS)

Studies on infant speech perception have shown that infant-directed speech (motherese) exhibits exaggerated acoustic properties, which are assumed to guide infants in the acquisition of phonemic categories. Training an automatic speech recognizer on such data might similarly lead to improved performance since classes can be expected to be more clearly separated in the training material. This claim was tested by training automatic speech recognizers on adult-directed (AD) versus infant-directed (ID) speech and testing them under identical versus mismatched conditions. 32 mother-infant conversations and 32 mother-adult conversations were used as training and test data. Both sets of conversations included a set of cue words containing unreduced vowels (e.g., sheep, boot, top, etc.), which mothers were encouraged to use repeatedly. Experiments on continuous speech recognition of the entire data set showed that recognizers trained on infant-directed speech did perform significantly better than those trained on adult-directed speech. However, isolated word recognition experiments focusing on the above-mentioned cue words showed that the drop in performance of the ID-trained speech recognizer on AD test speech was significantly smaller than vice versa, suggesting that speech with over-emphasized phonetic contrasts may indeed constitute better training material for speech recognition. [Work supported by CMBL, University of Washington.]

Kirchhoff, Katrin; Schimmel, Steven

2003-10-01

254

EVALUATION OF PITCH ESTIMATION IN NOISY SPEECH FOR APPLICATION IN NON-INTRUSIVE SPEECH QUALITY ASSESSMENT  

Microsoft Academic Search

Pitch estimation has a central role in many speech processing applications. In voiced speech, pitch can be objectively defined as the rate of vibration of the vocal folds. However, pitch is an inherently subjective quantity and cannot be directly measured from the speech signal. It is a nonlinear function of the signal's spectral and temporal energy dis-

Dushyant Sharma; A. Naylor

2009-01-01
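
As a concrete reference point for the discussion above, here is one classical objective estimator (not necessarily the authors' choice): pick the autocorrelation peak within a plausible pitch-period range. It tracks the vocal-fold vibration rate well on clean voiced frames and degrades in exactly the noisy conditions the paper is concerned with.

```python
# Classical autocorrelation pitch estimator: find the strongest
# autocorrelation peak within a plausible pitch-period range.
import numpy as np

def estimate_pitch_hz(frame: np.ndarray, fs: int,
                      fmin: float = 60.0, fmax: float = 400.0) -> float:
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
    return fs / lag

fs = 16000
t = np.arange(1024) / fs
frame = np.sin(2 * np.pi * 120.0 * t)        # synthetic 120 Hz "voiced" frame
print("estimated pitch:", estimate_pitch_hz(frame, fs), "Hz")
```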

255

PINPOINTING PRONUNCIATION ERRORS IN CHILDREN'S SPEECH: EXAMINING THE ROLE OF THE SPEECH  

E-print Network

PINPOINTING PRONUNCIATION ERRORS IN CHILDREN'S SPEECH: EXAMINING THE ROLE OF THE SPEECH RECOGNIZER. Maxine Eskenazi, Gary Pelton. Language Technologies Institute, 5000 Forbes Ave., Pittsburgh, PA 15213 ... in English. It first discusses children's speech production. Then it describes adaptation that is centered

Eskenazi, Maxine

256

Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis  

ERIC Educational Resources Information Center

Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…

Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

2009-01-01

257

Speech Content Integrity Verification Integrated with ITU G.723.1 Speech Coding  

Microsoft Academic Search

A speech content integrity verification scheme integrated with ITU G.723.1 speech coding to minimize the total computational cost is proposed in this research. Speech features relevant to the semantic meaning are extracted, encrypted and attached as the header information. This scheme is not only much faster than cryptographic bitstream integrity algorithms, but also more compatible with a

Chung-ping Wu; C. C. Jay Kuo

2001-01-01

258

Speech in the Junior High School. Michigan Speech Association Curriculum Guide Series, No. 4.  

ERIC Educational Resources Information Center

Designed to provide the student with experience in oral communication, this curriculum guide presents a one-semester speech course for junior high school students with "normal" rather than defective speech. The eight units cover speech in social interaction; group discussion and business meetings; demonstrations and reports; creative dramatics;…

Herman, Deldee; Ratliffe, Sharon

259

Speech and pause characteristics following speech rate reduction in hypokinetic dysarthria  

Microsoft Academic Search

The effect of speech rate reduction on speech and pause characteristics during a reading task was examined for speakers with Parkinson's disease (PD) and a group of control speakers. Duration of utterances and characteristics of pausing (duration, interpause phrase length, and location) were determined. At habitual reading rate, subjects with PD had shorter speech duration and greater time per pause

Kathryn M. Yorkston

1996-01-01

260

Algorithms for Alignment of Co-Channel Speech Signals in Adaptive Speech Separation  

Microsoft Academic Search

This paper presents preprocessing algorithms for reducing misalignment of co-channel speech signals in order to improve the accuracy of adaptive speech separation. The use of adaptive decorrelation filtering (ADF) as a front-end processor in speech recognition systems improves performance when the input signals from the two channels are synchronized. However, the performance of ADF is degraded when the co-channel signals

Nicholas J Cox; C. J. Page; R. F. Petrescu; S. M. Mahfuzul

2007-01-01

261

Speech and Language Skills of Parents of Children with Speech Sound Disorders  

ERIC Educational Resources Information Center

Purpose: This study compared parents with histories of speech sound disorders (SSD) to parents without known histories on measures of speech sound production, phonological processing, language, reading, and spelling. Familial aggregation for speech and language disorders was also examined. Method: The participants were 147 parents of children with…

Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy J.; Miscimarra, Lara; Iyengar, Sudha K.; Taylor, H. Gerry

2007-01-01

262

Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech in the JUPITER Domain  

E-print Network

Lexical Stress Modeling for Improved Speech Recognition of Spontaneous Telephone Speech. ... an approach of using lexical stress models to improve the speech recognition performance on spontaneous ... with lexical stress on a large corpus of spontaneous utterances, and identified the most informative features

263

Construction of a Rated Speech Corpus of L2 Learners' Spontaneous Speech  

ERIC Educational Resources Information Center

This work reports on the construction of a rated database of spontaneous speech produced by second language (L2) learners of English. Spontaneous speech was collected from 28 L2 speakers representing six language backgrounds and five different proficiency levels. Speech was elicited using formats similar to that of the TOEFL iBT and the Speaking…

Yoon, Su-Youn; Pierce, Lisa; Huensch, Amanda; Juul, Eric; Perkins, Samantha; Sproat, Richard; Hasegawa-Johnson, Mark

2009-01-01

264

Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition  

E-print Network

Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition. Jort F. Gemmeke, Tuomas Virtanen, Antti Hurmalainen. Department ESAT, Katholieke ... as by using automatic speech recognition. Experiments on the PASCAL CHiME challenge corpus, which contains

Virtanen, Tuomas

265

A MANUAL ON SPEECH THERAPY FOR PARENTS' USE WITH CHILDREN WHO HAVE MINOR SPEECH PROBLEMS.  

ERIC Educational Resources Information Center

A MANUAL, TO PROVIDE PARENTS WITH AN UNDERSTANDING OF THE WORK OF THE SPEECH TEACHER AND WITH METHODS TO CORRECT THE POOR SPEECH HABITS OF THEIR CHILDREN IS PRESENTED. AREAS INCLUDE THE ORGANS OF SPEECH, WHERE THEY SHOULD BE PLACED TO MAKE EACH SOUND, AND HOW THEY SHOULD OR SHOULD NOT MOVE. EASY DIRECTIONS ARE GIVEN FOR PRODUCING THE MOST…

OGG, HELEN LOREE

266

Speech Analysis and Synthesis by Linear Prediction of the Speech Wave  

Microsoft Academic Search

We describe a procedure for efficient encoding of the speech wave by representing it in terms of time-varying parameters related to the transfer function of the vocal tract and the characteristics of the excitation. The speech wave, sampled at 10 kHz, is analyzed by predicting the present speech sample as a linear combination of the 12 previous samples. The 12

B. S. Atal; SUZANNE L. HANAUER

1971-01-01
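
The analysis step Atal and Hanauer describe fits, for each frame, the coefficients that best predict the current sample from the 12 previous ones. A compact sketch of that fit via the autocorrelation normal equations (a toy signal stands in for real speech; production code would use the Levinson-Durbin recursion):

```python
# Sketch of linear-prediction analysis: solve the autocorrelation normal
# equations for p = 12 predictor coefficients. The signal is a toy stand-in.
import numpy as np

def lpc_coefficients(x: np.ndarray, p: int = 12) -> np.ndarray:
    """Least-squares predictor coefficients via the normal equations."""
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])  # Toeplitz
    return np.linalg.solve(R, r[1:p + 1])

fs = 10000                                   # 10 kHz sampling, as in the abstract
rng = np.random.default_rng(0)
t = np.arange(2048) / fs
# two "formant-like" sinusoids plus a little noise stand in for a speech frame
x = np.sin(2 * np.pi * 150 * t) + 0.5 * np.sin(2 * np.pi * 900 * t)
x += 0.01 * rng.normal(size=t.size)
a = lpc_coefficients(x, p=12)
print("predictor coefficients:", np.round(a, 3))
```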

267

Construction of a Rated L2 Speech Corpus  

E-print Network

-Champaign. Abstract. This work reports on the construction of a rated database of spontaneous speech produced by second language (L2) learners ... generally. This database will be released to the public in the near future. Key-Words: rated speech corpus ... This work reports on the construction of a rated, spontaneous speech database of second language (L2) learners of English. The purpose

Hasegawa-Johnson, Mark

268

Combining Missing-Feature Theory, Speech Enhancement and Speaker-Dependent/-Independent Modeling for Speech Separation  

E-print Network

Combining Missing-Feature Theory, Speech Enhancement and Speaker-Dependent/-Independent Modeling for Speech Separation. Ji Ming, Timothy J. Hazen, James R. Glass, School of Computer Science, Queen ... different talkers. The database published on the ICSLP'2006 website for Two-Talker Speech Separation

269

The response of the apparent receptive speech disorder of Parkinson's disease to speech therapy  

Microsoft Academic Search

Eleven patients with Parkinson's disease were tested for prosodic abnormality, on three tests of speech production (of angry, questioning, and neutral statement forms), and four tests of appreciation of the prosodic features of speech and facial expression. The tests were repeated after a control period of two weeks without speech therapy and were not substantially different. After two weeks of

S Scott; F I Caird

1984-01-01

270

Applications of broad class knowledge for noise robust speech recognition  

E-print Network

This thesis introduces a novel technique for noise robust speech recognition by first describing a speech signal through a set of broad speech units, and then conducting a more detailed analysis from these broad classes. ...

Sainath, Tara N

2009-01-01

271

Using Speech for Handwritten Mathematical Expression Recognition Disambiguation  

E-print Network

Using Speech for Handwritten Mathematical Expression Recognition Disambiguation. Sofiane MEDJKOUNE. In the proposed architecture, the transcription coming out from a speech recognition system is used ... system. Keywords: Mathematical expression; Handwriting recognition; Speech recognition; Data Fusion

Paris-Sud XI, Université de

272

Multichannel Speech Recognition using Distributed Microphone Signal Fusion Strategies  

E-print Network

Multichannel Speech Recognition using Distributed Microphone Signal Fusion Strategies. Marek B. ... or squared distance, before passing the enhanced single-channel signal into the speech recognition system ... contained in the signals, speech recognition systems can achieve higher recognition accuracies.

Johnson, Michael T.

273

Robust speech recognition from binary masks  

E-print Network

Robust speech recognition from binary masks. Arun Narayanan, Department of Computer Science. ... may provide sufficient information for human speech recognition, this letter proposes a fundamentally different approach to robust automatic speech recognition. Specifically, recognition is performed

Wang, DeLiang "Leon"

274

HIGH-DIMENSIONAL LINEAR REPRESENTATIONS FOR ROBUST SPEECH RECOGNITION  

E-print Network

HIGH-DIMENSIONAL LINEAR REPRESENTATIONS FOR ROBUST SPEECH RECOGNITION. Matthew Ager, Zoran Cvetković. Index terms: acoustic waveforms, phoneme, classification, robust, speech recognition. Many studies have shown that automatic speech recognition (ASR) systems still lack performance when compared to human

Sollich, Peter

275

SUBSPACE KERNEL DISCRIMINANT ANALYSIS FOR SPEECH RECOGNITION  

E-print Network

SUBSPACE KERNEL DISCRIMINANT ANALYSIS FOR SPEECH RECOGNITION. Hakan Erdogan, Faculty of Engineering. ... vectors. For speech recognition, N is usually prohibitively high, increasing computational requirements ... version of KDA that enables its application to speech recognition, thus conveniently enabling nonlinear

Erdogan, Hakan

276

COMPUTATIONAL AUDITORY SCENE ANALYSIS EXPLOITING SPEECH-RECOGNITION KNOWLEDGE  

E-print Network

COMPUTATIONAL AUDITORY SCENE ANALYSIS EXPLOITING SPEECH-RECOGNITION KNOWLEDGE. Dan Ellis. ... of high-level knowledge of real-world signal structure exploited by listeners. Speech recognition, while ... approaches will require more radical adaptation of current speech recognition approaches.

Ellis, Dan

277

Binaural model-based speech intelligibility enhancement and assessment in hearing aids  

E-print Network

Binaural model-based speech intelligibility enhancement and assessment in hearing aids. Contents include: beamforming and the effect on binaural cues and speech intelligibility; cepstral smoothing of masks; binaural CASA speech

278

Underspecified Semantic Interpretation in an EMail Speech Interface  

E-print Network

envisage speech-based SMS messaging, text-TV speech interfacing, information retrieval from spoken ... the verbosity of her commands or intonation, or possibly her speech properties such as dialect, sex, etc

Gambäck, Björn

279

Speech Planning Happens before Speech Execution: Online Reaction Time Methods in the Study of Apraxia of Speech  

ERIC Educational Resources Information Center

Purpose: The purpose of this article is to present an argument for the use of online reaction time (RT) methods to the study of apraxia of speech (AOS) and to review the existing small literature in this area and the contributions it has made to our fundamental understanding of speech planning (deficits) in AOS. Method: Following a brief…

Maas, Edwin; Mailend, Marja-Liisa

2012-01-01

280

Can Automatic Speech Recognition Learn More from Human Speech Perception? Sorin Dusan and Lawrence R. Rabiner, in Trends in Speech Technology, C. Burileanu (Ed.), Romanian Academic Publisher

E-print Network

Although much progress has been made during the last two decades in automatic speech recognition (ASR), performance appears to have reached a plateau in the past few years. New techniques…

Allen, Jont

281

Is Automatic Speech Recognition Ready for Non-Native Speech? (Presented at the 1998 ESCA Conference on Speech Technology in Language Learning, Marholmen, Sweden)

E-print Network

The non-native English database covers two different types of speech: wide-band recordings of read speech and four-channel recordings, collected for all subjects. Initial experiments suggest that the speech in this database is significantly more…

Byrne, William

282

77 FR 75894 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...  

Federal Register 2010, 2011, 2012, 2013

...Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities; E911 Requirements for IP-Enabled...remainder of the Petition relating to the database mapping requirements and...

2012-12-26

283

Speech perception as an active cognitive process  

PubMed Central

One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing by masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, this may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. PMID:24672438

Heald, Shannon L. M.; Nusbaum, Howard C.

2014-01-01

284

NON-NEGATIVE MATRIX FACTORIZATION BASED COMPENSATION OF MUSIC FOR AUTOMATIC SPEECH RECOGNITION  

E-print Network

This paper addresses automatic recognition of mixtures of speech and music. We represent magnitude spectra of noisy speech using non-negative matrix factorization. Keywords: noise robustness, automatic speech recognition, non-negative matrix factorization, speech enhancement.
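To make the technique concrete, a minimal sketch of NMF-based compensation follows, assuming speech and music basis matrices (W_speech, W_music) learned offline from clean training spectra; all names and settings are illustrative rather than the paper's implementation.

```python
# Hypothetical sketch of NMF-based compensation: a mixture magnitude
# spectrogram V_mix (freq x time) is explained by a fixed joint dictionary of
# speech and music basis spectra; the speech part yields a soft mask.
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-10):
    # Multiplicative updates minimizing KL divergence, dictionary W held fixed.
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ (V / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
    return H

def enhance_speech(V_mix, W_speech, W_music):
    W = np.hstack([W_speech, W_music])        # joint dictionary
    H = nmf_activations(V_mix, W)
    k = W_speech.shape[1]
    V_s = W_speech @ H[:k]                    # speech estimate
    V_m = W_music @ H[k:]                     # music estimate
    mask = V_s / (V_s + V_m + 1e-10)          # Wiener-style soft mask
    return mask * V_mix                       # enhanced magnitudes
```

The update rule is the standard multiplicative KL-divergence step with the dictionary held fixed; the masked magnitudes would then feed an ordinary ASR front end.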

Virtanen, Tuomas

285

Network speech systems technology program  

NASA Astrophysics Data System (ADS)

This report documents work performed during FY 1981 on the DCA-sponsored Network Speech Systems Technology Program. The two areas of work reported are: (1) communication system studies in support of the evolving Defense Switched Network (DSN) and (2) design and implementation of satellite/terrestrial interfaces for the Experimental Integrated Switched Network (EISN). The system studies focus on the development and evaluation of economical and endurable network routing procedures. Satellite/terrestrial interface development includes circuit-switched and packet-switched connections to the experimental wideband satellite network. Efforts in planning and coordination of EISN experiments are reported in detail in a separate EISN Experiment Plan.

Weinstein, C. J.

1981-09-01

286

The inhibition of stuttering via the presentation of natural speech and sinusoidal speech analogs.  

PubMed

Sensory signals containing speech or gestural (articulatory) information (e.g., choral speech) have repeatedly been found to be highly effective inhibitors of stuttering. Sine wave analogs of speech consist of a trio of changing pure tones representative of formant frequencies. They are otherwise devoid of traditional speech cues, yet have proven to evoke consistent linguistic percepts in listeners. Thus, we investigated the potency of sinusoidal speech for inhibiting stuttering. Ten adults who stutter read while listening to (a) forward-flowing natural speech; (b) forward-flowing sinusoid analogs of natural speech; (c) reversed natural speech; (d) reversed sinusoid analogs of natural speech; and (e) a continuous 1000 Hz pure tone. The levels of stuttering inhibition achieved using the sinusoidal stimuli were potent and not significantly different from those achieved using natural speech (approximately 50% in forward conditions and approximately 25% in the reversed conditions), suggesting that the patterns of undulating pure tones are sufficient to endow sinusoidal sentences with 'quasi-gestural' qualities. These data highlight the sensitivity of a specialized 'phonetic module' for extracting gestural information from sensory stimuli. Stuttering inhibition is thought to occur when perceived gestural information facilitates fluent productions via the engagement of mirror neurons (e.g., in Broca's area), which appear to play a crucial role in our ability to perceive and produce speech. PMID:16806702
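As an illustration of the stimulus construction described above, the sketch below synthesizes a sinusoidal analog from three time-varying tones; real stimuli track formants estimated from natural utterances, whereas the linear tracks and amplitudes here are placeholders.

```python
# Sketch of a sine-wave speech analog: three pure tones following
# (placeholder) formant tracks F1-F3; no other speech cues remain.
import numpy as np

sr = 16000
t = np.arange(int(0.5 * sr))                      # 500 ms of samples
f1 = np.linspace(700, 300, t.size)                # hypothetical formant tracks (Hz)
f2 = np.linspace(1200, 2200, t.size)
f3 = np.linspace(2500, 3000, t.size)

def tone(track):
    # Integrate instantaneous frequency so the phase varies smoothly.
    return np.sin(2 * np.pi * np.cumsum(track) / sr)

analog = (tone(f1) + 0.5 * tone(f2) + 0.25 * tone(f3)) / 1.75  # crude mix
```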

Saltuklaroglu, Tim; Kalinowski, Joseph

2006-08-14

287

Method and apparatus for obtaining complete speech signals for speech recognition applications  

NASA Technical Reports Server (NTRS)

The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
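The circular-buffer idea in this abstract is easy to picture with a toy sketch (not the patented implementation); the frame counts and class name are illustrative.

```python
# Toy circular buffer for audio frames: recording is continuous, and frames
# captured *before* a user command can still be retrieved afterwards.
from collections import deque

class FrameRing:
    def __init__(self, max_frames):
        self.frames = deque(maxlen=max_frames)  # oldest frames drop automatically

    def push(self, frame):
        self.frames.append(frame)

    def last(self, n_back):
        """Return up to n_back most recent frames."""
        return list(self.frames)[-n_back:]

ring = FrameRing(max_frames=300)   # e.g., 3 s of 10 ms frames
# audio callback: ring.push(frame)
# on a "start recognition" command, prepend ring.last(50) to the live stream
# so speech that began before the command is not clipped.
```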

Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

2009-01-01

288

Teaching Speech to Your Language Delayed Child.  

ERIC Educational Resources Information Center

Intended for parents, the booklet focuses on the speech and language development of children with language delays. The following topics are among those considered: the parent's role in the initial diagnosis of deafness, intellectual handicap, and neurological difficulties; diagnoses and single causes of difficulty with speech; what to say to…

Rees, Roger J.; Pryor, Jan, Ed.

1980-01-01

289

Toddlers' recognition of noise-vocoded speech.  

PubMed

Despite their remarkable clinical success, cochlear-implant listeners today still receive spectrally degraded information. Much research has examined normally hearing adult listeners' ability to interpret spectrally degraded signals, primarily using noise-vocoded speech to simulate cochlear implant processing. Far less research has explored infants' and toddlers' ability to interpret spectrally degraded signals, despite the fact that children in this age range are frequently implanted. This study examines 27-month-old typically developing toddlers' recognition of noise-vocoded speech in a language-guided looking study. Children saw two images on each trial and heard a voice instructing them to look at one item ("Find the cat!"). Full-spectrum sentences or their noise-vocoded versions were presented with varying numbers of spectral channels. Toddlers showed equivalent proportions of looking to the target object with full-speech and 24- or 8-channel noise-vocoded speech; they failed to look appropriately with 2-channel noise-vocoded speech and showed variable performance with 4-channel noise-vocoded speech. Despite accurate looking performance for speech with at least eight channels, children were slower to respond appropriately as the number of channels decreased. These results indicate that 2-yr-olds have developed the ability to interpret vocoded speech, even without practice, but that doing so requires additional processing. These findings have important implications for pediatric cochlear implantation. PMID:23297920
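For readers unfamiliar with noise vocoding, a rough sketch follows under common assumptions (log-spaced bands, Hilbert envelopes); published cochlear-implant simulations differ in filter shapes and envelope smoothing.

```python
# Rough noise-vocoder sketch: split speech into n_channels bands, take each
# band's envelope, and use it to modulate noise filtered to the same band.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(x, sr, n_channels, lo=100.0, hi=7000.0):
    edges = np.geomspace(lo, hi, n_channels + 1)   # log-spaced band edges (Hz)
    out = np.zeros(x.size)
    for f1, f2 in zip(edges[:-1], edges[1:]):
        sos = butter(4, [f1, f2], btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                # amplitude envelope
        noise = sosfiltfilt(sos, np.random.randn(x.size))
        out += env * noise                         # envelope-modulated noise
    return out / np.max(np.abs(out))               # normalize

# Fewer channels (e.g., 2) leave little spectral detail; more (e.g., 24)
# approach the intelligibility of unprocessed speech, as reported above.
```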

Newman, Rochelle; Chatterjee, Monita

2013-01-01

290

Repeated Speech Errors: Evidence for Learning  

ERIC Educational Resources Information Center

Three experiments elicited phonological speech errors using the SLIP procedure to investigate whether there is a tendency for speech errors on specific words to reoccur, and whether this effect can be attributed to implicit learning of an incorrect mapping from lemma to phonology for that word. In Experiment 1, when speakers made a phonological…

Humphreys, Karin R.; Menzies, Heather; Lake, Johanna K.

2010-01-01

291

Speech interfaces based upon surface electromyography  

Microsoft Academic Search

This paper discusses the use of surface electromyography (EMG) to recognize and synthesize speech. The acoustic speech signal can be significantly corrupted by high noise in the environment or impeded by garments or masks. Such situations occur, for example, when firefighters wear pressurized suits with self-contained breathing apparatus (SCBA) or when astronauts perform operations in pressurized gear. In these conditions

Charles Jorgensen; Sorin Dusan

2010-01-01

292

Speech after Mao: Literature and Belonging  

ERIC Educational Resources Information Center

This dissertation aims to understand the apparent failure of speech in post-Mao literature to fulfill its conventional functions of representation and communication. In order to understand this pattern, I begin by looking back on the utility of speech for nation-building in modern China. In addition to literary analysis of key authors and works,…

Hsieh, Victoria Linda

2012-01-01

293

Philosophy of Research in Motor Speech Disorders  

ERIC Educational Resources Information Center

The primary objective of this position paper is to assess the theoretical and empirical support that exists for the Mayo Clinic view of motor speech disorders in general, and for oromotor, nonverbal tasks as a window to speech production processes in particular. Literature both in support of and against the Mayo Clinic view and the associated use…

Weismer, Gary

2006-01-01

294

The evolution of speech: vision, rhythm, cooperation  

E-print Network

A full account of human speech evolution must consider its multisensory nature [9]. Each of these factors (vision, rhythm, and cooperation) may have played an important role in the evolution of human communication in a piecemeal fashion. As such, determining the many substrates required for the evolution of human speech…

Ghazanfar, Asif

295

Speech Fluency in Fragile X Syndrome  

ERIC Educational Resources Information Center

The present study investigated the dysfluencies in the speech of nine French speaking individuals with fragile X syndrome. Type, number, and loci of dysfluencies were analysed. The study confirms that dysfluencies are a common feature of the speech of individuals with fragile X syndrome but also indicates that the dysfluency pattern displayed is…

Van Borsel, John; Dor, Orianne; Rondal, Jean

2008-01-01

296

Visualizations: Speech, Language & Autistic Spectrum Disorder  

E-print Network

Many children, including those with Autistic Spectrum Disorder (ASD), have explicit difficulty developing speech and language.

Karahalios, Karrie G.

297

Automatic phonetic segmentation of Malay speech database  

Microsoft Academic Search

This paper deals with automatic phonetic segmentation for Malay continuous speech. This study investigates fast and automatic phone segmentation in preparing a database for Malay concatenative text-to-speech (TTS) systems. A 35-phone Malay phone set, suitable for building Malay TTS, has been chosen, and the segmentation experiment is based on this phone set. An HMM-based segmentation approach using Viterbi forced alignment…

Chee-Ming Ting; Sh-Hussain Salleh; Tian-Swee Tan; A. K. Ariff

2007-01-01

298

Enhancing Speech Discrimination through Stimulus Repetition  

ERIC Educational Resources Information Center

Purpose: To evaluate the effects of sequential and alternating repetition on speech-sound discrimination. Method: Typically hearing adults' discrimination of 3 pairs of speech-sound contrasts was assessed at 3 signal-to-noise ratios using the change/no-change procedure. On change trials, the standard and comparison stimuli differ; on no-change…

Holt, Rachael Frush

2011-01-01

299

Speech recognition with amplitude and frequency modulations  

Microsoft Academic Search

Amplitude modulation (AM) and frequency modulation (FM) are commonly used in communication, but their relative contributions to speech recognition have not been fully explored. To bridge this gap, we derived slowly varying AM and FM from speech sounds and conducted listening tests using stimuli with different modulations in normal-hearing and cochlear-implant subjects. We found that although AM from a limited

Fan-Gang Zeng; Kaibao Nie; Ginger S. Stickney; Ying-Yee Kong; Michael Vongphoe; Ashish Bhargave; Chaogang Wei; Keli Cao

2005-01-01

300

CLEFT PALATE. FOUNDATIONS OF SPEECH PATHOLOGY SERIES.  

ERIC Educational Resources Information Center

DESIGNED TO PROVIDE AN ESSENTIAL CORE OF INFORMATION, THIS BOOK TREATS NORMAL AND ABNORMAL DEVELOPMENT, STRUCTURE, AND FUNCTION OF THE LIPS AND PALATE AND THEIR RELATIONSHIPS TO CLEFT LIP AND CLEFT PALATE SPEECH. PROBLEMS OF PERSONAL AND SOCIAL ADJUSTMENT, HEARING, AND SPEECH IN CLEFT LIP OR CLEFT PALATE INDIVIDUALS ARE DISCUSSED. NASAL RESONANCE…

RUTHERFORD, DAVID; WESTLAKE, HAROLD

301

Speech Intelligibility in Severe Adductor Spasmodic Dysphonia  

ERIC Educational Resources Information Center

This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…

Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.

2004-01-01

302

Hate Speech and the First Amendment  

Microsoft Academic Search

A cornerstone of democracy is the First Amendment's protection of free speech. The founding fathers saw this as contributing to democratic government. Ironically, contemporary free speech protects groups such as Nazis, White and Black supremacists, pornographers, gangster rappers, TV violence, and gratuitous film profiteers; in short, these are agents of disorder, and have practically nothing of discourse value. This article

MICHAEL ISRAEL

1999-01-01

303

Hate Speech: A Call to Principles.  

ERIC Educational Resources Information Center

Reviews the history of First Amendment rulings as they relate to speech codes and of other regulations directed at the content of speech. A case study, based on an experience at Trenton State College, details the legal constraints, principles, and practices that Student Affairs administrators should be aware of regarding such situations.…

Klepper, William M.; Bakken, Timothy

1997-01-01

304

Fighting Words. The Politics of Hateful Speech.  

ERIC Educational Resources Information Center

This book explores issues typified by a series of hateful speech events at Kean College (New Jersey) and on other U.S. campuses in the early 1990s, by examining the dichotomies that exist between the First and the Fourteenth Amendments and between civil liberties and civil rights, and by contrasting the values of free speech and academic freedom…

Marcus, Laurence R.

305

Only Speech Codes Should Be Censored  

ERIC Educational Resources Information Center

In this article, the author discusses the enforcement of "hate speech" codes and confirms research that considers why U.S. colleges and universities continue to promulgate student disciplinary rules prohibiting expression that "subordinates" others or is "demeaning, offensive, or hateful." Such continued adherence to speech codes is by now…

Pavela, Gary

2006-01-01

306

Speech Recognition with Primarily Temporal Cues  

Microsoft Academic Search

Nearly perfect speech recognition was observed under conditions of greatly reduced spectral information. Temporal envelopes of speech were extracted from broad frequency bands and were used to modulate noises of the same bandwidths. This manipulation preserved temporal envelope cues in each band but restricted the listener to severely degraded information on the distribution of spectral energy. The identification of consonants,

Robert V. Shannon; Fan-Gang Zeng; Vivek Kamath; John Wygonski; Michael Ekelid

1995-01-01

307

Modulation Features for Speech and Music Classification  

Microsoft Academic Search

Many attempts to accurately classify speech and music have been investigated over the years. This paper presents modulation features for effective speech and music classification. A Gammatone filter bank is used as a front-end for this classification system, where amplitude modulation (AM) and frequency modulation (FM) features are extracted from the critical band outputs of the Gammatone filters. In addition,

Omer Mohsin Mubarak; Eliathamby Ambikairajah; Julien Epps; Teddy Surya Gunawan

2006-01-01

308

TOWARDS AUTOMATIC SPEECH RECOGNITION IN ADVERSE ENVIRONMENTS  

Microsoft Academic Search

Some of our research efforts towards building Automatic Speech Recognition (ASR) systems designed to work in real-world conditions are presented. The methods we pro- pose exhibit improved performance in noisy environments and offer robustness against speaker variability. Advanced nonlinear signal processing techniques, modulation- and chaotic-based, are utilized for auditory feature extraction. The auditory features are complemented with visual speech cues

D. Dimitriadis; N. Katsamanis; P. Maragos; G. Papandreou; V. Pitsikalis

309

Localization of Sublexical Speech Perception Components  

ERIC Educational Resources Information Center

Models of speech perception are in general agreement with respect to the major cortical regions involved, but lack precision with regard to localization and lateralization of processing units. To refine these models we conducted two Activation Likelihood Estimation (ALE) meta-analyses of the neuroimaging literature on sublexical speech perception.…

Turkeltaub, Peter E.; Coslett, H. Branch

2010-01-01

310

Speech-Language-Hearing Department of Communication  

E-print Network

Meet the faculty at the UNH Speech-Language-Hearing Center, Department of Communication Sciences & Disorders. Specialties include speech-sound disorders, early childhood language disorders, and school-age language-literacy disorders. She considers herself a generalist, with a particular…

New Hampshire, University of

311

The Neural Substrates of Infant Speech Perception  

ERIC Educational Resources Information Center

Infants often pay special attention to speech sounds, and they appear to detect key features of these sounds. To investigate the neural foundation of speech perception in infants, we measured cortical activation using near-infrared spectroscopy. We presented the following three types of auditory stimuli while 3-month-old infants watched a silent…

Homae, Fumitaka; Watanabe, Hama; Taga, Gentaro

2014-01-01

312

"EIGENLIPS" FOR ROBUST SPEECH RECOGNITION Christoph Bregler ,  

E-print Network

This work improves the performance of a hybrid connectionist speech recognition system by incorporating visual information (lip and teeth positions) over just the acoustic system in the presence of additive noise and crosstalk. In fact, it is well known that human speech perception is inherently bi-modal as well.

Bregler, Christoph

313

Pronunciation Modeling for Spontaneous Mandarin Speech Recognition  

Microsoft Academic Search

Pronunciation variations in spontaneous speech can be classified into complete changes and partial changes. A complete change is the replacement of a canonical phoneme by another alternative phone, such as 'b' being pronounced as 'p'. Partial changes are variations within the phoneme, such as nasalization, centralization, and voicing. Most current work in pronunciation modeling for spontaneous Mandarin speech remains at…

Yi Liu; Pascale Fung

2004-01-01

314

Building an Interdepartmental Major in Speech Communication.  

ERIC Educational Resources Information Center

This paper describes a popular and innovative major program of study in speech communication at St. Cloud University in Minnesota: the Speech Communication Interdepartmental Major. The paper provides background on the program, discusses overall program requirements, presents sample student options, identifies ingredients for program success,…

Litterst, Judith K.

316

Tampa Bay International Business Summit Keynote Speech  

NASA Technical Reports Server (NTRS)

A keynote speech outlining the importance of collaboration and diversity in the workplace. The 20-minute speech describes NASA's challenges and accomplishments over the years and what lies ahead. Topics include: diversity and inclusion principles, international cooperation, Kennedy Space Center planning and development, opportunities for cooperation, and NASA's vision for exploration.

Clary, Christina

2011-01-01

317

Speech Recognition Thresholds for Multilingual Populations.  

ERIC Educational Resources Information Center

This article traces the development of speech audiometry in the United States and reports on the current status, focusing on the needs of a multilingual population in terms of measuring speech recognition threshold (SRT). It also discusses sociolinguistic considerations, alternative SRT stimuli for second language learners, and research on using…

Ramkissoon, Ishara

2001-01-01

318

How do humans process and recognize speech?  

Microsoft Academic Search

Until the performance of automatic speech recognition (ASR) hardware surpasses human performance in accuracy and robustness, we stand to gain by understanding the basic principles behind human speech recognition (HSR). This problem was studied exhaustively at Bell Labs between the years of 1918 and 1950 by Harvey Fletcher and his colleagues. The motivation for these studies was to quantify the

Jont B. Allen

1994-01-01

319

Speech recognition in noisy environments: A survey  

Microsoft Academic Search

The performance levels of most current speech recognizers degrade significantly when environmental noise occurs during use. Such performance degradation is mainly caused by mismatches in training and operating environments. During recent years much effort has been directed to reducing this mismatch. This paper surveys research results in the area of digital techniques for single microphone noisy speech recognition classified in

Yifan Gong

1995-01-01

320

Speech recognition by machine: A review  

Microsoft Academic Search

This paper provides a review of recent developments in speech recognition research. The concept of sources of knowledge is introduced and the use of knowledge to generate and verify hypotheses is discussed. The difficulties that arise in the construction of different types of speech recognition systems are discussed and the structure and performance of several such systems is presented. Aspects

D. R. Reddy

1976-01-01

321

Visualauditory integration during speech imitation in autism  

E-print Network

This work examines visual-auditory integration during speech imitation in autism (Williams et al., doi:10.1016/j.ridd.2004.01.008), including the stereotyped speech that contributes to the diagnosis within standard research instruments (Lord, Rutter & Le Couteur).

Massaro, Dominic

322

Speech production after glossectomy: methodological aspects  

E-print Network

Several treatments are available for the patients: chemotherapy, radiation therapy, and surgery. Surgical treatment affects the tongue and its connections to the tongue tissues (Konstantinovic & Dimic, 1998; Bressmann, 2004, 2007). This work concerns the quality of speech after clinical treatments of tongue cancer; for speech therapists, quantitative methods…

Paris-Sud XI, Université de

323

Articulatory Features for Robust Visual Speech Recognition  

E-print Network

Audio-visual information can improve the performance of speech recognition systems in noisy acoustic environments, and additional sources of linguistic information, including nonacoustic sensors [24], can provide greater redundancy. Keywords: feature extraction, articulatory features, support vector machines.

324

A Speech After a Circle Dance  

E-print Network

A lca provides a speech given upon the completion of a circle dance. Genre or type: speech.

Bkra shis bzang po

2009-01-01

325

An Acquired Deficit of Audiovisual Speech Processing  

ERIC Educational Resources Information Center

We report a 53-year-old patient (AWF) who has an acquired deficit of audiovisual speech integration, characterized by a perceived temporal mismatch between speech sounds and the sight of moving lips. AWF was less accurate on an auditory digit span task with vision of a speaker's face as compared to a condition in which no visual information from…

Hamilton, Roy H.; Shenton, Jeffrey T.; Coslett, H. Branch

2006-01-01

326

Hypnosis and the Reduction of Speech Anxiety.  

ERIC Educational Resources Information Center

The purposes of this paper are (1) to review the background and nature of hypnosis, (2) to synthesize research on hypnosis related to speech communication, and (3) to delineate and compare two potential techniques for reducing speech anxiety--hypnosis and systematic desensitization. Hypnosis has been defined as a mental state characterised by…

Barker, Larry L.; And Others

327

Speech Processing with VRIO and Linux  

Microsoft Academic Search

Speech processing is often anticipated as the future human-computer interface. Previously, this tendency has been supported by science fiction authors, who describe the wonderful perspectives of speech processing for the user. Yet, the possibility of formulating and interacting with the computer via natural language finally seems mature enough for novel applications. The VRIO appliance is an approach to…

Dieter Kranzlmüller; Ingo Hackl

328

Perception of Speech in Noise: Neural Correlates  

Microsoft Academic Search

The presence of irrelevant auditory information (other talkers, environmental noises) presents a major challenge to listening to speech. The fundamental frequency (F0) of the target speaker is thought to provide an important cue for the extraction of the speaker's voice from background noise, but little is known about the relationship between speech-in-noise (SIN) perceptual ability and neural encoding of the

Judy H. Song; Erika Skoe; Karen Banai; Nina Kraus

2011-01-01

329

The Learning of Complex Speech Act Behaviour.  

ERIC Educational Resources Information Center

Pre- and posttraining measurement of adult English-as-a-Second-Language learners' (N=18) apology speech act behavior found no clear-cut quantitative improvement after training, although there was an obvious qualitative approximation of native-like speech act behavior in terms of types of intensification and downgrading, choice of strategy, and…

Olshtain, Elite; Cohen, Andrew

1990-01-01

330

Creation of two children's speech databases  

Microsoft Academic Search

Two sets of speech recordings were made from children talkers ranging in age from 5 to 18 years. One set was recorded via telephone channels (TEL) and the other using high-fidelity recording equipment (MIC). Special considerations and techniques required for the recording of speech from children are discussed. Also presented are (1) a description of the recording environment including ambient

J. D. Miller; Sungbok Lee; R. M. Uchanski; A. F. Heidbreder; B. B. Richman; J. Tadlock

1996-01-01

331

The Development of the Otago Speech Database  

Microsoft Academic Search

A collection of digits and words, spoken with a New Zealand English accent, has been systematically and formally compiled. This collection, along with the beginning and end points of the realised phonemes within the words, comprises the Otago Speech Corpora. A relational database management system has been developed to house the speech data. This system provides much more usability,

S. J. Sinclair; C. I. Watson

1995-01-01

332

The Effects of TV on Speech Education  

ERIC Educational Resources Information Center

Generally, the speaking aspect is not properly debated when discussing the positive and negative effects of television (TV), especially on children. To highlight this point, this study began by asking the question: "What are the effects of TV on speech?" and then sought to transform the effects that TV has on speech in a…

Gocen, Gokcen; Okur, Alpaslan

2013-01-01

333

Scaffolded-Language Intervention: Speech Production Outcomes  

ERIC Educational Resources Information Center

This study investigated the effects of a scaffolded-language intervention using cloze procedures, semantically contingent expansions, contrastive word pairs, and direct models on speech abilities in two preschoolers with speech and language impairment speaking African American English. Effects of the lexical and phonological characteristics (i.e.,…

Bellon-Harn, Monica L.; Credeur-Pampolina, Maggie E.; LeBoeuf, Lexie

2013-01-01

334

SPEECH LEVELS IN VARIOUS NOISE ENVIRONMENTS  

EPA Science Inventory

The goal of this study was to determine average speech levels used by people when conversing in different levels of background noise. The non-laboratory environments where speech was recorded were: high school classrooms, homes, hospitals, department stores, trains and commercial...

335

Performing speech recognition research with hypercard  

NASA Technical Reports Server (NTRS)

The purpose of this paper is to describe a HyperCard-based system for performing speech recognition research and to instruct Human Factors professionals on how to use the system to obtain detailed data about the user interface of a prototype speech recognition application.

Shepherd, Chip

1993-01-01

336

Pauses in Deceptive Speech Stefan Benus*  

E-print Network

This work investigates the relationship between the distributional and prosodic characteristics of silent and filled pauses and deception, finding that the use of pauses correlates more with truthful than with deceptive speech, and that prosodic features extracted…

Hirschberg, Julia

337

SpeechActs. Published in CHI '95 Proceedings, Conference on Human Factors in Computing Systems, Denver, CO.

E-print Network

SpeechActs is an experimental conversational speech system. Conversational interfaces are young, and transferring design principles from other media is not straightforward. Experience…

Levow, Gina-Anne

338

The Effect of "Developmental Speech-Language Training through Music" on Speech Production in Children with Autism Spectrum Disorders.  

E-print Network

Children with Autism Spectrum Disorders demonstrate deficits in speech and language, with the most outstanding speech impairments being in comprehension, semantics, prosody, and pragmatics. Perception…

Lim, Hayoung Audrey

2007-01-01

339

Improving robustness of speech recognition systems  

NASA Astrophysics Data System (ADS)

Current Automatic Speech Recognition (ASR) systems fail to perform nearly as well as human speech recognition due to their lack of robustness against speech variability and noise contamination. The goal of this dissertation is to investigate these critical robustness issues, put forth different ways to address them, and finally present an ASR architecture based upon these robustness criteria. Acoustic variations adversely affect the performance of current phone-based ASR systems, in which speech is modeled as 'beads-on-a-string', where the beads are the individual phone units. While phone units are distinctive in the cognitive domain, they vary in the physical domain, and their variation occurs due to a combination of factors including speech style and speaking rate; a phenomenon commonly known as 'coarticulation'. Traditional ASR systems address such coarticulatory variations by using contextualized phone units such as triphones. Articulatory phonology accounts for coarticulatory variations by modeling speech as a constellation of constricting actions known as articulatory gestures. In such a framework, speech variations such as coarticulation and lenition are accounted for by gestural overlap in time and gestural reduction in space. To realize a gesture-based ASR system, articulatory gestures have to be inferred from the acoustic signal. At the initial stage of this research, a study was performed using synthetically generated speech to obtain a proof of concept that articulatory gestures can indeed be recognized from the speech signal. It was observed that having vocal tract constriction trajectories (TVs) as an intermediate representation facilitated the gesture recognition task. Presently no natural speech database contains articulatory gesture annotation; hence an automated iterative time-warping architecture is proposed that can annotate any natural speech database with articulatory gestures and TVs. Two natural speech databases, X-ray microbeam and Aurora-2, were annotated, where the former was used to train a TV estimator and the latter was used to train a Dynamic Bayesian Network (DBN) based ASR architecture. The DBN architecture used two sets of observations: (a) acoustic features in the form of mel-frequency cepstral coefficients (MFCCs) and (b) TVs (estimated from the acoustic speech signal). In this setup the articulatory gestures were modeled as hidden random variables, hence eliminating the necessity for explicit gesture recognition. Word recognition results using the DBN architecture indicate that articulatory representations not only help to account for coarticulatory variations but also significantly improve the noise robustness of ASR systems.
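As a concrete anchor for the acoustic observation stream mentioned above, here is a minimal MFCC extraction sketch using librosa; the file name and frame settings are illustrative, and the TV estimator (a separately trained model) is not shown.

```python
# Minimal MFCC observation stream (the DBN's acoustic features); the TV
# stream would come from a separately trained estimator, not shown here.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)   # hypothetical input file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)  # 25 ms windows, 10 ms hop
# mfcc has shape (13, n_frames): one 13-dimensional observation per frame.
```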

Mitra, Vikramjit

340

Predicting the intelligibility of vocoded speech  

PubMed Central

Objectives: The purpose of this study is to evaluate the performance of a number of speech intelligibility indices in terms of predicting the intelligibility of vocoded speech. Design: Noise-corrupted sentences were vocoded in a total of 80 conditions, involving three different SNR levels (-5, 0 and 5 dB) and two types of maskers (steady-state noise and two-talker). Tone-vocoder simulations were used as well as simulations of combined electric-acoustic stimulation (EAS). The vocoded sentences were presented to normal-hearing listeners for identification, and the resulting intelligibility scores were used to assess the correlation of various speech intelligibility measures. These included measures designed to assess speech intelligibility, including the speech-transmission index (STI) and articulation index (AI) based measures, as well as measures of distortions in hearing aids (e.g., coherence-based measures). These measures employed primarily either the temporal-envelope or the spectral-envelope information in the prediction model. The underlying hypothesis in the present study is that measures that assess temporal envelope distortions, such as those based on the speech-transmission index, should correlate highly with the intelligibility of vocoded speech. This is based on the fact that vocoder simulations preserve primarily envelope information, similar to the processing implemented in current cochlear implant speech processors. Similarly, it is hypothesized that measures such as the coherence-based index that assess the distortions present in the spectral envelope could also be used to model the intelligibility of vocoded speech. Results: Of all the intelligibility measures considered, the coherence-based and the STI-based measures performed the best. High correlations (r=0.9-0.96) were maintained with the coherence-based measures in all noisy conditions. The highest correlation obtained with the STI-based measure was 0.92, and that was obtained when high modulation rates (100 Hz) were used. The performance of these measures remained high in both steady-noise and fluctuating masker conditions. The correlations with conditions involving tone-vocoded speech were found to be somewhat higher than the correlations with conditions involving EAS-vocoded speech. Conclusions: The present study demonstrated that some of the speech intelligibility indices that have been found previously to correlate highly with wideband speech can also be used to predict the intelligibility of vocoded speech. Both the coherence-based and STI-based measures have been found to be good measures for modeling the intelligibility of vocoded speech. The highest correlation (r=0.96) was obtained with a derived coherence measure that placed more emphasis on information contained in vowel/consonant spectral transitions and less emphasis on information contained in steady sonorant segments. High (100 Hz) modulation rates were found to be necessary in the implementation of the STI-based measures for better modeling of the intelligibility of vocoded speech. We believe that the difference in modulation rates needed for modeling the intelligibility of wideband versus vocoded speech can be attributed to the increased importance of higher modulation rates in situations where the amount of spectral information available to the listeners is limited (8 channels in our study). Unlike the traditional STI method, which has been found to perform poorly in terms of predicting the intelligibility of processed speech wherein non-linear operations are involved, the STI-based measure used in the present study has been found to perform quite well. In summary, the present study took the first step in modeling the intelligibility of vocoded speech. Access to such intelligibility measures is of high significance as they can be used to guide the development of new speech coding algorithms for cochlear implants. PMID:21206363
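To make the coherence-based family concrete, the sketch below computes a plain magnitude-squared coherence between the clean and processed signals with scipy; the derived measure described above additionally weights spectral transitions, which is not reproduced here.

```python
# Plain magnitude-squared coherence between clean and processed speech: the
# raw ingredient of coherence-based intelligibility indices. The band limits
# and simple averaging are illustrative simplifications.
import numpy as np
from scipy.signal import coherence

def mean_coherence(clean, processed, sr, nperseg=512):
    f, cxy = coherence(clean, processed, fs=sr, nperseg=nperseg)
    band = (f >= 100) & (f <= 7000)     # restrict to a speech-relevant band
    return float(np.mean(cxy[band]))    # 1.0 = undistorted, lower = degraded
```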

Chen, Fei; Loizou, Philipos C.

2010-01-01

341

Integration of speech with natural language understanding.  

PubMed Central

The integration of speech recognition with natural language understanding raises issues of how to adapt natural language processing to the characteristics of spoken language; how to cope with errorful recognition output, including the use of natural language information to reduce recognition errors; and how to use information from the speech signal, beyond just the sequence of words, as an aid to understanding. This paper reviews current research addressing these questions in the Spoken Language Program sponsored by the Advanced Research Projects Agency (ARPA). I begin by reviewing some of the ways that spontaneous spoken language differs from standard written language and discuss methods of coping with the difficulties of spontaneous speech. I then look at how systems cope with errors in speech recognition and at attempts to use natural language information to reduce recognition errors. Finally, I discuss how prosodic information in the speech signal might be used to improve understanding. PMID:7479813

Moore, R C

1995-01-01

342

Speech Cues Contribute to Audiovisual Spatial Integration  

PubMed Central

Speech is the most important form of human communication but ambient sounds and competing talkers often degrade its acoustics. Fortunately the brain can use visual information, especially its highly precise spatial information, to improve speech comprehension in noisy environments. Previous studies have demonstrated that audiovisual integration depends strongly on spatiotemporal factors. However, some integrative phenomena such as McGurk interference persist even with gross spatial disparities, suggesting that spatial alignment is not necessary for robust integration of audiovisual place-of-articulation cues. It is therefore unclear how speech-cues interact with audiovisual spatial integration mechanisms. Here, we combine two well established psychophysical phenomena, the McGurk effect and the ventriloquist's illusion, to explore this dependency. Our results demonstrate that conflicting spatial cues may not interfere with audiovisual integration of speech, but conflicting speech-cues can impede integration in space. This suggests a direct but asymmetrical influence between ventral ‘what’ and dorsal ‘where’ pathways. PMID:21909378

Bishop, Christopher W.; Miller, Lee M.

2011-01-01

343

Voice Quality Modelling for Expressive Speech Synthesis  

PubMed Central

This paper presents the perceptual experiments that were carried out in order to validate the methodology of transforming expressive speech styles using voice quality (VoQ) parameters modelling, along with the well-known prosody (F0, duration, and energy), from a neutral style into a number of expressive ones. The main goal was to validate the usefulness of VoQ in the enhancement of expressive synthetic speech in terms of speech quality and style identification. A harmonic plus noise model (HNM) was used to modify VoQ and prosodic parameters that were extracted from an expressive speech corpus. Perception test results indicated the improvement of obtained expressive speech styles using VoQ modelling along with prosodic characteristics. PMID:24587738

Socoro, Joan Claudi

2014-01-01

344

Speech entrainment enables patients with Broca's aphasia to produce fluent speech  

PubMed Central

A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889

Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris

2012-01-01

345

Inconsistency of speech in children with childhood apraxia of speech, phonological disorders, and typical speech  

NASA Astrophysics Data System (ADS)

There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS), which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were moderately to highly correlated, even the most highly related segmental and lexical measures agreed on only 76% of classifications (i.e., to CAS and PD). Finally, VOT analyses revealed that CAS utilized a distinct distribution pattern relative to PD and TD. Discussion frames the current findings within a profile of CAS and provides a validated list of criteria for the differential diagnosis of CAS and PD.

Iuzzini, Jenya

346

Using the short-time speech transmission index to predict speech reception thresholds in fluctuating noise.  

PubMed

The Speech Transmission Index (STI) predicts the intelligibility of speech degraded by noise and reverberation. Recently, Payton and Shrestha [J. Acoust. Soc. Am. 134, 3818-3827 (2013)] reported on the ability of a short-time speech-based STI (ssSTI) to predict the intelligibility of speech in the presence of fluctuating noise using analysis windows shorter than 1 s. They found the ssSTI highly correlated with theoretical STI calculations using windows as short as 0.3 s. In the current work, extended versions of the ssSTI were investigated for their ability to improve speech intelligibility prediction in the presence of fluctuating noise; a condition for which the long-term STI incorrectly predicts the same intelligibility as for stationary noise. No STI metric predicts a normal-hearing listener's improved ability to perceive speech in the presence of fluctuating noise as compared to stationary noise at the same signal-to-noise ratio. The investigated technique used window lengths that varied with octave band, based on human auditory temporal resolution as in the Extended Speech Intelligibility Index [Rhebergen and Versfeld, J. Acoust. Soc. Am. 117, 2181-2192 (2005)]. An extended sSTI using speech-shaped noise instead of speech as a probe predicted published speech reception thresholds for a variety of conditions. PMID:25235329
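A hedged sketch of a speech-based STI back-end follows: octave-band envelopes of the probe and degraded signals yield a per-band modulation-transfer estimate, converted to an apparent SNR and a transmission index. The correlation-based estimator, equal band weights, and whole-utterance window are simplifications; the short-time variant described above applies band-dependent sliding windows to the same computation.

```python
# Simplified speech-based STI back-end. Octave-band envelope comparison gives
# a modulation-transfer estimate m per band; m -> apparent SNR -> transmission
# index, averaged over bands. Estimator and weights are simplifications.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

CENTERS = [125, 250, 500, 1000, 2000, 4000, 8000]   # octave-band centres (Hz)

def band_env(x, sr, fc):
    sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)],
                 btype="bandpass", fs=sr, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, x)))

def sti_like(probe, degraded, sr):
    tis = []
    for fc in CENTERS:
        if fc * np.sqrt(2) >= sr / 2:
            continue                                 # skip bands above Nyquist
        ex, ey = band_env(probe, sr, fc), band_env(degraded, sr, fc)
        m = np.clip(np.corrcoef(ex, ey)[0, 1], 1e-4, 1 - 1e-4)
        snr = np.clip(10 * np.log10(m / (1 - m)), -15, 15)  # apparent SNR (dB)
        tis.append((snr + 15) / 30)                  # transmission index in [0, 1]
    return float(np.mean(tis))
```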

Ferreira, Matthew; Payton, Karen

2014-04-01

347

Statistical properties of infant-directed versus adult-directed speech: Insights from speech recognition  

NASA Astrophysics Data System (ADS)

Previous studies have shown that infant-directed speech ('motherese') exhibits overemphasized acoustic properties which may facilitate the acquisition of phonetic categories by infant learners. It has been suggested that the use of infant-directed data for training automatic speech recognition systems might also enhance the automatic learning and discrimination of phonetic categories. This study investigates the properties of infant-directed vs. adult-directed speech from the point of view of the statistical pattern recognition paradigm underlying automatic speech recognition. Isolated-word speech recognizers were trained on adult-directed vs. infant-directed data sets and were tested on both matched and mismatched data. Results show that recognizers trained on infant-directed speech did not always exhibit better recognition performance; however, their relative loss in performance on mismatched data was significantly less severe than that of recognizers trained on adult-directed speech and presented with infant-directed test data. An analysis of the statistical distributions of a subset of phonetic classes in both data sets showed that this pattern is caused by larger class overlaps in infant-directed speech. This finding has implications for both automatic speech recognition and theories of infant speech perception.

Kirchhoff, Katrin; Schimmel, Steven

2005-04-01

348

Speech evoked auditory brainstem responses: a new tool to study brainstem encoding of speech sounds.  

PubMed

The neural encoding of speech sound begins in the auditory nerve and travels to the auditory brainstem. Non-speech stimuli such as clicks or tone bursts are routinely used to check auditory neural integrity. Recently, speech-evoked auditory brainstem responses (ABR) have been used as a tool to study brainstem processing of speech sounds. The aim of the study was to examine the speech-evoked ABR to a consonant-vowel (CV) stimulus. Thirty subjects with normal hearing participated in the study. Speech-evoked ABRs were measured to a CV stimulus in all participants. The speech stimulus used was a 40 ms synthesized /da/ sound. The consonant and vowel portions were analysed separately. Speech-evoked ABR was present in all normal-hearing subjects. The consonant portion of the stimulus elicited peak V in the response waveform. The vowel portion elicited a frequency following response (FFR). The FFR further showed coding of the fundamental frequency (F0) and the first formant frequency (F1). The results of the present study throw light on the processing of speech in the brainstem. The understanding of speech-evoked ABR has applications in both research and clinical practice; such understanding is especially important for studying central auditory system function. PMID:22319700

Sinha, Sujeet Kumar; Basavaraj, Vijayalakshmi

2010-10-01

349

An articulatorily constrained, maximum entropy approach to speech recognition and speech coding  

SciTech Connect

Hidden Markov models (HMMs) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMMs typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMMs better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values are constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMMs, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMMs. This will allow him to highlight the similarities and differences between HMMs and the proposed technique.
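Since the abstract leans on a brief description of HMMs, a toy forward-algorithm sketch is given below, showing how an HMM assigns a likelihood to an observation sequence; the two-state model and its parameters are arbitrary.

```python
# Toy HMM forward algorithm: the core likelihood computation behind
# HMM-based recognizers. All parameters below are arbitrary toy values.
import numpy as np

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])            # state transition probabilities
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])            # P(observation symbol | state)
pi = np.array([0.6, 0.4])             # initial state distribution
obs = [0, 1, 1, 0]                    # a toy discrete observation sequence

alpha = pi * B[:, obs[0]]             # initialize with the first observation
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]     # propagate, then weight by emission
print("P(observations | model) =", alpha.sum())
```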

Hogden, J.

1996-12-31
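
Not part of the record above, but for orientation: a minimal sketch of the forward recursion by which an HMM scores an observation sequence, i.e., the machinery the author contrasts with articulatory constraints. All parameter values are illustrative toys, not anything from the paper.

    import numpy as np

    # Toy HMM: 2 hidden states, 3 discrete observation symbols.
    # These values are illustrative only, not taken from the record above.
    A = np.array([[0.7, 0.3],       # state-transition probabilities
                  [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1],  # per-state observation probabilities
                  [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])       # initial state distribution

    def forward_likelihood(obs):
        """Return P(obs | HMM) via the forward recursion."""
        alpha = pi * B[:, obs[0]]            # initialization
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]    # induction step
        return alpha.sum()                   # termination

    print(forward_likelihood([0, 1, 2]))     # likelihood of a short symbol sequence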

350

Statistical modeling of speech Poincaré sections in combination of frequency analysis to improve speech recognition performance  

NASA Astrophysics Data System (ADS)

This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to simultaneously benefit from some features obtained from Poincaré sections applied to the speech reconstructed phase space (RPS) and typical Mel frequency cepstral coefficients (MFCCs), which have a proven role in the speech recognition field. With an appropriate dimension, the reconstructed phase space of a speech signal is assured to be topologically equivalent to the dynamics of the speech production system, and could therefore include information that may be absent in linear analysis approaches. Moreover, complicated systems such as the speech production system can present cyclic and oscillatory patterns, and Poincaré sections could be used as an effective tool in the analysis of such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to Poincaré sections of the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the parameters of the GMM model and MFCC-based features. A hidden Markov model-based speech recognition system and the TIMIT speech database are used to evaluate the performance of the proposed feature set by conducting isolated and continuous speech recognition experiments. With the proposed feature set, a 5.7% absolute improvement in isolated phoneme recognition is obtained over MFCC-based features alone.

Jafari, Ayyoob; Almasganj, Farshad; Bidhendi, Maryam Nabi

2010-09-01
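
As an illustration of the feature pipeline the record above describes, here is a rough Python sketch under assumptions of my own: the embedding dimension, delay, section plane, and number of mixture components are arbitrary illustrative choices rather than the paper's settings, and the random signal merely stands in for a speech frame.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def delay_embed(x, dim=3, tau=5):
        """Reconstruct a phase space from a scalar signal by time-delay embedding."""
        n = len(x) - (dim - 1) * tau
        return np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)

    def poincare_section(points, coord=0, level=0.0):
        """Keep points where one coordinate crosses `level` upward (a crude section)."""
        c = points[:, coord]
        crossings = (c[:-1] < level) & (c[1:] >= level)
        return points[1:][crossings][:, 1:]   # drop the sectioned coordinate

    x = np.random.randn(5000)                 # stand-in for a speech signal
    rps = delay_embed(x)                      # reconstructed phase space
    section = poincare_section(rps)           # 2-D Poincaré section points
    gmm = GaussianMixture(n_components=4).fit(section)
    features = np.concatenate([gmm.means_.ravel(), gmm.weights_])  # GMM-derived features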

351

An introduction to the assessment of intelligibility of tracheoesophageal speech  

Microsoft Academic Search

In cases of laryngeal cancer, it is sometimes necessary to perform a total laryngectomy. This procedure changes the anatomy and physiology of the vocal tract, with the most noticeable effect on speech. By applying a voice prosthesis, enabling the patient to use tracheoesophageal speech, speech is of better quality than with esophageal or electrolarynx speech, but still very deviant from…

Petra Jongmans; C. J. van As; Louis Pols; Frans Hilgers

2003-01-01

352

Correlation study of predictive and descriptive metrics of speech intelligibility  

Microsoft Academic Search

There exists a wide range of speech-intelligibility metrics, each of which is designed to encapsulate a different aspect of room acoustics that relates to speech intelligibility. This study reviews the different definitions of and correlations between various proposed speech intelligibility measures. Speech-intelligibility metrics can be grouped by two main uses: prediction of designed rooms and description of existing rooms.

Abigail Stefaniw; Yasushi Shimizu; Dana Smith

2002-01-01

353

A joint acoustic and phonological approach to speech intelligibility assessment  

Microsoft Academic Search

While current models of speech intelligibility rely on intricate acoustic analyses of speech attributes, they are limited by the lack of any linguistic information; hence they fail to capture the natural variability of speech sounds, confining their applicability to average intelligibility assessments. Another important limitation is that the existing models rely on the use of reference clean speech templates (or average…

Sridhar Krishna Nemala; Mounya Elhilali

2010-01-01

354

Fantasy Play in Preschool Classrooms: Age Differences in Private Speech.  

ERIC Educational Resources Information Center

Private speech is speech overtly directed to a young child's self and not directly spoken to another listener. Private speech develops differently during fantasy play than constructive play. This study examined age differences in the amount of fantasy play in the preschool classroom and in the amount and type of private speech that occurs during…

Kirby, Kathleen Campano

355

Digital Signal Processing of Speech for the Hearing Impaired  

Microsoft Academic Search

This paper presents some speech processing algorithms developed for hearing aid applications. However, these algorithms are also applicable to other speech and audio applications. Considering that the basic properties of speech remain invariant across applications, it is logical to consider these algorithms under the broader umbrella of a 'unified theory of speech.' These algorithms have been implemented on Texas Instruments (TI)…

N. Magotra; F. Livingston; S. Savadatti; S. Kamath

356

Deep Learning in Speech Synthesis August 31st, 2013  

E-print Network

Slide deck: Deep Learning in Speech Synthesis. Heiga Zen, Google, August 31st, 2013. Outline: Background; Deep Learning; Deep Learning in Speech Synthesis; Motivation; Deep learning-based approaches; DNN-based statistical text-to-speech synthesis (TTS): text (a discrete symbol sequence) to speech (a continuous time series).

Tomkins, Andrew

357

Speech Sound Disorders in a Community Study of Preschool Children  

ERIC Educational Resources Information Center

Purpose: To undertake a community (nonclinical) study to describe the speech of preschool children who had been identified by parents/teachers as having difficulties "talking and making speech sounds" and compare the speech characteristics of those who had and had not accessed the services of a speech-language pathologist (SLP). Method:…

McLeod, Sharynne; Harrison, Linda J.; McAllister, Lindy; McCormack, Jane

2013-01-01

358

Sticks and Stones: The Nexis Between Hate Speech and Violence  

Microsoft Academic Search

The panelists discussed hate speech and how it relates to bias crimes. Examples were given of hate speech experienced by people of LGBT and HIV status. Panelists discussed legislative activity in different states, how hate crime legislation works, Supreme Court speech jurisprudence, and about pending Congressional legislation that sought to include sexual orientation. Finally, the panel focused on hate speech…

Jack Chen; Laura Edidin; Brian Levin; Jack Battaglia

1999-01-01

359

Decoding speech in the presence of other sources  

Microsoft Academic Search

The statistical theory of speech recognition introduced several decades ago has brought about low word error rates for clean speech. However, it has been less successful in noisy conditions. Since extraneous acoustic sources are present in virtually all everyday speech communication conditions, the failure of the speech recognition model to take noise into account is perhaps the most serious…

J. P. Barker; M. P. Cooke; Daniel P. W. Ellis

2005-01-01

360

Optimizing acoustical conditions for speech intelligibility in classrooms  

Microsoft Academic Search

High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation…

Wonyoung Yang

2006-01-01

361

Excitable Speech: Judith Butler, Mae West, and Sexual Innuendo  

Microsoft Academic Search

Working with Judith Butler's Excitable Speech: A Politics of the Performative, this essay pursues a series of questions on the performativity of speech acts, using sexual innuendo as an example. As performed by the provocative American playwright and classic Hollywood film star, Mae West, sexual innuendo provides an instance of “excitable speech” that allows for the exploration of speech as…

Angela Failler

2001-01-01

362

University of Colorado at Boulder SPEECH LANGUAGE HEARING CENTER  

E-print Network

University of Colorado at Boulder, Speech Language Hearing Center, Summer 2012: preschool-adolescent speech-language prevention & intervention services. Speech Street: M-Th, June 11th – June 28th, 9:15-10:30 or 10:00-11:15 ($250 tuition). This group is for preschoolers (age 3…

Vasilyev, Oleg V.

363

Contemporary Reflections on Speech-Based Language Learning  

ERIC Educational Resources Information Center

In "The Relation of Language to Mental Development and of Speech to Language Teaching," S.G. Davidson displayed several timeless insights into the role of speech in developing language and reasons for using speech as the basis for instruction for children who are deaf and hard of hearing. His understanding that speech includes more than merely…

Gustafson, Marianne

2009-01-01

364

Hidden Feature Models for Speech Recognition Using Dynamic Bayesian Networks  

E-print Network

…features, such as articulatory or other phonological features, for automatic speech recognition. The majority of current speech recognition research assumes a model of speech consisting of a stream…

Noble, William Stafford

365

AUDIO SOURCE SEPARATION WITH ONE SENSOR FOR ROBUST SPEECH RECOGNITION  

E-print Network

…of noise compensation in speech signals for robust speech recognition. Several classical denoising… superimposed to the voice of the speaker(s). While automatic speech recognition is a rather mature technology…

Paris-Sud XI, Université de

366

A BLOCK COSINE TRANSFORM AND ITS APPLICATION IN SPEECH RECOGNITION  

E-print Network

…Noise-robust speech recognition has… in automatic speech recognition. This has led to sub-band based speech recognition in which the full…

367

An improved automatic lipreading system to enhance speech recognition  

Microsoft Academic Search

Current acoustic speech recognition technology performs well with very small vocabularies in noise or with large vocabularies in very low noise. Accurate acoustic speech recognition in noise with vocabularies over 100 words has yet to be achieved. Humans frequently lipread the visible facial speech articulations to enhance speech recognition, especially when the acoustic signal is degraded by noise or hearing…

Eric Petajan; Bradford Bischoff; David Bodoff; N. M. Brooke

1988-01-01

368

Using semantic analysis to improve speech recognition performance  

E-print Network

…modeling for speech recognition attempts to model the probability P(W) of observing a word sequence W… natural language for speech recognition. The purpose of language modeling is to bias a speech recognizer…

Erdogan, Hakan
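
For readers unfamiliar with the P(W) mentioned in the record above, the standard n-gram factorization used in language modeling for speech recognition (textbook material, not a claim about this particular paper's model) is:

    P(W) = P(w_1, \ldots, w_N)
         = \prod_{i=1}^{N} P(w_i \mid w_1, \ldots, w_{i-1})
         \approx \prod_{i=1}^{N} P(w_i \mid w_{i-n+1}, \ldots, w_{i-1})

where the approximation truncates each word's history to the previous n-1 words.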

369

The SPHINX-II speech recognition system: an overview  

Microsoft Academic Search

In order for speech recognizers to deal with increased task perplexity, speaker variation, and environment variation, improved speech recognition is critical. Steady progress has been made along these three dimensions at Carnegie Mellon. In this paper, we review the SPHINX-II speech recognition system and summarize our recent efforts on improved speech recognition.

Xuedong Huang; Fileno Alleva; Hsiao-Wuen Hon; Mei-Yuh Hwang; Ronald Rosenfeld

1993-01-01

370

Speaker Recognition from Coded Speech in Matched and Mismatched Conditions  

E-print Network

…speech coders are used. Speaker recognition from coded speech using handset-dependent score normalization… system, where, for example, the speaker model is trained from uncoded speech and in the recognition phase…

371

Integrating Stress Information in Large Vocabulary Continuous Speech Recognition  

E-print Network

…by performing well even for foreign-accented speech. Index Terms: speech recognition, stress, rhythm. …investigated, stress seems suitable for speech recognition tasks. This is due to its intrinsic characteristics…

Paris-Sud XI, Université de

372

The Effectiveness of Clear Speech as a Masker  

ERIC Educational Resources Information Center

Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…

Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

2010-01-01

373

Compressed Speech Technology: Implications for Learning and Instruction.  

ERIC Educational Resources Information Center

This paper first traces the historical development of speech compression technology, which has made it possible to alter the spoken rate of a pre-recorded message without excessive distortion. Terms used to describe techniques employed as the technology evolved are discussed, including rapid speech, rate altered speech, cut-and-spliced speech, and…

Sullivan, LeRoy L.

374

Modulation of the Auditory Cortex during Speech: An MEG Study  

Microsoft Academic Search

Several behavioral and brain imaging studies have demonstrated a significant interaction between speech perception and speech production. In this study, auditory cortical responses to speech were examined during self-production and feedback alteration. Magnetic field recordings were obtained from both hemispheres in subjects who spoke while hearing controlled acoustic versions of their speech feedback via earphones. These responses were compared to…

John F. Houde; Srikantan S. Nagarajan; Kensuke Sekihara; Michael M. Merzenich

2002-01-01

375

Preliminary Intelligibility Tests of a Monaural Speech Segregation System  

Microsoft Academic Search

Human listeners are able to understand speech in the presence of a noisy background. How to simulate this perceptual ability remains a great challenge. This paper describes a preliminary evaluation of intelligibility of the output of a monaural speech segregation system. The system performs speech segregation in two stages. The first stage segregates voiced speech using supervised learning of harmonic…

Ke Hu; Pierre Divenyi; Dan Ellis; Zhaozhang Jin; Barbara G. Shinn-Cunningham; DeLiang Wang

376

Speech Characteristics Associated with Three Genotypes of Ataxia  

ERIC Educational Resources Information Center

Purpose: Advances in neurobiology are providing new opportunities to investigate the neurological systems underlying motor speech control. This study explores the perceptual characteristics of the speech of three genotypes of spino-cerebellar ataxia (SCA) as manifest in four different speech tasks. Methods: Speech samples from 26 speakers with SCA…

Sidtis, John J.; Ahn, Ji Sook; Gomez, Christopher; Sidtis, Diana

2011-01-01

377

Speechdat multilingual speech databases for teleservices: across the finish line  

Microsoft Academic Search

The goal of the SpeechDat project is to develop spoken language resources for speech recognisers suited to realise voice-driven teleservices. SpeechDat created speech databases for all official languages of the European Union and some major dialectal varieties and minority languages. The size of the databases ranges between 500 and 5000 speakers. In total 20 databases are recorded…

Harald Höge; Christoph Draxler; Henk van den Heuvel; Finn Tore Johansen; Eric Sanders; Herbert S. Tropf

1999-01-01

378

Auditory-Visual Speech Processing 2005 (AVSP'05)  

E-print Network

Auditory-Visual Speech Processing 2005 (AVSP'05), British Columbia, Canada, July 24-27, 2005 (ISCA Archive: http://www.isca-speech.org/archive). From "An Agent-based Framework for Auditory-Visual Speech…": …of multiple tiers of visual speech gestures, phonemes and syllable boundaries. The CUAVE database [6] provided…

Reyle, Uwe

379

Cues for Hesitation in Speech Synthesis Rolf Carlson1  

E-print Network

…a sequence of experiments using Swedish speech synthesis. A background for our effort is a database developed… (Rolf Carlson, Kjell Gustafson and Eva Strangert; CSC, Department of Speech, Music and Hearing, KTH, Stockholm, Sweden)

Carlson, Rolf

380

Optimising selection of units from speech databases for concatenative synthesis  

Microsoft Academic Search

Concatenating units of natural speech is one method of speech synthesis. Most such systems use an inventory of fixed-length units, typically diphones or triphones with one instance of each type. An alternative is to use more varied, non-uniform units extracted from large speech databases containing multiple instances of each. The greater variability in such natural speech segments allows closer modeling of naturalness and differences in speaking styles…

Alan W. Black; Nick Campbell

1995-01-01

381

Human phoneme recognition depending on speech-intrinsic variabilitya)  

E-print Network

…in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which… The database and detailed results have been made available for comparisons between human speech recognition (HSR)…

Meyer, Bernd T.

382

The IIIT-H Indic Speech Databases Kishore Prahallad1  

E-print Network

…This paper discusses the efforts in collecting speech databases for Indian languages – Bengali… in collecting these databases, and demonstrates their usage in speech synthesis. By releasing these speech…

Black, Alan W

383

Design and collection of Czech Lombard speech database  

Microsoft Academic Search

In this paper, design, collection and parameters of the newly proposed Czech Lombard Speech Database (CLSD) are presented. The database focuses on analysis and modeling of the Lombard effect to achieve robust speech recognition improvement. The CLSD consists of neutral speech and speech produced in various types of simulated noisy background. In comparison to available databases dealing with the Lombard effect, an extensive…

Hynek Boril; Petr Pollák

2005-01-01

384

On the Dynamics of Casual and Careful Speech.  

ERIC Educational Resources Information Center

Comparative statistical data are presented on speech dynamic (as contrasted with lexical and rhetorical) aspects of major speech styles. Representative samples of story retelling, lectures, speeches, sermons, interviews, and panel discussions serve to determine posited differences between casual and careful speech. Data are drawn from 15,393…

Hieke, A. E.

385

Speech animation using electromagnetic articulography as motion capture data  

E-print Network

…these synthesizers are generally focused towards clinical applications such as speech therapy or biomechanical simulation… during speech. As such, it performs the same function for speech that conventional motion capture does…

Edinburgh, University of

386

Bachelor of Science in Speech-Language Pathology and Audiology  

E-print Network

Speech-language pathologists and audiologists are health professionals focused on the processes and disorders of speech, language and hearing. …licensure as assistants in speech-language pathology and can provide therapy services under the supervision…

O'Toole, Alice J.

387

Speech animation using electromagnetic articulography as motion capture data  

E-print Network

…are generally focused towards clinical applications such as speech therapy or biomechanical simulation. While… during speech. As such, it performs the same function for speech that conventional motion capture does…

Boyer, Edmond

388

Brain-Computer Interfaces for Speech Communication  

PubMed Central

This paper briefly reviews current silent speech methodologies for normal and disabled individuals. Current techniques utilizing electromyographic (EMG) recordings of vocal tract movements are useful for physically healthy individuals but fail for tetraplegic individuals who do not have accurate voluntary control over the speech articulators. Alternative methods utilizing EMG from other body parts (e.g., hand, arm, or facial muscles) or electroencephalography (EEG) can provide capable silent communication to severely paralyzed users, though current interfaces are extremely slow relative to normal conversation rates and require constant attention to a computer screen that provides visual feedback and/or cueing. We present a novel approach to the problem of silent speech via an intracortical microelectrode brain computer interface (BCI) to predict intended speech information directly from the activity of neurons involved in speech production. The predicted speech is synthesized and acoustically fed back to the user with a delay under 50 ms. We demonstrate that the Neurotrophic Electrode used in the BCI is capable of providing useful neural recordings for over 4 years, a necessary property for BCIs that need to remain viable over the lifespan of the user. Other design considerations include neural decoding techniques based on previous research involving BCIs for computer cursor or robotic arm control via prediction of intended movement kinematics from motor cortical signals in monkeys and humans. Initial results from a study of continuous speech production with instantaneous acoustic feedback show the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions. The success of this initial trial validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates. PMID:20204164

Brumberg, Jonathan S.; Nieto-Castanon, Alfonso; Kennedy, Philip R.; Guenther, Frank H.

2010-01-01

389

Network Speech Systems Technology Program  

NASA Astrophysics Data System (ADS)

This report documents work performed during FY 1980 on the DCA-sponsored Network Speech Systems Technology Program. The areas of work reported are: (1) communication systems studies in Demand-Assignment Multiple Access (DAMA), voice/data integration, and adaptive routing, in support of the evolving Defense Communications System (DCS) and Defense Switched Network (DSN); (2) a satellite/terrestrial integration design study including the functional design of voice and data interfaces to interconnect terrestrial and satellite network subsystems; and (3) voice-conferencing efforts dealing with support of the Secure Voice and Graphics Conferencing (SVGC) Test and Evaluation Program. Progress in definition and planning of experiments for the Experimental Integrated Switched Network (EISN) is detailed separately in an FY 80 Experiment Plan Supplement.

Weinstein, C. J.

1980-09-01

390

Speech information retrieval: a review  

SciTech Connect

Audio is an information-rich component of multimedia. Information can be extracted from audio in a number of different ways, and thus there are several established audio signal analysis research fields. These fields include speech recognition, speaker recognition, audio segmentation and classification, and audio fingerprinting. The information that can be extracted from tools and methods developed in these fields can greatly enhance multimedia systems. In this paper, we present the current state of research in each of the major audio analysis fields. The goal is to introduce enough background for someone new in the field to quickly gain high-level understanding and to provide direction for further study.

Hafen, Ryan P.; Henry, Michael J.

2012-11-01

391

Federal Reserve Board: Speeches and Testimonies  

NSDL National Science Digital Library

Some feel that every time US Federal Reserve Board Chairman Alan Greenspan speaks, the US stock market shudders. He gave testimony before the Senate Banking Committee on February 26, 1997 and the Dow Jones industrial average plunged over 55 points that day (after a rebound from a 122 point loss). You can read the chairman's testimony and his recent speeches at the Federal Reserve Board site (the speeches and testimony of other officials are also available). Read the speeches and testimony, watch the market, and judge for yourself the power of one man in the US economy.

392

Acoustic Speech Analysis Of Wayang Golek Puppeteer  

NASA Astrophysics Data System (ADS)

Actively disguised speech is one problem to be taken into account in forensic speaker verification or identification processes. Verification is usually carried out by comparison between unknown samples and known samples, and active disguising can occur in both. To simulate the condition of speech disguising, the voices of Wayang Golek puppeteers were used: a wayang golek puppeteer is assumed to be a master of disguise, able to manipulate his voice into many different character voices. This paper discusses the speech characteristics of two puppeteers. Comparisons were made between each puppeteer's habitual voice and his manipulated voices.

Hakim, Faisal Abdul; Mandasari, Miranti Indar; Sarwono, Joko

2010-12-01

393

Two Sides of the Same Coin: The Scope of Free Speech and Hate Speech in the College Community.  

ERIC Educational Resources Information Center

This article presents the Two Sides interviews, which confront the serious and immediate conflict between free speech and hate speech on college campuses. Dr. Robert O'Neil discusses the scope of free speech in the college community, while Dr. Timothy Shiell focuses on hate speech on campuses. Contains 12 references. (VWC)

Schuett, Faye

2000-01-01

394

Empathy, Ways of Knowing, and Interdependence as Mediators of Gender Differences in Attitudes toward Hate Speech and Freedom of Speech  

ERIC Educational Resources Information Center

Women are more intolerant of hate speech than men. This study examined relationality measures as mediators of gender differences in the perception of the harm of hate speech and the importance of freedom of speech. Participants were 107 male and 123 female college students. Questionnaires assessed the perceived harm of hate speech, the importance…

Cowan, Gloria; Khatchadourian, Desiree

2003-01-01

395

Model-based Noisy Speech Recognition with Environment Parameters Estimated by Noise Adaptive Speech Recognition with Prior  

E-print Network

…We have proposed earlier a noise adaptive speech recognition approach… that this method performs better than the previous methods. 1. Introduction: Speech recognition has to be carried…

396

State-based labelling for a sparse representation of speech and its application to robust speech recognition  

E-print Network

…this labelling in noise-robust automatic speech recognition. Acoustic time-frequency segments of speech… the transcriptions. In the recognition phase, noisy speech is modeled by a sparse linear combination of noise… was tested in the connected digit recognition task with noisy speech material from the Aurora-2 database…

Virtanen, Tuomas

397

SYNTHETIC VISUAL SPEECH DRIVEN FROM AUDITORY SPEECH Eva Agelfors, Jonas Beskow, Bjrn Granstrm, Magnus Lundeberg, Giampiero Salvi,  

E-print Network

…were trained on a phonetically transcribed telephone speech database. The output of the HMMs… phoneme strings from a database as input to the visual speech synthesis. The two methods were evaluated…

Beskow, Jonas

398

Audio-visual integration of speech with time-varying sine wave speech replicas  

Microsoft Academic Search

We tested whether listeners' knowledge about the nature of the auditory stimuli had an effect on audio-visual (AV) integration of speech. First, subjects were taught to categorize two sine-wave (sw) replicas of the real speech tokens /omso/ and /onso/ into two arbitrary nonspeech categories without knowledge of the speech-like nature of the sounds. A test with congruent and incongruent AV-stimulus…

Jyrki Tuomainen; Tobias Andersen; Kaisa Tiippana; Mikko Sams

2002-01-01

399

Towards every-citizen's speech interface: an application generator for speech interfaces to databases  

Microsoft Academic Search

One of the acknowledged impediments to the widespread use of speech interfaces is the portability problem, namely the considerable amount of labor, data and expertise needed to develop such interfaces in new domains. Under the Universal Speech Interface (USI) project, we have designed unified look-and-feel speech interfaces that employ semi-structured interaction and thus obviate the need for data…

Arthur R. Toth; Thomas K. Harris; James Sanders; Stefanie Shriver; Roni Rosenfeld

2002-01-01

400

SpeechDat-Car: Towards a collection of speech databases for automotive environments  

Microsoft Academic Search

The SpeechDat-Car project is a 4th Framework EC project in the Language Engineering programme. It aims at collecting a set of nine speech databases to support training and testing of robust multilingual speech recognition for in-car applications. The consortium participants are car manufacturers, telephone communications providers, and universities. This paper describes the background of the project, its organisation,…

Henk van den Heuvel; Antonio Bonafonte; Jerome Boudy; S. Dufour; Ph. Lockwood; A. Moreno; G. Richard

1999-01-01

401

An Analysis of Interstate Speeches: Are They Structurally Different?  

Microsoft Academic Search

The purpose of this study is to examine Interstate Oratorical speeches and determine if these speeches demonstrate common characteristics that can be traced from the earliest championship speech, given in 1875, to the championship speech in 2000. Twenty-seven IOA speeches were analyzed using a coding system based around the categories of topic, organizational pattern/structure, evidence usage, stylistic features and…

Leah E. White; Lucas Messer

402

Variable-speech-rate audiometry for hearing aid evaluation  

Microsoft Academic Search

A new hearing aid evaluation method using variable-speech-rate audiometry (VSRA) was developed. VSRA was created based on the Japanese speech audiometry authorized by the Japan Audiological Society. Ordinary speech audiometry cannot reveal the temporal factor in the word discrimination ability of the hearing impaired. With VSRA, we can compare several performance-intensity curves obtained from different speech-rate speech…

Hiroshi Hosoi; Yoshiaki Tsuta; Takashi Nishida; Kiyotaka Murata; Fumihiko Ohta; Tsuyoshi Mekata; Yumiko Kato

1999-01-01

403

Speech Perception and Short Term Memory Deficits in Persistent Developmental Speech Disorder  

PubMed Central

Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech perception and short-term memory. Nine adults with a persistent familial developmental speech disorder without language impairment were compared with 20 controls on tasks requiring the discrimination of fine acoustic cues for word identification and on measures of verbal and nonverbal short-term memory. Significant group differences were found in the slopes of the discrimination curves for first formant transitions for word identification with stop gaps of 40 and 20 ms with effect sizes of 1.60 and 1.56. Significant group differences also occurred on tests of nonverbal rhythm and tonal memory, and verbal short-term memory with effect sizes of 2.38, 1.56 and 1.73. No group differences occurred in the use of stop gap durations for word identification. Because frequency-based speech perception and short-term verbal and nonverbal memory deficits both persisted into adulthood in the speech-impaired adults, these deficits may be involved in the persistence of speech disorders without language impairment. PMID:15896836

Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.

2008-01-01

404

Reading speech from still and moving faces: the neural substrates of visible speech.  

PubMed

Speech is perceived both by ear and by eye. Unlike heard speech, some seen speech gestures can be captured in stilled image sequences. Previous studies have shown that in hearing people, natural time-varying silent seen speech can access the auditory cortex (left superior temporal regions). Using functional magnetic resonance imaging (fMRI), the present study explored the extent to which this circuitry was activated when seen speech was deprived of its time-varying characteristics. In the scanner, hearing participants were instructed to look for a prespecified visible speech target sequence ("voo" or "ahv") among other monosyllables. In one condition, the image sequence comprised a series of stilled key frames showing apical gestures (e.g., separate frames for "v" and "oo" [from the target] or "ee" and "m" [i.e., from nontarget syllables]). In the other condition, natural speech movement of the same overall segment duration was seen. In contrast to a baseline condition in which the letter "V" was superimposed on a resting face, stilled speech face images generated activation in posterior cortical regions associated with the perception of biological movement, despite the lack of apparent movement in the speech image sequence. Activation was also detected in traditional speech-processing regions including the left inferior frontal (Broca's) area, left superior temporal sulcus (STS), and left supramarginal gyrus (the dorsal aspect of Wernicke's area). Stilled speech sequences also generated activation in the ventral premotor cortex and anterior inferior parietal sulcus bilaterally. Moving faces generated significantly greater cortical activation than stilled face sequences, and in similar regions. However, a number of differences between stilled and moving speech were also observed. In the visual cortex, stilled faces generated relatively more activation in primary visual regions (V1/V2), while visual movement areas (V5/MT+) were activated to a greater extent by moving faces. Cortical regions activated more by naturally moving speaking faces included the auditory cortex (Brodmann's Areas 41/42; lateral parts of Heschl's gyrus) and the left STS and inferior frontal gyrus. Seen speech with normal time-varying characteristics appears to have preferential access to "purely" auditory processing regions specialized for language, possibly via acquired dynamic audiovisual integration mechanisms in STS. When seen speech lacks natural time-varying characteristics, access to speech-processing systems in the left temporal lobe may be achieved predominantly via action-based speech representations, realized in the ventral premotor cortex. PMID:12590843

Calvert, Gemma A; Campbell, Ruth

2003-01-01

405

IMPLEMENTING SRI’S PASHTO SPEECH-TO-SPEECH TRANSLATION SYSTEM ON A…  

E-print Network

We describe our recent effort implementing SRI’s UMPC-based Pashto speech-to-speech (S2S) translation system on a smart phone running the Android operating system. In order to maintain very low latencies of system response on computationally limited smart phone platforms, we developed efficient algorithms and data structures and optimized model sizes for various system components. Our current Android-based S2S system requires less than one-fourth the system memory and significantly lower processor speed, with a sacrifice of 15% relative loss of system accuracy, compared to a laptop-based platform. Index Terms — speech-to-speech translation, mobile computing, smart phone, Android.

Jing Zheng; Arindam M; Xin Lei; Michael Fr; Necip Fazil Ayan; Dimitra Vergyri; Wen Wang; Murat Akbacak; Kristin Precoda

406

The evolution of speech: vision, rhythm, cooperation.  

PubMed

A full account of human speech evolution must consider its multisensory, rhythmic, and cooperative characteristics. Humans, apes, and monkeys recognize the correspondence between vocalizations and their associated facial postures, and gain behavioral benefits from them. Some monkey vocalizations even have a speech-like acoustic rhythmicity but lack the concomitant rhythmic facial motion that speech exhibits. We review data showing that rhythmic facial expressions such as lip-smacking may have been linked to vocal output to produce an ancestral form of rhythmic audiovisual speech. Finally, we argue that human vocal cooperation (turn-taking) may have arisen through a combination of volubility and prosociality, and provide comparative evidence from one species to support this hypothesis. PMID:25048821

Ghazanfar, Asif A; Takahashi, Daniel Y

2014-10-01

407

Pronunciation learning for automatic speech recognition  

E-print Network

In many ways, the lexicon remains the Achilles heel of modern automatic speech recognizers (ASRs). Unlike stochastic acoustic and language models that learn the values of their parameters from training data, the baseform ...

Badr, Ibrahim

2011-01-01

408

Speech emotional features extraction based on electroglottograph.  

PubMed

This study proposes two classes of speech emotional features extracted from electroglottography (EGG) and the speech signal. The power-law distribution coefficients (PLDC) of voiced segment duration, pitch rise duration, and pitch down duration are obtained to reflect the information of vocal fold excitation. The real discrete cosine transform coefficients of the normalized spectrum of the EGG and speech signal are calculated to reflect the information of vocal tract modulation. Two experiments are carried out. One compares the proposed features with traditional features using sequential forward floating search and sequential backward floating search. The other is a comparative emotion recognition experiment based on support vector machines. The results show that the proposed features are better than those commonly used in the case of speaker-independent and content-independent speech emotion recognition. PMID:24047321

Chen, Lijiang; Mao, Xia; Wei, Pengfei; Compare, Angelo

2013-12-01
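
To make the second feature class above concrete: a minimal sketch, under assumptions of my own (the frame length, Hamming window, peak normalization, and keeping 12 coefficients are illustrative choices, not the paper's), of extracting real DCT coefficients from a normalized magnitude spectrum.

    import numpy as np
    from scipy.fft import rfft, dct

    def spectrum_dct_features(frame, n_coef=12):
        """Real DCT coefficients of a peak-normalized magnitude spectrum."""
        spectrum = np.abs(rfft(frame * np.hamming(len(frame))))
        spectrum /= spectrum.max() + 1e-12      # normalize the spectrum
        return dct(spectrum, type=2, norm='ortho')[:n_coef]

    frame = np.random.randn(400)                # stand-in for a 25 ms frame at 16 kHz
    print(spectrum_dct_features(frame))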

409

Teaming for Speech and Auditory Training.  

ERIC Educational Resources Information Center

The article suggests three strategies for the audiologist and speech/communication specialist to use in assisting the preschool teacher to implement student's individualized education program: (1) demonstration teaming, (2) dual teaming; and (3) rotation teaming. (CL)

Nussbaum, Debra B.; Waddy-Smith, Bettie

1985-01-01

410

Coherence and the speech intelligibility index  

NASA Astrophysics Data System (ADS)

The speech intelligibility index (SII) (ANSI S3.5-1997) provides a means for estimating speech intelligibility under conditions of additive stationary noise or bandwidth reduction. The SII concept for estimating intelligibility is extended in this paper to include broadband peak-clipping and center-clipping distortion, with the coherence between the input and output signals used to estimate the noise and distortion effects. The speech intelligibility predictions using the new procedure are compared with intelligibility scores obtained from normal-hearing and hearing-impaired subjects for conditions of additive noise and peak-clipping and center-clipping distortion. The most effective procedure divides the speech signal into low-, mid-, and high-level regions, computes the coherence SII separately for the signal segments in each region, and then estimates intelligibility from a weighted combination of the three coherence SII values.

Kates, James M.; Arehart, Kathryn H.

2005-04-01
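
For context (standard ANSI S3.5 and signal-processing material, not results specific to this record), the SII is a weighted sum of band audibilities, and the coherence-based variant described above derives the audibility term from the magnitude-squared coherence between input x and output y:

    \mathrm{SII} = \sum_{i=1}^{n} I_i \, A_i ,
    \qquad
    \gamma^2(f) = \frac{|S_{xy}(f)|^2}{S_{xx}(f)\,S_{yy}(f)} ,

where I_i is the band-importance weight, A_i the audibility of band i, and S_xy, S_xx, S_yy the cross- and auto-power spectra.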

411

Multifractal analysis of unvoiced speech signals  

NASA Astrophysics Data System (ADS)

In this thesis, we analyze the complexity involved in the production of unvoiced speech signals with measures from nonlinear dynamics and chaos theory. Previous research successfully characterized some speech signals as chaotic. However, in this dissertation, we use multifractal measures to postulate the presence of various fractal regimes in the attractors of unvoiced speech signals. We extend prior work which used only the correlation dimension D2 and Lyapunov exponents to analyze some speech sounds. We capture the chaotic properties of unvoiced speech signals in the embedded vector space more succinctly by not only estimating the correlation dimension D2, but by also estimating the generalized dimension Dq. The (non-constant) generalized dimensions were estimated from phase-space reconstructed vectors of single-scalar-variable realizations of unvoiced speech signals. The largest of those dimensions is an indicator of the minimum dimension required in the phase space of any realistic dynamical model of speech signals. Results of the generalized dimension estimation support the hypothesis that unvoiced speech signals indeed have multifractal measures. The multifractal analysis also reveals that unvoiced speech signals exhibit low-dimensional chaos as well as 'soft' turbulence. This is in contrast to the opinion that unvoiced speech signals are generated from what is technically known as 'hard' turbulent flow, in which the dimension of a dynamical model is very high. Unvoiced speech signals may actually be generated from 'soft' turbulent flow. In this dissertation, we also explore the relationship between the estimated generalized dimensions Dq and the singularity spectrum f(α). Existing algorithms for accurately estimating the singularity spectrum f(α) from samples of the generalized dimensions Dq of a multifractal chaotic time series use either (a) linear interpolation of the known, coarsely sampled Dq values or (b) a finely sampled Dq curve obtained at great computational/experimental expense. Also, in conventional techniques the derivative in the expression for the Legendre transform necessary to go from Dq to f(α) is approximated using a first-order centered difference equation. Finely sampling the Dq is computationally intensive, and the simple linear approximations to interpolation and differentiation give erroneous end points in the f(α) curve. We propose using standard min-max filter design methods to more accurately interpolate between known samples of the Dq values and to compute the differentiation needed to evaluate the Legendre transform. We use optimum (min-max) interpolators and differentiators designed with the Parks-McClellan algorithm. We have computed the generalized dimensions and singularity spectra of 20 unvoiced speech sounds from the ISOLET database. The results not only indicate multifractality of certain unvoiced speech sounds, but also may lead to nonlinear maps that may be useful in improving the nonlinear dynamical modeling of speech sounds. This new approach to f(α) singularity spectrum calculation exhibits computational reduction and improved accuracy. The proposed method also provides estimates of the generalized dimensions at D∞ and D−∞, which are almost impossible to obtain from real data with a limited number of data samples. Also, the asymmetric spread of α values with the corresponding f(α) around the maximum of f(α) reveals the inhomogeneity in the attractors of unvoiced speech signals, just like the variations in the Dq values. The asymmetric spread of α values may also be an indication that the turbulent energy fields generated during unvoiced speech production are made of non-homogeneous fractals.

Adeyemi, Olufemi A.
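
As a refresher on the quantities named in the record above (textbook definitions, not results from the thesis): the generalized dimensions D_q and the singularity spectrum f(α) are related through a Legendre transform,

    D_q = \frac{1}{q-1}\,\lim_{\varepsilon \to 0}
          \frac{\log \sum_i p_i^{\,q}}{\log \varepsilon},
    \qquad
    \alpha(q) = \frac{d}{dq}\bigl[(q-1)\,D_q\bigr],
    \qquad
    f(\alpha) = q\,\alpha(q) - (q-1)\,D_q ,

where p_i is the measure of the i-th box of size ε; the derivative dα/dq is what the thesis's min-max differentiators are designed to approximate accurately.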

412

Speech transformations based on a sinusoidal representation  

NASA Astrophysics Data System (ADS)

A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system originally was designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speakers in the presence of interference such as noise and musical backgrounds.

Quatieri, T. E.; McAulay, R. J.

1986-05-01
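
The sinusoidal representation referred to above models speech, in its standard form (details of the paper's variant may differ), as a sum of slowly varying sinusoids:

    s(n) \;=\; \sum_{l=1}^{L} A_l(n)\,\cos\bigl(\theta_l(n)\bigr),

where A_l(n) and θ_l(n) are the time-varying amplitude and phase of the l-th component. Time-scale and pitch modification become tractable because the amplitude and phase tracks can be stretched or rescaled independently before resynthesis.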

413

commanimation: Creating and managing animations via speech  

E-print Network

A speech-controlled animation system is both a useful application program and a laboratory in which to investigate context-aware applications and the control of errors. The user need not have prior knowledge or ...

Kim, Hana

414

Sounds and speech perception Productivity of language  

E-print Network

Slide excerpts: the nasal/sinus passages and the lips and teeth all affect the sound made. Phonetic features: speech sounds differ on features such as vowel vs. consonant (whether there is an obstruction of the vocal tract). For consonants (vowels have…

Pillow, Jonathan

415

Managing to Speak by Managing the Speech.  

ERIC Educational Resources Information Center

The essence of giving a good speech is to view it as a managerial problem/opportunity and apply the four management functions to resolve it. These four functions are (1) planning; (2) organizing; (3) motivating; and (4) controlling. (JOW)

Sussman, Lyle

1988-01-01

416

The unattended speech effect: perception or memory?  

PubMed

Broadbent (1983) has suggested that the influence of unattended speech on immediate serial recall is a perceptual phenomenon rather than a memory phenomenon. In order to test this, subjects were required to classify visually presented pairs of consonants on the basis of either case or rhyme. They were tested both in silence and against a background of continuous spoken Arabic presented at 75 dB(A). No effect of unattended speech was observed on either the speed or accuracy of processing. A further study required subjects to decide whether visually presented nonwords were homophonous with real words. Again, performance was not impaired by unattended speech, although a clear effect was observed on an immediate serial memory task. Our results give no support to the perceptual interpretation of the unattended speech effect. PMID:2945899

Baddeley, A; Salamé, P

1986-10-01

417

Infinite Support Vector Machines in Speech Recognition  

E-print Network

Generative feature spaces provide an elegant way to apply discriminative models in speech recognition, and system performance has been improved by adapting this framework. However, the classes in the feature space may not be linearly separable...

Yang, Jingzhou; van Dalien, Rogier C.; Gales, M. J. F.

2013-01-01

418

Autoregressive clustering for HMM speech synthesis  

E-print Network

…of autoregressive clustering for autoregressive HMM-based speech synthesis. We describe decision tree clustering for the autoregressive HMM and highlight differences to the standard clustering procedure. Subjective listening evaluation results suggest...

Shannon, Matt; Byrne, William

2010-09-27

419

An introduction to the speechBITE database: Speech pathology database for best interventions and treatment efficacy  

Microsoft Academic Search

This paper describes the development of the Speech Pathology Database for Best Interventions and Treatment Efficacy (speechBITE) at The University of Sydney. The speechBITE database is designed to provide better access to the intervention research relevant to speech pathology and to help clinicians interpret treatment research. The challenges speech pathologists face when locating research to support evidence-based practice have been…

Katherine Smith; Patricia McCabe; Leanne Togher; Emma Power; Natalie Munro; Elizabeth Murray; Michelle Lincoln

2010-01-01

420

Phonetic segmentation using multiple speech features  

Microsoft Academic Search

In this paper we propose a method for improving the performance of the segmentation of speech waveforms into phonetic units. The proposed method is based on the well-known Viterbi time-alignment algorithm and utilizes the phonetic boundary predictions from multiple speech parameterization techniques. Specifically, we utilize the most appropriate, with respect to boundary type, phone transition position prediction as initial…

Iosif Mporas; Todor Ganchev; Nikos Fakotakis

2008-01-01

421

Bimodal codebooks for CELP speech coding  

E-print Network

Bimodal Codebooks for CELP Speech Coding. A thesis by Hong Chae Woo, submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of Master of Science, December 1988. Major subject: Electrical Engineering. Approved as to style and content by: J. D. Gibson, Don R. Halverson (Member), Shiping Li (Member), Ronald R. Hockin (Member), Jo W. Howze (Head…

Woo, Hong Chae

2012-06-07

422

Automatic Speech Recognition: An Improved Paradigm  

Microsoft Academic Search

\\u000a In this paper we present a short survey of automatic speech recognition systems underlining the current achievements and capabilities\\u000a of current day solutions as well as their inherent limitations and shortcomings. In response to which we propose an improved\\u000a paradigm and algorithm for building an automatic speech recognition system that actively adapts its recognition model in an\\u000a unsupervised fashion by

Tudor-Sabin Topoleanu; Gheorghe Leonte Mogan

2011-01-01

423

Mothers' speech in three social classes  

Microsoft Academic Search

Functional and linguistic aspects of the speech of Dutch-speaking mothers from three social classes to their 2-year-old children were studied. Mothers' speech in Dutch showed the same characteristics of simplicity and redundancy found in other languages. In a free play situation, both academic and lower middle class mothers produced more expansions and used fewer imperatives, more substantive deixis, and fewer…

C. E. Snow; A. Arlman-Rupp; Y. Hassing; J. Jobse; J. Joosten; J. Vorster

1976-01-01

424

Review of Neural Networks for Speech Recognition  

Microsoft Academic Search

The performance of current speech recognition systems is far below that of humans. Neural nets offer the potential of providing massive parallelism, adaptation, and new algorithmic approaches to problems in speech recognition. Initial studies have demonstrated that multilayer networks with time delays can provide excellent discrimination between small sets of pre-segmented difficult-to-discriminate words, consonants, and vowels. Performance for these small…

Richard P. Lippmann

1989-01-01

425

Large vocabulary continuous speech recognition using HTK  

Microsoft Academic Search

HTK is a portable software toolkit for building speech recognition systems using continuous density hidden Markov models, developed by the Cambridge University Speech Group. One particularly successful type of system uses mixture-density tied-state triphones. We have used this technique for the 5k/20k-word ARPA Wall Street Journal (WSJ) task. We have extended our approach from using word-internal…

P. C. Woodland; J. J. Odell; V. Valtchev; S. J. Young

1994-01-01

426

Signal modeling techniques in speech recognition  

Microsoft Academic Search

A tutorial on signal processing in state-of-the-art speech recognition systems is presented, reviewing those techniques most commonly used. The four basic operations of signal modeling, i.e. spectral shaping, spectral analysis, parametric transformation, and statistical modeling, are discussed. Three important trends that have developed in the last five years in speech recognition are examined. First, heterogeneous parameter sets that mix absolute…

JOSEPH W. PICONE; Texas Instruments

1993-01-01

427

The fragility of freedom of speech.  

PubMed

Freedom of speech is a fundamental liberty that imposes a stringent duty of tolerance. Tolerance is limited by direct incitements to violence. False notions and bad laws on speech have obscured our view of this freedom. Hence, perhaps, the self-righteous intolerance, incitements and threats in response to Giubilini and Minerva. Those who disagree have the right to argue back but their attempts to shut us up are morally wrong. PMID:23637438

Shackel, Nicholas

2013-05-01

428

Dialog Act Modeling for Conversational Speech  

Microsoft Academic Search

We describe an integrated approach for statistical modeling of discourse structure for natural conversational speech. Our model is based on 42 'dialog acts' (e.g., Statement, Question, Backchannel, Agreement, Disagreement, Apology), which were hand-labeled in 1155 conversations from the Switchboard corpus of spontaneous human-to-human telephone speech. We developed several models and algorithms to automatically detect dialog acts from transcribed…

Andreas Stolcke; Elizabeth Shriberg

1998-01-01

429

Electrophysiological evidence for speech-specific audiovisual integration.  

PubMed

Lip-read speech is integrated with heard speech at various neural levels. Here, we investigated the extent to which lip-read induced modulations of the auditory N1 and P2 (measured with EEG) are indicative of speech-specific audiovisual integration, and we explored to what extent the ERPs were modulated by phonetic audiovisual congruency. In order to disentangle speech-specific (phonetic) integration from non-speech integration, we used Sine-Wave Speech (SWS) that was perceived as speech by half of the participants (they were in speech-mode), while the other half was in non-speech mode. Results showed that the N1 obtained with audiovisual stimuli peaked earlier than the N1 evoked by auditory-only stimuli. This lip-read induced speeding up of the N1 occurred for listeners in speech and non-speech mode. In contrast, if listeners were in speech-mode, lip-read speech also modulated the auditory P2, but not if listeners were in non-speech mode, thus revealing speech-specific audiovisual binding. Comparing ERPs for phonetically congruent audiovisual stimuli with ERPs for incongruent stimuli revealed an effect of phonetic stimulus congruency that started at ~200 ms after (in)congruence became apparent. Critically, akin to the P2 suppression, congruency effects were only observed if listeners were in speech mode, and not if they were in non-speech mode. Using identical stimuli, we thus confirm that audiovisual binding involves (partially) different neural mechanisms for sound processing in speech and non-speech mode. PMID:24291340

Baart, Martijn; Stekelenburg, Jeroen J; Vroomen, Jean

2014-01-01

430

Voice modulations in German ironic speech.  

PubMed

Previous research has shown that in different languages ironic speech is acoustically modulated compared to literal speech, and these modulations are assumed to aid the listener in the comprehension process by acting as cues that mark utterances as ironic. The present study was conducted to identify paraverbal features of German 'ironic criticism' that may act as irony cues by comparing acoustic measures of ironic and literal speech. For this purpose, samples of scripted ironic and literal target utterances produced by 14 female speakers were recorded and acoustically analyzed. Results showed that in contrast to literal remarks, ironic criticism was characterized by a decreased mean fundamental frequency (F0), raised energy levels and increased vowel duration, whereas F0 contours differed only marginally between both speech types. Furthermore, we found ironic speech to be characterized by vowel hyperarticulation, an acoustic feature which has so far not been considered as a possible irony cue. Contrary to our expectations, voice modulations in ironic speech were applied independently of the availability of additional, visual irony cues. The results are discussed in light of previous findings on acoustic features of irony yielded for other languages. PMID:22338786

Scharrer, Lisa; Christmann, Ursula; Knoll, Monja

2011-12-01

431

Speech earthquakes: scaling and universality in human voice  

E-print Network

Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech the energy is unevenly released and power-law distributed, reporting a universal, robust Gutenberg-Richter-like law in speech. We further show that such earthquakes in speech show temporal correlations, as the interevent statistics are again power-law distributed. Since this feature takes place in the intra-phoneme range, we conjecture that this complex phenomenon is not cognitive in origin but resides in the physiological speech production mechanism. Moreover, we show that these waiting time distributions are scale invariant under a renormalisation group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are put in contrast with current paradigms in speech ...
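
The core measurement here, checking whether frame-level energy release follows a power law, can be sketched as below. The frame length, the median threshold, and the Hill-type maximum-likelihood exponent estimator are our illustrative choices, and the random signal merely stands in for a real speech corpus.

```python
# Frame energies plus a maximum-likelihood power-law exponent estimate.
import numpy as np

def frame_energies(x, frame_len=256):
    """Energy released in consecutive short frames of a waveform."""
    n = len(x) // frame_len
    return (x[:n * frame_len].reshape(n, frame_len) ** 2).sum(axis=1)

def powerlaw_exponent(samples, xmin):
    """Hill estimate of alpha in p(e) ~ e**-alpha for e >= xmin."""
    s = samples[samples >= xmin]
    return 1.0 + len(s) / np.sum(np.log(s / xmin))

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)              # stand-in for one second of speech
e = frame_energies(x)
print(powerlaw_exponent(e, xmin=np.median(e)))
```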

Luque, Jordi; Lacasa, Lucas

2014-01-01

432

Conversational quality evaluation of artificial bandwidth extension of telephone speech.  

PubMed

Artificial bandwidth extension methods have been developed to improve the quality and intelligibility of narrowband telephone speech and to reduce the difference with wideband speech. Such methods have commonly been evaluated with objective measures or subjective listening-only tests, but conversational evaluations have been rare. This article presents a conversational evaluation of two methods for the artificial bandwidth extension of telephone speech. Bandwidth-extended narrowband speech is compared with narrowband and wideband speech in a test setting including a simulated telephone connection, realistic conversation tasks, and various background noise conditions. The responses of the subjects indicate that speech processed with one of the methods is preferred to narrowband speech in noise, but wideband speech is superior to both narrowband and bandwidth-extended speech. Bandwidth extension was found to be beneficial for telephone conversation in noisy listening conditions. PMID:22894208
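
For intuition about what "artificial bandwidth extension" means, here is a deliberately naive spectral-folding sketch that mirrors the occupied narrowband spectrum into the empty high band. The methods evaluated in the record are far more sophisticated; the frame size, gain, and folding rule here are illustrative assumptions only.

```python
# Crude bandwidth extension: mirror the baseband spectrum into the high band.
import numpy as np

def fold_extend(frame, gain=0.25):
    """frame: samples at 16 kHz whose spectrum is empty above fs/4."""
    X = np.fft.rfft(frame)
    half = len(X) // 2                      # bin at fs/4 (4 kHz here)
    for k in range(1, len(X) - half):
        X[half + k] = gain * X[half - k]    # attenuated mirror image
    return np.fft.irfft(X, n=len(frame))

fs = 16000
t = np.arange(512) / fs
nb = np.sin(2 * np.pi * 1000 * t)           # stand-in narrowband frame
wb = fold_extend(nb)                        # adds an attenuated 7 kHz image
```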

Pulakka, Hannu; Laaksonen, Laura; Yrttiaho, Santeri; Myllylä, Ville; Alku, Paavo

2012-08-01

433

Foundational tuning: how infants' attention to speech predicts language development.  

PubMed

Orienting biases for speech may provide a foundation for language development. Although human infants show a bias for listening to speech from birth, the relation of a speech bias to later language development has not been established. Here, we examine whether infants' attention to speech directly predicts expressive vocabulary. Infants listened to speech or non-speech in a preferential listening procedure. Results show that infants' attention to speech at 12 months significantly predicted expressive vocabulary at 18 months, while indices of general development did not. No predictive relationships were found for infants' attention to non-speech, or overall attention to sounds, suggesting that the relationship between speech and expressive vocabulary was not a function of infants' general attentiveness. Potentially ancient evolutionary perceptual capacities such as biases for conspecific vocalizations may provide a foundation for proficiency in formal systems such as language, much like the approximate number sense may provide a foundation for formal mathematics. PMID:25098703

Vouloumanos, Athena; Curtin, Suzanne

2014-11-01

434

Utility of TMS to understand the neurobiology of speech  

PubMed Central

According to a traditional view, speech perception and production are processed largely separately in sensory and motor brain areas. Recent psycholinguistic and neuroimaging studies provide novel evidence that the sensory and motor systems dynamically interact in speech processing, by demonstrating that speech perception and imitation share regional brain activations. However, the exact nature and mechanisms of these sensorimotor interactions are not completely understood yet. Transcranial magnetic stimulation (TMS) has often been used in the cognitive neurosciences, including speech research, as a complementary technique to behavioral and neuroimaging studies. Here we provide an up-to-date review focusing on TMS studies that explored speech perception and imitation. Single-pulse TMS of the primary motor cortex (M1) demonstrated a speech specific and somatotopically specific increase of excitability of the M1 lip area during speech perception (listening to speech or lip reading). A paired-coil TMS approach showed increases in effective connectivity from brain regions that are involved in speech processing to the M1 lip area when listening to speech. TMS in virtual lesion mode applied to speech processing areas modulated performance of phonological recognition and imitation of perceived speech. In summary, TMS is an innovative tool to investigate processing of speech perception and imitation. TMS studies have provided strong evidence that the sensory system is critically involved in mapping sensory input onto motor output and that the motor system plays an important role in speech perception. PMID:23874322

Murakami, Takenobu; Ugawa, Yoshikazu; Ziemann, Ulf

2013-01-01

435

Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures during Metronome-Paced Speech in Persons Who Stutter  

ERIC Educational Resources Information Center

Background: Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control…

Davidow, Jason H.

2014-01-01

436

Using links between speech perception and speech production to evaluate different acoustic metrics: A preliminary report  

Microsoft Academic Search

This paper examines whether correlations between speech perception and speech production exist, and, if so, whether they might provide a way of evaluating different acoustic metrics. The cues listeners use for many phonemic distinctions are not known, often because many different acoustic cues are highly correlated with one another, making it difficult to distinguish among them. Perception-production correlations may provide

Rochelle S. Newman

2003-01-01

437

A Clinician Survey of Speech and Non-Speech Characteristics of Neurogenic Stuttering  

ERIC Educational Resources Information Center

This study presents survey data on 58 Dutch-speaking patients with neurogenic stuttering following various neurological injuries. Stroke was the most prevalent cause of stuttering in our patients, followed by traumatic brain injury, neurodegenerative diseases, and other causes. Speech and non-speech characteristics were analyzed separately for…

Theys, Catherine; van Wieringen, Astrid; De Nil, Luc F.

2008-01-01

438

Spotlight on Speech Codes 2009: The State of Free Speech on Our Nation's Campuses  

ERIC Educational Resources Information Center

Each year, the Foundation for Individual Rights in Education (FIRE) conducts a wide, detailed survey of restrictions on speech at America's colleges and universities. The survey and resulting report explore the extent to which schools are meeting their obligations to uphold students' and faculty members' rights to freedom of speech, freedom of…

Foundation for Individual Rights in Education (NJ1), 2009

2009-01-01

439

A Motor Speech Assessment for Children with Severe Speech Disorders: Reliability and Validity Evidence  

ERIC Educational Resources Information Center

Purpose: In this article, the authors report reliability and validity evidence for the Dynamic Evaluation of Motor Speech Skill (DEMSS), a new test that uses dynamic assessment to aid in the differential diagnosis of childhood apraxia of speech (CAS). Method: Participants were 81 children between 36 and 79 months of age who were referred to the…

Strand, Edythe A.; McCauley, Rebecca J.; Weigand, Stephen D.; Stoeckel, Ruth E.; Baas, Becky S.

2013-01-01

440

Optimal speech level for speech transmission in a noisy environment for young adults and aged persons  

NASA Astrophysics Data System (ADS)

Assessing the sound environment of classrooms for the aged is a very important issue, because classrooms can be used by the aged for lifelong learning, especially in an aging society. Hence hearing loss due to aging is a considerable factor in classroom design. In this study, the optimal speech level in noisy fields for both young adults and aged persons was investigated. Listening difficulty ratings and word intelligibility scores for familiar words were used to evaluate speech transmission performance. The results of the tests demonstrated that the optimal speech level for moderate background noise (i.e., less than around 60 dBA) was fairly constant. Meanwhile, the optimal speech level depended on the speech-to-noise ratio when the background noise level exceeded around 60 dBA. The minimum speech level required to minimize difficulty ratings for the aged was higher than that for the young. However, the minimum difficulty ratings for both the young and the aged were obtained in the speech level range of 70 to 80 dBA.

Sato, Hayato; Ota, Ryo; Morimoto, Masayuki; Sato, Hiroshi

2005-04-01

441

Speech-to-Speech Translation Services for the Olympic Games 2008  

E-print Network

Speech-to-Speech Translation Services for the Olympic Games 2008. Sebastian Stüker, Chengqing Zong … Olympics. One of the objectives of the program is the use of artificial intelligence technology to overcome the language barrier. With transparent national borders, the number of international tourists rises steadily. As a tourist in a foreign …

Zong, Chengqing

442

Cued Speech for Enhancing Speech Perception and First Language Development of Children With Cochlear Implants  

PubMed Central

Nearly 300 million people worldwide have moderate to profound hearing loss. Hearing impairment, if not adequately managed, has strong socioeconomic and affective impact on individuals. Cochlear implants have become the most effective vehicle for helping profoundly deaf children and adults to understand spoken language, to be sensitive to environmental sounds, and, to some extent, to listen to music. The auditory information delivered by the cochlear implant remains non-optimal for speech perception because it delivers a spectrally degraded signal and lacks some of the fine temporal acoustic structure. In this article, we discuss research revealing the multimodal nature of speech perception in normally-hearing individuals, with important inter-subject variability in the weighting of auditory or visual information. We also discuss how audio-visual training, via Cued Speech, can improve speech perception in cochlear implantees, particularly in noisy contexts. Cued Speech is a system that makes use of visual information from speechreading combined with hand shapes positioned in different places around the face in order to deliver completely unambiguous information about the syllables and the phonemes of spoken language. We support our view that exposure to Cued Speech before or after the implantation could be important in the aural rehabilitation process of cochlear implantees. We describe five lines of research that are converging to support the view that Cued Speech can enhance speech perception in individuals with cochlear implants. PMID:20724357

Leybaert, Jacqueline; LaSasso, Carol J.

2010-01-01

443

Constructing Adequate Non-Speech Analogues: What Is Special about Speech Anyway?  

ERIC Educational Resources Information Center

Vouloumanos and Werker (2007) claim that human neonates have a (possibly innate) bias to listen to speech based on a preference for natural speech utterances over sine-wave analogues. We argue that this bias more likely arises from the strikingly different saliency of voice melody in the two kinds of sounds, a bias that has already been shown to…

Rosen, Stuart; Iverson, Paul

2007-01-01

444

State Speech vs. Hate Speech: What to Do About Words that Wound?  

Microsoft Academic Search

This is, indeed, another work on the subject of hate speech regulation in the United States. And yet, it is not just another such work. For my goal here is not to settle the jurisprudential arguments regarding the possibility of any specific hate speech regulation, either extant or yet to be conceived, withstanding a Constitutional test. Nor is it my

Michael Weinman

2006-01-01

445

Racist-Sexist-Hate Speech on College Campuses: Free Speech v. Equal Protection.  

ERIC Educational Resources Information Center

On college campuses today, the debate rages over whether self-restraint and tolerance for nonconformity is overriding a need to protect certain individuals and groups from objectionable speech. Some administrators, students, and alumni wish to prevent "bad speech" in the form of expressions of racism, sexism, and the like. Advocates for limiting…

Jahn, Karon L.

446

Implementing Speech Supplementation Strategies: Effects on Intelligibility and Speech Rate of Individuals with Chronic Severe Dysarthria.  

ERIC Educational Resources Information Center

A study compared intelligibility and speech rate differences following speaker implementation of 3 strategies (topic, alphabet, and combined topic and alphabet supplementation) and a habitual speech control condition for 5 speakers with severe dysarthria. Combined cues and alphabet cues yielded significantly higher intelligibility scores and…

Hustad, Katherine C.; Jones, Tabitha; Dailey, Suzanne

2003-01-01

447

Masculine and Feminine Speech in Dyads and Groups: A Study of Speech Style and Gender Salience  

Microsoft Academic Search

An examination of the impact of the situational salience of gender on males' and females' speech styles is identified as a lacuna in the sex and language literature. An experiment was conducted in which subjects' ratings of speakers for whom sex was more or less salient were employed to monitor real speech differences. Tape-recorded extracts of the spontaneous discourse of

Michael A. Hogg

1985-01-01

448

Unit selection in a concatenative speech synthesis system using a large speech database  

Microsoft Academic Search

One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database
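
The selection step described here, choosing one database unit per target position to minimize combined target and join costs, is classically solved with dynamic programming. The sketch below shows that search over toy random cost matrices; the cost values and array shapes are invented for illustration.

```python
# Viterbi-style unit selection over target and join costs.
import numpy as np

def select_units(target_cost, join_cost):
    """target_cost: (T, U) array; join_cost: (U, U) array of concatenation costs."""
    T, U = target_cost.shape
    best = target_cost[0].copy()                   # best cost ending in each unit
    back = np.zeros((T, U), dtype=int)
    for t in range(1, T):
        trans = best[:, None] + join_cost          # (prev_unit, unit)
        back[t] = trans.argmin(axis=0)
        best = trans.min(axis=0) + target_cost[t]
    path = [int(best.argmin())]
    for t in range(T - 1, 0, -1):                  # trace back the best sequence
        path.append(int(back[t][path[-1]]))
    return path[::-1]

rng = np.random.default_rng(1)
print(select_units(rng.random((4, 5)), rng.random((5, 5))))
```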

Andrew J. Hunt; Alan W. Black

1996-01-01

449

Speech Communication 18 (1996) 317-334: A microphone array processing technique for speech  

E-print Network

ELSEVIER. Speech Communication 18 (1996) 317-334. A microphone array processing technique …; revised 11 January 1996. Abstract: In this paper, a new microphone array processing technique is proposed, based on the processing of the minimum-phase and all-pass components of delay-steered multi-microphone signals

Kabal, Peter

450

NON-LINEAR MAPPING FOR MULTI-CHANNEL SPEECH SEPARATION AND ROBUST OVERLAPPING SPEECH RECOGNITION  

E-print Network

… multi-speaker conditions. Index Terms-- microphone array, speech separation, binary masking, overlapping speech recognition. A fundamental and important multi-channel method is the microphone array beamformer; the motivation behind microphone array techniques such as the beamforming described above is to enhance

451

Neural Encoding of Speech and Music: Implications for Hearing Speech in Noise  

E-print Network

… speech-in-noise perception. Auditory processing in complex environments is reflected in neural encoding of pitch, timing, and timbre, the crucial elements of speech and music. Musical expertise in processing pitch and timing … recent work examining the biological mechanisms of speech and music perception and the potential

452

Modeling Speech Disfluency to Predict Conceptual Misalignment in Speech Survey Interfaces  

ERIC Educational Resources Information Center

Computer-based interviewing systems could use models of respondent disfluency behaviors to predict a need for clarification of terms in survey questions. This study compares simulated speech interfaces that use two such models--a generic model and a stereotyped model that distinguishes between the speech of younger and older speakers--to several…

Ehlen, Patrick; Schober, Michael F.; Conrad, Frederick G.

2007-01-01

453

Women's Speech/Men's Speech: Does Forensic Training Make a Difference?  

ERIC Educational Resources Information Center

A study of cross examination speeches of males and females was conducted to determine gender differences in intercollegiate debate. The theory base for gender differences in speech is closely tied to the analysis of dyadic conversation. It is based on the belief that women are less forceful and dominant in cross examination, and will exhibit…

Larson, Suzanne; Vreeland, Amy L.

454

Autonomic and Emotional Responses of Graduate Student Clinicians in Speech-Language Pathology to Stuttered Speech  

ERIC Educational Resources Information Center

Background: Fluent speakers and people who stutter manifest alterations in autonomic and emotional responses as they view stuttered relative to fluent speech samples. These reactions are indicative of an aroused autonomic state and are hypothesized to be triggered by the abrupt breakdown in fluency exemplified in stuttered speech. Furthermore,…

Guntupalli, Vijaya K.; Nanjundeswaran, Chayadevie; Dayalu, Vikram N.; Kalinowski, Joseph

2012-01-01

455

Learning curve of speech recognition.  

PubMed

Speech recognition (SR) speeds patient care processes by reducing report turnaround times. However, concerns have emerged about prolonged training and an added secretarial burden for radiologists. We assessed how much proofing radiologists who have years of experience with SR and radiologists new to SR must perform, and estimated how quickly the new users become as skilled as the experienced users. We studied SR log entries for 0.25 million reports from 154 radiologists and, after careful exclusions, defined a group of 11 experienced radiologists and 71 radiologists new to SR (24,833 and 122,093 reports, respectively). Data were analyzed for sound file and report lengths, character-based error rates, and words unknown to the SR's dictionary. Experienced radiologists corrected an average of 6 characters per report; new users corrected 11. Some users presented a very unfavorable learning curve, with error rates not declining as expected. New users' reports were longer, and data for the experienced users indicate that their reports, initially equally lengthy, shortened over a period of several years. For most radiologists, only minor corrections of dictated reports were necessary. While new users adopted SR quickly, with a subset outperforming experienced users from the start, identification of users struggling with SR will help facilitate troubleshooting and support. PMID:23779151
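
A character-based error rate of the kind reported here can be computed as the edit distance between the dictated draft and the signed report, normalized by report length. The sketch below is a minimal version under that assumption; the study's exact metric definition may differ.

```python
# Character error rate via Levenshtein edit distance (illustrative metric).
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def char_error_rate(draft: str, final: str) -> float:
    return levenshtein(draft, final) / max(len(final), 1)

print(char_error_rate("no acute finfings", "no acute findings"))  # ~0.059
```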

Kauppinen, Tomi A; Kaipio, Johanna; Koivikko, Mika P

2013-12-01

456

The logic of indirect speech  

PubMed Central

When people speak, they often insinuate their intent indirectly rather than stating it as a bald proposition. Examples include sexual come-ons, veiled threats, polite requests, and concealed bribes. We propose a three-part theory of indirect speech, based on the idea that human communication involves a mixture of cooperation and conflict. First, indirect requests allow for plausible deniability, in which a cooperative listener can accept the request, but an uncooperative one cannot react adversarially to it. This intuition is supported by a game-theoretic model that predicts the costs and benefits to a speaker of direct and indirect requests. Second, language has two functions: to convey information and to negotiate the type of relationship holding between speaker and hearer (in particular, dominance, communality, or reciprocity). The emotional costs of a mismatch in the assumed relationship type can create a need for plausible deniability and, thereby, select for indirectness even when there are no tangible costs. Third, people perceive language as a digital medium, which allows a sentence to generate common knowledge, to propagate a message with high fidelity, and to serve as a reference point in coordination games. This feature makes an indirect request qualitatively different from a direct one even when the speaker and listener can infer each other's intentions with high confidence. PMID:18199841

Pinker, Steven; Nowak, Martin A.; Lee, James J.

2008-01-01

457

Speech research: Studies on the nature of speech, instrumentation for its investigation, and practical applications  

NASA Astrophysics Data System (ADS)

This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Speech perception and memory coding in relation to reading ability; The use of orthographic structure by deaf adults: Recognition of finger-spelled letters; Exploring the information support for speech; The stream of speech; Using the acoustic signal to make inferences about place and duration of tongue-palate contact; Patterns of human interlimb coordination emerge from the properties of nonlinear limit cycle oscillatory processes: Theory and data; Motor control: Which themes do we orchestrate? Exploring the nature of motor control in Down's syndrome; Periodicity and auditory memory: A pilot study; Reading skill and language skill: On the role of sign order and morphological structure in memory for American Sign Language sentences; Perception of nasal consonants with special reference to Catalan; and Speech production characteristics of the hearing impaired.

Liberman, A. M.

1982-03-01

458

The intelligibility of interrupted speech depends upon its uninterrupted intelligibility.  

PubMed

Recognition of sentences containing periodic, 5-Hz, silent interruptions of differing duty cycles was assessed for three types of processed speech. Processing conditions employed different combinations of spectral resolution and the availability of fundamental frequency (F0) information, chosen to yield similar, below-ceiling performance for uninterrupted speech. Performance declined with decreasing duty cycle similarly for each processing condition, suggesting that, at least for certain forms of speech processing and interruption rates, performance with interrupted speech may reflect that obtained with uninterrupted speech. This highlights the difficulty in interpreting differences in interrupted speech performance across conditions for which uninterrupted performance is at ceiling. PMID:25324110
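
The stimulus manipulation used in this study, periodic silent interruption at a fixed rate with a variable duty cycle, is simple to reproduce. The sketch below applies a square gate to a waveform; the hard (unramped) gating and the stand-in sine signal are our simplifications.

```python
# Periodic silent interruption with a given rate and duty cycle.
import numpy as np

def interrupt(x, fs, rate_hz=5.0, duty=0.5):
    t = np.arange(len(x)) / fs
    gate = (t * rate_hz) % 1.0 < duty   # signal on for `duty` of each cycle
    return x * gate

fs = 16000
x = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)  # 1 s stand-in signal
y = interrupt(x, fs, rate_hz=5.0, duty=0.5)       # 5 Hz, 50% duty cycle
```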

Ardoint, Marine; Green, Tim; Rosen, Stuart

2014-10-01

459

Speech processing based on short-time Fourier analysis  

SciTech Connect

Short-time Fourier analysis (STFA) is a mathematical technique that represents nonstationary signals, such as speech, music, and seismic signals, in terms of time-varying spectra. This representation provides a formalism for such intuitive notions as time-varying frequency components and pitch contours. Consequently, STFA is useful for speech analysis and speech processing. This paper shows that STFA provides a convenient technique for estimating and modifying certain perceptual parameters of speech. As an example of an application of STFA to speech, the problem of time-compression or expansion of speech while preserving pitch and time-varying frequency content is presented.
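
Time-scale modification that preserves pitch, the example application named above, is commonly implemented on top of the short-time Fourier transform as a phase vocoder. The following sketch is a generic textbook phase vocoder, not the paper's algorithm; the window, hops, and the nearest-frame magnitude shortcut are illustrative choices.

```python
# Minimal phase vocoder: resample STFT frames in time, rebuild phase coherently.
import numpy as np

def time_stretch(x, rate, n_fft=1024, hop=256):
    """rate < 1 lengthens x, rate > 1 compresses it; x must exceed n_fft samples."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    S = np.array([np.fft.rfft(f) for f in frames])
    omega = 2 * np.pi * np.arange(S.shape[1]) * hop / n_fft  # nominal phase advance
    phase = np.angle(S[0])
    out = [np.fft.irfft(np.abs(S[0]) * np.exp(1j * phase))]
    t = rate
    while t < len(S) - 1:
        lo = int(t)
        dphi = np.angle(S[lo + 1]) - np.angle(S[lo]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))     # wrap to [-pi, pi]
        phase += omega + dphi                                # accumulate true phase
        out.append(np.fft.irfft(np.abs(S[lo]) * np.exp(1j * phase)))
        t += rate
    y = np.zeros(len(out) * hop + n_fft)
    for k, f in enumerate(out):
        y[k * hop:k * hop + n_fft] += f * win                # overlap-add
    return y
```

Calling time_stretch(x, 0.5) roughly doubles the duration of x while leaving its time-varying frequency content in place.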

Portnoff, M.R.

1981-06-02

460

Preschool Speech Intelligibility and Vocabulary Skills Predict Long-Term Speech and Language Outcomes Following Cochlear Implantation in Early Childhood  

PubMed Central

Speech and language measures during grade school predict adolescent speech-language outcomes in children who receive cochlear implants, but no research has examined whether speech and language functioning at even younger ages is predictive of long-term outcomes in this population. The purpose of this study was to examine if early preschool measures of speech and language performance predict speech-language functioning in long-term users of cochlear implants. Early measures of speech intelligibility and receptive vocabulary (obtained during preschool ages of 3 – 6 years) in a sample of 35 prelingually deaf, early-implanted children predicted speech perception, language, and verbal working memory skills up to 18 years later. Age of onset of deafness and age at implantation added additional variance to preschool speech intelligibility in predicting some long-term outcome scores, but the relationship between preschool speech-language skills and later speech-language outcomes was not significantly attenuated by the addition of these hearing history variables. These findings suggest that speech and language development during the preschool years is predictive of long-term speech and language functioning in early-implanted, prelingually deaf children. As a result, measures of speech-language functioning at preschool ages can be used to identify and adjust interventions for very young CI users who may be at long-term risk for suboptimal speech and language outcomes. PMID:23998347

Castellanos, Irina; Kronenberger, William G.; Beer, Jessica; Henning, Shirley C.; Colson, Bethany G.; Pisoni, David B.

2013-01-01

461

ARTICULATORY TRAJECTORIES FOR LARGE-VOCABULARY SPEECH RECOGNITION Vikramjit Mitra1  

E-print Network

ARTICULATORY TRAJECTORIES FOR LARGE-VOCABULARY SPEECH RECOGNITION. Vikramjit Mitra, Wen Wang, and … Articulatory information can potentially help to improve speech recognition performance. Most of the studies involving … have used such features for speech recognition. Speech recognition studies using articulatory information

Stolcke, Andreas

462

Scalable Distributed Speech Recognition Using Multi-Frame GMM-Based Block Quantization  

E-print Network

Scalable Distributed Speech Recognition Using Multi-Frame GMM-Based Block Quantization. Kuldip K. … mel-frequency cepstral coefficient (MFCC) features in distributed speech recognition (DSR) applications. This coding … automatic speech recognition (ASR) technology in the context of mobile communication systems. Speech recognition

463

SPEECH PRODUCTION AND PERCEPTION MODELS AND THEIR APPLICATIONS TO SYNTHESIS, RECOGNITION, AND CODING  

E-print Network

SPEECH PRODUCTION AND PERCEPTION MODELS AND THEIR APPLICATIONS TO SYNTHESIS, RECOGNITION … and perception mechanisms and lead to high-quality computer synthesis of speech, robust automatic speech recognition performance, reliability, and wide-spread use of speech-processing devices. Using mathematical models of

Alwan, Abeer

464

Speech Pathology in Ancient India--A Review of Sanskrit Literature.  

ERIC Educational Resources Information Center

The paper is a review of ancient Sanskrit literature for information on the origin and development of speech and language, speech production, normality of speech and language, and disorders of speech and language and their treatment. (DB)

Savithri, S. R.

1987-01-01

465

42 CFR 485.715 - Condition of participation: Speech pathology services.  

Code of Federal Regulations, 2012 CFR

...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

2012-10-01

466

42 CFR 485.715 - Condition of participation: Speech pathology services.  

Code of Federal Regulations, 2011 CFR

...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

2011-10-01

467

42 CFR 485.715 - Condition of participation: Speech pathology services.  

Code of Federal Regulations, 2013 CFR

...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

2013-10-01

468

42 CFR 485.715 - Condition of participation: Speech pathology services.  

Code of Federal Regulations, 2010 CFR

...Condition of participation: Speech pathology services. 485.715 Section 485...Physical Therapy and Speech-Language Pathology Services § 485.715 Condition of participation: Speech pathology services. If speech pathology...

2010-10-01

469

Speech Production as State Feedback Control  

PubMed Central

Spoken language exists because of a remarkable neural process. Inside a speaker's brain, an intended message gives rise to neural signals activating the muscles of the vocal tract. The process is remarkable because these muscles are activated in just the right way that the vocal tract produces sounds a listener understands as the intended message. What is the best approach to understanding the neural substrate of this crucial motor control process? One of the key recent modeling developments in neuroscience has been the use of state feedback control (SFC) theory to explain the role of the CNS in motor control. SFC postulates that the CNS controls motor output by (1) estimating the current dynamic state of the thing (e.g., arm) being controlled, and (2) generating controls based on this estimated state. SFC has successfully predicted a great range of non-speech motor phenomena, but as yet has not received attention in the speech motor control community. Here, we review some of the key characteristics of speech motor control and what they say about the role of the CNS in the process. We then discuss prior efforts to model the role of CNS in speech motor control, and argue that these models have inherent limitations – limitations that are overcome by an SFC model of speech motor control which we describe. We conclude by discussing a plausible neural substrate of our model. PMID:22046152
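
The SFC scheme described above, estimate the state and then control from the estimate, can be shown in a few lines with a scalar observer-plus-controller loop. All matrices and gains below are toy values; nothing here models the vocal tract, only the structure of the computation.

```python
# Schematic state feedback control: control is computed from an estimate
# that is itself updated from efference copy plus sensory feedback.
import numpy as np

A, B, C = np.array([[0.95]]), np.array([[0.1]]), np.array([[1.0]])
K, L = np.array([[2.0]]), np.array([[0.5]])   # control and observer gains

x = np.array([1.0])      # true state (e.g., an articulator position)
x_hat = np.array([0.0])  # the controller's internal estimate
for step in range(50):
    u = -K @ x_hat                                   # control from the *estimate*
    y = C @ x                                        # sensory feedback (noise-free)
    x = A @ x + B @ u                                # plant dynamics
    x_hat = A @ x_hat + B @ u + L @ (y - C @ x_hat)  # predict + correct
print(x[0], x_hat[0])    # both decay toward 0 with these stable gains
```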

Houde, John F.; Nagarajan, Srikantan S.

2011-01-01

470

Speech evoked auditory brainstem response in stuttering.  

PubMed

Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS) at the higher level of the central auditory system using speech stimuli. Recently, the potential usefulness of speech evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable /da/, with a duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed deficient neural timing in the early stages of the auditory pathway, consistent with temporal processing deficits, and their abnormal timing may underlie their disfluency. PMID:25215262

Tahaei, Ali Akbar; Ashayeri, Hassan; Pourbakht, Akram; Kamali, Mohammad

2014-01-01

471

Markers of deception in italian speech.  

PubMed

Lying is a universal activity and the detection of lying a universal concern. Presently, there is great interest in determining objective measures of deception. The examination of speech, in particular, holds promise in this regard; yet, most of what we know about the relationship between speech and lying is based on the assessment of English speaking participants. Few studies have examined indicators of deception in languages other than English. The world's languages differ in significant ways, and cross-linguistic studies of deceptive communications are a research imperative. Here we review some of these differences amongst the world's languages, and provide an overview of a number of recent studies demonstrating that cross-linguistic research is a worthwhile endeavor. In addition, we report the results of an empirical investigation of pitch, response latency, and speech rate as cues to deception in Italian speech. True and false opinions were elicited in an audio-taped interview. A within-subjects analysis revealed no significant difference between the average pitch of the two conditions; however, speech rate was significantly slower, while response latency was longer, during deception compared with truth-telling. We explore the implications of these findings and propose directions for future research, with the aim of expanding the cross-linguistic branch of research on markers of deception. PMID:23162502

Spence, Katelyn; Villar, Gina; Arciuli, Joanne

2012-01-01

472

Music and speech prosody: a common rhythm  

PubMed Central

Disorders of music and speech perception, known as amusia and aphasia, have traditionally been regarded as dissociated deficits based on studies of brain damaged patients. This has been taken as evidence that music and speech are perceived by largely separate and independent networks in the brain. However, recent studies of congenital amusia have broadened this view by showing that the deficit is associated with problems in perceiving speech prosody, especially intonation and emotional prosody. In the present study the association between the perception of music and speech prosody was investigated with healthy Finnish adults (n = 61) using an on-line music perception test including the Scale subtest of Montreal Battery of Evaluation of Amusia (MBEA) and Off-Beat and Out-of-key tasks as well as a prosodic verbal task that measures the perception of word stress. Regression analyses showed that there was a clear association between prosody perception and music perception, especially in the domain of rhythm perception. This association was evident after controlling for music education, age, pitch perception, visuospatial perception, and working memory. Pitch perception was significantly associated with music perception but not with prosody perception. The association between music perception and visuospatial perception (measured using analogous tasks) was less clear. Overall, the pattern of results indicates that there is a robust link between music and speech perception and that this link can be mediated by rhythmic cues (time and stress). PMID:24032022

Hausen, Maija; Torppa, Ritva; Salmela, Viljami R.; Vainio, Martti; Sarkamo, Teppo

2013-01-01

473

Cross-Modal Prediction in Speech Perception  

PubMed Central

Speech perception often benefits from vision of the speaker's lip movements when they are available. One potential mechanism underlying this reported gain in perception arising from audio-visual integration is on-line prediction. In this study we address whether the preceding speech context in a single modality can improve audiovisual processing and whether this improvement is based on on-line information-transfer across sensory modalities. In the experiments presented here, during each trial, a speech fragment (context) presented in a single sensory modality (voice or lips) was immediately continued by an audiovisual target fragment. Participants made speeded judgments about whether voice and lips were in agreement in the target fragment. The leading single sensory context and the subsequent audiovisual target fragment could be continuous in either one modality only, both (context in one modality continues into both modalities in the target fragment) or neither modalities (i.e., discontinuous). The results showed quicker audiovisual matching responses when context was continuous with the target within either the visual or auditory channel (Experiment 1). Critically, prior visual context also provided an advantage when it was cross-modally continuous (with the auditory channel in the target), but auditory to visual cross-modal continuity resulted in no advantage (Experiment 2). This suggests that visual speech information can provide an on-line benefit for processing the upcoming auditory input through the use of predictive mechanisms. We hypothesize that this benefit is expressed at an early level of speech analysis. PMID:21998642

Sánchez-García, Carolina; Alsius, Agnès; Enns, James T.; Soto-Faraco, Salvador

2011-01-01

474

Electroglottographic and perceptual evaluation of tracheoesophageal speech.  

PubMed

To optimize tracheoesophageal (TO) speech after total laryngectomy, it is vital to have a robust tool of assessment to help investigate deficiencies, document changes, and facilitate therapy. We sought to evaluate and validate electroglottography (EGG) as an important tool in the multidimensional assessment of TO speech. This is a cross-sectional study of the largest cohort of TO speakers treated by a single surgeon. A second group of normal laryngeal speakers served as a control group. EGG analysis of both groups using connected speech and sustained vowels was performed. Two trained expert raters undertook perceptual evaluation using two accepted scales. EGG measures were then analyzed for correlation with treatment variables. A separate correlation analysis was performed to identify EGG measures that may be associated with perceptual dimensions. Our data from EGG analysis are similar to data obtained from conventional acoustic signal analysis of TO speakers. Sustained vowel and connected speech parameters were poorer in TO speakers than in normal laryngeal speakers. In perceptual evaluation, only grade (G) of the GRBAS scale and Overall Voice Quality appeared reproducible and reliable. T stage, pharyngeal reconstruction and method of closure, cricopharyngeal myotomy, and postoperative complications appear to be correlated with the EGG measures. Five voice measures (jitter, shimmer, average frequency, normalized noise energy, and irregularity) correlated well with the key dimensions of perceptual assessment. EGG is an important assessment tool of TO speech, and can now be reliably used in a clinical setting. PMID:17490856

Kazi, Rehan; Kanagalingam, Jeeve; Venkitaraman, Ramachandran; Prasad, Vyas; Clarke, Peter; Nutting, Christopher M; Rhys-Evans, Peter; Harrington, Kevin J

2009-03-01

475

Gesture facilitates the syntactic analysis of speech.  

PubMed

Recent research suggests that the brain routinely binds together information from gesture and speech. However, most of this research focused on the integration of representational gestures with the semantic content of speech. Much less is known about how other aspects of gesture, such as emphasis, influence the interpretation of the syntactic relations in a spoken message. Here, we investigated whether beat gestures alter which syntactic structure is assigned to ambiguous spoken German sentences. The P600 component of the Event Related Brain Potential indicated that the more complex syntactic structure is easier to process when the speaker emphasizes the subject of a sentence with a beat. Thus, a simple flick of the hand can change our interpretation of who has been doing what to whom in a spoken sentence. We conclude that gestures and speech are integrated systems. Unlike previous studies, which have shown that the brain effortlessly integrates semantic information from gesture and speech, our study is the first to demonstrate that this integration also occurs for syntactic information. Moreover, the effect appears to be gesture-specific and was not found for other stimuli that draw attention to certain parts of speech, including prosodic emphasis, or a moving visual stimulus with the same trajectory as the gesture. This suggests that only visual emphasis produced with a communicative intention in mind (that is, beat gestures) influences language comprehension, but not a simple visual movement lacking such an intention. PMID:22457657

Holle, Henning; Obermeier, Christian; Schmidt-Kassow, Maren; Friederici, Angela D; Ward, Jamie; Gunter, Thomas C

2012-01-01

476

Auditory perception bias in speech imitation  

PubMed Central

In an experimental study, we explored the role of auditory perception bias in vocal pitch imitation. Psychoacoustic tasks involving a missing fundamental indicate that some listeners are attuned to the relationship between all the higher harmonics present in the signal, which supports their perception of the fundamental frequency (the primary acoustic correlate of pitch). Other listeners focus on the lowest harmonic constituents of the complex sound signal, which may hamper the perception of the fundamental. These two listener types are referred to as fundamental and spectral listeners, respectively. We hypothesized that the individual differences in speakers' capacity to imitate F0 found in earlier studies may at least partly be due to the capacity to extract information about F0 from the speech signal. Participants' auditory perception bias was determined with a standard missing fundamental perceptual test. Subsequently, speech data were collected in a shadowing task with two conditions, one with a full speech signal and one with high-pass filtered speech above 300 Hz. The results showed that perception bias toward fundamental frequency was related to the degree of F0 imitation. The effect was stronger in the condition with high-pass filtered speech. The experimental outcomes suggest advantages for fundamental listeners in communicative situations where F0 imitation is used as a behavioral cue. Future research needs to determine to what extent auditory perception bias may be related to other individual properties known to improve imitation, such as phonetic talent. PMID:24204361
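
The missing-fundamental test referenced above relies on stimuli whose harmonics imply a fundamental that is physically absent. Below is a minimal stimulus generator under illustrative parameter choices (200 Hz implied F0, harmonics 2 through 6); real test batteries control level, phase, and masking far more carefully.

```python
# Missing-fundamental complex tone: harmonics present, fundamental absent.
import numpy as np

def missing_fundamental(f0=200.0, harmonics=range(2, 7), fs=44100, dur=0.5):
    t = np.arange(int(fs * dur)) / fs
    tone = sum(np.sin(2 * np.pi * f0 * h * t) for h in harmonics)
    return tone / max(abs(tone.min()), tone.max())  # normalize to [-1, 1]

stim = missing_fundamental()  # most listeners still report a 200 Hz pitch
```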

Postma-Nilsenova, Marie; Postma, Eric

2013-01-01

477

Speech Evoked Auditory Brainstem Response in Stuttering  

PubMed Central

Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS) at the higher level of the central auditory system using speech stimuli. Recently, the potential usefulness of speech evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable /da/, with a duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed deficient neural timing in the early stages of the auditory pathway, consistent with temporal processing deficits, and their abnormal timing may underlie their disfluency. PMID:25215262

Tahaei, Ali Akbar; Ashayeri, Hassan; Pourbakht, Akram; Kamali, Mohammad

2014-01-01

478

MODELLING THE PREPAUSAL LENGTHENING EFFECT FOR SPEECH RECOGNITION: A DYNAMIC BAYESIAN NETWORK APPROACH  

E-print Network

MODELLING THE PREPAUSAL LENGTHENING EFFECT FOR SPEECH RECOGNITION: A DYNAMIC BAYESIAN NETWORK APPROACH. Index Terms: robust speech recognition, dynamic Bayesian networks. Automatic speech recognition (ASR) … Speech has the property that the speech unit preceding a speech pause tends to lengthen. This work presents

Noble, William Stafford

479

SIGNIFICANCE OF EARLY TAGGED CONTEXTUAL GRAPHEMES IN GRAPHEME BASED SPEECH SYNTHESIS AND RECOGNITION SYSTEMS  

E-print Network

… a significant role in improving the performance of grapheme-based speech synthesis and speech recognition systems. Index Terms: grapheme, speech synthesis, speech recognition, contextual graphemes, minority … of a language and thus play a vital role in building speech synthesis and speech recognition systems.

Black, Alan W

480

Tuned to the Signal: The Privileged Status of Speech for Young Infants  

ERIC Educational Resources Information Center

Do young infants treat speech as a special signal, compared with structurally similar non-speech sounds? We presented 2- to 7-month-old infants with nonsense speech sounds and complex non-speech analogues. The non-speech analogues retain many of the spectral and temporal properties of the speech signal, including the pitch contour information…

Vouloumanos, Athena; Werker, Janet F.

2004-01-01

481

Electrophysiological Evidence for a Multisensory Speech-Specific Mode of Perception  

ERIC Educational Resources Information Center

We investigated whether the interpretation of auditory stimuli as speech or non-speech affects audiovisual (AV) speech integration at the neural level. Perceptually ambiguous sine-wave replicas (SWS) of natural speech were presented to listeners who were either in "speech mode" or "non-speech mode". At the behavioral level, incongruent lipread…

Stekelenburg, Jeroen J.; Vroomen, Jean

2012-01-01

482

Parsing the phonological loop: Activation timing in the dorsal speech stream determines accuracy in speech reproduction  

PubMed Central

Summary: Despite significant research and important clinical correlates, direct neural evidence for a phonological loop linking speech perception, short-term memory and production remains elusive. To investigate these processes, we acquired whole-head magnetoencephalographic (MEG) recordings from human subjects performing a variable-length syllable sequence reproduction task. The MEG sensor data was source-localized using a time-frequency spatially adaptive filter, and we examined the time-courses of cortical oscillatory power and the correlations of oscillatory power with behavior, between onset of the audio stimulus and the overt speech response. We found dissociations between time-courses of behaviorally relevant activations in a network of regions falling largely within the dorsal speech stream. In particular, verbal working memory load modulated high gamma power (HGP) in both Sylvian-Parietal-Temporal (Spt) and Broca's areas. The time-courses of the correlations between HGP and subject performance clearly alternated between these two regions throughout the task. Our results provide the first evidence of a reverberating input-output buffer system in the dorsal stream underlying speech sensorimotor integration, consistent with recent phonological loop, competitive queuing and speech-motor control models. These findings also shed new light on potential sources of speech dysfunction in aphasia and neuropsychiatric disorders, identifying anatomically and behaviorally dissociable activation time-windows critical for successful speech reproduction. PMID:23536060

Herman, Alexander B.; Houde, John F.; Vinogradov, Sophia; Nagarajan, Srikantan

2013-01-01

483

Music expertise shapes audiovisual temporal integration windows for speech, sinewave speech, and music.  

PubMed

This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past 3 years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and to a marginally significant degree to natural speech. PMID:25147539
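
One common way to quantify a temporal integration window from a synchrony-judgment experiment like this is to fit a symmetric curve to the proportion of "synchronous" responses across SOAs and report its width. The sketch below fits a Gaussian with scipy; the response proportions are fabricated purely for illustration and are not the study's data.

```python
# Fit a Gaussian to invented synchrony-judgment data at the 13 SOAs above.
import numpy as np
from scipy.optimize import curve_fit

soas = np.array([-360, -300, -240, -180, -120, -60, 0,
                 60, 120, 180, 240, 300, 360], dtype=float)
p_sync = np.array([.05, .10, .22, .45, .72, .90, .95,
                   .92, .78, .50, .28, .12, .06])    # fabricated proportions

gauss = lambda x, a, mu, sigma: a * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
(a, mu, sigma), _ = curve_fit(gauss, soas, p_sync, p0=[1, 0, 100])
print(f"window center {mu:.0f} ms, width (sigma) {sigma:.0f} ms")
```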

Lee, Hweeling; Noppeney, Uta

2014-01-01

484

A Danish open-set speech corpus for competing-speech studies.  

PubMed

Studies investigating speech-on-speech masking effects commonly use closed-set speech materials such as the coordinate response measure [Bolia et al. (2000). J. Acoust. Soc. Am. 107, 1065-1066]. However, these studies typically result in very low (i.e., negative) speech recognition thresholds (SRTs) when the competing speech signals are spatially separated. To achieve higher SRTs that correspond more closely to natural communication situations, an open-set, low-context, multi-talker speech corpus was developed. Three sets of 268 unique Danish sentences were created, and each set was recorded with one of three professional female talkers. The intelligibility of each sentence in the presence of speech-shaped noise was measured. For each talker, 200 approximately equally intelligible sentences were then selected and systematically distributed into 10 test lists. Test list homogeneity was assessed in a setup with a frontal target sentence and two concurrent masker sentences at ±50° azimuth. For a group of 16 normal-hearing listeners and a group of 15 elderly (linearly aided) hearing-impaired listeners, overall SRTs of, respectively, +1.3 dB and +6.3 dB target-to-masker ratio were obtained. The new corpus was found to be very sensitive to inter-individual differences and produced consistent results across test lists. The corpus is publicly available. PMID:24437781

Nielsen, Jens Bo; Dau, Torsten; Neher, Tobias

2014-01-01

485

What Is Voice? What Is Speech? What Is Language?  

MedlinePLUS

... What Is Voice? What Is Speech? What Is Language? ... professionals, it is important to distinguish among them. Voice (or vocalization) is the sound produced by ...

486

Modelling out-of-vocabulary words for robust speech recognition  

E-print Network

This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some ...

Bazzi, Issam

2002-01-01

487

A Rating of Doctoral Programs in Speech Communication, 1976  

ERIC Educational Resources Information Center

Reviews a survey evaluation of speech communication doctoral programs existing in 1976. Available from: ACA Bulletin, Robert Hall, Editor, Speech Communication Association, 5205 Leesburg Pike, Suite 1001, Falls Church, VA 22041. (MH)

Edwards, Renee; Barker, Larry

1977-01-01

488

Graduate Programs in Speech Communication: A Position Paper  

ERIC Educational Resources Information Center

Details a position paper concerning the major focus of graduate programs in speech communication. Available from: ACA Bulletin, Robert Hall, Editor, Speech Communication Association, 5205 Leesburg Pike, Suite 1001, Falls Church, VA 22041. (MH)

Goldberg, Alvin A.

1977-01-01

489

"Thoughts Concerning Education": John Locke On Teaching Speech  

ERIC Educational Resources Information Center

Locke's suggestions for more effective speech instruction have gone largely unnoticed. Consequently, it is the purpose of this article to consider John Locke's criticisms, theory and specific methods of speech education. (Author)

Baird, John E.

1971-01-01

490

Multi-level acoustic modeling for automatic speech recognition  

E-print Network

Context-dependent acoustic modeling is commonly used in large-vocabulary Automatic Speech Recognition (ASR) systems as a way to model coarticulatory variations that occur during speech production. Typically, the local ...

Chang, Hung-An, Ph. D. Massachusetts Institute of Technology

2012-01-01

491

Getting Your Employer to Cover Speech, Language and Hearing Services  

MedlinePLUS

... hearing benefit provided by speech-language pathologists and audiologists should cost less than 35 cents per member ... employer to contact your speech-language pathologist or audiologist to learn more about their services. Follow-up ...

492

Toward a social signaling framework : activity and emphasis in speech  

E-print Network

Language is not the only form of verbal communication. Loudness, pitch, speaking rate, and other non-linguistic speech features are crucial aspects of human spoken interaction. In this thesis, we separate these speech ...

Stoltzman, William T

2006-01-01

493

Optimization of acoustic feature extraction from dysarthric speech  

E-print Network

Dysarthria is a motor speech disorder characterized by weak or uncoordinated movements of the speech musculature. While unfamiliar listeners struggle to understand speakers with severe dysarthria, familiar listeners are ...

DiCicco, Thomas M., Jr. (Thomas Minotti)

2010-01-01

494

Adding Speech, Language, and Hearing Benefits to Your Policy  

MedlinePLUS

... Introduction: Why add speech, language, and hearing benefits? ... language, and hearing benefits to your health insurance policy. ASHA understands that you may have several questions ...

495

Bayesian network structures and inference techniques for automatic speech recognition  

E-print Network

… the theory and implementation of Bayesian networks in the context of automatic speech recognition in realistic environments. These phenomena include gender and age differences, pronunciation variability, differences in articulation, microphone and channel variability, and ambient noise.

Hunt, Galen

496

Multimodal speech interfaces for map-based applications  

E-print Network

This thesis presents the development of multimodal speech interfaces for mobile and vehicle systems. Multimodal interfaces have been shown to increase input efficiency in comparison with their purely speech or text-based ...

Liu, Sean (Sean Y.)

2010-01-01

497

DESCRIBING THE EMOTIONAL STATES EXPRESSED IN SPEECH Roddy Cowie  

E-print Network

DESCRIBING THE EMOTIONAL STATES EXPRESSED IN SPEECH. Roddy Cowie, Psychology, Queen's University, Belfast. ABSTRACT: Describing relationships between speech and emotion depends on identifying appropriate … that somebody's voice is tinged with emotion. Research on emotion in psychology and biology has tended …

Hirschberg, Julia

498

PRONUNCIATION VERIFICATION OF CHILDREN'S SPEECH FOR AUTOMATIC LITERACY ASSESSMENT  

E-print Network

PRONUNCIATION VERIFICATION OF CHILDREN'S SPEECH FOR AUTOMATIC LITERACY ASSESSMENT. Joseph Tepperman … part of automatically assessing a new reader's literacy is in verifying his pronunciation of read … % of the time. Index Terms: children's speech, literacy, pronunciation. … Automatically assessing

Alwan, Abeer

499

AUTOMATIC DETECTION OF CONTRASTIVE ELEMENTS IN SPONTANEOUS SPEECH Ani Nenkova  

E-print Network

AUTOMATIC DETECTION OF CONTRASTIVE ELEMENTS IN SPONTANEOUS SPEECH. Ani Nenkova, University … important. Contrastive elements are often produced with stronger than usual prominence … corpus of conversational speech to study the acoustic characteristics of contrastive elements and the dif…

Plotkin, Joshua B.

500

Overview of speech technology of the 80's  

SciTech Connect

The author describes the technology innovations necessary to accommodate the market need, which is the driving force toward greater perceived computer intelligence. The author discusses aspects of both speech synthesis and speech recognition.

Crook, S.B.

1981-01-01