Sample records for unit selection text-to-speech

  1. Advancements in text-to-speech technology and implications for AAC applications

    NASA Astrophysics Data System (ADS)

    Syrdal, Ann K.

    2003-10-01

    Intelligibility was the initial focus in text-to-speech (TTS) research, since it is clearly a necessary condition for the application of the technology. Sufficiently high intelligibility (approximating human speech) has been achieved in the last decade by the better formant-based and concatenative TTS systems. This led to commercially available TTS systems for highly motivated users, particularly the blind and vocally impaired. Some unnatural qualities of TTS were exploited by these users, such as very fast speaking rates and altered pitch ranges for flagging relevant information. Recently, the focus in TTS research has turned to improving naturalness, so that synthetic speech sounds more human and less robotic. Unit selection approaches to concatenative synthesis have dramatically improved TTS quality, although at the cost of larger and more complex systems. This advancement in naturalness has made TTS technology more acceptable to the general public. The vocally impaired appreciate a more natural voice with which to represent themselves when communicating with others. Unit selection TTS does not achieve such high speaking rates as the earlier TTS systems, however, which is a disadvantage to some AAC device users. An important new research emphasis is to improve and increase the range of emotional expressiveness of TTS.

  2. The Role of Music in Speech Intelligibility of Learners with Post Lingual Hearing Impairment in Selected Units in Lusaka District

    ERIC Educational Resources Information Center

    Katongo, Emily Mwamba; Ndhlovu, Daniel

    2015-01-01

    This study sought to establish the role of music in speech intelligibility of learners with Post Lingual Hearing Impairment (PLHI) and strategies teachers used to enhance speech intelligibility in learners with PLHI in selected special units for the deaf in Lusaka district. The study used a descriptive research design. Qualitative and quantitative…

  3. Accurate visible speech synthesis based on concatenating variable length motion capture data.

    PubMed

    Ma, Jiyong; Cole, Ron; Pellom, Bryan; Ward, Wayne; Wise, Barbara

    2006-01-01

    We present a novel approach to synthesizing accurate visible speech based on searching and concatenating optimal variable-length units in a large corpus of motion capture data. Based on a set of visual prototypes selected on a source face and a corresponding set designated for a target face, we propose a machine learning technique to automatically map the facial motions observed on the source face to the target face. In order to model the long distance coarticulation effects in visible speech, a large-scale corpus that covers the most common syllables in English was collected, annotated and analyzed. For any input text, a search algorithm to locate the optimal sequences of concatenated units for synthesis is described. A new algorithm to adapt lip motions from a generic 3D face model to a specific 3D face model is also proposed. A complete, end-to-end visible speech animation system is implemented based on the approach. This system is currently used in more than 60 kindergarten through third grade classrooms to teach students to read using a lifelike conversational animated agent. To evaluate the quality of the visible speech produced by the animation system, both subjective evaluation and objective evaluation are conducted. The evaluation results show that the proposed approach is accurate and powerful for visible speech synthesis.
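
    The optimal-sequence search mentioned above is not spelled out in the abstract. In the standard unit-selection formulation, each candidate unit carries a target cost (fit to the desired specification) and a join cost (smoothness of concatenation with its neighbour), and the minimum-cost sequence is found by dynamic programming. The sketch below illustrates that generic Viterbi-style search; `target_cost` and `join_cost` are hypothetical callables standing in for comparisons of linguistic specifications and motion-capture frames, not the authors' actual cost functions.

    ```python
    # Generic unit-selection search: minimize summed target + join costs by
    # dynamic programming over candidate units (a sketch, not the paper's code).

    def viterbi_unit_selection(targets, candidates, target_cost, join_cost):
        """targets: per-position specifications; candidates: per-position lists
        of corpus units. Returns one minimum-cost unit sequence."""
        # best[i][j] = (accumulated cost, backpointer) for candidate j at position i
        best = [{j: (target_cost(targets[0], u), None)
                 for j, u in enumerate(candidates[0])}]
        for i in range(1, len(targets)):
            row = {}
            for j, u in enumerate(candidates[i]):
                # pick the cheapest predecessor, accounting for the join cost
                prev, cost = min(
                    ((p, best[i - 1][p][0] + join_cost(candidates[i - 1][p], u))
                     for p in best[i - 1]),
                    key=lambda pc: pc[1])
                row[j] = (cost + target_cost(targets[i], u), prev)
            best.append(row)
        j = min(best[-1], key=lambda j: best[-1][j][0])   # cheapest final unit
        path = []
        for i in range(len(targets) - 1, -1, -1):         # follow backpointers
            path.append(candidates[i][j])
            j = best[i][j][1]
        return path[::-1]
    ```

    With toy costs (e.g. Euclidean distances between feature vectors), this returns the same sequence an exhaustive search would, in O(n·k²) time for n positions with k candidates each.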

  4. Implementation of Three Text to Speech Systems for Kurdish Language

    NASA Astrophysics Data System (ADS)

    Bahrampour, Anvar; Barkhoda, Wafa; Azami, Bahram Zahir

    Nowadays, the concatenative method is used in most modern TTS systems to produce artificial speech. The most important challenge in this method is choosing an appropriate unit for creating the database. This unit must guarantee smooth, high-quality speech, and creating a database for it must be reasonable and inexpensive. For example, the syllable, phoneme, allophone, and diphone are appropriate units for all-purpose systems. In this paper, we implemented three synthesis systems for the Kurdish language, based on syllables, allophones, and diphones respectively, and compared their quality using subjective testing.
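
    As a concrete illustration of the unit choices compared above: a diphone runs from the middle of one phone to the middle of the next, placing concatenation points where the signal is most stable, so a string of n phonemes yields n+1 diphones once silence padding is added. A toy decomposition (with placeholder romanized phonemes, not an actual Kurdish inventory) might look like this:

    ```python
    # Toy diphone decomposition: units span from mid-phone to mid-phone,
    # so boundaries fall in acoustically stable regions (a sketch only).

    def to_diphones(phonemes):
        padded = ["sil"] + list(phonemes) + ["sil"]   # pad with silence
        return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]

    print(to_diphones(["b", "a", "n"]))
    # ['sil-b', 'b-a', 'a-n', 'n-sil']
    ```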

  5. A Selective Deficit in Phonetic Recalibration by Text in Developmental Dyslexia.

    PubMed

    Keetels, Mirjam; Bonte, Milene; Vroomen, Jean

    2018-01-01

    Upon hearing an ambiguous speech sound, listeners may adjust their perceptual interpretation of the speech input in accordance with contextual information, like accompanying text or lipread speech (i.e., phonetic recalibration; Bertelson et al., 2003). As developmental dyslexia (DD) has been associated with reduced integration of text and speech sounds, we investigated whether this deficit becomes manifest when text is used to induce this type of audiovisual learning. Adults with DD and normal readers were exposed to ambiguous consonants halfway between /aba/ and /ada/ together with text or lipread speech. After this audiovisual exposure phase, they categorized auditory-only ambiguous test sounds. Results showed that individuals with DD, unlike normal readers, did not use text to recalibrate their phoneme categories, whereas their recalibration by lipread speech was spared. Individuals with DD demonstrated similar deficits when ambiguous vowels (halfway between /wIt/ and /wet/) were recalibrated by text. These findings indicate that DD is related to a specific letter-speech sound association deficit that extends over phoneme classes (vowels and consonants), but - as lipreading was spared - does not extend to a more general audio-visual integration deficit. In particular, these results highlight diminished reading-related audiovisual learning in addition to the commonly reported phonological problems in developmental dyslexia.

  6. A Study of Text-to-Speech (TTS) in Children's English Learning

    ERIC Educational Resources Information Center

    Huang, Yi-Ching; Liao, Lung-Chuan

    2015-01-01

    The purpose of this study was to explore the effects of the digital material incorporated into a Text-to-Speech system for students' English spelling. The digital material was made on the basis of the Spelling Bee vocabulary list (approximately 300 words) issued by the selected school. 21 third graders from a private bilingual school in Taiwan were…

  7. Speechmaking as a Public Relations Technique: A Descriptive Study of Speechmaking Practices and Attitudes Among Selected Public Relations Professionals.

    ERIC Educational Resources Information Center

    Roman, Charles Vasile

    This study surveys speech-making practices and attitudes of practitioners in firms not primarily engaged in providing public relations services. Questionnaires designed to assess the uses of speech making as a public relations technique were sent to the 50 largest United States advertising agencies and to the 34 largest United States business and…

  8. Reducing language to rhythm: Amazonian Bora drummed language exploits speech rhythm for long-distance communication

    NASA Astrophysics Data System (ADS)

    Seifart, Frank; Meyer, Julien; Grawunder, Sven; Dentel, Laure

    2018-04-01

    Many drum communication systems around the world transmit information by emulating tonal and rhythmic patterns of spoken languages in sequences of drumbeats. Their rhythmic characteristics, in particular, have not been systematically studied so far, although understanding them represents a rare occasion for providing an original insight into the basic units of speech rhythm as selected by natural speech practices directly based on beats. Here, we analyse a corpus of Bora drum communication from the northwest Amazon, which is nowadays endangered with extinction. We show that four rhythmic units are encoded in the length of pauses between beats. We argue that these units correspond to vowel-to-vowel intervals with different numbers of consonants and vowel lengths. By contrast, aligning beats with syllables, mora or only vowel length yields inconsistent results. Moreover, we also show that Bora drummed messages conventionally select rhythmically distinct markers to further distinguish words. The two phonological tones represented in drummed speech encode only a few lexical contrasts. Rhythm thus appears to crucially contribute to the intelligibility of drummed Bora. Our study provides novel evidence for the role of rhythmic structures composed of vowel-to-vowel intervals in the complex puzzle concerning the redundancy and distinctiveness of acoustic features embedded in speech.

  9. Reducing language to rhythm: Amazonian Bora drummed language exploits speech rhythm for long-distance communication

    PubMed Central

    Grawunder, Sven; Dentel, Laure

    2018-01-01

    Many drum communication systems around the world transmit information by emulating tonal and rhythmic patterns of spoken languages in sequences of drumbeats. Their rhythmic characteristics, in particular, have not been systematically studied so far, although understanding them represents a rare occasion for providing an original insight into the basic units of speech rhythm as selected by natural speech practices directly based on beats. Here, we analyse a corpus of Bora drum communication from the northwest Amazon, which is nowadays endangered with extinction. We show that four rhythmic units are encoded in the length of pauses between beats. We argue that these units correspond to vowel-to-vowel intervals with different numbers of consonants and vowel lengths. By contrast, aligning beats with syllables, mora or only vowel length yields inconsistent results. Moreover, we also show that Bora drummed messages conventionally select rhythmically distinct markers to further distinguish words. The two phonological tones represented in drummed speech encode only a few lexical contrasts. Rhythm thus appears to crucially contribute to the intelligibility of drummed Bora. Our study provides novel evidence for the role of rhythmic structures composed of vowel-to-vowel intervals in the complex puzzle concerning the redundancy and distinctiveness of acoustic features embedded in speech. PMID:29765620

  10. Objective voice and speech analysis of persons with chronic hoarseness by prosodic analysis of speech samples.

    PubMed

    Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar

    2016-10-01

    Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
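
    The human-machine correlations reported above come from regressing perceptual ratings onto automatically extracted features. The sketch below reproduces only that generic setup on synthetic data; the feature values, weights, and noise level are invented, not the study's regression formulae.

    ```python
    # Regression from automatic features to a perceptual rating, evaluated by
    # the Pearson correlation between predicted and rated scores (synthetic data).
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, n_feat = 83, 73, 10        # mirrors the 83/73 speaker split
    w_true = rng.normal(size=n_feat)            # hidden "perceptual" weighting

    def simulate(n):
        X = rng.normal(size=(n, n_feat))        # stand-ins for prosodic features
        y = X @ w_true + rng.normal(scale=0.7, size=n)   # Likert-like rating
        return X, y

    X_tr, y_tr = simulate(n_train)
    X_te, y_te = simulate(n_test)

    # ordinary least squares with an intercept column
    A = np.hstack([X_tr, np.ones((n_train, 1))])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    pred = np.hstack([X_te, np.ones((n_test, 1))]) @ coef

    print(f"human-machine correlation r = {np.corrcoef(pred, y_te)[0, 1]:.2f}")
    ```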

  11. Anticipatory Posturing of the Vocal Tract Reveals Dissociation of Speech Movement Plans from Linguistic Units

    PubMed Central

    Tilsen, Sam; Spincemaille, Pascal; Xu, Bo; Doerschuk, Peter; Luh, Wen-Ming; Feldman, Elana; Wang, Yi

    2016-01-01

    Models of speech production typically assume that control over the timing of speech movements is governed by the selection of higher-level linguistic units, such as segments or syllables. This study used real-time magnetic resonance imaging of the vocal tract to investigate the anticipatory movements speakers make prior to producing a vocal response. Two factors were varied: preparation (whether or not speakers had foreknowledge of the target response) and pre-response constraint (whether or not speakers were required to maintain a specific vocal tract posture prior to the response). In prepared responses, many speakers were observed to produce pre-response anticipatory movements with a variety of articulators, showing that speech movements can be readily dissociated from higher-level linguistic units. Substantial variation was observed across speakers with regard to the articulators used for anticipatory posturing and the contexts in which anticipatory movements occurred. The findings of this study have important consequences for models of speech production and for our understanding of the normal range of variation in anticipatory speech behaviors. PMID:26760511

  12. Anticipatory Posturing of the Vocal Tract Reveals Dissociation of Speech Movement Plans from Linguistic Units.

    PubMed

    Tilsen, Sam; Spincemaille, Pascal; Xu, Bo; Doerschuk, Peter; Luh, Wen-Ming; Feldman, Elana; Wang, Yi

    2016-01-01

    Models of speech production typically assume that control over the timing of speech movements is governed by the selection of higher-level linguistic units, such as segments or syllables. This study used real-time magnetic resonance imaging of the vocal tract to investigate the anticipatory movements speakers make prior to producing a vocal response. Two factors were varied: preparation (whether or not speakers had foreknowledge of the target response) and pre-response constraint (whether or not speakers were required to maintain a specific vocal tract posture prior to the response). In prepared responses, many speakers were observed to produce pre-response anticipatory movements with a variety of articulators, showing that speech movements can be readily dissociated from higher-level linguistic units. Substantial variation was observed across speakers with regard to the articulators used for anticipatory posturing and the contexts in which anticipatory movements occurred. The findings of this study have important consequences for models of speech production and for our understanding of the normal range of variation in anticipatory speech behaviors.

  13. Comprehension of synthetic speech and digitized natural speech by adults with aphasia.

    PubMed

    Hux, Karen; Knollman-Porter, Kelly; Brown, Jessica; Wallace, Sarah E

    2017-09-01

    Using text-to-speech technology to provide simultaneous written and auditory content presentation may help compensate for chronic reading challenges if people with aphasia can understand synthetic speech output; however, inherent auditory comprehension challenges experienced by people with aphasia may make understanding synthetic speech difficult. This study's purpose was to compare the preferences and auditory comprehension accuracy of people with aphasia when listening to sentences generated with digitized natural speech, Alex synthetic speech (i.e., Macintosh platform), or David synthetic speech (i.e., Windows platform). The methodology required each of 20 participants with aphasia to select one of four images corresponding in meaning to each of 60 sentences comprising three stimulus sets. Results revealed significantly better accuracy given digitized natural speech than either synthetic speech option; however, individual participant performance analyses revealed three patterns: (a) comparable accuracy regardless of speech condition for 30% of participants, (b) comparable accuracy between digitized natural speech and one, but not both, synthetic speech option for 45% of participants, and (c) greater accuracy with digitized natural speech than with either synthetic speech option for remaining participants. Ranking and Likert-scale rating data revealed a preference for digitized natural speech and David synthetic speech over Alex synthetic speech. Results suggest many individuals with aphasia can comprehend synthetic speech options available on popular operating systems. Further examination of synthetic speech use to support reading comprehension through text-to-speech technology is thus warranted.

  14. Speech-Language Pathologists' Opinions on Response to Intervention

    ERIC Educational Resources Information Center

    Sanger, Dixie; Mohling, Sara; Stremlau, Aliza

    2012-01-01

    The purpose of this study was to survey the opinions of speech-language pathologists (SLPs) on response to intervention (RTI). Questionnaires were mailed to 2,000 randomly selected elementary and secondary SLPs throughout the United States. Mean results of 583 respondents (29.15%) indicated that SLPs agreed on 37 Likert-type items and responded…

  15. Approaching the Linguistic Complexity

    NASA Astrophysics Data System (ADS)

    Drożdż, Stanisław; Kwapień, Jarosław; Orczyk, Adam

    We analyze the rank-frequency distributions of words in selected English and Polish texts. We compare scaling properties of these distributions in both languages. We also study a few small corpora of Polish literary texts and find that for a corpus consisting of texts written by different authors the basic scaling regime is broken more strongly than in the case of a comparable corpus consisting of texts written by the same author. Similarly, for a corpus consisting of texts translated into Polish from other languages the scaling regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, based on the British National Corpus, we consider the rank-frequency distributions of the grammatically basic forms of words (lemmas) tagged with their proper part of speech. We find that these distributions do not scale if each part of speech is analyzed separately. The only part of speech that independently develops a trace of scaling is verbs.
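
    The scaling analysis above rests on Zipf's rank-frequency law, f(r) ∝ r^(-α). A toy sketch of the measurement, counting word frequencies, ranking them, and fitting the log-log slope, follows; a real corpus rather than a one-sentence example is of course needed for a stable exponent.

    ```python
    # Rank-frequency distribution of words with a log-log slope estimate
    # (Zipf's law predicts frequency ~ rank**-alpha, with alpha near 1
    # for large natural-language corpora).
    from collections import Counter
    import numpy as np

    text = "the quick brown fox jumps over the lazy dog the fox".split()
    freqs = np.array(sorted(Counter(text).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)

    slope, intercept = np.polyfit(np.log(ranks), np.log(freqs), 1)
    print(f"estimated scaling exponent alpha = {-slope:.2f}")
    ```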

  16. Developing the Alphabetic Principle to Aid Text-Based Augmentative and Alternative Communication Use by Adults With Low Speech Intelligibility and Intellectual Disabilities.

    PubMed

    Schmidt-Naylor, Anna C; Saunders, Kathryn J; Brady, Nancy C

    2017-05-17

    We explored alphabet supplementation as an augmentative and alternative communication strategy for adults with minimal literacy. Study 1's goal was to teach onset-letter selection with spoken words and assess generalization to untaught words, demonstrating the alphabetic principle. Study 2 incorporated alphabet supplementation within a naming task and then assessed effects on speech intelligibility. Three men with intellectual disabilities (ID) and low speech intelligibility participated. Study 1 used a multiple-probe design, across three 20-word sets, to show that our computer-based training improved onset-letter selection. We also probed generalization to untrained words. Study 2 taught onset-letter selection for 30 new words chosen for functionality. Five listeners transcribed speech samples of the 30 words in 2 conditions: speech only and speech with alphabet supplementation. Across studies 1 and 2, participants demonstrated onset-letter selection for at least 90 words. Study 1 showed evidence of the alphabetic principle for some but not all word sets. In study 2, participants readily used alphabet supplementation, enabling listeners to understand twice as many words. This is the first demonstration of alphabet supplementation in individuals with ID and minimal literacy. The large number of words learned holds promise both for improving communication and providing a foundation for improved literacy.

  17. Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex.

    PubMed

    Yao, Bo; Belin, Pascal; Scheepers, Christoph

    2011-10-01

    In human communication, direct speech (e.g., Mary said: "I'm hungry") is perceived to be more vivid than indirect speech (e.g., Mary said [that] she was hungry). However, for silent reading, the representational consequences of this distinction are still unclear. Although many of us share the intuition of an "inner voice," particularly during silent reading of direct speech statements in text, there has been little direct empirical confirmation of this experience so far. Combining fMRI with eye tracking in human volunteers, we show that silent reading of direct versus indirect speech engenders differential brain activation in voice-selective areas of the auditory cortex. This suggests that readers are indeed more likely to engage in perceptual simulations (or spontaneous imagery) of the reported speaker's voice when reading direct speech as opposed to meaning-equivalent indirect speech statements as part of a more vivid representation of the former. Our results may be interpreted in line with embodied cognition and form a starting point for more sophisticated interdisciplinary research on the nature of auditory mental simulation during reading.

  18. Using the Self-Select Paradigm to Delineate the Nature of Speech Motor Programming

    ERIC Educational Resources Information Center

    Wright, David L.; Robin, Don A.; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H.; Fox, Peter T.

    2009-01-01

    Purpose: The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial…

  19. A survey of acoustic conditions in semi-open plan classrooms in the United Kingdom.

    PubMed

    Greenland, Emma E; Shield, Bridget M

    2011-09-01

    This paper reports the results of a large scale, detailed acoustic survey of 42 open plan classrooms of varying design in the UK, each of which contained between 2 and 14 teaching areas or classbases. The objective survey procedure, which was designed specifically for use in open plan classrooms, is described. The acoustic measurements relating to speech intelligibility within a classbase, including ambient noise level, intrusive noise level, speech to noise ratio, speech transmission index, and reverberation time, are presented. The effects on speech intelligibility of critical physical design variables, such as the number of classbases within an open plan unit and the selection of acoustic finishes for control of reverberation, are examined. This analysis enables limitations of open plan classrooms to be discussed and acoustic design guidelines to be developed to ensure good listening conditions. The types of teaching activity for which adequate acoustic conditions can be provided, plus the speech intelligibility requirements of younger children, are also discussed.
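
    Reverberation control of the kind surveyed above is typically first estimated with Sabine's formula, RT60 = 0.161·V/A, where V is the room volume in m³ and A the total absorption area in m² sabins; the survey's own measurement procedure is not reproduced here. A sketch for a hypothetical classbase:

    ```python
    # Sabine's formula, the standard first-order estimate used when sizing
    # acoustic absorption in rooms (illustrative values, not survey data).

    def rt60_sabine(volume_m3, surfaces):
        """surfaces: iterable of (area_m2, absorption_coefficient) pairs."""
        absorption = sum(area * coeff for area, coeff in surfaces)
        return 0.161 * volume_m3 / absorption

    # A hypothetical 7 m x 9 m x 3 m classbase with an absorptive ceiling:
    print(rt60_sabine(7 * 9 * 3, [(63, 0.70),    # ceiling tiles
                                  (63, 0.05),    # floor
                                  (96, 0.10)]))  # walls
    # ~0.54 s, near the <0.6 s commonly recommended for classrooms
    ```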

  20. Selected Print and Nonprint Resources in Speech Communication: An Annotated Bibliography, K-12.

    ERIC Educational Resources Information Center

    Feezel, Jerry D., Comp.; And Others

    This annotated guide to resources in speech communication will be valuable for K-12 teachers seeking resources for both required and elective units. Entries are organized by grade level within the various content areas and are grouped under the following section headings: print, nonprint, multimedia, and major sources. Within each of these four…

  1. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

    PubMed

    Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
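
    The two learning objectives contrasted above have standard definitions, sketched below on toy magnitude spectrograms. The ideal ratio mask here uses the common energy-ratio form; published systems vary in the exponent and in the local criterion for the binary mask, so treat the parameters as illustrative rather than those of this study.

    ```python
    # Ideal binary mask (IBM) vs. ideal ratio mask (IRM), computed per
    # time-frequency unit from premixed speech and noise magnitudes
    # (arrays shaped [frequency_channels, time_frames]).
    import numpy as np

    def ideal_binary_mask(speech_mag, noise_mag, lc_db=0.0):
        """1 where the local SNR exceeds the criterion lc_db, else 0."""
        local_snr = 20 * np.log10((speech_mag + 1e-12) / (noise_mag + 1e-12))
        return (local_snr > lc_db).astype(float)

    def ideal_ratio_mask(speech_mag, noise_mag):
        """Continuous value in (0, 1) per time-frequency unit."""
        return speech_mag**2 / (speech_mag**2 + noise_mag**2 + 1e-12)

    rng = np.random.default_rng(1)
    speech, noise = rng.rayleigh(size=(2, 64, 100))   # toy magnitude spectrograms
    ibm = ideal_binary_mask(speech, noise)
    irm = ideal_ratio_mask(speech, noise)
    ```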

  2. Developing the Alphabetic Principle to Aid Text-Based Augmentative and Alternative Communication Use by Adults With Low Speech Intelligibility and Intellectual Disabilities

    PubMed Central

    Schmidt-Naylor, Anna C.; Brady, Nancy C.

    2017-01-01

    Purpose: We explored alphabet supplementation as an augmentative and alternative communication strategy for adults with minimal literacy. Study 1's goal was to teach onset-letter selection with spoken words and assess generalization to untaught words, demonstrating the alphabetic principle. Study 2 incorporated alphabet supplementation within a naming task and then assessed effects on speech intelligibility. Method: Three men with intellectual disabilities (ID) and low speech intelligibility participated. Study 1 used a multiple-probe design, across three 20-word sets, to show that our computer-based training improved onset-letter selection. We also probed generalization to untrained words. Study 2 taught onset-letter selection for 30 new words chosen for functionality. Five listeners transcribed speech samples of the 30 words in 2 conditions: speech only and speech with alphabet supplementation. Results: Across studies 1 and 2, participants demonstrated onset-letter selection for at least 90 words. Study 1 showed evidence of the alphabetic principle for some but not all word sets. In study 2, participants readily used alphabet supplementation, enabling listeners to understand twice as many words. Conclusions: This is the first demonstration of alphabet supplementation in individuals with ID and minimal literacy. The large number of words learned holds promise both for improving communication and providing a foundation for improved literacy. PMID:28474087

  3. Re-Presenting Subversive Songs: Applying Strategies for Invention and Arrangement to Nontraditional Speech Texts

    ERIC Educational Resources Information Center

    Charlesworth, Dacia

    2010-01-01

    Invention deals with the content of a speech, arrangement involves placing the content in an order that is most strategic, style focuses on selecting linguistic devices, such as metaphor, to make the message more appealing, memory assists the speaker in delivering the message correctly, and delivery ideally enables great reception of the message.…

  4. NaturalReader: A New Generation Text Reader

    ERIC Educational Resources Information Center

    Flood, Jacqueline

    2007-01-01

    NaturalReader (http://www.naturalreaders.com/) is a new generation text reader, which means that it reads any machine readable text using synthesized speech without having to copy and paste the selected text into the NaturalReader application window. It installs a toolbar directly into all of the Microsoft Office[TM] programs and uses a mini-board…

  5. Development of A Two-Stage Procedure for the Automatic Recognition of Dysfluencies in the Speech of Children Who Stutter: I. Psychometric Procedures Appropriate for Selection of Training Material for Lexical Dysfluency Classifiers

    PubMed Central

    Howell, Peter; Sackin, Stevie; Glenn, Kazan

    2007-01-01

    This program of work is intended to develop automatic recognition procedures to locate and assess stuttered dysfluencies. This and the following article together develop and test recognizers for repetitions and prolongations. The automatic recognizers classify the speech in two stages: in the first, the speech is segmented; in the second, the segments are categorized. The units that are segmented are words. Here, assessments by human judges on the speech of 12 children who stutter are described using a corresponding procedure. The accuracy of word boundary placement across judges, categorization of the words as fluent, repetition or prolongation, and duration of the different fluency categories are reported. These measures allow reliable instances of repetitions and prolongations to be selected for training and assessing the recognizers in the subsequent paper. PMID:9328878

  6. Intervention Techniques Used With Autism Spectrum Disorder by Speech-Language Pathologists in the United States and Taiwan: A Descriptive Analysis of Practice in Clinical Settings.

    PubMed

    Hsieh, Ming-Yeh; Lynch, Georgina; Madison, Charles

    2018-04-27

    This study examined intervention techniques used with children with autism spectrum disorder (ASD) by speech-language pathologists (SLPs) in the United States and Taiwan working in clinic/hospital settings. The research questions addressed intervention techniques used with children with ASD, intervention techniques used with different age groups (under and above 8 years old), and training received before using the intervention techniques. The survey was distributed through the American Speech-Language-Hearing Association to selected SLPs across the United States. In Taiwan, the survey (Chinese version) was distributed in 2018 through the Taiwan Speech-Language Pathologist Union to certified SLPs. Results revealed that SLPs in the United States and Taiwan used 4 common intervention techniques: Social Skill Training, Augmentative and Alternative Communication, Picture Exchange Communication System, and Social Stories. Taiwanese SLPs reported SLP preparation program training across these common intervention strategies. In the United States, SLPs reported training via SLP preparation programs, peer therapists, and self-teaching. Most SLPs reported using established or emerging evidence-based practices as defined by the National Professional Development Center (2014) and the National Standards Report (2015). Future research should address comparison of SLP preparation programs to examine the impact of preprofessional training on use of evidence-based practices to treat ASD.

  7. Competência Comunicativa em Português (Communicative Competence in Portuguese).

    ERIC Educational Resources Information Center

    Paiva, Ricardo

    A textbook designed to give speech and writing practice to intermediate and advanced students of Portuguese as a second language includes 14 units intended to cover two semesters' work with approximately five hours per week of instruction. The units typically include: a text forming the basis for free conversation and practice of language…

  8. The effect of simultaneous text on the recall of noise-degraded speech.

    PubMed

    Grossman, Irina; Rajan, Ramesh

    2017-05-01

    Written and spoken language utilize the same processing system, enabling text to modulate speech processing. We investigated how simultaneously presented text affected speech recall in babble noise using a retrospective recall task. Participants were presented with text-speech sentence pairs in multitalker babble noise and then prompted to recall what they heard or what they read. In Experiment 1, sentence pairs were either congruent or incongruent and they were presented in silence or at 1 of 4 noise levels. Audio and Visual control groups were also tested with sentences presented in only 1 modality. Congruent text facilitated accurate recall of degraded speech; incongruent text had no effect. Text and speech were seldom confused for each other. A consideration of the effects of language background found that monolingual English speakers outperformed early multilinguals at recalling degraded speech; however, the effects of text on speech processing were analogous. Experiment 2 considered whether the benefit provided by matching text was maintained when the congruency of the text and speech became more ambiguous, owing to the addition of partially mismatching text-speech sentence pairs that differed only on their final keyword and to the use of low signal-to-noise ratios. The experiment focused on monolingual English speakers; the results showed that even though participants commonly confused text for speech during incongruent text-speech pairings, these confusions could not fully account for the benefit provided by matching text. Thus, we uniquely demonstrate that congruent text benefits the recall of noise-degraded speech.

  9. Automatic detection of obstructive sleep apnea using speech signals.

    PubMed

    Goldshtein, Evgenia; Tarasiuk, Ariel; Zigel, Yaniv

    2011-05-01

    Obstructive sleep apnea (OSA) is a common disorder associated with anatomical abnormalities of the upper airways that affects 5% of the population. Acoustic parameters may be influenced by the vocal tract structure and soft tissue properties. We hypothesize that speech signal properties of OSA patients will be different from those of control subjects not having OSA. Using speech signal processing techniques, we explored acoustic speech features of 93 subjects who were recorded using a text-dependent speech protocol and a digital audio recorder immediately prior to polysomnography study. Following analysis of the study, subjects were divided into OSA (n=67) and non-OSA (n=26) groups. A Gaussian mixture model-based system was developed to model and classify between the groups; discriminative features such as vocal tract length and linear prediction coefficients were selected using a feature selection technique. Specificity and sensitivity of 83% and 79% were achieved for the male OSA and 86% and 84% for the female OSA patients, respectively. We conclude that acoustic features from speech signals during wakefulness can detect OSA patients with good specificity and sensitivity. Such a system can be used as a basis for future development of a tool for OSA screening.
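
    The classifier described above follows the usual GMM likelihood-comparison pattern: fit one mixture per group on acoustic feature vectors, then assign a new recording to the group under which its features are more likely. A sketch on synthetic vectors follows; extraction of the actual vocal-tract-length and linear-prediction features is omitted.

    ```python
    # GMM-based two-class decision on acoustic feature vectors (synthetic data;
    # group sizes mirror the study's 67 OSA / 26 non-OSA split).
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(2)
    X_osa = rng.normal(loc=0.5, size=(67, 8))    # stand-in OSA feature vectors
    X_ctl = rng.normal(loc=-0.5, size=(26, 8))   # stand-in non-OSA vectors

    gmm_osa = GaussianMixture(n_components=4, random_state=0).fit(X_osa)
    gmm_ctl = GaussianMixture(n_components=4, random_state=0).fit(X_ctl)

    x = rng.normal(loc=0.5, size=(1, 8))         # features of an unseen recording
    is_osa = (gmm_osa.score_samples(x) > gmm_ctl.score_samples(x))[0]
    print("OSA" if is_osa else "non-OSA")
    ```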

  10. Speech recognition systems on the Cell Broadband Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Y; Jones, H; Vaidya, S

    In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine™ (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousands of simultaneous voice channels in real time, a channel density that is orders of magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.

  11. Information as Power: An Anthology of Selected United States Army War College Student Papers. Volume 5

    DTIC Science & Technology

    2011-01-01

    sedition or idolatry] and [until] the religion, all of it, is for Allah. And if they cease - then indeed...policy. To examine this view, I have used as sources the following major speeches which bear on the role of religion in his national security policy...Parliament in Ankara, Turkey (henceforth, Ankara); his June 4, 2009 “On a New Beginning” speech at Cairo University, Cairo, Egypt

  12. Multisensory integration of speech sounds with letters vs. visual speech: only visual speech induces the mismatch negativity.

    PubMed

    Stekelenburg, Jeroen J; Keetels, Mirjam; Vroomen, Jean

    2018-05-01

    Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech.

  13. Selective attention to human voice enhances brain activity bilaterally in the superior temporal sulcus.

    PubMed

    Alho, Kimmo; Vorobyev, Victor A; Medvedev, Svyatoslav V; Pakhomov, Sergey V; Starchenko, Maria G; Tervaniemi, Mari; Näätänen, Risto

    2006-02-23

    Regional cerebral blood flow was measured with positron emission tomography (PET) in 10 healthy male volunteers. They heard two binaurally delivered concurrent stories, one spoken by a male voice and the other by a female voice. A third story was presented at the same time as a text running on a screen. The subjects were instructed to attend silently to one of the stories at a time. In an additional resting condition, no stories were delivered. PET data showed that in comparison with the reading condition, the brain activity in the speech-listening conditions was enhanced bilaterally in the anterior superior temporal sulcus including cortical areas that have been reported to be specifically sensitive to human voice. Previous studies on attention to non-linguistic sounds and visual objects, in turn, showed prefrontal activations that are presumably related to attentional control functions. However, comparisons of the present speech-listening and reading conditions with each other or with the resting condition indicated no prefrontal activity, except for an activation in the inferior frontal cortex that was presumably associated with semantic and syntactic processing of the attended story. Thus, speech listening, as well as reading, even in a distracting environment appears to depend less on the prefrontal control functions than do other types of attention-demanding tasks, probably because selective attention to speech and written text are over-learned actions rehearsed daily.

  14. Does Use of Text-to-Speech and Related Read-Aloud Tools Improve Reading Comprehension for Students with Reading Disabilities? A Meta-Analysis

    ERIC Educational Resources Information Center

    Wood, Sarah G.; Moxley, Jerad H.; Tighe, Elizabeth L.; Wagner, Richard K.

    2018-01-01

    Text-to-speech and related read-aloud tools are being widely implemented in an attempt to assist students' reading comprehension skills. Read-aloud software, including text-to-speech, is used to translate written text into spoken text, enabling one to listen to written text while reading along. It is not clear how effective text-to-speech is at…

  15. Evaluation of the importance of time-frequency contributions to speech intelligibility in noise

    PubMed Central

    Yu, Chengzhu; Wójcicki, Kamil K.; Loizou, Philipos C.; Hansen, John H. L.; Johnson, Michael T.

    2014-01-01

    Recent studies on binary masking techniques make the assumption that each time-frequency (T-F) unit contributes an equal amount to the overall intelligibility of speech. The present study demonstrated that the importance of each T-F unit to speech intelligibility varies in accordance with speech content. Specifically, T-F units are categorized into two classes, speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is highly related to the loudness of its target component, while the importance of each speech-absent T-F unit varies according to the loudness of its masker component. Two types of mask errors are also considered, which include miss and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance between the two types of error is conditioned on the SNR level of the input speech signal. Based on these observations, a mask-based objective measure, the loudness weighted hit-false, is proposed for predicting speech intelligibility. The proposed objective measure shows significantly higher correlation with intelligibility compared to two existing mask-based objective measures. PMID:24815280
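
    The classic mask-based measure referenced above is HIT − FA: the hit rate over speech-dominated T-F units minus the false-alarm rate over noise-dominated units. The sketch below shows one plausible loudness-weighted variant in the spirit of the proposed measure, weighting hits by target loudness and false alarms by masker loudness; the paper's exact weighting may differ.

    ```python
    # A loudness-weighted HIT - FA sketch: hits are weighted by the loudness of
    # the target component, false alarms by the loudness of the masker component.
    # This is an illustrative reconstruction, not the published formula.
    import numpy as np

    def loudness_weighted_hit_false(est_mask, ideal_mask, target_loud, masker_loud):
        speech_units = ideal_mask == 1          # speech-present T-F units
        noise_units = ideal_mask == 0           # speech-absent T-F units
        hit = (np.sum(target_loud[speech_units] * est_mask[speech_units])
               / np.sum(target_loud[speech_units]))
        fa = (np.sum(masker_loud[noise_units] * est_mask[noise_units])
              / np.sum(masker_loud[noise_units]))
        return hit - fa

    rng = np.random.default_rng(3)
    ideal = (rng.random((64, 100)) > 0.5).astype(int)
    est = np.clip(ideal + (rng.random((64, 100)) > 0.9), 0, 1)   # noisy estimate
    loud = rng.random((2, 64, 100))                              # toy loudness maps
    print(loudness_weighted_hit_false(est, ideal, loud[0], loud[1]))
    ```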

  16. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  17. Methods and apparatus for non-acoustic speech characterization and recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  18. Texting while driving: is speech-based text entry less risky than handheld text entry?

    PubMed

    He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S

    2014-11-01

    Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration.

  19. [Prosody, speech input and language acquisition].

    PubMed

    Jungheim, M; Miller, S; Kühn, D; Ptok, M

    2014-04-01

    In order to acquire language, children require speech input. The prosody of the speech input plays an important role. In most cultures adults modify their code when communicating with children. Compared to normal speech, this code differs especially with regard to prosody. For this review, a selective literature search in PubMed and Scopus was performed. Prosodic characteristics are a key feature of spoken language. By analysing prosodic features, children gain knowledge about underlying grammatical structures. Child-directed speech (CDS) is modified in a way that meaningful sequences are highlighted acoustically so that important information can be extracted from the continuous speech flow more easily. CDS is said to enhance the representation of linguistic signs. Taking into consideration what has previously been described in the literature regarding the perception of suprasegmentals, CDS seems to be able to support language acquisition due to the correspondence of prosodic and syntactic units. However, no findings have been reported indicating that the linguistically reduced CDS could hinder first language acquisition.

  20. Use of speech-to-text technology for documentation by healthcare providers.

    PubMed

    Ajami, Sima

    2016-01-01

    Medical records are a critical component of a patient's treatment. However, documentation of patient-related information is considered a secondary activity in the provision of healthcare services, often leading to incomplete medical records and patient data of low quality. Advances in information technology (IT) in the health system and registration of information in electronic health records (EHR) using speech-to-text conversion software have facilitated service delivery. This narrative review is a literature search with the help of libraries, books, conference proceedings, the Science Direct, PubMed, ProQuest, Springer, and SID (Scientific Information Database) databases, and search engines such as Yahoo and Google. I used the following keywords and their combinations: speech recognition, automatic report documentation, voice to text software, healthcare, information, and voice recognition. Due to lack of knowledge of other languages, I searched all texts in English or Persian with no time limits. Of a total of 70 articles, only 42 were selected. Speech-to-text conversion technology offers opportunities to improve the documentation process of medical records, reduce the cost and time of recording information, enhance the quality of documentation, improve the quality of services provided to patients, and support healthcare providers in legal matters. Healthcare providers should recognize the impact of this technology on service delivery.

  1. Selective Attention Enhances Beta-Band Cortical Oscillation to Speech under “Cocktail-Party” Listening Conditions

    PubMed Central

    Gao, Yayue; Wang, Qian; Ding, Yu; Wang, Changming; Li, Haifeng; Wu, Xihong; Qu, Tianshu; Li, Liang

    2017-01-01

    Human listeners are able to selectively attend to target speech in a noisy environment with multiple people talking. Using recordings of scalp electroencephalogram (EEG), this study investigated how selective attention facilitates the cortical representation of target speech under a simulated “cocktail-party” listening condition with speech-on-speech masking. The results show that the cortical representation of target-speech signals under the multiple-talker condition was specifically improved by selective attention relative to the non-selective-attention listening condition, and the beta-band activity was most strongly modulated by selective attention. Moreover, measured with the Granger Causality value, selective attention to the single target speech in the mixed-speech complex enhanced the following four causal connectivities for the beta-band oscillation: the ones (1) from site FT7 to the right motor area, (2) from the left frontal area to the right motor area, (3) from the central frontal area to the right motor area, and (4) from the central frontal area to the right frontal area. However, the selective-attention-induced change in beta-band causal connectivity from the central frontal area to the right motor area, but not other beta-band causal connectivities, was significantly correlated with the selective-attention-induced change in the cortical beta-band representation of target speech. These findings suggest that under the “cocktail-party” listening condition, the beta-band oscillation in EEGs to target speech is specifically facilitated by selective attention to the target speech that is embedded in the mixed-speech complex. The selective attention-induced unmasking of target speech may be associated with the improved beta-band functional connectivity from the central frontal area to the right motor area, suggesting a top-down attentional modulation of the speech-motor process. PMID:28239344
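
    Granger causality, the connectivity measure used above, asks whether one signal's past improves prediction of another beyond that signal's own past. A minimal sketch with statsmodels on synthetic data follows; scalp-EEG preprocessing, beta-band filtering, and the study's specific scalp sites are omitted.

    ```python
    # Granger-causality test on two synthetic channels: does A's past help
    # predict B beyond B's own past? (A toy stand-in for the EEG analysis.)
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(4)
    a = rng.normal(size=1000)                        # "cause" channel
    b = np.convolve(a, [0.0, 0.8, 0.4], mode="same")
    b += rng.normal(scale=0.5, size=1000)            # "effect" channel

    # column order is [effect, candidate cause]; test lags 1..3
    res = grangercausalitytests(np.column_stack([b, a]), maxlag=3, verbose=False)
    f_stat, p_value, _, _ = res[3][0]["ssr_ftest"]
    print(f"F = {f_stat:.1f}, p = {p_value:.3g}")
    ```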

  2. Selective Attention Enhances Beta-Band Cortical Oscillation to Speech under "Cocktail-Party" Listening Conditions.

    PubMed

    Gao, Yayue; Wang, Qian; Ding, Yu; Wang, Changming; Li, Haifeng; Wu, Xihong; Qu, Tianshu; Li, Liang

    2017-01-01

    Human listeners are able to selectively attend to target speech in a noisy environment with multiple people talking. Using recordings of scalp electroencephalogram (EEG), this study investigated how selective attention facilitates the cortical representation of target speech under a simulated "cocktail-party" listening condition with speech-on-speech masking. The results show that the cortical representation of target-speech signals under the multiple-talker condition was specifically improved by selective attention relative to the non-selective-attention listening condition, and the beta-band activity was most strongly modulated by selective attention. Moreover, measured with the Granger Causality value, selective attention to the single target speech in the mixed-speech complex enhanced the following four causal connectivities for the beta-band oscillation: the ones (1) from site FT7 to the right motor area, (2) from the left frontal area to the right motor area, (3) from the central frontal area to the right motor area, and (4) from the central frontal area to the right frontal area. However, the selective-attention-induced change in beta-band causal connectivity from the central frontal area to the right motor area, but not other beta-band causal connectivities, was significantly correlated with the selective-attention-induced change in the cortical beta-band representation of target speech. These findings suggest that under the "cocktail-party" listening condition, the beta-band oscillation in EEGs to target speech is specifically facilitated by selective attention to the target speech that is embedded in the mixed-speech complex. The selective attention-induced unmasking of target speech may be associated with the improved beta-band functional connectivity from the central frontal area to the right motor area, suggesting a top-down attentional modulation of the speech-motor process.

  3. Exploring expressivity and emotion with artificial voice and speech technologies.

    PubMed

    Pauletto, Sandra; Balentine, Bruce; Pidcock, Chris; Jones, Kevin; Bottaci, Leonardo; Aretoulaki, Maria; Wells, Jez; Mundy, Darren P; Balentine, James

    2013-10-01

    Emotion in audio-voice signals, as synthesized by text-to-speech (TTS) technologies, was investigated to formulate a theory of expression for user interface design. Emotional parameters were specified with markup tags, and the resulting audio was further modulated with post-processing techniques. Software was then developed to link a selected TTS synthesizer with an automatic speech recognition (ASR) engine, producing a chatbot that could speak and listen. Using these two artificial voice subsystems, investigators explored both artistic and psychological implications of artificial speech emotion. Goals of the investigation were interdisciplinary, with interest in musical composition, augmentative and alternative communication (AAC), commercial voice announcement applications, human-computer interaction (HCI), and artificial intelligence (AI). The work-in-progress points towards an emerging interdisciplinary ontology for artificial voices. As one study output, HCI tools are proposed for future collaboration.

  4. Dynamic action units slip in speech production errors

    PubMed Central

    Goldstein, Louis; Pouplier, Marianne; Chen, Larissa; Saltzman, Elliot; Byrd, Dani

    2008-01-01

    In the past, the nature of the compositional units proposed for spoken language has largely diverged from the types of control units pursued in the domains of other skilled motor tasks. A classic source of evidence as to the units structuring speech has been patterns observed in speech errors – “slips of the tongue”. The present study reports, for the first time, on kinematic data from tongue and lip movements during speech errors elicited in the laboratory using a repetition task. Our data are consistent with the hypothesis that speech production results from the assembly of dynamically defined action units – gestures – in a linguistically structured environment. The experimental results support both the presence of gestural units and the dynamical properties of these units and their coordination. This study of speech articulation shows that it is possible to develop a principled account of spoken language within a more general theory of action. PMID:16822494

  5. Emergence of neural encoding of auditory objects while listening to competing speakers

    PubMed Central

    Ding, Nai; Simon, Jonathan Z.

    2012-01-01

    A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by recording from subjects selectively listening to one of two competing speakers, either of different or the same sex, using magnetoencephalography. Individual neural representations are seen for the speech of the two speakers, with each being selectively phase locked to the rhythm of the corresponding speech stream and from which can be exclusively reconstructed the temporal envelope of that speech stream. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensity of the attended and background speakers is separately varied over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker but not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation. PMID:22753470
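
    Envelope reconstruction of the kind reported above is typically done with a linear backward decoder: ridge regression from time-lagged sensor data to the speech envelope. A self-contained sketch on synthetic "MEG" follows; real pipelines band-pass the data and cross-validate the regularization, both skipped here.

    ```python
    # Linear stimulus reconstruction: ridge regression from lagged sensor
    # channels to the attended speech envelope (synthetic data throughout).
    import numpy as np

    rng = np.random.default_rng(5)
    T, C, L = 2000, 16, 10                   # samples, sensors, time lags
    envelope = np.abs(rng.normal(size=T))    # stand-in for a speech envelope
    # fake sensors: delayed copies of the envelope plus noise
    meg = np.stack([np.roll(envelope, k) for k in range(C)], axis=1)
    meg += rng.normal(scale=0.5, size=meg.shape)

    # lagged design matrix (wrap-around at the edges ignored for brevity)
    X = np.hstack([np.roll(meg, lag, axis=0) for lag in range(L)])
    lam = 1.0                                # ridge regularization strength
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

    r = np.corrcoef(X @ w, envelope)[0, 1]   # reconstruction accuracy
    print(f"reconstruction accuracy r = {r:.2f}")
    ```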

  6. Classroom Applications: Will Taping and Replaying Their Speeches Help My Fourth Grade Students with Organizing Their Written Work?

    ERIC Educational Resources Information Center

    Wirth, Harold E.

    A fourth-grade teacher developed a unit on writing designed to help his students go from oral to written text after finding that only 4 of the 22 in his classroom had the organizational and writing skills to get their ideas on paper. The basis of the unit was a unique problem which the teacher himself was trying to solve in real life; namely, how…

  7. Translations on Vietnam. Number 1892. Material on the Fourth Vietnam Workers Party Congress (Selected Speeches)

    DTIC Science & Technology

    1977-02-24

    the political struggle against the enemy's conscription of troops, fight the enemy in their strongholds in Saigon, Hue and Da Nang and...and feudal cultures. The state-operated cultural and art units and the cinematography sector must be the main force units in the establishment and

  8. Methodology for speech assessment in the Scandcleft project--an international randomized clinical trial on palatal surgery: experiences from a pilot study.

    PubMed

    Lohmander, A; Willadsen, E; Persson, C; Henningsson, G; Bowden, M; Hutters, B

    2009-07-01

    To present the methodology for speech assessment in the Scandcleft project and discuss issues from a pilot study. Description of methodology and blinded test for speech assessment. Speech samples and instructions for data collection and analysis for comparisons of speech outcomes across five included languages were developed and tested. Randomly selected video recordings of 10 5-year-old children from each language (n = 50) were included in the project. Speech material consisted of test consonants in single words, connected speech, and syllable chains with nasal consonants. Five experienced speech and language pathologists participated as observers. Narrow phonetic transcription of test consonants was translated into cleft speech characteristics, together with ordinal-scale ratings of resonance and perceived velopharyngeal closure (VPC). A velopharyngeal composite score (VPC-sum) was extrapolated from the raw data, and intra-rater agreement comparisons were performed. Intra-rater agreement for the consonant analysis ranged from 53% to 89%; for hypernasality on high vowels in single words it ranged from 20% to 80%; and agreement between the VPC-sum and the overall rating of VPC was 78%. Pooling data of speakers of different languages in the same trial and comparing speech outcome across trials seems possible if the assessment of speech concerns consonants and is confined to speech units that are phonetically similar across languages. Agreed conventions and rules are important. A composite variable for perceptual assessment of velopharyngeal function during speech seems usable, whereas the method for hypernasality evaluation requires further testing.

  9. A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments

    PubMed Central

    Colburn, H. Steven

    2016-01-01

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. PMID:27698261
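
    A minimal sketch of the model's central step, assuming a frontal target so that the two ear signals are already equalized and cancellation reduces to subtraction; the 6-dB decision threshold is our own illustrative choice, not a value from the paper:

        import numpy as np

        def ec_binary_mask(left_tf, right_tf, threshold_db=6.0):
            """Equalization-Cancellation per T-F unit: cancel the (equalized) target
            and compare input vs. residual energy. A large energy drop means the
            unit was target-dominated. left_tf, right_tf: complex STFTs (freq x time)."""
            input_energy = 0.5 * (np.abs(left_tf) ** 2 + np.abs(right_tf) ** 2)
            residual = left_tf - right_tf              # cancellation removes the target
            residual_energy = np.abs(residual) ** 2
            drop_db = 10 * np.log10((input_energy + 1e-12) / (residual_energy + 1e-12))
            return drop_db > threshold_db              # binary decision rule per unit

        # Toy check: identical (frontal) target at both ears plus weak independent noise.
        rng = np.random.default_rng(1)
        tgt = rng.standard_normal((64, 100)) + 1j * rng.standard_normal((64, 100))
        nl = 0.1 * (rng.standard_normal((64, 100)) + 1j * rng.standard_normal((64, 100)))
        nr = 0.1 * (rng.standard_normal((64, 100)) + 1j * rng.standard_normal((64, 100)))
        mask = ec_binary_mask(tgt + nl, tgt + nr)
        print("fraction of units marked target-dominated:", mask.mean())  # close to 1 here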

  10. A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments.

    PubMed

    Mi, Jing; Colburn, H Steven

    2016-10-03

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model.

  11. Speech perception in individuals with auditory dys-synchrony: effect of lengthening of voice onset time and burst duration of speech segments.

    PubMed

    Kumar, U A; Jayaram, M

    2013-07-01

    The purpose of this study was to evaluate the effect of lengthening of voice onset time and burst duration of selected speech stimuli on perception by individuals with auditory dys-synchrony. This is the second of a series of articles reporting the effect of signal enhancing strategies on speech perception by such individuals. Two experiments were conducted: (1) assessment of the 'just-noticeable difference' for voice onset time and burst duration of speech sounds; and (2) assessment of speech identification scores when speech sounds were modified by lengthening the voice onset time and the burst duration in units of one just-noticeable difference, both in isolation and in combination with each other and with transition-duration modification. Lengthening of voice onset time as well as burst duration improved perception of voicing. However, the effect of voice onset time modification was greater than that of burst duration modification. Although combined lengthening of voice onset time, burst duration and transition duration resulted in improved speech perception, the improvement was less than that due to lengthening of transition duration alone. These results suggest that innovative speech processing strategies that enhance temporal cues may benefit individuals with auditory dys-synchrony.
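
    One simple way to realize the voice-onset-time lengthening described above is to tile samples from the aspiration interval of a recorded stop; the sample indices and the 10-ms just-noticeable difference below are hypothetical, not values from the study:

        import numpy as np

        def lengthen_vot(signal, burst_end, voice_onset, extra):
            """Lengthen voice onset time by tiling samples from the aspiration
            interval [burst_end, voice_onset) until `extra` samples are inserted."""
            aspiration = signal[burst_end:voice_onset]
            reps = int(np.ceil(extra / len(aspiration)))
            insert = np.tile(aspiration, reps)[:extra]
            return np.concatenate([signal[:voice_onset], insert, signal[voice_onset:]])

        # Hypothetical /ka/ token at 16 kHz: burst ends at 5 ms, voicing starts at
        # 40 ms; add one assumed just-noticeable difference of 10 ms (160 samples).
        fs = 16000
        token = np.random.randn(fs // 2)   # placeholder waveform
        longer = lengthen_vot(token, burst_end=80, voice_onset=640, extra=160)
        print(len(longer) - len(token))    # -> 160 samples added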

  12. The Role of Supralexical Prosodic Units in Speech Production: Evidence from the Distribution of Speech Errors

    ERIC Educational Resources Information Center

    Choe, Wook Kyung

    2013-01-01

    The current dissertation represents one of the first systematic studies of the distribution of speech errors within supralexical prosodic units. Four experiments were conducted to gain insight into the specific role of these units in speech planning and production. The first experiment focused on errors in adult English. These were found to be…

  13. Speech motor development: Integrating muscles, movements, and linguistic units.

    PubMed

    Smith, Anne

    2006-01-01

    A fundamental problem for those interested in human communication is to determine how ideas and the various units of language structure are communicated through speaking. The physiological concepts involved in the control of muscle contraction and movement are theoretically distant from the processing levels and units postulated to exist in language production models. A review of the literature on adult speakers suggests that they engage complex, parallel processes involving many units, including sentence, phrase, syllable, and phoneme levels. Infants must develop multilayered interactions among language and motor systems. This discussion describes recent studies of speech motor performance relative to varying linguistic goals during the childhood, teenage, and young adult years. Studies of the developing interactions between speech motor and language systems reveal both qualitative and quantitative differences between the developing and the mature systems. These studies provide an experimental basis for a more comprehensive theoretical account of how mappings between units of language and units of action are formed and how they function. Readers will be able to: (1) understand the theoretical differences between models of speech motor control and models of language processing, as well as the nature of the concepts used in the two different kinds of models, (2) explain the concept of coarticulation and state why this phenomenon has confounded attempts to determine the role of linguistic units, such as syllables and phonemes, in speech production, (3) describe the development of speech motor performance skills and specify quantitative and qualitative differences between speech motor performance in children and adults, and (4) describe experimental methods that allow scientists to study speech and limb motor control, as well as compare units of action used to study non-speech and speech movements.

  14. Teaching American Diplomacy Using Primary Sources: Cuba

    ERIC Educational Resources Information Center

    Kraft, Michael; Anderson, David J.; Starbird, Caroline; Ertenberg, Samantha

    2005-01-01

    The purpose of this book is to allow high school students to examine the relationship between Cuba and the United States by studying a rich collection of primary materials and classroom-ready lessons which incorporate those materials. This book contains materials from 27 primary sources, including texts of speeches before the House and Senate,…

  15. Writing for the Ear: Strengthening Oral Style in Manuscript Speeches

    ERIC Educational Resources Information Center

    Bruss, Kristine

    2012-01-01

    Public speaking texts typically advise speakers to avoid using a manuscript. Speaking from a manuscript can limit eye contact, reduce expressiveness, and bore listeners. The ideal, rather, is to sound conversational. Conversational style is inclusive, suggesting that a speaker is "of the people," united in understanding, values, and purpose. If a…

  16. Developing a corpus of spoken language variability

    NASA Astrophysics Data System (ADS)

    Carmichael, Lesley; Wright, Richard; Wassink, Alicia Beckford

    2003-10-01

    We are developing a novel, searchable corpus as a research tool for investigating phonetic and phonological phenomena across various speech styles. Five speech styles have been well studied independently in previous work: reduced (casual), careful (hyperarticulated), citation (reading), Lombard effect (speech in noise), and ``motherese'' (child-directed speech). Few studies to date have collected a wide range of styles from a single set of speakers, and fewer yet have provided publicly available corpora. The pilot corpus includes recordings of (1) a set of speakers participating in a variety of tasks designed to elicit the five speech styles, and (2) casual peer conversations and wordlists to illustrate regional vowels. The data include high-quality recordings and time-aligned transcriptions linked to text files that can be queried. Initial measures drawn from the database provide comparison across speech styles along the following acoustic dimensions: MLU (changes in unit duration); relative intra-speaker intensity changes (mean and dynamic range); and intra-speaker pitch values (minimum, maximum, mean, range). The corpus design will allow for a variety of analyses requiring control of demographic and style factors, including hyperarticulation variety, disfluencies, intonation, discourse analysis, and detailed spectral measures.
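
    The per-recording summary measures listed above (unit duration, intra-speaker intensity mean and dynamic range, and pitch minimum/maximum/mean/range) can be sketched with librosa; the pitch search range and the file name are assumptions, not values from the corpus design:

        import numpy as np
        import librosa

        def style_measures(path):
            """Per-recording acoustic summary: duration, intensity stats, pitch stats."""
            y, sr = librosa.load(path, sr=None)
            rms = librosa.feature.rms(y=y)[0]
            intensity_db = 20 * np.log10(rms + 1e-10)
            f0, voiced, _ = librosa.pyin(y, fmin=75, fmax=400, sr=sr)
            f0 = f0[voiced]                       # keep voiced frames only
            return {
                "duration_s": len(y) / sr,
                "intensity_mean_db": float(np.mean(intensity_db)),
                "intensity_range_db": float(np.ptp(intensity_db)),
                "f0_min": float(np.nanmin(f0)),
                "f0_max": float(np.nanmax(f0)),
                "f0_mean": float(np.nanmean(f0)),
                "f0_range": float(np.nanmax(f0) - np.nanmin(f0)),
            }

        # print(style_measures("casual_speaker01.wav"))  # hypothetical file name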

  17. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems.

    PubMed

    Greene, Beth G; Logan, John S; Pisoni, David B

    1986-03-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered.

  18. Speech Synthesis Applied to Language Teaching.

    ERIC Educational Resources Information Center

    Sherwood, Bruce

    1981-01-01

    The experimental addition of speech output to computer-based Esperanto lessons using speech synthesized from text is described. Because of Esperanto's phonetic spelling and simple rhythm, it is particularly easy to describe the mechanisms of Esperanto synthesis. Attention is directed to how the text-to-speech conversion is performed and the ways…

  19. Text as a Supplement to Speech in Young and Older Adults

    PubMed Central

    Krull, Vidya; Humes, Larry E.

    2015-01-01

    Objective The purpose of this experiment was to quantify the contribution of visual text to auditory speech recognition in background noise. Specifically, we tested the hypothesis that partially accurate visual text from an automatic speech recognizer could be used successfully to supplement speech understanding in difficult listening conditions in older adults, with normal or impaired hearing. Our working hypotheses were based on what is known regarding audiovisual speech perception in the elderly from speechreading literature. We hypothesized that: 1) combining auditory and visual text information will result in improved recognition accuracy compared to auditory or visual text information alone; 2) benefit from supplementing speech with visual text (auditory and visual enhancement) in young adults will be greater than that in older adults; and 3) individual differences in performance on perceptual measures would be associated with cognitive abilities. Design Fifteen young adults with normal hearing, fifteen older adults with normal hearing, and fifteen older adults with hearing loss participated in this study. All participants completed sentence recognition tasks in auditory-only, text-only, and combined auditory-text conditions. The auditory sentence stimuli were spectrally shaped to restore audibility for the older participants with impaired hearing. All participants also completed various cognitive measures, including measures of working memory, processing speed, verbal comprehension, perceptual and cognitive speed, processing efficiency, inhibition, and the ability to form wholes from parts. Group effects were examined for each of the perceptual and cognitive measures. Audiovisual benefit was calculated relative to performance on auditory-only and visual-text only conditions. Finally, the relationship between perceptual measures and other independent measures was examined using principal-component factor analyses, followed by regression analyses. Results Both young and older adults performed similarly on nine out of ten perceptual measures (auditory, visual, and combined measures). Combining degraded speech with partially correct text from an automatic speech recognizer improved the understanding of speech in both young and older adults, relative to both auditory- and text-only performance. In all subjects, cognition emerged as a key predictor for a general speech-text integration ability. Conclusions These results suggest that neither age nor hearing loss affected the ability of subjects to benefit from text when used to support speech, after ensuring audibility through spectral shaping. These results also suggest that the benefit obtained by supplementing auditory input with partially accurate text is modulated by cognitive ability, specifically lexical and verbal skills. PMID:26458131

  20. Perception of synthetic speech produced automatically by rule: Intelligibility of eight text-to-speech systems

    PubMed Central

    GREENE, BETH G.; LOGAN, JOHN S.; PISONI, DAVID B.

    2012-01-01

    We present the results of studies designed to measure the segmental intelligibility of eight text-to-speech systems and a natural speech control, using the Modified Rhyme Test (MRT). Results indicated that the voices tested could be grouped into four categories: natural speech, high-quality synthetic speech, moderate-quality synthetic speech, and low-quality synthetic speech. The overall performance of the best synthesis system, DECtalk-Paul, was equivalent to natural speech only in terms of performance on initial consonants. The findings are discussed in terms of recent work investigating the perception of synthetic speech under more severe conditions. Suggestions for future research on improving the quality of synthetic speech are also considered. PMID:23225916

  1. Choosing and Using Text-to-Speech Software

    ERIC Educational Resources Information Center

    Peters, Tom; Bell, Lori

    2007-01-01

    This article describes a computer-based technology for generating speech called text-to-speech (TTS). This software is ready for widespread use by libraries, other organizations, and individual users. It offers the affordable ability to turn just about any electronic text that is not image-based into an artificially spoken communication. The…

  2. Arab American Voices.

    ERIC Educational Resources Information Center

    Hall, Loretta

    Through speeches, newspaper accounts, poems, memoirs, interviews, and other materials by and about Arab Americans, this collection explores issues central to what it means to be of Arab descent in the United States today. Each of the entries is accompanied by an introduction, biographical and historical information, a glossary for the selection,…

  3. An experimental version of the MZT (speech-from-text) system with external F(sub 0) control

    NASA Astrophysics Data System (ADS)

    Nowak, Ignacy

    1994-12-01

    The experimental version of the Polish MZT speech-from-text system described in this article adds functions that make it possible to enter commands in edited orthographic text to control the phrase component and accentuation parameters. This makes it possible to generate a series of modified intonation contours in the texts spoken by the system. The effects obtained are made easier to control by a graphic illustration of the fundamental frequency pattern in the phrases most recently 'spoken' by the system. This version of the system was designed as a test prototype which will help us expand and refine our set of rules for automatic generation of intonation contours, which in turn will enable the fully automated speech-from-text system to generate speech with a more varied and precisely formed fundamental frequency pattern.

  4. Auditory Support in Linguistically Diverse Classrooms: Factors Related to Bilingual Text-to-Speech Use

    ERIC Educational Resources Information Center

    Van Laere, E.; Braak, J.

    2017-01-01

    Text-to-speech technology can act as an important support tool in computer-based learning environments (CBLEs) as it provides auditory input, next to on-screen text. Particularly for students who use a language at home other than the language of instruction (LOI) applied at school, text-to-speech can be useful. The CBLE E-Validiv offers content in…

  5. Identification of Pure-Tone Audiologic Thresholds for Pediatric Cochlear Implant Candidacy: A Systematic Review.

    PubMed

    de Kleijn, Jasper L; van Kalmthout, Ludwike W M; van der Vossen, Martijn J B; Vonck, Bernard M D; Topsakal, Vedat; Bruijnzeel, Hanneke

    2018-05-24

    Although current guidelines recommend cochlear implantation only for children with profound hearing impairment (HI) (>90 decibel [dB] hearing level [HL]), studies show that children with severe hearing impairment (>70-90 dB HL) could also benefit from cochlear implantation. To perform a systematic review to identify audiologic thresholds (in dB HL) that could serve as an audiologic candidacy criterion for pediatric cochlear implantation using 4 domains of speech and language development as independent outcome measures (speech production, speech perception, receptive language, and auditory performance). PubMed and Embase databases were searched up to June 28, 2017, to identify studies comparing speech and language development between children who were profoundly deaf using cochlear implants and children with severe hearing loss using hearing aids, because no studies are available directly comparing children with severe HI in both groups. If cochlear implant users with profound HI score better on speech and language tests than those with severe HI who use hearing aids, this outcome could support adjusting cochlear implantation candidacy criteria to lower audiologic thresholds. Literature search, screening, and article selection were performed using a predefined strategy. Article screening was executed independently by 4 authors in 2 pairs; consensus on article inclusion was reached by discussion between these 4 authors. This study is reported according to the Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) statement. Title and abstract screening of 2822 articles resulted in selection of 130 articles for full-text review. Twenty-one studies were selected for critical appraisal, resulting in selection of 10 articles for data extraction. Two studies formulated audiologic thresholds (in dB HLs) at which children could qualify for cochlear implantation: (1) at 4-frequency pure-tone average (PTA) thresholds of 80 dB HL or greater based on speech perception and auditory performance subtests and (2) at PTA thresholds of 88 and 96 dB HL based on a speech perception subtest. In 8 of the 18 outcome measures, children with profound HI using cochlear implants performed similarly to children with severe HI using hearing aids. Better performance of cochlear implant users was shown with a picture-naming test and a speech perception in noise test. Owing to large heterogeneity in study population and selected tests, it was not possible to conduct a meta-analysis. Studies indicate that lower audiologic thresholds (≥80 dB HL) than are advised in current national and manufacturer guidelines would be appropriate as audiologic candidacy criteria for pediatric cochlear implantation.
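
    The 4-frequency pure-tone average used in the review is a simple mean of audiometric thresholds, commonly taken at 0.5, 1, 2, and 4 kHz (the exact frequencies are an assumption here); the sketch below applies the ≥80 dB HL criterion reported above:

        def four_freq_pta(thresholds_db_hl):
            """Mean hearing threshold across four audiometric frequencies (dB HL)."""
            assert len(thresholds_db_hl) == 4
            return sum(thresholds_db_hl) / 4.0

        def ci_candidate(thresholds_db_hl, cutoff_db=80.0):
            """Candidacy rule suggested by the review: PTA at or above the cutoff."""
            return four_freq_pta(thresholds_db_hl) >= cutoff_db

        print(ci_candidate([70, 80, 90, 95]))   # PTA = 83.75 dB HL -> True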

  6. Visually Impaired Persons' Comprehension of Text Presented with Speech Synthesis.

    ERIC Educational Resources Information Center

    Hjelmquist, E.; And Others

    1992-01-01

    This study of 48 individuals with visual impairments (16 middle-aged with experience in synthetic speech, 16 middle-aged inexperienced, and 16 older inexperienced) found that speech synthesis, compared to natural speech, generally yielded lower results with respect to memory and understanding of texts. Experience had no effect on performance.…

  7. Stereotype and Tradition: White Folklore About Blacks (Volumes 1 and 2).

    ERIC Educational Resources Information Center

    Rosenberg, Neil Vandraegen

    The forms of white folklore about blacks in the United States are described. This folklore appears in most folklore genres, including folk speech, proverbs, riddles, beliefs, songs and narratives. Using texts submitted to various folklore archives over a 20-year period, this study analyzes the content and context of a large group of jokes and…

  8. Interventions in the Alteration on Lingual Frenum: Systematic Review

    PubMed Central

    Miranda, Priscilla Poliseni; Cardoso, Carolina Louise; Gomes, Erissandra

    2015-01-01

    Introduction  An altered lingual frenum restricts normal tongue mobility, which may affect the stomatognathic functions and cause anatomical, physiological, and social harm to the subject. Health professionals need to be aware of the evaluation, diagnostic, and treatment processes in current use to guide their intervention. Objective  To perform a systematic review of the treatment methods used in cases of lingual frenum alteration. Data Synthesis  The literature searches were conducted in MEDLINE, LILACS, SciELO, Cochrane and IBECS, delimited by language (Portuguese, English, Spanish), date of publication (January 2000 to January 2014), and studies performed in humans. Eligibility was verified in four steps: full-text availability, abstract review, text analysis, and final selection. Of the total 443 publications, 26 remained for analysis. A surgical approach was used in all studies, regardless of the study population (infants, children, and adults), with a range of tools and techniques employed; speech therapy was recommended postsurgically in 4 studies. Only 4 studies, all with infants, showed scientific evidence. Conclusion  Surgical intervention is effective for remitting the limitations caused by an altered lingual frenum, but studies of higher methodological quality are lacking. The reported benefits of postsurgical speech therapy include improvements in tongue mobility and speech articulation. PMID:27413412

  9. Attentional Gain Control of Ongoing Cortical Speech Representations in a “Cocktail Party”

    PubMed Central

    Kerlin, Jess R.; Shahin, Antoine J.; Miller, Lee M.

    2010-01-01

    Normal listeners possess the remarkable perceptual ability to select a single speech stream among many competing talkers. However, few studies of selective attention have addressed the unique nature of speech as a temporally extended and complex auditory object. We hypothesized that sustained selective attention to speech in a multi-talker environment would act as gain control on the early auditory cortical representations of speech. Using high-density electroencephalography and a template-matching analysis method, we found selective gain to the continuous speech content of an attended talker, greatest at a frequency of 4–8 Hz, in auditory cortex. In addition, the difference in alpha power (8–12 Hz) at parietal sites across hemispheres indicated the direction of auditory attention to speech, as has been previously found in visual tasks. The strength of this hemispheric alpha lateralization, in turn, predicted an individual’s attentional gain of the cortical speech signal. These results support a model of spatial speech stream segregation, mediated by a supramodal attention mechanism, enabling selection of the attended representation in auditory cortex. PMID:20071526

  10. Psychopathology of catatonic speech disorders and the dilemma of catatonia: a selective review.

    PubMed

    Ungvari, G S; White, E; Pang, A H

    1995-12-01

    Over the past decade there has been an upsurge of interest in the prevalence, nosological position, treatment response and pathophysiology of catatonia. However, the psychopathology of catatonia has received only scant attention. Once the hallmark of catatonia, speech disorders--particularly logorrhoea, verbigeration and echolalia--seem to have been neglected in modern literature. The aims of the present paper are to outline the conceptual history of catatonic speech disorders and to follow their development in contemporary clinical research. The English-language psychiatric literature for the last 60 years on logorrhoea, verbigeration and echolalia was searched through Medline and cross-referencing. Kahlbaum, Wernicke, Jaspers, Kraepelin, Bleuler, Kleist and Leonhard's oft cited classical texts supplemented the search. In contrast to classical psychopathological sources, very few recent papers were found on catatonic speech disorders. Current clinical research failed to incorporate the observations of traditional descriptive psychopathology. Modern catatonia research operates with simplified versions of psychopathological terms devised and refined by generations of classical writers.

  11. Recognition of Time-Compressed and Natural Speech with Selective Temporal Enhancements by Young and Elderly Listeners

    ERIC Educational Resources Information Center

    Gordon-Salant, Sandra; Fitzgibbons, Peter J.; Friedman, Sarah A.

    2007-01-01

    Purpose: The goal of this experiment was to determine whether selective slowing of speech segments improves recognition performance by young and elderly listeners. The hypotheses were (a) the benefits of time expansion occur for rapid speech but not for natural-rate speech, (b) selective time expansion of consonants produces greater score…

  12. Effects and modeling of phonetic and acoustic confusions in accented speech.

    PubMed

    Fung, Pascale; Liu, Yi

    2005-11-01

    Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using likelihood ratio test to measure phonetic confusion, and asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.
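
    A toy sketch of a likelihood-ratio measure of phonetic confusion, with single Gaussians standing in for the full acoustic models the paper would use; all distributions and data below are made up for illustration:

        import numpy as np
        from scipy.stats import multivariate_normal

        def confusion_score(frames, baseform, surface):
            """Average log-likelihood ratio of the surface vs. baseform phone models
            over one token's frames; large positive values indicate the accented
            pronunciation is better explained by the surface phone."""
            llr = surface.logpdf(frames) - baseform.logpdf(frames)
            return float(np.mean(llr))

        # Toy 2-D 'cepstral' models: /t/ baseform vs. a surface phone shifted toward /d/.
        baseform = multivariate_normal(mean=[0.0, 0.0], cov=np.eye(2))
        surface = multivariate_normal(mean=[1.0, 0.5], cov=np.eye(2))
        tokens = np.random.default_rng(2).normal([0.9, 0.4], 1.0, size=(50, 2))
        print(confusion_score(tokens, baseform, surface))  # > 0: high phonetic confusion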

  13. Datalink in air traffic management: Human factors issues in communications.

    PubMed

    Stedmon, Alex W; Sharples, Sarah; Littlewood, Robert; Cox, Gemma; Patel, Harshada; Wilson, John R

    2007-07-01

    This paper examines issues underpinning the potential move in aviation away from real speech radiotelephony (R/T) communications towards datalink communications involving text and synthetic speech communications. Using a novel air traffic control (ATC) task, two experiments are reported. Experiment 1 compared the use of speech and text while Experiment 2 compared the use of real and synthetic speech communications. Results indicated that generally there were no significant differences between speech and text communications and that either type could be used without any main effects on performance. However, a number of specific differences were observed across the different phases of the scenarios indicating that workload levels may be more varied when speech communications are used. Experiment 2 illustrated that participants placed a greater level of trust in real speech than synthetic speech, and trusted true communications more than false communications (regardless of whether they were real or synthetic voices). The findings are considered in terms of datalink initiatives for future air traffic management, the importance placed on real speech R/T communications, and the need to develop more natural synthetic speech in this application area.

  14. Talking Wheelchair

    NASA Technical Reports Server (NTRS)

    1981-01-01

    Communication is made possible for disabled individuals by means of an electronic system, developed at Stanford University's School of Medicine, which produces highly intelligible synthesized speech. The system is familiarly known as the "talking wheelchair" and formally as the Versatile Portable Speech Prosthesis (VPSP). The wheelchair-mounted system consists of a word processor, a video screen, a voice synthesizer, and a computer program which instructs the synthesizer how to produce intelligible sounds in response to user commands. The computer's memory contains 925 words plus a number of common phrases and questions, and it can also store several thousand other words of the user's choice. Message units are selected by operating a simple switch, joystick, or keyboard. The completed message appears on the video screen; the user then activates the speech synthesizer, which generates a voice with a somewhat mechanical tone. With the keyboard, an experienced user can construct messages as rapidly as 30 words per minute.

  15. Effects of Lexical Tone Contour on Mandarin Sentence Intelligibility

    ERIC Educational Resources Information Center

    Chen, Fei; Wong, Lena L. N.; Hu, Yi

    2014-01-01

    Purpose: This study examined the effects of lexical tone contour on the intelligibility of Mandarin sentences in quiet and in noise. Method: A text-to-speech synthesis engine was used to synthesize Mandarin sentences with each word carrying the original lexical tone, flat tone, or a tone randomly selected from the 4 Mandarin lexical tones. The…

  16. Annotating Socio-Cultural Structures in Text

    DTIC Science & Technology

    2012-10-31

    parts of speech (POS) within text, using the Stanford Part of Speech Tagger (Stanford Log-Linear, 2011). The ERDC-CERL taxonomy is then used to...annotated NP/VP Pane: Shows the sentence parsed using the Parts of Speech tagger Document View Pane: Specifies the document (being annotated) in three...first parsed using the Stanford Parts of Speech tagger and converted to an XML document both components which are done through the Import function

  17. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aimthikul, Y.

    This thesis reviews the essential aspects of speech synthesis and distinguishes between the two prevailing techniques: compressed digital speech and phonemic synthesis. It then presents the hardware details of the five speech modules evaluated. FORTRAN programs were written to facilitate message creation and retrieval with four of the modules driven by a PDP-11 minicomputer. The fifth module was driven directly by a computer terminal. The compressed digital speech modules (T.I. 990/306, T.S.I. Series 3D and N.S. Digitalker) each contain a limited vocabulary produced by the manufacturers while both the phonemic synthesizers made by Votrax permit an almost unlimited set of sounds and words. A text-to-phoneme rules program was adapted for the PDP-11 (running under the RSX-11M operating system) to drive the Votrax Speech Pac module. However, the Votrax Type'N Talk unit has its own built-in translator. Comparison of these modules revealed that the compressed digital speech modules were superior in pronouncing words on an individual basis but lacked the inflection capability that permitted the phonemic synthesizers to generate more coherent phrases. These findings were necessarily highly subjective and dependent on the specific words and phrases studied. In addition, the rapid introduction of new modules by manufacturers will necessitate new comparisons. However, the results of this research verified that all of the modules studied do possess reasonable quality of speech that is suitable for man-machine applications. Furthermore, the development tools are now in place to permit the addition of computer speech output in such applications.

  18. Using the self-select paradigm to delineate the nature of speech motor programming.

    PubMed

    Wright, David L; Robin, Don A; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H; Fox, Peter T

    2009-06-01

    The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllables within an utterance. INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance.

  19. An algorithm that improves speech intelligibility in noise for normal-hearing listeners.

    PubMed

    Kim, Gibak; Lu, Yang; Hu, Yi; Loizou, Philipos C

    2009-09-01

    Traditional noise-suppression algorithms have been shown to improve speech quality, but not speech intelligibility. Motivated by prior intelligibility studies of speech synthesized using the ideal binary mask, an algorithm is proposed that decomposes the input signal into time-frequency (T-F) units and makes binary decisions, based on a Bayesian classifier, as to whether each T-F unit is dominated by the target or the masker. Speech corrupted at low signal-to-noise ratio (SNR) levels (-5 and 0 dB) using different types of maskers is synthesized by this algorithm and presented to normal-hearing listeners for identification. Results indicated substantial improvements in intelligibility (over 60% points in -5 dB babble) over that attained by human listeners with unprocessed stimuli. The findings from this study suggest that algorithms that can estimate reliably the SNR in each T-F unit can improve speech intelligibility.
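
    The sketch below substitutes an oracle SNR rule (the ideal binary mask that motivated the paper) for the trained Bayesian classifier: each time-frequency unit is kept if its local SNR exceeds a criterion, and the masked mixture is resynthesized. The -5 dB criterion mirrors the SNR levels mentioned above but is otherwise our own choice:

        import numpy as np
        from scipy.signal import stft, istft

        def ideal_binary_mask(clean, noise, fs, lc_db=-5.0):
            """Oracle variant of the T-F selection rule: keep units whose local SNR
            exceeds the local criterion, zero the rest, then resynthesize."""
            _, _, S = stft(clean, fs)
            _, _, N = stft(noise, fs)
            snr_db = 10 * np.log10((np.abs(S) ** 2) / (np.abs(N) ** 2 + 1e-12) + 1e-12)
            mask = snr_db > lc_db                 # binary decision per T-F unit
            _, _, Y = stft(clean + noise, fs)
            _, out = istft(Y * mask, fs)
            return out

        fs = 16000
        t = np.arange(fs) / fs
        speechlike = np.sin(2 * np.pi * 200 * t) * (1 + np.sin(2 * np.pi * 4 * t))
        noise = np.random.default_rng(3).standard_normal(fs)
        enhanced = ideal_binary_mask(speechlike, noise, fs)
        print(len(enhanced))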

  20. A Survey of Speech Education in United States Two-Year Colleges.

    ERIC Educational Resources Information Center

    Planck, Carolyn Roberts

    The status of speech education in all United States two-year colleges is discussed. Both public and private schools are examined. Two separate studies were conducted, each utilizing the same procedure. The specific aspects with which the research was concerned were: (1) availability of speech courses, (2) departmentalization of speech courses, (3)…

  1. English Intonation and Computerized Speech Synthesis. Technical Report No. 287.

    ERIC Educational Resources Information Center

    Levine, Arvin

    This work treats some of the important issues encountered in an attempt to synthesize natural sounding English speech from arbitrary written text. Details of the systems that interact in producing speech are described. The principal systems dealt with are phonology (intonation), phonetics, syntax, semantics, and text-view (discourse). Technical…

  2. "Look What I Did!": Student Conferences with Text-to-Speech Software

    ERIC Educational Resources Information Center

    Young, Chase; Stover, Katie

    2014-01-01

    The authors describe a strategy that empowers students to edit and revise their own writing. Students input their writing into text-to-speech software that reads the text back aloud. While listening, students make necessary revisions and edits.

  3. Mechanisms Underlying Selective Neuronal Tracking of Attended Speech at a ‘Cocktail Party’

    PubMed Central

    Zion Golumbic, Elana M.; Ding, Nai; Bickel, Stephan; Lakatos, Peter; Schevon, Catherine A.; McKhann, Guy M.; Goodman, Robert R.; Emerson, Ronald; Mehta, Ashesh D.; Simon, Jonathan Z.; Poeppel, David; Schroeder, Charles E.

    2013-01-01

    The ability to focus on and understand one talker in a noisy social environment is a critical social-cognitive capacity, whose underlying neuronal mechanisms are unclear. We investigated the manner in which speech streams are represented in brain activity and the way that selective attention governs the brain’s representation of speech using a ‘Cocktail Party’ Paradigm, coupled with direct recordings from the cortical surface in surgical epilepsy patients. We find that brain activity dynamically tracks speech streams using both low frequency phase and high frequency amplitude fluctuations, and that optimal encoding likely combines the two. In and near low level auditory cortices, attention ‘modulates’ the representation by enhancing cortical tracking of attended speech streams, but ignored speech remains represented. In higher order regions, the representation appears to become more ‘selective,’ in that there is no detectable tracking of ignored speech. This selectivity itself seems to sharpen as a sentence unfolds. PMID:23473326

  4. Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural perspective

    PubMed Central

    Zion Golumbic, Elana M.; Poeppel, David; Schroeder, Charles E.

    2012-01-01

    The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the ‘Cocktail Party’ effect. Yet, the neural mechanisms underlying on-line speech decoding and attentional stream selection are not well understood. We review findings from behavioral and neurophysiological investigations that underscore the importance of the temporal structure of speech for achieving these perceptual feats. We discuss the hypothesis that entrainment of ambient neuronal oscillations to speech’s temporal structure, across multiple time-scales, serves to facilitate its decoding and underlies the selection of an attended speech stream over other competing input. In this regard, speech decoding and attentional stream selection are examples of ‘active sensing’, emphasizing an interaction between proactive and predictive top-down modulation of neuronal dynamics and bottom-up sensory input. PMID:22285024

  5. Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems

    NASA Technical Reports Server (NTRS)

    Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan

    2010-01-01

    A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exists in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will improve both crewmember usability and operational efficiency, offering fast data/text entry in a small, lightweight package. In addition, this design will free the hands and eyes of a suited crewmember. The system components and steps include beam forming/multi-channel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (Hidden Markov Model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed. They can help real-time ASR system designers select proper tasks in the face of computational resource constraints.
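
    A minimal sketch of the first stage named above, delay-and-sum beamforming for distant speech acquisition; the array geometry, integer-sample steering delays, and noise levels are illustrative, not the system's actual design:

        import numpy as np

        def delay_and_sum(channels, delays_samples):
            """Align each microphone channel by its steering delay and average.
            channels: (n_mics, n_samples). Aligned speech adds coherently while
            spatially diffuse noise does not, raising the SNR."""
            n_mics, _ = channels.shape
            out = np.zeros(channels.shape[1])
            for ch, d in zip(channels, delays_samples):
                out += np.roll(ch, -d)       # crude integer-sample alignment
            return out / n_mics

        # Toy 4-mic array: the same speech delayed per mic, plus independent noise.
        rng = np.random.default_rng(4)
        speech = rng.standard_normal(16000)
        delays = [0, 2, 4, 6]
        mics = np.stack([np.roll(speech, d) + 0.5 * rng.standard_normal(16000)
                         for d in delays])
        enhanced = delay_and_sum(mics, delays)
        print("noise power in, out:", np.var(mics[0] - speech), np.var(enhanced - speech))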

  6. Speech intelligibility in hospitals.

    PubMed

    Ryherd, Erica E; Moeller, Michael; Hsu, Timothy

    2013-07-01

    Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75) and several locations found to have "poor" intelligibility (SII < 0.45). Further, occupied spaces were found to have 10%-15% lower SII than unoccupied spaces on average. Additionally, staff perception of communication problems at nurse stations was significantly correlated with SII ratings. In a targeted second phase, a unit treated with sound absorption had higher SII ratings for a larger percentage of time as compared to an identical untreated unit. Taken as a whole, the study provides an extensive baseline evaluation of speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.
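
    A simplified sketch of the Speech Intelligibility Index referenced above: per-band audibility is derived from the band SNR and weighted by band importance, following the general form of ANSI S3.5. The band list and weights below are illustrative, not the standard's tables, and the real standard adds level and masking corrections:

        import numpy as np

        def simple_sii(snr_db_by_band, importance):
            """Simplified SII: band SNR clipped to [-15, +15] dB, rescaled to [0, 1]
            audibility, then weighted by band-importance weights summing to 1."""
            snr = np.clip(np.asarray(snr_db_by_band, float), -15.0, 15.0)
            audibility = (snr + 15.0) / 30.0
            return float(np.dot(audibility, importance))

        # Illustrative 4-band example around a noisy nurse station.
        snr = [3.0, -2.0, -8.0, -12.0]      # dB SNR per band
        weights = [0.2, 0.35, 0.3, 0.15]    # hypothetical importance weights
        print(simple_sii(snr, weights))     # ~0.36; below 0.45 counts as "poor"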

  7. The Role of Speech Prosody and Text Reading Prosody in Children's Reading Comprehension

    ERIC Educational Resources Information Center

    Veenendaal, Nathalie J.; Groen, Margriet A.; Verhoeven, Ludo

    2014-01-01

    Background: Text reading prosody has been associated with reading comprehension. However, text reading prosody is a reading-dependent measure that relies heavily on decoding skills. Investigation of the contribution of speech prosody--which is independent from reading skills--in addition to text reading prosody, to reading comprehension could…

  8. Systematic review of compound action potentials as predictors for cochlear implant performance.

    PubMed

    van Eijl, Ruben H M; Buitenhuis, Patrick J; Stegeman, Inge; Klis, Sjaak F L; Grolman, Wilko

    2017-02-01

    The variability in speech perception between cochlear implant users is thought to result from the degeneration of the auditory nerve. Degeneration of the auditory nerve, histologically assessed, correlates with electrophysiologically acquired measures, such as electrically evoked compound action potentials (eCAPs) in experimental animals. To predict degeneration of the auditory nerve in humans, where histology is impossible, this paper reviews the correlation between speech perception and eCAP recordings in cochlear implant patients. The data sources were PubMed and Embase. We performed a systematic search for articles containing the following major themes: cochlear implants, evoked potentials, and speech perception. Two investigators independently conducted title-abstract screening, full-text screening, and critical appraisal. Data were extracted from the remaining articles. Twenty-five of 1,429 identified articles described a correlation between speech perception and eCAP attributes. Due to study heterogeneity, a meta-analysis was not feasible, and studies were descriptively analyzed. Several studies investigating presence of the eCAP, recovery time constant, slope of the amplitude growth function, and spatial selectivity showed significant correlations with speech perception. In contrast, neural adaptation, eCAP threshold, and change with varying interphase gap did not significantly correlate with speech perception in any of the identified studies. Significant correlations between speech perception and parameters obtained through eCAP recordings have been documented in the literature; however, reporting was ambiguous. There is insufficient evidence for eCAPs as a predictive factor for speech perception. More research is needed to further investigate this relation.

  9. Eisenhower's "Atoms for Peace" Speech: A Case Study in the Strategic Use of Language.

    ERIC Educational Resources Information Center

    Medhurst, Martin J.

    1987-01-01

    Examines speech delivered by President Eisenhower to General Assembly of the United Nations in December 1953. Demonstrates how a complex rhetorical situation resulted in the crafting and exploitation of a public policy address. Speech bolstered international image of the United States as peacemaker, warned the Soviets against a preemptive nuclear…

  10. New Measures of Masked Text Recognition in Relation to Speech-in-Noise Perception and Their Associations with Age and Cognitive Abilities

    ERIC Educational Resources Information Center

    Besser, Jana; Zekveld, Adriana A.; Kramer, Sophia E.; Ronnberg, Jerker; Festen, Joost M.

    2012-01-01

    Purpose: In this research, the authors aimed to increase the analogy between Text Reception Threshold (TRT; Zekveld, George, Kramer, Goverts, & Houtgast, 2007) and Speech Reception Threshold (SRT; Plomp & Mimpen, 1979) and to examine the TRT's value in estimating cognitive abilities that are important for speech comprehension in noise. Method: The…

  11. Model Common-Core Unit Piloted for ELL Teachers

    ERIC Educational Resources Information Center

    Maxwell, Lesli A.

    2013-01-01

    Seventh and 8th grade English-learners in selected urban schools will soon dive into some of the most celebrated speeches in U.S. history. They'll dissect, for example, Abraham Lincoln's Gettysburg Address, Martin Luther King Jr.'s "I Have a Dream," and Robert F. Kennedy's "On the Death of Martin Luther King." Though their…

  12. The Underpinnings of American Foreign Policy

    DTIC Science & Technology

    2010-03-01

    speeches, those given in December 2009, first at the United States Military Academy at West Point and later as he accepted his Nobel Peace Prize...Nobel Acceptance Speech, December 10th, 2009. President Obama's Nobel Peace Prize acceptance speech in December 2009 dismayed many of his

  13. (abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences

    NASA Technical Reports Server (NTRS)

    Scott, Kenneth C.

    1994-01-01

    We are developing a system for synthesizing image sequences that simulate the facial motion of a speaker. To perform this synthesis, we are pursuing two major areas of effort. We are developing the necessary computer graphics technology to synthesize a realistic image sequence of a person speaking selected speech sequences. Next, we are developing a model that expresses the relation between spoken phonemes and face/mouth shape. A subject is video taped speaking an arbitrary text that contains expression of the full list of desired database phonemes. The subject is video taped from the front speaking normally, recording both audio and video detail simultaneously. Using the audio track, we identify the specific video frames on the tape relating to each spoken phoneme. From this range we digitize the video frame which represents the extreme of mouth motion/shape. Thus, we construct a database of images of face/mouth shape related to spoken phonemes. A selected audio speech sequence is recorded which is the basis for synthesizing a matching video sequence; the speaker need not be the same as used for constructing the database. The audio sequence is analyzed to determine the spoken phoneme sequence and the relative timing of the enunciation of those phonemes. Synthesizing an image sequence corresponding to the spoken phoneme sequence is accomplished using a graphics technique known as morphing. Image sequence keyframes necessary for this processing are based on the spoken phoneme sequence and timing. We have been successful in synthesizing the facial motion of a native English speaker for a small set of arbitrary speech segments. Our future work will focus on advancement of the face shape/phoneme model and independent control of facial features.
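
    Full morphing warps mouth geometry as well as pixel intensity; the simplest stand-in for the keyframe interpolation described above is a cross-dissolve between successive phoneme keyframes, sketched here with hypothetical images:

        import numpy as np

        def cross_dissolve(key_a, key_b, n_frames):
            """Generate intermediate frames between two phoneme keyframe images by
            linear blending. (True morphing also warps the mouth geometry; this
            sketch keeps only the intensity blend.)"""
            frames = []
            for i in range(n_frames):
                w = i / (n_frames - 1)
                frames.append((1 - w) * key_a + w * key_b)
            return np.stack(frames)

        # Two hypothetical 64x64 grayscale keyframes, e.g. /m/ (closed) to /a/ (open).
        key_m = np.zeros((64, 64))
        key_a = np.ones((64, 64))
        sequence = cross_dissolve(key_m, key_a, n_frames=8)
        print(sequence.shape)   # (8, 64, 64)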

  14. Mistaking minds and machines: How speech affects dehumanization and anthropomorphism.

    PubMed

    Schroeder, Juliana; Epley, Nicholas

    2016-11-01

    Treating a human mind like a machine is an essential component of dehumanization, whereas attributing a humanlike mind to a machine is an essential component of anthropomorphism. Here we tested how a cue closely connected to a person's actual mental experience (a humanlike voice) affects the likelihood of mistaking a person for a machine, or a machine for a person. We predicted that paralinguistic cues in speech are particularly likely to convey the presence of a humanlike mind, such that removing voice from communication (leaving only text) would increase the likelihood of mistaking the text's creator for a machine. Conversely, adding voice to a computer-generated script (resulting in speech) would increase the likelihood of mistaking the text's creator for a human. Four experiments confirmed these hypotheses, demonstrating that people are more likely to infer a human (vs. computer) creator when they hear a voice expressing thoughts than when they read the same thoughts in text. Adding human visual cues to text (i.e., seeing a person perform a script in a subtitled video clip) did not increase the likelihood of inferring a human creator compared with only reading text, suggesting that defining features of personhood may be conveyed more clearly in speech (Experiments 1 and 2). Removing the naturalistic paralinguistic cues that convey humanlike capacity for thinking and feeling, such as varied pace and intonation, eliminates the humanizing effect of speech (Experiment 4). We discuss implications for dehumanizing others through text-based media, and for anthropomorphizing machines through speech-based media.

  15. Free Speech Yearbook 1980.

    ERIC Educational Resources Information Center

    Kane, Peter E., Ed.

    The 11 articles in this collection deal with theoretical and practical freedom of speech issues. The topics covered are (1) the United States Supreme Court and communication theory; (2) truth, knowledge, and a democratic respect for diversity; (3) denial of freedom of speech in Jock Yablonski's campaign for the presidency of the United Mine…

  16. Creating speech-synchronized animation.

    PubMed

    King, Scott A; Parent, Richard E

    2005-01-01

    We present a facial model designed primarily to support animated speech. Our facial model takes facial geometry as input and transforms it into a parametric deformable model. The facial model uses a muscle-based parameterization, allowing for easier integration between speech synchrony and facial expressions. Our facial model has a highly deformable lip model that is grafted onto the input facial geometry to provide the geometric complexity needed for creating lip shapes and high-quality renderings. Our facial model also includes a highly deformable tongue model that can represent the shapes the tongue undergoes during speech. We add teeth, gums, and upper palate geometry to complete the inner mouth. To decrease the processing time, we hierarchically deform the facial surface. We also present a method to animate the facial model over time to create animated speech, using a model of coarticulation that blends visemes together using dominance functions. We treat visemes as a dynamic shaping of the vocal tract by describing visemes as curves instead of keyframes. We show the utility of the techniques described in this paper by implementing them in a text-to-audiovisual-speech system that creates animation of speech from unrestricted text. The facial and coarticulation models must first be interactively initialized; the system then automatically creates accurate real-time animated speech from the input text. It can produce large amounts of animated speech cheaply, with very low resource requirements.
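
    The coarticulation model referenced here follows the dominance-function approach of Cohen and Massaro, in which each viseme exerts an exponentially decaying influence over time and each facial parameter is the dominance-weighted average of viseme targets. A small illustrative sketch (the constants and target values are invented for the example, not taken from the paper):

        import math

        def dominance(t, center, alpha=1.0, theta=8.0, c=1.0):
            # Exponentially decaying influence of a viseme centered at `center`.
            return alpha * math.exp(-theta * abs(t - center) ** c)

        def blended_parameter(t, visemes):
            """visemes: list of (center_time, target_value) for one facial parameter.

            Returns the dominance-weighted average of targets at time t.
            """
            weights = [dominance(t, c0) for c0, _ in visemes]
            total = sum(weights)
            return sum(w * v for w, (_, v) in zip(weights, visemes)) / total

        # Lip-rounding parameter through a rounded then a spread viseme (toy targets).
        track = [(0.10, 1.0), (0.25, 0.0)]
        samples = [round(blended_parameter(t / 100, track), 3) for t in range(0, 35, 5)]

    Because neighboring visemes always contribute some weight, parameter trajectories overlap smoothly in time, which is exactly the coarticulation effect the dominance formulation is meant to capture.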

  17. Speech Language Assessments in Te Reo in a Primary School Maori Immersion Unit

    ERIC Educational Resources Information Center

    Naidoo, Kershni

    2012-01-01

    This research originated from the need for a speech and language therapy assessment in te reo Maori for a particular child who attended a Maori immersion unit. A Speech and Language Therapy te reo assessment had already been developed but it needed to be revised and normative data collected. Discussions and assessments were carried out in a…

  18. Brain-to-text: decoding spoken phrases from phone representations in the brain.

    PubMed

    Herff, Christian; Heger, Dominic; de Pesters, Adriana; Telaar, Dominic; Brunner, Peter; Schalk, Gerwin; Schultz, Tanja

    2015-01-01

    It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
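
    Brain-To-Text borrows its machinery from ASR: per-phone models are scored frame by frame and the best phone path is decoded. A toy Viterbi decode over stand-in phone likelihoods, sketched in Python (the phone inventory, scores, and transition matrix are invented placeholders, not the paper's models):

        import numpy as np

        phones = ["SIL", "HH", "AH", "L", "OW"]  # toy phone inventory
        T, P = 12, len(phones)
        rng = np.random.default_rng(0)

        # Stand-in for per-frame phone log-likelihoods from ECoG feature models.
        loglik = np.log(rng.dirichlet(np.ones(P), size=T))

        # Simple transition model: strong self-loops, uniform elsewhere.
        logtrans = np.log(np.full((P, P), 0.05 / (P - 1))
                          + np.eye(P) * (0.95 - 0.05 / (P - 1)))

        # Viterbi decoding: best phone path given frame scores and transitions.
        delta = loglik[0].copy()
        back = np.zeros((T, P), dtype=int)
        for t in range(1, T):
            scores = delta[:, None] + logtrans    # scores[i, j]: end in j via i
            back[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + loglik[t]
        path = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        decoded = [phones[i] for i in reversed(path)]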

  19. Brain-to-text: decoding spoken phrases from phone representations in the brain

    PubMed Central

    Herff, Christian; Heger, Dominic; de Pesters, Adriana; Telaar, Dominic; Brunner, Peter; Schalk, Gerwin; Schultz, Tanja

    2015-01-01

    It has long been speculated whether communication between humans and machines based on natural speech related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones or one of a few isolated words. However, until now it remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity during speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech. PMID:26124702

  20. Processing load induced by informational masking is related to linguistic abilities.

    PubMed

    Koelewijn, Thomas; Zekveld, Adriana A; Festen, Joost M; Rönnberg, Jerker; Kramer, Sophia E

    2012-01-01

    It is often assumed that the benefit of hearing aids is not primarily reflected in better speech performance, but rather in less effortful listening in the aided than in the unaided condition. Before such a hearing aid benefit can be assessed, it must be established how processing load while listening to masked speech relates to inter-individual differences in cognitive abilities relevant for language processing; the present study examined this question. Pupil dilation was measured in thirty-two normal hearing participants while they listened to sentences masked by fluctuating noise or interfering speech at either 50% or 84% intelligibility. Additionally, working memory capacity, inhibition of irrelevant information, and written text reception were tested. Pupil responses were larger during interfering speech than during fluctuating noise, and this effect was independent of intelligibility level. Regression analysis revealed that high working memory capacity, better inhibition, and better text reception were related to better speech reception thresholds. Apart from their positive relation to speech recognition, better inhibition and better text reception were also positively related to larger pupil dilation in the single-talker masker conditions. We conclude that better cognitive abilities not only relate to better speech perception, but also partly explain higher processing load in complex listening conditions.

  1. Speech in the Junior High School. Michigan Speech Association Curriculum Guide Series, No. 4.

    ERIC Educational Resources Information Center

    Herman, Deldee; Ratliffe, Sharon

    Designed to provide the student with experience in oral communication, this curriculum guide presents a one-semester speech course for junior high school students with "normal" rather than defective speech. The eight units cover speech in social interaction; group discussion and business meetings; demonstrations and reports; creative dramatics;…

  2. [Combining speech sample and feature bilateral selection algorithm for classification of Parkinson's disease].

    PubMed

    Zhang, Xiaoheng; Wang, Lirui; Cao, Yao; Wang, Pin; Zhang, Cheng; Yang, Liuyang; Li, Yongming; Zhang, Yanling; Cheng, Oumei

    2018-02-01

    Diagnosis of Parkinson's disease (PD) based on speech data has proved effective in recent years. However, current research focuses on feature extraction and classifier design and does not consider instance selection. Earlier research by the authors showed that instance selection can improve classification accuracy; however, the relationship between speech samples and features has received no attention until now. Therefore, this paper proposes a new PD diagnosis algorithm that simultaneously selects speech samples and features, based on a relevant-feature weighting algorithm and a multiple kernel method, so as to exploit their synergy and thereby improve classification accuracy. Experimental results showed that the proposed algorithm yields a clear improvement in classification accuracy: it obtained a mean classification accuracy of 82.5%, which was 30.5% higher than the comparison algorithm. Moreover, the proposed algorithm detected synergy effects between speech samples and features, which is valuable for speech marker extraction.
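
    The bilateral-selection idea, choosing informative samples and informative features together, can be illustrated with a simple alternating scheme: score features by class relevance, score samples by how confidently the current model classifies them, and iterate. A hedged sketch with scikit-learn (the mutual-information weighting and confidence pruning below are illustrative stand-ins for the paper's relevant-feature-weighting and multiple-kernel machinery):

        import numpy as np
        from sklearn.svm import SVC
        from sklearn.feature_selection import mutual_info_classif
        from sklearn.model_selection import cross_val_predict

        def bilateral_select(X, y, n_feat=10, sample_frac=0.8, rounds=3):
            """Alternately prune features and samples; returns selected index sets.

            Assumes integer class labels 0..k-1 in y.
            """
            feat_idx = np.arange(X.shape[1])
            samp_idx = np.arange(X.shape[0])
            for _ in range(rounds):
                # Feature side: keep the most class-relevant features.
                scores = mutual_info_classif(X[np.ix_(samp_idx, feat_idx)], y[samp_idx])
                feat_idx = feat_idx[np.argsort(scores)[-n_feat:]]
                # Sample side: keep samples the current model is most confident about.
                clf = SVC(probability=True)
                proba = cross_val_predict(clf, X[np.ix_(samp_idx, feat_idx)],
                                          y[samp_idx], cv=3, method="predict_proba")
                conf = proba[np.arange(len(samp_idx)), y[samp_idx]]
                keep = np.argsort(conf)[-int(len(samp_idx) * sample_frac):]
                samp_idx = samp_idx[keep]
            return samp_idx, feat_idx

    In the paper the two selections are coupled through kernel learning rather than this greedy loop; the sketch only shows why selecting both sides jointly can outperform selecting either alone.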

  3. Text-to-audiovisual speech synthesizer for children with learning disabilities.

    PubMed

    Mendi, Engin; Bayrak, Coskun

    2013-01-01

    Learning disabilities affect the ability of children to learn, despite their having normal intelligence. Assistive tools can greatly increase the functional capabilities, such as writing, reading, or listening, of children with learning disorders. In this article, we describe a text-to-audiovisual synthesizer that can serve as an assistive tool for such children. The system automatically converts input text to audiovisual speech, synchronizing the head, eye, and lip movements of a three-dimensional face model with appropriate facial expressions and the word flow of the text. The proposed system can enhance speech perception and help children with learning deficits improve their chances of success.

  4. Selective mutism - resources

    MedlinePlus

    Resources - selective mutism. The following organizations are good resources for information on selective mutism: American Speech-Language-Hearing Association -- www.asha.org/public/speech/disorders/selectivemutism/; Selective Mutism Association -- www. ...

  5. Speech pathology in ancient India--a review of Sanskrit literature.

    PubMed

    Savithri, S R

    1987-12-01

    This paper aims at highlighting the knowledge of the Sanskrit scholars of ancient times in the field of speech and language pathology. The information collected here is mainly from the Sanskrit texts written between 2000 B.C. and 1633 A.D. Some aspects of speech and language that have been dealt with in this review have been elaborately described in the original Sanskrit texts. The present paper, however, being limited in its scope, reviews only the essential facts, but not the details. The purpose is only to give a glimpse of the knowledge that the Sanskrit scholars of those times possessed. In brief, this paper is a review of Sanskrit literature for information on the origin and development of speech and language, speech production, normality of speech and language, and disorders of speech and language and their treatment.

  6. The Frame Constraint on Experimentally Elicited Speech Errors in Japanese.

    PubMed

    Saito, Akie; Inoue, Tomoyoshi

    2017-06-01

    The so-called syllable position effect in speech errors has been interpreted as reflecting constraints posed by the frame structure of a given language, which operates separately from linguistic content during speech production. The effect refers to the phenomenon that when a speech error occurs, the replaced and replacing sounds tend to occupy the same position within a syllable or word. Most of the evidence for the effect comes from analyses of naturally occurring speech errors in Indo-European languages, and few studies have examined the effect in experimentally elicited speech errors or in other languages. This study examined whether experimentally elicited sound errors in Japanese exhibit the syllable position effect. In Japanese, the sub-syllabic unit known as the "mora" is considered to be a basic sound unit in production. Results showed that the syllable position effect occurred in mora errors, suggesting that the frame constrains the ordering of sounds during speech production.

  7. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
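
    The patent's central signal-processing step is deconvolving the measured excitation function from the acoustic output to obtain a per-frame vocal-tract transfer function. In the frequency domain this is a regularized spectral division; a minimal sketch, assuming frame-aligned excitation and speech signals (the windowing and regularization constant are illustrative choices):

        import numpy as np

        def transfer_function(speech_frame, excitation_frame, eps=1e-6):
            """Estimate H(f) = S(f) / E(f) for one analysis frame.

            eps regularizes the division where the excitation spectrum is weak.
            """
            S = np.fft.rfft(speech_frame * np.hanning(len(speech_frame)))
            E = np.fft.rfft(excitation_frame * np.hanning(len(excitation_frame)))
            return S * np.conj(E) / (np.abs(E) ** 2 + eps)

        # A per-frame feature vector could then be, e.g., log-magnitude samples of H.
        frame = np.random.randn(256)
        excit = np.random.randn(256)
        features = np.log(np.abs(transfer_function(frame, excit)) + 1e-12)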

  8. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  9. An international perspective: supporting adolescents with speech, language, and communication needs in the United Kingdom.

    PubMed

    Joffe, Victoria

    2015-02-01

    This article provides an overview of the education system in the United Kingdom, with a particular focus on the secondary school context and on supporting older children and young people with speech, language, and communication needs (SLCNs). Despite the pervasive nature of speech, language, and communication difficulties and their long-term impact on academic performance, mental health, and well-being, evidence suggests that there is limited support for older children and young people with SLCNs in the United Kingdom relative to what is available in the early years. Focus in secondary schools is predominantly on literacy, with little attention to supporting oral language. The article provides a synopsis of the working practices of pediatric speech and language therapists working with adolescents in the United Kingdom and the type and level of speech and language therapy support provided for older children and young people with SLCNs in secondary and further education. Implications for the nature and type of specialist support to adolescents and adults with SLCNs are discussed.

  10. Selected Topics from LVCSR Research for Asian Languages at Tokyo Tech

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    This paper presents our recent work on building Large Vocabulary Continuous Speech Recognition (LVCSR) systems for the Thai, Indonesian, and Chinese languages. For Thai, since there is no word boundary in the written form, we have proposed a new method for automatically creating word-like units from a text corpus, and applied topic and speaking-style adaptation to the language model to recognize spoken-style utterances. For Indonesian, we have applied proper noun-specific adaptation to acoustic modeling, and rule-based English-to-Indonesian phoneme mapping to solve the problem of large variation in proper noun and English word pronunciation in a spoken-query information retrieval system. In spoken Chinese, long organization names are frequently abbreviated, and abbreviated utterances cannot be recognized if the abbreviations are not included in the dictionary. We have proposed a new method for automatically generating Chinese abbreviations, and by expanding the vocabulary using the generated abbreviations, we have significantly improved the performance of spoken query-based search.

  11. Using the Self-Select Paradigm to Delineate the Nature of Speech Motor Programming

    PubMed Central

    Wright, David L.; Robin, Don A.; Rhee, Jooyhun; Vaculin, Amber; Jacks, Adam; Guenther, Frank H.; Fox, Peter T.

    2015-01-01

    Purpose The authors examined the involvement of 2 speech motor programming processes identified by S. T. Klapp (1995, 2003) during the articulation of utterances differing in syllable and sequence complexity. According to S. T. Klapp, 1 process, INT, resolves the demands of the programmed unit, whereas a second process, SEQ, oversees the serial order demands of longer sequences. Method A modified reaction time paradigm was used to assess INT and SEQ demands. Specifically, syllable complexity was dependent on syllable structure, whereas sequence complexity involved either repeated or unique syllables within an utterance. Results INT execution was slowed when articulating single syllables in the form CCCV compared to simpler CV syllables. Planning unique syllables within a multisyllabic utterance rather than repetitions of the same syllable slowed INT but not SEQ. Conclusions The INT speech motor programming process, important for mental syllabary access, is sensitive to changes in both syllable structure and the number of unique syllables in an utterance. PMID:19474396

  12. The Psychosocial Development and Increased Fluency of Users of the SpeechEasyRTM Device: A Multiple Unit Case Study

    ERIC Educational Resources Information Center

    Horgan, David James

    2010-01-01

    This dissertation study explored the efficacy of the SpeechEasy[R] device for individuals who are gainfully employed stutterers and who participated in workplace education learning activities. This study attempted to fill a gap in the literature regarding efficacy of the SpeechEasy[R] device. It employed a qualitative multiple unit case study…

  13. Is automatic speech-to-text transcription ready for use in psychological experiments?

    PubMed

    Ziman, Kirsten; Heusser, Andrew C; Fitzpatrick, Paxton C; Field, Campbell E; Manning, Jeremy R

    2018-04-23

    Verbal responses are a convenient and naturalistic way for participants to provide data in psychological experiments (Salzinger, The Journal of General Psychology, 61(1), 65-94, 1959). However, audio recordings of verbal responses typically require additional processing, such as transcribing the recordings into text, as compared with other behavioral response modalities (e.g., typed responses, button presses, etc.). Further, the transcription process is often tedious and time-intensive, requiring human listeners to manually examine each moment of recorded speech. Here we evaluate the performance of a state-of-the-art speech recognition algorithm (Halpern et al., 2016) in transcribing audio data into text during a list-learning experiment. We compare transcripts made by human annotators to the computer-generated transcripts. The two sets of transcripts matched to a high degree and exhibited similar statistical properties, in terms of the participants' recall performance and the recall dynamics the transcripts captured. This proof-of-concept study suggests that speech-to-text engines could provide a cheap, reliable, and rapid means of automatically transcribing speech data in psychological experiments. Further, our findings open the door for verbal response experiments that scale to thousands of participants (e.g., administered online), as well as a new generation of experiments that decode speech on the fly and adapt experimental parameters based on participants' prior responses.
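
    Comparing human and machine transcripts of the same recordings comes down to an alignment metric such as word error rate, the normalized edit distance between word sequences. A compact dynamic-programming version in Python (a textbook formulation, not code from the study):

        def word_error_rate(reference, hypothesis):
            """WER = (substitutions + deletions + insertions) / len(reference)."""
            ref, hyp = reference.split(), hypothesis.split()
            # dp[i][j]: edit distance between ref[:i] and hyp[:j].
            dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                dp[i][0] = i
            for j in range(len(hyp) + 1):
                dp[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                    dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                                   dp[i][j - 1] + 1,         # insertion
                                   dp[i - 1][j - 1] + cost)  # substitution/match
            return dp[-1][-1] / max(len(ref), 1)

        print(word_error_rate("recall the word list", "recall word list"))  # 0.25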

  14. Impact of human emotions on physiological characteristics

    NASA Astrophysics Data System (ADS)

    Partila, P.; Voznak, M.; Peterek, T.; Penhaker, M.; Novak, V.; Tovarek, J.; Mehic, Miralem; Vojtech, L.

    2014-05-01

    Emotional states of humans and their impact on physiological and neurological characteristics are discussed in this paper. This problem has been addressed by many research teams, and it is now necessary to increase the accuracy of methods for measuring correlations between emotional state and physiological changes. To record these changes, we focused on two principal emotional states: subjects were psychologically stimulated first into a neutral, calm state and then into a stressed state. Electrocardiography, electroencephalography, and blood pressure provided the neurological and physiological samples collected during the subjects' stimulated conditions. Speech was recorded while each subject read a selected text, and features were extracted by speech-processing operations. A classifier based on a Gaussian mixture model was trained and tested using mel-frequency cepstral coefficients extracted from the subjects' speech. All measurements were performed in an electromagnetic-compatibility chamber. The article discusses a method for determining the influence of an emotional stress state on a person's physiological and neurological characteristics.
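
    The classification recipe named here, MFCC features scored against one Gaussian mixture model per emotional state, is straightforward to sketch with librosa and scikit-learn; the class decision compares average log-likelihoods under the two models. This is a generic reconstruction, not the authors' code (file paths and hyperparameters are placeholders):

        import librosa
        import numpy as np
        from sklearn.mixture import GaussianMixture

        def mfcc_features(path, n_mfcc=13):
            y, sr = librosa.load(path, sr=16000)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # frames x coeffs

        # One GMM per emotional state, trained on pooled MFCC frames (paths hypothetical).
        calm = GaussianMixture(n_components=16).fit(mfcc_features("calm_train.wav"))
        stress = GaussianMixture(n_components=16).fit(mfcc_features("stress_train.wav"))

        def classify(path):
            X = mfcc_features(path)
            # score() returns the average per-frame log-likelihood under each model.
            return "stress" if stress.score(X) > calm.score(X) else "calm"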

  15. The Development of the Text Reception Threshold Test: A Visual Analogue of the Speech Reception Threshold Test

    ERIC Educational Resources Information Center

    Zekveld, Adriana A.; George, Erwin L. J.; Kramer, Sophia E.; Goverts, S. Theo; Houtgast, Tammo

    2007-01-01

    Purpose: In this study, the authors aimed to develop a visual analogue of the widely used Speech Reception Threshold (SRT; R. Plomp & A. M. Mimpen, 1979b) test. The Text Reception Threshold (TRT) test, in which visually presented sentences are masked by a bar pattern, enables the quantification of modality-aspecific variance in speech-in-noise…

  16. Predicting Speech Intelligibility with a Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    ERIC Educational Resources Information Center

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…

  17. SUBTHALAMIC NUCLEUS NEURONS DIFFERENTIALLY ENCODE EARLY AND LATE ASPECTS OF SPEECH PRODUCTION.

    PubMed

    Lipski, W J; Alhourani, A; Pirnia, T; Jones, P W; Dastolfo-Hromack, C; Helou, L B; Crammond, D J; Shaiman, S; Dickey, M W; Holt, L L; Turner, R S; Fiez, J A; Richardson, R M

    2018-05-22

    Basal ganglia-thalamocortical loops mediate all motor behavior, yet little detail is known about the role of basal ganglia nuclei in speech production. Using intracranial recording during deep brain stimulation surgery in humans with Parkinson's disease, we tested the hypothesis that the firing rate of subthalamic nucleus neurons is modulated in sync with motor execution aspects of speech. Nearly half of seventy-nine unit recordings, across twelve subjects (male and female), exhibited firing rate modulation during a syllable-reading task. Trial-to-trial timing of changes in subthalamic neuronal activity, relative to cue onset versus production onset, revealed that locking to cue presentation was associated more with units that decreased firing rate, while locking to speech onset was associated more with units that increased firing rate. These unique data indicate that subthalamic activity is dynamic during the production of speech, reflecting temporally dependent inhibition and excitation of separate populations of subthalamic neurons. SIGNIFICANCE STATEMENT The basal ganglia are widely assumed to participate in speech production, yet no prior studies have reported detailed examination of speech-related activity in basal ganglia nuclei. Using microelectrode recordings from the subthalamic nucleus during a single-syllable reading task, in awake humans undergoing deep brain stimulation implantation surgery, we show that the firing rate of subthalamic nucleus neurons is modulated in response to motor execution aspects of speech. These results are the first to establish a role for subthalamic nucleus neurons in encoding of aspects of speech production, and they lay the groundwork for launching a modern subfield to explore basal ganglia function in human speech.
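
    Analyses of this kind typically compare peri-event firing rates: spike counts binned around cue onset versus speech onset reveal which event a unit locks to. A generic peri-event time histogram sketch in Python (window, bin width, and the toy spike train are illustrative; this is not the authors' analysis code):

        import numpy as np

        def peth(spike_times, event_times, window=(-0.5, 1.0), bin_width=0.05):
            """Trial-averaged firing rate (Hz) aligned to event_times (seconds)."""
            edges = np.arange(window[0], window[1] + bin_width, bin_width)
            counts = np.zeros(len(edges) - 1)
            for ev in event_times:
                rel = spike_times - ev
                rel = rel[(rel >= window[0]) & (rel < window[1])]
                counts += np.histogram(rel, bins=edges)[0]
            return counts / (len(event_times) * bin_width), edges[:-1]

        # Compare alignment to cue onset vs. speech onset for one unit (toy times).
        spikes = np.sort(np.random.uniform(0, 100, 500))
        cue_onsets = np.arange(2.0, 98.0, 5.0)
        speech_onsets = cue_onsets + 0.8
        rate_cue, t = peth(spikes, cue_onsets)
        rate_speech, _ = peth(spikes, speech_onsets)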

  18. Hello World, It's Me: Bringing the Basic Speech Communication Course into the Digital Age

    ERIC Educational Resources Information Center

    Kirkwood, Jessica; Gutgold, Nichola D.; Manley, Destiny

    2011-01-01

    During the past decade, instructors of speech communication have been adapting the introductory speech course to keep up with the television age. Learning units in speech textbooks now teach how to speak well on television, as well as how to interpret speeches in the media. This article argues that the computer age invites adaptation of the…

  19. Multi-time resolution analysis of speech: evidence from psychophysics

    PubMed Central

    Chait, Maria; Greenberg, Steven; Arai, Takayuki; Simon, Jonathan Z.; Poeppel, David

    2015-01-01

    How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10–40 Hz modulation frequency) and syllable-sized (2–10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations (corresponding to ~250 and ~30 ms, the average duration of syllables and of certain phonetic properties, respectively) were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low and S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility, a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing. PMID:26136650
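
    Constructing such stimuli amounts to band-limiting the temporal modulations of speech: extract the envelope, then keep only a slow or a fast modulation band. A rough sketch with SciPy (the band edges follow the abstract's time scales; the filter design details are illustrative):

        import numpy as np
        from scipy.signal import butter, filtfilt, hilbert, resample_poly

        def modulation_band(signal, sr, lo, hi, env_sr=1000):
            """Restrict the temporal envelope of `signal` to a modulation band (Hz)."""
            envelope = np.abs(hilbert(signal))              # temporal envelope
            envelope = resample_poly(envelope, env_sr, sr)  # downsample for stable filtering
            b, a = butter(2, [lo / (env_sr / 2), hi / (env_sr / 2)], btype="band")
            return filtfilt(b, a, envelope)

        sr = 16000
        speech = np.random.randn(sr * 2)                    # stand-in for a sentence recording
        s_low = modulation_band(speech, sr, 2.0, 10.0)      # syllable-scale (~4 Hz) band
        s_high = modulation_band(speech, sr, 10.0, 40.0)    # phoneme-scale (~33 Hz) band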

  20. Use of Computer Speech Technologies To Enhance Learning.

    ERIC Educational Resources Information Center

    Ferrell, Joe

    1999-01-01

    Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)

  1. Acoustic assessment of speech privacy curtains in two nursing units

    PubMed Central

    Pope, Diana S.; Miller-Klein, Erik T.

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients, and in general that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing-unit floor plate shapes should be considered. PMID:26780959

  2. Acoustic assessment of speech privacy curtains in two nursing units.

    PubMed

    Pope, Diana S; Miller-Klein, Erik T

    2016-01-01

    Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s' standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients, and in general that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption and more compact, fragmented nursing-unit floor plate shapes should be considered.

  3. Understanding the abstract role of speech in communication at 12 months.

    PubMed

    Martin, Alia; Onishi, Kristine H; Vouloumanos, Athena

    2012-04-01

    Adult humans recognize that even unfamiliar speech can communicate information between third parties, demonstrating an ability to separate communicative function from linguistic content. We examined whether 12-month-old infants understand that speech can communicate before they understand the meanings of specific words. Specifically, we test the understanding that speech permits the transfer of information about a Communicator's target object to a Recipient. Initially, the Communicator selectively grasped one of two objects. In test, the Communicator could no longer reach the objects. She then turned to the Recipient and produced speech (a nonsense word) or non-speech (coughing). Infants looked longer when the Recipient selected the non-target than the target object when the Communicator had produced speech but not coughing (Experiment 1). Looking time patterns differed from the speech condition when the Recipient rather than the Communicator produced the speech (Experiment 2), and when the Communicator produced a positive emotional vocalization (Experiment 3), but did not differ when the Recipient had previously received information about the target by watching the Communicator's selective grasping (Experiment 4). Thus infants understand the information-transferring properties of speech and recognize some of the conditions under which others' information states can be updated. These results suggest that infants possess an abstract understanding of the communicative function of speech, providing an important potential mechanism for language and knowledge acquisition.

  4. Timing of Gestures: Gestures Anticipating or Simultaneous with Speech as Indexes of Text Comprehension in Children and Adults

    ERIC Educational Resources Information Center

    Ianì, Francesco; Cutica, Ilaria; Bucciarelli, Monica

    2017-01-01

    The deep comprehension of a text is tantamount to the construction of an articulated mental model of that text. The number of correct recollections is an index of a learner's mental model of a text. We assume that another index of comprehension is the timing of the gestures produced during text recall; gestures are simultaneous with speech when…

  5. Relationships Among Peripheral and Central Electrophysiological Measures of Spatial and Spectral Selectivity and Speech Perception in Cochlear Implant Users.

    PubMed

    Scheperle, Rachel A; Abbas, Paul J

    2015-01-01

    The ability to perceive speech is related to the listener's ability to differentiate among frequencies (i.e., spectral resolution). Cochlear implant (CI) users exhibit variable speech-perception and spectral-resolution abilities, which can be attributed in part to the extent of electrode interactions at the periphery (i.e., spatial selectivity). However, electrophysiological measures of peripheral spatial selectivity have not been found to correlate with speech perception. The purpose of this study was to evaluate auditory processing at the periphery and cortex using both simple and spectrally complex stimuli to better understand the stages of neural processing underlying speech perception. The hypotheses were that (1) by more completely characterizing peripheral excitation patterns than in previous studies, significant correlations with measures of spectral selectivity and speech perception would be observed, (2) adding information about processing at a level central to the auditory nerve would account for additional variability in speech perception, and (3) responses elicited with spectrally complex stimuli would be more strongly correlated with speech perception than responses elicited with spectrally simple stimuli. Eleven adult CI users participated. Three experimental processor programs (MAPs) were created to vary the likelihood of electrode interactions within each participant. For each MAP, a subset of 7 of 22 intracochlear electrodes was activated: adjacent (MAP 1), every other (MAP 2), or every third (MAP 3). Peripheral spatial selectivity was assessed using the electrically evoked compound action potential (ECAP) to obtain channel-interaction functions for all activated electrodes (13 functions total). Central processing was assessed by eliciting the auditory change complex with both spatial (electrode pairs) and spectral (rippled noise) stimulus changes. Speech-perception measures included vowel discrimination and the Bamford-Kowal-Bench Speech-in-Noise test. Spatial and spectral selectivity and speech perception were expected to be poorest with MAP 1 (closest electrode spacing) and best with MAP 3 (widest electrode spacing). Relationships among the electrophysiological and speech-perception measures were evaluated using mixed-model and simple linear regression analyses. All electrophysiological measures were significantly correlated with each other and with speech scores for the mixed-model analysis, which takes into account multiple measures per person (i.e., experimental MAPs). The ECAP measures were the best predictor. In the simple linear regression analysis on MAP 3 data, only the cortical measures were significantly correlated with speech scores; spectral auditory change complex amplitude was the strongest predictor. The results suggest that both peripheral and central electrophysiological measures of spatial and spectral selectivity provide valuable information about speech perception. Clinically, it is often desirable to optimize performance for individual CI users. These results suggest that ECAP measures may be most useful for within-subject applications when multiple measures are performed to make decisions about processor options. They also suggest that if the goal is to compare performance across individuals based on a single measure, then processing central to the auditory nerve (specifically, cortical measures of discriminability) should be considered.

  6. The Different Functions of Speech in Defamation and Privacy Cases.

    ERIC Educational Resources Information Center

    Kebbel, Gary

    1984-01-01

    Reviews United States Supreme Court decisions since 1900 to show that free speech decisions often rest on the circumstances surrounding the speech. Indicates that freedom of speech wins out over privacy when the speech serves a social or political function, but not when personal happiness is at issue.

  7. Interfering with Inner Speech Selectively Disrupts Problem Solving and Is Linked with Real-World Executive Functioning

    ERIC Educational Resources Information Center

    Wallace, Gregory L.; Peng, Cynthia S.; Williams, David

    2017-01-01

    Purpose: According to Vygotskian theory, verbal thinking serves to guide our behavior and underpins critical self-regulatory functions. Indeed, numerous studies now link inner speech usage with performance on tests of executive function (EF). However, the selectivity of inner speech contributions to multifactorial executive planning performance…

  8. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well-defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  9. Advanced Persuasive Speaking, English, Speech: 5114.112.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    Developed as a high school quinmester unit on persuasive speaking, this guide provides the teacher with teaching strategies for a course which analyzes speeches from "Vital Speeches of the Day," political speeches, TV commercials, and other types of speeches. Practical use of persuasive methods for school, community, county, state, and…

  10. Do What I Say! Voice Recognition Makes Major Advances.

    ERIC Educational Resources Information Center

    Ruley, C. Dorsey

    1994-01-01

    Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)

  11. Date Rape Drugs

    MedlinePlus

    ... High blood pressure. Slurred speech. Are these drugs legal in the United States? Some of these drugs ...

  12. Effects of dynamic text in an AAC app on sight word reading for individuals with autism spectrum disorder.

    PubMed

    Caron, Jessica; Light, Janice; Holyfield, Christine; McNaughton, David

    2018-06-01

    The purpose of this study was to investigate the effects of Transition to Literacy (T2L) software features (i.e., dynamic text and speech output upon selection of a graphic symbol) within a grid display in an augmentative and alternative communication (AAC) app, on the sight word reading skills of individuals with autism spectrum disorders (ASD) and complex communication needs. The study implemented a single-subject multiple probe research design across one set of three participants; the same design was used with an additional set of two participants. As part of the intervention, the participants were exposed to an AAC app with the T2L features during a highly structured matching task. With only limited exposure to the features, all five participants demonstrated increased accuracy in identifying 12 targeted sight words. This study provides preliminary evidence that redesigning AAC apps to include dynamic text combined with speech output can positively impact the sight-word reading of participants during a structured task. This adaptation in AAC system design could be used to complement literacy instruction and to potentially infuse components of literacy learning into daily communication.

  13. TongueToSpeech (TTS): Wearable wireless assistive device for augmented speech.

    PubMed

    Marjanovic, Nicholas; Piccinini, Giacomo; Kerr, Kevin; Esmailbeigi, Hananeh

    2017-07-01

    Speech is an important aspect of human communication; individuals with speech impairment are unable to communicate vocally in real time. Our team has developed the TongueToSpeech (TTS) device with the goal of augmenting speech communication for the vocally impaired. The proposed device is a wearable wireless assistive device that incorporates a capacitive touch keyboard interface embedded inside a discrete retainer. The device connects to a computer, tablet, or smartphone via Bluetooth. The developed TTS application converts text typed by the tongue into audible speech. Our studies concluded that an 8-contact-point configuration between the tongue and the TTS device yields the best user precision and speed. On average, typing a phrase with the TTS device inside the oral cavity takes 2.5 times longer than typing the same phrase with the pointer finger on a T9 (text on 9 keys) keyboard. In conclusion, we have developed a discrete, noninvasive wearable device that allows vocally impaired individuals to communicate in real time.

  14. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech.

    PubMed

    Ben-David, Boaz M; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H H M

    2016-02-01

    Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5 discrete emotions (anger, fear, happiness, sadness, and neutral) presented in prosody and semantics. Listeners were asked to rate the sentence as a whole, integrating both speech channels, or to focus on one channel only (prosody or semantics). We observed supremacy of congruency, failure of selective attention, and prosodic dominance. Supremacy of congruency means that a sentence that presents the same emotion in both speech channels was rated highest; failure of selective attention means that listeners were unable to selectively attend to one channel when instructed; and prosodic dominance means that prosodic information plays a larger role than semantics in processing emotional speech. Emotional prosody and semantics are separate but not separable channels, and it is difficult to perceive one without the influence of the other. Our findings indicate that the Test for Rating of Emotions in Speech can reveal specific aspects in the processing of emotional speech and may in the future prove useful for understanding emotion-processing deficits in individuals with pathologies.

  15. Auditory Selective Attention to Speech Modulates Activity in the Visual Word Form Area

    PubMed Central

    Yoncheva, Yuliya N.; Zevin, Jason D.; Maurer, Urs

    2010-01-01

    Selective attention to speech versus nonspeech signals in complex auditory input could produce top-down modulation of cortical regions previously linked to perception of spoken, and even visual, words. To isolate such top-down attentional effects, we contrasted 2 equally challenging active listening tasks, performed on the same complex auditory stimuli (words overlaid with a series of 3 tones). Instructions required selectively attending to either the speech signals (in service of rhyme judgment) or the melodic signals (tone-triplet matching). Selective attention to speech, relative to attention to melody, was associated with blood oxygenation level–dependent (BOLD) increases during functional magnetic resonance imaging (fMRI) in left inferior frontal gyrus, temporal regions, and the visual word form area (VWFA). Further investigation of the activity in visual regions revealed overall deactivation relative to baseline rest for both attention conditions. Topographic analysis demonstrated that while attending to melody drove deactivation equivalently across all fusiform regions of interest examined, attending to speech produced a regionally specific modulation: deactivation of all fusiform regions, except the VWFA. Results indicate that selective attention to speech can topographically tune extrastriate cortex, leading to increased activity in VWFA relative to surrounding regions, in line with the well-established connectivity between areas related to spoken and visual word perception in skilled readers. PMID:19571269

  16. President Kennedy's Speech at Rice University

    NASA Technical Reports Server (NTRS)

    1988-01-01

    This video tape presents unedited film footage of President John F. Kennedy's speech at Rice University, Houston, Texas, September 12, 1962. The speech expresses the commitment of the United States to landing an astronaut on the Moon.

  17. Investigating an Application of Speech-to-Text Recognition: A Study on Visual Attention and Learning Behaviour

    ERIC Educational Resources Information Center

    Huang, Y-M.; Liu, C-J.; Shadiev, Rustam; Shen, M-H.; Hwang, W-Y.

    2015-01-01

    One major drawback of previous research on speech-to-text recognition (STR) is that most findings showing the effectiveness of STR for learning were based upon subjective evidence. Very few studies have used eye-tracking techniques to investigate visual attention of students on STR-generated text. Furthermore, not much attention was paid to…

  18. Relationships Among Peripheral and Central Electrophysiological Measures of Spatial and Spectral Selectivity and Speech Perception in Cochlear Implant Users

    PubMed Central

    Scheperle, Rachel A.; Abbas, Paul J.

    2014-01-01

    Objectives The ability to perceive speech is related to the listener’s ability to differentiate among frequencies (i.e., spectral resolution). Cochlear implant (CI) users exhibit variable speech-perception and spectral-resolution abilities, which can be attributed in part to the extent of electrode interactions at the periphery (i.e., spatial selectivity). However, electrophysiological measures of peripheral spatial selectivity have not been found to correlate with speech perception. The purpose of this study was to evaluate auditory processing at the periphery and cortex using both simple and spectrally complex stimuli to better understand the stages of neural processing underlying speech perception. The hypotheses were that (1) by more completely characterizing peripheral excitation patterns than in previous studies, significant correlations with measures of spectral selectivity and speech perception would be observed, (2) adding information about processing at a level central to the auditory nerve would account for additional variability in speech perception, and (3) responses elicited with spectrally complex stimuli would be more strongly correlated with speech perception than responses elicited with spectrally simple stimuli. Design Eleven adult CI users participated. Three experimental processor programs (MAPs) were created to vary the likelihood of electrode interactions within each participant. For each MAP, a subset of 7 of 22 intracochlear electrodes was activated: adjacent (MAP 1), every-other (MAP 2), or every third (MAP 3). Peripheral spatial selectivity was assessed using the electrically evoked compound action potential (ECAP) to obtain channel-interaction functions for all activated electrodes (13 functions total). Central processing was assessed by eliciting the auditory change complex (ACC) with both spatial (electrode pairs) and spectral (rippled noise) stimulus changes. Speech-perception measures included vowel-discrimination and the Bamford-Kowal-Bench Sentence-in-Noise (BKB-SIN) test. Spatial and spectral selectivity and speech perception were expected to be poorest with MAP 1 (closest electrode spacing) and best with MAP 3 (widest electrode spacing). Relationships among the electrophysiological and speech-perception measures were evaluated using mixed-model and simple linear regression analyses. Results All electrophysiological measures were significantly correlated with each other and with speech perception for the mixed-model analysis, which takes into account multiple measures per person (i.e. experimental MAPs). The ECAP measures were the best predictor of speech perception. In the simple linear regression analysis on MAP 3 data, only the cortical measures were significantly correlated with speech; spectral ACC amplitude was the strongest predictor. Conclusions The results suggest that both peripheral and central electrophysiological measures of spatial and spectral selectivity provide valuable information about speech perception. Clinically, it is often desirable to optimize performance for individual CI users. These results suggest that ECAP measures may be the most useful for within-subject applications, when multiple measures are performed to make decisions about processor options. They also suggest that if the goal is to compare performance across individuals based on single measure, then processing central to the auditory nerve (specifically, cortical measures of discriminability) should be considered. PMID:25658746

  19. Selected Speeches on Obscenity by Federal Communications Commission Chairman Dean Burch, 1969-74.

    ERIC Educational Resources Information Center

    Hartenberger, Karen Schmidt

    This study is a descriptive/historical account focusing on the obscenity issue and the selected manuscript speeches of Dean Burch while he served as chairman of the Federal Communications Commission (FCC) from October 1969 to March 1974. Research centers on the speaker and the specific manuscript speeches, considering the timeliness and…

  20. Evaluating Text-to-Speech Synthesizers

    ERIC Educational Resources Information Center

    Cardoso, Walcir; Smith, George; Fuentes, Cesar Garcia

    2015-01-01

    Text-To-Speech (TTS) synthesizers have piqued the interest of researchers for their potential to enhance the L2 acquisition of writing (Kirstein, 2006), vocabulary and reading (Proctor, Dalton, & Grisham, 2007) and pronunciation (Cardoso, Collins, & White, 2012; Soler-Urzua, 2011). Despite their proven effectiveness, there is a need for…

  1. Auditory detection of non-speech and speech stimuli in noise: Effects of listeners' native language background.

    PubMed

    Liu, Chang; Jin, Su-Hyun

    2015-11-01

    This study investigated whether native listeners processed speech differently from non-native listeners in a speech detection task. Detection thresholds of Mandarin Chinese and Korean vowels and non-speech sounds in noise, frequency selectivity, and the nativeness of Mandarin Chinese and Korean vowels were measured for Mandarin Chinese- and Korean-native listeners. The two groups of listeners exhibited similar non-speech sound detection and frequency selectivity; however, the Korean listeners had better detection thresholds of Korean vowels than Chinese listeners, while the Chinese listeners performed no better at Chinese vowel detection than the Korean listeners. Moreover, thresholds predicted from an auditory model highly correlated with behavioral thresholds of the two groups of listeners, suggesting that detection of speech sounds not only depended on listeners' frequency selectivity, but also might be affected by their native language experience. Listeners evaluated their native vowels with higher nativeness scores than non-native listeners. Native listeners may have advantages over non-native listeners when processing speech sounds in noise, even without the required phonetic processing; however, such native speech advantages might be offset by Chinese listeners' lower sensitivity to vowel sounds, a characteristic possibly resulting from their sparse vowel system and their greater cognitive and attentional demands for vowel processing.

  2. Virtual Observer Controller (VOC) for Small Unit Infantry Laser Simulation Training

    DTIC Science & Technology

    2007-04-01

    per-seat license when deployed. As a result, ViaVoice was abandoned early in development. Next, the SPHINX engine from Carnegie Mellon University was examined. Sphinx is Java-based software, providing cross-platform functionality, and it is also free, open-source software. Software developers at IST had experience using SPHINX, so it was initially selected to be the VOC speech engine. After implementing a small portion of the VOC grammar

  3. Inner Speech's Relationship With Overt Speech in Poststroke Aphasia.

    PubMed

    Stark, Brielle C; Geva, Sharon; Warburton, Elizabeth A

    2017-09-18

    Relatively preserved inner speech alongside poor overt speech has been documented in some persons with aphasia (PWA), but the relationship of overt speech with inner speech is still largely unclear, as few studies have directly investigated these factors. The present study investigates the relationship of relatively preserved inner speech in aphasia with selected measures of language and cognition. Thirty-eight persons with chronic aphasia (27 men, 11 women; average age 64.53 ± 13.29 years, time since stroke 8-111 months) were classified as having relatively preserved inner and overt speech (n = 21), relatively preserved inner speech with poor overt speech (n = 8), or not classified due to insufficient measurements of inner and/or overt speech (n = 9). Inner speech scores (by group) were correlated with selected measures of language and cognition from the Comprehensive Aphasia Test (Swinburn, Porter, & Al, 2004). The group with poor overt speech showed a significant relationship of inner speech with overt naming (r = .95, p < .01) and with mean length of utterance produced during a written picture description (r = .96, p < .01). Correlations between inner speech and language and cognition factors were not significant for the group with relatively good overt speech. As in previous research, we show that relatively preserved inner speech is found alongside otherwise severe production deficits in PWA. PWA with poor overt speech may rely more on preserved inner speech for overt picture naming (perhaps due to shared resources with verbal working memory) and for written picture description (perhaps due to reliance on inner speech due to perceived task difficulty). Assessments of inner speech may be useful as a standard component of aphasia screening, and therapy focused on improving and using inner speech may prove clinically worthwhile. https://doi.org/10.23641/asha.5303542.

  4. Fifty years of progress in speech and speaker recognition

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    2004-10-01

    Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from "distance"-based to likelihood-based methods, (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multimodal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward increasing the robustness of recognition, including many additional important techniques not noted above.
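
    Item (2), the move from static cepstra to cepstrum + Δcepstrum + ΔΔcepstrum feature vectors, can be made concrete with a short sketch. The following Python snippet implements the standard regression formula for delta coefficients; the window half-width N = 2 and the toy 13-coefficient frames are illustrative assumptions rather than details from the survey.

        import numpy as np

        def delta(features: np.ndarray, N: int = 2) -> np.ndarray:
            """Delta coefficients via the standard regression formula:
            d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2).
            `features` has shape (num_frames, num_coeffs); edges are padded by repetition."""
            denom = 2 * sum(n * n for n in range(1, N + 1))
            padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
            out = np.zeros_like(features, dtype=float)
            for t in range(features.shape[0]):
                acc = np.zeros(features.shape[1])
                for n in range(1, N + 1):
                    acc += n * (padded[t + N + n] - padded[t + N - n])
                out[t] = acc / denom
            return out

        # Toy cepstral trajectory: 10 frames x 13 coefficients.
        c = np.random.default_rng(0).normal(size=(10, 13))
        d = delta(c)                   # delta (velocity) cepstrum
        dd = delta(d)                  # delta-delta (acceleration) cepstrum
        full = np.hstack([c, d, dd])   # the classic 39-dimensional feature vector
        print(full.shape)              # (10, 39)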

  5. Of Mouths and Men: Non-Native Listeners' Identification and Evaluation of Varieties of English.

    ERIC Educational Resources Information Center

    Jarvella, Robert J.; Bang, Eva; Jakobsen, Arnt Lykke; Mees, Inger M.

    2001-01-01

    Advanced Danish students of English tried to identify the national origin of young men from Ireland, Scotland, England, and the United States from their speech and then rated the speech for attractiveness. Listeners rated speech produced by Englishmen as most attractive, and speech by Americans as least attractive. (Author/VWL)

  6. Reported Speech in Conversational Storytelling during Nursing Shift Handover Meetings

    ERIC Educational Resources Information Center

    Bangerter, Adrian; Mayor, Eric; Pekarek Doehler, Simona

    2011-01-01

    Shift handovers in nursing units involve formal transmission of information and informal conversation about non-routine events. Informal conversation often involves telling stories. Direct reported speech (DRS) was studied in handover storytelling in two nursing care units. The study goal is to contribute to a better understanding of conversation…

  7. Phonologic-graphemic transcodifier for Portuguese Language spoken in Brazil (PLB)

    NASA Astrophysics Data System (ADS)

    Fragadasilva, Francisco Jose; Saotome, Osamu; Deoliveira, Carlos Alberto

    An automatic speech-to-text transformer system, suited to unlimited vocabulary, is presented. The basic acoustic units considered are the allophones of the phonemes of the Portuguese language spoken in Brazil (PLB). The input to the system is a phonetic sequence produced by a preceding stage of isolated-word recognition of slowly spoken speech. In a first stage, the system eliminates phonetic elements that do not belong to PLB. Using knowledge sources such as phonetics, phonology, orthography, and a PLB-specific lexicon, the output is a sequence of written words, ordered by a probabilistic criterion, that constitutes the set of graphemic possibilities for that input sequence. Pronunciation differences among some regions of Brazil are considered, but only those that cause differences in the phonological transcription, because differences at the phonetic level are absorbed during the transformation to the phonological level. In the final stage, all candidate written words are analyzed from an orthographic and grammatical point of view, to eliminate the incorrect ones.

  8. DEVELOPMENT AND DISORDERS OF SPEECH IN CHILDHOOD.

    ERIC Educational Resources Information Center

    KARLIN, ISAAC W.; AND OTHERS

    THE GROWTH, DEVELOPMENT, AND ABNORMALITIES OF SPEECH IN CHILDHOOD ARE DESCRIBED IN THIS TEXT DESIGNED FOR PEDIATRICIANS, PSYCHOLOGISTS, EDUCATORS, MEDICAL STUDENTS, THERAPISTS, PATHOLOGISTS, AND PARENTS. THE NORMAL DEVELOPMENT OF SPEECH AND LANGUAGE IS DISCUSSED, INCLUDING THEORIES ON THE ORIGIN OF SPEECH IN MAN AND FACTORS INFLUENCING THE NORMAL…

  9. Temporally selective attention supports speech processing in 3- to 5-year-old children.

    PubMed

    Astheimer, Lori B; Sanders, Lisa D

    2012-01-01

    Recent event-related potential (ERP) evidence demonstrates that adults employ temporally selective attention to preferentially process the initial portions of words in continuous speech. Doing so is an effective listening strategy since word-initial segments are highly informative. Although the development of this process remains unexplored, directing attention to word onsets may be important for speech processing in young children who would otherwise be overwhelmed by the rapidly changing acoustic signals that constitute speech. We examined the use of temporally selective attention in 3- to 5-year-old children listening to stories by comparing ERPs elicited by attention probes presented at four acoustically matched times relative to word onsets: concurrently with a word onset, 100 ms before, 100 ms after, and at random control times. By 80 ms, probes presented at and after word onsets elicited a larger negativity than probes presented before word onsets or at control times. The latency and distribution of this effect is similar to temporally and spatially selective attention effects measured in adults and, despite differences in polarity, spatially selective attention effects measured in children. These results indicate that, like adults, preschool aged children modulate temporally selective attention to preferentially process the initial portions of words in continuous speech. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Speaker identification for the improvement of the security communication between law enforcement units

    NASA Astrophysics Data System (ADS)

    Tovarek, Jaromir; Partila, Pavol

    2017-05-01

    This article discusses speaker identification for the improvement of secure communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system that can be used for real-time recognition. The system is designed for identification in the open set, meaning the unknown speaker can be anyone. Communication itself is secured, but the authorization of the communicating parties must be checked: the system has to decide whether the unknown speaker is authorized for the given action. Calls are recorded by an IP telephony server, and the recordings are then evaluated by a classifier. If the system determines that the speaker is not authorized, it sends a warning message to the administrator. This message can reveal, for example, a stolen phone or another unusual situation, and the administrator then performs the appropriate actions. The proposed system uses a multilayer neural network for classification, consisting of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech feature vector, and the output layer represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete recording; this rule substantially increases classification accuracy. The input data for the neural network are thirteen Mel-frequency cepstral coefficients, which describe the behavior of the vocal tract and are the features most commonly used for speaker recognition. Parameters for training, testing, and validation were extracted from recordings of authorized users. Recording conditions for the training data correspond to the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is a system for text-independent speaker identification applied to secure communication between law enforcement units.
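
    The frame-by-frame classification with a single decision over the complete recording can be sketched as follows. This is a hedged illustration using scikit-learn's MLPClassifier on synthetic 13-dimensional MFCC-like frames; the network size, training data, and open-set rejection logic of the actual system are not reproduced here.

        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(1)
        n_speakers, frames_per_rec = 4, 200

        # Synthetic stand-in for 13-dimensional MFCC frames from enrolled speakers.
        def fake_record(speaker: int) -> np.ndarray:
            return rng.normal(loc=speaker, scale=2.0, size=(frames_per_rec, 13))

        X_train = np.vstack([fake_record(s) for s in range(n_speakers) for _ in range(5)])
        y_train = np.repeat(np.arange(n_speakers), 5 * frames_per_rec)

        # One hidden layer, echoing the three-layer (input/hidden/output) topology above.
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
        clf.fit(X_train, y_train)

        # Classify frame by frame, but decide over the whole record by averaging
        # posteriors -- the rule the abstract credits with higher accuracy.
        test = fake_record(speaker=2)
        posteriors = clf.predict_proba(test).mean(axis=0)
        print("decision over complete record:", posteriors.argmax())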

  11. Longitudinal Comparison of the Speech and Language Performance of United States-Born and Internationally Adopted Toddlers with Cleft Lip and Palate: A Pilot Study.

    PubMed

    Scherer, Nancy J; Baker, Shauna; Kaiser, Ann; Frey, Jennifer R

    2018-01-01

    Objective This study compares the early speech and language development of children with cleft palate with or without cleft lip who were adopted internationally with children born in the United States. Design Prospective longitudinal description of early speech and language development between 18 and 36 months of age. Participants This study compares four children (age range = 19 to 38 months) with cleft palate with or without cleft lip who were adopted internationally with four children (age range = 19 to 38 months) with cleft palate with or without cleft lip who were born in the United States, matched for age, gender, and cleft type across three time points over 10 to 12 months. Main Outcome Measures Children's speech-language skills were analyzed using standardized tests, parent surveys, language samples, and single-word phonological assessments to determine differences between the groups. Results The mean scores for the children in the internationally adopted group were lower than the group born in the United States at all three time points for expressive language and speech sound production measures. Examination of matched pairs demonstrated observable differences for two of the four pairs. No differences were observed in cognitive performance and receptive language measures. Conclusions The results suggest a cumulative effect of later palate repair and/or a variety of health and environmental factors associated with their early circumstances that persist to age 3 years. Early intervention to address the trajectory of speech and language is warranted. Given the findings from this small pilot study, a larger study of the long-term speech and language development of children who are internationally adopted and have cleft palate with or without cleft lip is recommended.

  12. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72–82% (freely-read CDS) and 90–98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages. PMID:26641472

  13. Acoustic-Emergent Phonology in the Amplitude Envelope of Child-Directed Speech.

    PubMed

    Leong, Victoria; Goswami, Usha

    2015-01-01

    When acquiring language, young children may use acoustic spectro-temporal patterns in speech to derive phonological units in spoken language (e.g., prosodic stress patterns, syllables, phonemes). Children appear to learn acoustic-phonological mappings rapidly, without direct instruction, yet the underlying developmental mechanisms remain unclear. Across different languages, a relationship between amplitude envelope sensitivity and phonological development has been found, suggesting that children may make use of amplitude modulation (AM) patterns within the envelope to develop a phonological system. Here we present the Spectral Amplitude Modulation Phase Hierarchy (S-AMPH) model, a set of algorithms for deriving the dominant AM patterns in child-directed speech (CDS). Using Principal Components Analysis, we show that rhythmic CDS contains an AM hierarchy comprising 3 core modulation timescales. These timescales correspond to key phonological units: prosodic stress (Stress AM, ~2 Hz), syllables (Syllable AM, ~5 Hz) and onset-rime units (Phoneme AM, ~20 Hz). We argue that these AM patterns could in principle be used by naïve listeners to compute acoustic-phonological mappings without lexical knowledge. We then demonstrate that the modulation statistics within this AM hierarchy indeed parse the speech signal into a primitive hierarchically-organised phonological system comprising stress feet (proto-words), syllables and onset-rime units. We apply the S-AMPH model to two other CDS corpora, one spontaneous and one deliberately-timed. The model accurately identified 72-82% (freely-read CDS) and 90-98% (rhythmically-regular CDS) stress patterns, syllables and onset-rime units. This in-principle demonstration that primitive phonology can be extracted from speech AMs is termed Acoustic-Emergent Phonology (AEP) theory. AEP theory provides a set of methods for examining how early phonological development is shaped by the temporal modulation structure of speech across languages. The S-AMPH model reveals a crucial developmental role for stress feet (AMs ~2 Hz). Stress feet underpin different linguistic rhythm typologies, and speech rhythm underpins language acquisition by infants in all languages.
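
    The core signal-processing step described in the two records above can be sketched in a few lines: extract the amplitude envelope and band-pass it around the stress, syllable, and phoneme modulation timescales. The band edges, filter order, and toy signal below are assumptions for illustration; the S-AMPH model derives its modulation bands from Principal Components Analysis rather than fixing them a priori.

        import numpy as np
        from scipy.signal import hilbert, butter, sosfiltfilt

        fs = 16000  # assumed sampling rate
        t = np.arange(0, 2.0, 1 / fs)
        # Toy "speech": a carrier modulated at roughly stress (2 Hz) and syllable (5 Hz) rates.
        signal = (1 + 0.5 * np.sin(2 * np.pi * 2 * t)) \
                 * (1 + 0.5 * np.sin(2 * np.pi * 5 * t)) \
                 * np.sin(2 * np.pi * 500 * t)

        # Broadband amplitude envelope via the Hilbert transform.
        envelope = np.abs(hilbert(signal))

        # Band-pass the envelope around the three timescales reported above.
        bands = {"stress (~2 Hz)": (0.9, 2.5),
                 "syllable (~5 Hz)": (2.5, 12.0),
                 "phoneme (~20 Hz)": (12.0, 40.0)}
        for name, (lo, hi) in bands.items():
            sos = butter(2, (lo, hi), btype="bandpass", fs=fs, output="sos")
            am = sosfiltfilt(sos, envelope)
            print(f"{name}: RMS modulation = {np.sqrt(np.mean(am ** 2)):.3f}")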

  14. Free Speech Yearbook 1978.

    ERIC Educational Resources Information Center

    Phifer, Gregg, Ed.

    The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…

  15. Radio, Television, and Film in the Secondary School, MSA Curriculum Guide 8.

    ERIC Educational Resources Information Center

    Herman, Deldee M., Ed.; Ratliffe, Sharon A., Ed.

    This eight-unit volume of the Michigan Speech Association curriculum guide is designed for use by instructors who teach a one semester course in radio, television, and/or film. It can also be used by those who teach a media unit within an English or speech class. The subject of the first unit is media analysis and evaluation. The second unit is an…

  16. Speech and Hearing Science, Anatomy and Physiology.

    ERIC Educational Resources Information Center

    Zemlin, Willard R.

    Written for those interested in speech pathology and audiology, the text presents the anatomical, physiological, and neurological bases for speech and hearing. Anatomical nomenclature used in the speech and hearing sciences is introduced and the breathing mechanism is defined and discussed in terms of the respiratory passage, the framework and…

  17. Randomized controlled trial of supplemental augmentative and alternative communication versus voice rest alone after phonomicrosurgery.

    PubMed

    Rousseau, Bernard; Gutmann, Michelle L; Mau, Theodore; Francis, David O; Johnson, Jeffrey P; Novaleski, Carolyn K; Vinson, Kimberly N; Garrett, C Gaelyn

    2015-03-01

    This randomized trial investigated voice rest and supplemental text-to-speech communication versus voice rest alone on visual analog scale measures of communication effectiveness and magnitude of voice use. Randomized clinical trial. Multicenter outpatient voice clinics. Thirty-seven patients undergoing phonomicrosurgery. Patients undergoing phonomicrosurgery were randomized to voice rest and supplemental text-to-speech communication or voice rest alone. The primary outcome measure was the impact of voice rest on ability to communicate effectively over a 7-day period. Pre- and postoperative magnitude of voice use was also measured as an observational outcome. Patients randomized to voice rest and supplemental text-to-speech communication reported higher median communication effectiveness on each postoperative day compared to those randomized to voice rest alone, with significantly higher median communication effectiveness on postoperative days 3 (P=.03) and 5 (P=.01). Magnitude of voice use did not differ on any preoperative (P>.05) or postoperative day (P>.05), nor did patients significantly decrease voice use as the surgery date approached (P>.05). However, there was a significant reduction in median voice use pre- to postoperatively across patients (P<.001) with median voice use ranging from 0 to 3 throughout the postoperative week. Supplemental text-to-speech communication increased patient-perceived communication effectiveness on postoperative days 3 and 5 over voice rest alone. With the prevalence of smartphones and the widespread use of text messaging, supplemental text-to-speech communication may provide an accessible and cost-effective communication option for patients on vocal restrictions. © American Academy of Otolaryngology—Head and Neck Surgery Foundation 2015.

  18. Lexical frequency and acoustic reduction in spoken Dutch

    NASA Astrophysics Data System (ADS)

    Pluymaekers, Mark; Ernestus, Mirjam; Baayen, R. Harald

    2005-10-01

    This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To account for these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.

  19. An assessment of the information-seeking abilities and needs of practicing speech-language pathologists

    PubMed Central

    Nail-Chiwetalu, Barbara; Bernstein Ratner, Nan

    2007-01-01

    Objectives: This study assessed the information-seeking practices and needs of speech-language pathologists (SLPs). Improved understanding of these needs can inform librarians and educators to better prepare students in principles and methods of evidence-based practice (EBP) and, through continuing education (CE), promote the integration of EBP into clinical practice of SLPs. Methods: A 16-question survey was mailed to 1,000 certified speech-language pathologists in the United States. Results: Two hundred and eight usable surveys were returned for a response rate of 21%. For clinical questions, SLPs most often consulted with a colleague, participated in CE activities, and searched the open Internet. Few respondents relied on scholarly journal articles for assistance with clinical cases. The most prominent barriers to finding appropriate information were time and knowledge of where and how to find relevant information. Few reported having information literacy instruction by a librarian. Discussion: If EBP is to become a viable practice in clinical decision making, there appears to be a tremendous need for information literacy instruction in the university curriculum, as well as through CE activities for currently practicing SLPs. Given respondents' reported lack of time and limited access to full-text journals containing evidence relevant to clinical practice, the field of speech-language pathology will need to generate readily accessible clinical summaries of research evidence through meta-analyses, systematic reviews, and clinical practice guidelines. PMID:17443251

  20. Free Speech and the Rights of Congress: Robert M. LaFollette and the Argument from Principle.

    ERIC Educational Resources Information Center

    Schliessmann, Michael R.

    Senator Robert LaFollette's speech to the United States Senate on "Free Speech and the Right of Congress to Declare the Objects of War," given October 6, 1917, epitomized his opposition to the war and the Wilson administration's largely successful moves to suppress public criticism of the war. In the speech he asserted his position on…

  1. Text to Speech (TTS) Capabilities for the Common Driver Trainer (CDT)

    DTIC Science & Technology

    2010-10-01

    harnessing in’leigle jalClpeno jocelyn linu ~ los angeles lottery margarine mathematlze mathematized mathematized meme memes memol...including Julie, Kate, and Paul . Based upon the names of the voices, it may be that the VoiceText capability is the technology being used currently on...DFTTSExportToFileEx(O, " Paul ", 1, 1033, "Testing the Digital Future Text-to-Speech SDK.", -1, -1, -1, -1, -1, DFTTS_ TEXT_ TYPE_ XML, "test.wav", 0, "", -1

  2. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

    PubMed

    Jørgensen, Søren; Dau, Torsten

    2011-09-01

    A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America
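
    A single-band simplification of the SNR(env) metric can be sketched as follows: the envelope power of the noisy mixture is compared with that of the noise alone at the output of one modulation band, and the excess is attributed to speech. The band edges, the normalization by the squared mean envelope, and the toy signals are assumptions; the full model uses a complete modulation filterbank and an ideal-observer back end.

        import numpy as np
        from scipy.signal import hilbert, butter, sosfiltfilt

        def env_power(x, fs, lo, hi):
            """Normalized envelope power in one modulation band: AC power of the
            band-passed Hilbert envelope, divided by the squared mean (DC) envelope."""
            env = np.abs(hilbert(x))
            sos = butter(2, (lo, hi), btype="bandpass", fs=fs, output="sos")
            band = sosfiltfilt(sos, env)
            return np.mean(band ** 2) / np.mean(env) ** 2

        fs = 16000
        t = np.arange(0, 2.0, 1 / fs)
        speech = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
        noise = np.random.default_rng(0).normal(scale=0.5, size=t.size)

        # One octave-wide modulation band centered near 4 Hz (band edges are assumptions).
        p_mix = env_power(speech + noise, fs, 2.8, 5.7)
        p_noise = env_power(noise, fs, 2.8, 5.7)

        # SNRenv: envelope power of the mixture in excess of the noise floor.
        snr_env = 10 * np.log10(max(p_mix - p_noise, 1e-6) / p_noise)
        print(f"SNRenv ~ {snr_env:.1f} dB in the 4-Hz modulation band")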

  3. Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

    NASA Technical Reports Server (NTRS)

    Olorenshaw, Lex; Trawick, David

    1991-01-01

    The purpose was to develop a speech recognition system able to detect speech that is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of scoring normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In continuous speech, the system was able to provide above 80 percent correct acceptance of words, while correctly rejecting over 80 percent of incorrectly pronounced words.

  4. An Evaluation of Text-to-Speech Synthesizers in the Foreign Language Classroom: Learners' Perceptions

    ERIC Educational Resources Information Center

    Bione, Tiago; Grimshaw, Jennica; Cardoso, Walcir

    2016-01-01

    As stated in Cardoso, Smith, and Garcia Fuentes (2015), second language researchers and practitioners have explored the pedagogical capabilities of Text-To-Speech synthesizers (TTS) for their potential to enhance the acquisition of writing (e.g. Kirstein, 2006), vocabulary and reading (e.g. Proctor, Dalton, & Grisham, 2007), and pronunciation…

  5. Analysis of the project synthesis goal cluster orientation and inquiry emphasis of elementary science textbooks

    NASA Astrophysics Data System (ADS)

    Staver, John R.; Bay, Mary

    The purpose of this descriptive study was to examine selected units of commonly used elementary science texts, using the Project Synthesis goal clusters as a framework for part of the examination. An inquiry classification scheme was used for the remaining segment. Four questions were answered: (1) To what extent do elementary science textbooks focus on each Project Synthesis goal cluster? (2) In which part of the text is such information found? (3) To what extent are the activities and experiments merely verifications of information already introduced in the text? (4) If inquiry is present in an activity, then what is the level of such inquiry? Eleven science textbook series, which comprise approximately 90 percent of the national market, were selected for analysis. Two units, one primary (K-3) and one intermediate (4-6), were selected for analysis by first identifying units common to most series, then randomly selecting one primary and one intermediate unit for analysis. Each randomly selected unit was carefully read, using the sentence as the unit of analysis. Each declarative and interrogative sentence in the body of the text was classified as: (1) academic; (2) personal; (3) career; or (4) societal in its focus. Each illustration, except those used in evaluation items, was similarly classified. Each activity/experiment and each miscellaneous sentence in end-of-chapter segments labelled review, summary, evaluation, etc., were similarly classified. Finally, each activity/experiment, as a whole, was categorized according to a four-category inquiry scheme (confirmation, structured inquiry, guided inquiry, open inquiry). In general, results of the analysis are: (1) most text prose focuses on academic science; (2) most remaining text prose focuses on the personal goal cluster; (3) the career and societal goal clusters receive only minor attention; (4) text illustrations exhibit a pattern similar to text prose; (5) text activities/experiments are academic in orientation, almost to the exclusion of other goal clusters; (6) end-of-chapter sentences are largely academic; (7) inquiry is absent or present only in limited forms in text activities/experiments; and (8) texts allocate only a minor portion of space to activities/experiments. Detailed findings are given as numeral, percentage, and decimal values. Discussion focuses on the implications of the results and a comparison of NSTA recommendations with the results of this analysis.

  6. Teaching an Endangered Species Unit.

    ERIC Educational Resources Information Center

    Quilty, Joan; And Others

    1986-01-01

    Describes how a student speech activity can serve as a culminating exercise in a unit on endangered species. Offers suggestions and guidelines for researching, formatting, and delivering the speech. A table is also included explaining the causes and prevention of species endangerment. (ML)

  7. An attention-gating recurrent working memory architecture for emergent speech representation

    NASA Astrophysics Data System (ADS)

    Elshaw, Mark; Moore, Roger K.; Klein, Michael

    2010-06-01

    This paper describes an attention-gating recurrent self-organising map approach for emergent speech representation. Inspired by evidence from human cognitive processing, the architecture combines two main neural components. The first component, the attention-gating mechanism, uses actor-critic learning to perform selective attention towards speech. Through this selective attention approach, the attention-gating mechanism controls access to working memory processing. The second component, the recurrent self-organising map memory, develops a temporal-distributed representation of speech using phone-like structures. Representing speech in terms of phonetic features in an emergent self-organised fashion, according to research on child cognitive development, recreates the approach found in infants. Using this representational approach, in a fashion similar to infants, should improve the performance of automatic recognition systems through aiding speech segmentation and fast word learning.
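
    The recurrent self-organising memory component can be illustrated with a generic temporal Kohonen-style map, in which unit activations leak across time steps so that the winning unit reflects the recent input sequence rather than the current frame alone. This is a hedged sketch of the general technique, not the paper's specific architecture (which adds actor-critic attention gating); the leak rate, learning rate, and toy sequence are assumptions.

        import numpy as np

        class RecurrentSOM:
            """Minimal temporal Kohonen map: activations decay over time, so the
            winner depends on the recent input history, not just the current frame."""
            def __init__(self, n_units, dim, alpha=0.1, leak=0.7, seed=0):
                rng = np.random.default_rng(seed)
                self.w = rng.normal(size=(n_units, dim))  # unit weight vectors
                self.act = np.zeros(n_units)              # leaky activations
                self.alpha, self.leak = alpha, leak

            def step(self, x, train=True):
                # Recurrent (leaky-integrated) activation: past activity decays by `leak`.
                dist = np.sum((self.w - x) ** 2, axis=1)
                self.act = self.leak * self.act - 0.5 * dist
                winner = int(np.argmax(self.act))
                if train:
                    # Standard SOM update toward the input (winner-only, no neighborhood).
                    self.w[winner] += self.alpha * (x - self.w[winner])
                return winner

        # Feed a toy sequence of "phone-like" feature frames; print the winner trajectory.
        rng = np.random.default_rng(1)
        som = RecurrentSOM(n_units=8, dim=3)
        sequence = [rng.normal(loc=k % 2, size=3) for k in range(20)]
        print([som.step(x) for x in sequence])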

  8. Linguistic processing in visual and modality-nonspecific brain areas: PET recordings during selective attention.

    PubMed

    Vorobyev, Victor A; Alho, Kimmo; Medvedev, Svyatoslav V; Pakhomov, Sergey V; Roudas, Marina S; Rutkovskaya, Julia M; Tervaniemi, Mari; Van Zuijen, Titia L; Näätänen, Risto

    2004-07-01

    Positron emission tomography (PET) was used to investigate the neural basis of selective processing of linguistic material during concurrent presentation of multiple stimulus streams ("cocktail-party effect"). Fifteen healthy right-handed adult males were to attend to one of three simultaneously presented messages: one presented visually, one to the left ear, and one to the right ear. During the control condition, subjects attended to visually presented consonant letter strings and ignored auditory messages. This paper reports the modality-nonspecific language processing and visual word-form processing, whereas the auditory attention effects have been reported elsewhere [Cogn. Brain Res. 17 (2003) 201]. The left-hemisphere areas activated by both the selective processing of text and speech were as follows: the inferior prefrontal (Brodmann's area, BA 45, 47), anterior temporal (BA 38), posterior insular (BA 13), inferior (BA 20) and middle temporal (BA 21), occipital (BA 18/30) cortices, the caudate nucleus, and the amygdala. In addition, bilateral activations were observed in the medial occipito-temporal cortex and the cerebellum. Decreases of activation during both text and speech processing were found in the parietal (BA 7, 40), frontal (BA 6, 8, 44) and occipito-temporal (BA 37) regions of the right hemisphere. Furthermore, the present data suggest that the left occipito-temporal cortex (BA 18, 20, 37, 21) can be subdivided into three functionally distinct regions in the posterior-anterior direction on the basis of their activation during attentive processing of sublexical orthography, visual word form, and supramodal higher-level aspects of language.

  9. User Evaluation of a Communication System That Automatically Generates Captions to Improve Telephone Communication

    PubMed Central

    Zekveld, Adriana A.; Kramer, Sophia E.; Kessens, Judith M.; Vlaming, Marcel S. M. G.; Houtgast, Tammo

    2009-01-01

    This study examined the subjective benefit obtained from automatically generated captions during telephone-speech comprehension in the presence of babble noise. Short stories were presented by telephone either with or without captions that were generated offline by an automatic speech recognition (ASR) system. To simulate online ASR, the word accuracy (WA) level of the captions was 60% or 70%, and the text was presented with a delay relative to the speech. After each test, the hearing-impaired participants (n = 20) completed the NASA Task Load Index and several rating scales evaluating the support from the captions. Participants indicated that using the erroneous text in speech comprehension was difficult, and the reported task load did not differ between the audio + text and audio-only conditions. In a follow-up experiment (n = 10), the perceived benefit of presenting captions increased when the WA level was raised to 80% or 90% and the text delay was eliminated. However, in general, the task load did not decrease when captions were presented. These results suggest that the extra effort required to process the text could have been compensated for by less effort required to comprehend the speech. Future research should aim at reducing the complexity of the task to increase the willingness of hearing-impaired persons to use an assistive communication system automatically providing captions. The current results underline the need for obtaining both objective and subjective measures of benefit when evaluating assistive communication systems. PMID:19126551
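
    The word accuracy (WA) levels manipulated in this study are conventionally computed as 1 minus the word error rate obtained from a Levenshtein alignment of the caption against a reference transcript. A minimal sketch follows; the unit costs for substitutions, insertions, and deletions and the example strings are the usual convention, assumed here rather than taken from the paper.

        def word_accuracy(reference: str, hypothesis: str) -> float:
            """Word accuracy = 1 - WER, with WER from word-level Levenshtein distance
            (substitutions, insertions, and deletions all cost 1)."""
            ref, hyp = reference.split(), hypothesis.split()
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i
            for j in range(len(hyp) + 1):
                d[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution / match
            return 1.0 - d[len(ref)][len(hyp)] / max(len(ref), 1)

        print(word_accuracy("the quick brown fox", "the quack brown box fox"))  # 0.5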

  10. Selective left, right and bilateral stimulation of subthalamic nuclei in Parkinson's disease: differential effects on motor, speech and language function.

    PubMed

    Schulz, Geralyn M; Hosey, Lara A; Bradberry, Trent J; Stager, Sheila V; Lee, Li-Ching; Pawha, Rajesh; Lyons, Kelly E; Metman, Leo Verhagen; Braun, Allen R

    2012-01-01

    Deep brain stimulation (DBS) of the subthalamic nucleus improves the motor symptoms of Parkinson's disease, but may produce a worsening of speech and language performance at rates and amplitudes typically selected in clinical practice. The possibility that these dissociated effects might be modulated by selective stimulation of left and right STN has never been systematically investigated. To address this issue, we analyzed motor, speech and language functions of 12 patients implanted with bilateral stimulators configured for optimal motor responses. Behavioral responses were quantified under four stimulator conditions: bilateral DBS, right-only DBS, left-only DBS and no DBS. Under bilateral and left-only DBS conditions, our results exhibited a significant improvement in motor symptoms but worsening of speech and language. These findings contribute to the growing body of literature demonstrating that bilateral STN DBS compromises speech and language function and suggests that these negative effects may be principally due to left-sided stimulation. These findings may have practical clinical consequences, suggesting that clinicians might optimize motor, speech and language functions by carefully adjusting left- and right-sided stimulation parameters.

  11. Speech-associated gestures, Broca’s area, and the human mirror system

    PubMed Central

    Skipper, Jeremy I.; Goldin-Meadow, Susan; Nusbaum, Howard C.; Small, Steven L

    2009-01-01

    Speech-associated gestures are hand and arm movements that not only convey semantic information to listeners but are themselves actions. Broca’s area has been assumed to play an important role both in semantic retrieval or selection (as part of a language comprehension system) and in action recognition (as part of a “mirror” or “observation–execution matching” system). We asked whether the role that Broca’s area plays in processing speech-associated gestures is consistent with the semantic retrieval/selection account (predicting relatively weak interactions between Broca’s area and other cortical areas because the meaningful information that speech-associated gestures convey reduces semantic ambiguity and thus reduces the need for semantic retrieval/selection) or the action recognition account (predicting strong interactions between Broca’s area and other cortical areas because speech-associated gestures are goal-directed actions that are “mirrored”). We compared the functional connectivity of Broca’s area with other cortical areas when participants listened to stories while watching meaningful speech-associated gestures, speech-irrelevant self-grooming hand movements, or no hand movements. A network analysis of neuroimaging data showed that interactions involving Broca’s area and other cortical areas were weakest when spoken language was accompanied by meaningful speech-associated gestures, and strongest when spoken language was accompanied by self-grooming hand movements or by no hand movements at all. Results are discussed with respect to the role that the human mirror system plays in processing speech-associated movements. PMID:17533001

  12. Language and Speech Improvement for Kindergarten and First Grade. A Supplementary Handbook.

    ERIC Educational Resources Information Center

    Cole, Roberta; And Others

    The 16-unit language and speech improvement handbook for kindergarten and first grade students contains an introductory section which includes a discussion of the child's developmental speech and language characteristics, a sound development chart, a speech and hearing language screening test, the Henja articulation test, and a general outline of…

  13. Ohio School Speech and Hearing Services.

    ERIC Educational Resources Information Center

    Gross, F. P.; And Others

    The pamphlet on speech and hearing services offered by the Ohio Department of Education discusses both the general status of speech and hearing services, and certification and program standards. The general status of Ohio's programs is described in terms of the history of speech and hearing therapy in Ohio, the present status of units in speech…

  14. Support vector machine and mel frequency Cepstral coefficient based algorithm for hand gestures and bidirectional speech to text device

    NASA Astrophysics Data System (ADS)

    Balbin, Jessie R.; Padilla, Dionis A.; Fausto, Janette C.; Vergara, Ernesto M.; Garcia, Ramon G.; Delos Angeles, Bethsedea Joy S.; Dizon, Neil John A.; Mardo, Mark Kevin N.

    2017-02-01

    This research is about translating a series of hand gestures to form a word and producing its equivalent sound as it is read and said with a Filipino accent, using Support Vector Machine and Mel-frequency cepstral coefficient analysis. The concept is to detect Filipino speech input and translate the spoken words into their text form in Filipino. This study aims to help the Filipino deaf community impart their thoughts through the use of hand gestures and communicate with people who do not know how to read hand gestures. It also helps literate deaf users simply read the spoken words relayed to them by the Filipino speech-to-text system.
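
    The speech side of the pipeline named in the title can be sketched briefly: MFCC features feeding a support vector machine. The snippet below uses librosa and scikit-learn on synthetic tones standing in for recorded words; the per-utterance feature averaging, the RBF kernel, and the data are assumptions, not the authors' implementation.

        import numpy as np
        import librosa
        from sklearn.svm import SVC

        rng = np.random.default_rng(0)
        sr = 16000

        def mfcc_vector(audio: np.ndarray) -> np.ndarray:
            # 13 MFCCs per frame, averaged over time into one utterance-level vector.
            m = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
            return m.mean(axis=1)

        # Toy stand-ins for two spoken words (a real system would use recordings).
        def toy_utterance(f0: float) -> np.ndarray:
            t = np.arange(0, 0.5, 1 / sr)
            return np.sin(2 * np.pi * f0 * t) + 0.05 * rng.normal(size=t.size)

        X = np.array([mfcc_vector(toy_utterance(f)) for f in [200] * 10 + [400] * 10])
        y = np.array([0] * 10 + [1] * 10)  # word labels

        clf = SVC(kernel="rbf").fit(X, y)
        print(clf.predict([mfcc_vector(toy_utterance(400))]))  # -> [1]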

  15. Accessible Text-to-Speech Options for Students Who Struggle with Reading

    ERIC Educational Resources Information Center

    Bone, Erin K.; Bouck, Emily C.

    2017-01-01

    As students progress through school they spend more time reading to obtain information. Reading to learn can be a struggle for any student, but it tends to be a bigger obstacle for students with disabilities. Using text-to-speech applications and extensions is one way to assist students with disabilities who struggle to independently complete…

  16. Real-Time Speech-to-Text Services. [A Report of the] National Task Force on Quality of Services in the Postsecondary Education of Deaf and Hard of Hearing Students.

    ERIC Educational Resources Information Center

    Stinson, Michael; Eisenberg, Sandy; Horn, Christy; Larson, Judy; Levitt, Harry; Stuckless, Ross

    This report describes and discusses several applications of new computer-based technologies which enable postsecondary students with deafness or hearing impairments to read the text of the language being spoken by the instructor and fellow students virtually in real time. Two current speech-to-text options are described: (1) steno-based systems in…

  17. Lincoln, Patriotism's Greatest Poet.

    ERIC Educational Resources Information Center

    American Educator, 2002

    2002-01-01

    Presents excerpts from the speeches and writings of Abraham Lincoln (e.g., various speeches that addressed slavery, a speech on democracy as a universal ideal, and the Gettysburg Address) to show how he evoked a vision of a United States that has inspired, shaped, and defined the country ever since. (SM)

  18. The Intelligibility of Indian English. Monograph No. 4.

    ERIC Educational Resources Information Center

    Bansal, R. K.

    Twenty-four English speakers from various regions of India were tested for the intelligibility of their speech. Recordings of speech in a variety of contexts were evaluated by listeners from the United Kingdom, the United States, Nigeria, and Germany. On the basis of the resulting intelligibility scores, factors which tend to hinder…

  19. Speech Recognition as a Support Service for Deaf and Hard of Hearing Students: Adaptation and Evaluation. Final Report to Spencer Foundation.

    ERIC Educational Resources Information Center

    Stinson, Michael; Elliot, Lisa; McKee, Barbara; Coyne, Gina

    This report discusses a project that adapted new automatic speech recognition (ASR) technology to provide real-time speech-to-text transcription as a support service for students who are deaf and hard of hearing (D/HH). In this system, as the teacher speaks, a hearing intermediary, or captionist, dictates into the speech recognition system in a…

  20. Duration of the speech disfluencies of beginning stutterers.

    PubMed

    Zebrowski, P M

    1991-06-01

    This study compared the duration of within-word disfluencies and the number of repeated units per instance of sound/syllable and whole-word repetitions of beginning stutterers to those produced by age- and sex-matched nonstuttering children. Subjects were 10 stuttering children (9 males and 1 female; mean age 4:1 [years:months]; age range 3:2-5:1), and 10 nonstuttering children (9 males and 1 female; mean age 4:0; age range 2:10-5:1). Mothers of the stuttering children reported that their children had been stuttering for 1 year or less. One 300-word conversational speech sample from each of the stuttering and nonstuttering children was analyzed for (a) mean duration of sound/syllable repetition and sound prolongation, (b) mean number of repeated units per instance of sound/syllable and whole-word repetition, and (c) various related measures of the frequency of all between- and within-word speech disfluencies. There were no significant between-group differences for either the duration of acoustically measured sound/syllable repetitions and sound prolongations or the number of repeated units per instance of sound/syllable and whole-word repetition. Unlike frequency and type of speech disfluency produced, average duration of within-word disfluencies and number of repeated units per repetition do not differentiate the disfluent speech of beginning stutterers and their nonstuttering peers. Additional analyses support findings from previous perceptual work that type and frequency of speech disfluency, not duration, are the principal characteristics listeners use in distinguishing these two talker groups.

  1. Adaptation to delayed auditory feedback induces the temporal recalibration effect in both speech perception and production.

    PubMed

    Yamamoto, Kosuke; Kawabata, Hideaki

    2014-12-01

    We ordinarily speak fluently, even though our perceptions of our own voices are disrupted by various environmental acoustic properties. The underlying mechanism of speech is supposed to monitor the temporal relationship between speech production and the perception of auditory feedback, as suggested by a reduction in speech fluency when the speaker is exposed to delayed auditory feedback (DAF). While many studies have reported that DAF influences speech motor processing, its relationship to the temporal tuning effect on multimodal integration, or temporal recalibration, remains unclear. We investigated whether the temporal aspects of both speech perception and production change due to adaptation to the delay between the motor sensation and the auditory feedback, a widely used method of inducing temporal recalibration. Participants continually read texts with specific DAF times in order to adapt to the delay. Then, they judged the simultaneity between the motor sensation and the vocal feedback. We measured the rates of speech with which participants read the texts in both the exposure and re-exposure phases. We found that exposure to DAF changed both the rate of speech and the simultaneity judgment; that is, participants' speech gained fluency. Although we also found that a delay of 200 ms appeared to be most effective in decreasing the rates of speech and shifting the distribution on the simultaneity judgment, there was no correlation between these measurements. These findings suggest that both speech motor production and multimodal perception are adaptive to temporal lag but are processed in distinct ways.
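
    The DAF manipulation itself is just a delay line between microphone and headphones. Below is a hedged sketch using the python-sounddevice library and a ring buffer sized to the 200 ms delay highlighted above; the sampling rate and run time are illustrative choices, and a real experiment would control latency much more carefully.

        import numpy as np
        import sounddevice as sd

        fs = 16000
        delay_s = 0.2                      # the ~200 ms delay found most effective above
        buf = np.zeros(int(fs * delay_s))  # ring buffer holding delay_s seconds of audio
        pos = 0

        def callback(indata, outdata, frames, time, status):
            """Play back the microphone signal delayed by delay_s seconds."""
            global pos
            mono = indata[:, 0]
            for i in range(frames):
                outdata[i, 0] = buf[pos]   # emit the sample captured delay_s ago
                buf[pos] = mono[i]         # overwrite it with the current sample
                pos = (pos + 1) % buf.size

        # Full-duplex stream: speak into the microphone, hear yourself 200 ms late.
        with sd.Stream(samplerate=fs, channels=1, callback=callback):
            sd.sleep(10_000)               # run for 10 seconds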

  2. Freedom of Speech and the Communication Discipline: Defending the Value of Low-Value Speech. Wicked Problems Forum: Freedom of Speech at Colleges and Universities

    ERIC Educational Resources Information Center

    Herbeck, Dale A.

    2018-01-01

    Heated battles over free speech have erupted on college campuses across the United States in recent months. Some of the most prominent incidents involve efforts by students to prevent public appearances by speakers espousing controversial viewpoints. Efforts to silence offensive speakers on college campuses are not new; in these endeavors, one can…

  3. The Combined Arms Role of Armored Infantry.

    DTIC Science & Technology

    1985-01-01

    65 This is an especially tempting argument in light of the early recognition of the potential of airpower. There remained in the German Army, however...solutions within the U.S. Army indicate a recognition of the importance of associating selected primary leaders (platoon leader and some squad... speech General Richardson discussed the orientation and employment of the U.S. Army’s new light infantry units. 7. The degradation of infantry skills in

  4. The functional role of the tonsils in speech.

    PubMed

    Finkelstein, Y; Nachmani, A; Ophir, D

    1994-08-01

    To present illustrative cases showing various tonsillar influences on speech and to present a clinical method for patient evaluation establishing concepts of management and a rational therapeutic approach. The cases were selected from a group of approximately 1000 patients referred to the clinic because of suspected palatal diseases. Complete velopharyngeal assessment was made, including otolaryngologic, speech, and hearing examinations, polysomnography, nasendoscopy, multiview videofluoroscopy, and cephalometry. New observations further elucidate the intimate relation between the tonsils and the velopharyngeal valve. The potential influence of the tonsils on the velopharyngeal valve mechanism, in hindering or assisting speech, is described. In selected cases, the decision to perform tonsillectomy depends on its potential effect on speech. The combination of nasendoscopic and multiview videofluoroscopic studies of the mechanical properties of the tonsils during speech is required for patients who present with velopharyngeal insufficiency in whom tonsillar hypertrophy is found. These studies are also required in patients with palatal anomalies who are candidates for tonsillectomy.

  5. The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise

    PubMed Central

    Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath

    2017-01-01

    Purpose Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing varying degrees of linguistic information (2-talker babble or pink noise). Method Twenty-nine monolingual English speakers were instructed to ignore the lexical status of spoken syllables (e.g., gift vs. kift) and to only categorize the initial phonemes (/g/ vs. /k/). The same participants then performed speech recognition tasks in the presence of 2-talker babble or pink noise in audio-only and audiovisual conditions. Results Individuals who demonstrated greater lexical influences on phonemic processing experienced greater speech processing difficulties in 2-talker babble than in pink noise. These selective difficulties were present across audio-only and audiovisual conditions. Conclusion Individuals with greater reliance on lexical processes during speech perception exhibit impaired speech recognition in listening conditions in which competing talkers introduce audible linguistic interferences. Future studies should examine the locus of lexical influences/interferences on phonemic processing and speech-in-speech processing. PMID:28586824

  6. Pragmatic Analyses of Martin Luther King (Jr)'s Speech: "I Have a Dream"--An Introspective Prognosis

    ERIC Educational Resources Information Center

    Josiah, Ubong E.; Oghenerho, Gift

    2015-01-01

    This paper investigates the speech of Martin Luther King (Jr.) titled: "I Have a Dream", presented in 1963 at the Lincoln Memorial. This speech is selected for use because it involves a speaker and an audience who belong to a particular speech community. The speech is about the failed promises by the Americans whose dream advocate…

  7. Recent Trends in Free Speech Theory.

    ERIC Educational Resources Information Center

    Haiman, Franklyn S.

    This syllabus of a convention workshop course on free speech theory consists of descriptions of several United States Supreme Court decisions related to free speech. Some specific areas in which decisions are discussed are: obscene and indecent communication, the definition of a public figure for purposes of libel action, the press versus official…

  8. Randomized Controlled Trial of Supplemental Augmentative and Alternative Communication versus Voice Rest Alone after Phonomicrosurgery

    PubMed Central

    Rousseau, Bernard; Gutmann, Michelle L.; Mau, I-fan Theodore; Francis, David O.; Johnson, Jeffrey P.; Novaleski, Carolyn K.; Vinson, Kimberly N.; Garrett, C. Gaelyn

    2015-01-01

    Objective This randomized trial investigated voice rest and supplemental text-to-speech communication versus voice rest alone on visual analog scale measures of communication effectiveness and magnitude of voice use. Study Design Randomized clinical trial. Setting Multicenter outpatient voice clinics. Subjects Thirty-seven patients undergoing phonomicrosurgery. Methods Patients undergoing phonomicrosurgery were randomized to voice rest and supplemental text-to-speech communication or voice rest alone. The primary outcome measure was the impact of voice rest on ability to communicate effectively over a seven-day period. Pre- and post-operative magnitude of voice use was also measured as an observational outcome. Results Patients randomized to voice rest and supplemental text-to-speech communication reported higher median communication effectiveness on each post-operative day compared to those randomized to voice rest alone, with significantly higher median communication effectiveness on post-operative day 3 (p = 0.03) and 5 (p = 0.01). Magnitude of voice use did not differ on any pre-operative (p > 0.05) or post-operative day (p > 0.05), nor did patients significantly decrease voice use as the surgery date approached (p > 0.05). However, there was a significant reduction in median voice use pre- to post-operatively across patients (p < 0.001) with median voice use ranging from 0–3 throughout the post-operative week. Conclusion Supplemental text-to-speech communication increased patient perceived communication effectiveness on post-operative days 3 and 5 over voice rest alone. With the prevalence of smartphones and the widespread use of text messaging, supplemental text-to-speech communication may provide an accessible and cost-effective communication option for patients on vocal restrictions. PMID:25605690

  9. Integrating Text-to-Speech Software into Pedagogically Sound Teaching and Learning Scenarios

    ERIC Educational Resources Information Center

    Rughooputh, S. D. D. V.; Santally, M. I.

    2009-01-01

    This paper presents a new technique for the delivery of classes--an instructional technique that will no doubt revolutionize teaching and learning, whether for on-campus, blended, or online modules. It is based on the simple task of instructionally incorporating text-to-speech software embedded in the lecture slides that will simulate exactly the…

  10. Building Searchable Collections of Enterprise Speech Data.

    ERIC Educational Resources Information Center

    Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

    The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…

  11. Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features, bypassing the phoneme as recognition unit.

    PubMed

    Arnold, Denis; Tomaschek, Fabian; Sering, Konstantin; Lopez, Florence; Baayen, R Harald

    2017-01-01

    Sound units play a pivotal role in cognitive models of auditory comprehension. The general consensus is that during perception listeners break down speech into auditory words and subsequently phones. Indeed, cognitive speech recognition is typically taken to be computationally intractable without phones. Here we present a computational model trained on 20 hours of conversational speech that recognizes word meanings within the range of human performance (model 25%, native speakers 20-44%), without making use of phone or word form representations. Our model also successfully generates predictions about the speed and accuracy of human auditory comprehension. At the heart of the model is a 'wide' yet sparse two-layer artificial neural network with some hundred thousand input units representing summaries of changes in acoustic frequency bands, and proxies for lexical meanings as output units. We believe that our model holds promise for resolving longstanding theoretical problems surrounding the notion of the phone in linguistic theory.
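
    The two-layer, error-driven cue-to-meaning mapping described above is in the spirit of Rescorla-Wagner/Widrow-Hoff updating, which can be sketched in a few lines. The cue and outcome names and the handful of learning events below are invented for illustration; the actual model learns from hundreds of thousands of acoustic summary units over 20 hours of speech.

        import numpy as np

        # Toy inventories: stand-ins for acoustic-feature summaries and lexical meanings.
        cues = ["rise_band1", "fall_band1", "rise_band2", "fall_band2"]
        outcomes = ["CAT", "DOG"]
        W = np.zeros((len(cues), len(outcomes)))  # two-layer network: cue -> meaning

        def learn(active_cues, present_outcome, rate=0.1):
            """Widrow-Hoff / Rescorla-Wagner update: move weights to cut prediction error."""
            x = np.array([c in active_cues for c in cues], dtype=float)
            target = np.array([o == present_outcome for o in outcomes], dtype=float)
            W[:, :] += rate * np.outer(x, target - x @ W)

        # 100 learning events pairing overlapping cue sets with the two meanings.
        rng = np.random.default_rng(0)
        for _ in range(100):
            if rng.random() < 0.5:
                learn({"rise_band1", "rise_band2"}, "CAT")
            else:
                learn({"rise_band1", "fall_band2"}, "DOG")

        # Recognition: activate cues, pick the best-supported meaning -- no phones involved.
        x = np.array([c in {"rise_band1", "fall_band2"} for c in cues], dtype=float)
        print(outcomes[int(np.argmax(x @ W))])  # -> DOG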

  12. Review of Speech-to-Text Recognition Technology for Enhancing Learning

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Hwang, Wu-Yuin; Chen, Nian-Shing; Huang, Yueh-Min

    2014-01-01

    This paper reviewed literature from 1999 to 2014 inclusively on how Speech-to-Text Recognition (STR) technology has been applied to enhance learning. The first aim of this review is to understand how STR technology has been used to support learning over the past fifteen years, and the second is to analyze all research evidence to understand how…

  13. Symbolic Speech

    ERIC Educational Resources Information Center

    Podgor, Ellen S.

    1976-01-01

    The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)

  14. Availability of Pre-Admission Information to Prospective Graduate Students in Speech-Language Pathology

    ERIC Educational Resources Information Center

    Tekieli Koay, Mary Ellen; Lass, Norman J.; Parrill, Madaline; Naeser, Danielle; Babin, Kelly; Bayer, Olivia; Cook, Megan; Elmore, Madeline; Frye, Rachel; Kerwood, Samantha

    2016-01-01

    An extensive Internet search was conducted to obtain pre-admission information and acceptance statistics from 260 graduate programmes in speech-language pathology accredited by the American Speech-Language-Hearing Association (ASHA) in the United States. ASHA is the national professional, scientific and credentialing association for members and…

  15. Mock Trial: A Window to Free Speech Rights and Abilities

    ERIC Educational Resources Information Center

    Schwartz, Sherry

    2010-01-01

    This article provides some strategies to alleviate the current tensions between personal responsibility and freedom of speech rights in the public school classroom. The article advocates the necessity of making sure students understand the points and implications of the first amendment by providing a mock trial unit concerning free speech rights.…

  16. Applications of Text Analysis Tools for Spoken Response Grading

    ERIC Educational Resources Information Center

    Crossley, Scott; McNamara, Danielle

    2013-01-01

    This study explores the potential for automated indices related to speech delivery, language use, and topic development to model human judgments of TOEFL speaking proficiency in second language (L2) speech samples. For this study, 244 transcribed TOEFL speech samples taken from 244 L2 learners were analyzed using automated indices taken from…

  17. An experimental model of an indigenous BCI based system to help disabled people to communicate

    NASA Astrophysics Data System (ADS)

    Kabir, Kazi Sadman; Rahman, Chowdhury M. Abid; Farayez, Araf; Ferdous, Mahbuba

    2017-12-01

    In this paper a Brain Computer Interface (BCI) system is proposed to help patients suffering from motor disease, paralysis, or locked-in syndrome to communicate via eye blinking. In the proposed BCI system, EEG data are acquired by a NeuroSky headset and then analyzed with the help of a WPF (Windows Presentation Foundation)-based serial monitor to detect the EEG signal produced when the eye blinks. The detected eye blinks can be used to select predefined texts, and those texts can then be converted to speech. The experimental results show that this system can serve as an effective and efficient tool for communicating through brain activity.
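
    A minimal sketch of the blink-to-text selection loop the abstract describes, using simple amplitude thresholding on a simulated EEG trace. The sampling rate, threshold, refractory period, and phrase list are invented for illustration, and the NeuroSky/WPF specifics of the actual system are not modeled; a real implementation would hand each selected phrase to a text-to-speech engine.

    ```python
    import numpy as np

    # Toy sketch: detect eye-blink artifacts in a raw EEG trace by simple
    # amplitude thresholding, and use each detected blink to advance a
    # cursor through predefined phrases. All constants are hypothetical.
    FS = 512                      # sample rate (Hz), assumed
    BLINK_THRESH = 150.0          # amplitude threshold (uV), assumed
    REFRACTORY = int(0.3 * FS)    # ignore samples right after a detection

    phrases = ["I need water", "Please call the nurse", "Yes", "No"]

    def detect_blinks(eeg):
        """Return sample indices where the signal crosses the threshold."""
        blinks, last = [], -REFRACTORY
        for i, v in enumerate(eeg):
            if abs(v) > BLINK_THRESH and i - last >= REFRACTORY:
                blinks.append(i)
                last = i
        return blinks

    # simulate 4 s of noise with two blink-like spikes
    rng = np.random.default_rng(1)
    eeg = rng.normal(0, 20, 4 * FS)
    eeg[600], eeg[1500] = 400.0, -380.0

    cursor = 0
    for _ in detect_blinks(eeg):
        print("selected:", phrases[cursor % len(phrases)])
        cursor += 1   # a real system would hand the text to a TTS engine
    ```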

  18. Speech Emotion Feature Selection Method Based on Contribution Analysis Algorithm of Neural Network

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang Xiaojia; Mao Qirong; Zhan Yongzhao

    There are many emotion features. If all of these features are employed to recognize emotions, redundant features may exist, the recognition result may be unsatisfactory, and the cost of feature extraction is high. In this paper, a method to select speech emotion features based on the contribution analysis algorithm of a neural network (NN) is presented. The emotion features are selected from the 95 extracted features by using the contribution analysis algorithm of the NN. Cluster analysis is applied to assess the effectiveness of the selected features, and the time required for feature extraction is evaluated. Finally, the 24 selected emotion features are used to recognize six speech emotions. The experiments show that this method can improve the recognition rate and reduce the time required for feature extraction.
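
    The abstract does not spell out the contribution analysis algorithm, so the sketch below uses Garson's weight-based variant, one standard way to apportion feature importance from a trained single-hidden-layer network; the random weights here stand in for a trained model, and the sizes mirror the paper's 95-in/24-kept setup only loosely.

    ```python
    import numpy as np

    def garson_importance(W_in, W_out):
        """Garson-style contribution analysis for a single-hidden-layer net:
        apportion each input's share of every hidden unit's outgoing weight,
        then sum over hidden units. W_in: (n_in, n_hidden), W_out: (n_hidden,)."""
        c = np.abs(W_in) * np.abs(W_out)          # per-hidden-unit contributions
        c = c / c.sum(axis=0, keepdims=True)      # normalize within each hidden unit
        imp = c.sum(axis=1)
        return imp / imp.sum()                    # relative importance per feature

    rng = np.random.default_rng(0)
    n_features, n_hidden = 95, 12                 # 95 features, as in the abstract
    W_in = rng.normal(size=(n_features, n_hidden))
    W_out = rng.normal(size=n_hidden)

    imp = garson_importance(W_in, W_out)
    top24 = np.argsort(imp)[::-1][:24]            # keep the 24 best, as in the paper
    print(top24)
    ```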

  19. Freedom of Speech: A Selected, Annotated Basic Bibliography.

    ERIC Educational Resources Information Center

    Tedford, Thomas L.

    Restricted to books on freedom of speech, this annotated bibliography offers a list of 38 references pertinent to the subject. Also included is a list of 18 ERIC documents on freedom of speech, and information on how to order them. (JC)

  20. Investigating the Effectiveness of Speech-To-Text Recognition Applications on Learning Performance, Attention, and Meditation

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Huang, Yueh-Min; Hwang, Jan-Pan

    2017-01-01

    In this study, the effectiveness of the application of speech-to-text recognition (STR) technology on enhancing learning and concentration in a calm state of mind, hereafter referred to as meditation (An intentional and self-regulated focusing of attention in order to relax and calm the mind), was investigated. This effectiveness was further…

  1. Effect(s) of Language Tasks on Severity of Disfluencies in Preschool Children with Stuttering.

    PubMed

    Zamani, Peyman; Ravanbakhsh, Majid; Weisi, Farzad; Rashedi, Vahid; Naderi, Sara; Hosseinzadeh, Ayub; Rezaei, Mohammad

    2017-04-01

    Speech disfluency in children can increase or decrease depending on the type of linguistic task presented to them. In this study, the effect of sentence imitation and sentence modeling on the severity of speech disfluencies in preschool children with stuttering was investigated. In this cross-sectional, descriptive-analytical study, 58 children with stuttering (29 with mild and 29 with moderate stuttering) and 58 typical children aged between 4 and 6 years participated. The severity of speech disfluencies was assessed with the SSI-3 and TOCS before and after each task. In boys with mild stuttering, the mean stuttering severity scores in the sentence imitation and sentence modeling tasks were [Formula: see text] and [Formula: see text], respectively ([Formula: see text]); in boys with moderate stuttering, the scores in the two tasks were [Formula: see text] and [Formula: see text], respectively ([Formula: see text]). In girls with mild stuttering, the scores in the two tasks were [Formula: see text] and [Formula: see text], respectively ([Formula: see text]); in girls with moderate stuttering, they were [Formula: see text] and [Formula: see text], respectively ([Formula: see text]). In typical children of both genders, speech disfluency scores did not differ significantly between the two tasks ([Formula: see text]). In preschool children with mild stuttering and their nonstuttering peers, performing the sentence imitation and sentence modeling tasks did not increase stuttering severity; in preschool children with moderate stuttering, however, the sentence modeling task increased the stuttering severity score.

  2. Lexical and sublexical units in speech perception.

    PubMed

    Giroux, Ibrahima; Rey, Arnaud

    2009-03-01

    Saffran, Newport, and Aslin (1996a) found that human infants are sensitive to statistical regularities corresponding to lexical units when hearing an artificial spoken language. Two sorts of segmentation strategies have been proposed to account for this early word-segmentation ability: bracketing strategies, in which infants are assumed to insert boundaries into continuous speech, and clustering strategies, in which infants are assumed to group certain speech sequences together into units (Swingley, 2005). In the present study, we test the predictions of two computational models, each instantiating one of these strategies (simple recurrent networks: Elman, 1990; PARSER: Perruchet & Vinter, 1998), in an experiment in which we compare the lexical and sublexical recognition performance of adults after hearing 2 or 10 min of an artificial spoken language. The results are consistent with PARSER's predictions and the clustering approach, showing that performance on words is better than performance on part-words only after 10 min. This result suggests that word segmentation abilities are not merely due to stronger associations between sublexical units but to the emergence of stronger lexical representations during the development of speech perception processes. Copyright © 2009, Cognitive Science Society, Inc.
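
    A toy sketch of a bracketing-style statistic over such an artificial language: forward transitional probabilities between adjacent syllables dip at word boundaries, so boundaries can be inserted at local minima. The three-syllable "words" and stream length are invented; this illustrates the statistic itself, not either model's full mechanics.

    ```python
    import random
    from collections import Counter

    # Toy sketch of a bracketing statistic: forward transitional probability
    # (TP) between adjacent syllables dips at word boundaries, so a boundary
    # is inserted wherever TP is a local minimum. Words are hypothetical.
    random.seed(0)
    words = ["tibudo", "pabiku", "golatu"]                  # three 3-syllable words
    stream = "".join(random.choice(words) for _ in range(90))
    sylls = [stream[i:i + 2] for i in range(0, len(stream), 2)]

    pairs = Counter(zip(sylls, sylls[1:]))
    firsts = Counter(sylls[:-1])
    tp = {(a, b): n / firsts[a] for (a, b), n in pairs.items()}

    probs = [tp[(a, b)] for a, b in zip(sylls, sylls[1:])]
    bounds = [i + 1 for i in range(1, len(probs) - 1)
              if probs[i] < probs[i - 1] and probs[i] < probs[i + 1]]
    print(sylls[:6], [round(p, 2) for p in probs[:5]], bounds[:4])
    ```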

  3. Some Effects of Training on the Perception of Synthetic Speech

    PubMed Central

    Schwab, Eileen C.; Nusbaum, Howard C.; Pisoni, David B.

    2012-01-01

    The present study was conducted to determine the effects of training on the perception of synthetic speech. Three groups of subjects were tested with synthetic speech using the same tasks before and after training. One group was trained with synthetic speech. A second group went through the identical training procedures using natural speech. The third group received no training. Although performance of the three groups was the same prior to training, significant differences on the post-test measures of word recognition were observed: the group trained with synthetic speech performed much better than the other two groups. A six-month follow-up indicated that the group trained with synthetic speech displayed long-term retention of the knowledge and experience gained with prior exposure to synthetic speech generated by a text-to-speech system. PMID:2936671

  4. A multilingual audiometer simulator software for training purposes.

    PubMed

    Kompis, Martin; Steffen, Pascal; Caversaccio, Marco; Brugger, Urs; Oesch, Ivo

    2012-04-01

    A set of algorithms that allows a computer to determine the answers of simulated patients during pure tone and speech audiometry is presented. Based on these algorithms, a computer program for training in audiometry was written and found to be useful for teaching purposes. The aim was to develop flexible audiometer simulator software as a teaching and training tool for pure tone and speech audiometry, both with and without masking. First, a set of algorithms that allows a computer to determine the answers of a simulated hearing-impaired patient was developed. Then the software was implemented. Extensive use was made of simple, editable text files to define all texts in the user interface and all patient definitions. The software 'audiometer simulator' is available for free download. It can be used to train pure tone audiometry (both with and without masking), speech audiometry, measurement of the uncomfortable level, and simple simulation tests. Owing to the use of text files, the user can alter or add patient definitions as well as all texts and labels shown on the screen. So far, English, French, German, and Portuguese user interfaces are available, and the user can choose between German and French speech audiometry.
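
    A minimal sketch of the kind of editable text-file patient definition the abstract mentions. The field names, file format, and threshold values are invented for illustration and are not the software's actual format; the simulated patient simply "responds" whenever the presented level reaches the stored threshold.

    ```python
    # Hypothetical plain-text patient definition; the real software
    # defines its own (editable) format and fields.
    PATIENT_DEF = """\
    name: Simulated patient A
    # air-conduction thresholds, dB HL, at 250..8000 Hz (right ear)
    ac_right: 20 25 40 55 60 65
    """

    FREQS = [250, 500, 1000, 2000, 4000, 8000]

    def parse_patient(text):
        fields = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue                      # skip blanks and comments
            key, _, val = line.partition(":")
            fields[key.strip()] = val.strip()
        thresholds = dict(zip(FREQS, map(int, fields["ac_right"].split())))
        return fields["name"], thresholds

    def responds(thresholds, freq_hz, level_db):
        """Simulated patient 'raises a hand' if the tone is audible."""
        return level_db >= thresholds[freq_hz]

    name, thr = parse_patient(PATIENT_DEF)
    print(name, responds(thr, 1000, 35), responds(thr, 1000, 45))  # False True
    ```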

  5. Phonological Feature Repetition Suppression in the Left Inferior Frontal Gyrus.

    PubMed

    Okada, Kayoko; Matchin, William; Hickok, Gregory

    2018-06-07

    Models of speech production posit a role for the motor system, predominantly the posterior inferior frontal gyrus, in encoding complex phonological representations for speech production, at the phonemic, syllable, and word levels [Roelofs, A. A dorsal-pathway account of aphasic language production: The WEAVER++/ARC model. Cortex, 59(Suppl. C), 33-48, 2014; Hickok, G. Computational neuroanatomy of speech production. Nature Reviews Neuroscience, 13, 135-145, 2012; Guenther, F. H. Cortical interactions underlying the production of speech sounds. Journal of Communication Disorders, 39, 350-365, 2006]. However, phonological theory posits subphonemic units of representation, namely phonological features [Chomsky, N., & Halle, M. The sound pattern of English, 1968; Jakobson, R., Fant, G., & Halle, M. Preliminaries to speech analysis. The distinctive features and their correlates. Cambridge, MA: MIT Press, 1951], that specify independent articulatory parameters of speech sounds, such as place and manner of articulation. Therefore, motor brain systems may also incorporate phonological features into speech production planning units. Here, we add support for such a role with an fMRI experiment of word sequence production using a phonemic similarity manipulation. We adapted and modified the experimental paradigm of Oppenheim and Dell [Oppenheim, G. M., & Dell, G. S. Inner speech slips exhibit lexical bias, but not the phonemic similarity effect. Cognition, 106, 528-537, 2008; Oppenheim, G. M., & Dell, G. S. Motor movement matters: The flexible abstractness of inner speech. Memory & Cognition, 38, 1147-1160, 2010]. Participants silently articulated words cued by sequential visual presentation that varied in degree of phonological feature overlap in consonant onset position: high overlap (two shared phonological features; e.g., /r/ and /l/) or low overlap (one shared phonological feature, e.g., /r/ and /b/). We found a significant repetition suppression effect in the left posterior inferior frontal gyrus, with increased activation for phonologically dissimilar words compared with similar words. These results suggest that phonemes, particularly phonological features, are part of the planning units of the motor speech system.

  6. Orthographic Learning and the Role of Text-to-Speech Software in Dutch Disabled Readers

    ERIC Educational Resources Information Center

    Staels, Eva; Van den Broeck, Wim

    2015-01-01

    In this study, we examined whether orthographic learning can be demonstrated in disabled readers learning to read in a transparent orthography (Dutch). In addition, we tested the effect of the use of text-to-speech software, a new form of direct instruction, on orthographic learning. Both research goals were investigated by replicating Share's…

  7. Comparing the Impact of Rates of Text-to-Speech Software on Reading Fluency and Comprehension for Adults with Reading Difficulties

    ERIC Educational Resources Information Center

    Coleman, Mari Beth; Killdare, Laura K.; Bell, Sherry Mee; Carter, Amanda M.

    2014-01-01

    The purpose of this study was to determine the impact of text-to-speech software on reading fluency and comprehension for four postsecondary students with below average reading fluency and comprehension including three students diagnosed with learning disabilities and concomitant conditions (e.g., attention deficit hyperactivity disorder, seizure…

  8. Using Text-to-Speech Reading Support for an Adult with Mild Aphasia and Cognitive Impairment

    ERIC Educational Resources Information Center

    Harvey, Judy; Hux, Karen; Snell, Jeffry

    2013-01-01

    This single case study served to examine text-to-speech (TTS) effects on reading rate and comprehension in an individual with mild aphasia and cognitive impairment. Findings showed faster reading, given TTS presented at a normal speaking rate, but no significant comprehension changes. TTS may support reading in people with aphasia when time…

  9. From speech to thought: the neuronal basis of cognitive units in non-experimental, real-life communication investigated using ECoG

    PubMed Central

    Derix, Johanna; Iljina, Olga; Weiske, Johanna; Schulze-Bonhage, Andreas; Aertsen, Ad; Ball, Tonio

    2014-01-01

    Exchange of thoughts by means of expressive speech is fundamental to human communication. However, the neuronal basis of real-life communication in general, and of verbal exchange of ideas in particular, has rarely been studied until now. Here, our aim was to establish an approach for exploring the neuronal processes related to cognitive “idea” units (IUs) in conditions of non-experimental speech production. We investigated whether such units corresponding to single, coherent chunks of speech with syntactically-defined borders, are useful to unravel the neuronal mechanisms underlying real-world human cognition. To this aim, we employed simultaneous electrocorticography (ECoG) and video recordings obtained in pre-neurosurgical diagnostics of epilepsy patients. We transcribed non-experimental, daily hospital conversations, identified IUs in transcriptions of the patients' speech, classified the obtained IUs according to a previously-proposed taxonomy focusing on memory content, and investigated the underlying neuronal activity. In each of our three subjects, we were able to collect a large number of IUs which could be assigned to different functional IU subclasses with a high inter-rater agreement. Robust IU-onset-related changes in spectral magnitude could be observed in high gamma frequencies (70–150 Hz) on the inferior lateral convexity and in the superior temporal cortex regardless of the IU content. A comparison of the topography of these responses with mouth motor and speech areas identified by electrocortical stimulation showed that IUs might be of use for extraoperative mapping of eloquent cortex (average sensitivity: 44.4%, average specificity: 91.1%). High gamma responses specific to memory-related IU subclasses were observed in the inferior parietal and prefrontal regions. IU-based analysis of ECoG recordings during non-experimental communication thus elicits topographically- and functionally-specific effects. We conclude that segmentation of spontaneous real-world speech in linguistically-motivated units is a promising strategy for elucidating the neuronal basis of mental processing during non-experimental communication. PMID:24982625
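
    A minimal sketch of the standard high-gamma envelope analysis implied by the abstract: band-pass the signal at 70-150 Hz, take the Hilbert envelope, and average it in windows time-locked to idea-unit onsets. The data, sampling rate, onset times, and window length here are simulated assumptions, not the study's recordings or parameters.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    # Simulated single-channel "ECoG" and fake idea-unit (IU) onsets.
    FS = 1000
    b, a = butter(4, [70, 150], btype="bandpass", fs=FS)

    rng = np.random.default_rng(0)
    ecog = rng.normal(size=60 * FS)            # 60 s of noise as stand-in data
    onsets = np.arange(5, 55, 5) * FS          # hypothetical IU onset samples

    # High-gamma envelope: band-pass, then magnitude of the analytic signal.
    envelope = np.abs(hilbert(filtfilt(b, a, ecog)))

    win = int(0.5 * FS)                        # 0-500 ms post-onset window (assumed)
    post = np.array([envelope[t:t + win].mean() for t in onsets])
    baseline = envelope.mean()
    print("mean onset-locked high-gamma change: %.3f" % (post.mean() / baseline))
    ```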

  10. A Smartphone Application for Customized Frequency Table Selection in Cochlear Implants.

    PubMed

    Jethanamest, Daniel; Azadpour, Mahan; Zeman, Annette M; Sagi, Elad; Svirsky, Mario A

    2017-09-01

    A novel smartphone-based software application can facilitate self-selection of frequency allocation tables (FAT) in postlingually deaf cochlear implant (CI) users. CIs use FATs to represent the tonotopic organization of a normal cochlea. Current CI fitting methods typically use a standard FAT for all patients regardless of individual differences in cochlear size and electrode location. In postlingually deaf patients, different amounts of mismatch can result between the frequency-place function they experienced when they had normal hearing and the frequency-place function that results from the standard FAT. For some CI users, an alternative FAT may enhance sound quality or speech perception. Currently, no widely available tools exist to aid real-time selection of different FATs. This study aims to develop a new smartphone tool for this purpose and to evaluate speech perception and sound quality measures in a pilot study of CI subjects using this application. A smartphone application for a widely available mobile platform (iOS) was developed to serve as a preprocessor of auditory input to a clinical CI speech processor and enable interactive real-time selection of FATs. The application's output was validated by measuring electrodograms for various inputs. A pilot study was conducted in six CI subjects. Speech perception was evaluated using word recognition tests. All subjects successfully used the portable application with their clinical speech processors to experience different FATs while listening to running speech. The users were all able to select one table that they judged provided the best sound quality. All subjects chose a FAT different from the standard FAT in their everyday clinical processor. Using the smartphone application, the mean consonant-nucleus-consonant score with the default FAT selection was 28.5% (SD 16.8) and 29.5% (SD 16.4) when using a self-selected FAT. A portable smartphone application enables CI users to self-select frequency allocation tables in real time. Even though the self-selected FATs that were deemed to have better sound quality were only tested acutely (i.e., without long-term experience with them), speech perception scores were not inferior to those obtained with the clinical FATs. This software application may be a valuable tool for improving future methods of CI fitting.
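
    As a rough illustration of what a frequency allocation table does, the sketch below builds log-spaced analysis bands between invented corner frequencies and maps an input frequency to an electrode index. The band edges and channel count are hypothetical, not a clinical map or the application's actual tables.

    ```python
    import numpy as np

    # A FAT assigns each analysis band of the incoming spectrum to one of
    # N electrodes; trying a different FAT changes that frequency-to-place map.
    def make_fat(low_hz, high_hz, n_electrodes):
        """Log-spaced band edges from low to high frequency."""
        return np.geomspace(low_hz, high_hz, n_electrodes + 1)

    def electrode_for(freq_hz, fat_edges):
        """Index of the electrode whose band contains freq_hz."""
        return int(np.clip(np.searchsorted(fat_edges, freq_hz) - 1,
                           0, len(fat_edges) - 2))

    standard = make_fat(188, 7938, 22)   # hypothetical 22-channel map
    shifted = make_fat(300, 7938, 22)    # alternative FAT a user might try

    for f in (500, 1000, 4000):
        print(f, electrode_for(f, standard), electrode_for(f, shifted))
    ```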

  11. Acoustics of Clear Speech: Effect of Instruction

    ERIC Educational Resources Information Center

    Lam, Jennifer; Tjaden, Kris; Wilding, Greg

    2012-01-01

    Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…

  12. Self-Esteem in Children with Speech and Language Impairment: An Exploratory Study of Transition from Language Units to Mainstream School

    ERIC Educational Resources Information Center

    Rannard, Anne; Glenn, Sheila

    2009-01-01

    Little is known about the self-perceptions of children moving from language units to mainstream school. This longitudinal exploratory study examined the effects of transition on perceptions of competence and acceptance in one group of children with speech and language impairment. Seven children and their teachers completed the Pictorial Scale of…

  13. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers

    PubMed Central

    Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

    2017-01-01

    Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed; it is an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, the syllables are extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and achieves good performance for both voiced and unvoiced speech. Then, the syllables are classified as having “quasi-unvoiced” or “quasi-voiced” initials, and respective initial/final segmentation methods are proposed for these two types of syllables. Moreover, a two-step segmentation method is proposed: the rough locations of the syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of the segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than for those with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all syllables is 91.69%. For the control samples, P30 for all syllables is 91.24%. PMID:28926572

  14. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers.

    PubMed

    He, Ling; Liu, Yin; Yin, Heng; Zhang, Junpeng; Zhang, Jing; Zhang, Jiang

    2017-01-01

    Speech unit segmentation is an important pre-processing step in the analysis of cleft palate speech. In Mandarin, one syllable is composed of two parts: an initial and a final. In cleft palate speech, resonance disorders occur at the finals and the voiced initials, while articulation disorders occur at the unvoiced initials. Thus, the initials and finals are the minimum speech units that can reflect the characteristics of cleft palate speech disorders. In this work, an automatic initial/final segmentation method is proposed; it is an important preprocessing step in cleft palate speech signal processing. The tested cleft palate speech utterances were collected from the Cleft Palate Speech Treatment Center in the Hospital of Stomatology, Sichuan University, which treats the largest number of cleft palate patients in China. The cleft palate speech data include 824 speech segments, and the control samples contain 228 speech segments. First, the syllables are extracted from the speech utterances. The proposed syllable extraction method avoids a training stage and achieves good performance for both voiced and unvoiced speech. Then, the syllables are classified as having "quasi-unvoiced" or "quasi-voiced" initials, and respective initial/final segmentation methods are proposed for these two types of syllables. Moreover, a two-step segmentation method is proposed: the rough locations of the syllable and initial/final boundaries are refined in the second segmentation step, in order to improve the robustness of the segmentation accuracy. The experiments show that the initial/final segmentation accuracies for syllables with quasi-unvoiced initials are higher than for those with quasi-voiced initials. For the cleft palate speech, the mean time error is 4.4 ms for syllables with quasi-unvoiced initials and 25.7 ms for syllables with quasi-voiced initials, and the correct segmentation accuracy P30 for all syllables is 91.69%. For the control samples, P30 for all syllables is 91.24%.
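
    A toy sketch of one common cue pair for separating unvoiced from voiced speech, short-time energy and zero-crossing rate, applied to simulated signals. The frame sizes and thresholds are illustrative only and are not the classification method used in the paper.

    ```python
    import numpy as np

    # Unvoiced consonants tend to be low-energy / high-ZCR; voiced speech
    # the reverse. Thresholds below are invented for illustration.
    def frame_features(x, fs, win=0.025, hop=0.010):
        n, h = int(win * fs), int(hop * fs)
        feats = []
        for start in range(0, len(x) - n, h):
            fr = x[start:start + n]
            energy = float(np.mean(fr ** 2))
            zcr = float(np.mean(np.abs(np.diff(np.sign(fr)))) / 2)
            feats.append((start / fs, energy, zcr))
        return feats

    def is_unvoiced(energy, zcr, e_thresh=1e-3, z_thresh=0.25):
        return energy < e_thresh and zcr > z_thresh

    fs = 16000
    t = np.arange(0, 0.3, 1 / fs)
    voiced = 0.3 * np.sin(2 * np.pi * 150 * t)                       # vowel-like tone
    unvoiced = 0.02 * np.random.default_rng(0).normal(size=t.size)   # fricative-like noise
    for time, e, z in frame_features(np.concatenate([unvoiced, voiced]), fs)[::6]:
        print("%.2fs  unvoiced=%s" % (time, is_unvoiced(e, z)))
    ```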

  15. Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study

    ERIC Educational Resources Information Center

    Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle

    2012-01-01

    In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…

  16. Business Speech, Language Arts, Business English: 5128.21.

    ERIC Educational Resources Information Center

    Dade County Public Schools, Miami, FL.

    Developed as part of a high school quinmester unit on business speech, this guide provides the teacher with teaching strategies for a course designed to help people in the business world. The course covers the preparation and delivery of a speech and other business situations which require skill in speaking (sales techniques, committee and group…

  17. Using Text-to-Speech (TTS) for Audio Computer-Assisted Self-Interviewing (ACASI)

    ERIC Educational Resources Information Center

    Couper, Mick P.; Berglund, Patricia; Kirgis, Nicole; Buageila, Sarrah

    2016-01-01

    We evaluate the use of text-to-speech (TTS) technology for audio computer-assisted self-interviewing (ACASI). We use a quasi-experimental design, comparing the use of recorded human voice in the 2006-2010 National Survey of Family Growth with the use of TTS in the first year of the 2011-2013 survey, where the essential survey conditions are…

  18. Applications of Speech-to-Text Recognition and Computer-Aided Translation for Facilitating Cross-Cultural Learning through a Learning Activity: Issues and Their Solutions

    ERIC Educational Resources Information Center

    Shadiev, Rustam; Wu, Ting-Ting; Sun, Ai; Huang, Yueh-Min

    2018-01-01

    In this study, 21 university students, who represented thirteen nationalities, participated in an online cross-cultural learning activity. The participants were engaged in interactions and exchanges carried out on Facebook® and Skype® platforms, and their multilingual communications were supported by speech-to-text recognition (STR) and…

  19. Public Speaking Apprehension, Decision-Making Errors in the Selection of Speech Introduction Strategies and Adherence to Strategy.

    ERIC Educational Resources Information Center

    Beatty, Michael J.

    1988-01-01

    Examines the choice-making processes of students engaged in the selection of speech introduction strategies. Finds that the frequency of students making decision-making errors was a positive function of public speaking apprehension. (MS)

  20. Distinct Cortical Pathways for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition

    PubMed Central

    Norman-Haignere, Sam

    2015-01-01

    The organization of human auditory cortex remains unresolved, due in part to the small stimulus sets common to fMRI studies and the overlap of neural populations within voxels. To address these challenges, we measured fMRI responses to 165 natural sounds and inferred canonical response profiles (“components”) whose weighted combinations explained voxel responses throughout auditory cortex. This analysis revealed six components, each with interpretable response characteristics despite being unconstrained by prior functional hypotheses. Four components embodied selectivity for particular acoustic features (frequency, spectrotemporal modulation, pitch). Two others exhibited pronounced selectivity for music and speech, respectively, and were not explainable by standard acoustic features. Anatomically, music and speech selectivity concentrated in distinct regions of non-primary auditory cortex. However, music selectivity was weak in raw voxel responses, and its detection required a decomposition method. Voxel decomposition identifies primary dimensions of response variation across natural sounds, revealing distinct cortical pathways for music and speech. PMID:26687225

  1. Distinct Cortical Pathways for Music and Speech Revealed by Hypothesis-Free Voxel Decomposition.

    PubMed

    Norman-Haignere, Sam; Kanwisher, Nancy G; McDermott, Josh H

    2015-12-16

    The organization of human auditory cortex remains unresolved, due in part to the small stimulus sets common to fMRI studies and the overlap of neural populations within voxels. To address these challenges, we measured fMRI responses to 165 natural sounds and inferred canonical response profiles ("components") whose weighted combinations explained voxel responses throughout auditory cortex. This analysis revealed six components, each with interpretable response characteristics despite being unconstrained by prior functional hypotheses. Four components embodied selectivity for particular acoustic features (frequency, spectrotemporal modulation, pitch). Two others exhibited pronounced selectivity for music and speech, respectively, and were not explainable by standard acoustic features. Anatomically, music and speech selectivity concentrated in distinct regions of non-primary auditory cortex. However, music selectivity was weak in raw voxel responses, and its detection required a decomposition method. Voxel decomposition identifies primary dimensions of response variation across natural sounds, revealing distinct cortical pathways for music and speech. Copyright © 2015 Elsevier Inc. All rights reserved.
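
    The study's decomposition algorithm is its own (it seeks maximally non-Gaussian components). As a rough illustration of the general idea, explaining a sounds-by-voxels response matrix as weighted combinations of a few shared response profiles, the sketch below factors simulated data with off-the-shelf NMF instead.

    ```python
    import numpy as np
    from sklearn.decomposition import NMF

    # Simulate a 165-sounds x 500-voxels response matrix generated by six
    # latent response profiles, then recover a six-component factorization.
    rng = np.random.default_rng(0)
    n_sounds, n_voxels, n_components = 165, 500, 6
    true_profiles = rng.random((n_sounds, n_components))
    true_weights = rng.random((n_components, n_voxels))
    responses = true_profiles @ true_weights + 0.01 * rng.random((n_sounds, n_voxels))

    model = NMF(n_components=n_components, init="nndsvda", random_state=0,
                max_iter=500)
    profiles = model.fit_transform(responses)   # (165, 6) response profiles
    weights = model.components_                 # (6, 500) per-voxel weights
    print(profiles.shape, weights.shape, round(model.reconstruction_err_, 3))
    ```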

  2. The effects of speech production and vocabulary training on different components of spoken language performance.

    PubMed

    Paatsch, Louise E; Blamey, Peter J; Sarant, Julia Z; Bow, Catherine P

    2006-01-01

    A group of 21 hard-of-hearing and deaf children attending primary school were trained by their teachers on the production of selected consonants and on the meanings of selected words. Speech production, vocabulary knowledge, reading aloud, and speech perception measures were obtained before and after each type of training. The speech production training produced a small but significant improvement in the percentage of consonants correctly produced in words. The vocabulary training improved knowledge of word meanings substantially. Performance on speech perception and reading aloud were significantly improved by both types of training. These results were in accord with the predictions of a mathematical model put forward to describe the relationships between speech perception, speech production, and language measures in children (Paatsch, Blamey, Sarant, Martin, & Bow, 2004). These training data demonstrate that the relationships between the measures are causal. In other words, improvements in speech production and vocabulary performance produced by training will carry over into predictable improvements in speech perception and reading scores. Furthermore, the model will help educators identify the most effective methods of improving receptive and expressive spoken language for individual children who are deaf or hard of hearing.

  3. Multiword Lexical Units and Their Relationship to Impromptu Speech

    ERIC Educational Resources Information Center

    Hsu, Jeng-yih

    2007-01-01

    Public speaking can be very threatening to native speakers of English, not to mention non-native EFL learners. Impromptu speech, perhaps the most challenging form of public speaking, is nevertheless being promoted in every city of the EFL countries. The case in Taiwan is no exception. Every year, dozens of impromptu speech contests are held…

  4. A Task Analysis for Teaching the Organization of an Informative Speech.

    ERIC Educational Resources Information Center

    Parks, Arlie Muller

    The purpose of this paper is to demonstrate a task analysis of the objectives needed to organize an effective information-giving speech. A hierarchical structure of the behaviors needed to deliver a well-organized extemporaneous information-giving speech is presented, with some behaviors as subtasks for the unit objective and the others as…

  5. Color- and motion-specific units in the tectum opticum of goldfish.

    PubMed

    Gruber, Morna; Behrend, Konstantin; Neumeyer, Christa

    2016-01-05

    Extracellular recordings were performed from 69 units at different depths between 50 and [Formula: see text]m below the surface of the tectum opticum in goldfish. Using large-field stimuli (86° visual angle) of 21 colored HKS papers, we were able to record from 54 color-sensitive units. The colored papers were presented for 5 s each. They were arranged in the sequence of the color circle in humans, separated by gray of medium brightness. We found 22 units with best responses between orange, red, and pink. About 12 of these red-sensitive units were of the opponent "red-ON/blue-green-OFF" type, as found in retinal bipolar and ganglion cells as well. Most of them were also activated or inhibited by black and/or white. Some units responded specifically to red, either with activation or inhibition. Eighteen units were sensitive to blue and/or green, 10 of them to both colors and most of them to black as well. They were inhibited by red and belonged to the opponent "blue-green-ON/red-OFF" type. Other units responded more selectively, either to blue, to green, or to purple. Two units were selectively sensitive to yellow. A total of 15 units were sensitive to motion, stimulated by an excentrically rotating black-and-white random dot pattern. Activity of these units was also large when a red-green random dot pattern of high L-cone contrast was used. Activity dropped to zero when the red-green pattern did not modulate the L-cones. None of these motion-selective units responded to any color. The results directly demonstrate the color-blindness of motion vision and confirm the hypothesis of separate and parallel processing of "color" and "motion".

  6. Does computer-synthesized speech manifest personality? Experimental tests of recognition, similarity-attraction, and consistency-attraction.

    PubMed

    Nass, C; Lee, K M

    2001-09-01

    Would people exhibit similarity-attraction and consistency-attraction toward unambiguously computer-generated speech even when personality is clearly not relevant? In Experiment 1, participants (extrovert or introvert) heard a synthesized voice (extrovert or introvert) on a book-buying Web site. Participants accurately recognized personality cues in text-to-speech and showed similarity-attraction in their evaluation of the computer voice, the book reviews, and the reviewer. Experiment 2, in a Web auction context, added the personality of the text to the previous design. The results replicated Experiment 1 and demonstrated consistency-attraction between voice and text personality. To maximize liking and trust, designers should set parameters, for example, words per minute or frequency range, that create a personality that is consistent with the user and the content being presented.

  7. The Effect of English Verbal Songs on Connected Speech Aspects of Adult English Learners' Speech Production

    ERIC Educational Resources Information Center

    Ashtiani, Farshid Tayari; Zafarghandi, Amir Mahdavi

    2015-01-01

    The present study was an attempt to investigate the impact of English verbal songs on connected speech aspects of adult English learners' speech production. 40 participants were selected based on the results of their performance in a piloted and validated version of NELSON test given to 60 intermediate English learners in a language institute in…

  8. Accountability Steps for Highly Reluctant Speech: Tiered-Services Consultation in a Head Start Classroom

    ERIC Educational Resources Information Center

    Howe, Heather; Barnett, David

    2013-01-01

    This consultation description reports parent and teacher problem solving for a preschool child with no typical speech directed to teachers or peers, and, by parent report, normal speech at home. This child's initial pattern of speech was similar to selective mutism, a low-incidence disorder often first detected during the preschool years, but…

  9. Augmentative and Alternative Communication in Autism: A Comparison of the Picture Exchange Communication System and Speech-Output Technology

    ERIC Educational Resources Information Center

    Boesch, Miriam Chacon

    2011-01-01

    The purpose of this comparative efficacy study was to investigate the Picture Exchange Communication System (PECS) and a speech-generating device (SGD) in developing requesting skills, social-communicative behavior, and speech for three elementary-age children with severe autism and little to no functional speech. Requesting was selected as the…

  10. The Effect of Speech-to-Text Technology on Learning a Writing Strategy

    ERIC Educational Resources Information Center

    Haug, Katrina N.; Klein, Perry D.

    2018-01-01

    Previous research has shown that speech-to-text (STT) software can support students in producing a given piece of writing. This is the 1st study to investigate the use of STT to teach a writing strategy. We pretested 45 Grade 5 students on argument writing and trained them to use STT. Students participated in 4 lessons on an argument writing…

  11. [Design of standard voice sample text for subjective auditory perceptual evaluation of voice disorders].

    PubMed

    Li, Jin-rang; Sun, Yan-yan; Xu, Wen

    2010-09-01

    To design a speech voice sample text containing all the phonemes in Mandarin for subjective auditory perceptual evaluation of voice disorders. The design principles were as follows: the short text should include the 21 initials and 39 finals, so as to cover all the phonemes in Mandarin, and it should also be meaningful. A short text was composed. It had 155 Chinese words and included 21 initials and 38 finals (the final ê was not included because it is rarely used in Mandarin). The text also covered 17 light tones and one "Erhua". The constituent ratios of the initials and finals presented in this short text were statistically similar to those in Mandarin, according to the method of similarity between sample and population (r = 0.742, P < 0.001 and r = 0.844, P < 0.001, respectively). The constituent ratios of the tones presented in this short text were not statistically similar to those in Mandarin (r = 0.731, P > 0.05). A speech voice sample text with all the phonemes in Mandarin was produced. The constituent ratios of the initials and finals presented in this short text are similar to those in Mandarin. Its value for subjective auditory perceptual evaluation of voice disorders needs further study.
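
    A minimal sketch of the similarity check described: compare the constituent ratios of phoneme classes in a candidate text against their ratios in the language at large with a correlation coefficient. The tiny inventory and the counts below are invented for illustration.

    ```python
    import numpy as np
    from collections import Counter

    def ratios(counts, inventory):
        """Constituent ratio of each phoneme class within the inventory."""
        total = sum(counts.get(p, 0) for p in inventory)
        return np.array([counts.get(p, 0) / total for p in inventory])

    inventory = ["b", "p", "m", "f", "d", "t"]   # toy initial inventory
    population = Counter({"b": 120, "p": 80, "m": 100, "f": 40, "d": 150, "t": 110})
    sample = Counter({"b": 5, "p": 3, "m": 4, "f": 2, "d": 6, "t": 4})

    # Pearson correlation between sample and population constituent ratios.
    r = np.corrcoef(ratios(sample, inventory), ratios(population, inventory))[0, 1]
    print("constituent-ratio correlation r = %.3f" % r)
    ```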

  12. Attention Is Required for Knowledge-Based Sequential Grouping: Insights from the Integration of Syllables into Words.

    PubMed

    Ding, Nai; Pan, Xunyi; Luo, Cheng; Su, Naifei; Zhang, Wen; Zhang, Jianfeng

    2018-01-31

    How the brain groups sequential sensory events into chunks is a fundamental question in cognitive neuroscience. This study investigates whether top-down attention or specific tasks are required for the brain to apply lexical knowledge to group syllables into words. Neural responses tracking the syllabic and word rhythms of a rhythmic speech sequence were concurrently monitored using electroencephalography (EEG). The participants performed different tasks, attending to either the rhythmic speech sequence or a distractor, which was another speech stream or a nonlinguistic auditory/visual stimulus. Attention to speech, but not a lexical-meaning-related task, was required for reliable neural tracking of words, even when the distractor was a nonlinguistic stimulus presented cross-modally. Neural tracking of syllables, however, was reliably observed in all tested conditions. These results strongly suggest that neural encoding of individual auditory events (i.e., syllables) is automatic, while knowledge-based construction of temporal chunks (i.e., words) crucially relies on top-down attention. SIGNIFICANCE STATEMENT Why we cannot understand speech when not paying attention is an old question in psychology and cognitive neuroscience. Speech processing is a complex process that involves multiple stages, e.g., hearing and analyzing the speech sound, recognizing words, and combining words into phrases and sentences. The current study investigates which speech-processing stage is blocked when we do not listen carefully. We show that the brain can reliably encode syllables, basic units of speech sounds, even when we do not pay attention. Nevertheless, when distracted, the brain cannot group syllables into multisyllabic words, which are basic units for speech meaning. Therefore, the process of converting speech sound into meaning crucially relies on attention. Copyright © 2018 the authors 0270-6474/18/381178-11$15.00/0.

  13. Brain Oscillations during Semantic Evaluation of Speech

    ERIC Educational Resources Information Center

    Shahin, Antoine J.; Picton, Terence W.; Miller, Lee M.

    2009-01-01

    Changes in oscillatory brain activity have been related to perceptual and cognitive processes such as selective attention and memory matching. Here we examined brain oscillations, measured with electroencephalography (EEG), during a semantic speech processing task that required both lexically mediated memory matching and selective attention.…

  14. Experimental comparison between speech transmission index, rapid speech transmission index, and speech intelligibility index.

    PubMed

    Larm, Petra; Hongisto, Valtteri

    2006-02-01

    During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how well speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared: the speech transmission index (STI) and the speech intelligibility index (SII). The simplification of the STI, the room acoustics speech transmission index (RASTI), was also considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, data were needed on the differences between these methods that result from the calculation scheme and from the measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found to be acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
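
    A simplified sketch of the core these indices share: per-band apparent speech-to-noise ratios are clipped to a usable range, mapped to 0-1 audibility, and combined with band-importance weights. The band SNRs and weights below are illustrative; the standards define the actual octave/third-octave procedures and weightings.

    ```python
    import numpy as np

    def simple_sii(snr_db, weights):
        """Weighted band-audibility index in the spirit of the SII."""
        snr = np.clip(np.asarray(snr_db, float), -15.0, 15.0)  # usable SNR range
        audibility = (snr + 15.0) / 30.0                       # map to 0..1
        w = np.asarray(weights, float)
        return float(np.sum(w * audibility) / np.sum(w))

    snr_per_band = [12, 9, 6, 3, 0, -3, -6]                # dB, one value per band
    band_weights = [0.08, 0.13, 0.2, 0.2, 0.18, 0.12, 0.09]  # illustrative importances
    print("index = %.2f" % simple_sii(snr_per_band, band_weights))
    ```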

  15. Effects of Dictation, Speech to Text, and Handwriting on the Written Composition of Elementary School English Language Learners

    ERIC Educational Resources Information Center

    Arcon, Nina; Klein, Perry D.; Dombroski, Jill D.

    2017-01-01

    Previous research has shown that both dictation and speech-to-text (STT) software can increase the quality of writing for native English speakers. The purpose of this study was to investigate the effect of these modalities on the written composition and cognitive load of elementary school English language learners (ELLs). In a within-subjects…

  16. Emerging Realities of Text-to-Speech Software for Nonnative-English-Speaking Community College Students in the Freshman Year

    ERIC Educational Resources Information Center

    Baker, Fiona S.

    2015-01-01

    This study explores the expectations and early and subsequent realities of text-to-speech software for 24 nonnative-English-speaking college students who were experiencing reading difficulties in their freshman year of college. The study took place over two semesters in one academic year (from September to June) at a community college on the…

  17. Comparing live to recorded speech in training the perception of spectrally shifted noise-vocoded speech.

    PubMed

    Faulkner, Andrew; Rosen, Stuart; Green, Tim

    2012-10-01

    Two experimental groups were trained for 2 h with live or recorded speech that was noise-vocoded and spectrally shifted and was from the same text and talker. These two groups showed equivalent improvements in performance for vocoded and shifted sentences, and the group trained with recorded speech showed consistently greater improvements than untrained controls. Another group trained with unshifted noise-vocoded speech improved no more than untrained controls. Computer-based training thus appears at least as effective as labor-intensive live-voice training for improving the perception of spectrally shifted noise-vocoded speech, and by implication, for training of users of cochlear implants.
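
    A minimal noise-vocoder sketch under common assumptions: split the signal into a few analysis bands, extract each band's Hilbert envelope, and use it to modulate band-limited noise, optionally re-synthesizing into a higher band to mimic a spectral shift. Band edges, filter order, and the one-band shift are illustrative, not the study's exact processing.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def bandpass(x, lo, hi, fs):
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        return filtfilt(b, a, x)

    def vocode(x, fs, edges, shift=0):
        """Envelope-modulated noise per band; shift re-maps output bands upward."""
        rng = np.random.default_rng(0)
        out = np.zeros_like(x)
        n_bands = len(edges) - 1
        for i in range(n_bands):
            env = np.abs(hilbert(bandpass(x, edges[i], edges[i + 1], fs)))
            j = min(i + shift, n_bands - 1)       # shifted output band
            noise = bandpass(rng.normal(size=x.size), edges[j], edges[j + 1], fs)
            out += env * noise
        return out

    fs = 16000
    t = np.arange(0, 1.0, 1 / fs)
    speechlike = np.sin(2 * np.pi * 220 * t) * (1 + np.sin(2 * np.pi * 3 * t))
    edges = [100, 300, 700, 1500, 3000, 6000]     # illustrative band edges (Hz)
    shifted = vocode(speechlike, fs, edges, shift=1)
    print(shifted.shape)
    ```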

  18. Selective spatial attention modulates bottom-up informational masking of speech

    PubMed Central

    Carlile, Simon; Corkhill, Caitlin

    2015-01-01

    To hear out a conversation against other talkers, listeners must overcome energetic and informational masking. Although largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold; 40% of this was attributed to a release from informational masking. When the across-frequency temporal modulations in the masker talkers are decorrelated, the speech is unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations and show that this presumably low-level process can be modulated by selective spatial attention. PMID:25727100

  19. Selective spatial attention modulates bottom-up informational masking of speech.

    PubMed

    Carlile, Simon; Corkhill, Caitlin

    2015-03-02

    To hear out a conversation against other talkers, listeners must overcome energetic and informational masking. Although largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold; 40% of this was attributed to a release from informational masking. When the across-frequency temporal modulations in the masker talkers are decorrelated, the speech is unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations and show that this presumably low-level process can be modulated by selective spatial attention.

  20. Illustrated Speech Anatomy.

    ERIC Educational Resources Information Center

    Shearer, William M.

    Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…

  1. A Mis-recognized Medical Vocabulary Correction System for Speech-based Electronic Medical Record

    PubMed Central

    Seo, Hwa Jeong; Kim, Ju Han; Sakabe, Nagamasa

    2002-01-01

    Speech recognition as an input tool for electronic medical records (EMR) enables efficient data entry at the point of care. However, the recognition accuracy for medical vocabulary is much poorer than that for doctor-patient dialogue. We developed a mis-recognized medical vocabulary correction system based on syllable-by-syllable comparison of the recognized text against a medical vocabulary database. Using a specialty medical vocabulary, the algorithm detects and corrects mis-recognized medical terms in narrative text. Our preliminary evaluation showed 94% accuracy in mis-recognized medical vocabulary correction.
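
    The system's comparison is syllable-by-syllable; as a stand-in, the sketch below uses plain character-level edit distance to snap recognizer output tokens to the closest entry of a toy medical vocabulary when the distance is small enough. The vocabulary and the distance threshold are invented.

    ```python
    def edit_distance(a, b):
        """Levenshtein distance via the standard rolling-row DP."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,          # deletion
                               cur[j - 1] + 1,       # insertion
                               prev[j - 1] + (ca != cb)))  # substitution
            prev = cur
        return prev[-1]

    VOCAB = ["hypertension", "hyperlipidemia", "pneumonia"]  # toy vocabulary

    def correct(token, max_dist=3):
        best = min(VOCAB, key=lambda w: edit_distance(token, w))
        return best if edit_distance(token, best) <= max_dist else token

    print(correct("hypertention"))   # -> hypertension
    print(correct("dialogue"))       # -> dialogue (left unchanged)
    ```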

  2. First Amendment Speech and Press Theory: Preferred Position Postulate Reexamined.

    ERIC Educational Resources Information Center

    Stonecipher, Harry W.

    If the United States Supreme Court is to exercise its historic role as guardian of the fundamental freedoms flowing from the speech and press clauses of the first amendment, it is imperative that those basic freedoms be placed in a preferred position. The preferred position doctrine provides adequate safeguards for both speech and press guarantees…

  3. Modelling the Architecture of Phonetic Plans: Evidence from Apraxia of Speech

    ERIC Educational Resources Information Center

    Ziegler, Wolfram

    2009-01-01

    In theories of spoken language production, the gestural code prescribing the movements of the speech organs is usually viewed as a linear string of holistic, encapsulated, hard-wired, phonetic plans, e.g., of the size of phonemes or syllables. Interactions between phonetic units on the surface of overt speech are commonly attributed to either the…

  4. Error Consistency in Acquired Apraxia of Speech with Aphasia: Effects of the Analysis Unit

    ERIC Educational Resources Information Center

    Haley, Katarina L.; Cunningham, Kevin T.; Eaton, Catherine Torrington; Jacks, Adam

    2018-01-01

    Purpose: Diagnostic recommendations for acquired apraxia of speech (AOS) have been contradictory concerning whether speech sound errors are consistent or variable. Studies have reported divergent findings that, on face value, could argue either for or against error consistency as a diagnostic criterion. The purpose of this study was to explain…

  5. A Guide to Clinical Services in Speech Pathology and Audiology.

    ERIC Educational Resources Information Center

    Rehabilitation Services Administration (DHEW), Washington, DC.

    A listing of speech pathology and audiology services in the United States, the guide includes the names of 910 clinics and of 216 members of the American Speech and Hearing Association who are engaged in full time private practice. Arranged geographically, by state and city, the guide specifies the following for each clinic: official name,…

  6. Proceedings of the Speech Communication Association Summer Conference: Mini Courses in Speech Communication (7th, Chicago, July 8-10, 1971).

    ERIC Educational Resources Information Center

    Jeffrey, Robert C., Ed.

    The Speech Communication Association's 1971 summer conference provided instruction in the application of basic research and innovative practices in communication. It was designed to assist elementary, secondary, and college teachers in the enrichment of content and procedures. The proceedings include syllabi, course units, and bibliographic…

  7. Flexible retrospective selection of temporal resolution in real-time speech MRI using a golden-ratio spiral view order.

    PubMed

    Kim, Yoon-Chul; Narayanan, Shrikanth S; Nayak, Krishna S

    2011-05-01

    In speech production research using real-time magnetic resonance imaging (MRI), the analysis of articulatory dynamics is performed retrospectively. A flexible selection of temporal resolution is highly desirable because of natural variations in speech rate and variations in the speed of different articulators. The purpose of the study is to demonstrate a first application of golden-ratio spiral temporal view order to real-time speech MRI and investigate its performance by comparison with conventional bit-reversed temporal view order. Golden-ratio view order proved to be more effective at capturing the dynamics of rapid tongue tip motion. A method for automated blockwise selection of temporal resolution is presented that enables the synthesis of a single video from multiple temporal resolution videos and potentially facilitates subsequent vocal tract shape analysis. Copyright © 2010 Wiley-Liss, Inc.
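
    A minimal sketch of why a golden-ratio view order permits retrospective temporal-resolution selection: rotating each successive interleaf by the golden angle (about 137.51 degrees) keeps any contiguous block of views nearly uniformly spread over 360 degrees, so the block length, i.e., the temporal resolution, can be chosen after acquisition. The block lengths below are illustrative.

    ```python
    import numpy as np

    # Golden angle: 360 * (1 - 1/phi), about 137.51 degrees.
    GOLDEN_ANGLE = 360.0 * (1 - 2 / (1 + np.sqrt(5)))

    def view_angles(n_views):
        """Rotation angle of each successive interleaf."""
        return np.mod(np.arange(n_views) * GOLDEN_ANGLE, 360.0)

    def max_gap(angles):
        """Largest angular gap; 360/N for a perfectly uniform set."""
        s = np.sort(angles)
        gaps = np.diff(np.concatenate([s, [s[0] + 360.0]]))
        return gaps.max()

    angles = view_angles(500)
    # Pick any contiguous block after the fact: coverage stays near-uniform.
    for block in (13, 21, 34):
        print(block, round(max_gap(angles[100:100 + block]), 1))
    ```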

  8. Speech-Act and Text-Act Theory: "Theme-ing" in Freshman Composition.

    ERIC Educational Resources Information Center

    Horner, Winifred B.

    In contrast to a speech-act theory that is limited by a simple speaker/hearer relationship, a text-act theory of written language allows for the historical or personal context of a writer and reader, both in the written work itself and in the act of reading. This theory can be applied to theme writing, essay examinations, and revision in the…

  9. Portable Speech Synthesizer

    NASA Technical Reports Server (NTRS)

    Leibfritz, Gilbert H.; Larson, Howard K.

    1987-01-01

    Compact speech synthesizer useful traveling companion to speech-handicapped. User simply enters statement on board, and synthesizer converts statement into spoken words. Battery-powered and housed in briefcase, easily carried on trips. Unit used on telephones and in face-to-face communication. Synthesizer consists of micro-computer with memory-expansion module, speech-synthesizer circuit, batteries, recharger, dc-to-dc converter, and telephone amplifier. Components, commercially available, fit neatly in 17- by 13- by 5-in. briefcase. Weighs about 20 lb (9 kg) and operates and recharges from ac receptacle.

  10. Orthographic learning and the role of text-to-speech software in Dutch disabled readers.

    PubMed

    Staels, Eva; Van den Broeck, Wim

    2015-01-01

    In this study, we examined whether orthographic learning can be demonstrated in disabled readers learning to read in a transparent orthography (Dutch). In addition, we tested the effect of the use of text-to-speech software, a new form of direct instruction, on orthographic learning. Both research goals were investigated by replicating Share's self-teaching paradigm. A total of 65 disabled Dutch readers were asked to read eight stories containing embedded homophonic pseudoword targets (e.g., Blot/Blod), with or without the support of text-to-speech software. The amount of orthographic learning was assessed 3 or 7 days later by three measures of orthographic learning. First, the results supported the presence of orthographic learning during independent silent reading by demonstrating that target spellings were correctly identified more often, named more quickly, and spelled more accurately than their homophone foils. Our results support the hypothesis that all readers, even poor readers of transparent orthographies, are capable of developing word-specific knowledge. Second, a negative effect of text-to-speech software on orthographic learning was demonstrated in this study. This negative effect was interpreted as the consequence of passively listening to the auditory presentation of the text. We clarify how these results can be interpreted within current theoretical accounts of orthographic learning and briefly discuss implications for remedial interventions. © Hammill Institute on Disabilities 2013.

  11. Influences of selective adaptation on perception of audiovisual speech

    PubMed Central

    Dias, James W.; Cook, Theresa C.; Rosenblum, Lawrence D.

    2016-01-01

    Research suggests that selective adaptation in speech is a low-level process dependent on sensory-specific information shared between the adaptor and test stimuli. However, previous research has only examined how adaptors shift perception of unimodal test stimuli, either auditory or visual. In the current series of experiments, we investigated whether adaptation to cross-sensory phonetic information can influence perception of integrated audiovisual phonetic information. We examined how selective adaptation to audio and visual adaptors shifts perception of speech along an audiovisual test continuum. This test continuum consisted of nine audio-/ba/-visual-/va/ stimuli, ranging in visual clarity of the mouth. When the mouth was clearly visible, perceivers “heard” the audiovisual stimulus as an integrated “va” percept 93.7% of the time (e.g., McGurk & MacDonald, 1976). As visibility of the mouth became less clear across the nine-item continuum, the audiovisual “va” percept weakened, resulting in a continuum of audiovisual percepts ranging from /va/ to /ba/. Perception of the test stimuli was tested before and after adaptation. Changes in audiovisual speech perception were observed following adaptation to visual-/va/ and audiovisual-/va/, but not following adaptation to auditory-/va/, auditory-/ba/, or visual-/ba/. Adaptation modulates perception of integrated audiovisual speech by modulating the processing of sensory-specific information. The results suggest that auditory and visual speech information are not completely integrated at the level of selective adaptation. PMID:27041781

  12. Alternative Speech Communication System for Persons with Severe Speech Disorders

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas

    2009-12-01

    Assistive speech-enabled systems are proposed to help both French- and English-speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech, making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenation algorithm, and a grafting technique to correct poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) were carried out to demonstrate the efficiency of the proposed methods. Improvements in Perceptual Evaluation of Speech Quality (PESQ) scores of 5% and of more than 20% were achieved by the speech synthesis systems dealing with SSDs and dysarthria, respectively.

  13. Teaching the Tyrants: Perspectives on Freedom of Speech and Undergraduates.

    ERIC Educational Resources Information Center

    Herbeck, Dale A.

    Teaching freedom of speech to undergraduates is a difficult task, in part as a result of the challenging history of free expression in the United States. The difficulty is compounded by the need to teach the topic, in contrast to indoctrinating the students in an ideology of free speech. The Bill of Rights, and specifically the First Amendment,…

  14. Evolution of speech-specific cognitive adaptations.

    PubMed

    de Boer, Bart

    2015-01-01

    This paper argues that an evolutionary perspective is natural when investigating cognitive adaptations related to language. This is because there appears to be correspondence between traits that linguists consider interesting and traits that have undergone selective pressure related to language. The paper briefly reviews theoretical results that shed light on what kind of adaptations we can expect to have evolved and then reviews concrete work related to the evolution of adaptations for combinatorial speech. It turns out that there is as yet no strong direct evidence for cognitive traits that have undergone selection related to speech, but there is indirect evidence that indicates selection. However, the traits that may have undergone selection are expected to be continuously variable ones, rather than the discrete ones that linguists have focused on traditionally.

  15. Online EEG Classification of Covert Speech for Brain-Computer Interfacing.

    PubMed

    Sereshkeh, Alborz Rezazadeh; Trott, Robert; Bricout, Aurélien; Chau, Tom

    2017-12-01

    Brain-computer interfaces (BCIs) for communication can be nonintuitive, often requiring the performance of hand motor imagery or some other conversation-irrelevant task. In this paper, electroencephalography (EEG) was used to develop two intuitive online BCIs based solely on covert speech. The goal of the first BCI was to differentiate between 10[Formula: see text]s of mental repetitions of the word "no" and an equivalent duration of unconstrained rest. The second BCI was designed to discern between 10[Formula: see text]s each of covert repetition of the words "yes" and "no". Twelve participants used these two BCIs to answer yes or no questions. Each participant completed four sessions, comprising two offline training sessions and two online sessions, one for testing each of the BCIs. With a support vector machine and a combination of spectral and time-frequency features, an average accuracy of [Formula: see text] was reached across participants in the online classification of no versus rest, with 10 out of 12 participants surpassing the chance level (60.0% for [Formula: see text]). The online classification of yes versus no yielded an average accuracy of [Formula: see text], with eight participants exceeding the chance level. Task-specific changes in EEG beta and gamma power in language-related brain areas tended to provide discriminatory information. To our knowledge, this is the first report of online EEG classification of covert speech. Our findings support further study of covert speech as a BCI activation task, potentially leading to the development of more intuitive BCIs for communication.
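    A minimal Python sketch of the kind of pipeline described, band-power spectral features fed to a support vector machine, follows. The sampling rate, channel count, frequency bands, and synthetic data are assumptions for illustration, not the authors' configuration.

        import numpy as np
        from scipy.signal import welch
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        FS = 256  # sampling rate (Hz) -- assumed
        BANDS = {"beta": (13, 30), "gamma": (30, 45)}  # bands highlighted above

        def band_power_features(epochs):
            # epochs: (n_trials, n_channels, n_samples) -> (n_trials, n_feat).
            # Average Welch power within each band per channel, log-scaled.
            feats = []
            for trial in epochs:
                f, pxx = welch(trial, fs=FS, nperseg=FS, axis=-1)
                row = []
                for lo, hi in BANDS.values():
                    mask = (f >= lo) & (f < hi)
                    row.extend(np.log(pxx[:, mask].mean(axis=-1)))
                feats.append(row)
            return np.array(feats)

        # Hypothetical data: 40 trials ("yes" vs "no"), 8 channels, 10 s each.
        rng = np.random.default_rng(0)
        X = band_power_features(rng.standard_normal((40, 8, 10 * FS)))
        y = np.repeat([0, 1], 20)
        print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())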

  16. Team Training through Communications Control

    DTIC Science & Technology

    1982-02-01

    training * operational environment * team training research issues * training approach * team communications * models of operator behavior ... on the market soon, it certainly would be investigated carefully for its applicability to the team training problem. ... c. A text-to-speech voice generation system. Votrax has recently marketed such a device, and others may soon follow suit. d. A speech replay system designed to produce speech from

  17. Index to NASA news releases and speeches, 1993

    NASA Technical Reports Server (NTRS)

    1994-01-01

    This issue of the Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1993. The index is arranged in six sections: subject index, personal names index, news release number index, accession number index, speeches, and news releases.

  18. Index to NASA news releases and speeches, 1987

    NASA Technical Reports Server (NTRS)

    1988-01-01

    This issue of the Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1987. The index is arranged in six sections: Subject Index, Personal Names Index, News Release Number Index, Accession Number Index, and Speeches and News Releases.

  19. Index to NASA news releases and speeches, 1989

    NASA Technical Reports Server (NTRS)

    1990-01-01

    This issue of the Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1989. The index is arranged in six sections: Subject Index, Personal Names Index, News Release Number Index, Accession Number Index, and Speeches and News Releases.

  20. Index to NASA news releases and speeches, 1988

    NASA Technical Reports Server (NTRS)

    1989-01-01

    This issue of the Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1988. The index is arranged in six sections: Subject Index, Personal Names Index, News Release Number Index, Accession Number Index, and Speeches and News Releases.

  1. Index to NASA news releases and speeches, 1986

    NASA Technical Reports Server (NTRS)

    1987-01-01

    This issue of the Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1986. The index is arranged in six sections: Subject Index, Personal Names Index, News Release Number Index, Accession Number Index, and Speeches and News Releases.

  2. Index to NASA news releases and speeches, 1991

    NASA Technical Reports Server (NTRS)

    1992-01-01

    This issue of the annual index to NASA Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1991. The index is arranged in six sections: Subject Index, Personal Name Index, News Release Number Index, Accession Number Index, and Speeches and News Releases Indices.

  3. Index to NASA news releases and speeches, 1990

    NASA Technical Reports Server (NTRS)

    1991-01-01

    This issue of the annual Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1990. The index is arranged in six sections: Subject Index, Personal Names Index, News Release Number Index, Accession Number Index, Speeches, and News Releases Indices.

  4. Index to NASA news releases and speeches, 1992

    NASA Technical Reports Server (NTRS)

    1993-01-01

    This issue of the Index to NASA News Releases and Speeches contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1992. The index is arranged in six sections: subject index, personal names index, news release number index, accession number index, speeches, and news releases.

  5. Functional assessment and treatment of perseverative speech about restricted topics in an adolescent with Asperger syndrome.

    PubMed

    Fisher, Wayne W; Rodriguez, Nicole M; Owen, Todd M

    2013-01-01

    A functional analysis showed that a 14-year-old boy with Asperger syndrome displayed perseverative speech (or "restricted interests") reinforced by attention. To promote appropriate speech in a turn-taking format, we implemented differential reinforcement (DR) of nonperseverative speech and DR of on-topic speech within a multiple schedule with stimuli that signaled the contingencies in effect and who was to select the topic. Both treatments reduced perseverative speech, but only DR of on-topic speech increased appropriate turn taking during conversation. Treatment effects were maintained when implemented by family members and novel therapists. © Society for the Experimental Analysis of Behavior.

  6. [Parent's perspective on child rearing and corporal punishment].

    PubMed

    Donoso, Miguir Terezinha Vieccelli; Ricas, Janete

    2009-02-01

    To describe parents' current perceptions of corporal punishment associated with child rearing and its practices. The study included 31 family members whose children either had been warded due to child abuse complaints (12) or had not (19), seen at a health care unit and a local social service unit in the city of Belo Horizonte (Southeastern Brazil) in 2006. Data were collected through semi-structured interviews, and speech analysis was performed, grouped by subjects and categories. Discourse analysis showed that respondents' speech was constrained by its conditions of production. There was a diversity of conceptions of child rearing and its practices, and corporal punishment was reported by all parents, even those who expressed strong disapproval of the practice. Speeches were characterized by heterogeneity and polyphony, with emphasis on the speech of tradition, religious speech, and popular scientific speech. Respondents expressed no notion of the legal interdiction of corporal punishment or of its excesses. The culture of corporal punishment of children is changing: the tradition approving it has weakened, and prohibition is slowly being adopted. Reinforcing legal action against this practice can help speed the end of corporal punishment of children.

  7. Preliminary evaluation of synthetic speech

    DOT National Transportation Integrated Search

    1972-08-01

    The report briefly discusses the methods for storing and generating synthetic speech and a preliminary evaluation of the intelligibility of a speech synthesizer having a 75-word vocabulary selected for air traffic control messages. A program is sugge...

  8. Hate Speech: The History of an American Controversy.

    ERIC Educational Resources Information Center

    Walker, Samuel

    Noting that no other country in the world offers protection to offensive speech, this book provides a comprehensive account of the history of the hate speech controversy in the United States. The book examines the issue, from the conflicts over the Ku Klux Klan in the 1920s and American Nazi groups in the 1930s, to the famous Skokie, Illinois…

  9. Speech Recognition Thresholds for Multilingual Populations.

    ERIC Educational Resources Information Center

    Ramkissoon, Ishara

    2001-01-01

    This article traces the development of speech audiometry in the United States and reports on the current status, focusing on the needs of a multilingual population in terms of measuring speech recognition threshold (SRT). It also discusses sociolinguistic considerations, alternative SRT stimuli for second language learners, and research on using…

  10. What History Means to Us: A Comparison of American and German Attitudes toward History. German Studies Notes.

    ERIC Educational Resources Information Center

    Moltmann, Gunter

    The document presents the text of a speech comparing American and German attitudes toward history, followed by a discussion of issues raised in the speech by conference participants. The first part of the speech identifies aspects of American and German history which are of importance to citizens of each country. American history is characterized…

  11. Seeking Temporal Predictability in Speech: Comparing Statistical Approaches on 18 World Languages.

    PubMed

    Jadoul, Yannick; Ravignani, Andrea; Thompson, Bill; Filippi, Piera; de Boer, Bart

    2016-01-01

    Temporal regularities in speech, such as interdependencies in the timing of speech events, are thought to scaffold early acquisition of the building blocks in speech. By providing on-line clues to the location and duration of upcoming syllables, temporal structure may aid segmentation and clustering of continuous speech into separable units. This hypothesis tacitly assumes that learners exploit predictability in the temporal structure of speech. Existing measures of speech timing tend to focus on first-order regularities among adjacent units, and are overly sensitive to idiosyncrasies in the data they describe. Here, we compare several statistical methods on a sample of 18 languages, testing whether syllable occurrence is predictable over time. Rather than looking for differences between languages, we aim to find across languages (using clearly defined acoustic, rather than orthographic, measures), temporal predictability in the speech signal which could be exploited by a language learner. First, we analyse distributional regularities using two novel techniques: a Bayesian ideal learner analysis, and a simple distributional measure. Second, we model higher-order temporal structure (regularities arising in an ordered series of syllable timings), testing the hypothesis that non-adjacent temporal structures may explain the gap between subjectively-perceived temporal regularities, and the absence of universally-accepted lower-order objective measures. Together, our analyses provide limited evidence for predictability at different time scales, though higher-order predictability is difficult to reliably infer. We conclude that temporal predictability in speech may well arise from a combination of individually weak perceptual cues at multiple structural levels, but is challenging to pinpoint.
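    A common example of the first-order, adjacent-unit timing measures that the authors contrast with their higher-order analyses is the normalized pairwise variability index (nPVI) over syllable intervals. A generic sketch follows, not code from the study; the onset times are invented.

        import numpy as np

        def npvi(durations):
            # Normalized Pairwise Variability Index over successive durations:
            # 0 means perfectly isochronous; larger values mean more
            # adjacent-unit contrast.
            d = np.asarray(durations, dtype=float)
            pair_contrast = np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2.0)
            return 100.0 * pair_contrast.mean()

        # Hypothetical syllable onsets (seconds), e.g. from forced alignment.
        onsets = np.array([0.00, 0.21, 0.39, 0.62, 0.80, 1.05])
        print(npvi(np.diff(onsets)))  # intervals between successive onsets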

  12. Seeking Temporal Predictability in Speech: Comparing Statistical Approaches on 18 World Languages

    PubMed Central

    Jadoul, Yannick; Ravignani, Andrea; Thompson, Bill; Filippi, Piera; de Boer, Bart

    2016-01-01

    Temporal regularities in speech, such as interdependencies in the timing of speech events, are thought to scaffold early acquisition of the building blocks in speech. By providing on-line clues to the location and duration of upcoming syllables, temporal structure may aid segmentation and clustering of continuous speech into separable units. This hypothesis tacitly assumes that learners exploit predictability in the temporal structure of speech. Existing measures of speech timing tend to focus on first-order regularities among adjacent units, and are overly sensitive to idiosyncrasies in the data they describe. Here, we compare several statistical methods on a sample of 18 languages, testing whether syllable occurrence is predictable over time. Rather than looking for differences between languages, we aim to find across languages (using clearly defined acoustic, rather than orthographic, measures), temporal predictability in the speech signal which could be exploited by a language learner. First, we analyse distributional regularities using two novel techniques: a Bayesian ideal learner analysis, and a simple distributional measure. Second, we model higher-order temporal structure—regularities arising in an ordered series of syllable timings—testing the hypothesis that non-adjacent temporal structures may explain the gap between subjectively-perceived temporal regularities, and the absence of universally-accepted lower-order objective measures. Together, our analyses provide limited evidence for predictability at different time scales, though higher-order predictability is difficult to reliably infer. We conclude that temporal predictability in speech may well arise from a combination of individually weak perceptual cues at multiple structural levels, but is challenging to pinpoint. PMID:27994544

  13. Development of Selective Auditory Attention Skills in Children.

    ERIC Educational Resources Information Center

    Cherry, Rochelle Silberzweig

    1981-01-01

    Fifty-three children (ages 5-9) were individually tested on their ability to select pictures of monosyllabic words presented diotically via headphones. Tasks were presented in quiet and under three noise (distractor) conditions: white noise, speech backwards, and speech forward. Age and type of distractor significantly influenced test scores.…

  14. 2009 PEPNet Postsecondary Interpreting and Speech-to-Text Survey Summary. Advancing Educational Opportunities for People Who Are Deaf or Hard of Hearing

    ERIC Educational Resources Information Center

    Riehl, Bambi

    2010-01-01

    PEPNet gets frequent requests for information about interpreter/speech-to-text position development. Determining or analyzing a salary is often part of this challenge and the people at PEPNet trust the information in this paper will be helpful to postsecondary programs. This is the sixth survey produced by PEPNet-Midwest and the University of…

  15. Reduced auditory efferent activity in childhood selective mutism.

    PubMed

    Bar-Haim, Yair; Henkin, Yael; Ari-Even-Roth, Daphne; Tetin-Schneider, Simona; Hildesheimer, Minka; Muchnik, Chava

    2004-06-01

    Selective mutism is a psychiatric disorder of childhood characterized by a consistent inability to speak in specific situations despite the ability to speak normally in others. The objective of this study was to test whether auditory efferent activity, which may have a direct bearing on speaking behavior, is compromised in selectively mute children. Participants were 16 children with selective mutism and 16 normally developing control children matched for age and gender. All children were tested for pure-tone audiometry, speech reception thresholds, speech discrimination, middle-ear acoustic reflex thresholds and decay function, transient evoked otoacoustic emission, suppression of transient evoked otoacoustic emission, and auditory brainstem response. Compared with control children, selectively mute children displayed specific deficiencies in auditory efferent activity. These aberrations in efferent activity appear alongside normal pure-tone and speech audiometry and normal brainstem transmission, as indicated by auditory brainstem response latencies. The diminished auditory efferent activity detected in some children with selective mutism may result in desensitization of their auditory pathways by self-vocalization and in reduced control of masking and distortion of incoming speech sounds. These children may gradually learn to restrict vocalization to the minimal amount possible in contexts that require complex auditory processing.

  16. Evaluation of selected speech parameters after prosthesis supply in patients with maxillary or mandibular defects.

    PubMed

    Müller, Rainer; Höhlein, Andreas; Wolf, Annette; Markwardt, Jutta; Schulz, Matthias C; Range, Ursula; Reitemeier, Bernd

    2013-01-01

    Ablative surgery of oropharyngeal tumors frequently leads to defects in the speech organs, resulting in impairment of speech up to the point of unintelligibility. The aim of the present study was the assessment of selected parameters of speech with and without resection prostheses. The speech sounds of 22 patients suffering from maxillary and mandibular defects were recorded using a digital audio tape (DAT) recorder with and without resection prostheses. Evaluation of the resonance and the production of the sounds /s/, /sch/, and /ch/ was performed by 2 experienced speech therapists. Additionally, the patients completed a non-standardized questionnaire containing a linguistic self-assessment. After prosthesis supply, the number of patients with rhinophonia aperta decreased from 7 to 2 while the number of patients with intelligible speech increased from 2 to 20. Correct production of the sounds /s/, /sch/, and /ch/ increased from 2 to 13 patients. A significant improvement of the evaluated parameters could be observed only in patients with maxillary defects. The linguistic self-assessment showed a higher satisfaction in patients with maxillary defects. In patients with maxillary defects due to ablative tumor surgery, an increase in speech performance and intelligibility is possible by supplying resection prostheses. © 2013 S. Karger GmbH, Freiburg.

  17. The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise

    ERIC Educational Resources Information Center

    Lam, Boji P. W.; Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath

    2017-01-01

    Purpose: Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing…

  18. Index to NASA news releases and speeches, 1984

    NASA Technical Reports Server (NTRS)

    1985-01-01

    The Index to NASA News Releases and Speeches (1984) contains selected speeches and news releases issued by NASA Headquarters during the year 1984. The index was prepared by the NASA Scientific and Technical Information Facility, operated for the National Aeronautics and Space Administration by PRC Government Information Systems.

  19. Discriminating between auditory and motor cortical responses to speech and non-speech mouth sounds

    PubMed Central

    Agnew, Z.K.; McGettigan, C.; Scott, S.K.

    2012-01-01

    Several perspectives on speech perception posit a central role for the representation of articulations in speech comprehension, supported by evidence for premotor activation when participants listen to speech. However no experiments have directly tested whether motor responses mirror the profile of selective auditory cortical responses to native speech sounds, or whether motor and auditory areas respond in different ways to sounds. We used fMRI to investigate cortical responses to speech and non-speech mouth (ingressive click) sounds. Speech sounds activated bilateral superior temporal gyri more than other sounds, a profile not seen in motor and premotor cortices. These results suggest that there are qualitative differences in the ways that temporal and motor areas are activated by speech and click sounds: anterior temporal lobe areas are sensitive to the acoustic/phonetic properties while motor responses may show more generalised responses to the acoustic stimuli. PMID:21812557

  20. Semi-Direct Speech: Manambu and beyond

    ERIC Educational Resources Information Center

    Aikhenvald, Alexandra Y.

    2008-01-01

    Every language has some way of reporting what someone else has said. To express what Jakobson [Jakobson, R., 1990. "Shifters, categories, and the Russian verb. Selected writings". "Word and Language". Mouton, The Hague, Paris, pp. 130-153] called "speech within speech", the speaker can use their own words, recasting…

  1. A Comparative Study: Oral Communication Education in Norway and the United States.

    ERIC Educational Resources Information Center

    Kizer, Elizabeth

    Acknowledging that, although a survey of educational offerings in Norway reveals courses in theater, mass media, and speech therapy, the curriculum does not contain oral communication courses per se, such as those found in the United States, this article compares how and why general education systems and speech education have developed differently…

  2. United Kingdom national paediatric bilateral project: Results of professional rating scales and parent questionnaires.

    PubMed

    Cullington, H E; Bele, D; Brinton, J C; Cooper, S; Daft, M; Harding, J; Hatton, N; Humphries, J; Lutman, M E; Maddocks, J; Maggs, J; Millward, K; O'Donoghue, G; Patel, S; Rajput, K; Salmon, V; Sear, T; Speers, A; Wheeler, A; Wilson, K

    2017-01-01

    This fourteen-centre project used professional rating scales and parent questionnaires to assess longitudinal outcomes in a large non-selected population of children receiving simultaneous and sequential bilateral cochlear implants. This was an observational non-randomized service evaluation. Data were collected at four time points: before bilateral cochlear implants or before the sequential implant, and one year, two years, and three years after. The measures reported are Categories of Auditory Performance II (CAPII), Speech Intelligibility Rating (SIR), Bilateral Listening Skills Profile (BLSP), and Parent Outcome Profile (POP). One thousand and one children aged from 8 months to almost 18 years were involved, although there were many missing data. In children receiving simultaneous implants after one, two, and three years respectively, median CAP scores were 4, 5, and 6; median SIR scores were 1, 2, and 3. Three years after receiving simultaneous bilateral cochlear implants, 61% of children were reported to understand conversation without lip-reading and 66% had intelligible speech if the listener concentrated hard. Auditory performance and speech intelligibility were significantly better in female children than in males. Parents of children using sequential implants were generally positive about their child's well-being and behaviour since receiving the second device; those who were less positive about well-being changes also generally reported their children less willing to wear the second device. Data from 78% of paediatric cochlear implant centres in the United Kingdom provide a real-world picture of outcomes for children with bilateral implants in the UK. This large reference data set can be used to identify children in the lower quartile for targeted intervention.

  3. Listeners modulate temporally selective attention during natural speech processing

    PubMed Central

    Astheimer, Lori B.; Sanders, Lisa D.

    2009-01-01

    Spatially selective attention allows for the preferential processing of relevant stimuli when more information than can be processed in detail is presented simultaneously at distinct locations. Temporally selective attention may serve a similar function during speech perception by allowing listeners to allocate attentional resources to time windows that contain highly relevant acoustic information. To test this hypothesis, event-related potentials were compared in response to attention probes presented in six conditions during a narrative: concurrently with word onsets, beginning 50 and 100 ms before and after word onsets, and at random control intervals. Times for probe presentation were selected such that the acoustic environments of the narrative were matched for all conditions. Linguistic attention probes presented at and immediately following word onsets elicited larger amplitude N1s than control probes over medial and anterior regions. These results indicate that native speakers selectively process sounds presented at specific times during normal speech perception. PMID:18395316

  4. The influence of selective attention to auditory and visual speech on the integration of audiovisual speech information.

    PubMed

    Buchan, Julie N; Munhall, Kevin G

    2011-01-01

    Conflicting visual speech information can influence the perception of acoustic speech, causing an illusory percept of a sound not present in the actual acoustic speech (the McGurk effect). We examined whether participants can voluntarily selectively attend to either the auditory or visual modality by instructing participants to pay attention to the information in one modality and to ignore competing information from the other modality. We also examined how performance under these instructions was affected by weakening the influence of the visual information by manipulating the temporal offset between the audio and video channels (experiment 1), and the spatial frequency information present in the video (experiment 2). Gaze behaviour was also monitored to examine whether attentional instructions influenced the gathering of visual information. While task instructions did have an influence on the observed integration of auditory and visual speech information, participants were unable to completely ignore conflicting information, particularly information from the visual stream. Manipulating temporal offset had a more pronounced interaction with task instructions than manipulating the amount of visual information. Participants' gaze behaviour suggests that the attended modality influences the gathering of visual information in audiovisual speech perception.

  5. The Effects of a Peer-Tutoring Intervention on the Text Production of Students with Learning and Speech Problems: A Case Report

    ERIC Educational Resources Information Center

    Grünke, Matthias; Janning, Andriana Maria; Sperling, Marko

    2016-01-01

    The purpose of this single-case study was to evaluate the effects of a peer-tutoring intervention on the text production skills of three third graders with severe learning and speech difficulties. All tutees were initially able to produce only very short stories. During the course of the treatment, higher performing classmates taught them how to…

  6. Digitized Ethnic Hate Speech: Understanding Effects of Digital Media Hate Speech on Citizen Journalism in Kenya

    ERIC Educational Resources Information Center

    Kimotho, Stephen Gichuhi; Nyaga, Rahab Njeri

    2016-01-01

    Ethnicity in Kenya permeates all spheres of life. However, it is in politics that ethnicity is most visible. Election time in Kenya often leads to ethnic competition and hatred, often expressed through various media. Ethnic hate speech characterized the 2007 general elections in party rallies and through text messages, emails, posters and…

  7. Are written and spoken recall of text equivalent?

    PubMed

    Kellogg, Ronald T

    2007-01-01

    Writing is less practiced than speaking, graphemic codes are activated only in writing, and the retrieved representations of the text must be maintained in working memory longer because handwritten output is slower than speech. These extra demands on working memory could result in less effort being given to retrieval during written compared with spoken text recall. To test this hypothesis, college students read or heard Bartlett's "War of the Ghosts" and then recalled the text in writing or speech. Spoken recall produced more accurately recalled propositions and more major distortions (e.g., inferences) than written recall. The results suggest that writing reduces the retrieval effort given to reconstructing the propositions of a text.

  8. Linguistic Flexibility Modulates Speech Planning for Causative Motion Events: A Cross-Linguistic Study of Mandarin and English

    ERIC Educational Resources Information Center

    Zheng, Chun

    2017-01-01

    Producing a sensible utterance requires speakers to select conceptual content, lexical items, and syntactic structures almost instantaneously during speech planning. Each language offers its speakers flexibility in the selection of lexical and syntactic options to talk about the same scenarios involving movement. Languages also vary typologically…

  9. L1 literacy affects L2 pronunciation intake and text vocalization

    NASA Astrophysics Data System (ADS)

    Walton, Martin

    2005-04-01

    For both deaf and hearing learners, L1 acquisition calls on auditive, gestural and visual modes in progressive processes over longer stages imposed in strictly anatomical and social order from the earliest pre-lexical phase [Jusczyk (1993), Kuhl & Meltzoff (1996)] to ultimate literacy. By contrast, L2 learning will call on accelerating procedures but with restricted input, arbitrated by L1 literacy as can be traced in the English of French-speaking learners, whether observed in spontaneous speech or in text vocalization modes. An inventory of their predictable omissions, intrusions and substitutions at suprasegmental and syllabic levels, many of which they can actually hear while unable to vocalize in real-time, suggests that a photogenic segmentation of continuous speech into alphabetical units has eclipsed the indispensable earlier phonogenic module, filtering L2 intake and output. This competing mode analysis hypothesizes a critical effect on L2 pronunciation of L1 graphemic procedures acquired usually before puberty, informing data for any Critical Period Hypothesis or amounts of L1 activation influencing L2 accent [Flege (1997, 1998)] or any psychoacoustic French deafness with regard to English stress-timing [Dupoux (1997)]. A metaphonic model [Howell & Dean (1991)] adapted for French learners may remedially distance L1 from L2 vocalization procedures.

  10. Grammatical Planning Units During Real-Time Sentence Production in Speakers With Agrammatic Aphasia and Healthy Speakers.

    PubMed

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K

    2015-08-01

    Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia produce sentences word by word without advanced planning or whether hierarchical syntactic structure (i.e., verb argument structure; VAS) is encoded as part of the advanced planning unit. Experiment 1 examined production of sentences with a predefined structure (i.e., "The A and the B are above the C") using eye tracking. Experiment 2 tested production of transitive and unaccusative sentences without a predefined sentence structure in a verb-priming study. In Experiment 1, both speakers with agrammatic aphasia and young and age-matched control speakers used word-by-word strategies, selecting the first lemma (noun A) only prior to speech onset. However, in Experiment 2, unlike controls, speakers with agrammatic aphasia preplanned transitive and unaccusative sentences, encoding VAS before speech onset. Speakers with agrammatic aphasia show incremental, word-by-word production for structurally simple sentences, requiring retrieval of multiple noun lemmas. However, when sentences involve functional (thematic to grammatical) structure building, advanced planning strategies (i.e., VAS encoding) are used. This early use of hierarchical syntactic information may provide a scaffold for impaired GE in agrammatism.

  11. Speech perception in older listeners with normal hearing: conditions of time alteration, selective word stress, and length of sentences.

    PubMed

    Cho, Soojin; Yu, Jyaehyoung; Chun, Hyungi; Seo, Hyekyung; Han, Woojae

    2014-04-01

    Deficits of the aging auditory system negatively affect older listeners in terms of speech communication, resulting in limitations to their social lives. To improve their perceptual skills, the goal of this study was to investigate the effects of time alteration, selective word stress, and varying sentence lengths on the speech perception of older listeners. Seventeen older people with normal hearing were tested for seven conditions of different time-altered sentences (i.e., ±60%, ±40%, ±20%, 0%), two conditions of selective word stress (i.e., no-stress and stress), and three different lengths of sentences (i.e., short, medium, and long) at the most comfortable level for individuals in quiet circumstances. As time compression increased, sentence perception scores decreased statistically. Compared to a natural (or no stress) condition, the selectively stressed words significantly improved the perceptual scores of these older listeners. Long sentences yielded the worst scores under all time-altered conditions. Interestingly, there was a noticeable positive effect for the selective word stress at the 20% time compression. This pattern of results suggests that a combination of time compression and selective word stress is more effective for understanding speech in older listeners than using the time-expanded condition only.
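    Time-altered stimuli of this kind are commonly generated with a phase-vocoder time stretch; a minimal sketch using librosa follows. The mapping of the compression percentages to stretch rates is an assumption, and the input file name is hypothetical.

        import librosa
        import soundfile as sf

        # One plausible reading: 60% compression -> played in 40% of the time.
        conditions = {"comp60": 1 / 0.4, "comp20": 1 / 0.8,
                      "orig": 1.0, "exp20": 1 / 1.2}

        y, sr = librosa.load("sentence.wav", sr=None)  # hypothetical stimulus
        for label, rate in conditions.items():
            # rate > 1 compresses (speeds up); rate < 1 expands (slows down).
            stretched = librosa.effects.time_stretch(y, rate=rate)
            sf.write(f"sentence_{label}.wav", stretched, sr)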

  12. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    PubMed Central

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involves processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  13. Visual input enhances selective speech envelope tracking in auditory cortex at a "cocktail party".

    PubMed

    Zion Golumbic, Elana; Cogan, Gregory B; Schroeder, Charles E; Poeppel, David

    2013-01-23

    Our ability to selectively attend to one auditory signal amid competing input streams, epitomized by the "Cocktail Party" problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared with responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker's face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a Cocktail Party setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive.
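    Envelope-tracking analyses of this kind typically correlate a low-passed Hilbert envelope of the speech with the recorded neural signal. A generic sketch follows; the cutoff, filter order, and lag range are illustrative, not the paper's exact analysis.

        import numpy as np
        from scipy.signal import hilbert, butter, filtfilt, resample

        def speech_envelope(audio, fs_audio, fs_neural, cutoff=8.0):
            # Broadband envelope: |Hilbert|, low-passed, resampled to the
            # neural sampling rate.
            env = np.abs(hilbert(audio))
            b, a = butter(3, cutoff / (fs_audio / 2), btype="low")
            env = filtfilt(b, a, env)
            n_out = int(len(env) * fs_neural / fs_audio)
            return resample(env, n_out)

        def tracking_score(envelope, neural, max_lag=50):
            # Peak normalized cross-correlation within +/- max_lag samples;
            # both inputs assumed equal length at the neural sampling rate.
            e = (envelope - envelope.mean()) / envelope.std()
            n = (neural - neural.mean()) / neural.std()
            lags = range(-max_lag, max_lag + 1)
            return max(np.mean(e[max(0, -l):len(e) - max(0, l)] *
                               n[max(0, l):len(n) - max(0, -l)]) for l in lags)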

  14. Neural Representations Used by Brain Regions Underlying Speech Production

    ERIC Educational Resources Information Center

    Segawa, Jennifer Anne

    2013-01-01

    Speech utterances are phoneme sequences but may not always be represented as such in the brain. For instance, electropalatography evidence indicates that as speaking rate increases, gestures within syllables are manipulated separately but those within consonant clusters act as one motor unit. Moreover, speech error data suggest that a syllable's…

  15. Private Speech Moderates the Effects of Effortful Control on Emotionality

    ERIC Educational Resources Information Center

    Day, Kimberly L.; Smith, Cynthia L.; Neal, Amy; Dunsmore, Julie C.

    2018-01-01

    Research Findings: In addition to being a regulatory strategy, children's private speech may enhance or interfere with their effortful control used to regulate emotion. The goal of the current study was to investigate whether children's private speech during a selective attention task moderated the relations of their effortful control to their…

  16. Discourse Analysis and Language Learning [Summary of a Symposium].

    ERIC Educational Resources Information Center

    Hatch, Evelyn

    1981-01-01

    A symposium on discourse analysis and language learning is summarized. Discourse analysis can be divided into six fields of research: syntax, the amount of syntactic organization required for different types of discourse, large speech events, intra-sentential cohesion in text, speech acts, and unequal power discourse. Research on speech events and…

  17. Selective Influences of Precision and Power Grips on Speech Categorization.

    PubMed

    Tiainen, Mikko; Tiippana, Kaisa; Vainio, Martti; Peromaa, Tarja; Komeilipoor, Naeem; Vainio, Lari

    2016-01-01

    Recent studies have shown that articulatory gestures are systematically associated with specific manual grip actions. Here we show that executing such actions can influence performance on a speech-categorization task. Participants watched and/or listened to speech stimuli while executing either a power or a precision grip. Grip performance influenced the syllable categorization by increasing the proportion of responses of the syllable congruent with the executed grip (power grip-[ke] and precision grip-[te]). Two follow-up experiments indicated that the effect was based on action-induced bias in selecting the syllable.

  18. Ultrasound applicability in Speech Language Pathology and Audiology.

    PubMed

    Barberena, Luciana da Silva; Brasil, Brunah de Castro; Melo, Roberta Michelon; Mezzomo, Carolina Lisbôa; Mota, Helena Bolli; Keske-Soares, Márcia

    2014-01-01

    To present recent studies that used ultrasound in the fields of Speech Language Pathology and Audiology, evidencing possibilities for the applicability of this technique in different subareas. A bibliographic search was carried out in the PubMed database, using the keywords "ultrasonic," "speech," "phonetics," "Speech, Language and Hearing Sciences," "voice," "deglutition," and "myofunctional therapy," comprising some areas of Speech Language Pathology and Audiology Sciences. The keywords "ultrasound," "ultrasonography," "swallow," "orofacial myofunctional therapy," and "orofacial myology" were also used in the search. Studies in humans from the past 5 years were selected. In the preselection, duplicated studies, articles not fully available, and those that did not present a direct relation between ultrasound and Speech Language Pathology and Audiology Sciences were discarded. The data were analyzed descriptively and classified into subareas of Speech Language Pathology and Audiology Sciences. The following items were considered: purposes, participants, procedures, and results. We selected 12 articles for the ultrasound versus speech/phonetics subarea, 5 for ultrasound versus voice, 1 for ultrasound versus muscles of mastication, and 10 for ultrasound versus swallow. No studies relating "ultrasound" and "Speech Language Pathology and Audiology Sciences" as combined terms were found for the past 5 years. Different studies on the use of ultrasound in Speech Language Pathology and Audiology Sciences were found. Each of them, according to its purpose, confirms new possibilities for the use of this instrument in the several subareas, aiming at a more accurate diagnosis and new evaluative and therapeutic possibilities.

  19. Functional Characterization of the Human Speech Articulation Network.

    PubMed

    Basilakos, Alexandra; Smith, Kimberly G; Fillmore, Paul; Fridriksson, Julius; Fedorenko, Evelina

    2018-05-01

    A number of brain regions have been implicated in articulation, but their precise computations remain debated. Using functional magnetic resonance imaging, we examine the degree of functional specificity of articulation-responsive brain regions to constrain hypotheses about their contributions to speech production. We find that articulation-responsive regions (1) are sensitive to articulatory complexity, but (2) are largely nonoverlapping with nearby domain-general regions that support diverse goal-directed behaviors. Furthermore, premotor articulation regions show selectivity for speech production over some related tasks (respiration control), but not others (nonspeech oral-motor [NSO] movements). This overlap between speech and nonspeech movements concords with electrocorticographic evidence that these regions encode articulators and their states, and with patient evidence whereby articulatory deficits are often accompanied by oral-motor deficits. In contrast, the superior temporal regions show strong selectivity for articulation relative to nonspeech movements, suggesting that these regions play a specific role in speech planning/production. Finally, articulation-responsive portions of posterior inferior frontal gyrus show some selectivity for articulation, in line with the hypothesis that this region prepares an articulatory code that is passed to the premotor cortex. Taken together, these results inform the architecture of the human articulation system.

  20. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

    PubMed Central

    Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

    2015-01-01

    In recent years, many research works have been published using speech-related features for speech emotion recognition; however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstral coefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features, and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and their glottal waveforms (GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features, respectively. Three different emotional speech databases were utilized to gauge the proposed method. An extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted, and the results show that the proposed method significantly improves speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
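    The wrapper-based selection idea can be sketched with a binary particle swarm over a feature mask. In the sketch below, a k-nearest-neighbour classifier stands in for the paper's extreme learning machine, feature extraction is omitted, and all swarm parameters are illustrative.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import cross_val_score

        def fitness(mask, X, y):
            # Wrapper fitness: cross-validated accuracy of a stand-in
            # classifier (k-NN, not the paper's ELM) on the selected subset.
            if not mask.any():
                return 0.0
            return cross_val_score(KNeighborsClassifier(),
                                   X[:, mask], y, cv=3).mean()

        def binary_pso(X, y, n_particles=20, n_iter=30, w=0.7, c1=1.5, c2=1.5):
            rng = np.random.default_rng(0)
            n_feat = X.shape[1]
            pos = rng.random((n_particles, n_feat)) > 0.5   # boolean masks
            vel = 0.1 * rng.standard_normal((n_particles, n_feat))
            pbest = pos.copy()
            pbest_fit = np.array([fitness(p, X, y) for p in pos])
            gbest = pbest[pbest_fit.argmax()].copy()
            for _ in range(n_iter):
                r1 = rng.random((n_particles, n_feat))
                r2 = rng.random((n_particles, n_feat))
                vel = (w * vel
                       + c1 * r1 * (pbest.astype(float) - pos.astype(float))
                       + c2 * r2 * (gbest.astype(float) - pos.astype(float)))
                # Sigmoid transfer: bit i is on with probability sigmoid(vel_i).
                pos = rng.random((n_particles, n_feat)) < 1.0 / (1.0 + np.exp(-vel))
                fit = np.array([fitness(p, X, y) for p in pos])
                improved = fit > pbest_fit
                pbest[improved] = pos[improved]
                pbest_fit[improved] = fit[improved]
                gbest = pbest[pbest_fit.argmax()].copy()
            return gbest

        # Usage on hypothetical pre-extracted features (e.g., MFCC statistics):
        # selected_mask = binary_pso(features, emotion_labels)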

  1. Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

    NASA Astrophysics Data System (ADS)

    Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

    2017-09-01

    This paper reviews the state of the art in automatic speech recognition (ASR) based approaches to speech therapy for aphasic patients. Aphasia is a condition in which the affected person suffers from a speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that converts human speech into transcript text by matching it against the system's library. This is particularly useful in speech rehabilitation therapies, as it provides accurate, real-time evaluation of speech input from an individual with a speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback on their mistakes. However, the accuracy of ASR depends on many factors, such as phoneme recognition, speech continuity, speaker and environmental differences, as well as the depth of our knowledge of human language understanding. Hence, the review examines recent developments in ASR technologies and their performance for individuals with speech and language disorders.
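    The feedback loop described, comparing the recognized transcript against the target utterance, can be sketched independently of any particular ASR engine. In the sketch below, obtaining recognized_words from the engine is left abstract, and the example sentences are invented.

        from difflib import SequenceMatcher

        def pronunciation_feedback(target_words, recognized_words):
            # Align the ASR output with the target utterance and report
            # mismatches; recognized_words would come from whatever ASR
            # engine is in use.
            sm = SequenceMatcher(a=target_words, b=recognized_words)
            feedback = []
            for op, i1, i2, j1, j2 in sm.get_opcodes():
                if op == "equal":
                    continue
                feedback.append((op, target_words[i1:i2],
                                 recognized_words[j1:j2]))
            score = sm.ratio()  # 0..1 similarity, usable as a session metric
            return score, feedback

        score, issues = pronunciation_feedback(
            "the cat sat on the mat".split(),
            "the cat sat on a mat".split())
        print(score, issues)  # flags 'the' -> 'a' as a substitution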

  2. Methods and Applications of the Audibility Index in Hearing Aid Selection and Fitting

    PubMed Central

    Amlani, Amyn M.; Punch, Jerry L.; Ching, Teresa Y. C.

    2002-01-01

    During the first half of the 20th century, communications engineers at Bell Telephone Laboratories developed the articulation model for predicting speech intelligibility transmitted through different telecommunication devices under varying electroacoustic conditions. The profession of audiology adopted this model and its quantitative aspects, known as the Articulation Index and Speech Intelligibility Index, and applied these indices to the prediction of unaided and aided speech intelligibility in hearing-impaired listeners. Over time, the calculation methods of these indices—referred to collectively in this paper as the Audibility Index—have been continually refined and simplified for clinical use. This article provides (1) an overview of the basic principles and the calculation methods of the Audibility Index, the Speech Transmission Index and related indices, as well as the Speech Recognition Sensitivity Model, (2) a review of the literature on using the Audibility Index to predict speech intelligibility of hearing-impaired listeners, (3) a review of the literature on the applicability of the Audibility Index to the selection and fitting of hearing aids, and (4) a discussion of future scientific needs and clinical applications of the Audibility Index. PMID:25425917
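    At their core, these indices weight per-band audibility by band importance and sum the result. The sketch below uses illustrative octave bands and weights; the standardized procedures (e.g., ANSI S3.5) define their own bands, weights, and level corrections.

        import numpy as np

        # Illustrative octave-band importance weights (sum to 1.0); the
        # standardized procedures define their own bands and weights.
        BAND_HZ = [250, 500, 1000, 2000, 4000]
        IMPORTANCE = np.array([0.10, 0.20, 0.25, 0.25, 0.20])

        def audibility_index(speech_peaks_db, thresholds_db, dynamic_range=30.0):
            # Weighted audible fraction of the speech dynamic range per band.
            # speech_peaks_db: band speech peak levels at the eardrum (dB).
            # thresholds_db: the listener's band thresholds, including any
            # masking noise (dB).
            audible = np.asarray(speech_peaks_db) - np.asarray(thresholds_db)
            band_audibility = np.clip(audible / dynamic_range, 0.0, 1.0)
            return float(IMPORTANCE @ band_audibility)

        # A hypothetical sloping hearing loss, unaided:
        print(audibility_index([60, 62, 58, 52, 48], [20, 25, 40, 55, 65]))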

  3. Military applications of automatic speech recognition and future requirements

    NASA Technical Reports Server (NTRS)

    Beek, Bruno; Cupples, Edward J.

    1977-01-01

    An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.

  4. Pakistan-U.S. Relations

    DTIC Science & Technology

    2008-08-25

    Going, Going, Gone!," Daily Times (Lahore), August 19, 2008. Speech text at [http://www.cfr.org/publication/16999...In a landmark January 2002 speech, President Musharraf vowed to end Pakistan's use as a base for terrorism of any kind, and he banned numerous...organizations under U.S. law. In the wake of the speech, thousands of Muslim extremists were detained, though most of these were later released. In

  5. A meme's eye view of speech-language pathology.

    PubMed

    Kamhi, Alan G

    2004-04-01

    In this article, the reason why certain terms, labels, and ideas prevail, whereas others fail to gain acceptance, will be considered. Borrowing the concept of the "meme" from the study of the evolution of ideas, it becomes clear why language-based and phonological disorders have less widespread appeal than, for example, auditory processing and sensory integration disorders. Discussion will also center on why most speech-language pathologists refer to themselves as speech therapists or speech pathologists, and why it is more desirable to have dyslexia than to have a reading disability. In a meme's eye view, science and logic do not always win out, because selection favors ideas (memes) that are easy to understand, remember, and copy. An unfortunate consequence of these selection forces is that successful memes typically provide superficially plausible answers to complex questions.

  6. Provision of surgical voice restoration in England: questionnaire survey of speech and language therapists.

    PubMed

    Bradley, P J; Counter, P; Hurren, A; Cocks, H C

    2013-08-01

    To conduct a questionnaire survey of speech and language therapists providing and managing surgical voice restoration in England. National Health Service Trusts registering more than 10 new laryngeal cancer patients during any one year, from November 2009 to October 2010, were identified, and a list of speech and language therapists compiled. A questionnaire was developed, peer reviewed and revised. The final questionnaire was e-mailed with a covering letter to 82 units. Eighty-two questionnaires were distributed and 72 were returned and analysed, giving a response rate of 87.8 per cent. Forty-four per cent (38/59) of the units performed more than 10 laryngectomies per year. An in-hours surgical voice restoration service was provided by speech and language therapists in 45.8 per cent (33/72) and assisted by nurses in 34.7 per cent (25/72). An out of hours service was provided directly by ENT staff in 35.5 per cent (21/59). Eighty-eight per cent (63/72) of units reported less than 10 (emergency) out of hours calls per month. Surgical voice restoration service provision varies within and between cancer networks. There is a need for a national management and care protocol, an educational programme for out of hours service providers, and a review of current speech and language therapist staffing levels in England.

  7. An integrated approach to improving noisy speech perception

    NASA Astrophysics Data System (ADS)

    Koval, Serguei; Stolbov, Mikhail; Smirnova, Natalia; Khitrov, Mikhail

    2002-05-01

    For a number of practical purposes and tasks, experts have to decode speech recordings of very poor quality. A combination of techniques is proposed to improve intelligibility and quality of distorted speech messages and thus facilitate their comprehension. Along with the application of noise cancellation and speech signal enhancement techniques removing and/or reducing various kinds of distortions and interference (primarily unmasking and normalization in time and frequency fields), the approach incorporates optimal listener expert tactics based on selective listening, nonstandard binaural listening, accounting for short-term and long-term human ear adaptation to noisy speech, as well as some methods of speech signal enhancement to support speech decoding during listening. The approach integrating the suggested techniques ensures high-quality ultimate results and has successfully been applied by Speech Technology Center experts and by numerous other users, mainly forensic institutions, to perform noisy speech records decoding for courts, law enforcement and emergency services, accident investigation bodies, etc.

  8. How linguistic closure and verbal working memory relate to speech recognition in noise--a review.

    PubMed

    Besser, Jana; Koelewijn, Thomas; Zekveld, Adriana A; Kramer, Sophia E; Festen, Joost M

    2013-06-01

    The ability to recognize masked speech, commonly measured with a speech reception threshold (SRT) test, is associated with cognitive processing abilities. Two cognitive factors frequently assessed in speech recognition research are the capacity of working memory (WM), measured by means of a reading span (Rspan) or listening span (Lspan) test, and the ability to read masked text (linguistic closure), measured by the text reception threshold (TRT). The current article provides a review of recent hearing research that examined the relationship of TRT and WM span to SRTs in various maskers. Furthermore, modality differences in WM capacity assessed with the Rspan compared to the Lspan test were examined and related to speech recognition abilities in an experimental study with young adults with normal hearing (NH). Span scores were strongly associated with each other, but were higher in the auditory modality. The results of the reviewed studies suggest that TRT and WM span are related to each other, but differ in their relationships with SRT performance. In NH adults of middle age or older, both TRT and Rspan were associated with SRTs in speech maskers, whereas TRT better predicted speech recognition in fluctuating nonspeech maskers. The associations with SRTs in steady-state noise were inconclusive for both measures. WM span was positively related to benefit from contextual information in speech recognition, but better TRTs related to less interference from unrelated cues. Data for individuals with impaired hearing are limited, but larger WM span seems to give a general advantage in various listening situations.

  9. How Linguistic Closure and Verbal Working Memory Relate to Speech Recognition in Noise—A Review

    PubMed Central

    Koelewijn, Thomas; Zekveld, Adriana A.; Kramer, Sophia E.; Festen, Joost M.

    2013-01-01

    The ability to recognize masked speech, commonly measured with a speech reception threshold (SRT) test, is associated with cognitive processing abilities. Two cognitive factors frequently assessed in speech recognition research are the capacity of working memory (WM), measured by means of a reading span (Rspan) or listening span (Lspan) test, and the ability to read masked text (linguistic closure), measured by the text reception threshold (TRT). The current article provides a review of recent hearing research that examined the relationship of TRT and WM span to SRTs in various maskers. Furthermore, modality differences in WM capacity assessed with the Rspan compared to the Lspan test were examined and related to speech recognition abilities in an experimental study with young adults with normal hearing (NH). Span scores were strongly associated with each other, but were higher in the auditory modality. The results of the reviewed studies suggest that TRT and WM span are related to each other, but differ in their relationships with SRT performance. In NH adults of middle age or older, both TRT and Rspan were associated with SRTs in speech maskers, whereas TRT better predicted speech recognition in fluctuating nonspeech maskers. The associations with SRTs in steady-state noise were inconclusive for both measures. WM span was positively related to benefit from contextual information in speech recognition, but better TRTs related to less interference from unrelated cues. Data for individuals with impaired hearing are limited, but larger WM span seems to give a general advantage in various listening situations. PMID:23945955

  10. Effect of the loss of auditory feedback on segmental parameters of vowels of postlingually deafened speakers.

    PubMed

    Schenk, Barbara S; Baumgartner, Wolf Dieter; Hamzavi, Jafar Sasan

    2003-12-01

    The most obvious and best documented changes in speech of postlingually deafened speakers are the rate, fundamental frequency, and volume (energy). These changes are due to the lack of auditory feedback. But auditory feedback affects not only the suprasegmental parameters of speech. The aim of this study was to determine the change at the segmental level of speech in terms of vowel formants. Twenty-three postlingually deafened and 18 normally hearing speakers were recorded reading a German text. The frequencies of the first and second formants and the vowel spaces of selected vowels in a word-in-context condition were compared. All first formant frequencies (F1) of the postlingually deafened speakers were significantly different from those of the normally hearing people. The values of F1 were higher for the vowels /e/ (418±61 Hz compared with 359±52 Hz, P=0.006) and /o/ (459±58 Hz compared with 390±45 Hz, P=0.0003) and lower for /a/ (765±115 Hz compared with 851±146 Hz, P=0.038). The second formant frequency (F2) showed a significant difference only for the vowel /e/ (2016±347 Hz compared with 2279±250 Hz, P=0.012). The postlingually deafened people were divided into two subgroups according to duration of deafness (shorter/longer than 10 years of deafness). There was no significant difference in formant changes between the two groups. Our report demonstrated an effect of auditory feedback also on segmental features of speech of postlingually deafened people.
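    The between-group formant comparisons above are two-sample tests on summary statistics. As an illustration only (the abstract does not state which test the authors used, so the output need not match the reported P values), a Welch t-test can be run directly from the published means, standard deviations, and group sizes:

```python
# Illustrative sketch: two-sample t-test from the summary statistics reported
# above for F1 of /e/. The study's exact statistical procedure is unspecified;
# this is simply a standard Welch test on the same numbers.
from scipy.stats import ttest_ind_from_stats

t, p = ttest_ind_from_stats(
    mean1=418, std1=61, nobs1=23,   # postlingually deafened group
    mean2=359, std2=52, nobs2=18,   # normally hearing group
    equal_var=False)                # Welch correction for unequal variances
print(f"t = {t:.2f}, two-tailed p = {p:.4f}")
```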

  11. Brain 'talks over' boring quotes: top-down activation of voice-selective areas while listening to monotonous direct speech quotations.

    PubMed

    Yao, Bo; Belin, Pascal; Scheepers, Christoph

    2012-04-15

    In human communication, direct speech (e.g., Mary said, "I'm hungry") is perceived as more vivid than indirect speech (e.g., Mary said that she was hungry). This vividness distinction has previously been found to underlie silent reading of quotations: Using functional magnetic resonance imaging (fMRI), we found that direct speech elicited higher brain activity in the temporal voice areas (TVA) of the auditory cortex than indirect speech, consistent with an "inner voice" experience in reading direct speech. Here we show that listening to monotonously spoken direct versus indirect speech quotations also engenders differential TVA activity. This suggests that individuals engage in top-down simulations or imagery of enriched supra-segmental acoustic representations while listening to monotonous direct speech. The findings shed new light on the acoustic nature of the "inner voice" in understanding direct speech. Copyright © 2012 Elsevier Inc. All rights reserved.

  12. Relations Among Central Auditory Abilities, Socio-Economic Factors, Speech Delay, Phonic Abilities and Reading Achievement: A Longitudinal Study.

    ERIC Educational Resources Information Center

    Flowers, Arthur; Crandell, Edwin W.

    Three auditory perceptual processes (resistance to distortion, selective listening in the form of auditory dedifferentiation, and binaural synthesis) were evaluated by five assessment techniques: (1) low pass filtered speech, (2) accelerated speech, (3) competing messages, (4) accelerated plus competing messages, and (5) binaural synthesis.…

  13. JPRS Report China.

    DTIC Science & Technology

    1990-01-25

    Kong CHIUSHIH NIENTAI No 236] 7 Wang Dan Advocates Freedom of Speech for Opposition [Hong Kong CHIUSHIH NIENTAI No 236] 9 Conference Calls for...stand strive together to stop the Chinese Government’s brutal, inhuman conduct. Wang Dan Advocates Freedom of Speech for Opposition 90ON0062A Hong...of Speech of the Opposition Faction"] [Text] During China’s new enlightenment movement, the intellectual elites must give top priority to freedom

  14. Evidence-Based Intervention for Individuals with Acquired Apraxia of Speech. EBP Briefs. Volume 11, Issue 2

    ERIC Educational Resources Information Center

    Van Sickle, Angela

    2016-01-01

    Clinical Question: Would individuals with acquired apraxia of speech (AOS) demonstrate greater improvements for speech production with an articulatory kinematic approach or a rate/rhythm approach? Method: EBP Intervention Comparison Review. Study Sources: ASHA journal, Google Scholar, PubMed, CINAHL Plus with Full Text, Web of Science, Ovid, and…

  15. Developing a corpus of clinical notes manually annotated for part-of-speech.

    PubMed

    Pakhomov, Serguei V; Coden, Anni; Chute, Christopher G

    2006-06-01

    This paper presents a project whose main goal is to construct a corpus of clinical text manually annotated for part-of-speech (POS) information. We describe and discuss the process of training three domain experts to perform linguistic annotation. Three domain experts were trained to perform manual annotation of a corpus of clinical notes. A part of this corpus was combined with the Penn Treebank corpus of general purpose English text and another part was set aside for testing. The corpora were then used for training and testing statistical part-of-speech taggers. We list some of the challenges as well as encouraging results pertaining to inter-rater agreement and consistency of annotation. We used the Trigrams'n'Tags (TnT) [T. Brants, TnT-a statistical part-of-speech tagger, In: Proceedings of NAACL/ANLP-2000 Symposium, 2000] tagger trained on general English data to achieve 89.79% correctness. The same tagger trained on a portion of the medical data annotated for this project improved the performance to 94.69%. Furthermore, we find that discriminating between different types of discourse represented by different sections of clinical text may be very beneficial to improve correctness of POS tagging. Our preliminary experimental results indicate the necessity for adapting state-of-the-art POS taggers to the sublanguage domain of clinical text.
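    TnT has a reference implementation in NLTK, so the train-and-test loop described above is easy to sketch. The snippet below uses NLTK's bundled Penn Treebank sample as a stand-in for the clinical corpus (which is not publicly distributed); the 3000-sentence split is an arbitrary assumption:

```python
# Sketch: training and scoring NLTK's TnT part-of-speech tagger. The bundled
# treebank sample stands in for the clinical-note corpus, which is not
# publicly available.
import nltk
from nltk.tag import tnt

nltk.download("treebank", quiet=True)
sents = list(nltk.corpus.treebank.tagged_sents())
train, test = sents[:3000], sents[3000:]

tagger = tnt.TnT()
tagger.train(train)

# Per-token accuracy against the held-out gold tags
correct = total = 0
for gold in test:
    predicted = tagger.tag([word for word, _ in gold])
    correct += sum(p_tag == g_tag
                   for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    total += len(gold)
print(f"tagging accuracy: {correct / total:.2%}")
```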

  16. Surgical evaluation of candidates for cochlear implants

    NASA Technical Reports Server (NTRS)

    Black, F. O.; Lilly, D. J.; Fowler, L. P.; Stypulkowski, P. H.

    1987-01-01

    The customary presentation of surgical procedures to patients in the United States consists of discussions on alternative treatment methods, risks of the procedure(s) under consideration, and potential benefits for the patient. Because the contents of the normal speech signal have not been defined in a way that permits a surgeon systematically to provide alternative auditory signals to a deaf patient, the burden is placed on the surgeon to make an arbitrary selection of candidates and available devices for cochlear prosthetic implantation. In an attempt to obtain some information regarding the ability of a deaf patient to use electrical signals to detect and understand speech, the Good Samaritan Hospital and Neurological Sciences Institute cochlear implant team has routinely performed tympanotomies using local anesthesia and has positioned temporary electrodes onto the round windows of implant candidates. The purpose of this paper is to review our experience with this procedure and to provide some observations that may be useful in a comprehensive preoperative evaluation for totally deaf patients who are being considered for cochlear implantation.

  17. The use of non-speech oral-motor exercises among Indian speech-language pathologists to treat speech disorders: An online survey

    PubMed Central

    Thomas, Roha M.; Kaipa, Ramesh

    2015-01-01

    Objective Previous surveys in the United States of America (USA), the United Kingdom (UK), and Canada have indicated that most of the speech-language pathologists (SLPs) tend to use non-speech oral-motor exercises (NSOMEs) on a regular basis to treat speech disorders. At present, there is considerable debate regarding the clinical effectiveness of NSOMEs. The current study aimed to investigate the pattern and extent of usage of NSOMEs among Indian SLPs. Method An online survey intended to elicit information regarding the use of NSOMEs was sent to 505 members of the Indian Speech and Hearing Association. The questionnaire consisted of three sections. The first section solicited demographic information, the second and third sections solicited information from participants who did and did not prefer to use NSOMEs, respectively. Descriptive statistics were employed to analyse the responses that were clinically relevant. Results A total of 127 participants responded to the survey. Ninety-one percent of the participants who responded to the survey indicated that they used NSOMEs. Conclusion The results suggested that the percentage of SLPs preferring to use NSOMEs is similar to the findings of surveys conducted in the USA, the UK, and Canada. The Indian SLPs continue to use NSOMEs based on a multitude of beliefs. It is important for SLPs to incorporate the principles of evidence-based practice while using NSOMEs to provide high quality clinical care. PMID:26304211

  18. Start-Up Rhetoric in Eight Speeches of Barack Obama

    ERIC Educational Resources Information Center

    O'Connell, Daniel C.; Kowal, Sabine; Sabin, Edward J.; Lamia, John F.; Dannevik, Margaret

    2010-01-01

    Our purpose in the following was to investigate the start-up rhetoric employed by U.S. President Barack Obama in his speeches. The initial 5 min from eight of his speeches from May to September of 2009 were selected for their variety of setting, audience, theme, and purpose. It was generally hypothesized that Barack Obama, widely recognized for…

  19. Automatic measurement and representation of prosodic features

    NASA Astrophysics Data System (ADS)

    Ying, Goangshiuan Shawn

    Effective measurement and representation of prosodic features of the acoustic signal for use in automatic speech recognition and understanding systems is the goal of this work. Prosodic features (stress, duration, and intonation) are variations of the acoustic signal whose domains are beyond the boundaries of each individual phonetic segment. Listeners perceive prosodic features through a complex combination of acoustic correlates such as intensity, duration, and fundamental frequency (F0). We have developed new tools to measure F0 and intensity features. We apply a probabilistic global error correction routine to an Average Magnitude Difference Function (AMDF) pitch detector. A new short-term frequency-domain Teager energy algorithm is used to measure the energy of a speech signal. We have conducted a series of experiments performing lexical stress detection on words in continuous English speech from two speech corpora. We have experimented with two different approaches, a segment-based approach and a rhythm unit-based approach, to lexical stress detection. The first approach uses pattern recognition with energy- and duration-based measurements as features to build Bayesian classifiers to detect the stress level of a vowel segment. In the second approach we define a rhythm unit and use only the F0-based measurement and a scoring system to determine the stressed segment in the rhythm unit. A duration-based segmentation routine was developed to break polysyllabic words into rhythm units. The long-term goal of this work is to develop a system that can effectively detect the stress pattern for each word in continuous speech utterances. Stress information will be integrated as a constraint for pruning the word hypotheses in a word recognition system based on hidden Markov models.
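    Two of the measurement tools named above are compact enough to sketch: the AMDF picks the lag that minimizes the average magnitude difference between a frame and its shifted copy, and the Teager energy operator is a three-sample nonlinear energy estimate. The version below is a minimal time-domain illustration (the thesis applies a frequency-domain Teager variant and a probabilistic error-correction stage, both omitted here); frame parameters and search ranges are assumptions:

```python
# Illustrative time-domain sketches of two tools named above. Sampling rate
# and F0 search range are assumptions, not the thesis settings.
import numpy as np

def amdf_pitch(frame, fs, f0_min=60.0, f0_max=400.0):
    """F0 estimate: the lag minimizing the Average Magnitude Difference
    Function, D(tau) = mean |x[n] - x[n + tau]|, mapped back to Hz."""
    lags = np.arange(int(fs / f0_max), int(fs / f0_min) + 1)
    amdf = np.array([np.mean(np.abs(frame[:-lag] - frame[lag:]))
                     for lag in lags])
    return fs / lags[np.argmin(amdf)]

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].
    (The thesis builds a short-term frequency-domain variant on this.)"""
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]
```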

  20. The Timing and Effort of Lexical Access in Natural and Degraded Speech

    PubMed Central

    Wagner, Anita E.; Toffanin, Paolo; Başkent, Deniz

    2016-01-01

    Understanding speech is effortless in ideal situations, and although adverse conditions, such as those caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners’ quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli, particularly in the presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal situations speech is perceived as an undemanding task. Degradation of the signal or the receiver channel can quickly bring this well-adjusted timing out of balance and lead to an increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has consequences for lexical processing, as it adds uncertainty to the forming and revising of lexical hypotheses. PMID:27065901

  1. Supreme Court Update: The Free Speech Rights of Students in the United States Post "Morse v. Frederick"

    ERIC Educational Resources Information Center

    Russo, Charles J.

    2007-01-01

    Enshrined in the First Amendment as part of the Bill of Rights that was added to the then 4-year-old US Constitution in 1791, it should be no surprise that freedom of speech is perhaps the most cherished right of Americans. If anything, freedom of speech, which is properly treated as a fundamental human right for children, certainly stands out…

  2. Subband-Based Group Delay Segmentation of Spontaneous Speech into Syllable-Like Units

    NASA Astrophysics Data System (ADS)

    Nagarajan, T.; Murthy, H. A.

    2004-12-01

    In the development of a syllable-centric automatic speech recognition (ASR) system, segmentation of the acoustic signal into syllabic units is an important stage. Although the short-term energy (STE) function contains useful information about syllable segment boundaries, it has to be processed before segment boundaries can be extracted. This paper presents a subband-based group delay approach to segment spontaneous speech into syllable-like units. This technique exploits the additive property of the Fourier transform phase and the deconvolution property of the cepstrum to smooth the STE function of the speech signal and make it suitable for syllable boundary detection. By treating the STE function as a magnitude spectrum of an arbitrary signal, a minimum-phase group delay function is derived. This group delay function is found to be a better representative of the STE function for syllable boundary detection. Although the group delay function derived from the STE function of the speech signal contains segment boundaries, the boundaries are difficult to determine in the context of long silences, semivowels, and fricatives. In this paper, these issues are specifically addressed and algorithms are developed to improve the segmentation performance. The speech signal is first passed through a bank of three filters, corresponding to three different spectral bands. The STE functions of these signals are computed. Using these three STE functions, three minimum-phase group delay functions are derived. By combining the evidence derived from these group delay functions, the syllable boundaries are detected. Further, a multiresolution-based technique is presented to overcome the problem of shift in segment boundaries during smoothing. Experiments carried out on the Switchboard and OGI-MLTS corpora show that the error in segmentation is at most 25 milliseconds for 67% and 76.6% of the syllable segments, respectively.
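    The central trick of the paper, treating the smoothed STE function as though it were the magnitude spectrum of some signal and then reading segment boundaries off the group delay of the corresponding minimum-phase signal, can be sketched in a few lines. This is a simplified single-band version under assumed parameters; the paper's three-band filtering, evidence combination, and multiresolution smoothing are omitted:

```python
# Sketch: syllable-boundary evidence from the minimum-phase group delay of a
# short-term energy (STE) envelope. Single-band version; frame/hop sizes are
# assumptions, not the paper's settings.
import numpy as np

def short_term_energy(x, frame=400, hop=160):
    """STE contour of a speech signal (frame and hop in samples)."""
    n = 1 + max(0, len(x) - frame) // hop
    return np.array([np.sum(x[i * hop:i * hop + frame] ** 2) for i in range(n)])

def minimum_phase_group_delay(envelope):
    """Treat the envelope as a magnitude spectrum; return the group delay of
    its minimum-phase equivalent. Peaks/valleys mark candidate boundaries."""
    mag = np.maximum(np.concatenate([envelope, envelope[::-1]]), 1e-12)
    c = np.fft.ifft(np.log(mag)).real          # real cepstrum of log magnitude
    n = len(c)
    w = np.zeros(n)
    w[0] = 1.0
    w[1:n // 2] = 2.0                           # fold: double positive quefrencies
    w[n // 2] = 1.0
    x_mp = np.fft.ifft(np.exp(np.fft.fft(c * w))).real   # minimum-phase sequence
    # Group delay: tau(k) = Re{ DFT(n*x) / DFT(x) } evaluated componentwise
    X = np.fft.fft(x_mp)
    Y = np.fft.fft(np.arange(n) * x_mp)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
```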

  3. Visual Input Enhances Selective Speech Envelope Tracking in Auditory Cortex at a ‘Cocktail Party’

    PubMed Central

    Golumbic, Elana Zion; Cogan, Gregory B.; Schroeder, Charles E.; Poeppel, David

    2013-01-01

    Our ability to selectively attend to one auditory signal amidst competing input streams, epitomized by the ‘Cocktail Party’ problem, continues to stimulate research from various approaches. How this demanding perceptual feat is achieved from a neural systems perspective remains unclear and controversial. It is well established that neural responses to attended stimuli are enhanced compared to responses to ignored ones, but responses to ignored stimuli are nonetheless highly significant, leading to interference in performance. We investigated whether congruent visual input of an attended speaker enhances cortical selectivity in auditory cortex, leading to diminished representation of ignored stimuli. We recorded magnetoencephalographic (MEG) signals from human participants as they attended to segments of natural continuous speech. Using two complementary methods of quantifying the neural response to speech, we found that viewing a speaker’s face enhances the capacity of auditory cortex to track the temporal speech envelope of that speaker. This mechanism was most effective in a ‘Cocktail Party’ setting, promoting preferential tracking of the attended speaker, whereas without visual input no significant attentional modulation was observed. These neurophysiological results underscore the importance of visual input in resolving perceptual ambiguity in a noisy environment. Since visual cues in speech precede the associated auditory signals, they likely serve a predictive role in facilitating auditory processing of speech, perhaps by directing attentional resources to appropriate points in time when to-be-attended acoustic input is expected to arrive. PMID:23345218

  4. Selective Bibliography in United States History Resources.

    ERIC Educational Resources Information Center

    Ramos, June E.; Crevling, Barbara

    The selective bibliography identifies United States history materials for grades 7 through 12. It has been compiled to give teachers a representative sample of texts and supplementary materials, most of which have been published since 1975. Section one contains references to 16 basal curriculum materials. Information is given on title, author,…

  5. Strategies to combat auditory overload during vehicular command and control.

    PubMed

    Abel, Sharon M; Ho, Geoffrey; Nakashima, Ann; Smith, Ingrid

    2014-09-01

    Strategies to combat auditory overload were studied. Normal-hearing males were tested in a sound-isolated room in a mock-up of a military land vehicle. Two tasks were presented concurrently, in quiet and vehicle noise. For Task 1, dichotic phrases were delivered over a communications headset. Participants encoded only those beginning with a preassigned call sign (Baron or Charlie). For Task 2, they agreed or disagreed with simple equations presented either over loudspeakers, as text on the laptop monitor, in both the audio and the visual modalities, or not at all. Accuracy was significantly better by 20% on Task 2 when the equations were presented visually or audiovisually. Scores were at least 78% correct for dichotic phrases presented over the headset, with a right ear advantage of 7%, given the 5 dB speech-to-noise ratio. The left ear disadvantage was particularly apparent in noise, where the interaural difference was 12%. Relatively lower scores in the left ear, in noise, were observed for phrases beginning with Charlie. These findings underscore the benefit of delivering higher priority communications to the dominant ear, the importance of selecting speech sounds that are resilient to noise masking, and the advantage of using text in cases of degraded audio. Reprint & Copyright © 2014 Association of Military Surgeons of the U.S.

  6. The performance of an automatic acoustic-based program classifier compared to hearing aid users' manual selection of listening programs.

    PubMed

    Searchfield, Grant D; Linford, Tania; Kobayashi, Kei; Crowhen, David; Latzel, Matthias

    2018-03-01

    To compare preference for and performance of manually selected programmes to an automatic sound classifier, the Phonak AutoSense OS. A single-blind repeated-measures study. Participants were fitted with Phonak Virto V90 ITE aids; preferences for different listening programmes were compared across four different sound scenarios (speech in: quiet, noise, loud noise and a car). Following a 4-week trial, preferences were reassessed and each user's preferred programme was compared to the automatic classifier for sound quality and hearing in noise (HINT test) using a 12-loudspeaker array. Twenty-five participants with symmetrical moderate-severe sensorineural hearing loss. Participants' manual programme preferences for the scenarios varied considerably between and within sessions. A HINT Speech Reception Threshold (SRT) advantage was observed for the automatic classifier over participants' manual selections for speech in quiet, loud noise and car noise. Sound quality ratings were similar for both manual and automatic selections. The use of a sound classifier is a viable alternative to manual programme selection.

  7. Integrating speech in time depends on temporal expectancies and attention.

    PubMed

    Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

    2017-08-01

    Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses by a frequency-domain analysis on the stimulus presentation rate. Notably, the power at the stimulation frequency showed the same duration and attention effects as the omission responses. We interpret these findings against the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. Distractor Modality Can Turn Semantic Interference into Semantic Facilitation in the Picture-Word Interference Task: Implications for Theories of Lexical Access in Speech Production

    ERIC Educational Resources Information Center

    Hantsch, Ansgar; Jescheniak, Jorg D.; Schriefers, Herbert

    2009-01-01

    A number of recent studies have questioned the idea that lexical selection during speech production is a competitive process. One type of evidence against selection by competition is the observation that in the picture-word interference task semantically related distractors may facilitate the naming of a picture, whereas the selection by…

  9. Speech recognition technology: an outlook for human-to-machine interaction.

    PubMed

    Erdel, T; Crooks, S

    2000-01-01

    Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time, but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are the high rates of error and steep training curves. However, even in the face of such negative perceptions, there remain significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine major categories of speech-recognition software: continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.

  10. PubMed Central

    Getty, Louise; de Courval, Louise Poulin

    1981-01-01

    The Speech and Hearing Department of the University of Montréal, in conjunction with ‘l'Unité de médecine familiale de Verdun’ set up a pilot project grouping family doctors, audiologists and speech pathologists. Information was exchanged on speech and language problems in children, stuttering, voice disorders, aphasia and hearing problems in children and adults. We emphasized the importance of early detection of these problems, of adequate information to the patient and his family and referral to the speech pathologist or to the audiologist. The results of this experience showed the importance of close collaboration between family doctors and communication specialists. PMID:21289800

  11. Affective state and voice: cross-cultural assessment of speaking behavior and voice sound characteristics--a normative multicenter study of 577 + 36 healthy subjects.

    PubMed

    Braun, Silke; Botella, Cristina; Bridler, René; Chmetz, Florian; Delfino, Juan Pablo; Herzig, Daniela; Kluckner, Viktoria J; Mohr, Christine; Moragrega, Ines; Schrag, Yann; Seifritz, Erich; Soler, Carla; Stassen, Hans H

    2014-01-01

    Human speech is greatly influenced by the speakers' affective state, such as sadness, happiness, grief, guilt, fear, anger, aggression, faintheartedness, shame, sexual arousal, and love, amongst others. Attentive listeners discover a lot about the affective state of their dialog partners with no great effort, and without having to talk about it explicitly during a conversation or on the phone. On the other hand, speech dysfunctions, such as slow, delayed or monotonous speech, are prominent features of affective disorders. This project comprised four studies with healthy volunteers from Bristol (English: n = 117), Lausanne (French: n = 128), Zurich (German: n = 208), and Valencia (Spanish: n = 124). All samples were stratified according to gender, age, and education. The specific study design with different types of spoken text along with repeated assessments at 14-day intervals allowed us to estimate the 'natural' variation of speech parameters over time, and to analyze the sensitivity of speech parameters with respect to form and content of spoken text. Additionally, our project included a longitudinal self-assessment study with university students from Zurich (n = 18) and unemployed adults from Valencia (n = 18) in order to test the feasibility of the speech analysis method in home environments. The normative data showed that speaking behavior and voice sound characteristics can be quantified in a reproducible and language-independent way. The high resolution of the method was verified by a computerized assignment of speech parameter patterns to languages at a success rate of 90%, while the correct assignment to texts was 70%. In the longitudinal self-assessment study we calculated individual 'baselines' for each test person along with deviations thereof. The significance of such deviations was assessed through the normative reference data. Our data provided gender-, age-, and language-specific thresholds that allow one to reliably distinguish between 'natural fluctuations' and 'significant changes'. The longitudinal self-assessment study with repeated assessments at 1-day intervals over 14 days demonstrated the feasibility and efficiency of the speech analysis method in home environments, thus clearing the way to a broader range of applications in psychiatry. © 2014 S. Karger AG, Basel.

  12. Speech emotion recognition methods: A literature review

    NASA Astrophysics Data System (ADS)

    Basharirad, Babak; Moradhaseli, Mohammadreza

    2017-10-01

    Recently, attention to emotional speech signals has grown in research on human-machine interfaces, owing to the availability of high computational capability. Many systems have been proposed in the literature to identify emotional states through speech. Selecting suitable feature sets, designing proper classification methods, and preparing an appropriate dataset are the key issues in speech emotion recognition systems. This paper critically analyzes the currently available approaches to speech emotion recognition based on three evaluation parameters (feature set, classification of features, and accuracy). In addition, this paper evaluates the performance and limitations of available methods. Furthermore, it highlights promising directions for the improvement of speech emotion recognition systems.
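    As a deliberately minimal illustration of the pipeline the review evaluates (a feature set feeding a classifier), the sketch below pairs MFCC summary statistics with a support-vector machine. The library choices, parameters, and placeholder corpus names are assumptions, not details of any reviewed system:

```python
# Sketch of a minimal speech-emotion-recognition pipeline: MFCC summary
# statistics as the feature set, an SVM as the classifier. Corpus names
# below are placeholders; no reviewed system is reproduced here.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def mfcc_stats(path, sr=16000, n_mfcc=13):
    """Mean and std of MFCCs over an utterance -> one fixed-length vector."""
    y, _ = librosa.load(path, sr=sr)
    m = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return np.concatenate([m.mean(axis=1), m.std(axis=1)])

# wav_paths and emotion_labels stand in for a labeled emotional-speech corpus:
# X = np.vstack([mfcc_stats(p) for p in wav_paths])
# scores = cross_val_score(SVC(kernel="rbf"), X, emotion_labels, cv=5)
# print(scores.mean())
```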

  13. Questions You May Want to Ask Your Child's Speech-Language Pathologist = Preguntas que usted le podria hacer al patologo del habla y el lenguaje de su hijo

    ERIC Educational Resources Information Center

    Centers for Disease Control and Prevention, 2007

    2007-01-01

    This accordion style pamphlet, dual sided with English and Spanish text, suggests questions for parents to ask their Speech-Language Pathologist and speech and language therapy services for their children. Sample questions include: How will I participate in my child's therapy sessions? How do you decide how much time my child will spend on speech…

  14. Pakistan-U.S. Relations

    DTIC Science & Technology

    2008-11-10

    New York Times, August 15, 2008. 157 “Going, Going, Gone!,” Daily Times, (Lahore), August 19, 2008. Speech text at http://www.cfr.org/publication...on the fight against Taliban and other religious extremists.163 In his inaugural speech, Zardari called for an all-parties committee to “revisit...efforts. In a landmark January 2002 speech, former President Musharraf vowed to end Pakistan’s use as a base for terrorism of any kind, and he banned

  15. Combinatorial Markov Random Fields and Their Applications to Information Organization

    DTIC Science & Technology

    2008-02-01

    titles, part-of-speech tags; • Image processing: images, colors, texture, blobs, interest points, caption words; • Video processing: video signal, audio...McGurk and MacDonald published their pioneering work [80] that revealed the multi-modal nature of speech perception: sound and moving lips compose one...Speech (POS) n-grams (that correspond to the syntactic structure of text). POS n-grams are extracted from sentences in an incremental manner: the first n

  16. Temporal Context in Speech Processing and Attentional Stream Selection: A Behavioral and Neural Perspective

    ERIC Educational Resources Information Center

    Golumbic, Elana M. Zion; Poeppel, David; Schroeder, Charles E.

    2012-01-01

    The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the "Cocktail Party" effect. Yet, the neural mechanisms underlying on-line…

  17. Binaural segregation in multisource reverberant environments.

    PubMed

    Roman, Nicoleta; Srinivasan, Soundararajan; Wang, DeLiang

    2006-12-01

    In a natural environment, speech signals are degraded by both reverberation and concurrent noise sources. While human listening is robust under these conditions using only two ears, current two-microphone algorithms perform poorly. The psychological process of figure-ground segregation suggests that the target signal is perceived as a foreground while the remaining stimuli are perceived as a background. Accordingly, the goal is to estimate an ideal time-frequency (T-F) binary mask, which selects the target if it is stronger than the interference in a local T-F unit. In this paper, a binaural segregation system is proposed that extracts the reverberant target signal from multisource reverberant mixtures by utilizing only the location information of the target source. The proposed system combines target cancellation through adaptive filtering and a binary decision rule to estimate the ideal T-F binary mask. The main observation in this work is that the target attenuation in a T-F unit resulting from adaptive filtering is correlated with the relative strength of target to mixture. A comprehensive evaluation shows that the proposed system results in large SNR gains. In addition, comparisons using SNR as well as automatic speech recognition measures show that this system outperforms standard two-microphone beamforming approaches and a recent binaural processor.
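    The ideal T-F binary mask itself has a one-line definition: keep a unit when the local target-to-interference ratio exceeds a criterion. A minimal sketch follows, assuming separate access to the target and interference signals (which is exactly what makes the mask "ideal" rather than estimated, as the proposed system must do):

```python
# Sketch: ideal time-frequency binary mask (0 dB local criterion by default).
# Assumes the target and interference signals are separately available and
# have equal length; STFT parameters are illustrative.
import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask(target, interference, fs, lc_db=0.0, nperseg=512):
    _, _, T = stft(target, fs, nperseg=nperseg)
    _, _, I = stft(interference, fs, nperseg=nperseg)
    local_snr = 10 * np.log10((np.abs(T) ** 2 + 1e-12) /
                              (np.abs(I) ** 2 + 1e-12))
    return (local_snr > lc_db).astype(float)   # 1 keeps a T-F unit, 0 discards

def apply_mask(mixture, mask, fs, nperseg=512):
    """Mask the mixture's STFT and resynthesize a time-domain estimate."""
    _, _, M = stft(mixture, fs, nperseg=nperseg)
    _, estimate = istft(M * mask, fs, nperseg=nperseg)
    return estimate
```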

  18. What properties of talk are associated with the generation of spontaneous iconic hand gestures?

    PubMed

    Beattie, Geoffrey; Shovelton, Heather

    2002-09-01

    When people talk, they frequently make movements of their arms and hands, some of which appear connected with the content of the speech and are termed iconic gestures. Critical to our understanding of the relationship between speech and iconic gesture is an analysis of what properties of talk might give rise to these gestures. This paper focuses on two such properties, namely the familiarity and the imageability of the core propositional units that the gestures accompany. The study revealed that imageability had a significant effect overall on the probability of the core propositional unit being accompanied by a gesture, but that familiarity did not. Familiarity did, however, have a significant effect on the probability of a gesture in the case of high imageability units and in the case of units associated with frequent gesture use. Those iconic gestures accompanying core propositional units variously defined by the properties of imageability and familiarity were found to differ in their level of idiosyncrasy, the viewpoint from which they were generated and their overall communicative effect. This research thus uncovered a number of quite distinct relationships between gestures and speech in everyday talk, with important implications for future theories in this area.

  19. Individual differences in selective attention predict speech identification at a cocktail party.

    PubMed

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-08-31

    Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.

  20. The effects of meaningful irrelevant speech and road traffic noise on teachers' attention, episodic and semantic memory.

    PubMed

    Enmarker, Ingela

    2004-11-01

    The aim of the present experiment was to examine the effects of meaningful irrelevant speech and road traffic noise on attention, episodic and semantic memory, and also to examine whether the noise effects were age-dependent. A total of 96 male and female teachers in the age range of 35-45 and 55-65 years were randomly assigned to the silent condition or one of the two noise conditions. Noise effects found in episodic memory were limited to a meaningful text, where cued recall, contrary to expectations, was equally impaired by the two types of noise. However, meaningful irrelevant speech also impaired recognition of the text, whereas road traffic noise caused no decrement. Retrieval from two word fluency tests in semantic memory showed strong effects of noise exposure, one affected by meaningful irrelevant speech and the other by road traffic noise. The results implied that both acoustic variation and semantic interference could be of importance for noise impairments. The expected age-dependent noise effects did not emerge.

  1. Presentation video retrieval using automatically recovered slide and spoken text

    NASA Astrophysics Data System (ADS)

    Cooper, Matthew

    2013-03-01

    Video is becoming a prevalent medium for e-learning. Lecture videos contain text information in both the presentation slides and lecturer's speech. This paper examines the relative utility of automatically recovered text from these sources for lecture video retrieval. To extract the visual information, we automatically detect slides within the videos and apply optical character recognition to obtain their text. Automatic speech recognition is used similarly to extract spoken text from the recorded audio. We perform controlled experiments with manually created ground truth for both the slide and spoken text from more than 60 hours of lecture video. We compare the automatically extracted slide and spoken text in terms of accuracy relative to ground truth, overlap with one another, and utility for video retrieval. Results reveal that automatically recovered slide text and spoken text contain different content with varying error profiles. Experiments demonstrate that automatically extracted slide text enables higher precision video retrieval than automatically recovered spoken text.
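    The retrieval comparison reduces to standard ranked-retrieval metrics computed twice, once per index (slide text versus spoken text). A sketch of precision at k with hypothetical inputs follows; the paper's queries and relevance judgments are not public:

```python
# Sketch: precision@k for comparing retrieval from two text indexes.
# All identifiers below are hypothetical, for illustration only.
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved videos that are relevant."""
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / k

slide_index_results  = ["v3", "v1", "v9", "v4"]   # ranking from slide text
speech_index_results = ["v7", "v3", "v2", "v1"]   # ranking from spoken text
relevant = {"v1", "v3"}
for name, ranking in [("slides", slide_index_results),
                      ("speech", speech_index_results)]:
    print(name, precision_at_k(ranking, relevant, k=3))
```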

  2. Multicultural Perspectives in Communication Disorders.

    ERIC Educational Resources Information Center

    Screen, Robert Martin; Anderson, Noma Bennett

    This text provides information about the impact of multiculturalism on services for individuals with communication disorders. It examines the involvement of the American Speech-Language-Hearing Association (ASHA) and multicultural speech, language, and hearing organizations as they respond to the challenges created by multiculturalism. An…

  3. An Analysis of The Parameters Used In Speech ABR Assessment Protocols.

    PubMed

    Sanfins, Milaine D; Hatzopoulos, Stavros; Donadon, Caroline; Diniz, Thais A; Borges, Leticia R; Skarzynski, Piotr H; Colella-Santos, Maria Francisca

    2018-04-01

    The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.

  4. Processing of speech signals for physical and sensory disabilities.

    PubMed Central

    Levitt, H

    1995-01-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816

  5. Processing of Speech Signals for Physical and Sensory Disabilities

    NASA Astrophysics Data System (ADS)

    Levitt, Harry

    1995-10-01

    Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities.

  6. Genetics Home Reference: CHMP2B-related frontotemporal dementia

    MedlinePlus

    ...CHMP2B-related frontotemporal dementia develop progressive problems with speech and language (aphasia). They may have trouble speaking, although they can often understand others' speech and written text. Affected individuals may also have ...

  7. Airborne Laser Laboratory departure from Kirtland Air Force Base and a brief history of aero-optics

    NASA Astrophysics Data System (ADS)

    Kyrazis, Demos T.

    2013-07-01

    We discuss aspects of the development of the Airborne Laser Laboratory. Our discussion is historical in nature and consists of the text from a speech given on the occasion of the Airborne Laser Laboratory leaving Kirtland Air Force Base (AFB) to fly to Wright-Patterson AFB to become an exhibit at the National Museum of the United States Air Force. The last part of the discussion concerns the inception of the study of aero-optics as an area of research and some of the milestones in the understanding of the causes and prediction of aero-optical effects.

  8. Factors Influencing the Selection of Speech Pathology as a Career: A Qualitative Analysis Utilising the Systems Theory Framework

    ERIC Educational Resources Information Center

    Byrne, Nicole

    2007-01-01

    Factors identified by 16 participants during in-depth interviews as influencing selection of speech pathology as a career were described using the Systems Theory Framework (STF, Patton & McMahon, 2006). Participants were highly likely to identify factors from the individual and social systems, but not the environmental-societal system, of the STF…

  9. Speech to Text Translation for Malay Language

    NASA Astrophysics Data System (ADS)

    Al-khulaidi, Rami Ali; Akmeliawati, Rini

    2017-11-01

    A speech recognition system is a front-end and back-end process that receives an audio signal uttered by a speaker and converts it into a text transcription. Speech systems can be used in several fields, including therapeutic technology, education, social robotics, and computer entertainment. Our system is proposed for control tasks, where speed of performance and response matters because the system should integrate with other control platforms, such as voice-controlled robots. This creates a need for flexible platforms that can be easily edited to fit the functionality of their surroundings, unlike software programs that require recording audio and multiple training passes for every entry, such as MATLAB and Phoenix. In this paper, a speech recognition system for the Malay language is implemented using Microsoft Visual Studio C#. Ninety (90) Malay phrases were tested by ten (10) speakers of both genders in different contexts. The results show that the overall accuracy (calculated from the confusion matrix) is a satisfactory 92.69%.
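    The reported overall accuracy is the usual confusion-matrix summary: correct recognitions on the diagonal divided by all trials. A small sketch with an invented 3-class matrix (the study's 90-phrase matrix is not reproduced here):

```python
# Sketch: overall accuracy from a confusion matrix = trace / total.
# The 3x3 matrix below is invented for illustration; it is not the study's data.
import numpy as np

cm = np.array([[48,  1,  1],    # rows: true phrase class
               [ 2, 46,  2],    # columns: recognized phrase class
               [ 1,  3, 46]])
accuracy = np.trace(cm) / cm.sum()
print(f"overall accuracy: {accuracy:.2%}")   # 140 / 150 = 93.33% here
```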

  10. Index to NASA news releases and speeches 1994

    NASA Technical Reports Server (NTRS)

    1995-01-01

    This index contains a listing of news releases distributed by the Office of Public Affairs, NASA Headquarters, and a selected listing of speeches presented by members of the Headquarters staff during 1994.

  11. Neuronal populations in the occipital cortex of the blind synchronize to the temporal dynamics of speech

    PubMed Central

    Van Ackeren, Markus Johannes; Barbero, Francesca M; Mattioni, Stefania; Bottini, Roberto

    2018-01-01

    The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives. PMID:29338838

  12. Live From the Front: Operational Ramifications of Military Web Logs in Combat Zones

    DTIC Science & Technology

    2007-05-10

    may view milbloggers’ First Amendment right to freedom of speech, similar First Amendment cases must be examined. In United States v...redress of grievances against certain military regulations. The Court found in favor of the military because a service member’s freedom of speech “yields...11 The U.S. Supreme Court has given the military wide latitude to restrict service members’ freedom of speech in matters pertaining to national

  13. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    REAL-TIME SPEECH/MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan Institute of Acoustics...time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in the frequency domain are selected...handle signals without discrimination and cannot work properly in the presence of multimedia signals. This paper proposes a real-time speech/music

  14. High-frequency neural activity predicts word parsing in ambiguous speech streams.

    PubMed

    Kösem, Anne; Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie

    2016-12-01

    During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. Copyright © 2016 the American Physiological Society.

  15. High-frequency neural activity predicts word parsing in ambiguous speech streams

    PubMed Central

    Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie

    2016-01-01

    During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. PMID:27605528

  16. A Comparison of Coverage of Speech and Press Verdicts of Supreme Court.

    ERIC Educational Resources Information Center

    Hale, F. Dennis

    1979-01-01

    An analysis of the coverage by ten newspapers of 20 United States Supreme Court decisions concerning freedom of the press and 20 decisions concerning freedom of speech revealed that the newspapers gave significantly greater coverage to the press decisions. (GT)

  17. Segmental intelligibility of synthetic speech produced by rule.

    PubMed

    Logan, J S; Greene, B G; Pisoni, D B

    1989-08-01

    This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk--Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener's processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener.

  18. Segmental intelligibility of synthetic speech produced by rule

    PubMed Central

    Logan, John S.; Greene, Beth G.; Pisoni, David B.

    2012-01-01

    This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk—Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener’s processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener. PMID:2527884

  19. Cognitive Load in Voice Therapy Carry-Over Exercises.

    PubMed

    Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

    2017-01-01

    The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Twelve subjects produced speech in 3 conditions: rote speech (weekdays), sentences in a set form, and semispontaneous speech. Subjects simultaneously performed a secondary visual discrimination task for which response times were measured. On completion of each speech task, subjects rated their experience on a questionnaire. Response times from the secondary, visual task were found to be shortest for the rote speech, longer for the semispontaneous speech, and longest for the sentences within the set framework. Principal components derived from the subjective ratings were found to be linked to response times on the secondary visual task. Acoustic measures reflecting fundamental frequency distribution and vocal fold compression varied across the speech tasks. The results indicate that consideration should be given to the selection of speech tasks during the process leading to automation of revised speech behavior and that self-reports may be a reliable index of cognitive load.

  20. Developmental changes in sensitivity to vocal paralanguage

    PubMed Central

    Friend, Margaret

    2017-01-01

    Developmental changes in children’s sensitivity to the role of acoustic variation in the speech stream in conveying speaker affect (vocal paralanguage) were examined. Four-, 7- and 10-year-olds heard utterances in three formats: low-pass filtered, reiterant, and normal speech. The availability of lexical and paralinguistic information varied across these three formats in a way that required children to base their judgments of speaker affect on different configurations of cues in each format. Across ages, the best performance was obtained when a rich array of acoustic cues was present and when there was no competing lexical information. Four-year-olds performed at chance when judgments had to be based solely on speech prosody in the filtered format and they were unable to selectively attend to paralanguage when discrepant lexical cues were present in normal speech. Seven-year-olds were significantly more sensitive to the paralinguistic role of speech prosody in filtered speech than were 4-year-olds and there was a trend toward greater attention to paralanguage when lexical and paralinguistic cues were inconsistent in normal speech. An integration of the ability to utilize prosodic cues to speaker affect with attention to paralanguage in cases of lexical/paralinguistic discrepancy was observed for 10-year-olds. The results are discussed in terms of the development of a perceptual bias emerging out of selective attention to language. PMID:28713218

  1. The Effects of Phonological Short-Term Memory and Speech Perception on Spoken Sentence Comprehension in Children: Simulating Deficits in an Experimental Design.

    PubMed

    Higgins, Meaghan C; Penney, Sarah B; Robertson, Erin K

    2017-10-01

The roles of phonological short-term memory (pSTM) and speech perception in spoken sentence comprehension were examined in an experimental design. Deficits in pSTM and speech perception were simulated through task demands while typically-developing children (N = 71) completed a sentence-picture matching task. Children performed the control, simulated pSTM deficit, simulated speech perception deficit, or simulated double deficit condition. On long sentences, the double deficit group had lower scores than the control and speech perception deficit groups, and the pSTM deficit group had lower scores than the control group and marginally lower scores than the speech perception deficit group. The pSTM and speech perception groups performed similarly to groups with real deficits in these areas, who completed the control condition. Overall, scores were lowest on noncanonical long sentences. Results show pSTM has a greater effect than speech perception on sentence comprehension, at least in the tasks employed here.

  2. Automatic Speech Recognition from Neural Signals: A Focused Review.

    PubMed

    Herff, Christian; Schultz, Tanja

    2016-01-01

Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of disturbing bystanders, or an inability to produce speech (e.g., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but simply to imagine saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution, but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data, with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.

  3. Phonetic basis of phonemic paraphasias in aphasia: Evidence for cascading activation.

    PubMed

    Kurowski, Kathleen; Blumstein, Sheila E

    2016-02-01

    Phonemic paraphasias are a common presenting symptom in aphasia and are thought to reflect a deficit in which selecting an incorrect phonemic segment results in the clear-cut substitution of one phonemic segment for another. The current study re-examines the basis of these paraphasias. Seven left hemisphere-damaged aphasics with a range of left hemisphere lesions and clinical diagnoses including Broca's, Conduction, and Wernicke's aphasia, were asked to produce syllable-initial voiced and voiceless fricative consonants, [z] and [s], in CV syllables followed by one of five vowels [i e a o u] in isolation and in a carrier phrase. Acoustic analyses were conducted focusing on two acoustic parameters signaling voicing in fricative consonants: duration and amplitude properties of the fricative noise. Results show that for all participants, regardless of clinical diagnosis or lesion site, phonemic paraphasias leave an acoustic trace of the original target in the error production. These findings challenge the view that phonemic paraphasias arise from a mis-selection of phonemic units followed by its correct implementation, as traditionally proposed. Rather, they appear to derive from a common mechanism with speech errors reflecting the co-activation of a target and competitor resulting in speech output that has some phonetic properties of both segments. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. The Listening Ear: The Development of Speech as a Creative Influence in Education (Learning Resource Series).

    ERIC Educational Resources Information Center

    McAllen, Audrey E.

    This book gives teachers an understanding of speech training through specially selected exercises. The book's exercises aim to help develop clear speaking in the classroom. Methodically and perceptively used, the book will assist those concerned with the creative powers of speech as a teaching art. In Part 1, there are sections on the links…

  5. Open source OCR framework using mobile devices

    NASA Astrophysics Data System (ADS)

    Zhou, Steven Zhiying; Gilani, Syed Omer; Winkler, Stefan

    2008-02-01

Mobile phones have evolved from passive one-to-one communication devices to powerful handheld computing devices. Today most new mobile phones are capable of capturing images, recording video, browsing the internet, and much more. Exciting new social applications are emerging on the mobile landscape, such as business card readers, sign detectors, and translators. These applications help people quickly gather information in digital format and interpret it without the need to carry laptops or tablet PCs. However, with all these advancements, we find very little open source software available for mobile phones. For instance, there are currently many open source OCR engines for the desktop platform but, to our knowledge, none available on the mobile platform. Keeping this in perspective, we propose a complete text detection and recognition system with speech synthesis ability, using existing desktop technology. In this work we developed a complete OCR framework with subsystems from the open source desktop community. This includes the popular open source OCR engine Tesseract for text detection and recognition, and the Flite speech synthesis module for adding text-to-speech ability.
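
    As a rough illustration of the pipeline described above, the sketch below chains optical character recognition into speech synthesis on a desktop; it assumes the pytesseract wrapper around the Tesseract engine and the flite command-line tool are installed, and it is a simplification rather than the authors' mobile port.

        import subprocess
        from PIL import Image
        import pytesseract  # assumes Tesseract and its Python wrapper are installed

        def image_to_speech(image_path, wav_path):
            # Step 1: text detection and recognition with Tesseract.
            text = pytesseract.image_to_string(Image.open(image_path))
            # Step 2: text-to-speech with the Flite synthesizer (writes a WAV file).
            subprocess.run(["flite", "-t", text, "-o", wav_path], check=True)
            return text

        print(image_to_speech("business_card.png", "card.wav"))  # paths are placeholders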

  6. Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

    NASA Astrophysics Data System (ADS)

    Maskeliunas, Rytis; Rudzionis, Vytautas

    2011-06-01

In recent years various commercial speech recognizers have become available. These recognizers make it possible to develop applications incorporating various speech recognition techniques easily and quickly. All of these commercial recognizers are typically targeted at widely spoken languages with large market potential; however, it may be possible to adapt available commercial recognizers for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems, the single avenue for adaptation is to find appropriate methods for selecting phonetic transcriptions that map between the two languages. This paper deals with methods to find phonetic transcriptions for Lithuanian voice commands to be recognized using English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that enable the recognition of Lithuanian voice commands with an accuracy of over 90%.
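
    The adaptation described above amounts to a search: for each Lithuanian command, try candidate English-phoneme transcriptions and keep whichever one the closed engine recognizes most reliably. The Python sketch below is schematic; recognize() stands in for a commercial engine, and the command and phoneme strings are invented for illustration.

        def select_transcription(command, candidates, test_utterances, recognize):
            """Return the candidate transcription with the best recognition rate."""
            def accuracy(transcription):
                hits = sum(recognize(utt, {command: transcription}) == command
                           for utt in test_utterances)
                return hits / len(test_utterances)
            return max(candidates, key=accuracy)

        # Toy demo: a fake recognizer that only accepts one of the spellings.
        recognize = lambda utt, lexicon: ("pradeti" if "p r ah d" in lexicon["pradeti"]
                                          else "other")
        best = select_transcription("pradeti",
                                    ["p r ah d eh t ih", "p r aa d ey t iy"],
                                    ["utt1", "utt2"], recognize)
        print(best)  # -> p r ah d eh t ih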

  7. Syntactic error modeling and scoring normalization in speech recognition

    NASA Technical Reports Server (NTRS)

    Olorenshaw, Lex

    1991-01-01

    The objective was to develop the speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Research was performed in the following areas: (1) syntactic error modeling; (2) score normalization; and (3) phoneme error modeling. The study into the types of errors that a reader makes will provide the basis for creating tests which will approximate the use of the system in the real world. NASA-Johnson will develop this technology into a 'Literacy Tutor' in order to bring innovative concepts to the task of teaching adults to read.

  8. Speech and pause characteristics in multiple sclerosis: A preliminary study of speakers with high and low neuropsychological test performance

    PubMed Central

    FEENAUGHTY, LYNDA; TJADEN, KRIS; BENEDICT, RALPH H.B.; WEINSTOCK-GUTTMAN, BIANCA

    2017-01-01

    This preliminary study investigated how cognitive-linguistic status in multiple sclerosis (MS) is reflected in two speech tasks (i.e. oral reading, narrative) that differ in cognitive-linguistic demand. Twenty individuals with MS were selected to comprise High and Low performance groups based on clinical tests of executive function and information processing speed and efficiency. Ten healthy controls were included for comparison. Speech samples were audio-recorded and measures of global speech timing were obtained. Results indicated predicted differences in global speech timing (i.e. speech rate and pause characteristics) for speech tasks differing in cognitive-linguistic demand, but the magnitude of these task-related differences was similar for all speaker groups. Findings suggest that assumptions concerning the cognitive-linguistic demands of reading aloud as compared to spontaneous speech may need to be re-considered for individuals with cognitive impairment. Qualitative trends suggest that additional studies investigating the association between cognitive-linguistic and speech motor variables in MS are warranted. PMID:23294227

  9. Highlight summarization in golf videos using audio signals

    NASA Astrophysics Data System (ADS)

    Kim, Hyoung-Gook; Kim, Jin Young

    2008-01-01

In this paper, we present an automatic summarization of highlights in golf videos based on audio information alone, without video information. The proposed highlight summarization system is based on semantic audio segmentation and detection of action units from audio signals. Studio speech, field speech, music, and applause are segmented by means of sound classification. Swings are detected by impulse onset detection. Sounds like a swing followed by applause form a complete action unit, while studio speech and music parts are used to anchor the program structure. With the advantage of highly precise detection of applause, highlights are extracted effectively. Our experimental results show high classification precision on 18 golf games, demonstrating that the proposed system is effective and computationally efficient enough to be applied in embedded consumer electronic devices.
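
    The action-unit logic described above reduces to a pairing rule: a detected swing onset followed closely by an applause segment constitutes a highlight. The sketch below shows that rule in Python under stated assumptions; the 10-second gap and the upstream swing/applause detectors are illustrative, not parameters from the paper.

        def extract_highlights(swing_times, applause_segments, max_gap=10.0):
            """Pair each swing onset with applause starting within max_gap seconds.

            swing_times: sorted swing onset times in seconds (impulse onset detection)
            applause_segments: sorted (start, end) times in seconds (sound classification)
            Returns (swing_time, applause_start, applause_end) action units.
            """
            highlights = []
            for swing in swing_times:
                for start, end in applause_segments:
                    if 0 <= start - swing <= max_gap:
                        highlights.append((swing, start, end))
                        break  # at most one applause segment per swing
            return highlights

        print(extract_highlights([12.0, 85.5], [(15.2, 19.0), (300.0, 304.0)]))
        # -> [(12.0, 15.2, 19.0)]  (the second swing drew no applause)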

  10. Referred speech-language and hearing complaints in the western region of São Paulo, Brazil

    PubMed Central

    Samelli, Alessandra Giannella; Rondon, Silmara; Oliver, Fátima Correa; Junqueira, Simone Rennó; Molini-Avejonas, Daniela Regina

    2014-01-01

    OBJECTIVE: The aim of this study was to characterize the epidemiological profile of the population attending primary health care units in the western region of the city of São Paulo, Brazil, highlighting referred speech-language and hearing complaints. METHOD: This investigation was a cross-sectional observational study conducted in primary health care units. Household surveys were conducted and information was obtained from approximately 2602 individuals, including (but not limited to) data related to education, family income, health issues, access to public services and access to health services. The speech-language and hearing complaints were identified from specific questions. RESULTS: Our results revealed that the populations participating in the survey were heterogeneous in terms of their demographic and economic characteristics. The prevalence of referred speech-language and hearing complaints in this population was 10%, and only half the users of the public health system in the studied region who had complaints were monitored or received specific treatment. CONCLUSIONS: The results demonstrate the importance of using population surveys to identify speech-language and hearing complaints at the level of primary health care. Moreover, these findings highlight the need to reorganize the speech-language pathology and audiology service in the western region of São Paulo, as well as the need to improve the Family Health Strategy in areas that do not have a complete coverage, in order to expand and improve the territorial diagnostics and the speech-language pathology and audiology actions related to the prevention, identification, and rehabilitation of human communication disorders. PMID:24964306

  11. Playing with Gladys: A case study integrating drama therapy with behavioural interventions for the treatment of selective mutism.

    PubMed

    Oon, Phei Phei

    2010-04-01

This case study examines an integrative approach combining drama therapy and the behavioural skill "shaping", as offered to Gladys, a 5-year-old girl diagnosed with selective mutism. This study found that shaping, when implemented in the context of play, with play as the primary reinforcer, elicited from Gladys vocalization and eventually speech within a very short time. Her vocalizations allowed her to enter dramatic play, which in turn propelled spontaneous speech. This article looks at how the three elements of drama therapy - the playspace, role-playing and dramatic projection - brought about therapeutic changes for Gladys. Aside from spontaneous speech, Gladys also developed positive self-esteem and a heightened sense of spontaneity. Subsequently, these two qualities helped her generalize her speech to new settings on her own. Gladys's newly harnessed spontaneity further helped her become more sociable and resilient. This study advances the possibility of integrating a behavioural skill with drama therapy for the therapeutic benefit of a child with an anxiety-related condition like selective mutism.

  12. Detecting the Difficulty Level of Foreign Language Texts

    DTIC Science & Technology

    2010-02-01

continuous tenses), as well as part-of-speech labels for words. The authors used a k-Nearest Neighbor (kNN) classifier (Cover and Hart, 1967; Mitchell, 1997)... anticipate, and influence these situations and to operate in them is found in foreign language speech and text. For this reason, military linguists are... the language model system, LGR is the prediction of one of the grammar-based classifiers, and CkNN is a confidence value of the kNN prediction for the

  13. Using Computer Technology To Monitor Student Progress and Remediate Reading Problems.

    ERIC Educational Resources Information Center

    McCullough, C. Sue

    1995-01-01

    Focuses on research about application of text-to-speech systems in diagnosing and remediating word recognition, vocabulary knowledge, and comprehension disabilities. As school psychologists move toward a consultative model of service delivery, they need to know about technology such as speech synthesizers, digitizers, optical-character-recognition…

  14. Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

    PubMed

    Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

    2018-04-04

Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to healthy listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed an altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.

  15. The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

    PubMed

    Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

    2009-04-01

    The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained from the subtitles. Speech comprehension improved even for relatively low ASR accuracy levels; for example, participants obtained about 2 dB SNR audiovisual benefit for ASR accuracies around 74%. Delaying the presentation of the text reduced the benefit and increased the listening effort. Participants with relatively low unimodal speech comprehension obtained greater benefit from the subtitles than participants with better unimodal speech comprehension. We observed an age-related decline in the working-memory capacity of the listeners with normal hearing. A higher age and a lower working memory capacity were associated with increased effort required to use the subtitles to improve speech comprehension. Participants were able to use partly incorrect and delayed subtitles to increase their comprehension of speech in noise, regardless of age and hearing loss. This supports the further development and evaluation of an assistive listening system that displays automatically recognized speech to aid speech comprehension by listeners with hearing impairment.
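
    In the terms used above, the audiovisual benefit is the simple difference benefit = SRT_auditory - SRT_audiovisual, both in dB SNR. For example (with illustrative values, not data from the study), an auditory SRT of -3 dB and an audiovisual SRT of -5 dB would yield the roughly 2 dB benefit the authors report at ASR accuracies around 74%.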

  16. The Great Communicator Files

    ERIC Educational Resources Information Center

    Cohen, Mira

    2007-01-01

    American presidents are regularly called upon to share their thoughts, ideas, and sentiments both with the nation and the world. This prompts the questions: How are these speeches written? Who writes them? What other resources, texts, conversations, and experiences do presidents use to help them create these famous speeches? Who helps the…

  17. Perspectives. Fall 2008

    ERIC Educational Resources Information Center

    PEPNet, 2008

    2008-01-01

PEPNet's "Perspectives" is the collaborative newsletter of the four PEPNet regional centers. This newsletter combines each center's individual strengths into a single resource that can be used on a national level. This issue focuses on the following topics: (1) PEPNet FAQs on Web, in Print; (2) Some Speech-to-Text FAQs; (3) Speech-to-Text…

  18. Discrepant visual speech facilitates covert selective listening in "cocktail party" conditions.

    PubMed

    Williams, Jason A

    2012-06-01

    The presence of congruent visual speech information facilitates the identification of auditory speech, while the addition of incongruent visual speech information often impairs accuracy. This latter arrangement occurs naturally when one is being directly addressed in conversation but listens to a different speaker. Under these conditions, performance may diminish since: (a) one is bereft of the facilitative effects of the corresponding lip motion and (b) one becomes subject to visual distortion by incongruent visual speech; by contrast, speech intelligibility may be improved due to (c) bimodal localization of the central unattended stimulus. Participants were exposed to centrally presented visual and auditory speech while attending to a peripheral speech stream. In some trials, the lip movements of the central visual stimulus matched the unattended speech stream; in others, the lip movements matched the attended peripheral speech. Accuracy for the peripheral stimulus was nearly one standard deviation greater with incongruent visual information, compared to the congruent condition which provided bimodal pattern recognition cues. Likely, the bimodal localization of the central stimulus further differentiated the stimuli and thus facilitated intelligibility. Results are discussed with regard to similar findings in an investigation of the ventriloquist effect, and the relative strength of localization and speech cues in covert listening.

  19. Chronic 'speech catatonia' with constant logorrhea, verbigeration and echolalia successfully treated with lorazepam: a case report.

    PubMed

    Lee, Joseph W Y

    2004-12-01

    Logorrhea, verbigeration and echolalia persisted unremittingly for 3 years, with occasional short periods of motoric excitement, in a patient with mild intellectual handicap suffering from chronic schizophrenia. The speech catatonic symptoms, previously refractory to various antipsychotics, responded promptly to lorazepam, a benzodiazepine with documented efficacy in the treatment of acute catatonia but not chronic catatonia. It is suggested that pathways in speech production were selectively involved in the genesis of the chronic speech catatonic syndrome, possibly a rare form of chronic catatonia not previously described.

  20. Using on-line altered auditory feedback treating Parkinsonian speech

    NASA Astrophysics Data System (ADS)

    Wang, Emily; Verhagen, Leo; de Vries, Meinou H.

    2005-09-01

Patients with advanced Parkinson's disease tend to have dysarthric speech that is hesitant, accelerated, and repetitive, and that is often resistant to behavioral speech therapy. In this pilot study, the speech disturbances were treated using on-line altered auditory feedback (AF) provided by SpeechEasy (SE), an in-the-ear device registered with the FDA for use in humans to treat chronic stuttering. Eight PD patients participated in the study. All had moderate to severe speech disturbances. In addition, two patients had moderate recurring stuttering at the onset of PD after long remission since adolescence, two had bilateral STN DBS, and two bilateral pallidal DBS. An effective combination of delayed auditory feedback and frequency-altered feedback was selected for each subject and provided via SE worn in one ear. All subjects produced speech samples (structured monologue and reading) under three conditions: baseline, with SE in place but without altered feedback, and with altered feedback. The speech samples were randomly presented and rated for speech intelligibility using UPDRS-III item 18, along with speaking rate. The results indicated that SpeechEasy is well tolerated and AF can improve speech intelligibility in spontaneous speech. Further investigational use of this device for treating speech disorders in PD is warranted [Work partially supported by Janus Dev. Group, Inc.].

  1. A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation

    NASA Astrophysics Data System (ADS)

    Wu, Bo; Yang, Minglei; Li, Kehuang; Huang, Zhen; Siniscalchi, Sabato Marco; Wang, Tong; Lee, Chin-Hui

    2017-12-01

A reverberation-time-aware deep-neural-network (DNN)-based multi-channel speech dereverberation framework is proposed to handle a wide range of reverberation times (RT60s). There are three key steps in designing a robust system. First, to accomplish simultaneous speech dereverberation and beamforming, we propose a framework, namely DNNSpatial, that selectively concatenates log-power spectral (LPS) input features of reverberant speech from multiple microphones in an array and maps them onto the expected output LPS features of anechoic reference speech using a single deep neural network (DNN). Next, the temporal auto-correlation function of received signals at different RT60s is investigated to show that RT60-dependent temporal-spatial contexts in feature selection are needed in the DNNSpatial training stage in order to optimize system performance in diverse reverberant environments. Finally, the RT60 is estimated to select the proper temporal and spatial contexts before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. The experimental evidence gathered in this study indicates that the proposed framework outperforms the state-of-the-art signal-processing dereverberation algorithm, weighted prediction error (WPE), and conventional DNNSpatial systems that do not take reverberation time into account, even in extremely weak and severe reverberant conditions. The proposed technique generalizes well to unseen room sizes, array geometries and loudspeaker positions, and is robust to reverberation time estimation error.
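
    The third step described above, choosing the temporal-spatial context from the estimated RT60 before building the DNN input, can be sketched as a feature-assembly routine. The mapping from RT60 to context width below is invented for illustration and is not the paper's calibration.

        import numpy as np

        def assemble_dnn_input(lps, frame_idx, rt60):
            """Concatenate multi-channel LPS frames around frame_idx.

            lps: array of shape (channels, frames, freq_bins); rt60 in seconds.
            """
            contexts = {0.3: 2, 0.6: 5, 0.9: 7}  # illustrative RT60 -> half-width map
            width = next((c for t, c in sorted(contexts.items()) if rt60 <= t), 7)
            frames = range(frame_idx - width, frame_idx + width + 1)
            stacked = [lps[ch, max(0, min(f, lps.shape[1] - 1))]  # clamp at edges
                       for ch in range(lps.shape[0]) for f in frames]
            return np.concatenate(stacked)  # one input vector for the dereverberation DNN

        x = assemble_dnn_input(np.random.randn(4, 100, 257), frame_idx=50, rt60=0.45)
        print(x.shape)  # (4 channels * 11 frames * 257 bins,) = (11308,)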

  2. Capturing patient information at nursing shift changes: methodological evaluation of speech recognition and information extraction

    PubMed Central

    Suominen, Hanna; Johnson, Maree; Zhou, Liyuan; Sanchez, Paula; Sirel, Raul; Basilakis, Jim; Hanlen, Leif; Estival, Dominique; Dawson, Linda; Kelly, Barbara

    2015-01-01

Objective We study the use of speech recognition and information extraction to generate drafts of Australian nursing-handover documents. Methods Speech recognition correctness and clinicians' preferences were evaluated using 15 recorder–microphone combinations, six documents, three speakers, Dragon Medical 11, and five survey/interview participants. Information extraction correctness evaluation used 260 documents, six-class classification for each word, two annotators, and the CRF++ conditional random field toolkit. Results A noise-cancelling lapel-microphone with a digital voice recorder gave the best correctness (79%). This microphone was also the most preferred option by all but one participant. Although the participants liked the small size of this recorder, their preference was for tablets that can also be used for document proofing and sign-off, among other tasks. Accented speech was harder to recognize than native language and a male speaker was detected better than a female speaker. Information extraction was excellent in filtering out irrelevant text (85% F1) and identifying text relevant to two classes (87% and 70% F1). Similarly to the annotators' disagreements, there was confusion between the remaining three classes, which explains the modest 62% macro-averaged F1. Discussion We present evidence for the feasibility of speech recognition and information extraction to support clinicians in entering text and to unlock its content for computerized decision-making and surveillance in healthcare. Conclusions The benefits of this automation include storing all information; making the drafts available and accessible almost instantly to everyone with authorized access; and avoiding the information loss, delays, and misinterpretations inherent to using a ward clerk or transcription services. PMID:25336589
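
    The macro-averaged F1 reported above is the unweighted mean of per-class F1 scores, so the three confusable classes drag the average down even when the dominant classes are handled well. A quick Python illustration with placeholder labels (not the study's data):

        from sklearn.metrics import f1_score

        y_true = ["relevant", "irrelevant", "relevant", "other", "other", "irrelevant"]
        y_pred = ["relevant", "irrelevant", "other",    "other", "relevant", "irrelevant"]
        # Per-class F1 here is 0.5, 1.0, and 0.5, so the macro average is ~0.67.
        print(f1_score(y_true, y_pred, average="macro"))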

  3. Using speech for mode selection in control of multifunctional myoelectric prostheses.

    PubMed

    Fang, Peng; Wei, Zheng; Geng, Yanjuan; Yao, Fuan; Li, Guanglin

    2013-01-01

Electromyogram (EMG) signals recorded from the residual muscles of limbs are considered suitable control information for motorized prostheses. However, in cases of high-level amputation, the residual muscles are usually limited and may not provide enough EMG for flexible control of myoelectric prostheses with multiple degrees of freedom of movement. Here, we proposed a control strategy in which speech signals were used as additional information and combined with the EMG signals to realize more flexible control of multifunctional prostheses. Replacing the traditional "sequential mode-switching (joint-switching)" scheme, the speech signals were used to select a mode (joint) of the prosthetic arm, and the EMG signals were then applied to determine a motion class involved in the selected joint and to execute the motion. Preliminary results from three able-bodied subjects and one transhumeral amputee demonstrated that the proposed strategy could achieve a high mode-selection rate and enhance operation efficiency, suggesting the strategy may improve the control performance of commercial myoelectric prostheses.
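
    The strategy described above is essentially a two-level controller: a speech keyword switches the active joint, after which EMG classification drives motion within that joint. The Python sketch below shows the switching logic only; the joint/motion vocabulary is hypothetical, and the speech and EMG recognizers are assumed to exist upstream.

        # Hypothetical mode (joint) -> motion-class vocabulary.
        JOINT_MOTIONS = {
            "hand":  ["open", "close"],
            "wrist": ["pronate", "supinate"],
            "elbow": ["flex", "extend"],
        }

        def control_step(speech_word, emg_class, state):
            """Update controller state; return (state, motion command or None)."""
            if speech_word in JOINT_MOTIONS:       # speech selects the active joint
                state["joint"] = speech_word
                return state, None
            joint = state.get("joint")
            if joint and emg_class in JOINT_MOTIONS[joint]:
                return state, (joint, emg_class)   # EMG drives motion in that joint
            return state, None

        state = {}
        state, cmd = control_step("wrist", None, state)      # say "wrist" to switch mode
        state, cmd = control_step(None, "pronate", state)    # EMG then selects the motion
        print(cmd)  # -> ('wrist', 'pronate')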

  4. Communication Methods for the Hearing Impaired.

    ERIC Educational Resources Information Center

    World Federation of the Deaf, Rome (Italy).

    Communication methods for the hearing impaired are discussed in 12 conference papers. Papers from the United States are "Adjustment through Oralism" by G. Fellendorf, "Prospectus of Patterning" (a method of teaching speech to deaf children) by M.S. Buckler, and "Visual Monitoring of Speech by the Deaf" by W.…

  5. Speech-Language Pathologists' Comfort Levels in English Language Learner Service Delivery

    ERIC Educational Resources Information Center

    Kimble, Carlotta

    2013-01-01

    This study examined speech-language pathologists' (SLPs) comfort levels in providing service delivery to English language learners (ELLs) and limited English proficient (LEP) students. Participants included 192 SLPs from the United States and Guam. Participants completed a brief, six-item questionnaire that investigated their perceptions regarding…

  6. Listener Reliability in Assigning Utterance Boundaries in Children's Spontaneous Speech

    ERIC Educational Resources Information Center

    Stockman, Ida J.

    2010-01-01

    Research and clinical practices often rely on an utterance unit for spoken language analysis. This paper calls attention to the problems encountered when identifying utterance boundaries in young children's spontaneous conversational speech. The results of a reliability study of utterance boundary assignment are described for 20 females with…

  7. Commercial Speech Protection and Alcoholic Beverage Advertising.

    ERIC Educational Resources Information Center

    Greer, Sue

    An examination of the laws governing commercial speech protection and alcoholic beverage advertisements, this document details the legal precedents for and implications of banning such advertising. An introduction looks at the current amount of alcohol consumed in the United States and the recent campaigns to have alcoholic beverage ads banned.…

  8. Speech Intelligibility and Hearing Protector Selection

    DTIC Science & Technology

    2016-08-29

Another nonstandardized speech intelligibility test relevant to military environments is the Coordinate Response Measure (CRM)... developed by the U.S. Air Force Research Laboratory (Bolia, Nelson, Ericson, and Simpson, 2000). The phrases in the CRM are comprised of a call... detections and the percentage of correctly identified color-number combinations. The CRM is particularly useful in evaluating speech intelligibility over

  9. Passion and Preparation in the Basic Course: The Influence of Students' Ego-Involvement with Speech Topics and Preparation Time on Public-Speaking Grades

    ERIC Educational Resources Information Center

    Mazer, Joseph P.; Titsworth, Scott

    2012-01-01

    Authors of basic public-speaking course textbooks frequently encourage students to select speech topics in which they have vested interest, care deeply about, and hold strong opinions and beliefs. This study explores students' level of ego-involvement with informative and persuasive speech topics, examines possible ego-involvement predictors of…

  10. Factors Influencing School-Based Speech and Language Pathologists in the Selection of Communication Assessments for Students with Autism Spectrum Disorders: Why We Do What We Do

    ERIC Educational Resources Information Center

    Schwartz, Lorna T.

    2010-01-01

Speech and language pathologists (SLPs) are collaborators in a diagnostic process that reflects an increasing number of referrals of children with autism spectrum disorders (ASD). Also, current practices leading to the remediation of speech and language disorders have come under scrutiny for limitations in effective carryover of targeted goals…

  11. The Effect of Formal Instruction on the Pidginized Speech of One Second Language Learner.

    ERIC Educational Resources Information Center

    Bruzzese, Giannina

    The effect of formal instruction in English as a Second Language (ESL) on the pidginized speech of a second language learner was studied. The subject was a 76-year-old Italian woman residing in the United States since the age of 37. Four one-hour tapes were made of the subject's speech in April of 1976, and during the last five months of a…

  12. Familiar units prevail over statistical cues in word segmentation.

    PubMed

    Poulin-Charronnat, Bénédicte; Perruchet, Pierre; Tillmann, Barbara; Peereman, Ronald

    2017-09-01

In language acquisition research, the prevailing position is that listeners exploit statistical cues, in particular transitional probabilities between syllables, to discover words of a language. However, other cues are also involved in word discovery. Assessing the weight learners give to these different cues leads to a better understanding of the processes underlying speech segmentation. The present study evaluated whether adult learners preferentially used known units or statistical cues for segmenting continuous speech. Before the exposure phase, participants were familiarized with part-words of a three-word artificial language. This design allowed the dissociation of the influence of statistical cues and familiar units, with statistical cues favoring word segmentation and familiar units favoring (nonoptimal) part-word segmentation. In Experiment 1, performance in a two-alternative forced choice (2AFC) task between words and part-words revealed part-word segmentation (even though part-words were less cohesive in terms of transitional probabilities and less frequent than words). By contrast, an unfamiliarized group exhibited word segmentation, as usually observed in standard conditions. Experiment 2 used a syllable-detection task to remove the likely contamination of performance by memory and strategy effects in the 2AFC task. Overall, the results suggest that familiar units overrode statistical cues, ultimately questioning the need for mechanisms that compute transitional probabilities (TPs) in natural language speech segmentation.
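
    The statistical cue at issue here is the transitional probability TP(A -> B) = count(AB) / count(A) between adjacent syllables; segmentation models posit word boundaries where TP dips. A minimal computation over a toy two-word stream (the syllables are invented, not the study's materials):

        from collections import Counter

        def transitional_probabilities(syllables):
            """TP(a -> b) = count(ab) / count(a) over adjacent syllable pairs."""
            pairs = list(zip(syllables, syllables[1:]))
            pair_counts = Counter(pairs)
            first_counts = Counter(a for a, _ in pairs)
            return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

        # Toy stream from a two-word language "tuda" / "miko": TPs are 1.0 inside
        # words (tu->da, mi->ko) and lower at word boundaries -- the segmentation cue.
        stream = "tu da mi ko mi ko tu da tu da mi ko".split()
        for pair, tp in sorted(transitional_probabilities(stream).items()):
            print(pair, round(tp, 2))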

  13. Why are background telephone conversations distracting?

    PubMed

    Marsh, John E; Ljung, Robert; Jahncke, Helena; MacCutcheon, Douglas; Pausch, Florian; Ball, Linden J; Vachon, François

    2018-06-01

    Telephone conversation is ubiquitous within the office setting. Overhearing a telephone conversation-whereby only one of the two speakers is heard-is subjectively more annoying and objectively more distracting than overhearing a full conversation. The present study sought to determine whether this "halfalogue" effect is attributable to unexpected offsets and onsets within the background speech (acoustic unexpectedness) or to the tendency to predict the unheard part of the conversation (semantic [un]predictability), and whether these effects can be shielded against through top-down cognitive control. In Experiment 1, participants performed an office-related task in quiet or in the presence of halfalogue and dialogue background speech. Irrelevant speech was either meaningful or meaningless speech. The halfalogue effect was only present for the meaningful speech condition. Experiment 2 addressed whether higher task-engagement could shield against the halfalogue effect by manipulating the font of the to-be-read material. Although the halfalogue effect was found with an easy-to-read font (fluent text), the use of a difficult-to-read font (disfluent text) eliminated the effect. The halfalogue effect is thus attributable to the semantic (un)predictability, not the acoustic unexpectedness, of background telephone conversation and can be prevented by simple means such as increasing the level of engagement required by the focal task. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  14. Speech perception in individuals with auditory dys-synchrony.

    PubMed

    Kumar, U A; Jayaram, M

    2011-03-01

    This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony. Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant-vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the 'just noticeable difference' time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed. Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used. These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.

  15. The human auditory brainstem response to running speech reveals a subcortical mechanism for selective attention.

    PubMed

    Forte, Antonio Elia; Etard, Octave; Reichenbach, Tobias

    2017-10-10

    Humans excel at selectively listening to a target speaker in background noise such as competing voices. While the encoding of speech in the auditory cortex is modulated by selective attention, it remains debated whether such modulation occurs already in subcortical auditory structures. Investigating the contribution of the human brainstem to attention has, in particular, been hindered by the tiny amplitude of the brainstem response. Its measurement normally requires a large number of repetitions of the same short sound stimuli, which may lead to a loss of attention and to neural adaptation. Here we develop a mathematical method to measure the auditory brainstem response to running speech, an acoustic stimulus that does not repeat and that has a high ecological validity. We employ this method to assess the brainstem's activity when a subject listens to one of two competing speakers, and show that the brainstem response is consistently modulated by attention.

  16. The Effect of Remote Masking on the Reception of Speech by Young School-Age Children.

    PubMed

    Youngdahl, Carla L; Healy, Eric W; Yoho, Sarah E; Apoux, Frédéric; Holt, Rachael Frush

    2018-02-15

    Psychoacoustic data indicate that infants and children are less likely than adults to focus on a spectral region containing an anticipated signal and are more susceptible to remote masking of a signal. These detection tasks suggest that infants and children, unlike adults, do not listen selectively. However, less is known about children's ability to listen selectively during speech recognition. Accordingly, the current study examines remote masking during speech recognition in children and adults. Adults and 7- and 5-year-old children performed sentence recognition in the presence of various spectrally remote maskers. Intelligibility was determined for each remote-masker condition, and performance was compared across age groups. It was found that speech recognition for 5-year-olds was reduced in the presence of spectrally remote noise, whereas the maskers had no effect on the 7-year-olds or adults. Maskers of different bandwidth and remoteness had similar effects. In accord with psychoacoustic data, young children do not appear to focus on a spectral region of interest and ignore other regions during speech recognition. This tendency may help account for their typically poorer speech perception in noise. This study also appears to capture an important developmental stage, during which a substantial refinement in spectral listening occurs.

  17. Central Asia and the United States 2004-2005: Moving Beyond Counter-Terrorism?

    DTIC Science & Technology

    2005-02-01

basic freedoms as the right to due process, freedom of speech, freedom of assembly, and freedom of religious belief. In short, none of the countries... others have been doing the same, dictating the path to democracy, liberalization, and economic reform and seeking to teach Uzbekistan about freedom of speech, political

  18. Maternal Speech to Three-Month-Old Infants in the United States and Japan.

    ERIC Educational Resources Information Center

    Toda, Sueko; And Others

    1990-01-01

    Compared American and Japanese maternal speech to three-month-old infants. Observations showed that U.S. mothers were more information oriented than Japanese mothers, and that Japanese mothers were more affect oriented, using more nonsense, onomatopoeic sounds, baby talk, and babies' names. Differences are attributed to culture-specific…

  19. Predicting speech intelligibility with a multiple speech subsystems approach in children with cerebral palsy.

    PubMed

    Lee, Jimin; Hustad, Katherine C; Weismer, Gary

    2014-10-01

    Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (speech motor impairment [SMI] group) and 9 judged to be free of dysarthria (no SMI [NSMI] group). Data from children with CP were compared to data from age-matched typically developing children. Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and typically developing groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (adjusted R2 = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R2 analyses revealed that any single variable explained less than 9% of speech intelligibility variability. Children in the SMI group had articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems.
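
    The modeling approach above, regressing intelligibility on subsystem acoustic measures and using incremental R-squared to isolate each predictor's unique contribution, can be sketched as follows. The data are synthetic placeholders; only the procedure (drop one predictor, refit, subtract R-squared) mirrors the analysis described.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(22, 9))   # 22 children x 9 acoustic measures (synthetic)
        y = X @ rng.normal(size=9) + rng.normal(scale=0.5, size=22)  # synthetic intelligibility

        full_r2 = LinearRegression().fit(X, y).score(X, y)
        for j in range(X.shape[1]):
            X_reduced = np.delete(X, j, axis=1)
            reduced_r2 = LinearRegression().fit(X_reduced, y).score(X_reduced, y)
            # Incremental R^2: variance explained uniquely by predictor j.
            print(f"variable {j}: incremental R^2 = {full_r2 - reduced_r2:.3f}")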

  20. Cooperation not confrontation: the imperative of a nuclear age. The message from Budapest.

    PubMed

    Lown, B; Chazov, E

    1985-08-02

    Reprinted here is the text of a speech to the Fifth Congress of the International Physicians for the Prevention of Nuclear War (IPPNW), delivered in Budapest on 29 June 1985 by the group's co-founders, Dr. Bernard Lown from the United States and Dr. Eugene Chazov from the U.S.S.R. After reminding the delegates that 1985 marked the 40th anniversary of the bombings of Hiroshima and Nagasaki and the founding of the United Nations, the two physicians review the work of the IPPNW in alerting the world to the dangers of nuclear warfare. They warn that the chances of nuclear confrontation have increased, and urge their colleagues to foster cooperation between East and West. Lown and Chazov identify nuclear war as the greatest public health threat of all, and call for a moratorium on all nuclear explosions.

  1. Individual differences in selective attention predict speech identification at a cocktail party

    PubMed Central

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-01-01

Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, the performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as the binaural sensitivity for the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272

  2. Bilingual Language Assessment: Contemporary versus Recommended Practice in American Schools

    ERIC Educational Resources Information Center

    Arias, Graciela; Friberg, Jennifer

    2017-01-01

    Purpose: The purpose of this study was to identify current practices of school-based speech-language pathologists (SLPs) in the United States for bilingual language assessment and compare them to American Speech-Language-Hearing Association (ASHA) best practice guidelines and mandates of the Individuals with Disabilities Education Act (IDEA,…

  3. A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms

    PubMed Central

    Wołk, Agnieszka; Glinkowski, Wojciech

    2017-01-01

    People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer. PMID:29230254

  4. A Cross-Lingual Mobile Medical Communication System Prototype for Foreigners and Subjects with Speech, Hearing, and Mental Disabilities Based on Pictograms.

    PubMed

    Wołk, Krzysztof; Wołk, Agnieszka; Glinkowski, Wojciech

    2017-01-01

    People with speech, hearing, or mental impairment require special communication assistance, especially for medical purposes. Automatic solutions for speech recognition and voice synthesis from text are poor fits for communication in the medical domain because they are dependent on error-prone statistical models. Systems dependent on manual text input are insufficient. Recently introduced systems for automatic sign language recognition are dependent on statistical models as well as on image and gesture quality. Such systems remain in early development and are based mostly on minimal hand gestures unsuitable for medical purposes. Furthermore, solutions that rely on the Internet cannot be used after disasters that require humanitarian aid. We propose a high-speed, intuitive, Internet-free, voice-free, and text-free tool suited for emergency medical communication. Our solution is a pictogram-based application that provides easy communication for individuals who have speech or hearing impairment or mental health issues that impair communication, as well as foreigners who do not speak the local language. It provides support and clarification in communication by using intuitive icons and interactive symbols that are easy to use on a mobile device. Such pictogram-based communication can be quite effective and ultimately make people's lives happier, easier, and safer.

  5. Characterizing Articulation in Apraxic Speech Using Real-Time Magnetic Resonance Imaging.

    PubMed

    Hagedorn, Christina; Proctor, Michael; Goldstein, Louis; Wilson, Stephen M; Miller, Bruce; Gorno-Tempini, Maria Luisa; Narayanan, Shrikanth S

    2017-04-14

    Real-time magnetic resonance imaging (MRI) and accompanying analytical methods are shown to capture and quantify salient aspects of apraxic speech, substantiating and expanding upon evidence provided by clinical observation and by acoustic and kinematic data. An analysis of apraxic speech errors within a dynamic systems framework is provided, and the nature of the pathomechanisms of apraxic speech is discussed. One adult male speaker with apraxia of speech was imaged using real-time MRI while producing spontaneous speech, repeated naming tasks, and self-paced repetition of word pairs designed to elicit speech errors. Articulatory data were analyzed, and speech errors were detected using time series reflecting articulatory activity in regions of interest. Real-time MRI captured two types of apraxic gestural intrusion errors in a word pair repetition task. Gestural intrusion errors in nonrepetitive speech, multiple silent initiation gestures at the onset of speech, and covert (unphonated) articulation of entire monosyllabic words were also captured. Real-time MRI and the accompanying analytical methods capture and quantify many features of apraxic speech that have previously been observed using other modalities, while offering high spatial resolution. This patient's apraxia of speech affected the ability to select only the appropriate vocal tract gestures for a target utterance, suppressing others, and to coordinate them in time.
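
    The region-of-interest time-series idea lends itself to a very small sketch: average the pixel intensity inside an articulator mask on each frame. The array shapes and mask below are hypothetical placeholders, not the study's data.

      import numpy as np

      rng = np.random.default_rng(0)
      frames = rng.random(size=(500, 68, 68))   # time x height x width (placeholder rtMRI)
      roi_mask = np.zeros((68, 68), dtype=bool)
      roi_mask[40:50, 30:40] = True             # hypothetical tongue-tip region

      # One intensity value per frame; gestural events appear as peaks/valleys.
      roi_activity = frames[:, roi_mask].mean(axis=1)
      print(roi_activity.shape)                 # (500,)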

  6. Speech Entrainment Compensates for Broca's Area Damage

    PubMed Central

    Fridriksson, Julius; Basilakos, Alexandra; Hickok, Gregory; Bonilha, Leonardo; Rorden, Chris

    2015-01-01

    Speech entrainment (SE), the online mimicking of an audiovisual speech model, has been shown to increase speech fluency in patients with Broca's aphasia. However, not all individuals with aphasia benefit from SE. The purpose of this study was to identify patterns of cortical damage that predict a positive response to SE's fluency-inducing effects. Forty-four chronic patients with left hemisphere stroke (15 female) were included in this study. Participants completed two tasks: 1) spontaneous speech production, and 2) audiovisual SE. Number of different words per minute was calculated as a speech output measure for each task, with the difference between SE and spontaneous speech conditions yielding a measure of fluency improvement. Voxel-wise lesion-symptom mapping (VLSM) was used to relate the number of different words per minute for spontaneous speech, SE, and SE-related improvement to patterns of brain damage, in order to identify lesion locations associated with the fluency-inducing response to speech entrainment. Individuals with Broca's aphasia demonstrated a significant increase in different words per minute during speech entrainment versus spontaneous speech. A similar pattern of improvement was not seen in patients with other types of aphasia. VLSM analysis revealed that damage to the inferior frontal gyrus predicted this response. Results suggest that SE exerts its fluency-inducing effects by providing a surrogate target for speech production via internal monitoring processes. Clinically, these results add further support for the use of speech entrainment to improve speech production and may help select patients for speech entrainment treatment. PMID:25989443
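
    A heavily simplified sketch of the VLSM step, assuming binary lesion maps and one behavioral score per patient: at each voxel, patients with and without damage are compared with a t-test (real analyses add lesion-volume control and multiple-comparison correction).

      import numpy as np
      from scipy import stats

      def vlsm(lesions, scores, min_n=5):
          """lesions: (patients, voxels) binary array; scores: (patients,) behavior.
          Returns a t-map, NaN where too few patients are lesioned to test."""
          t_map = np.full(lesions.shape[1], np.nan)
          for v in range(lesions.shape[1]):
              damaged = scores[lesions[:, v] == 1]
              intact = scores[lesions[:, v] == 0]
              if len(damaged) >= min_n and len(intact) >= min_n:
                  t_map[v], _ = stats.ttest_ind(damaged, intact)
          return t_map

      rng = np.random.default_rng(1)
      lesions = rng.integers(0, 2, size=(44, 1000))   # toy lesion maps
      fluency_gain = rng.normal(size=44)              # SE minus spontaneous speech
      print(np.nanmax(np.abs(vlsm(lesions, fluency_gain))))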

  7. Reflections on Teaching Literacy: Selected Speeches of Margaret J. Early

    ERIC Educational Resources Information Center

    Wolcott, Willa, Ed.

    2011-01-01

    The late Margaret J. Early was a nationally renowned educator in the field of English education and reading, a past president of the National Council of Teachers of English, an author and an editor herself, and the recipient of many awards. The book Reflections on Teaching Literacy: Selected Speeches of Margaret J. Early, edited by Willa Wolcott,…

  8. Real-Time Captioning for Improving Informed Consent: Patient and Physician Benefits.

    PubMed

    Spehar, Brent; Tye-Murray, Nancy; Myerson, Joel; Murray, David J

    2016-01-01

    New methods are needed to improve physicians' skill in communicating information and to enhance patients' ability to recall that information. We evaluated a real-time speech-to-text captioning system that simultaneously provided a speech-to-text record for both patient and anesthesiologist. The goals of the study were to assess hearing-impaired patients' recall of an informed consent discussion about regional anesthesia using real-time captioning and to determine whether the physicians found the system useful for monitoring their own performance. We recorded 2 simulated informed consent encounters with hearing-impaired older adults, in which physicians described regional anesthetic procedures. The conversations were conducted with and without real-time captioning. Subsequently, the patient participants, who wore their hearing aids throughout, were tested on the material presented, and video recordings of the encounters were analyzed to determine how effectively physicians communicated with and without the captioning system. The anesthesiology residents provided similar information to the patient participants regardless of whether the real-time captioning system was used. Although the patients retained relatively few details of the informed consent discussion in either condition, they could recall significantly more of the key points when provided with real-time captioning. Real-time speech-to-text captioning improved recall in hearing-impaired patients and proved useful for determining the information provided during an informed consent encounter. Real-time speech-to-text captioning could provide a method for assessing physicians' communication that could be used both for self-assessment and as an evaluative approach to training communication skills in practice settings.

  9. Strategy Shifts during Learning from Texts and Pictures

    ERIC Educational Resources Information Center

    Schnotz, Wolfgang; Ludewig, Ulrich; Ullrich, Mark; Horz, Holger; McElvany, Nele; Baumert, Jürgen

    2014-01-01

    Reading for learning frequently requires integrating text and picture information into coherent knowledge structures. This article presents an experimental study aimed at analyzing the strategies used by students for integrating text and picture information. Four combinations of texts and pictures (text-picture units) were selected from textbooks…

  10. A Walk through Graduate Education: Selected Papers and Speeches of Jules B. LaPidus, President of the Council of Graduate Schools, 1984-2000.

    ERIC Educational Resources Information Center

    Hamblin, Jane A., Ed.

    This book was created to honor Jules B. LaPidus, retiring president of the Council of Graduate Schools, and to preserve his writings and speeches. The papers and speeches of Part 1 show how the author addressed the topical issues of graduate education, moving from observation to direction on research, funding, and preparation of faculty. Part 2…

  11. Speech-language pathologists' practices regarding assessment, analysis, target selection, intervention, and service delivery for children with speech sound disorders.

    PubMed

    McLeod, Sharynne; Baker, Elise

    2014-01-01

    A survey of 231 Australian speech-language pathologists (SLPs) was undertaken to describe practices regarding assessment, analysis, target selection, intervention, and service delivery for children with speech sound disorders (SSD). The participants typically worked in private practice, education, or community health settings, and 67.6% had a waiting list for services. For each child, most of the SLPs spent 10-40 min in pre-assessment activities, 30-60 min undertaking face-to-face assessments, and 30-60 min completing paperwork after assessments. During an assessment, SLPs typically conducted a parent interview, undertook single-word speech sampling, collected a connected speech sample, and used informal tests. They also determined children's stimulability and estimated intelligibility. With multilingual children, informal assessment procedures and English-only tests were commonly used, and SLPs relied on family members or interpreters to assist. Common analysis techniques included determination of phonological processes; substitutions, omissions, distortions, and additions (SODA); and phonetic inventory. Participants placed high priority on selecting target sounds that were stimulable, early developing, and in error across all word positions, and 60.3% felt very confident or confident selecting an appropriate intervention approach. Eight intervention approaches were frequently used: auditory discrimination, minimal pairs, cued articulation, phonological awareness, traditional articulation therapy, auditory bombardment, the Nuffield Centre Dyspraxia Programme, and core vocabulary. Children typically received individual therapy with an SLP in a clinic setting. Parents often observed and participated in sessions, and SLPs typically included siblings and grandparents in intervention sessions. Parent training and home programs were used more frequently than group therapy. Two-thirds of the SLPs kept up to date by reading journal articles monthly or every 6 months. There were many similarities with previously reported practices for children with SSD in the US, UK, and the Netherlands, with some (but not all) practices aligning with current research evidence.

  12. Self-organizing map classifier for stressed speech recognition

    NASA Astrophysics Data System (ADS)

    Partila, Pavol; Tovarek, Jaromir; Voznak, Miroslav

    2016-05-01

    This paper presents a method for detecting speech under stress using self-organizing maps. Most people exposed to stressful situations cannot respond adequately to stimuli. The army, police, and fire services are typical of environments with an increased number of stressful situations. Personnel in action are directed by a control center, and control commands should be adapted to the psychological state of the person in action. It is known that psychological changes in the human body are also reflected physiologically, which means that stress affects speech. A system for recognizing stress in speech is therefore needed by the security forces. One possible classifier, popular for its flexibility, is the self-organizing map (SOM), a type of artificial neural network. Here, flexibility means that the classifier is independent of the character of the input data, a feature well suited to speech processing. Human stress can be seen as a kind of emotional state. Mel-frequency cepstral coefficients, LPC coefficients, and prosodic features were selected as input data because of their sensitivity to emotional changes. These parameters were calculated from speech recordings belonging to two classes: stressed-state recordings and normal-state recordings. The contribution of the experiment is a method using a SOM classifier for stressed speech detection. The results showed the advantage of this method, namely its flexibility with respect to the input data.
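
    A minimal sketch of a SOM-based stress classifier, assuming the third-party minisom package and precomputed per-recording feature vectors (placeholder data); labeling map nodes by majority vote is one simple way to turn a SOM into a classifier.

      import numpy as np
      from collections import Counter, defaultdict
      from minisom import MiniSom

      rng = np.random.default_rng(0)
      features = rng.normal(size=(200, 15))   # placeholder MFCC/LPC/prosody vectors
      labels = rng.integers(0, 2, size=200)   # 0 = normal speech, 1 = stressed

      som = MiniSom(8, 8, features.shape[1], sigma=1.5, learning_rate=0.5,
                    random_seed=0)
      som.train_random(features, num_iteration=5000)

      # Assign each map node the majority class of the training vectors it wins.
      votes = defaultdict(Counter)
      for x, y in zip(features, labels):
          votes[som.winner(x)][y] += 1
      node_class = {node: c.most_common(1)[0][0] for node, c in votes.items()}

      def classify(x, default=0):
          # Predict via the best-matching unit; unseen nodes fall back to default.
          return node_class.get(som.winner(x), default)

      print(classify(features[0]), labels[0])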

  13. Speech training alters consonant and vowel responses in multiple auditory cortex fields

    PubMed Central

    Engineer, Crystal T.; Rahebi, Kimiya C.; Buell, Elizabeth P.; Fink, Melyssa K.; Kilgard, Michael P.

    2015-01-01

    Speech sounds evoke unique neural activity patterns in primary auditory cortex (A1). Extensive speech sound discrimination training alters A1 responses. While the neighboring auditory cortical fields each contain information about speech sound identity, each field processes speech sounds differently. We hypothesized that while all fields would exhibit training-induced plasticity following speech training, there would be unique differences in how each field changes. In this study, rats were trained to discriminate speech sounds by consonant or vowel in quiet and in varying levels of background speech-shaped noise. Local field potential and multiunit responses were recorded from four auditory cortex fields in rats that had received 10 weeks of speech discrimination training. Our results reveal that training alters speech evoked responses in each of the auditory fields tested. The neural response to consonants was significantly stronger in anterior auditory field (AAF) and A1 following speech training. The neural response to vowels following speech training was significantly weaker in ventral auditory field (VAF) and posterior auditory field (PAF). This differential plasticity of consonant and vowel sound responses may result from the greater paired pulse depression, expanded low frequency tuning, reduced frequency selectivity, and lower tone thresholds, which occurred across the four auditory fields. These findings suggest that alterations in the distributed processing of behaviorally relevant sounds may contribute to robust speech discrimination. PMID:25827927

  14. Reviewing the connection between speech and obstructive sleep apnea.

    PubMed

    Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T; Alcázar-Ramírez, José D; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A

    2016-02-20

    Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). The altered UA structure or function in speakers with OSA has led to the hypothesis that speech can be automatically analyzed for OSA assessment. In this paper we critically review several approaches using speech analysis and machine learning techniques for OSA detection, and discuss the limitations that can arise when using machine learning techniques for diagnostic applications. A large speech database including 426 male Spanish speakers suspected of having OSA and referred to a sleep disorders unit was used to study the clinical validity of several proposals that use machine learning techniques to predict the apnea-hypopnea index (AHI), which describes the severity of a patient's condition, or to classify individuals according to their OSA severity. We first evaluate AHI prediction using state-of-the-art speaker recognition technologies: speech spectral information is modeled using supervector or i-vector techniques, and AHI is predicted through support vector regression (SVR). Using the same database we then critically review several previously proposed OSA classification approaches. The influence and possible interference of other clinical variables or characteristics available for our OSA population (age, height, weight, body mass index, and cervical perimeter) are also studied. The poor results obtained when estimating AHI using supervectors or i-vectors followed by SVR contrast with the positive results reported by previous research. This prompted a careful review of those approaches, including testing some reported results on our database. Several methodological limitations and deficiencies were detected that may have led to overoptimistic results. The methodological deficiencies observed after critically reviewing previous research are relevant examples of potential pitfalls when using machine learning techniques for diagnostic applications. We found two common limitations that can explain the likelihood of false discovery in previous research: (1) the use of prediction models derived from sources, such as speech, that are also correlated with other patient characteristics (age, height, sex,…) that act as confounding factors; and (2) overfitting of feature selection and validation methods when working with a high number of variables compared to the number of cases. We hope this study will not only serve as a useful example of relevant issues when using machine learning for medical diagnosis but also help guide further research on the connection between speech and OSA.
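
    The feature-selection overfitting pitfall described above is avoidable by keeping selection inside the cross-validation loop. A sketch with scikit-learn on placeholder data, in which SelectKBest is refit on each training fold:

      import numpy as np
      from sklearn.feature_selection import SelectKBest, f_regression
      from sklearn.model_selection import KFold, cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVR

      rng = np.random.default_rng(0)
      X = rng.normal(size=(426, 400))   # many speech features, few cases (placeholder)
      ahi = rng.normal(size=426)        # apnea-hypopnea index (placeholder)

      pipe = make_pipeline(StandardScaler(),
                           SelectKBest(f_regression, k=20),
                           SVR(kernel="linear"))
      cv = KFold(5, shuffle=True, random_state=0)
      scores = cross_val_score(pipe, X, ahi, cv=cv, scoring="r2")
      # With random data, honest cross-validated R^2 hovers near or below zero;
      # selecting features on the full dataset first would inflate it spuriously.
      print(scores.mean())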

  15. Predicting Speech Intelligibility with A Multiple Speech Subsystems Approach in Children with Cerebral Palsy

    PubMed Central

    Lee, Jimin; Hustad, Katherine C.; Weismer, Gary

    2014-01-01

    Purpose Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystem approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (SMI), and nine judged to be free of dysarthria (NSMI). Data from children with CP were compared to data from age-matched typically developing children (TD). Results Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and TD groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (Adjusted R-squared = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R-squared analyses revealed that any single variable explained less than 9% of speech intelligibility variability. Conclusions Children in the SMI group have articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems. PMID:24824584

  16. Statistical Clustering and the Contents of the Infant Vocabulary

    ERIC Educational Resources Information Center

    Swingley, Daniel

    2005-01-01

    Infants parse speech into word-sized units according to biases that develop in the first year. One bias, present before the age of 7 months, is to cluster syllables that tend to co-occur. The present computational research demonstrates that this statistical clustering bias could lead to the extraction of speech sequences that are actual words,…

  17. New developments in the management of speech and language disorders.

    PubMed

    Harding, Celia; Gourlay, Sara

    2008-05-01

    Speech and language disorders, which include swallowing difficulties, are usually managed by speech and language therapists. Such a diverse, complex and challenging clinical group of symptoms requires practitioners with detailed knowledge and understanding of research within those areas, as well as the ability to implement appropriate therapy strategies within many environments. These environments range from neonatal units, acute paediatric wards and health centres through to nurseries, schools and children's homes. This paper summarises the key issues that are fundamental to our understanding of this client group.

  18. Three speech sounds, one motor action: evidence for speech-motor disparity from English flap production.

    PubMed

    Derrick, Donald; Stavness, Ian; Gick, Bryan

    2015-03-01

    The assumption that units of speech production bear a one-to-one relationship to speech motor actions pervades otherwise widely varying theories of speech motor behavior. This speech production and simulation study demonstrates that commonly occurring flap sequences may violate this assumption. In the word "Saturday," a sequence of three sounds may be produced using a single, cyclic motor action. Under this view, the initial upward tongue tip motion, starting with the first vowel and moving to contact the hard palate on the way to a retroflex position, is under active muscular control, while the downward movement of the tongue tip, including the second contact with the hard palate, results from gravity and elasticity during tongue muscle relaxation. This sequence is reproduced using a three-dimensional computer simulation of human vocal tract biomechanics and differs greatly from other observed sequences for the same word, which employ multiple targeted speech motor actions. This outcome suggests that a goal of a speaker is to produce an entire sequence in a biomechanically efficient way at the expense of maintaining parity within the individual parts of the sequence.

  19. The Effect of Uni- and Bilateral Thalamic Deep Brain Stimulation on Speech in Patients With Essential Tremor: Acoustics and Intelligibility.

    PubMed

    Becker, Johannes; Barbe, Michael T; Hartinger, Mariam; Dembek, Till A; Pochmann, Jil; Wirths, Jochen; Allert, Niels; Mücke, Doris; Hermes, Anne; Meister, Ingo G; Visser-Vandewalle, Veerle; Grice, Martine; Timmermann, Lars

    2017-04-01

    Deep brain stimulation (DBS) of the ventral intermediate nucleus (VIM) is performed to suppress medically resistant essential tremor (ET). However, stimulation-induced dysarthria (SID) is a common side effect, limiting the extent to which tremor can be suppressed. To date, the exact pathogenesis of SID in VIM-DBS-treated ET patients is unknown. We investigate the effect of inactivated, unilateral, and bilateral VIM-DBS on speech production in patients with ET. We employ acoustic measures, tempo and intelligibility ratings, and patients' self-estimates of their speech to quantify SID, with a focus on comparing bilateral to unilateral stimulation effects and on the effect of electrode position on speech. Sixteen German ET patients participated in this study. Each patient was acoustically recorded with DBS off, unilateral right-hemispheric DBS on, unilateral left-hemispheric DBS on, and bilateral DBS on, during an oral diadochokinesis (DDK) task and the reading of a standard German text. To capture the extent of speech impairment, we measured syllable duration and intensity ratio during the DDK task. Naïve listeners rated the speech tempo and intelligibility of the read text on a 5-point scale. Patients rated their own "ability to speak." We found an effect of bilateral compared to unilateral and inactivated stimulation on syllable durations and intensity ratio, as well as on external intelligibility ratings and patients' VAS scores. Additionally, VAS scores were associated with more laterally located active contacts. For the speech ratings, we found an effect of syllable duration such that tempo and intelligibility were rated worse for speakers exhibiting longer syllable durations. Our data confirm that SID is more pronounced under bilateral than under unilateral stimulation. Laterally located electrodes are associated with more severe SID according to patients' self-ratings. We can confirm the relation between diadochokinetic rate and SID in that listeners' tempo and intelligibility ratings can be predicted from syllable durations measured in DDK tasks. © 2017 International Neuromodulation Society.
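
    A rough sketch of extracting syllable durations from a DDK recording via onset detection; the file name is hypothetical, and clinical use would require parameter tuning and manual verification.

      import librosa
      import numpy as np

      y, sr = librosa.load("ddk.wav", sr=None)        # hypothetical DDK recording
      onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")

      if len(onsets) > 1:
          durations = np.diff(onsets)                 # inter-onset intervals
          print(f"{len(onsets)} onsets; mean syllable duration "
                f"{durations.mean():.3f} s")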

  20. The Making of a Citizen.

    ERIC Educational Resources Information Center

    Pickle, Catherine; And Others

    1987-01-01

    Two study units are offered: (1) a primary and intermediate unit entitled "What Are the Boundaries of Freedom of Speech"; and (2) an intermediate unit entitled "What Are the Challenges of Freedom of Religion?" (MT)

  1. Broca’s Area as a Pre-articulatory Phonetic Encoder: Gating the Motor Program

    PubMed Central

    Ferpozzi, Valentina; Fornia, Luca; Montagna, Marcella; Siodambro, Chiara; Castellano, Antonella; Borroni, Paola; Riva, Marco; Rossi, Marco; Pessina, Federico; Bello, Lorenzo; Cerri, Gabriella

    2018-01-01

    The exact nature of the role of Broca's area in the control of speech, and whether that role is exerted at the cognitive or the motor level, is still debated. Intraoperative evidence of a lack of motor responses to direct electrical stimulation (DES) of Broca's area, and the observation that its stimulation induces a "speech arrest" without an apparent effect on the ongoing activity of the phono-articulatory muscles, fuel the argument. Essentially, attributing direct involvement in the motor control of speech to Broca's area requires evidence of a functional connection between this area and the motoneurons of the phono-articulatory muscles. With a quantitative approach we investigated, in 20 patients undergoing surgery for brain tumors, whether DES delivered on Broca's area affects the recruitment of the phono-articulatory muscles' motor units. The electromyography (EMG) of the muscles active during two speech tasks (object picture naming and counting) was recorded during DES and in its absence. Offline, the EMG of each muscle was analyzed in the frequency domain (power spectrum, PS) and the time domain (root mean square, RMS), and the two conditions were compared. Results show that DES on Broca's area induces an intensity-dependent "speech arrest." The intensity of DES needed to induce speech arrest when applied to Broca's area was higher than the intensity effective on the neighboring premotor/motor cortices. Notably, the PS and RMS measured from the EMG recorded during speech arrest were superimposable on those recorded at baseline. Partial interruptions of speech were not observed. Speech arrest was an "all-or-none" effect: muscle activation started only on removal of DES, as if DES prevented speech onset. The same effect was observed when stimulating the subcortical fibers running directly below Broca's area. The intraoperative data point to Broca's area as a functional gate authorizing the phonetic translation to be executed by the motor areas. Given the absence of a direct effect on motor unit recruitment, direct control of the phono-articulatory apparatus by Broca's area seems unlikely. Moreover, the strict correlation between DES intensity and speech prevention may attribute this effect to inactivation of the subcortical fibers rather than of Broca's cortical neurons. PMID:29520225
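
    The two EMG summary measures named above are easy to state in code: RMS in the time domain and a Welch power spectrum in the frequency domain, here compared between placeholder baseline and stimulation recordings.

      import numpy as np
      from scipy.signal import welch

      fs = 2000                                   # assumed EMG sampling rate (Hz)
      rng = np.random.default_rng(0)
      emg_baseline = rng.normal(size=4 * fs)      # placeholder signals
      emg_during_des = rng.normal(size=4 * fs)

      def rms(x):
          return np.sqrt(np.mean(x ** 2))

      for name, x in [("baseline", emg_baseline), ("DES", emg_during_des)]:
          freqs, ps = welch(x, fs=fs, nperseg=512)
          print(f"{name}: RMS = {rms(x):.3f}, "
                f"spectral peak at {freqs[np.argmax(ps)]:.1f} Hz")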

  2. Library Automation Design for Visually Impaired People

    ERIC Educational Resources Information Center

    Yurtay, Nilufer; Bicil, Yucel; Celebi, Sait; Cit, Guluzar; Dural, Deniz

    2011-01-01

    Speech synthesis is a technology used in many different areas of computer science. This technology can bring a solution to the reading activity of visually impaired people through its text-to-speech conversion. Motivated by this problem, in this study a system is designed to enable a visually impaired person to make use of all the library facilities in…

  3. Lip reading using neural networks

    NASA Astrophysics Data System (ADS)

    Kalbande, Dhananjay; Mishra, Akassh A.; Patil, Sanjivani; Nirgudkar, Sneha; Patel, Prashant

    2011-10-01

    Computerized lip reading, or speech reading, is concerned with the difficult task of converting a video signal of a speaking person into written text. It has several applications, such as teaching deaf and mute people to speak and communicate effectively with other people, its crime-fighting potential, and its invariance to the acoustic environment. We convert video of the subject speaking vowels into images, which are then selected manually for processing. Several factors, such as fast speech, poor pronunciation, poor illumination, movement of the face, moustaches, and beards, make lip reading difficult. Contour tracking methods and template matching are used to extract the lips from the face. A k-nearest-neighbor (KNN) algorithm is then used to classify the "speaking" images and the "silent" images. The sequence of images is then transformed into segments of utterances. A feature vector is calculated for each frame in every segment and stored in a database with a properly labeled class. Character recognition is performed using a modified KNN algorithm that assigns more weight to nearer neighbors. This paper reports the recognition of vowels using KNN algorithms.
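
    The modified KNN described above, with neighbors weighted by proximity, corresponds to standard distance weighting; a sketch with scikit-learn on placeholder lip-feature vectors:

      import numpy as np
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.default_rng(0)
      X_train = rng.normal(size=(300, 64))    # per-frame lip features (placeholder)
      y_train = rng.integers(0, 5, size=300)  # five vowel classes

      # weights="distance" gives nearer neighbors more influence on the vote.
      knn = KNeighborsClassifier(n_neighbors=7, weights="distance")
      knn.fit(X_train, y_train)
      print(knn.predict(rng.normal(size=(1, 64))))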

  4. Semantic Comprehension of the Action-Role Relationship in Early-Linguistic Infants.

    ERIC Educational Resources Information Center

    Fritz, Janet J.; Suci, George J.

    This study attempted to determine: (1) whether lower-order units (agent or agent-action) within the agent-action-recipient relationship exist in any functional way in the 1-word infant's comprehension of speech; and (2) whether the use of repetition and/or reduced length (common modifications in adult-to-infant speech) used to focus on these…

  5. Automatic detection of Parkinson's disease in running speech spoken in three different languages.

    PubMed

    Orozco-Arroyave, J R; Hönig, F; Arias-Londoño, J D; Vargas-Bonilla, J F; Daqrouq, K; Skodda, S; Rusz, J; Nöth, E

    2016-01-01

    The aim of this study is the analysis of continuous speech signals of people with Parkinson's disease (PD) considering recordings in different languages (Spanish, German, and Czech). A method for the characterization of the speech signals, based on the automatic segmentation of utterances into voiced and unvoiced frames, is addressed here. The energy content of the unvoiced sounds is modeled using 12 Mel-frequency cepstral coefficients and 25 bands scaled according to the Bark scale. Four speech tasks comprising isolated words, rapid repetition of the syllables /pa/-/ta/-/ka/, sentences, and read texts are evaluated. The method proves to be more accurate than classical approaches in the automatic classification of speech of people with PD and healthy controls. The accuracies range from 85% to 99% depending on the language and the speech task. Cross-language experiments are also performed confirming the robustness and generalization capability of the method, with accuracies ranging from 60% to 99%. This work comprises a step forward for the development of computer aided tools for the automatic assessment of dysarthric speech signals in multiple languages.
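
    A simplified sketch of the first step of such a method, voiced/unvoiced frame segmentation using short-time energy and zero-crossing rate; thresholds and frame sizes are illustrative, not the authors' values.

      import numpy as np

      def segment_vuv(y, frame=400, hop=200, e_thresh=0.01, z_thresh=0.25):
          """Flag frames as unvoiced-like: low short-time energy or a
          high zero-crossing rate."""
          flags = []
          for start in range(0, len(y) - frame, hop):
              w = y[start:start + frame]
              energy = np.mean(w ** 2)
              zcr = np.mean(np.abs(np.diff(np.sign(w))) > 0)
              flags.append(energy < e_thresh or zcr > z_thresh)
          return np.array(flags)

      rng = np.random.default_rng(0)
      y = rng.normal(scale=0.1, size=16000)   # placeholder audio
      print(f"{segment_vuv(y).mean():.0%} of frames flagged as unvoiced")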

  6. Smart command recognizer (SCR) - For development, test, and implementation of speech commands

    NASA Technical Reports Server (NTRS)

    Simpson, Carol A.; Bunnell, John W.; Krones, Robert R.

    1988-01-01

    The SCR, a rapid prototyping system for the development, testing, and implementation of speech commands in a flight simulator or test aircraft, is described. A single unit performs all functions needed during these three phases of system development, while the use of common software and speech command data structure files greatly reduces the preparation time for successive development phases. As a smart peripheral to a simulation or flight host computer, the SCR interprets the pilot's spoken input and passes command codes to the simulation or flight computer.

  7. The Contribution of Cognitive Factors to Individual Differences in Understanding Noise-Vocoded Speech in Young and Older Adults

    PubMed Central

    Rosemann, Stephanie; Gießing, Carsten; Özyurt, Jale; Carroll, Rebecca; Puschmann, Sebastian; Thiel, Christiane M.

    2017-01-01

    Noise-vocoded speech is commonly used to simulate the sensation after cochlear implantation, as it consists of spectrally degraded speech. High individual variability exists in learning to understand both noise-vocoded speech and speech perceived through a cochlear implant (CI). This variability is partly ascribed to differing cognitive abilities such as working memory, verbal skills, and attention. Although clinically highly relevant, no consensus has yet been achieved about exactly which cognitive factors predict the intelligibility of speech in noise-vocoded situations in healthy subjects or in patients after cochlear implantation. We aimed to establish a test battery that can be used to predict speech understanding in patients prior to receiving a CI. Young and old healthy listeners completed a noise-vocoded speech test in addition to cognitive tests tapping verbal memory, working memory, lexical and retrieval skills, as well as cognitive flexibility and attention. Partial least squares analysis revealed that six variables were important for significantly predicting vocoded-speech performance. These were the ability to perceive visually degraded speech, tested by the Text Reception Threshold; vocabulary size, assessed with the Multiple Choice Word Test; working memory, gauged with the Operation Span Test; verbal learning and recall, on the Verbal Learning and Retention Test; and task-switching abilities, tested by the Comprehensive Trail-Making Test. Thus, these cognitive abilities explain individual differences in noise-vocoded speech understanding and should be considered when aiming to predict hearing-aid outcome. PMID:28638329
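
    Noise vocoding itself is a compact algorithm: split the signal into frequency bands, extract each band's amplitude envelope, and use it to modulate band-limited noise. A sketch with illustrative band edges (not the study's parameters):

      import numpy as np
      from scipy.signal import butter, filtfilt, hilbert

      def vocode(y, sr, edges=(100, 400, 1000, 2400, 6000), order=4):
          rng = np.random.default_rng(0)
          out = np.zeros(len(y))
          for lo, hi in zip(edges[:-1], edges[1:]):
              b, a = butter(order, [lo / (sr / 2), hi / (sr / 2)], btype="band")
              band = filtfilt(b, a, y)
              envelope = np.abs(hilbert(band))                    # amplitude envelope
              carrier = filtfilt(b, a, rng.normal(size=len(y)))   # band-limited noise
              out += envelope * carrier
          return out / np.max(np.abs(out))

      sr = 16000
      y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)   # placeholder "speech"
      vocoded = vocode(y, sr)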

  8. Using speech recognition to enhance the Tongue Drive System functionality in computer access.

    PubMed

    Huo, Xueliang; Ghovanloo, Maysam

    2011-01-01

    Tongue Drive System (TDS) is a wireless tongue operated assistive technology (AT), which can enable people with severe physical disabilities to access computers and drive powered wheelchairs using their volitional tongue movements. TDS offers six discrete commands, simultaneously available to the users, for pointing and typing as a substitute for mouse and keyboard in computer access, respectively. To enhance the TDS performance in typing, we have added a microphone, an audio codec, and a wireless audio link to its readily available 3-axial magnetic sensor array, and combined it with a commercially available speech recognition software, the Dragon Naturally Speaking, which is regarded as one of the most efficient ways for text entry. Our preliminary evaluations indicate that the combined TDS and speech recognition technologies can provide end users with significantly higher performance than using each technology alone, particularly in completing tasks that require both pointing and text entry, such as web surfing.

  9. Bilingualism modulates infants' selective attention to the mouth of a talking face.

    PubMed

    Pons, Ferran; Bosch, Laura; Lewkowicz, David J

    2015-04-01

    Infants growing up in bilingual environments succeed at learning two languages. What adaptive processes enable them to master the more complex nature of bilingual input? One possibility is that bilingual infants take greater advantage of the redundancy of the audiovisual speech that they usually experience during social interactions. Thus, we investigated whether bilingual infants' need to keep languages apart increases their attention to the mouth as a source of redundant and reliable speech cues. We measured selective attention to talking faces in 4-, 8-, and 12-month-old Catalan and Spanish monolingual and bilingual infants. Monolinguals looked more at the eyes than the mouth at 4 months and more at the mouth than the eyes at 8 months in response to both native and nonnative speech, but they looked more at the mouth than the eyes at 12 months only in response to nonnative speech. In contrast, bilinguals looked equally at the eyes and mouth at 4 months, more at the mouth than the eyes at 8 months, and more at the mouth than the eyes at 12 months, and these patterns of responses were found for both native and nonnative speech at all ages. Thus, to support their dual-language acquisition processes, bilingual infants exploit the greater perceptual salience of redundant audiovisual speech cues at an earlier age and for a longer time than monolingual infants. © The Author(s) 2015.

  10. Balancing Free Speech and Government Protection in a Time of Threat.

    ERIC Educational Resources Information Center

    Covington, William G., Jr.

    A common misconception among first-year university students is that the United States provides unabridged, uncensored absolute free speech rights. Evidently these assumptions are derived from popular press and entertainment industry images which place heavy emphasis on one end of the debate. It is a shock for some students to be exposed to the…

  11. Electromagnetic articulography treatment for an adult with Broca's aphasia and apraxia of speech.

    PubMed

    Katz, W F; Bharadwaj, S V; Carstens, B

    1999-12-01

    Electromagnetic articulography (EMA) was explored as a means of remediating [s]/[symbol in text] articulation deficits in the speech of an adult with Broca's aphasia and apraxia of speech. Over a 1-month period, the subject was provided with 2 different treatments in a counterbalanced procedure: (1) visually guided biofeedback concerning tongue-tip position and (2) a foil treatment in which a computer program delivered voicing-contrast stimuli for simple repetition. Kinematic and perceptual data suggest improvement resulting from visually guided biofeedback, both for nonspeech oral and, to a lesser extent, speech motor tasks. In contrast, the phonetic contrast treated in the foil condition showed only marginal improvement during the therapy session, with performance dropping back to baseline 10 weeks post-treatment. Although preliminary, the findings suggest that visual biofeedback concerning tongue-tip position can be used to treat nonspeech oral and (to a lesser extent) speech motor behavior in adults with Broca's aphasia and apraxia of speech.

  12. What's in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English

    PubMed Central

    Weisleder, Adriana; Waxman, Sandra R.

    2010-01-01

    Recent analyses have revealed that child-directed speech contains distributional regularities that could, in principle, support young children's discovery of distinct grammatical categories (noun, verb, adjective). In particular, a distributional unit known as the frequent frame appears to be especially informative (Mintz, 2003). However, analyses have focused almost exclusively on the distributional information available in English. Because languages differ considerably in how the grammatical forms are marked within utterances, the scarcity of cross-linguistic evidence represents an unfortunate gap. We therefore advance the developmental evidence by analyzing the distributional information available in frequent frames across two languages (Spanish and English), across sentence positions (phrase medial and phrase final), and across grammatical forms (noun, verb, adjective). We selected six parent-child corpora from the CHILDES database (3 English; 3 Spanish), and analyzed the input when children were 2;6 years or younger. In each language, frequent frames did indeed offer systematic cues to grammatical category assignment. We also identify differences in the accuracy of these frames across languages, sentence positions, and grammatical classes. PMID:19698207

  13. What's in the input? Frequent frames in child-directed speech offer distributional cues to grammatical categories in Spanish and English.

    PubMed

    Weisleder, Adriana; Waxman, Sandra R

    2010-11-01

    Recent analyses have revealed that child-directed speech contains distributional regularities that could, in principle, support young children's discovery of distinct grammatical categories (noun, verb, adjective). In particular, a distributional unit known as the frequent frame appears to be especially informative (Mintz, 2003). However, analyses have focused almost exclusively on the distributional information available in English. Because languages differ considerably in how the grammatical forms are marked within utterances, the scarcity of cross-linguistic evidence represents an unfortunate gap. We therefore advance the developmental evidence by analyzing the distributional information available in frequent frames across two languages (Spanish and English), across sentence positions (phrase medial and phrase final), and across grammatical forms (noun, verb, adjective). We selected six parent-child corpora from the CHILDES database (three English; three Spanish), and analyzed the input when children were aged 2;6 or younger. In each language, frequent frames did indeed offer systematic cues to grammatical category assignment. We also identify differences in the accuracy of these frames across languages, sentence positions, and grammatical classes.
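
    The frequent-frame computation is simple enough to sketch directly: count, for every medial word, the surrounding A_x_B frame, then inspect which words the most frequent frames gather (toy corpus below).

      from collections import Counter, defaultdict

      utterances = [                       # toy child-directed corpus
          "you want to go".split(),
          "you have to eat".split(),
          "do you want it".split(),
          "you need to sleep".split(),
      ]

      frame_counts = Counter()
      frame_fillers = defaultdict(Counter)
      for u in utterances:
          for a, x, b in zip(u, u[1:], u[2:]):
              frame_counts[(a, b)] += 1
              frame_fillers[(a, b)][x] += 1

      for frame, n in frame_counts.most_common(3):
          print(frame, n, dict(frame_fillers[frame]))
      # The frame ('you', 'to') gathers 'want', 'have', 'need' (all verbs),
      # illustrating how a frequent frame can cue a grammatical category.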

  14. Effects of Attention Cueing on Learning Speech Organ Operation through Mobile Phones

    ERIC Educational Resources Information Center

    Yang, Hui-Yu

    2017-01-01

    Research on using a cross-sectional view of the speech organs, enriched with attention cueing and written text, to probe learners' learning efficiency and behavior through mobile phones is scant. The purpose of this study was to examine whether the presence of attention cueing can benefit learners with different amounts of prior knowledge in…

  15. Sinteiseoir 1.0: A Multidialectical TTS Application for Irish

    ERIC Educational Resources Information Center

    Mac Lochlainn, Micheal

    2010-01-01

    This paper details the development of a multidialectical text-to-speech (TTS) application, "Sinteiseoir," for the Irish language. This work is being carried out in the context of Irish as a lesser-used language, where learners and other L2 speakers have limited direct exposure to L1 speakers and speech communities, and where native sound…

  16. Values most extolled in Nobel Peace Prize speeches.

    PubMed

    Kinnier, Richard T; Kernes, Jerry L; Hayman, Jessie Wetherbe; Flynn, Patricia N; Simon, Elia; Kilian, Laura A

    2007-11-01

    The authors randomly selected 50 Nobel Peace Prize speeches and content analyzed them to determine which values the speakers extolled most frequently. The 10 most frequently mentioned values were peace (in 100% of the speeches), hope (92%), security (86%), justice (85%), responsibility (81%), liberty (80%), tolerance (79%), altruism (75%), God (49%), and truth (38%). The authors discuss the interplay of these values in the modern world and implications regarding the search for universal moral values.

  17. Status Report on Speech Research: A Report on the Status and Progress of Studies on the Nature of Speech, Instrumentation for Its Investigation, and Practical Applications, January 1-March 31, 1977.

    ERIC Educational Resources Information Center

    Haskins Labs., New Haven, CT.

    This report is one of a regular series about the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. The 11 papers discuss the dissociation of spectral and temporal cues to the voicing distinction in initial stopped consonants; perceptual integration and selective attention in…

  18. CTC Sentinel. Volume 8, Issue 9, September 2015

    DTIC Science & Technology

    2015-09-01

    without compromise, complacency, equivocation, or circumvention.” The speech caused concern across the Syrian opposition, many members of which...Bayda, slide 8. Hamza bin Ladin gave an audio speech that was released on August 14 calling for lone wolf attacks against the United States and...the West, for example. “Al-Qaeda’s as-Sahab Media Releases Audio Speech from Hamza bin Laden,” SITE Intelligence Group, August 14, 2015.

  19. Free Speech Yearbook 1979.

    ERIC Educational Resources Information Center

    Kane, Peter E., Ed.

    The seven articles in this collection deal with theoretical and practical freedom of speech issues. Topics covered are: the United States Supreme Court, motion picture censorship, and the color line; judicial decision making; the established scientific community's suppression of the ideas of Immanuel Velikovsky; the problems of avant-garde jazz,…

  20. Stanton's "The Solitude of Self": A Rationale for Feminism.

    ERIC Educational Resources Information Center

    Campbell, Karlyn Kohrs

    1980-01-01

    Examines Elizabeth Cady Stanton's speech, "The Solitude of Self," as a philosophical statement of the principles underlying the nineteenth century struggle for woman's rights in the United States. Analyzes the lyric structure and tone of the speech and its tragic, existential rationale for feminism. (JMF)

  1. Spectrotemporal Modulation Sensitivity as a Predictor of Speech Intelligibility for Hearing-Impaired Listeners

    PubMed Central

    Bernstein, Joshua G.W.; Mehraei, Golbarg; Shamma, Shihab; Gallun, Frederick J.; Theodoroff, Sarah M.; Leek, Marjorie R.

    2014-01-01

    Background A model that can accurately predict speech intelligibility for a given hearing-impaired (HI) listener would be an important tool for hearing-aid fitting or hearing-aid algorithm development. Existing speech-intelligibility models do not incorporate variability in suprathreshold deficits that are not well predicted by classical audiometric measures. One possible approach to the incorporation of such deficits is to base intelligibility predictions on sensitivity to simultaneously spectrally and temporally modulated signals. Purpose The likelihood of success of this approach was evaluated by comparing estimates of spectrotemporal modulation (STM) sensitivity to speech intelligibility and to psychoacoustic estimates of frequency selectivity and temporal fine-structure (TFS) sensitivity across a group of HI listeners. Research Design The minimum modulation depth required to detect STM applied to an 86 dB SPL four-octave noise carrier was measured for combinations of temporal modulation rate (4, 12, or 32 Hz) and spectral modulation density (0.5, 1, 2, or 4 cycles/octave). STM sensitivity estimates for individual HI listeners were compared to estimates of frequency selectivity (measured using the notched-noise method at 500, 1000, 2000, and 4000 Hz), TFS processing ability (2 Hz frequency-modulation detection thresholds for 500, 1000, 2000, and 4000 Hz carriers) and sentence intelligibility in noise (at a 0 dB signal-to-noise ratio) that were measured for the same listeners in a separate study. Study Sample Eight normal-hearing (NH) listeners and 12 listeners with a diagnosis of bilateral sensorineural hearing loss participated. Data Collection and Analysis STM sensitivity was compared between NH and HI listener groups using a repeated-measures analysis of variance. A stepwise regression analysis compared STM sensitivity for individual HI listeners to audiometric thresholds, age, and measures of frequency selectivity and TFS processing ability. A second stepwise regression analysis compared speech intelligibility to STM sensitivity and the audiogram-based Speech Intelligibility Index. Results STM detection thresholds were elevated for the HI listeners, but only for low rates and high densities. STM sensitivity for individual HI listeners was well predicted by a combination of estimates of frequency selectivity at 4000 Hz and TFS sensitivity at 500 Hz but was unrelated to audiometric thresholds. STM sensitivity accounted for an additional 40% of the variance in speech intelligibility beyond the 40% accounted for by the audibility-based Speech Intelligibility Index. Conclusions Impaired STM sensitivity likely results from a combination of a reduced ability to resolve spectral peaks and a reduced ability to use TFS information to follow spectral-peak movements. Combining STM sensitivity estimates with audiometric threshold measures for individual HI listeners provided a more accurate prediction of speech intelligibility than audiometric measures alone. These results suggest a significant likelihood of success for an STM-based model of speech intelligibility for HI listeners. PMID:23636210
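
    For concreteness, a sketch of how an STM ("moving ripple") stimulus of the kind described above can be synthesized: many log-spaced tones whose amplitudes are sinusoidally modulated across time (rate, Hz) and log-frequency (density, cycles/octave). Parameters are illustrative, not the study's.

      import numpy as np

      def stm_noise(sr=16000, dur=1.0, f0=400, octaves=4, n_tones=200,
                    rate=4.0, density=1.0, depth=1.0):
          t = np.arange(int(sr * dur)) / sr
          x = np.zeros_like(t)
          rng = np.random.default_rng(0)
          for i in range(n_tones):
              oct_pos = octaves * i / (n_tones - 1)   # component position in octaves
              f = f0 * 2 ** oct_pos
              phase = rng.uniform(0, 2 * np.pi)
              # Sinusoidal amplitude modulation across time and log-frequency.
              amp = 1 + depth * np.sin(2 * np.pi * (rate * t + density * oct_pos))
              x += amp * np.sin(2 * np.pi * f * t + phase)
          return x / np.max(np.abs(x))

      ripple = stm_noise(rate=4.0, density=2.0, depth=0.5)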

  2. Disorders of Articulation. Prentice-Hall Foundations of Speech Pathology Series.

    ERIC Educational Resources Information Center

    Carrell, James A.

    Designed for students of speech pathology and audiology and for practicing clinicians, the text considers the nature of the articulation process, criteria for diagnosis, and classification and etiology of disorders. Also discussed are phonetic characteristics, including phonemic errors and configurational and contextual defects; and functional…

  3. Design and development of an AAC app based on a speech-to-symbol technology.

    PubMed

    Radici, Elena; Bonacina, Stefano; De Leo, Gianluca

    2016-08-01

    The purpose of this paper is to present the design and development of an Augmentative and Alternative Communication (AAC) app that uses speech-to-symbol technology to model language, i.e., to recognize speech and display the text or the graphic content related to it. Our app is intended to be adopted by communication partners who want to engage in interventions focused on improving communication skills. Its goal is to translate simple spoken sentences into a set of symbols that are understandable by children with complex communication needs. We moderated a focus group among six AAC communication partners and then developed a prototype. We are currently beginning to test our app in an AAC Centre in Milan, Italy.
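
    The speech-to-symbol step reduces, at its simplest, to a lookup from recognized words to pictograms, with a text fallback; the symbol inventory below is invented for illustration.

      # Hypothetical symbol inventory; real systems would use a curated set.
      SYMBOLS = {"eat": "eat.png", "drink": "drink.png", "home": "home.png"}

      def to_symbols(transcript):
          """Map each recognized word to a pictogram file, else keep the word."""
          return [SYMBOLS.get(w, w) for w in transcript.lower().split()]

      print(to_symbols("I want to eat"))   # ['i', 'want', 'to', 'eat.png']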

  4. Impaired auditory temporal selectivity in the inferior colliculus of aged Mongolian gerbils.

    PubMed

    Khouri, Leila; Lesica, Nicholas A; Grothe, Benedikt

    2011-07-06

    Aged humans show severe difficulties in temporal auditory processing tasks (e.g., speech recognition in noise, low-frequency sound localization, gap detection). A degradation of auditory function with age is also evident in experimental animals. To investigate age-related changes in temporal processing, we compared extracellular responses to temporally variable pulse trains and to human speech in the inferior colliculus of young adult (3 months) and aged (3 years) Mongolian gerbils. We observed a significant decrease in selectivity to the pulse trains in neuronal responses from aged animals. At the population level, this decrease in selectivity led to an increase in signal correlations, and therefore to decreased heterogeneity of temporal receptive fields and decreased efficiency in the encoding of speech signals. A decrease in selectivity to temporal modulations is consistent with a downregulation of the inhibitory transmitter system in aged animals. These alterations in temporal processing could underlie declines in the aging auditory system that are unrelated to peripheral hearing loss. Such declines cannot be compensated for by traditional hearing aids (which rely on amplification of sound) but may instead require pharmacological treatment.

  5. Associations between tongue movement pattern consistency and formant movement pattern consistency in response to speech behavioral modifications

    PubMed Central

    Mefferd, Antje S.

    2016-01-01

    The degree of speech movement pattern consistency can provide information about speech motor control. Although tongue motor control is particularly important because of the tongue's primary contribution to the speech acoustic signal, capturing tongue movements during speech remains difficult and costly. This study sought to determine if formant movements could be used to estimate tongue movement pattern consistency indirectly. Two age groups (seven young adults and seven older adults) and six speech conditions (typical, slow, loud, clear, fast, bite block speech) were selected to elicit an age- and task-dependent performance range in tongue movement pattern consistency. Kinematic and acoustic spatiotemporal indexes (STI) were calculated based on sentence-length tongue movement and formant movement signals, respectively. Kinematic and acoustic STI values showed strong associations across talkers and moderate to strong associations for each talker across speech tasks; although, in cases where task-related tongue motor performance changes were relatively small, the acoustic STI values were poorly associated with kinematic STI values. These findings suggest that, depending on the sensitivity needs, formant movement pattern consistency could be used in lieu of direct kinematic analysis to indirectly examine speech motor control. PMID:27908069
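
    The spatiotemporal index is commonly computed by time- and amplitude-normalizing each repetition of a trajectory and summing the standard deviations across repetitions at 2% intervals; a sketch under that assumption, on placeholder trajectories:

      import numpy as np

      def sti(trajectories, n_points=50):
          """Spatiotemporal index: sum of SDs across time- and
          amplitude-normalized repetitions (lower = more consistent)."""
          normed = []
          for traj in trajectories:
              t_old = np.linspace(0, 1, len(traj))
              t_new = np.linspace(0, 1, n_points)     # linear time normalization
              r = np.interp(t_new, t_old, traj)
              normed.append((r - r.mean()) / r.std()) # amplitude normalization
          return np.sum(np.std(np.vstack(normed), axis=0))

      rng = np.random.default_rng(0)
      reps = []
      for _ in range(10):                             # placeholder repetitions
          n = int(rng.integers(80, 120))              # durations vary across reps
          reps.append(np.sin(np.linspace(0, 2 * np.pi, n))
                      + rng.normal(scale=0.05, size=n))
      print(f"STI = {sti(reps):.2f}")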

  6. Joint Spatial-Spectral Feature Space Clustering for Speech Activity Detection from ECoG Signals

    PubMed Central

    Kanas, Vasileios G.; Mporas, Iosif; Benz, Heather L.; Sgarbas, Kyriakos N.; Bezerianos, Anastasios; Crone, Nathan E.

    2014-01-01

    Brain machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines (SVM) as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and non-speech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllable repetition tasks and may contribute to the development of portable ECoG-based communication. PMID:24658248
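
    A sketch of this kind of pipeline: per-channel Welch spectra at a fixed frequency resolution (8 Hz here, via nperseg = fs // 8) feeding an SVM that labels speech versus non-speech intervals; the data, sampling rate, and labels are placeholders.

      import numpy as np
      from scipy.signal import welch
      from sklearn.model_selection import cross_val_score
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.svm import SVC

      fs = 1000                                  # assumed ECoG sampling rate (Hz)
      rng = np.random.default_rng(0)
      epochs = rng.normal(size=(120, 16, fs))    # trials x channels x samples
      labels = rng.integers(0, 2, size=120)      # speech vs. non-speech (placeholder)

      def features(epoch):
          # Concatenate per-channel Welch spectra; fs // 8 samples per segment
          # gives an 8 Hz frequency resolution.
          return np.concatenate([welch(ch, fs=fs, nperseg=fs // 8)[1]
                                 for ch in epoch])

      X = np.array([features(e) for e in epochs])
      clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
      print(cross_val_score(clf, X, labels, cv=5).mean())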

  7. Selective attention: psi performance in children with learning disabilities.

    PubMed

    Garcia, Vera Lúcia; Pereira, Liliane Desgualdo; Fukuda, Yotaka

    2007-01-01

    Selective attention is essential for learning how to write and read. The objective of this study was to examine the process of selective auditory attention in children with learning disabilities. Group I included 40 subjects aged between 9 years 6 months and 10 years 11 months who had a low risk of altered hearing, language, and learning development. Group II included 20 subjects aged between 9 years 5 months and 11 years 10 months who presented learning disabilities. A prospective study was done using the Pediatric Speech Intelligibility (PSI) test. Right-ear PSI with an ipsilateral competing message at speech/noise ratios of 0 and -10 was sufficient to differentiate Group I from Group II. Special attention should be given to the performance of Group II on the first tested ear, which may reveal important signs of improvement in performance and rehabilitation. The PSI-MCI of the right ear at speech/noise ratios of 0 and -10 was appropriate for differentiating Groups I and II. There was an association with the group that presented learning disabilities: this group showed problems in selective attention.

  8. [Psychosis, language and literature].

    PubMed

    Maier, T

    1999-05-01

    There have always been debates about possible correlations between creative genius and mental illness, not only among psychiatrists but also among scholars of art and literature. Modern literary texts in particular may show formal similarities to psychotic speech, which leads to the question of whether not only artists but also people in psychotic states are able to create literature. This article points out the loosened semantic stability in psychotic speech, which amounts to a loss of common ground in the use of signs and symbols. In terms of Gadamer's hermeneutics, texts produced by psychotic people cannot be understood; they are mere form. Even in hermetic literary texts the semantic code can be violated, but with deliberate artistic intention, which finds its communicative purpose in breaking the symbolic order.

  9. Automatic feedback to promote safe walking and speech loudness control in persons with multiple disabilities: two single-case studies.

    PubMed

    Lancioni, Giulio E; Singh, Nirbhay N; O'Reilly, Mark F; Green, Vanessa A; Alberti, Gloria; Boccasini, Adele; Smaldone, Angela; Oliva, Doretta; Bosco, Andrea

    2014-08-01

    These two single-case studies assessed automatic feedback technologies to promote safe travel and speech loudness control, respectively, in two men with multiple disabilities. In Study I, the technology involved a microprocessor, two photocells, and a verbal feedback device; the man received verbal alerting/feedback when the photocells spotted an obstacle in front of him. In Study II, the technology involved a sound-detecting unit connected to a throat microphone, an airborne microphone, and a vibration device; vibration occurred when the man's speech loudness exceeded a preset level. The man in Study I succeeded in using the automatic feedback in place of caregivers' alerting/feedback for safe travel. The man in Study II used the automatic feedback to successfully reduce his speech loudness. Automatic feedback can be highly effective in helping persons with multiple disabilities improve their travel and speech performance.
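
    Study II's feedback rule lends itself to a very small illustration. The sketch below monitors frame-level RMS loudness and signals the vibration device whenever a preset level is exceeded; the frame length and threshold value are hypothetical.

```python
# Minimal sketch of a loudness-triggered feedback rule (assumed parameters).
import numpy as np

def loudness_feedback(mic_samples, threshold_rms=0.2, frame_len=1600):
    """mic_samples: 1-D numpy array of microphone samples.
    Yields True (vibrate) or False for each successive frame."""
    for start in range(0, len(mic_samples) - frame_len + 1, frame_len):
        frame = mic_samples[start : start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        yield rms > threshold_rms        # vibration on when too loud
```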

  10. Communicating headings and preview sentences in text and speech.

    PubMed

    Lorch, Robert F; Chen, Hung-Tao; Lemarié, Julie

    2012-09-01

    Two experiments tested the effects of preview sentences and headings on the quality of college students' outlines of informational texts. Experiment 1 found that performance was much better in the preview sentences condition than in a no-signals condition for both printed text and text-to-speech (TTS) audio rendering of the printed text. In contrast, performance in the headings condition was good for the printed text but poor for the auditory presentation because the TTS software failed to communicate nonverbal information carried by the visual headings. Experiment 2 compared outlining performance for five headings conditions during TTS presentation. Using a theoretical framework, "signaling available, relevant, accessible" (SARA) information, to provide an analysis of the information content of headings in the printed text, the manipulation of the headings systematically restored information that was omitted by the TTS application in Experiment 1. The result was that outlining performance improved to levels similar to the visual headings condition of Experiment 1. It is argued that SARA is a useful framework for guiding future development of TTS software for a wide variety of text signaling devices, not just headings.

  11. Dissecting choral speech: properties of the accompanist critical to stuttering reduction.

    PubMed

    Kiefte, Michael; Armson, Joy

    2008-01-01

    The effects of choral speech and altered auditory feedback (AAF) on stuttering frequency were compared to identify those properties of choral speech that make it a more effective condition for stuttering reduction. Seventeen adults who stutter (AWS) participated in an experiment consisting of special choral speech conditions that were manipulated to selectively eliminate specific differences between choral speech and AAF. Consistent with previous findings, results showed that both choral speech and AAF reduced stuttering compared to solo reading. Although reductions under AAF were substantial, they were less dramatic than those for choral speech. Stuttering reduction for choral speech was highly robust even when the accompanist's voice temporally lagged that of the AWS, when there was no opportunity for dynamic interplay between the AWS and accompanist, and when the accompanist was replaced by the AWS's own voice, all of which approximate specific features of AAF. Choral speech was also highly effective in reducing stuttering across changes in speech rate and for both familiar and unfamiliar passages. We concluded that differences in properties between choral speech and AAF other than those that were manipulated in this experiment must account for differences in stuttering reduction. The reader will be able to (1) describe differences in stuttering reduction associated with altered auditory feedback compared to choral speech conditions and (2) describe differences between delivery of a second voice signal as an altered rendition of the speaker's own voice (altered auditory feedback) and alterations in the voice of an accompanist (choral speech).

  12. The Nationwide Speech Project: A multi-talker multi-dialect speech corpus

    NASA Astrophysics Data System (ADS)

    Clopper, Cynthia G.; Pisoni, David B.

    2004-05-01

    Most research on regional phonological variation relies on field recordings of interview speech. Recent research on the perception of dialect variation by naive listeners, however, has relied on read sentence materials in order to control for phonological and lexical content and syntax. The Nationwide Speech Project corpus was designed to obtain a large amount of speech from a number of talkers representing different regional varieties of American English. Five male and five female talkers from each of six different dialect regions in the United States were recorded reading isolated words, sentences, and passages, and in conversations with the experimenter. The talkers ranged in age from 18 to 25 years and were all monolingual native speakers of American English. They had lived their entire lives in one dialect region, and both of their parents were raised in the same region. Results of an acoustic analysis of the vowel spaces of the talkers included in the Nationwide Speech Project will be presented. [Work supported by NIH.]

  13. Specific Syndromes and Associated Communication Disorders: A Review.

    ERIC Educational Resources Information Center

    Sanger, Dixie D.; And Others

    1984-01-01

    The review, intended to provide speech-language pathologists and special educators with an awareness of genetics and specific syndromes involving speech, language, and hearing components, discusses basic etiologies of abnormal development and selected syndromes (such as Down's and Klinefelter's) that include communication disorders. (CL)

  14. Tuning Neural Phase Entrainment to Speech.

    PubMed

    Falk, Simone; Lanzilotti, Cosima; Schön, Daniele

    2017-08-01

    Musical rhythm positively impacts on subsequent speech processing. However, the neural mechanisms underlying this phenomenon are so far unclear. We investigated whether carryover effects from a preceding musical cue to a speech stimulus result from a continuation of neural phase entrainment to periodicities that are present in both music and speech. Participants listened and memorized French metrical sentences that contained (quasi-)periodic recurrences of accents and syllables. Speech stimuli were preceded by a rhythmically regular or irregular musical cue. Our results show that the presence of a regular cue modulates neural response as estimated by EEG power spectral density, intertrial coherence, and source analyses at critical frequencies during speech processing compared with the irregular condition. Importantly, intertrial coherences for regular cues were indicative of the participants' success in memorizing the subsequent speech stimuli. These findings underscore the highly adaptive nature of neural phase entrainment across fundamentally different auditory stimuli. They also support current models of neural phase entrainment as a tool of predictive timing and attentional selection across cognitive domains.
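
    Inter-trial coherence, one of the measures reported above, quantifies how consistently the phase of the neural response aligns across trials. The sketch below computes it from raw FFT phases, a deliberate simplification of the time-frequency decompositions such studies typically use; array shapes and names are assumptions.

```python
# Hedged sketch of inter-trial coherence (ITC) from EEG epochs.
import numpy as np

def intertrial_coherence(trials, fs):
    """trials: (n_trials, n_samples) EEG epochs, one per sentence.
    Returns (freqs, itc), where itc[f] lies in [0, 1]."""
    spectra = np.fft.rfft(trials, axis=1)
    phases = spectra / (np.abs(spectra) + 1e-12)  # unit-magnitude phasors
    itc = np.abs(phases.mean(axis=0))             # resultant length across trials
    freqs = np.fft.rfftfreq(trials.shape[1], d=1.0 / fs)
    return freqs, itc
```

    ITC near 1 at a given frequency means the phase is locked across trials, which is the signature of entrainment the abstract links to memory performance.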

  15. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment

    DTIC Science & Technology

    2015-10-01

    Scoring, Gaussian Backend, etc.) as shown in Fig. 39. The methods in this domain also emphasized the ability to perform data purification for both... investigation using the same infrastructure was undertaken to explore Lombard effect “flavor” detection for improved speaker ID. The study... The presence of... dimension selection and compared to a common N-gram frequency based selection. 2.1.2: Exploration on NN/DBN backend: Since Deep Neural Networks (DNN) have

  16. A Unit in Comparative State History.

    ERIC Educational Resources Information Center

    Lunstrum, J. P.; Sayers, Evelyn

    1988-01-01

    Presents a secondary level teaching unit on the role of rogues and entrepreneurs in Indiana and Florida from World War I through the 1920s. The unit helps students recognize the continuing struggle to maintain basic constitutional freedoms, particularly freedom of speech and religion. Discusses ways to develop the unit and includes a list of…

  17. Free Speech Advocates at Berkeley.

    ERIC Educational Resources Information Center

    Watts, William A.; Whittaker, David

    1966-01-01

    This study compares highly committed members of the Free Speech Movement (FSM) at Berkeley with the student population at large on 3 sociopsychological foci: general biographical data, religious orientation, and rigidity-flexibility. Questionnaires were administered to 172 FSM members selected by chance from the 10 to 1200 who entered and…

  18. Index to NASA news releases and speeches, 1983

    NASA Technical Reports Server (NTRS)

    1984-01-01

    A listing is presented of 271 news releases distributed by the Office of Public Affairs, NASA Headquarters and 72 selected speeches given by Headquarters staff in 1983. Subject and personal name indexes are arranged alphabetically. Indexes to titles, news release numbers, and accession numbers are arranged numerically.

  19. Index to NASA news releases and speeches, 1980

    NASA Technical Reports Server (NTRS)

    1981-01-01

    A listing is provided of 201 news releases distributed by the Office of Public Affairs, NASA Headquarters and 10 selected speeches presented by Headquarters staff in 1980. Subject and name indexes are arranged alphabetically. Indexes to titles, news release numbers and accession numbers are arranged numerically.

  20. Central Asia and the United States 2004-2005: Moving Beyond Counter-Terrorism

    DTIC Science & Technology

    2005-02-01

    rights, including such basic freedoms as the right to due process, freedom of speech, freedom of assembly, and freedom of religious belief. In short... Uzbekistan about freedom of speech, political freedoms, and civil rights as if the country were “a desert in a distant corner of the world.” Karimov asserted

  1. Who Receives Speech/Language Services by 5 Years of Age in the United States?

    ERIC Educational Resources Information Center

    Morgan, Paul L.; Hammer, Carol Scheffner; Farkas, George; Hillemeier, Marianne M.; Maczuga, Steve; Cook, Michael; Morano, Stephanie

    2016-01-01

    Purpose: We sought to identify factors predictive of or associated with receipt of speech/language services during early childhood. We did so by analyzing data from the Early Childhood Longitudinal Study-Birth Cohort (ECLS-B; Andreassen & Fletcher, 2005), a nationally representative dataset maintained by the U.S. Department of Education. We…

  2. Yaounde French Speech Corpus

    DTIC Science & Technology

    2017-03-01

    the Center for Technology Enhanced Language Learning (CTELL), a research cell in the Department of Foreign Languages, United States Military Academy... models for automatic speech recognition (ASR) and, thereby, to investigate the utility of ASR in pedagogical technology. The corpus is a sample of... lexical resources, language technology

  3. Crisis Speeches Delivered during World War II: A Historical and Rhetorical Perspective

    ERIC Educational Resources Information Center

    Ramos, Tomas E.

    2010-01-01

    Rhetorical analyses of speeches made by United States presidents and world leaders abound, particularly studies about addresses to nations in times of crisis. These are important because what presidents say amidst uncertainty and chaos defines their leadership in the eyes of the public. But with new forms of crisis rhetoric, our understanding of…

  4. Dialogue enabling speech-to-text user assistive agent system for hearing-impaired person.

    PubMed

    Lee, Seongjae; Kang, Sunmee; Han, David K; Ko, Hanseok

    2016-06-01

    A novel approach for assisting bidirectional communication between people with normal hearing and the hearing-impaired is presented. While existing assistive devices for the hearing-impaired, such as hearing aids and cochlear implants, are vulnerable to extreme noise conditions or post-surgery side effects, the proposed concept is an alternative approach in which spoken dialogue is achieved by employing a robust speech recognition technique that takes noisy environmental factors into consideration without any attachment to the human body. The proposed system is a portable device with an acoustic beamformer for directional noise reduction, capable of performing a speech-to-text transcription function that adopts a keyword spotting method. It is also equipped with a user interface optimized for hearing-impaired people, rendering device usage intuitive and natural across diverse domain contexts. The relevant experimental results confirm that the proposed interface design is feasible for realizing an effective and efficient intelligent agent for the hearing-impaired.

  5. Selecting cockpit functions for speech I/O technology

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.

    1985-01-01

    A general methodology for the initial selection of functions for speech generation and speech recognition technology is discussed. The SCR (Stimulus/Central-Processing/Response) compatibility model of Wickens et al. (1983) is examined, and its application is demonstrated for a particular cockpit display problem. Some limits of the applicability of that model are illustrated in the context of predicting overall pilot-aircraft system performance. A program of system performance measurement is recommended for the evaluation of candidate systems. It is suggested that no one measure of system performance can necessarily be depended upon to the exclusion of others. Systems response time, system accuracy, and pilot ratings are all important measures. Finally, these measures must be collected in the context of the total flight task environment.

  6. Discriminative analysis of lip motion features for speaker identification and speech-reading.

    PubMed

    Cetingül, H Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, A Murat

    2006-10-01

    There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and, 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage, spatial, and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the case of the speech-reading application.

  7. Classification Influence of Features on Given Emotions and Its Application in Feature Selection

    NASA Astrophysics Data System (ADS)

    Xing, Yin; Chen, Chuang; Liu, Li-Long

    2018-04-01

    In order to address the large amount of redundant data in high-dimensional speech emotion features, we analyze the extracted speech emotion features in depth and select the better ones. Firstly, a given emotion is classified using each feature on its own. Secondly, the recognition rates are ranked in descending order. Then, the optimal threshold over features is determined by a recognition-rate criterion. Finally, the better features are obtained. When applied to the Berlin and Chinese emotional data sets, the experimental results show that this feature selection method outperforms the other traditional methods.
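
    A hedged sketch of that single-feature ranking procedure follows: each feature alone classifies the data, recognition rates are sorted in descending order, and features above a threshold are kept. The classifier, cross-validation scheme, and threshold rule here are illustrative assumptions.

```python
# Hedged sketch: rank features by single-feature recognition rate.
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def rank_features(X, y, threshold=0.5):
    """X: (n_samples, n_features) emotion features; y: emotion labels."""
    rates = []
    for j in range(X.shape[1]):
        # Recognition rate of a classifier that sees only feature j.
        rate = cross_val_score(KNeighborsClassifier(), X[:, [j]], y, cv=5).mean()
        rates.append((rate, j))
    rates.sort(reverse=True)                 # descending recognition rate
    selected = [j for rate, j in rates if rate >= threshold]
    return selected, rates
```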

  8. Prosody Production and Perception with Conversational Speech

    ERIC Educational Resources Information Center

    Mo, Yoonsook

    2010-01-01

    Speech utterances are more than the linear concatenation of individual phonemes or words. They are organized by prosodic structures comprising phonological units of different sizes (e.g., syllable, foot, word, and phrase) and the prominence relations among them. As the linguistic structure of spoken languages, prosody serves an important function…

  9. Bipolar Disorder in Children: Implications for Speech-Language Pathologists

    ERIC Educational Resources Information Center

    Quattlebaum, Patricia D.; Grier, Betsy C.; Klubnik, Cynthia

    2012-01-01

    In the United States, bipolar disorder is an increasingly common diagnosis in children, and these children can present with severe behavior problems and emotionality. Many studies have documented the frequent coexistence of behavior disorders and speech-language disorders. Like other children with behavior disorders, children with bipolar disorder…

  10. Some Problems in Psycholinguistics.

    ERIC Educational Resources Information Center

    Hadding-Koch, Kerstin

    1968-01-01

    Among the most important questions in psycholinguistics today are the following: By which processes does man organize and understand speech? Which are the smallest linguistic units and rules stored in the memory and used in the production and perception of speech? Are the same mechanisms at work in both cases? Discussed in this paper are…

  11. DIAGNOSIS AND APPRAISAL OF COMMUNICATION DISORDERS. PRENTICE-HALL FOUNDATIONS OF SPEECH PATHOLOGY SERIES.

    ERIC Educational Resources Information Center

    DARLEY, FREDERIC L.

    This text gives the student an outline of the basic principles of scientific methodology which underlie evaluative work in speech disorders. Rationale and assessment techniques are given for examination of the basic communication processes of symbolization, respiration, phonation, articulation-resonance, prosody, associated sensory and perceptual…

  12. Hearing Story Characters' Voices: Auditory Imagery during Reading

    ERIC Educational Resources Information Center

    Gunraj, Danielle N.; Klin, Celia M.

    2012-01-01

    Despite the longstanding belief in an inner voice, there is surprisingly little known about the perceptual features of that voice during text processing. This article asked whether readers infer nonlinguistic phonological features, such as speech rate, associated with a character's speech. Previous evidence for this type of auditory imagery has…

  13. Speech and language development in 2-year-old children with cerebral palsy.

    PubMed

    Hustad, Katherine C; Allison, Kristen; McFadd, Emily; Riehle, Katherine

    2014-06-01

    We examined early speech and language development in children who had cerebral palsy. The questions addressed were whether children could be classified into early profile groups on the basis of speech and language skills and whether there were differences on selected speech and language measures among groups. Speech and language assessments were completed on 27 children with CP who were between the ages of 24 and 30 months (mean age 27.1 months; SD 1.8). We examined several measures of expressive and receptive language, along with speech intelligibility. Two-step cluster analysis was used to identify homogeneous groups of children based on their performance on the seven dependent variables characterizing speech and language performance. The three groups identified were children not yet talking (44% of the sample), children whose talking abilities appeared to be emerging (41% of the sample), and established talkers (15% of the sample). Group differences were evident on all variables except receptive language skills. Eighty-five percent of the 2-year-old children with CP in this study had clinical speech and/or language delays relative to age expectations. Findings suggest that children with CP should receive speech and language assessment and treatment at or before 2 years of age.

  14. Influence of speech sample on perceptual rating of hypernasality.

    PubMed

    Medeiros, Maria Natália Leite de; Fukushiro, Ana Paula; Yamashita, Renata Paciello

    2016-07-07

    To investigate the influence of speech sample type (spontaneous conversation versus sentence repetition) on intra- and inter-rater reliability of hypernasality ratings. One hundred and twenty audio-recorded speech samples (60 containing spontaneous conversation and 60 containing repeated sentences) of individuals with repaired cleft palate±lip, of both genders, aged between 6 and 52 years (mean=21±10), were selected and edited. Three experienced speech and language pathologists rated hypernasality according to their own criteria using a 4-point scale: 1=absence of hypernasality, 2=mild hypernasality, 3=moderate hypernasality, and 4=severe hypernasality, first in the spontaneous speech samples and, 30 days later, in the sentence repetition samples. Intra- and inter-rater agreements were calculated for both speech samples and statistically compared by the Z test at a significance level of 5%. Comparison of intra-rater agreements between the two speech samples showed an increase in the coefficients obtained for sentence repetition relative to those obtained for spontaneous conversation. Comparison of inter-rater agreement showed no significant difference among the three raters for the two speech samples. Sentence repetition improved intra-rater reliability of the perceptual judgment of hypernasality. However, the speech sample had no influence on reliability among different raters.

  15. Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

    PubMed Central

    Shin, Young Hoon; Seo, Jiwon

    2016-01-01

    People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
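
    As a toy illustration of the speech activity detection step described above, the sketch below frames the received radar motion signal, computes short-time energy, and marks high-energy frames as speech; the adaptive threshold rule and frame sizes are assumptions, not the paper's detector.

```python
# Hedged sketch: energy-based speech activity detection on a radar signal.
import numpy as np

def detect_speech_frames(signal, frame_len=256, hop=128, k=3.0):
    """signal: 1-D numpy array of received radar samples."""
    n = 1 + (len(signal) - frame_len) // hop
    energy = np.array([np.sum(signal[i * hop : i * hop + frame_len] ** 2)
                       for i in range(n)])
    threshold = energy.mean() + k * energy.std()   # assumed adaptive rule
    return energy > threshold                      # boolean mask per frame
```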

  17. Speech parts as Poisson processes.

    PubMed

    Badalamenti, A F

    2001-09-01

    This paper presents evidence that six of the seven parts of speech occur in written text as Poisson processes, simple or recurring. The six major parts are nouns, verbs, adjectives, adverbs, prepositions, and conjunctions, with the interjection occurring too infrequently to support a model. The data consist of more than the first 5000 words of works by four major authors coded to label the parts of speech, as well as periods (sentence terminators). Sentence length is measured via the period and found to be normally distributed with no stochastic model identified for its occurrence. The models for all six speech parts but the noun significantly distinguish some pairs of authors and likewise for the joint use of all word types. Any one author is significantly distinguished from any other by at least one word type, and sentence length very significantly distinguishes each from all others. The variety of word type use, measured by Shannon entropy, builds to about 90% of its maximum possible value. The rate constants for nouns are close to the fractions of maximum entropy achieved. This finding, together with the stochastic models and the relations among them, suggests that the noun may be a primitive organizer of written text.
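
    Two of the quantities the paper works with are easy to illustrate: the rate of one part of speech (for a Poisson process, the maximum-likelihood rate is simply occurrences per word) and the Shannon entropy of word-type use as a fraction of its maximum. The sketch below computes both under obvious simplifications; the tagged-text representation is an assumption.

```python
# Hedged sketch of the Poisson rate and entropy-fraction quantities.
import numpy as np
from collections import Counter

def poisson_rate(tags, part_of_speech):
    """tags: list of part-of-speech labels, one per word of the text.
    For a Poisson process, the MLE of the rate is count / length."""
    return tags.count(part_of_speech) / len(tags)

def entropy_fraction(tags):
    """Shannon entropy of word-type use as a fraction of its maximum."""
    counts = np.array(list(Counter(tags).values()), dtype=float)
    p = counts / counts.sum()
    h = -(p * np.log2(p)).sum()
    max_h = np.log2(len(counts)) if len(counts) > 1 else 1.0
    return h / max_h

# e.g. tags = ["noun", "verb", "noun", ...] from a tagged corpus;
# the paper reports entropy building to about 90% of maximum.
```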

  18. On the Development of Speech Resources for the Mixtec Language

    PubMed Central

    2013-01-01

    The Mixtec language is one of the main native languages in Mexico. In general, due to urbanization, discrimination, and limited attempts to promote the culture, the native languages are disappearing. Most of the information available about the Mixtec language is in written form as in dictionaries which, although including examples about how to pronounce the Mixtec words, are not as reliable as listening to the correct pronunciation from a native speaker. Formal acoustic resources, as speech corpora, are almost non-existent for the Mixtec, and no speech technologies are known to have been developed for it. This paper presents the development of the following resources for the Mixtec language: (1) a speech database of traditional narratives of the Mixtec culture spoken by a native speaker (labelled at the phonetic and orthographic levels by means of spectral analysis) and (2) a native speaker-adaptive automatic speech recognition (ASR) system (trained with the speech database) integrated with a Mixtec-to-Spanish/Spanish-to-Mixtec text translator. The speech database, although small and limited to a single variant, was reliable enough to build the multiuser speech application which presented a mean recognition/translation performance up to 94.36% in experiments with non-native speakers (the target users). PMID:23710134

  19. Breath-Group Intelligibility in Dysarthria: Characteristics and Underlying Correlates

    ERIC Educational Resources Information Center

    Yunusova, Yana; Weismer, Gary; Kent, Ray D.; Rusche, Nicole M.

    2005-01-01

    Purpose: This study was designed to determine whether within-speaker fluctuations in speech intelligibility occurred among speakers with dysarthria who produced a reading passage, and, if they did, whether selected linguistic and acoustic variables predicted the variations in speech intelligibility. Method: Participants with dysarthria included a…

  20. The effect of guessing on the speech reception thresholds of children.

    PubMed

    Moodley, A

    1990-01-01

    Speech audiometry is an essential part of the assessment of hearing impaired children and it is now widely used throughout the United Kingdom. Although instructions are universally agreed upon as an important aspect in the administration of any form of audiometric testing, there has been little, if any, research towards evaluating the influence which instructions that are given to a listener have on the Speech Reception Threshold obtained. This study attempts to evaluate what effect guessing has on the Speech Reception Threshold of children. A sample of 30 secondary school pupils between 16 and 18 years of age with normal hearing was used in the study. It is argued that the type of instruction normally used for Speech Reception Threshold in audiometric testing may not provide a sufficient amount of control for guessing and the implications of this, using data obtained in the study, are examined.

  1. Language Arts: The Intricate Interplay of Reading, Writing and Speech. Harvesting the Harvesters. Book 6.

    ERIC Educational Resources Information Center

    Lawless, Ken

    The sixth in a series of 10 study units for a Migrant Educators' National Training OutReach (MENTOR) correspondence course examines the role of speech, reading, and writing in migrant education and suggests approaches to teaching reading and writing which use group activities and individualized evaluation. Designed to be used in preservice or…

  2. A Randomized Controlled Trial for Children with Childhood Apraxia of Speech Comparing Rapid Syllable Transition Treatment and the Nuffield Dyspraxia Programme-Third Edition

    ERIC Educational Resources Information Center

    Murray, Elizabeth; McCabe, Patricia; Ballard, Kirrie J.

    2015-01-01

    Purpose: This randomized controlled trial compared the experimental Rapid Syllable Transition (ReST) treatment to the Nuffield Dyspraxia Programme-Third Edition (NDP3; Williams & Stephens, 2004), used widely in clinical practice in Australia and the United Kingdom. Both programs aim to improve speech motor planning/programming for children…

  3. Central Presbycusis: A Review and Evaluation of the Evidence

    PubMed Central

    Humes, Larry E.; Dubno, Judy R.; Gordon-Salant, Sandra; Lister, Jennifer J.; Cacace, Anthony T.; Cruickshanks, Karen J.; Gates, George A.; Wilson, Richard H.; Wingfield, Arthur

    2018-01-01

    Background: The authors reviewed the evidence regarding the existence of age-related declines in central auditory processes and the consequences of any such declines for everyday communication. Purpose: This report summarizes the review process and presents its findings. Data Collection and Analysis: The authors reviewed 165 articles germane to central presbycusis. Of the 165 articles, 132 articles with a focus on human behavioral measures for either speech or nonspeech stimuli were selected for further analysis. Results: For 76 smaller-scale studies of speech understanding in older adults reviewed, the following findings emerged: (1) the three most commonly studied behavioral measures were speech in competition, temporally distorted speech, and binaural speech perception (especially dichotic listening); (2) for speech in competition and temporally degraded speech, hearing loss proved to have a significant negative effect on performance in most of the laboratory studies; (3) significant negative effects of age, unconfounded by hearing loss, were observed in most of the studies of speech in competing speech, time-compressed speech, and binaural speech perception; and (4) the influence of cognitive processing on speech understanding has been examined much less frequently, but when included, significant positive associations with speech understanding were observed. For 36 smaller-scale studies of the perception of nonspeech stimuli by older adults reviewed, the following findings emerged: (1) the three most frequently studied behavioral measures were gap detection, temporal discrimination, and temporal-order discrimination or identification; (2) hearing loss was seldom a significant factor; and (3) negative effects of age were almost always observed. For 18 studies reviewed that made use of test batteries and medium-to-large sample sizes, the following findings emerged: (1) all studies included speech-based measures of auditory processing; (2) 4 of the 18 studies included nonspeech stimuli; (3) for the speech-based measures, monaural speech in a competing-speech background, dichotic speech, and monaural time-compressed speech were investigated most frequently; (4) the most frequently used tests were the Synthetic Sentence Identification (SSI) test with Ipsilateral Competing Message (ICM), the Dichotic Sentence Identification (DSI) test, and time-compressed speech; (5) many of these studies using speech-based measures reported significant effects of age, but most of these studies were confounded by declines in hearing, cognition, or both; (6) for nonspeech auditory-processing measures, the focus was on measures of temporal processing in all four studies; (7) effects of cognition on nonspeech measures of auditory processing have been studied less frequently, with mixed results, whereas the effects of hearing loss on performance were minimal due to judicious selection of stimuli; and (8) there is a paucity of observational studies using test batteries and longitudinal designs. Conclusions: Based on this review of the scientific literature, there is insufficient evidence to confirm the existence of central presbycusis as an isolated entity. On the other hand, recent evidence has been accumulating in support of the existence of central presbycusis as a multifactorial condition that involves age- and/or disease-related changes in the auditory system and in the brain. Moreover, there is a clear need for additional research in this area. PMID:22967738

  4. Automatic mechanisms for measuring subjective unit of discomfort.

    PubMed

    Hartanto, D W I; Kang, Ni; Brinkman, Willem-Paul; Kampmann, Isabel L; Morina, Nexhmedin; Emmelkamp, Paul G M; Neerincx, Mark A

    2012-01-01

    Current practice in Virtual Reality Exposure Therapy (VRET) is that therapists ask patients about their anxiety level by means of the Subjective Unit of Discomfort (SUD) scale. With an aim of developing a home-based VRET system, this measurement ideally should be done using speech technology. In a VRET system for social phobia with scripted avatar-patient dialogues, the timing of asking patients to give their SUD score becomes relevant. This study examined three timing mechanisms: (1) dialogue dependent (i.e. naturally in the flow of the dialogue); (2) speech dependent (i.e. when both patient and avatar are silent); and (3) context independent (i.e. randomly). Results of an experiment with non-patients (n = 24) showed a significant effect of the timing mechanisms on the perceived dialogue flow, user preference, reported presence, and user dialogue replies. Overall, the dialogue-dependent timing mechanism seems superior, followed by the speech-dependent and the context-independent mechanisms.

  5. Design of an efficient music-speech discriminator.

    PubMed

    Tardón, Lorenzo J; Sammartino, Simone; Barbancho, Isabel

    2010-01-01

    In this paper, the problem of designing a simple and efficient music-speech discriminator for large audio data sets in which advanced music playing techniques are taught and voice and music are intrinsically interleaved is addressed. In the process, a number of features used in speech-music discrimination are defined and evaluated over the available data set. Specifically, the data set contains pieces of classical music played with different and unspecified instruments (or even lyrics) and the voice of a teacher (a top music performer) or even the overlapped voice of the translator and other persons. After an initial test of the performance of the features implemented, a selection process is started, which takes into account the type of classifier selected beforehand, to achieve good discrimination performance and computational efficiency, as shown in the experiments. The discrimination application has been defined and tested on a large data set supplied by Fundacion Albeniz, containing a large variety of classical music pieces played with different instruments and including comments and speeches by famous performers.
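
    As a minimal illustration of the kind of frame-level features such discriminators evaluate, the sketch below computes two classic ones, zero-crossing rate and spectral centroid; the paper's actual feature set and classifier emerge from its selection process, so this is illustrative only.

```python
# Hedged sketch of two common music-speech discrimination features.
import numpy as np

def frame_features(frame, fs):
    """frame: 1-D numpy array for one short analysis frame (e.g., 20-40 ms)."""
    # Zero-crossing rate: sign changes per sample.
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    # Spectral centroid: magnitude-weighted mean frequency.
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-12)
    return np.array([zcr, centroid])
```

    Frame features are typically aggregated over a segment (e.g., mean and variance) and handed to a lightweight classifier, which keeps the discriminator computationally efficient.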

  6. Zoological Nomenclature and Speech Act Theory

    NASA Astrophysics Data System (ADS)

    Cambefort, Yves

    To know natural objects, it is necessary to give them names. This has always been done, from antiquity up to modern times. Today, the nomenclature system invented by Linnaeus in the eighteenth century is still in use, even if the philosophical principles underlying it have changed. Naming living objects still means giving them a sort of existence, since without a name they cannot be referred to, just as if they did not exist. Therefore, naming a living object is a process close to creating it. Naming is performed by means of a particular kind of text: original description written by specialists, and more often accompanied by other, ancillary texts whose purpose is to gain the acceptance and support of fellow zoologists. It is noteworthy that the actions performed by these texts are called "nomenclatural acts". These texts and acts, together with related scientific and social relationships, are examined here in the frame of speech act theory.

  7. Children's language development after cochlear implantation: a literature review.

    PubMed

    Monteiro, Clarice Gomes; Cordeiro, Ana Augusta de Andrade; Silva, Hilton Justino da; Queiroga, Bianca Arruda Manchester de

    2016-01-01

    To review the literature for studies that describe the language development of children after they receive cochlear implants, a review was conducted on the PubMed, Web of Science, Scopus, and Science Direct databases, tracing the selection and critical analysis stages in the journals found and selected. We selected original articles looking at children with cochlear implants which mentioned language development after surgery. Case studies, dissertations, book chapters, editorials, and original articles that did not mention aspects of oral communication development, perception of sounds and speech, and other stages of human development in the title, abstract, or text were excluded. A protocol was created for this study including the following points: author, year, location, sample, type of study, objectives, methods used, main results, and conclusion. A total of 5,052 articles were found based on the search descriptors and free terms. Of this total, 3,414 were excluded due to the title, 1,245 due to the abstract, and 358 from reading the full text; we selected 35, of which 28 were repeated. In the end, seven articles were analyzed in this review. We conclude that cochlear implant users have slower linguistic and educational development than their peers with normal hearing - though they are better than users of conventional prostheses - and that they are able to match them over time. There is great variability in the test methodologies, reducing the effectiveness and reliability of the results found.

  8. Glove-talk II - a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

    PubMed

    Fels, S S; Hinton, G E

    1997-01-01

    Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
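
    The gating arrangement described above is easy to sketch: a gating network produces a weight that blends the vowel network's and the consonant network's synthesizer controls. The tiny networks below use random placeholder weights purely for illustration; they are not the original architecture.

```python
# Hedged sketch of gated blending of vowel/consonant network outputs.
import numpy as np

def blend_controls(hand_features, gate_net, vowel_net, consonant_net):
    """Each *_net maps hand features to the 10 synthesizer controls;
    gate_net maps them to a weight in [0, 1] (1 = fully vowel-like)."""
    g = gate_net(hand_features)                   # scalar gate weight
    return g * vowel_net(hand_features) + (1.0 - g) * consonant_net(hand_features)

# Placeholder networks with random weights, purely for illustration:
rng = np.random.default_rng(0)
W_v, W_c, w_g = rng.normal(size=(10, 16)), rng.normal(size=(10, 16)), rng.normal(size=16)
vowel_net = lambda x: W_v @ x
consonant_net = lambda x: W_c @ x
gate_net = lambda x: 1.0 / (1.0 + np.exp(-w_g @ x))   # logistic gate
controls = blend_controls(rng.normal(size=16), gate_net, vowel_net, consonant_net)
```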

  9. Action Unit Models of Facial Expression of Emotion in the Presence of Speech

    PubMed Central

    Shah, Miraj; Cooper, David G.; Cao, Houwei; Gur, Ruben C.; Nenkova, Ani; Verma, Ragini

    2014-01-01

    Automatic recognition of emotion using facial expressions in the presence of speech poses a unique challenge because talking reveals clues for the affective state of the speaker but distorts the canonical expression of emotion on the face. We introduce a corpus of acted emotion expression where speech is either present (talking) or absent (silent). The corpus is uniquely suited for analysis of the interplay between the two conditions. We use a multimodal decision level fusion classifier to combine models of emotion from talking and silent faces as well as from audio to recognize five basic emotions: anger, disgust, fear, happy and sad. Our results strongly indicate that emotion prediction in the presence of speech from action unit facial features is less accurate when the person is talking. Modeling talking and silent expressions separately and fusing the two models greatly improves accuracy of prediction in the talking setting. The advantages are most pronounced when silent and talking face models are fused with predictions from audio features. In this multi-modal prediction both the combination of modalities and the separate models of talking and silent facial expression of emotion contribute to the improvement. PMID:25525561
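
    A minimal sketch of decision-level fusion as the abstract describes it: separate models for talking faces, silent faces, and audio each emit class posteriors over the five emotions, and the fused decision combines them. Equal weights are an assumption; the paper's fusion rule may differ.

```python
# Hedged sketch of decision-level fusion of per-modality posteriors.
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happy", "sad"]

def fuse_decisions(posteriors, weights=None):
    """posteriors: list of length-5 probability vectors, one per model
    (e.g., talking-face model, silent-face model, audio model)."""
    P = np.array(posteriors)
    w = np.ones(len(P)) / len(P) if weights is None else np.asarray(weights)
    fused = w @ P                        # weighted average of posteriors
    return EMOTIONS[int(np.argmax(fused))], fused
```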

  10. Auditory and cognitive factors underlying individual differences in aided speech-understanding among older adults

    PubMed Central

    Humes, Larry E.; Kidd, Gary R.; Lentz, Jennifer J.

    2013-01-01

    This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male) ranging in age from 60 to 86 (mean = 69.2). Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures), psychophysical (17 measures), and speech-understanding (9 measures), as well as the Speech, Spatial, and Qualities of Hearing (SSQ) self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference). All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding) measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection) as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI), and performance on the text-recognition-threshold (TRT) task (a visual analog of interrupted speech recognition). These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance. PMID:24098273
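
    The analysis pipeline above reduces many correlated measures to a few factors and then regresses the speech-understanding factor on them. The sketch below stands in plain PCA and linear regression for the principal-components factor analysis and multiple regression actually used; the component count and variable names are assumptions.

```python
# Hedged sketch: dimensionality reduction followed by regression.
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_subjects, n_measures) cognitive/psychoacoustic scores
# y: (n_subjects,) aided speech-understanding factor scores
model = make_pipeline(StandardScaler(), PCA(n_components=6), LinearRegression())
# model.fit(X, y); model.score(X, y) gives R^2, the variance accounted
# for (about 0.60 in the study).
```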

  11. Speech evaluation after intravelar veloplasty. How to use Borel-Maisonny classification in the international literature?

    PubMed

    Kadlub, N; Chapuis Vandenbogaerde, C; Joly, A; Neiva, C; Vazquez, M-P; Picard, A

    2018-04-01

    Comparing functional outcomes after velar repair is difficult because of the absence of an international standardized scale. Moreover, most studies evaluating speech after cleft surgery present multiple biases. The aim of our study was to assess speech outcomes in a homogeneous group of patients and to define an equivalence table between different speech scales. Patients with isolated cleft lip and palate (CLP), operated on in our unit by the same senior surgeon, were included. All patients were operated on according to the same protocol (cheilo-rhinoplasty and intravelar veloplasty at 6 months, followed by direct closure of the hard palate at 15 months). Speech evaluation was performed after 3 years of age and before alveolar cleft repair. The Borel-Maisonny scale and nasometry were used for speech evaluation. Twenty-four patients were included: 17 with unilateral CLP and 7 with bilateral CLP. According to the Borel-Maisonny classification, 82.5% were rated phonation 1, 1-2, or 2b. Nasometry was normal in almost 60% of cases. This study showed the efficiency of our protocol and of intravelar veloplasty. Moreover, we propose an equivalence table for speech evaluation scales. Copyright © 2017 Elsevier Masson SAS. All rights reserved.

  12. Research in speech communication.

    PubMed

    Flanagan, J

    1995-10-24

    Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.

  13. Aphasia rehabilitation during adolescence: a case report.

    PubMed

    Laures-Gore, Jacqueline; McCusker, Tiffany; Hartley, Leila L

    2017-06-01

    Descriptions of speech-language interventions addressing the unique aspects of aphasia in adolescence appear to be nonexistent. The current paper presents the case of a male adolescent who experienced a stroke with resultant aphasia and the speech and language therapy he received. Furthermore, we discuss the issues that are unique to an adolescent with aphasia and how they were addressed with this particular patient. Traditional language and apraxia therapy was provided to this patient, with the inclusion of technology and academic topics. The patient demonstrated improvements in his speech and language abilities, most notably his reading comprehension and speech production. Age-related issues, including academic needs, group treatment, socialization, adherence/compliance, independence, and family involvement, emerged during intervention. Although aphasia therapy for adolescents may be similar in many aspects to selected interventions for adults, it is necessary for the clinician to be mindful of age-related issues throughout the course of therapy. Goals and interventions should be selected based on factors salient to an adolescent as well as the potential long-term impact of therapy. Implications for research: aphasia and its treatment in adolescence need to be further explored; academics and technology are important aspects of aphasia treatment in adolescence; and issues specific to adolescence, such as socialization, adherence/compliance, and independence, are important to address in speech-language therapy.

  14. An exploratory study on the driving method of speech synthesis based on the human eye reading imaging data

    NASA Astrophysics Data System (ADS)

    Gao, Pei-pei; Liu, Feng

    2016-10-01

    With the development of information technology and artificial intelligence, speech synthesis plays a significant role in human-computer interaction. However, a main problem of current speech synthesis techniques is a lack of naturalness and expressiveness, so synthetic speech does not yet approach the standard of natural language. Another problem is that human-computer interaction based on speech synthesis is too monotonous to support a mechanism of subjective user drive. This thesis introduces the historical development of speech synthesis and summarizes the general process of the technique, pointing out that the prosody generation module is an important part of speech synthesis. On the basis of further research, using eye-activity rules during reading to control and drive prosody generation is introduced as a new human-computer interaction method to enrich the synthesis. The present state of speech synthesis technology is reviewed in detail. On the premise of eye-gaze data extraction, a speech synthesis method driven in real time by eye movement signals, which can express the speaker's actual speech rhythm, is proposed: while the reader silently reads the corpus, reading information such as the gaze duration per prosodic unit is captured, and a hierarchical prosodic duration model is established to determine the duration parameters of the synthesized speech. Finally, the feasibility of the above method is analyzed and verified.
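
    A speculative sketch of the driving idea follows: the gaze duration the reader spends on each prosodic unit scales that unit's duration in the synthesized speech. The blending rule and parameter names are assumptions made for illustration.

```python
# Hedged sketch: gaze durations drive per-unit synthesis durations.
def unit_durations(gaze_ms_per_unit, base_ms_per_unit, strength=0.5):
    """Blend each unit's default duration with the reader's gaze duration.
    strength=0 keeps the synthesizer defaults; strength=1 follows gaze."""
    durations = []
    for gaze, base in zip(gaze_ms_per_unit, base_ms_per_unit):
        durations.append((1.0 - strength) * base + strength * gaze)
    return durations

# e.g. unit_durations([420, 310, 650], [400, 350, 500]) yields the per-unit
# durations handed to the prosody module as its duration parameters.
```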

  15. Speech Motor Development: Integrating Muscles, Movements, and Linguistic Units

    ERIC Educational Resources Information Center

    Smith, Anne

    2006-01-01

    A fundamental problem for those interested in human communication is to determine how ideas and the various units of language structure are communicated through speaking. The physiological concepts involved in the control of muscle contraction and movement are theoretically distant from the processing levels and units postulated to exist in…

  16. GALLAUDET'S NEW HEARING AND SPEECH CENTER.

    ERIC Educational Resources Information Center

    FRISINA, D. ROBERT

    This report describes the design of a new speech and hearing center and its integration into the overall architectural scheme of the campus. The circular shape was selected to complement the surrounding structures and compensate for differences in site, while providing the acoustical advantages of non-parallel walls and facilitating traffic flow.…

  17. Vocal Pitch Discrimination in the Motor System

    ERIC Educational Resources Information Center

    D'Ausilio, Alessandro; Bufalari, Ilaria; Salmas, Paola; Busan, Pierpaolo; Fadiga, Luciano

    2011-01-01

    Speech production can be broadly separated into two distinct components: Phonation and Articulation. These two aspects require the efficient control of several phono-articulatory effectors. Speech is indeed generated by the vibration of the vocal-folds in the larynx (F0) followed by "filtering" by articulators, to select certain resonant…

  18. A Curriculum Guide for Speech Communication--Grades 8-12.

    ERIC Educational Resources Information Center

    Brilhart, Barbara L., Comp.

    This curriculum guide is a result of a graduate seminar in improvement of speech instruction given in 1971 at the University of Nebraska (Omaha). It is designed primarily for a full-year high school course, but individual sections can be used for a semester course or units. The aim of the curriculum is to integrate new approaches in communication…

  19. Speech to Text: Today and Tomorrow. Proceedings of a Conference at Gallaudet University (Washington, D.C., September, 1988). GRI Monograph Series B, No. 2.

    ERIC Educational Resources Information Center

    Harkins, Judith E., Ed.; Virvan, Barbara M., Ed.

    The conference proceedings contain 23 papers on telephone relay service, real-time captioning, and automatic speech recognition, plus a glossary. The keynote address, by Representative Major R. Owens, examines current issues in federal legislation. Other papers have the following titles and authors: "Telephone Relay Service: Rationale and…

  20. Establishing Reader Involvement in Transnational Marketing Communications: Relative Focus on Speech-Like or Written-Like Strategy.

    ERIC Educational Resources Information Center

    Frank, Jane

    A study examined the use of three linguistic features imitating speech found in two groups of direct-mail marketing texts, in order to show differences in the ways U.S.-based and transnational efforts exploit readers' expectations regarding "literate" versus "oral" modes of expression. Two groups of sales letters, 25 U.S.-based…
