Delphi, Maryam; Lotfi, M-Yones; Moossavi, Abdollah; Bakhshi, Enayatollah; Banimostafa, Maryam
2017-09-01
Previous studies have shown that interaural-time-difference (ITD) training can improve localization ability. Surprisingly little is known, however, about localization training vis-à-vis speech perception in noise based on interaural time difference in the envelope (ITD ENV). We sought to investigate the reliability of an ITD ENV-based training program for speech-in-noise perception among elderly individuals with normal hearing and speech-in-noise perception disorder. The present interventional study was performed during 2016. Sixteen elderly men between 55 and 65 years of age with a clinical diagnosis of normal hearing up to 2000 Hz and speech-in-noise perception disorder participated in this study. The localization training program was based on changes in ITD ENV. In order to evaluate the reliability of the training program, we performed speech-in-noise tests before the training program, immediately afterward, and at 2 months' follow-up. The reliability of the training program was analyzed using the Friedman test in the SPSS software. Statistically significant differences were found in the mean scores of speech-in-noise perception across the 3 time points (P=0.001). The results also indicated no difference in the mean scores of speech-in-noise perception between the 2 time points of immediately after the training program and 2 months' follow-up (P=0.212). The present study showed the reliability of ITD ENV-based localization training in elderly individuals with speech-in-noise perception disorder.
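The three-time-point comparison reported above is the setting for a Friedman test (a non-parametric repeated-measures analysis). A minimal sketch with invented scores, using scipy in place of the SPSS workflow the authors describe:

```python
# Hypothetical illustration of the study's Friedman test: the same
# participants scored at three time points. All values are invented.
from scipy.stats import friedmanchisquare

pre       = [52, 48, 55, 60, 47, 50, 58, 49]   # speech-in-noise scores, pre-training
post      = [61, 57, 63, 68, 56, 60, 66, 58]   # immediately after training
follow_up = [60, 56, 64, 67, 55, 59, 65, 57]   # 2 months later

stat, p = friedmanchisquare(pre, post, follow_up)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```

A significant omnibus result, as in the study, would then be followed by pairwise comparisons to show which time points differ.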
Faulkner, Andrew; Rosen, Stuart; Green, Tim
2012-10-01
Two experimental groups were trained for 2 h with live or recorded speech that was noise-vocoded and spectrally shifted and was from the same text and talker. These two groups showed equivalent improvements in performance for vocoded and shifted sentences, and the group trained with recorded speech showed consistently greater improvements than untrained controls. Another group trained with unshifted noise-vocoded speech improved no more than untrained controls. Computer-based training thus appears at least as effective as labor-intensive live-voice training for improving the perception of spectrally shifted noise-vocoded speech, and by implication, for training of users of cochlear implants.
Technology and Speech Training: An Affair to Remember.
ERIC Educational Resources Information Center
Levitt, Harry
1989-01-01
A history of speech training technology is presented, from the simple hand-held mirror to complicated computer-based systems and tactile devices, and subsequent papers in this theme issue are introduced. Both the advantages and problems of technological aids are addressed. Simplicity in the application and use of speech training aids is stressed.…
Expertise with artificial non-speech sounds recruits speech-sensitive cortical regions
Leech, Robert; Holt, Lori L.; Devlin, Joseph T.; Dick, Frederic
2009-01-01
Regions of the human temporal lobe show greater activation for speech than for other sounds. These differences may reflect intrinsically specialized domain-specific adaptations for processing speech, or they may be driven by the significant expertise we have in listening to the speech signal. To test the expertise hypothesis, we used a video-game-based paradigm that tacitly trained listeners to categorize acoustically complex, artificial non-linguistic sounds. Before and after training, we used functional MRI to measure how expertise with these sounds modulated temporal lobe activation. Participants’ ability to explicitly categorize the non-speech sounds predicted the change in pre- to post-training activation in speech-sensitive regions of the left posterior superior temporal sulcus, suggesting that emergent auditory expertise may help drive this functional regionalization. Thus, seemingly domain-specific patterns of neural activation in higher cortical regions may be driven in part by experience-based restructuring of high-dimensional perceptual space. PMID:19386919
Partial maintenance of auditory-based cognitive training benefits in older adults
Anderson, Samira; White-Schwoch, Travis; Choi, Hee Jae; Kraus, Nina
2014-01-01
The potential for short-term training to improve cognitive and sensory function in older adults has captured the public’s interest. Initial results have been promising. For example, eight weeks of auditory-based cognitive training decreases peak latencies and peak variability in neural responses to speech presented in a background of noise and instills gains in speed of processing, speech-in-noise recognition, and short-term memory in older adults. But while previous studies have demonstrated short-term plasticity in older adults, we must consider the long-term maintenance of training gains. To evaluate training maintenance, we invited participants from an earlier training study to return for follow-up testing six months after the completion of training. We found that improvements in response peak timing to speech in noise and speed of processing were maintained, but the participants did not maintain speech-in-noise recognition or memory gains. Future studies should consider factors that are important for training maintenance, including the nature of the training, compliance with the training schedule, and the need for booster sessions after the completion of primary training. PMID:25111032
Visual Feedback of Tongue Movement for Novel Speech Sound Learning
Katz, William F.; Mehta, Sonya
2015-01-01
Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing. PMID:26635571
Auditory Training Effects on the Listening Skills of Children With Auditory Processing Disorder.
Loo, Jenny Hooi Yin; Rosen, Stuart; Bamiou, Doris-Eva
2016-01-01
Children with auditory processing disorder (APD) typically present with "listening difficulties," including problems understanding speech in noisy environments. The authors examined, in a group of such children, whether a 12-week computer-based auditory training program with speech material improved speech-in-noise test performance and functional listening skills as assessed by parental and teacher listening and communication questionnaires. The authors hypothesized that after the intervention, (1) trained children would show greater improvements in speech-in-noise perception than untrained controls; (2) this improvement would correlate with improvements in observer-rated behaviors; and (3) the improvement would be maintained for at least 3 months after the end of training. This was a prospective randomized controlled trial of 39 children with normal nonverbal intelligence, ages 7 to 11 years, all diagnosed with APD. This diagnosis required a normal pure-tone audiogram and deficits in at least two clinical auditory processing tests. The APD children were randomly assigned to (1) a control group that received only the current standard treatment for children diagnosed with APD, employing various listening/educational strategies at school (N = 19); or (2) an intervention group that undertook a 3-month 5-day/week computer-based auditory training program at home, consisting of a wide variety of speech-based listening tasks with competing sounds, in addition to the current standard treatment. All 39 children were assessed for language and cognitive skills at baseline and on three outcome measures at baseline and immediately postintervention. Outcome measures were repeated 3 months postintervention in the intervention group only, to assess the sustainability of treatment effects. The outcome measures were (1) the mean speech reception threshold obtained from the four subtests of the Listening in Spatialized Noise test, which assesses sentence perception in various configurations of masking speech and in which the target speakers and test materials were unrelated to the training materials; (2) the Children's Auditory Performance Scale, which assesses listening skills, completed by the children's teachers; and (3) the Clinical Evaluation of Language Fundamentals-4 pragmatic profile, which assesses pragmatic language use, completed by parents. All outcome measures significantly improved at immediate postintervention in the intervention group only, with effect sizes ranging from 0.76 to 1.7. Improvements in speech-in-noise performance correlated with improved scores on the Children's Auditory Performance Scale questionnaire in the trained group only. Baseline language and cognitive assessments did not predict better training outcome. Improvements in speech-in-noise performance were sustained 3 months postintervention. Broad speech-based auditory training led to improved auditory processing skills as reflected in speech-in-noise test performance and in better functional listening in real life. The observed correlation between improved functional listening and improved speech-in-noise perception in the trained group suggests that improved listening was a direct generalization of the auditory training.
Slater, Jessica; Skoe, Erika; Strait, Dana L; O'Connell, Samantha; Thompson, Elaine; Kraus, Nina
2015-09-15
Music training may strengthen auditory skills that help children not only in musical performance but in everyday communication. Comparisons of musicians and non-musicians across the lifespan have provided some evidence for a "musician advantage" in understanding speech in noise, although reports have been mixed. Controlled longitudinal studies are essential to disentangle effects of training from pre-existing differences, and to determine how much music training is necessary to confer benefits. We followed a cohort of elementary school children for 2 years, assessing their ability to perceive speech in noise before and after musical training. After the initial assessment, participants were randomly assigned to one of two groups: one group began music training right away and completed 2 years of training, while the second group waited a year and then received 1 year of music training. Outcomes provide the first longitudinal evidence that speech-in-noise perception improves after 2 years of group music training. The children were enrolled in an established and successful community-based music program and followed the standard curriculum; these findings therefore provide an important link between laboratory-based research and real-world assessment of the impact of music training on everyday communication skills.
Auditory Training with Multiple Talkers and Passage-Based Semantic Cohesion
ERIC Educational Resources Information Center
Casserly, Elizabeth D.; Barney, Erin C.
2017-01-01
Purpose: Current auditory training methods typically result in improvements to speech recognition abilities in quiet, but learner gains may not extend to other domains in speech (e.g., recognition in noise) or self-assessed benefit. This study examined the potential of training involving multiple talkers and training emphasizing discourse-level…
Training to Improve Hearing Speech in Noise: Biological Mechanisms
Song, Judy H.; Skoe, Erika; Banai, Karen
2012-01-01
We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207
Tongue Control and Its Implication in Pronunciation Training
ERIC Educational Resources Information Center
Ouni, Slim
2014-01-01
Pronunciation training based on speech production techniques illustrating tongue movements is gaining popularity. However, there is not sufficient evidence that learners can imitate some tongue animation. In this paper, we argue that although controlling tongue movement related to speech is not such an easy task, training with visual feedback…
Environmental Sound Training in Cochlear Implant Users
Sheft, Stanley; Kuvadia, Sejal; Gygi, Brian
2015-01-01
Purpose The study investigated the effect of a short computer-based environmental sound training regimen on the perception of environmental sounds and speech in experienced cochlear implant (CI) patients. Method Fourteen CI patients with an average of 5 years of CI experience participated. The protocol consisted of 2 pretests, 1 week apart, followed by 4 environmental sound training sessions conducted on separate days in 1 week, and concluded with 2 posttest sessions, separated by another week without training. Each testing session included an environmental sound test, which consisted of 40 familiar everyday sounds, each represented by 4 different tokens, as well as the Consonant Nucleus Consonant (CNC) word test and the Revised Speech Perception in Noise (SPIN-R) sentence test. Results Environmental sound scores were lower than those for either of the speech tests. Following training, there was a significant average improvement of 15.8 points in environmental sound perception, which persisted 1 week after training was discontinued. No significant improvements were observed for either speech test. Conclusions The findings demonstrate that environmental sound perception, which remains problematic even for experienced CI patients, can be improved with a home-based computer training regimen. Such computer-based training may thus provide an effective low-cost approach to rehabilitation for CI users and, potentially, other hearing-impaired populations. PMID:25633579
Music-Based Training for Pediatric CI Recipients: A Systematic Analysis of Published Studies
Gfeller, Kate
2016-01-01
In recent years, there has been growing interest in the use of music-based training to enhance speech and language development in children with normal hearing and some forms of communication disorders, including pediatric CI users. The use of music training for CI users may initially seem incongruous given that signal processing for CIs presents a degraded version of pitch and timbre, both key elements in music. Furthermore, empirical data from systematic studies of music training, particularly in relation to transfer to speech skills, are limited. This study describes the rationale for music training of CI users, describes key features of published studies of music training with CI users, and highlights some developmental and logistical issues that should be taken into account when interpreting or planning studies of music training and speech outcomes with pediatric CI recipients. PMID:27246744
Gfeller, Kate; Guthe, Emily; Driscoll, Virginia; Brown, Carolyn J
2015-09-01
This paper provides a preliminary report of a music-based training program for adult cochlear implant (CI) recipients. Included in this report are descriptions of the rationale for music-based training, factors influencing program development, and the resulting program components. Prior studies describing experience-based plasticity in response to music training, auditory training for persons with hearing impairment, and music training for CI recipients were reviewed. These sources revealed rationales for using music to enhance speech, factors associated with successful auditory training, relevant aspects of electric hearing and music perception, and extant evidence regarding limitations and advantages associated with parameters for music training with CI users. This informed the development of a computer-based music training program designed specifically for adult CI users. Principles and parameters for perceptual training of music, such as stimulus choice, rehabilitation approach, and motivational concerns were developed in relation to the unique auditory characteristics of adults with electric hearing. An outline of the resulting program components and the outcome measures for evaluating program effectiveness are presented. Music training can enhance the perceptual accuracy of music, but is also hypothesized to enhance several features of speech with similar processing requirements as music (e.g., pitch and timbre). However, additional evaluation of specific training parameters and the impact of music-based training on speech perception of CI users is required.
ERIC Educational Resources Information Center
Plumb, Allison M.; Plexico, Laura W.
2013-01-01
Purpose: To investigate the graduate training experiences of school-based speech-language pathologists (SLPs) working with children with autism spectrum disorders (ASDs). Comparisons were made between recent graduates (post 2006) and pre-2006 graduates to determine if differences existed in their academic and clinical experiences or their…
High school music classes enhance the neural processing of speech.
Tierney, Adam; Krizman, Jennifer; Skoe, Erika; Johnston, Kathleen; Kraus, Nina
2013-01-01
Should music be a priority in public education? One argument for teaching music in school is that private music instruction relates to enhanced language abilities and neural function. However, the directionality of this relationship is unclear and it is unknown whether school-based music training can produce these enhancements. Here we show that 2 years of group music classes in high school enhance the neural encoding of speech. To tease apart the relationships between music and neural function, we tested high school students participating in either music or fitness-based training. These groups were matched at the onset of training on neural timing, reading ability, and IQ. Auditory brainstem responses were collected to a synthesized speech sound presented in background noise. After 2 years of training, the neural responses of the music training group were earlier than at pre-training, while the neural timing of students in the fitness training group was unchanged. These results represent the strongest evidence to date that in-school music education can cause enhanced speech encoding. The neural benefits of musical training are, therefore, not limited to expensive private instruction early in childhood but can be elicited by cost-effective group instruction during adolescence.
Women's Speech/Men's Speech: Does Forensic Training Make a Difference?
ERIC Educational Resources Information Center
Larson, Suzanne; Vreeland, Amy L.
A study of cross examination speeches of males and females was conducted to determine gender differences in intercollegiate debate. The theory base for gender differences in speech is closely tied to the analysis of dyadic conversation. It is based on the belief that women are less forceful and dominant in cross examination, and will exhibit…
Reversal of age-related neural timing delays with training
Anderson, Samira; White-Schwoch, Travis; Parbery-Clark, Alexandra; Kraus, Nina
2013-01-01
Neural slowing is commonly noted in older adults, with consequences for sensory, motor, and cognitive domains. One of the deleterious effects of neural slowing is impairment of temporal resolution; older adults, therefore, have reduced ability to process the rapid events that characterize speech, especially in noisy environments. Although hearing aids provide increased audibility, they cannot compensate for deficits in auditory temporal processing. Auditory training may provide a strategy to address these deficits. To that end, we evaluated the effects of auditory-based cognitive training on the temporal precision of subcortical processing of speech in noise. After training, older adults exhibited faster neural timing and experienced gains in memory, speed of processing, and speech-in-noise perception, whereas a matched control group showed no changes. Training was also associated with decreased variability of brainstem response peaks, suggesting a decrease in temporal jitter in response to a speech signal. These results demonstrate that auditory-based cognitive training can partially restore age-related deficits in temporal processing in the brain; this plasticity in turn promotes better cognitive and perceptual skills. PMID:23401541
Cason, Nia; Astésano, Corine; Schön, Daniele
2015-02-01
Following findings that musical rhythmic priming enhances subsequent speech perception, we investigated whether rhythmic priming for spoken sentences can enhance phonological processing - the building blocks of speech - and whether audio-motor training enhances this effect. Participants heard a metrical prime followed by a sentence (with a matching/mismatching prosodic structure), for which they performed a phoneme detection task. Behavioural (RT) data were collected from two groups: one that received audio-motor training and one that did not. We hypothesised that (1) phonological processing would be enhanced in matching conditions, and (2) audio-motor training with the musical rhythms would enhance this effect. Indeed, providing a matching rhythmic prime context resulted in faster phoneme detection, thus revealing a cross-domain effect of musical rhythm on phonological processing. In addition, our results indicate that rhythmic audio-motor training enhances this priming effect. These results have important implications for rhythm-based speech therapies, and suggest that metrical rhythm in music and speech may rely on shared temporal processing brain resources.
The Role of Corticostriatal Systems in Speech Category Learning
Yi, Han-Gyol; Maddox, W. Todd; Mumford, Jeanette A.; Chandrasekaran, Bharath
2016-01-01
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. Keywords: category learning, fMRI, corticostriatal systems, speech, putamen. PMID:25331600
Initial Development of a Spatially Separated Speech-in-Noise and Localization Training Program
Tyler, Richard S.; Witt, Shelley A.; Dunn, Camille C.; Wang, Wenjun
2010-01-01
Objective This article describes the initial development of a novel approach for training hearing-impaired listeners to improve their ability to understand speech in the presence of background noise and to also improve their ability to localize sounds. Design Most people with hearing loss, even those well fit with hearing devices, still experience significant problems understanding speech in noise. Prior research suggests that at least some subjects can experience improved speech understanding with training. However, all training systems that we are aware of have one basic, critical limitation: they do not provide spatial separation of the speech and noise, thereby ignoring the potential benefits of training binaural hearing. In this paper we describe our initial experience with a home-based training system that includes spatially separated speech-in-noise and localization training. Results Throughout the development of this system, patient input and training and preliminary pilot data from individuals with bilateral cochlear implants were utilized. Positive feedback from subjective reports indicated that some individuals were engaged in the treatment, and formal testing showed benefit. Feedback and practical issues resulted in the reduction of the original eight-loudspeaker system to a two-loudspeaker system. Conclusions These preliminary findings suggest we have successfully developed a viable spatial hearing training system that can improve binaural hearing in noise and localization. Applications include, but are not limited to, hearing with hearing aids and cochlear implants. PMID:20701836
Auditory-Perceptual Learning Improves Speech Motor Adaptation in Children
Shiller, Douglas M.; Rochon, Marie-Lyne
2015-01-01
Auditory feedback plays an important role in children’s speech development by providing the child with information about speech outcomes that is used to learn and fine-tune speech motor plans. The use of auditory feedback in speech motor learning has been extensively studied in adults by examining oral motor responses to manipulations of auditory feedback during speech production. Children are also capable of adapting speech motor patterns to perceived changes in auditory feedback, however it is not known whether their capacity for motor learning is limited by immature auditory-perceptual abilities. Here, the link between speech perceptual ability and the capacity for motor learning was explored in two groups of 5–7-year-old children who underwent a period of auditory perceptual training followed by tests of speech motor adaptation to altered auditory feedback. One group received perceptual training on a speech acoustic property relevant to the motor task while a control group received perceptual training on an irrelevant speech contrast. Learned perceptual improvements led to an enhancement in speech motor adaptation (proportional to the perceptual change) only for the experimental group. The results indicate that children’s ability to perceive relevant speech acoustic properties has a direct influence on their capacity for sensory-based speech motor adaptation. PMID:24842067
[Improvement in Phoneme Discrimination in Noise in Normal Hearing Adults].
Schumann, A; Garea Garcia, L; Hoppe, U
2017-02-01
Objective: The aim of the study was to examine whether phoneme discrimination in noise can be trained in normal-hearing adults, and how effective such training is for speech recognition in noise. A specific computerized training program, consisting of nonsense syllables presented in background noise, was used to train participants' discrimination ability. Material and Methods: 46 normal-hearing subjects took part in this study, 28 in the training group and 18 in the control group. Only the training group subjects were asked to train over a period of 3 weeks, twice a week for an hour, with the computer-based training program. Speech recognition in noise was measured pre- and post-training for the training group subjects with the Freiburger Einsilber Test. The control group subjects completed test and retest measures separated by a 2-3 week break. For the training group, follow-up speech recognition was measured 2-3 months after the end of the training. Results: The majority of training group subjects improved their phoneme discrimination significantly. Moreover, their speech recognition in noise improved significantly over the course of the training compared to the control group, and remained stable over time. Conclusions: Phoneme discrimination in noise can be trained in normal-hearing adults. The improvements have a positive effect on speech recognition in noise that persists over a longer period of time.
Speech training alters tone frequency tuning in rat primary auditory cortex
Engineer, Crystal T.; Perez, Claudia A.; Carraway, Ryan S.; Chang, Kevin Q.; Roland, Jarod L.; Kilgard, Michael P.
2013-01-01
Previous studies in both humans and animals have documented improved performance following discrimination training. This enhanced performance is often associated with cortical response changes. In this study, we tested the hypothesis that long-term speech training on multiple tasks can improve primary auditory cortex (A1) responses compared to rats trained on a single speech discrimination task or experimentally naïve rats. Specifically, we compared the percent of A1 responding to trained sounds, the responses to both trained and untrained sounds, receptive field properties of A1 neurons, and the neural discrimination of pairs of speech sounds in speech trained and naïve rats. Speech training led to accurate discrimination of consonant and vowel sounds, but did not enhance A1 response strength or the neural discrimination of these sounds. Speech training altered tone responses in rats trained on six speech discrimination tasks but not in rats trained on a single speech discrimination task. Extensive speech training resulted in broader frequency tuning, shorter onset latencies, a decreased driven response to tones, and caused a shift in the frequency map to favor tones in the range where speech sounds are the loudest. Both the number of trained tasks and the number of days of training strongly predict the percent of A1 responding to a low frequency tone. Rats trained on a single speech discrimination task performed less accurately than rats trained on multiple tasks and did not exhibit A1 response changes. Our results indicate that extensive speech training can reorganize the A1 frequency map, which may have downstream consequences on speech sound processing. PMID:24344364
Some Effects of Training on the Perception of Synthetic Speech
Schwab, Eileen C.; Nusbaum, Howard C.; Pisoni, David B.
2012-01-01
The present study was conducted to determine the effects of training on the perception of synthetic speech. Three groups of subjects were tested with synthetic speech using the same tasks before and after training. One group was trained with synthetic speech. A second group went through the identical training procedures using natural speech. The third group received no training. Although performance of the three groups was the same prior to training, significant differences on the post-test measures of word recognition were observed: the group trained with synthetic speech performed much better than the other two groups. A six-month follow-up indicated that the group trained with synthetic speech displayed long-term retention of the knowledge and experience gained with prior exposure to synthetic speech generated by a text-to-speech system. PMID:2936671
Brain Plasticity in Speech Training in Native English Speakers Learning Mandarin Tones
NASA Astrophysics Data System (ADS)
Heinzen, Christina Carolyn
The current study employed behavioral and event-related potential (ERP) measures to investigate brain plasticity associated with second-language (L2) phonetic learning based on an adaptive computer training program. The program utilized the acoustic characteristics of Infant-Directed Speech (IDS) to train monolingual American English-speaking listeners to perceive Mandarin lexical tones. Behavioral identification and discrimination tasks were conducted using naturally recorded speech, carefully controlled synthetic speech, and non-speech control stimuli. The ERP experiments were conducted with selected synthetic speech stimuli in a passive listening oddball paradigm. Identical pre- and post-tests were administered to nine adult listeners, who completed two to three hours of perceptual training. The perceptual training sessions used pair-wise lexical tone identification and progressed through seven levels of difficulty for each tone pair. The levels of difficulty included progression in speaker variability from one to four speakers and progression through four levels of acoustic exaggeration of duration, pitch range, and pitch contour. Behavioral results for the natural speech stimuli revealed significant training-induced improvement in identification of Tones 1, 3, and 4. Improvements in identification of Tone 4 generalized to novel stimuli as well. Additionally, comparison between discrimination of across-category and within-category stimulus pairs taken from a synthetic continuum revealed a training-induced shift toward more native-like categorical perception of the Mandarin lexical tones. Analysis of the Mismatch Negativity (MMN) responses in the ERP data revealed increased amplitude and decreased latency for pre-attentive processing of across-category discrimination as a result of training. There were also laterality changes in the MMN responses to the non-speech control stimuli, which could reflect reallocation of brain resources in processing pitch patterns for the across-category lexical tone contrast. Overall, the results support the use of IDS characteristics in training non-native speech contrasts and provide impetus for further research.
Loo, Jenny Hooi Yin; Bamiou, Doris-Eva; Campbell, Nicci; Luxon, Linda M
2010-08-01
This article reviews the evidence for computer-based auditory training (CBAT) in children with language, reading, and related learning difficulties, and evaluates the extent to which it can benefit children with auditory processing disorder (APD). Searches were confined to studies published between 2000 and 2008, and studies are rated according to the level-of-evidence hierarchy proposed by the American Speech-Language-Hearing Association (ASHA) in 2004. We identified 16 studies of two commercially available CBAT programs (13 studies of Fast ForWord (FFW) and three studies of Earobics) and five further outcome studies of other non-speech and simple speech sounds training, available for children with language, learning, and reading difficulties. The results suggest that, apart from phonological awareness skills, the FFW and Earobics programs seem to have little effect on the language, spelling, and reading skills of children. Non-speech and simple speech sounds training may be effective in improving children's reading skills, but only if it is delivered by an audio-visual method. There is some initial evidence to suggest that CBAT may be of benefit for children with APD. Further research is necessary, however, to substantiate these preliminary findings.
Visual-auditory integration during speech imitation in autism.
Williams, Justin H G; Massaro, Dominic W; Peel, Natalie J; Bosseler, Alexis; Suddendorf, Thomas
2004-01-01
Children with autistic spectrum disorder (ASD) may have poor audio-visual integration, possibly reflecting dysfunctional 'mirror neuron' systems which have been hypothesised to be at the core of the condition. In the present study, a computer program, utilizing speech synthesizer software and a 'virtual' head (Baldi), delivered speech stimuli for identification in auditory, visual or bimodal conditions. Children with ASD were poorer than controls at recognizing stimuli in the unimodal conditions, but once performance on this measure was controlled for, no group difference was found in the bimodal condition. A group of participants with ASD were also trained to develop their speech-reading ability. Training improved visual accuracy and this also improved the children's ability to utilize visual information in their processing of speech. Overall results were compared to predictions from mathematical models based on integration and non-integration, and were most consistent with the integration model. We conclude that, whilst they are less accurate in recognizing stimuli in the unimodal condition, children with ASD show normal integration of visual and auditory speech stimuli. Given that training in recognition of visual speech was effective, children with ASD may benefit from multi-modal approaches in imitative therapy and language training.
Enhancing speech recognition using improved particle swarm optimization based hidden Markov model.
Selvaraj, Lokesh; Ganesan, Balakrishnan
2014-01-01
Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO) is suggested. The suggested methodology contains four stages, namely, (i) denoising, (ii) feature mining, (iii) vector quantization, and (iv) an IPSO-based hidden Markov model (HMM) technique (IP-HMM). At first, the speech signals are denoised using a median filter. Next, characteristics such as peak, pitch spectrum, Mel-frequency cepstral coefficients (MFCC), mean, standard deviation, and minimum and maximum of the signal are extracted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic-algorithm-based codebook generation in vector quantization. The initial populations for the genetic algorithm are created by selecting random code vectors from the training set, and IP-HMM performs the recognition. Variation is introduced through the genetic crossover operation. The proposed speech recognition technique offers 97.14% accuracy.
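As a rough sketch of stages (i)-(iii) of the pipeline above: median-filter denoising, simple framewise signal statistics as stand-in features, and a vector-quantization codebook. The paper's genetic-algorithm codebook search and IPSO-trained HMM are not reproduced here; plain k-means stands in for the codebook step, and all sizes and signals are invented.

```python
import numpy as np
from scipy.signal import medfilt
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)            # stand-in for 1 s of speech @ 16 kHz
clean = medfilt(signal, kernel_size=5)         # (i) denoising with a median filter

frame_len, hop = 400, 160                      # 25 ms frames, 10 ms hop
frames = np.stack([clean[i:i + frame_len]
                   for i in range(0, len(clean) - frame_len, hop)])

# (ii) toy feature vector per frame: mean, std, min, max, peak magnitude
feats = np.column_stack([frames.mean(1), frames.std(1),
                         frames.min(1), frames.max(1),
                         np.abs(frames).max(1)])

# (iii) 16-entry codebook (k-means here; GA-based search in the paper);
# the resulting symbol sequence would feed the HMM stage (iv)
codebook, labels = kmeans2(feats, 16, minit='++')
print(codebook.shape, labels[:10])
```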
Written Language Disorders: Speech-Language Pathologists' Training, Knowledge, and Confidence
ERIC Educational Resources Information Center
Blood, Gordon W.; Mamett, Callie; Gordon, Rebecca; Blood, Ingrid M.
2010-01-01
Purpose: This study examined speech-language pathologists' (SLPs') perceptions of their (a) educational and clinical training in evaluating and treating written language disorders, (b) knowledge bases in this area, (c) sources of knowledge about written language disorders, (d) confidence levels, and (e) predictors of confidence in working with…
Fu, Szu-Wei; Li, Pei-Chun; Lai, Ying-Hui; Yang, Cheng-Chien; Hsieh, Li-Chun; Tsao, Yu
2017-11-01
Objective: This paper focuses on machine learning based voice conversion (VC) techniques for improving the speech intelligibility of surgical patients who have had parts of their articulators removed. Because of the removal of parts of the articulator, a patient's speech may be distorted and difficult to understand. To overcome this problem, VC methods can be applied to convert the distorted speech such that it is clear and more intelligible. To design an effective VC method, two key points must be considered: 1) the amount of training data may be limited (because speaking for a long time is usually difficult for postoperative patients); 2) rapid conversion is desirable (for better communication). Methods: We propose a novel joint dictionary learning based non-negative matrix factorization (JD-NMF) algorithm. Compared to conventional VC techniques, JD-NMF can perform VC efficiently and effectively with only a small amount of training data. Results: The experimental results demonstrate that the proposed JD-NMF method not only achieves notably higher short-time objective intelligibility (STOI) scores (a standardized objective intelligibility evaluation metric) than those obtained using the original unconverted speech but is also significantly more efficient and effective than a conventional exemplar-based NMF VC method. Conclusion: The proposed JD-NMF method may outperform the state-of-the-art exemplar-based NMF VC method in terms of STOI scores under the desired scenario. Significance: We confirmed the advantages of the proposed joint training criterion for the NMF-based VC. Moreover, we verified that the proposed JD-NMF can effectively improve the speech intelligibility scores of oral surgery patients.
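The joint-dictionary idea can be illustrated with the exemplar-based NMF formulation it builds on: paired source and target dictionaries share one activation matrix, so activations estimated on source speech reconstruct the corresponding target speech. A toy numpy sketch with random stand-in dictionaries, not the authors' JD-NMF implementation:

```python
# Exemplar-based NMF voice conversion sketch. Column k of W_src and W_tgt
# are assumed to describe the same sound for the two speakers; all
# matrices here are random placeholders.
import numpy as np

rng = np.random.default_rng(1)
F, K, T = 64, 32, 100                  # freq bins, dictionary atoms, frames
W_src = rng.random((F, K)) + 1e-3      # source-speaker dictionary
W_tgt = rng.random((F, K)) + 1e-3      # paired target-speaker dictionary
V = rng.random((F, T)) + 1e-3          # source magnitude spectrogram

H = rng.random((K, T)) + 1e-3          # shared activations
for _ in range(200):                   # KL-NMF multiplicative updates, W fixed
    H *= (W_src.T @ (V / (W_src @ H))) / W_src.T.sum(1, keepdims=True)

V_converted = W_tgt @ H                # activations mapped through target dictionary
print(V_converted.shape)               # (64, 100): converted spectrogram
```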
Neurophysiological Influence of Musical Training on Speech Perception
Shahin, Antoine J.
2011-01-01
Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remains a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639
Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus
2010-09-01
Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent speech, and stuttered speech. Seventeen German experts on stuttering judged a speech sample on two occasions. Speakers of the sample were stuttering adults, who were not undergoing therapy, as well as participants in a fluency shaping and a stuttering modification therapy. Results showed satisfactory inter-judge and intra-judge agreement above 80%. Intervals with trained speech patterns were identified as consistently as stuttered and fluent intervals. We discuss limitations of the study, as well as implications of our findings for the development of training for identification of trained speech patterns and future outcome studies. The reader will be able to (a) explain different methods to measure the use of trained speech patterns, (b) evaluate whether German experts are able to discriminate intervals with trained speech patterns reliably from fluent and stuttered intervals and (c) describe how the measurement of trained speech patterns can contribute to outcome studies.
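Interval-based agreement of the kind reported above reduces to the proportion of intervals given the same label by two judges. A tiny hypothetical sketch (labels invented), where F = fluent, S = stuttered, T = trained speech pattern:

```python
# Percent agreement between two judges over the same ten intervals.
judge_a = list("FFTSFTTSFF")
judge_b = list("FFTSFTSSFF")

agree = sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)
print(f"inter-judge agreement: {agree:.0%}")   # 90% for these labels
```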
Evidence-based speech-language pathology practices in schools: findings from a national survey.
Hoffman, Lavae M; Ireland, Marie; Hall-Mills, Shannon; Flynn, Perry
2013-07-01
This study documented evidence-based practice (EBP) patterns as reported by speech-language pathologists (SLPs) employed in public schools during 2010-2011. Using an online survey, practitioners reported their EBP training experiences, resources available in their workplaces, and the frequency with which they engage in specific EBP activities, as well as their resource needs and future training format preferences. A total of 2,762 SLPs in 28 states participated in the online survey, 85% of whom reported holding the Certificate of Clinical Competence in Speech-Language Pathology credential. Results revealed that one quarter of survey respondents had no formal training in EBP, 11% of SLPs worked in school districts with official EBP procedural guidelines, and 91% had no scheduled time to support EBP activities. The majority of SLPs posed and researched 0 to 2 EBP questions per year and read 0 to 4 American Speech-Language-Hearing Association (ASHA) journal articles per year on either assessment or intervention topics. Use of ASHA online resources and engagement in EBP activities were documented to be low. However, results also revealed that school-based SLPs have high interest in additional training and resources to support scientifically based practices. Suggestions for enhancing EBP support in public schools and augmenting knowledge transfer are provided.
Speech training alters consonant and vowel responses in multiple auditory cortex fields
Engineer, Crystal T.; Rahebi, Kimiya C.; Buell, Elizabeth P.; Fink, Melyssa K.; Kilgard, Michael P.
2015-01-01
Speech sounds evoke unique neural activity patterns in primary auditory cortex (A1). Extensive speech sound discrimination training alters A1 responses. While the neighboring auditory cortical fields each contain information about speech sound identity, each field processes speech sounds differently. We hypothesized that while all fields would exhibit training-induced plasticity following speech training, there would be unique differences in how each field changes. In this study, rats were trained to discriminate speech sounds by consonant or vowel in quiet and in varying levels of background speech-shaped noise. Local field potential and multiunit responses were recorded from four auditory cortex fields in rats that had received 10 weeks of speech discrimination training. Our results reveal that training alters speech evoked responses in each of the auditory fields tested. The neural response to consonants was significantly stronger in anterior auditory field (AAF) and A1 following speech training. The neural response to vowels following speech training was significantly weaker in ventral auditory field (VAF) and posterior auditory field (PAF). This differential plasticity of consonant and vowel sound responses may result from the greater paired pulse depression, expanded low frequency tuning, reduced frequency selectivity, and lower tone thresholds, which occurred across the four auditory fields. These findings suggest that alterations in the distributed processing of behaviorally relevant sounds may contribute to robust speech discrimination. PMID:25827927
Schumann, Annette; Serman, Maja; Gefeller, Olaf; Hoppe, Ulrich
2015-03-01
Specific computer-based auditory training may be a useful complement to the rehabilitation process for cochlear implant (CI) listeners seeking sufficient speech intelligibility. This study evaluated the effectiveness of a computerized phoneme-discrimination training programme. The study employed a pretest-post-test design; participants were randomly assigned to the training or control group. Over a period of three weeks, the training group was instructed to train in phoneme discrimination via computer, twice a week. Sentence recognition in different noise conditions (moderate to difficult) was tested pre- and post-training, and six months after the training was completed. The control group was tested and retested within one month. Twenty-seven adult CI listeners who had been using cochlear implants for more than two years participated in the programme: 15 adults in the training group and 12 adults in the control group. Besides significant improvements on the trained phoneme-identification task, a generalized training effect was noted via significantly improved sentence recognition in moderate noise. No significant changes were noted in the difficult noise conditions. Improved performance was maintained over an extended period. Phoneme-discrimination training improves experienced CI listeners' speech perception in noise. Additional research is needed to optimize auditory training for individual benefit.
Erasmus, D; Schutte, L; van der Merwe, M; Geertsema, S
2013-12-01
To investigate whether privately practising speech-language therapists in South Africa are fulfilling their role of identification, assessment and intervention for adolescents with written-language and reading difficulties. Further needs concerning training with regard to this population group were also determined. A survey study was conducted, using a self-administered questionnaire. Twenty-two currently practising speech-language therapists who are registered members of the South African Speech-Language-Hearing Association (SASLHA) participated in the study. The respondents indicated that they are aware of their role regarding adolescents with written-language difficulties. However, they feel that South African speech-language therapists are not fulfilling this role. Existing assessment tools and interventions for written-language difficulties are described as inadequate, and culturally and age inappropriate. Yet the majority of the respondents feel that they are adequately equipped to work with adolescents with written-language difficulties, based on their own experience, self-study and secondary training. The respondents feel that training regarding effective collaboration with teachers is necessary to establish specific roles, and to promote speech-language therapy for adolescents among teachers. Further research is needed in developing appropriate assessment and intervention tools as well as improvement of training at an undergraduate level.
Arts, Remo A G J; George, Erwin L J; Janssen, Miranda A M L; Griessner, Andreas; Zierhofer, Clemens; Stokroos, Robert J
2018-06-01
Previous studies show that intracochlear electrical stimulation independent of environmental sounds appears to suppress tinnitus, even long-term. In order to assess the viability of this potential treatment option it is essential to study the effects of this tinnitus specific electrical stimulation on speech perception. A randomised, prospective crossover design. Ten patients with unilateral or asymmetric hearing loss and severe tinnitus complaints. The audiological effects of standard clinical CI, formal auditory training and tinnitus specific electrical stimulation were investigated. Results show that standard clinical CI in unilateral or asymmetric hearing loss is shown to be beneficial for speech perception in quiet, speech perception in noise and subjective hearing ability. Formal auditory training does not appear to improve speech perception performance. However, CI-related discomfort reduces significantly more rapidly during CI rehabilitation in subjects receiving formal auditory training. Furthermore, tinnitus specific electrical stimulation has neither positive nor negative effects on speech perception. In combination with the findings from previous studies on tinnitus suppression using intracochlear electrical stimulation independent of environmental sounds, the results of this study contribute to the viability of cochlear implantation based on tinnitus complaints.
Lim, Hayoung A
2010-01-01
The study compared the effects of music training, speech training and no training on the verbal production of children with Autism Spectrum Disorders (ASD). Participants were 50 children with ASD, age range 3 to 5 years, who had previously been evaluated on standard tests of language and level of functioning. They were randomly assigned to one of three 3-day conditions. Participants in music training (n = 18) watched a music video containing 6 songs and pictures of the 36 target words; those in speech training (n = 18) watched a speech video containing 6 stories and pictures, and those in the control condition (n = 14) received no treatment. Participants' verbal production, including semantics, phonology, pragmatics, and prosody, was measured by an experimenter-designed verbal production evaluation scale. Results showed that participants in both music and speech training significantly increased their verbal production from pre- to posttest. Results also indicated that both high- and low-functioning participants improved their speech production after receiving either music or speech training; however, low-functioning participants showed a greater improvement after the music training than the speech training. Children with ASD perceive important linguistic information embedded in music stimuli organized by principles of pattern perception, and produce functional speech.
Specific acoustic models for spontaneous and dictated style in Indonesian speech recognition
NASA Astrophysics Data System (ADS)
Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.
2018-03-01
The performance of an automatic speech recognition system is affected by differences in speech style between the data the model was originally trained upon and the incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognize incoming speech and decides upon a final result by calculating a confidence score for each decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage over a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.
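The second system's selection rule is simple to sketch. Below is a minimal Python illustration (not the paper's GMM-HMM implementation) of decoding with both style-specific models and keeping the hypothesis with the higher decoding confidence; the decoder interface and the toy stand-in decoders are assumptions for illustration.

    def recognize(audio, spontaneous_decoder, dictated_decoder):
        # Each decoder returns (transcript, confidence); keep whichever
        # style-specific acoustic model was more confident.
        hyp_spon, conf_spon = spontaneous_decoder(audio)
        hyp_dict, conf_dict = dictated_decoder(audio)
        return hyp_spon if conf_spon >= conf_dict else hyp_dict

    # Toy stand-in decoders for demonstration only.
    spontaneous = lambda audio: ("halo apa kabar", 0.62)
    dictated = lambda audio: ("halo apa kabar", 0.71)
    print(recognize(b"raw-audio-bytes", spontaneous, dictated))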
Lidestam, Björn; Moradi, Shahram; Pettersson, Rasmus; Ricklefs, Theodor
2014-08-01
The effects of audiovisual versus auditory training for speech-in-noise identification were examined in 60 young participants. The training conditions were audiovisual training, auditory-only training, and no training (n = 20 each). In the training groups, gated consonants and words were presented at 0 dB signal-to-noise ratio; stimuli were either audiovisual or auditory-only. The no-training group watched a movie clip without performing a speech identification task. Speech-in-noise identification was measured before and after the training (or control activity). Results showed that only audiovisual training improved speech-in-noise identification, demonstrating superiority over auditory-only training.
ERIC Educational Resources Information Center
Mundy, Marie-Anne; Padilla Oviedo, Andres; Ramirez, Juan; Taylor, Nick; Flores, Itza
2014-01-01
One of the main goals of universities is to graduate students who are capable and competent in competing in the workforce. As presentational communication skills are critical in today's job market, Hispanic university students need to be trained to effectively develop and deliver presentational speeches. Web/technology enhanced training techniques…
The Role of Corticostriatal Systems in Speech Category Learning.
Yi, Han-Gyol; Maddox, W Todd; Mumford, Jeanette A; Chandrasekaran, Bharath
2016-04-01
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning.
Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model
Rallapalli, Varsha H.
2016-01-01
Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the envelope signal-to-noise ratio (SNR_ENV) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNR_ENV has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding of speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNR_ENV. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating the feasibility of neural SNR_ENV computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted dependence on acoustic SNR suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNR_ENV in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.
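The core SNR_ENV computation is compact enough to sketch. The following Python fragment is a rough sketch assuming a Hilbert-envelope front end and FFT-based modulation bands (simplifications of the sEPSM's modulation filter bank); it computes per-band envelope SNR as (P_S+N - P_N) / P_N.

    import numpy as np
    from scipy.signal import hilbert

    def envelope_power(x, fs, f_lo, f_hi):
        # Power of the temporal envelope of x within one modulation band.
        env = np.abs(hilbert(x))
        env = env - env.mean()  # remove DC before measuring fluctuations
        spec = np.fft.rfft(env)
        freqs = np.fft.rfftfreq(len(env), 1.0 / fs)
        band = (freqs >= f_lo) & (freqs < f_hi)
        return np.sum(np.abs(spec[band]) ** 2) / len(env)

    def snr_env(speech_plus_noise, noise, fs,
                bands=((1, 2), (2, 4), (4, 8), (8, 16))):
        # Per-band SNR_ENV: envelope power above the noise floor, divided by
        # the noise-alone envelope power, floored at zero as in the sEPSM.
        out = []
        for f_lo, f_hi in bands:
            p_sn = envelope_power(speech_plus_noise, fs, f_lo, f_hi)
            p_n = envelope_power(noise, fs, f_lo, f_hi)
            out.append(max(p_sn - p_n, 0.0) / max(p_n, 1e-12))
        return out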
Hsieh, Ming-Yeh; Lynch, Georgina; Madison, Charles
2018-04-27
This study examined intervention techniques used with children with autism spectrum disorder (ASD) by speech-language pathologists (SLPs) in the United States and Taiwan working in clinic/hospital settings. The research questions addressed intervention techniques used with children with ASD, intervention techniques used with different age groups (under and over 8 years old), and training received before using the intervention techniques. The survey was distributed through the American Speech-Language-Hearing Association to selected SLPs across the United States. In Taiwan, the survey (Chinese version) was distributed through the Taiwan Speech-Language Pathologist Union to certified SLPs. Results revealed that SLPs in the United States and Taiwan used 4 common intervention techniques: Social Skill Training, Augmentative and Alternative Communication, Picture Exchange Communication System, and Social Stories. Taiwanese SLPs reported SLP preparation program training across these common intervention strategies. In the United States, SLPs reported training via SLP preparation programs, peer therapists, and self-teaching. Most SLPs reported using established or emerging evidence-based practices as defined by the National Professional Development Center (2014) and the National Standards Report (2015). Future research should address comparison of SLP preparation programs to examine the impact of preprofessional training on the use of evidence-based practices to treat ASD.
ERIC Educational Resources Information Center
Levy, Erika S.; Crowley, Catherine J.
2012-01-01
Speech-language pathology (SLP) training programs are the initial gateway for nonnative speakers of English to join the SLP profession. An anonymous web-based survey in New York State examined policies and practices implemented when SLP students have foreign accents in English or in other languages. Responses were elicited from 530 students and 28…
Musical training during early childhood enhances the neural encoding of speech in noise
Strait, Dana L.; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina
2012-01-01
For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child’s access to a target signal in noise. Given adult musicians’ perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and subcortical processing of speech in noise and related cognitive abilities in musician and nonmusician children that were matched for a variety of overarching factors. Outcomes reveal that musicians’ advantages for processing speech in noise are present during pivotal developmental years. Supported by correlations between auditory working memory and attention and auditory brainstem response properties, we propose that musicians’ perceptual and neural enhancements are driven in a top-down manner by strengthened cognitive abilities with training. Our results may be considered by professionals involved in the remediation of language-based learning deficits, which are often characterized by poor speech perception in noise. PMID:23102977
Paatsch, Louise E; Blamey, Peter J; Sarant, Julia Z; Bow, Catherine P
2006-01-01
A group of 21 hard-of-hearing and deaf children attending primary school were trained by their teachers on the production of selected consonants and on the meanings of selected words. Speech production, vocabulary knowledge, reading aloud, and speech perception measures were obtained before and after each type of training. The speech production training produced a small but significant improvement in the percentage of consonants correctly produced in words. The vocabulary training improved knowledge of word meanings substantially. Performance on speech perception and reading aloud was significantly improved by both types of training. These results were in accord with the predictions of a mathematical model put forward to describe the relationships between speech perception, speech production, and language measures in children (Paatsch, Blamey, Sarant, Martin, & Bow, 2004). These training data demonstrate that the relationships between the measures are causal. In other words, improvements in speech production and vocabulary performance produced by training will carry over into predictable improvements in speech perception and reading scores. Furthermore, the model will help educators identify the most effective methods of improving receptive and expressive spoken language for individual children who are deaf or hard of hearing.
ERIC Educational Resources Information Center
Chen, Howard Hao-Jan
2011-01-01
Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…
CLINIC-LABORATORY DESIGN BASED ON FUNCTION AND PHILOSOPHY AT PURDUE UNIVERSITY.
ERIC Educational Resources Information Center
HANLEY, T.D.; STEER, M.D.
THIS REPORT DESCRIBES THE DESIGN OF A NEW CLINIC AND LABORATORY FOR SPEECH AND HEARING TO ACCOMMODATE THE THREE BASIC PROGRAMS OF--(1) CLINICAL TRAINING OF UNDERGRADUATE AND GRADUATE STUDENT MAJORS, (2) SERVICES MADE AVAILABLE TO THE SPEECH AND HEARING HANDICAPPED, AND (3) RESEARCH IN SPEECH PATHOLOGY, AUDIOLOGY, PSYCHO-ACOUSTICS, AND…
ERIC Educational Resources Information Center
Camarata, Stephen; Yoder, Paul; Camarata, Mary
2006-01-01
Children with Down syndrome often display speech-comprehensibility and grammatical deficits beyond what would be predicted based upon general mental age. Historically, speech-comprehensibility has often been treated using traditional articulation therapy and oral-motor training so there may be little or no coordination of grammatical and…
LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS HOPKINS SUMMER WORKSHOP.
Hasegawa-Johnson, Mark; Baker, James; Borys, Sarah; Chen, Ken; Coogan, Emily; Greenberg, Steven; Juneja, Amit; Kirchhoff, Katrin; Livescu, Karen; Mohan, Srividya; Muller, Jennifer; Sonmez, Kemal; Wang, Tianyu
2005-01-01
Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
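The final combination step is a weighted sum of log scores. A minimal sketch in Python, with hypothetical score names and weights (in practice the weights would be tuned on held-out data):

    def log_linear_combination(log_scores, weights):
        # Combine per-model log-probability scores for one lattice hypothesis.
        return sum(weights[name] * log_scores[name] for name in log_scores)

    # Hypothetical scores for one word hypothesis in the first-pass lattice.
    scores = {"first_pass_am": -42.1, "landmark_svm": -17.3,
              "pronunciation_model": -5.8, "language_model": -12.0}
    weights = {"first_pass_am": 1.0, "landmark_svm": 0.6,
               "pronunciation_model": 0.8, "language_model": 1.2}
    print(log_linear_combination(scores, weights))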
End-to-End ASR-Free Keyword Search From Speech
NASA Astrophysics Data System (ADS)
Audhkhasi, Kartik; Rosenberg, Andrew; Sethy, Abhinav; Ramabhadran, Bhuvana; Kingsbury, Brian
2017-12-01
End-to-end (E2E) systems have achieved competitive results compared to conventional hybrid hidden Markov model (HMM)-deep neural network based automatic speech recognition (ASR) systems. Such E2E systems are attractive due to the lack of dependence on alignments between input acoustic and output grapheme or HMM state sequence during training. This paper explores the design of an ASR-free end-to-end system for text query-based keyword search (KWS) from speech trained with minimal supervision. Our E2E KWS system consists of three sub-systems. The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to reconstruct the audio through a finite-dimensional representation. The second sub-system is a character-level RNN language model using embeddings learned from a convolutional neural network. Since the acoustic and text query embeddings occupy different representation spaces, they are input to a third feed-forward neural network that predicts whether the query occurs in the acoustic utterance or not. This E2E ASR-free KWS system performs respectably despite lacking a conventional ASR system and trains much faster.
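The third sub-system is a plain feed-forward binary classifier over the concatenated embeddings. A minimal PyTorch sketch (embedding dimensions and layer sizes are assumptions, not the paper's configuration):

    import torch
    import torch.nn as nn

    class KWSDecider(nn.Module):
        # Predicts whether a text query occurs in an utterance, given the
        # fixed-dimensional acoustic and query embeddings.
        def __init__(self, audio_dim=256, query_dim=128, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(audio_dim + query_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def forward(self, audio_emb, query_emb):
            joint = torch.cat([audio_emb, query_emb], dim=-1)
            return torch.sigmoid(self.net(joint))

    model = KWSDecider()
    audio_emb = torch.randn(4, 256)  # from the RNN acoustic auto-encoder
    query_emb = torch.randn(4, 128)  # from the character RNN language model
    print(model(audio_emb, query_emb).shape)  # (4, 1) occurrence probabilities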
Bernhardt, May B; Bacsfalvi, Penelope; Adler-Bock, Marcy; Shimizu, Reiko; Cheney, Audrey; Giesbrecht, Nathan; O'connell, Maureen; Sirianni, Jason; Radanov, Bosko
2008-02-01
Ultrasound has shown promise as a visual feedback tool in speech therapy. Rural clients, however, often have minimal access to new technologies. The purpose of the current study was to evaluate consultative treatment using ultrasound in rural communities. Two speech-language pathologists (SLPs) trained in ultrasound use provided consultation with ultrasound in rural British Columbia to 13 school-aged children with residual speech impairments. Local SLPs provided treatment without ultrasound before and after the consultation. Speech samples were transcribed phonetically by independent trained listeners. Eleven children showed greater gains in production of the principal target phoneme after the ultrasound consultation. Four of the seven participants who received more consultation time with ultrasound showed the greatest improvement. Individual client factors also affected outcomes. The current study was a quasi-experimental clinic-based study. Larger, controlled experimental studies are needed to provide ultimate evaluation of the consultative use of ultrasound in speech therapy.
Deep neural network and noise classification-based speech enhancement
NASA Astrophysics Data System (ADS)
Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei
2017-07-01
In this paper, a speech enhancement method using noise classification and a Deep Neural Network (DNN) is proposed. A Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. The DNN was used to model the relationship between the noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. The GMM was trained on mel-frequency cepstral coefficients (MFCCs), with parameters estimated by an iterative expectation-maximization (EM) algorithm. The noise type was updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under stationary and non-stationary conditions.
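A rough sketch of the noise-classification front end, using scikit-learn's GaussianMixture in place of the authors' implementation; the noise types, component count, and placeholder MFCC data are assumptions:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    noise_types = ["babble", "factory", "street"]
    gmms = {}
    for i, name in enumerate(noise_types):
        # Placeholder MFCC frames from speech-absent segments of each noise.
        mfcc_frames = rng.normal(loc=i, size=(500, 13))
        gmms[name] = GaussianMixture(n_components=8, random_state=0).fit(mfcc_frames)

    def classify_noise(mfcc_frames):
        # Pick the noise type whose GMM gives the highest average log-likelihood.
        return max(gmms, key=lambda name: gmms[name].score(mfcc_frames))

    noise_type = classify_noise(rng.normal(loc=1, size=(50, 13)))
    # enhanced = dnn_models[noise_type](noisy_speech)  # select the matching DNN
    print(noise_type)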
Fraga González, Gorka; Žarić, Gojko; Tijms, Jurgen; Bonte, Milene; van der Molen, Maurits W.
2015-01-01
A recent account of dyslexia assumes that a failure to develop automated letter-speech sound integration might be responsible for the observed lack of reading fluency. This study uses a pre-test-training-post-test design to evaluate the effects of a training program based on letter-speech sound associations with a special focus on gains in reading fluency. A sample of 44 children with dyslexia and 23 typical readers, aged 8 to 9, was recruited. Children with dyslexia were randomly allocated to either the training program group (n = 23) or a waiting-list control group (n = 21). The training intensively focused on letter-speech sound mapping and consisted of 34 individual sessions of 45 minutes over a five-month period. The children with dyslexia showed substantial reading gains for the main word reading and spelling measures after training, improving at a faster rate than typical readers and waiting-list controls. The results are interpreted within the conceptual framework assuming a multisensory integration deficit as the most proximal cause of dysfluent reading in dyslexia. Trial Registration: ISRCTN register ISRCTN12783279 PMID:26629707
Working memory training to improve speech perception in noise across languages
Ingvalson, Erin M.; Dhar, Sumitrajit; Wong, Patrick C. M.; Liu, Hanjun
2015-01-01
Working memory capacity has been linked to performance on many higher cognitive tasks, including the ability to perceive speech in noise. Current efforts to train working memory have demonstrated that working memory performance can be improved, suggesting that working memory training may lead to improved speech perception in noise. A further advantage of working memory training to improve speech perception in noise is that working memory training materials are often simple, such as letters or digits, making them easily translatable across languages. The current effort tested the hypothesis that working memory training would be associated with improved speech perception in noise and that materials would easily translate across languages. Native Mandarin Chinese and native English speakers completed ten days of reversed digit span training. Reading span and speech perception in noise both significantly improved following training, whereas untrained controls showed no gains. These data suggest that working memory training may be used to improve listeners' speech perception in noise and that the materials may be quickly adapted to a wide variety of listeners. PMID:26093435
ERIC Educational Resources Information Center
Harrison, Emily; Wood, Clare; Holliman, Andrew J.; Vousden, Janet I.
2018-01-01
Despite empirical evidence of a relationship between sensitivity to speech rhythm and reading, there have been few studies that have examined the impact of rhythmic training on reading attainment, and no intervention study has focused on speech rhythm sensitivity specifically to enhance reading skills. Seventy-three typically developing 4- to…
Nan, Yun; Liu, Li; Geiser, Eveline; Shu, Hua; Gong, Chen Chen; Dong, Qi; Gabrieli, John D E; Desimone, Robert
2018-06-25
Musical training confers advantages in speech-sound processing, which could play an important role in early childhood education. To understand the mechanisms of this effect, we used event-related potential and behavioral measures in a longitudinal design. Seventy-four Mandarin-speaking children aged 4-5 years were pseudorandomly assigned to piano training, reading training, or a no-contact control group. Six months of piano training improved behavioral auditory word discrimination in general, as well as word discrimination based on vowels, compared with the controls. The reading group yielded similar trends. However, the piano group demonstrated unique advantages over the reading and control groups in consonant-based word discrimination and in enhanced positive mismatch responses (pMMRs) to lexical tone and musical pitch changes. The improved word discrimination based on consonants correlated with the enhancements in musical pitch pMMRs among the children in the piano group. In contrast, all three groups improved equally on general cognitive measures, including tests of IQ, working memory, and attention. The results suggest strengthened common sound processing across domains as an important mechanism underlying the benefits of musical training on language processing. In addition, although we failed to find far-transfer effects of musical training to general cognition, the near-transfer effects to speech perception establish the potential for musical training to help children improve their language skills. Piano training was not inferior to reading training on direct tests of language function, and it even seemed superior to reading training in enhancing consonant discrimination.
Lim, Hayoung A; Draper, Ellary
2011-01-01
This study compared a common form of the Applied Behavior Analysis Verbal Behavior (ABA VB) approach and music incorporated with the ABA VB method as part of developmental speech-language training in the speech production of children with Autism Spectrum Disorders (ASD). This study explored how the perception of musical patterns incorporated in ABA VB operants impacted the production of speech in children with ASD. Participants were 22 children with ASD, age range 3 to 5 years, who were verbal or preverbal with presence of immediate echolalia. They were randomly assigned a set of target words for each of the 3 training conditions: (a) music-incorporated ABA VB, (b) speech (ABA VB) and (c) no training. Results showed both music and speech trainings were effective for production of the four ABA verbal operants; however, the difference between music and speech training was not statistically significant. Results also indicated that music-incorporated ABA VB training was most effective in echoic production, and speech training was most effective in tact production. Music can be incorporated into the ABA VB training method, and musical stimuli can be used as successfully as ABA VB speech training to enhance functional verbal production in children with ASD.
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.
Schafer, Phillip B; Jin, Dezhe Z
2014-03-01
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
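The template-based decoder reduces to a longest-common-subsequence comparison. A minimal sketch in Python, with spike sequences abstracted as strings of feature-detector IDs; the normalization choice is an assumption rather than the paper's exact similarity measure:

    def lcs_length(a, b):
        # Longest common subsequence length via dynamic programming.
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a, 1):
            for j, y in enumerate(b, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
        return dp[-1][-1]

    def recognize(test_seq, templates):
        # Pick the word whose clean-speech template spike sequence is most
        # similar to the test sequence.
        def sim(t):
            return lcs_length(test_seq, t) / max(len(test_seq), len(t))
        return max(templates, key=lambda word: sim(templates[word]))

    templates = {"one": "abcfgd", "two": "acdeeg"}
    print(recognize("abcfgx", templates))  # -> "one"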
Auditory Perceptual Learning in Adults with and without Age-Related Hearing Loss
Karawani, Hanin; Bitan, Tali; Attias, Joseph; Banai, Karen
2016-01-01
Introduction: Speech recognition in adverse listening conditions becomes more difficult as we age, particularly for individuals with age-related hearing loss (ARHL). Whether these difficulties can be eased with training remains debated, because it is not clear whether the outcomes are sufficiently general to be of use outside of the training context. The aim of the current study was to compare training-induced learning and generalization between normal-hearing older adults and those with ARHL. Methods: Fifty-six listeners (60–72 y/o) participated in the study: 35 with ARHL and 21 normal-hearing adults. The study used a crossover design with three groups (immediate-training, delayed-training, and no-training). Trained participants received 13 sessions of home-based auditory training over the course of 4 weeks. Three adverse listening conditions were targeted: (1) speech in noise, (2) time-compressed speech, and (3) competing speakers, and the outcomes of training were compared between the normal-hearing and ARHL groups. Pre- and post-test sessions were completed by all participants. Outcome measures included tests on all of the trained conditions as well as on a series of untrained conditions designed to assess the transfer of learning to other speech and non-speech conditions. Results: Significant improvements on all trained conditions were observed in both the ARHL and normal-hearing groups over the course of training. Normal-hearing participants learned more than participants with ARHL in the speech-in-noise condition, but showed similar patterns of learning in the other conditions. Greater pre- to post-test changes were observed in trained than in untrained listeners on all trained conditions. In addition, the ability of trained listeners from the ARHL group to discriminate minimally different pseudowords in noise also improved with training. Conclusions: ARHL did not preclude auditory perceptual learning, but there was little generalization to untrained conditions. We suggest that most training-related changes occurred at higher-level, task-specific cognitive processes in both groups. However, these were enhanced by high-quality perceptual representations in the normal-hearing group. In contrast, some training-related changes also occurred at the level of phonemic representations in the ARHL group, consistent with an interaction between bottom-up and top-down processes. PMID:26869944
Audiovisual cues and perceptual learning of spectrally distorted speech.
Pilling, Michael; Thomas, Sharon
2011-12-01
Two experiments investigate the effectiveness of audiovisual (AV) speech cues (cues derived from both seeing and hearing a talker speak) in facilitating perceptual learning of spectrally distorted speech. Speech was distorted through an eight-channel noise-vocoder which shifted the spectral envelope of the speech signal to simulate the properties of a cochlear implant with a 6 mm place mismatch. Experiment 1 found that participants showed significantly greater improvement in perceiving noise-vocoded speech when training gave AV cues than when it gave auditory cues alone. Experiment 2 compared training with AV cues with training which gave written feedback. These two methods did not significantly differ in the pattern of training they produced. Suggestions are made about the types of circumstances in which the two training methods might be found to differ in facilitating auditory perceptual learning of speech.
Towards Artificial Speech Therapy: A Neural System for Impaired Speech Segmentation.
Iliya, Sunday; Neri, Ferrante
2016-09-01
This paper presents a neural system-based technique for segmenting short impaired speech utterances into silent, unvoiced, and voiced sections. Moreover, the proposed technique identifies those points of the (voiced) speech where the spectrum becomes steady. The resulting technique thus aims at detecting the limited section of the speech which contains the information about the potential impairment of the speech. This section is of interest to the speech therapist as it corresponds to the possibly incorrect movements of speech organs (lower lip and tongue with respect to the vocal tract). Two segmentation models to detect and identify the various sections of the disordered (impaired) speech signals have been developed and compared. The first makes use of a combination of four artificial neural networks. The second is based on a support vector machine (SVM). The SVM has been trained by means of an ad hoc nested algorithm whose outer layer is a metaheuristic while the inner layer is a convex optimization algorithm. Several metaheuristics have been tested and compared, leading to the conclusion that some variants of the compact differential evolution (CDE) algorithm appear to be well-suited to address this problem. Numerical results show that the SVM model with a radial basis function is capable of effective detection of the portion of speech that is of interest to a therapist. The best performance has been achieved when the system is trained by the nested algorithm whose outer layer is a hybrid population-based/CDE algorithm. A population-based approach displays the best performance for the isolation of silence/noise sections and the detection of unvoiced sections. On the other hand, a compact approach appears to be clearly well-suited to detect the beginning of the steady state of the voiced signal. Both of the proposed segmentation models outperformed two modern segmentation techniques based on Gaussian mixture models and deep learning.
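For intuition, the SVM stage can be approximated in a few lines, with scikit-learn's RBF-kernel SVC standing in for the authors' metaheuristically trained SVM; the two frame features (log energy and zero-crossing rate) and the synthetic training data are simplifying assumptions:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(1)
    # Synthetic per-frame features: [log energy, zero-crossing rate].
    X_train = np.vstack([
        rng.normal([-8.0, 0.05], 0.5, (100, 2)),  # silence: low energy
        rng.normal([-3.0, 0.45], 0.5, (100, 2)),  # unvoiced: high ZCR
        rng.normal([-1.0, 0.08], 0.5, (100, 2)),  # voiced: high energy, low ZCR
    ])
    y_train = np.repeat([0, 1, 2], 100)  # 0=silence, 1=unvoiced, 2=voiced

    clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_train, y_train)
    frames = rng.normal([-1.2, 0.07], 0.5, (5, 2))
    print(clf.predict(frames))  # per-frame section labels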
Evidence-Based Speech-Language Pathology Practices in Schools: Findings from a National Survey
ERIC Educational Resources Information Center
Hoffman, LaVae M.; Ireland, Marie; Hall-Mills, Shannon; Flynn, Perry
2013-01-01
Purpose: This study documented evidence-based practice (EBP) patterns as reported by speech-language pathologists (SLPs) employed in public schools during 2010-2011. Method: Using an online survey, practioners reported their EBP training experiences, resources available in their workplaces, and the frequency with which they engage in specific EBP…
Robust estimators for speech enhancement in real environments
NASA Astrophysics Data System (ADS)
Sandoval-Ibarra, Yuma; Diaz-Ramirez, Victor H.; Kober, Vitaly
2015-09-01
Common statistical estimators for speech enhancement rely on several assumptions about the stationarity of speech signals and noise. These assumptions may not always be valid in real life due to the nonstationary characteristics of speech and noise processes. We propose new estimators, based on existing ones, that incorporate the computation of rank-order statistics. The proposed estimators are better adapted to the non-stationary characteristics of speech signals and noise processes. Through computer simulations we show that the proposed estimators yield a better performance in terms of objective metrics than that of known estimators when speech signals are contaminated with airport, babble, restaurant, and train-station noise.
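One way to read "incorporation of rank-order statistics" is to replace the usual running-mean noise estimate with a per-bin median over frames, which resists nonstationary outliers. The sketch below wires such an estimator into plain spectral subtraction; this is an illustrative reading, not the authors' exact estimators:

    import numpy as np

    def rank_order_noise_estimate(noisy_mag, quantile=0.5):
        # Per-frequency-bin rank-order statistic (median by default) over
        # frames, used as the noise magnitude estimate.
        return np.quantile(noisy_mag, quantile, axis=1, keepdims=True)

    def spectral_subtraction(noisy_mag, alpha=1.0, floor=0.05):
        noise_mag = rank_order_noise_estimate(noisy_mag)
        clean_mag = noisy_mag - alpha * noise_mag
        return np.maximum(clean_mag, floor * noisy_mag)  # spectral floor

    # noisy_mag: |STFT| magnitudes, shape (freq_bins, frames).
    noisy_mag = np.abs(np.random.default_rng(2).normal(size=(257, 100)))
    print(spectral_subtraction(noisy_mag).shape)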
Centanni, Tracy M.; Chen, Fuyi; Booker, Anne M.; Engineer, Crystal T.; Sloan, Andrew M.; Rennaker, Robert L.; LoTurco, Joseph J.; Kilgard, Michael P.
2014-01-01
In utero RNAi of the dyslexia-associated gene Kiaa0319 in rats (KIA-) degrades cortical responses to speech sounds and increases trial-by-trial variability in onset latency. We tested the hypothesis that KIA- rats would be impaired at speech sound discrimination. KIA- rats needed twice as much training in quiet conditions to perform at control levels and remained impaired at several speech tasks. Focused training using truncated speech sounds was able to normalize speech discrimination in quiet and background noise conditions. Training also normalized trial-by-trial neural variability and temporal phase locking. Cortical activity from speech-trained KIA- rats was sufficient to accurately discriminate between similar consonant sounds. These results provide the first direct evidence that reduced expression of the dyslexia-associated gene KIAA0319 can cause phoneme processing impairments similar to those seen in dyslexia and that intensive behavioral therapy can eliminate these impairments. PMID:24871331
Adaptation to spectrally-rotated speech.
Green, Tim; Rosen, Stuart; Faulkner, Andrew; Paterson, Ruth
2013-08-01
Much recent interest surrounds listeners' abilities to adapt to various transformations that distort speech. An extreme example is spectral rotation, in which the spectrum of low-pass filtered speech is inverted around a center frequency (2 kHz here). Spectral shape and its dynamics are completely altered, rendering speech virtually unintelligible initially. However, intonation, rhythm, and contrasts in periodicity and aperiodicity are largely unaffected. Four normal hearing adults underwent 6 h of training with spectrally-rotated speech using Continuous Discourse Tracking. They and an untrained control group completed pre- and post-training speech perception tests, for which talkers differed from the training talker. Significantly improved recognition of spectrally-rotated sentences was observed for trained, but not untrained, participants. However, there were no significant improvements in the identification of medial vowels in /bVd/ syllables or intervocalic consonants. Additional tests were performed with speech materials manipulated so as to isolate the contribution of various speech features. These showed that preserving intonational contrasts did not contribute to the comprehension of spectrally-rotated speech after training, and suggested that improvements involved adaptation to altered spectral shape and dynamics, rather than just learning to focus on speech features relatively unaffected by the transformation.
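Spectral rotation itself is a one-screen transform: low-pass the speech, ring-modulate with a carrier at the band edge, and low-pass again, which mirrors the band around half the band edge (2 kHz for a 4 kHz band). A sketch with scipy; the filter order and band edge are assumptions consistent with the abstract:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def spectrally_rotate(x, fs, band_edge=4000.0):
        # Low-pass to [0, band_edge], multiply by a band_edge carrier so a
        # component at f lands at band_edge - f, then low-pass again to keep
        # only the mirrored band. Doubling restores the amplitude lost in
        # the cosine modulation.
        sos = butter(8, band_edge, btype="low", fs=fs, output="sos")
        x_lp = sosfiltfilt(sos, x)
        t = np.arange(len(x)) / fs
        return 2 * sosfiltfilt(sos, x_lp * np.cos(2 * np.pi * band_edge * t))

    fs = 16000
    x = np.random.default_rng(3).normal(size=fs)  # stand-in for 1 s of speech
    y = spectrally_rotate(x, fs)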
Temporal plasticity in auditory cortex improves neural discrimination of speech sounds
Engineer, Crystal T.; Shetake, Jai A.; Engineer, Navzer D.; Vrana, Will A.; Wolf, Jordan T.; Kilgard, Michael P.
2017-01-01
Background: Many individuals with language learning impairments exhibit temporal processing deficits and degraded neural responses to speech sounds. Auditory training can improve both the neural and behavioral deficits, though significant deficits remain. Recent evidence suggests that vagus nerve stimulation (VNS) paired with rehabilitative therapies enhances both cortical plasticity and recovery of normal function. Objective/Hypothesis: We predicted that pairing VNS with rapid tone trains would enhance the primary auditory cortex (A1) response to unpaired novel speech sounds. Methods: VNS was paired with tone trains 300 times per day for 20 days in adult rats. Responses to isolated speech sounds, compressed speech sounds, word sequences, and compressed word sequences were recorded in A1 following the completion of VNS-tone train pairing. Results: Pairing VNS with rapid tone trains resulted in stronger, faster, and more discriminable A1 responses to speech sounds presented at conversational rates. Conclusion: This study extends previous findings by documenting that VNS paired with rapid tone trains altered the neural response to novel unpaired speech sounds. Future studies are necessary to determine whether pairing VNS with appropriate auditory stimuli could potentially be used to improve both neural responses to speech sounds and speech perception in individuals with receptive language disorders. PMID:28131520
Team Training through Communications Control
1982-02-01
Topics include: training; the operational environment; team training research issues; the training approach; team communications; and models of operator behavior. … on the market soon, it certainly would be investigated carefully for its applicability to the team training problem. … c. A text-to-speech voice generation system. Votrax has recently marketed such a device, and others may soon follow suit. d. A speech replay system designed to produce speech from …
Cleft audit protocol for speech (CAPS-A): a comprehensive training package for speech analysis.
Sell, D; John, A; Harding-Bell, A; Sweeney, T; Hegarty, F; Freeman, J
2009-01-01
The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been paid to this issue. To design, execute, and evaluate a training programme for speech and language therapists on the systematic and reliable use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A), addressing issues of standardized speech samples, data acquisition, recording, playback, and listening guidelines. Thirty-six specialist speech and language therapists undertook the training programme over four days. This consisted of two days' training on the CAPS-A tool followed by a third day, making independent ratings and transcriptions on ten new cases which had been previously recorded during routine audit data collection. This task was repeated on day 4, a minimum of one month later. Ratings were made using the CAPS-A record form with the CAPS-A definition table. An analysis was made of the speech and language therapists' CAPS-A ratings at occasion 1 and occasion 2 and the intra- and inter-rater reliability calculated. Trained therapists showed consistency in individual judgements on specific sections of the tool. Intraclass correlation coefficients were calculated for each section with good agreement on eight of 13 sections. There were only fair levels of agreement on anterior oral cleft speech characteristics, non-cleft errors/immaturities and voice. This was explained, at least in part, by their low prevalence which affects the calculation of the intraclass correlation coefficient statistic. Speech and language therapists benefited from training on the CAPS-A, focusing on specific aspects of speech using definitions of parameters and scalar points, in order to apply the tool systematically and reliably. Ratings are enhanced by ensuring a high degree of attention to the nature of the data, standardizing the speech sample, data acquisition, the listening process together with the use of high-quality recording and playback equipment. In addition, a method is proposed for maintaining listening skills following training as part of an individual's continuing education.
Training in the Community-Collaborative Context: A Case Study
ERIC Educational Resources Information Center
Yamada, Racquel-María
2014-01-01
Emerging community-based methodologies call for collaboration with speech community members. Although motivated, community members may lack the tools or training to contribute actively. In response, many linguists deliver training workshops in documentation or preservation, while others train community members to record data. Although workshops…
Asynchronous sampling of speech with some vocoder experimental results
NASA Technical Reports Server (NTRS)
Babcock, M. L.
1972-01-01
The method of asynchronously sampling speech is based upon the derivatives of the acoustical speech signal. The following results are apparent from experiments to date: (1) It is possible to represent speech by a string of pulses of uniform amplitude, where the only information contained in the string is the spacing of the pulses in time; (2) the string of pulses may be produced in a simple analog manner; (3) the first derivative of the original speech waveform is the most important for the encoding process; (4) the resulting pulse train can be utilized to control an acoustical signal production system to regenerate the intelligence of the original speech.
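One plausible reading of result (1), sketched in Python: place a uniform-amplitude pulse at every zero crossing of the first derivative (every local extremum of the waveform), so that all information lives in the pulse timing. The threshold-free extremum rule is an assumption for illustration, not necessarily the encoding used in the report:

    import numpy as np

    def derivative_pulse_train(x, fs):
        # Pulse times (in seconds) at zero crossings of the first
        # derivative, i.e. at local extrema of the speech waveform.
        dx = np.diff(x)
        sign_change = np.signbit(dx[:-1]) != np.signbit(dx[1:])
        return (np.flatnonzero(sign_change) + 1) / fs

    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 200 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
    print(derivative_pulse_train(x, fs)[:5])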
Bernstein, Lynne E.; Eberhardt, Silvio P.; Auer, Edward T.
2014-01-01
Training with audiovisual (AV) speech has been shown to promote auditory perceptual learning of vocoded acoustic speech by adults with normal hearing. In Experiment 1, we investigated whether AV speech promotes auditory-only (AO) perceptual learning in prelingually deafened adults with late-acquired cochlear implants. Participants were assigned to learn associations between spoken disyllabic C(=consonant)V(=vowel)CVC non-sense words and non-sense pictures (fribbles), under AV and then AO (AV-AO; or counter-balanced AO then AV, AO-AV, during Periods 1 then 2) training conditions. After training on each list of paired-associates (PA), testing was carried out AO. Across all training, AO PA test scores improved (7.2 percentage points) as did identification of consonants in new untrained CVCVC stimuli (3.5 percentage points). However, there was evidence that AV training impeded immediate AO perceptual learning: During Period-1, training scores across AV and AO conditions were not different, but AO test scores were dramatically lower in the AV-trained participants. During Period-2 AO training, the AV-AO participants obtained significantly higher AO test scores, demonstrating their ability to learn the auditory speech. Across both orders of training, whenever training was AV, AO test scores were significantly lower than training scores. Experiment 2 repeated the procedures with vocoded speech and 43 normal-hearing adults. Following AV training, their AO test scores were as high as or higher than following AO training. Also, their CVCVC identification scores patterned differently than those of the cochlear implant users. In Experiment 1, initial consonants were most accurate, and in Experiment 2, medial consonants were most accurate. We suggest that our results are consistent with a multisensory reverse hierarchy theory, which predicts that, whenever possible, perceivers carry out perceptual tasks immediately based on the experience and biases they bring to the task. We point out that while AV training could be an impediment to immediate unisensory perceptual learning in cochlear implant patients, it was also associated with higher scores during training. PMID:25206344
Speech reconstruction using a deep partially supervised neural network.
McLoughlin, Ian; Li, Jingjie; Song, Yan; Sharifzadeh, Hamid R
2017-08-01
Statistical speech reconstruction for larynx-related dysphonia has achieved good performance using Gaussian mixture models and, more recently, restricted Boltzmann machine arrays; however, deep neural network (DNN)-based systems have been hampered by the limited amount of training data available from individual voice-loss patients. The authors propose a novel DNN structure that allows a partially supervised training approach on spectral features from smaller data sets, yielding very good results compared with the current state-of-the-art.
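A minimal PyTorch sketch of one partially supervised recipe consistent with the abstract (the exact network structure is an assumption): pretrain an encoder as an autoencoder on plentiful unlabeled spectral frames, then fit a small supervised head on the limited paired frames from an individual patient.

    import torch
    import torch.nn as nn

    enc = nn.Sequential(nn.Linear(257, 64), nn.Tanh())   # shared encoder
    dec = nn.Sequential(nn.Linear(64, 257))              # autoencoder decoder
    head = nn.Sequential(nn.Linear(64, 257))             # supervised regressor
    mse = nn.MSELoss()

    # Stage 1: unsupervised pretraining on unlabeled spectral frames.
    unlabeled = torch.randn(1024, 257)
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        loss = mse(dec(enc(unlabeled)), unlabeled)
        loss.backward()
        opt.step()

    # Stage 2: supervised fine-tuning on a small paired set (encoder frozen).
    paired_in, paired_out = torch.randn(64, 257), torch.randn(64, 257)
    opt2 = torch.optim.Adam(head.parameters(), lr=1e-3)
    for _ in range(100):
        opt2.zero_grad()
        loss = mse(head(enc(paired_in).detach()), paired_out)
        loss.backward()
        opt2.step()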
Automated Intelligibility Assessment of Pathological Speech Using Phonological Features
NASA Astrophysics Data System (ADS)
Middag, Catherine; Martens, Jean-Pierre; Van Nuffelen, Gwen; De Bodt, Marc
2009-12-01
It is commonly acknowledged that word or phoneme intelligibility is an important criterion in the assessment of the communication efficiency of a pathological speaker. People have therefore put a lot of effort in the design of perceptual intelligibility rating tests. These tests usually have the drawback that they employ unnatural speech material (e.g., nonsense words) and that they cannot fully exclude errors due to listener bias. Therefore, there is a growing interest in the application of objective automatic speech recognition technology to automate the intelligibility assessment. Current research is headed towards the design of automated methods which can be shown to produce ratings that correspond well with those emerging from a well-designed and well-performed perceptual test. In this paper, a novel methodology that is built on previous work (Middag et al., 2008) is presented. It utilizes phonological features, automatic speech alignment based on acoustic models that were trained on normal speech, context-dependent speaker feature extraction, and intelligibility prediction based on a small model that can be trained on pathological speech samples. The experimental evaluation of the new system reveals that the root mean squared error of the discrepancies between perceived and computed intelligibilities can be as low as 8 on a scale of 0 to 100.
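The last stage, a small model trained on pathological samples, can be pictured as a regularized linear regression from per-speaker feature statistics to a perceptual intelligibility rating. A sketch with placeholder data; the feature dimensionality and model choice are assumptions:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(4)
    speaker_features = rng.normal(size=(40, 20))  # per-speaker phonological stats
    ratings = rng.uniform(20, 95, size=40)        # perceptual intelligibility, 0-100

    model = Ridge(alpha=1.0).fit(speaker_features[:30], ratings[:30])
    pred = model.predict(speaker_features[30:])
    rmse = mean_squared_error(ratings[30:], pred) ** 0.5
    print(f"RMSE on held-out speakers: {rmse:.1f}")  # the paper reports values as low as 8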
Serpanos, Yula C; Senzer, Deborah
2015-05-01
This study presents a piloted training model of experiential instruction in outer and middle ear (OE-ME) screening for graduate speech-language pathology students, with peer teaching by doctor of audiology (AuD) students. Six individual experiential training sessions in screening otoscopy and tympanometry were conducted for 36 graduate-level speech-language pathology students, led by a supervised AuD student. Following the experiential training, survey outcomes from 24 speech-language pathology students revealed a significant improvement (p = .01) in perceptions of attaining adequate knowledge and comfort in performing screening otoscopy (handheld and video otoscopy) and tympanometry. In a group of matched controls who did not receive experiential training in OE-ME screening (n = 24), ratings on the same learning outcomes survey in otoscopy and tympanometry were significantly poorer (p = .01) compared with students who did receive experiential training. A training model of experiential instruction for speech-language pathology students by AuD students improved learning outcomes, illustrating its promise for informing clinical training practices. The instructional model also meets the Council on Academic Accreditation in Audiology and Speech-Language Pathology (CAA; American Speech-Language-Hearing Association, 2008) and American Speech-Language-Hearing Association (2014) Certificate of Clinical Competence (ASHA CCC) standards for speech-language pathology in OE-ME screening, and CAA (2008) and ASHA (2012) CCC standards for the supervisory process in audiology.
Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis
ERIC Educational Resources Information Center
Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.
2009-01-01
Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
A multilingual audiometer simulator software for training purposes.
Kompis, Martin; Steffen, Pascal; Caversaccio, Marco; Brugger, Urs; Oesch, Ivo
2012-04-01
A set of algorithms that allows a computer to determine the answers of simulated patients during pure tone and speech audiometry is presented. Based on these algorithms, a computer program for training in audiometry was written and found to be useful for teaching purposes. The aim was to develop a flexible audiometer simulator software as a teaching and training tool for pure tone and speech audiometry, both with and without masking. First, a set of algorithms that allows a computer to determine the answers of a simulated, hearing-impaired patient was developed. Then the software was implemented. Extensive use was made of simple, editable text files to define all texts in the user interface and all patient definitions. The software 'audiometer simulator' is available for free download. It can be used to train pure tone audiometry (both with and without masking), speech audiometry, measurement of the uncomfortable level, and simple simulation tests. Owing to the use of text files, the user can alter or add patient definitions as well as all texts and labels shown on the screen. So far, English, French, German, and Portuguese user interfaces are available, and the user can choose between German and French speech audiometry.
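The heart of such a simulator is the answer model. A minimal sketch of one plausible design: a logistic psychometric function around the simulated patient's true threshold, driven by a simplified down-10/up-5 staircase (the slope value and stopping rule are assumptions, not the published algorithms):

    import random

    def patient_hears(level_db, threshold_db, slope_db=5.0):
        # Probability of a 'yes' rises smoothly from 0 to 1 around threshold.
        p_yes = 1.0 / (1.0 + 10 ** ((threshold_db - level_db) / slope_db))
        return random.random() < p_yes

    def staircase(threshold_db, start_db=40):
        # Simplified Hughson-Westlake: down 10 dB after a response,
        # up 5 dB after no response; stop at 2 responses at one level.
        level, yes_counts = start_db, {}
        for _ in range(50):
            if patient_hears(level, threshold_db):
                yes_counts[level] = yes_counts.get(level, 0) + 1
                if yes_counts[level] >= 2:
                    return level
                level -= 10
            else:
                level += 5
        return level

    random.seed(0)
    print(staircase(threshold_db=35))  # estimated threshold in dB HL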
Video Game Rehabilitation of Velopharyngeal Dysfunction: A Case Series.
Cler, Gabriel J; Mittelman, Talia; Braden, Maia N; Woodnorth, Geralyn Harvey; Stepp, Cara E
2017-06-22
Video games provide a promising platform for rehabilitation of speech disorders. Although video games have been used to train speech perception in foreign language learners and have been proposed for aural rehabilitation, their use in speech therapy has been limited thus far. We present feasibility results from at-home use in a case series of children with velopharyngeal dysfunction (VPD) using an interactive video game that provided real-time biofeedback to facilitate appropriate nasalization. Five participants were recruited across a range of ages, VPD severities, and VPD etiologies. Participants completed multiple weeks of individual game play with a video game that provides feedback on nasalization measured via nasal accelerometry. Nasalization was assessed before and after training by using nasometry, aerodynamic measures, and expert perceptual judgments. Four participants used the game at home or school; the fifth was unwilling to have the nasal accelerometer secured to his nasal skin, perhaps due to his young age. The four trained participants showed a tendency toward decreased nasalization after training, particularly for the words explicitly trained in the video game. Results suggest that video game-based systems may provide a useful rehabilitation platform for providing real-time feedback of speech nasalization in VPD. https://doi.org/10.23641/asha.5116828.
Abdul Aziz, Safiyyah; Fletcher, Janet; Bayliss, Donna M
2016-08-01
Self-regulatory speech has been shown to be important for the planning and problem solving of children. Our intervention study, including comparisons to both wait-list and typically developing controls, examined the effectiveness of a training programme designed to improve self-regulatory speech, and consequently the planning and problem solving performance, of 87 (60 males, 27 females) children aged 4-7 years with Specific Language Impairment (SLI) who were delayed in their self-regulatory speech development. The self-regulatory speech and Tower of London (TOL) performance of children with SLI who received the intervention initially or after a waiting period was compared with that of 80 (48 male, 32 female) typically developing children who did not receive any intervention. Children were tested at three time points: Time 1, prior to intervention; Time 2, after the first SLI group had received training and the second SLI group provided a wait-list control; and Time 3, when the second SLI group had received training. At Time 1, children with SLI produced less self-regulatory speech and were impaired on the TOL relative to the typically developing children. At Time 2, the TOL performance of children with SLI in the first training group improved significantly, whereas there was no improvement for the second training group (the wait-list group). At Time 3, the second training group improved their TOL performance and the first group maintained their performance. No significant differences in TOL performance were evident between typically developing children and those with SLI at Time 3. Moreover, decreases in social speech and increases in inaudible muttering following self-regulatory speech training were associated with improvements in TOL performance. Together, the results show that self-regulatory speech training was effective in increasing self-regulatory speech and in improving planning and problem solving performance in children with SLI.
Deep Learning Based Binaural Speech Separation in Reverberant Environments.
Zhang, Xueliang; Wang, DeLiang
2017-05-01
Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal from binaural inputs in reverberant conditions. Binaural separation is formulated as a supervised learning problem, and we employ deep learning to map from both spatial and spectral features to a training target. With binaural inputs, we first apply a fixed beamformer and then extract several spectral features. A new spatial feature is proposed and extracted to complement the spectral features. The training target is the recently suggested ideal ratio mask. Systematic evaluations and comparisons show that the proposed system achieves very good separation performance and substantially outperforms related algorithms in challenging multi-source and reverberant environments.
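For readers wanting a concrete picture of the ideal-ratio-mask training target mentioned in this abstract, the following minimal Python sketch (our own illustration, not the authors' code; the function name, toy data, and magnitude-domain mixing are simplifying assumptions) computes an IRM from separately known speech and noise magnitudes and applies it to a mixture:

```python
# Illustrative sketch: computing an ideal ratio mask (IRM) training target
# from separately available speech and noise magnitude spectrograms, then
# applying the mask to a mixture.
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """IRM in each time-frequency unit: (S^2 / (S^2 + N^2))^beta."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return (s2 / (s2 + n2 + 1e-12)) ** beta

# Toy magnitude spectrograms: 257 frequency bins x 100 frames.
rng = np.random.default_rng(0)
speech = np.abs(rng.normal(size=(257, 100)))
noise = 0.5 * np.abs(rng.normal(size=(257, 100)))
mixture = speech + noise        # crude approximation; real mixing is complex-valued

irm = ideal_ratio_mask(speech, noise)
enhanced = irm * mixture        # mask-and-resynthesize step (magnitudes only)
print(irm.shape, enhanced.shape)
```

In a deployed system the mask would of course be predicted by the trained network from the spatial and spectral features, not computed from oracle speech and noise as here.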
Goehring, Tobias; Bolner, Federico; Monaghan, Jessica J M; van Dijk, Bas; Zarowski, Andrzej; Bleeck, Stefan
2017-02-01
Speech understanding in noisy environments is still one of the major challenges for cochlear implant (CI) users in everyday life. We evaluated a speech enhancement algorithm based on neural networks (NNSE) for improving speech intelligibility in noise for CI users. The algorithm decomposes the noisy speech signal into time-frequency units, extracts a set of auditory-inspired features and feeds them to the neural network to produce an estimate of which frequency channels contain more perceptually important information (higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated and retain speech-dominated CI channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 14 CI users using three types of background noise. Two NNSE algorithms were compared: a speaker-dependent algorithm, trained on the target speaker used for testing, and a speaker-independent algorithm, trained on different speakers. Significant improvements in the intelligibility of speech in stationary and fluctuating noises were found relative to the unprocessed condition for the speaker-dependent algorithm in all noise types and for the speaker-independent algorithm in 2 out of 3 noise types. The NNSE algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of SNRs. The proposed algorithm has the potential to improve the intelligibility of speech in noise for CI users while meeting the requirements of low computational complexity and processing delay for application in CI devices.
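The n-of-m channel-selection step described above can be sketched as follows. This is a hedged illustration with an invented function name (select_channels) and random stand-ins for the network's per-channel SNR estimates, not the NNSE implementation:

```python
# Sketch of n-of-m channel selection: a (stand-in) SNR estimate scores each
# CI frequency channel, and only the m best-scoring channels per frame are
# retained for electrical stimulation.
import numpy as np

def select_channels(envelopes, snr_estimates, m=8, attenuation=0.0):
    """Keep the m channels with highest predicted SNR; attenuate the rest.
    envelopes: (n_channels, n_frames); snr_estimates: same shape, e.g. from a NN."""
    out = np.full_like(envelopes, attenuation)
    for t in range(envelopes.shape[1]):
        keep = np.argsort(snr_estimates[:, t])[-m:]   # indices of the m best channels
        out[keep, t] = envelopes[keep, t]
    return out

rng = np.random.default_rng(1)
env = np.abs(rng.normal(size=(22, 50)))   # 22-channel CI envelopes (toy data)
snr = rng.normal(size=(22, 50))           # stand-in for the network's SNR estimates
stimulus = select_channels(env, snr, m=8)
print((stimulus > 0).sum(axis=0)[:5])     # 8 active channels per frame
```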
How Speech Communication Training Interfaces with Public Relations Training.
ERIC Educational Resources Information Center
Bosley, Phyllis B.
Speech communication training is a valuable asset for those entering the public relations (PR) field. This notion is reinforced by the 1987 "Design for Undergraduate Public Relations Education," a guide for implementing speech communication courses within a public relations curriculum, and also in the incorporation of oral communication training…
Survey of Speech-Language Pathology Graduate Program Training in Outer and Middle Ear Screening.
Serpanos, Yula C; Senzer, Deborah
2015-08-01
The purpose of this study was to determine the national training practices of speech-language pathology graduate programs in outer and middle ear screening. Directors of all American Speech-Language-Hearing Association-accredited speech-language pathology graduate programs (N = 254; Council on Academic Accreditation in Audiology and Speech-Language Pathology, 2013) were surveyed on instructional formats in outer and middle ear screening. The graduate speech-language pathology program survey yielded 84 (33.1%) responses. Results indicated that some programs do not provide any training in the areas of conventional screening otoscopy using a handheld otoscope (15.5%; n = 13) or screening tympanometry (11.9%; n = 10), whereas close to one half (46.4%; n = 39) reported no training in screening video otoscopy. Outcomes revealed that approximately one third or more of speech-language pathology graduate programs do not provide experiential opportunities in screening handheld otoscopy (36.9%) or tympanometry (32.1%), and most (78.6%) do not provide experiential opportunities in video otoscopy. The implication from the graduate speech-language pathology program survey findings is that some speech-language pathologists will graduate from academic programs without the acquired knowledge or experiential learning required to establish skill in 1 or more areas of screening otoscopy and tympanometry. Graduate speech-language pathology programs should consider appropriate training opportunities for students to acquire and demonstrate skill in outer and middle ear screening.
ERIC Educational Resources Information Center
Lu, Shuang
2013-01-01
The relationship between speech perception and production has been debated for a long time. The Motor Theory of speech perception (Liberman et al., 1989) claims that perceiving speech is identifying the intended articulatory gestures rather than perceiving the sound patterns. It seems to suggest that speech production precedes speech perception,…
ERIC Educational Resources Information Center
Zaccagnini, Cindy M.; Antia, Shirin D.
1993-01-01
This study of the effects of intensive multisensory speech training on the speech production of a profoundly hearing-impaired child (age nine) found that the addition of Visual Phonics hand cues did not result in speech production gains. All six target phonemes were generalized to new words and maintained after the intervention was discontinued.…
Automatic measurement of voice onset time using discriminative structured prediction.
Sonderegger, Morgan; Keshet, Joseph
2012-12-01
A discriminative large-margin algorithm for automatic measurement of voice onset time (VOT) is described, considered as a case of predicting structured output from speech. Manually labeled data are used to train a function that takes as input a speech segment of an arbitrary length containing a voiceless stop, and outputs its VOT. The function is explicitly trained to minimize the difference between predicted and manually measured VOT; it operates on a set of acoustic feature functions designed based on spectral and temporal cues used by human VOT annotators. The algorithm is applied to initial voiceless stops from four corpora, representing different types of speech. Using several evaluation methods, the algorithm's performance is near human intertranscriber reliability, and compares favorably with previous work. Furthermore, the algorithm's performance is minimally affected by training and testing on different corpora, and remains essentially constant as the amount of training data is reduced to 50-250 manually labeled examples, demonstrating the method's practical applicability to new datasets.
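As a rough illustration of the discriminative structured-prediction recipe (not the authors' algorithm or feature set), the sketch below scores each candidate VOT with a linear function of toy features and applies perceptron-style updates whenever the prediction misses the manual measurement by more than a margin:

```python
# Simplified structured-prediction sketch: predict VOT as the argmax of a
# linear score over candidate values, trained with margin-driven updates
# toward the manually measured VOT. The feature map is a toy stand-in for
# the paper's acoustically motivated feature functions.
import numpy as np

def features(segment, vot):
    """Toy feature map phi(x, vot): energy before/after the candidate VOT."""
    return np.array([segment[:vot].mean(), segment[vot:].mean(), vot / len(segment)])

def predict(w, segment, candidates):
    return max(candidates, key=lambda v: w @ features(segment, v))

rng = np.random.default_rng(2)
data = [(np.abs(rng.normal(size=200)), int(rng.integers(20, 80))) for _ in range(50)]
w = np.zeros(3)
for _ in range(5):                                  # a few training epochs
    for segment, true_vot in data:
        pred = predict(w, segment, range(10, 190))
        if abs(pred - true_vot) > 5:                # margin on the VOT error
            w += features(segment, true_vot) - features(segment, pred)
print(w)
```

The published method uses a proper large-margin objective rather than this perceptron-style update, but the prediction-by-scoring structure is the same.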
Biological impact of preschool music classes on processing speech in noise
Strait, Dana L.; Parbery-Clark, Alexandra; O’Connell, Samantha; Kraus, Nina
2013-01-01
Musicians have increased resilience to the effects of noise on speech perception and its neural underpinnings. We do not know, however, how early in life these enhancements arise. We compared auditory brainstem responses to speech in noise in 32 preschool children, half of whom were engaged in music training. Thirteen children returned for testing one year later, permitting the first longitudinal assessment of subcortical auditory function with music training. Results indicate emerging neural enhancements in musically trained preschoolers for processing speech in noise. Longitudinal outcomes reveal that children enrolled in music classes experience further increased neural resilience to background noise following one year of continued training compared to nonmusician peers. Together, these data reveal enhanced development of neural mechanisms undergirding speech-in-noise perception in preschoolers undergoing music training and may indicate a biological impact of music training on auditory function during early childhood. PMID:23872199
Comparison of Two Music Training Approaches on Music and Speech Perception in Cochlear Implant Users
Fuller, Christina D.; Galvin, John J.; Maat, Bert; Başkent, Deniz; Free, Rolien H.
2018-01-01
In normal-hearing (NH) adults, long-term music training may benefit music and speech perception, even when listening to spectro-temporally degraded signals as experienced by cochlear implant (CI) users. In this study, we compared two different music training approaches in CI users and their effects on speech and music perception, as it remains unclear which approach to music training might be best. The approaches differed in terms of music exercises and social interaction. For the pitch/timbre group, melodic contour identification (MCI) training was performed using computer software. For the music therapy group, training involved face-to-face group exercises (rhythm perception, musical speech perception, music perception, singing, vocal emotion identification, and music improvisation). For the control group, training involved group nonmusic activities (e.g., writing, cooking, and woodworking). Training consisted of weekly 2-hr sessions over a 6-week period. Speech intelligibility in quiet and noise, vocal emotion identification, MCI, and quality of life (QoL) were measured before and after training. The different training approaches appeared to offer different benefits for music and speech perception. Training effects were observed within-domain (better MCI performance for the pitch/timbre group), with little cross-domain transfer of music training (emotion identification significantly improved for the music therapy group). While training had no significant effect on QoL, the music therapy group reported better perceptual skills across training sessions. These results suggest that more extensive and intensive training approaches that combine pitch training with the social aspects of music therapy may further benefit CI users. PMID:29621947
Sullivan, Jessica R.; Thibodeau, Linda M.; Assmann, Peter F.
2013-01-01
Previous studies have indicated that individuals with normal hearing (NH) experience a perceptual advantage for speech recognition in interrupted noise compared to continuous noise. In contrast, adults with hearing impairment (HI) and younger children with NH receive a minimal benefit. The objective of this investigation was to assess whether auditory training in interrupted noise would improve speech recognition in noise for children with HI and perhaps enhance their utilization of glimpsing skills. A partially repeated-measures design was used to evaluate the effectiveness of seven 1-h sessions of auditory training in interrupted and continuous noise. Speech recognition scores in interrupted and continuous noise were obtained pre-training, post-training, and 3 months post-training from 24 children with moderate-to-severe hearing loss. Children who participated in auditory training in interrupted noise demonstrated a significantly greater improvement in speech recognition compared to those who trained in continuous noise. Those who trained in interrupted noise demonstrated similar improvements in both noise conditions, while those who trained in continuous noise showed only modest improvements in the interrupted noise condition. This study presents direct evidence that auditory training in interrupted noise can be beneficial in improving speech recognition in noise for children with HI. PMID:23297921
Texting while driving: is speech-based text entry less risky than handheld text entry?
He, J; Chaparro, A; Nguyen, B; Burge, R J; Crandall, J; Chaparro, B; Ni, R; Cao, S
2014-11-01
Research indicates that using a cell phone to talk or text while maneuvering a vehicle impairs driving performance. However, few published studies directly compare the distracting effects of texting using a hands-free (i.e., speech-based interface) versus handheld cell phone, which is an important issue for legislation, automotive interface design and driving safety training. This study compared the effect of speech-based versus handheld text entries on simulated driving performance by asking participants to perform a car following task while controlling the duration of a secondary text-entry task. Results showed that both speech-based and handheld text entries impaired driving performance relative to the drive-only condition by causing more variation in speed and lane position. Handheld text entry also increased the brake response time and increased variation in headway distance. Text entry using a speech-based cell phone was less detrimental to driving performance than handheld text entry. Nevertheless, the speech-based text entry task still significantly impaired driving compared to the drive-only condition. These results suggest that speech-based text entry disrupts driving, but reduces the level of performance interference compared to text entry with a handheld device. In addition, the difference in the distraction effect caused by speech-based and handheld text entry is not simply due to the difference in task duration.
ERIC Educational Resources Information Center
Bedwinek, Anne P.; Kummer, Ann W.; Rice, Gale B.; Grames, Lynn Marty
2010-01-01
Purpose: The purpose of this study was to obtain information regarding the education and experience of preschool and school-based speech-language pathologists (SLPs) regarding the assessment and treatment of children born with cleft lip and/or palate and to determine their continuing education needs in this area. Method: A 16-item mixed-methods…
Fuller, Christina; Free, Rolien; Maat, Bert; Başkent, Deniz
2012-08-01
In normal-hearing listeners, musical background has been observed to change the sound representation in the auditory system and produce enhanced performance in some speech perception tests. Based on these observations, it has been hypothesized that musical background can influence sound and speech perception, and by extension quality of life, in cochlear-implant users. To test this hypothesis, this study explored musical background [using the Dutch Musical Background Questionnaire (DMBQ)], and self-perceived sound and speech perception and quality of life [using the Nijmegen Cochlear Implant Questionnaire (NCIQ) and the Speech Spatial and Qualities of Hearing Scale (SSQ)] in 98 postlingually deafened adult cochlear-implant recipients. In addition to self-perceived measures, speech perception scores (percentage of phonemes recognized in words presented in quiet) were obtained from patient records. The self-perceived hearing performance was associated with the objective speech perception. Forty-one respondents (44% of 94 respondents) indicated some form of formal musical training. Fifteen respondents (18% of 83 respondents) judged themselves as having musical training, experience, and knowledge. No association was observed between musical background (quantified by the DMBQ) and self-perceived hearing-related performance or quality of life (quantified by the NCIQ and SSQ), or speech perception in quiet.
Eberhardt, Silvio P; Auer, Edward T; Bernstein, Lynne E
2014-01-01
In a series of studies we have been investigating how multisensory training affects unisensory perceptual learning with speech stimuli. Previously, we reported that audiovisual (AV) training with speech stimuli can promote auditory-only (AO) perceptual learning in normal-hearing adults but can impede learning in congenitally deaf adults with late-acquired cochlear implants. Here, impeder and promoter effects were sought in normal-hearing adults who participated in lipreading training. In Experiment 1, visual-only (VO) training on paired associations between CVCVC nonsense word videos and nonsense pictures demonstrated that VO words could be learned to a high level of accuracy even by poor lipreaders. In Experiment 2, visual-auditory (VA) training in the same paradigm but with the addition of synchronous vocoded acoustic speech impeded VO learning of the stimuli in the paired-associates paradigm. In Experiment 3, the vocoded AO stimuli were shown to be less informative than the VO speech. Experiment 4 combined vibrotactile speech stimuli with the visual stimuli during training. Vibrotactile stimuli were shown to promote visual perceptual learning. In Experiment 5, no-training controls were used to show that training with visual speech carried over to consonant identification of untrained CVCVC stimuli but not to lipreading words in sentences. Across this and previous studies, multisensory training effects depended on the functional relationship between pathways engaged during training. Two principles are proposed to account for stimulus effects: (1) Stimuli presented to the trainee's primary perceptual pathway will impede learning by a lower-rank pathway. (2) Stimuli presented to the trainee's lower rank perceptual pathway will promote learning by a higher-rank pathway. The mechanisms supporting these principles are discussed in light of multisensory reverse hierarchy theory (RHT).
Advancements in robust algorithm formulation for speaker identification of whispered speech
NASA Astrophysics Data System (ADS)
Fan, Xing
Whispered speech is an alternative speech production mode to neutral speech, which is used by talkers intentionally in natural conversational scenarios to protect privacy and to prevent certain content from being overheard or made public. Due to the profound differences between whispered and neutral speech in production mechanism and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech by using no or limited whispered adaptation data from non-target speakers. This dissertation proposes the concept of "High"/"Low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from this acoustic analysis are new in this area and also serve as guidance for developing robust speaker identification systems for whispered speech. This dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing. A two-dimensional feature space is proposed to search for "Low"-performance whispered utterances, and separate feature mapping functions are applied to vowels and consonants respectively in order to retain the speaker's information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training. The proposed method generates pseudo whispered features from neutral features by using the statistical information contained in a whispered universal background model (UBM) trained on additional whispered data collected from non-target speakers. Four modeling methods are proposed for the transformation estimation in order to generate the pseudo whispered features. Both systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed to providing a scientific understanding of the differences between whispered and neutral speech as well as improved front-end processing and modeling methods for speaker identification of whispered speech. Such advancements will ultimately contribute to improve the robustness of speech processing systems.
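The pseudo-whisper idea can be caricatured as follows. This sketch pairs UBM components by index and applies simple mean shifts; that pairing and shift rule are assumptions made purely for illustration, and are much cruder than the four transformation-estimation methods the dissertation proposes:

```python
# Heavily simplified sketch: shift each neutral feature frame by the mean
# offset between matched components of a neutral UBM and a whispered UBM,
# producing "pseudo-whispered" features for model training.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
neutral = rng.normal(0.0, 1.0, size=(500, 13))   # toy MFCC-like frames
whisper = rng.normal(0.8, 1.2, size=(500, 13))   # toy whispered frames

ubm_n = GaussianMixture(n_components=4, random_state=0).fit(neutral)
ubm_w = GaussianMixture(n_components=4, random_state=0).fit(whisper)

def pseudo_whisper(frames):
    comp = ubm_n.predict(frames)                 # top component per frame
    return frames + (ubm_w.means_[comp] - ubm_n.means_[comp])

print(pseudo_whisper(neutral[:3]).shape)
```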
Perceptual Learning of Time-Compressed Speech: More than Rapid Adaptation
Banai, Karen; Lavner, Yizhar
2012-01-01
Background Time-compressed speech, a form of rapidly presented speech, is harder to comprehend than natural speech, especially for non-native speakers. Although it is possible to adapt to time-compressed speech after a brief exposure, it is not known whether additional perceptual learning occurs with further practice. Here, we ask whether multiday training on time-compressed speech yields more learning than that observed during the initial adaptation phase and whether the pattern of generalization following successful learning is different than that observed with initial adaptation only. Methodology/Principal Findings Two groups of non-native Hebrew speakers were tested on five different conditions of time-compressed speech identification in two assessments conducted 10–14 days apart. Between those assessments, one group of listeners received five practice sessions on one of the time-compressed conditions. Between the two assessments, trained listeners improved significantly more than untrained listeners on the trained condition. Furthermore, the trained group generalized its learning to two untrained conditions in which different talkers presented the trained speech materials. In addition, when the performance of the non-native speakers was compared to that of a group of naïve native Hebrew speakers, performance of the trained group was equivalent to that of the native speakers on all conditions on which learning occurred, whereas performance of the untrained non-native listeners was substantially poorer. Conclusions/Significance Multiday training on time-compressed speech results in significantly more perceptual learning than brief adaptation. Compared to previous studies of adaptation, the training induced learning is more stimulus specific. Taken together, the perceptual learning of time-compressed speech appears to progress from an initial, rapid adaptation phase to a subsequent prolonged and more stimulus specific phase. These findings are consistent with the predictions of the Reverse Hierarchy Theory of perceptual learning and suggest constraints on the use of perceptual-learning regimens during second language acquisition. PMID:23056592
The influence of listener experience and academic training on ratings of nasality.
Lewis, Kerry E; Watterson, Thomas L; Houghton, Sarah M
2003-01-01
This study assessed listener agreement levels for nasality ratings, and the strength of relationship between nasality ratings and nasalance scores on one hand, and listener clinical experience and formal academic training in cleft palate speech on the other. The listeners were 12 adults who represented four levels of clinical experience and academic training in cleft palate speech. Three listeners were teachers with no clinical experience and no academic training (TR), three were graduate students in speech-language pathology (GS) with academic training but no clinical experience, three were craniofacial surgeons (MD) with extensive experience listening to cleft palate speech but with no academic training in speech disorders, and three were certified speech-language pathologists (SLP) with both extensive academic training and clinical experience. The speech samples were audio recordings from 20 persons representing a range of nasality from normal to severely hypernasal. Nasalance scores were obtained simultaneously with the audio recordings. Results revealed that agreement levels for nasality ratings were highest for the SLPs, followed by the MDs. Thus, the more experienced groups tended to be more reliable. Mean nasality ratings obtained for each of the rater groups revealed an inverse relationship with experience. That is, the two groups with clinical experience (SLP and MD) tended to rate nasality lower than the two groups without experience (GS and TR). Correlation coefficients between nasalance scores and nasality judgments were low to moderate for all groups and did not follow a pattern. EDUCATIONAL OUTCOMES: As a result of this activity, the reader will be able to (1) describe the influence of listener experience and academic training in cleft palate speech on perceptual ratings of nasality; (2) describe the influence of experience and training on the nasality/nasalance relationship; and (3) compare the present findings to previous findings reported in the literature.
Replacing Maladaptive Speech with Verbal Labeling Responses: An Analysis of Generalized Responding.
ERIC Educational Resources Information Center
Foxx, R. M.; And Others
1988-01-01
Three mentally handicapped students (aged 13, 36, and 40) with maladaptive speech received training to answer questions with verbal labels. The results of their cues-pause-point training showed that the students replaced their maladaptive speech with correct labels (answers) to questions in the training setting and three generalization settings.…
Degraded neural and behavioral processing of speech sounds in a rat model of Rett syndrome
Engineer, Crystal T.; Rahebi, Kimiya C.; Borland, Michael S.; Buell, Elizabeth P.; Centanni, Tracy M.; Fink, Melyssa K.; Im, Kwok W.; Wilson, Linda G.; Kilgard, Michael P.
2015-01-01
Individuals with Rett syndrome have greatly impaired speech and language abilities. Auditory brainstem responses to sounds are normal, but cortical responses are highly abnormal. In this study, we used the novel rat Mecp2 knockout model of Rett syndrome to document the neural and behavioral processing of speech sounds. We hypothesized that both speech discrimination ability and the neural response to speech sounds would be impaired in Mecp2 rats. We expected that extensive speech training would improve speech discrimination ability and the cortical response to speech sounds. Our results reveal that speech responses across all four auditory cortex fields of Mecp2 rats were hyperexcitable, responded slower, and were less able to follow rapidly presented sounds. While Mecp2 rats could accurately perform consonant and vowel discrimination tasks in quiet, they were significantly impaired at speech sound discrimination in background noise. Extensive speech training improved discrimination ability. Training shifted cortical responses in both Mecp2 and control rats to favor the onset of speech sounds. While training increased the response to low frequency sounds in control rats, the opposite occurred in Mecp2 rats. Although neural coding and plasticity are abnormal in the rat model of Rett syndrome, extensive therapy appears to be effective. These findings may help to explain some aspects of communication deficits in Rett syndrome and suggest that extensive rehabilitation therapy might prove beneficial. PMID:26321676
Remote voice training: A case study on space shuttle applications, appendix C
NASA Technical Reports Server (NTRS)
Mollakarimi, Cindy; Hamid, Tamin
1990-01-01
The Tile Automation System includes applications of automation and robotics technology to all aspects of the Shuttle tile processing and inspection system. An integrated set of rapid prototyping testbeds was developed which include speech recognition and synthesis, laser imaging systems, distributed Ada programming environments, distributed relational data base architectures, distributed computer network architectures, multi-media workbenches, and human factors considerations. Remote voice training in the Tile Automation System is discussed. The user is prompted over a headset by synthesized speech for the training sequences. The voice recognition units and the voice output units are remote from the user and are connected by Ethernet to the main computer system. A supervisory channel is used to monitor the training sequences. Discussions include the training approaches as well as the human factors problems and solutions for this system utilizing remote training techniques.
Voice Interactive Analysis System Study. Final Report, August 28, 1978 through March 23, 1979.
ERIC Educational Resources Information Center
Harry, D. P.; And Others
The Voice Interactive Analysis System study continued research and development of the LISTEN real-time, minicomputer based connected speech recognition system, within NAVTRAEQUIPCEN'S program of developing automatic speech technology in support of training. An attempt was made to identify the most effective features detected by the TTI-500 model…
Perceptual Learning of Interrupted Speech
Benard, Michel Ruben; Başkent, Deniz
2013-01-01
The intelligibility of periodically interrupted speech improves once the silent gaps are filled with noise bursts. This improvement has been attributed to phonemic restoration, a top-down repair mechanism that helps intelligibility of degraded speech in daily life. Two hypotheses were investigated using perceptual learning of interrupted speech. If different cognitive processes played a role in restoring interrupted speech with and without filler noise, the two forms of speech would be learned at different rates and with different perceived mental effort. If the restoration benefit were an artificial outcome of using the ecologically invalid stimulus of speech with silent gaps, this benefit would diminish with training. Two groups of normal-hearing listeners were trained, one with interrupted sentences with the filler noise, and the other without. Feedback was provided with the auditory playback of the unprocessed and processed sentences, as well as the visual display of the sentence text. Training increased the overall performance significantly; however, the restoration benefit did not diminish. The increase in intelligibility and the decrease in perceived mental effort were relatively similar between the groups, implying similar cognitive mechanisms for the restoration of the two types of interruptions. Training effects were generalizable, as both groups also improved with the form of speech they were not trained on, and the effects were retained. Due to the null results and the relatively small number of participants (10 per group), further research is needed to more confidently draw conclusions. Nevertheless, training with interrupted speech seems to be effective, stimulating participants to use top-down restoration more actively and efficiently. This finding further implies the potential of this training approach as a rehabilitative tool for hearing-impaired/elderly populations. PMID:23469266
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, J.
The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation maymore » decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.« less
Yuen, Kevin C P
2014-01-01
One of the recent developments in the education of speech-language pathology is to include literacy disorders and learning disabilities as key components in the training curriculum. Disorders in reading and writing are interwoven with disorders in speaking and listening, which should be managed holistically, particularly in children and adolescents. With extensive training in clinical linguistics, language disorders, and other theoretical knowledge and clinical skills, speech-language pathologists (SLPs) are the best equipped and most competent professionals to screen, identify, diagnose, and manage individuals with literacy disorders. To tackle the challenges of and the huge demand for services in literacy as well as language and learning disorders, the Hong Kong Institute of Education has recently developed the Master of Science Programme in Educational Speech-Language Pathology and Learning Disabilities, which is one of the very first speech-language pathology training programmes in Asia to blend training components of learning disabilities, literacy disorders, and social-emotional-behavioural-developmental disabilities into a developmentally and medically oriented speech-language pathology training programme. This new training programme aims to prepare a new generation of SLPs to be able to offer comprehensive support to individuals with speech, language, literacy, learning, communication, and swallowing disorders of different developmental or neurogenic origins, particularly to infants and adolescents as well as to their family and educational team.
Effects of intelligibility on working memory demand for speech perception.
Francis, Alexander L; Nusbaum, Howard C
2009-08-01
Understanding low-intelligibility speech is effortful. In three experiments, we examined the effects of intelligibility on working memory (WM) demands imposed by perception of synthetic speech. In all three experiments, a primary speeded word recognition task was paired with a secondary WM-load task designed to vary the availability of WM capacity during speech perception. Speech intelligibility was varied either by training listeners to use available acoustic cues in a more diagnostic manner (as in Experiment 1) or by providing listeners with more informative acoustic cues (i.e., better speech quality, as in Experiments 2 and 3). In the first experiment, training significantly improved intelligibility and recognition speed; increasing WM load significantly slowed recognition. A significant interaction between training and load indicated that the benefit of training on recognition speed was observed only under low memory load. In subsequent experiments, listeners received no training; intelligibility was manipulated by changing synthesizers. Improving intelligibility without training improved recognition accuracy, and increasing memory load still decreased it, but more intelligible speech did not produce more efficient use of available WM capacity. This suggests that perceptual learning modifies the way available capacity is used, perhaps by increasing the use of more phonetically informative features and/or by decreasing use of less informative ones.
Speech sound discrimination training improves auditory cortex responses in a rat model of autism
Engineer, Crystal T.; Centanni, Tracy M.; Im, Kwok W.; Kilgard, Michael P.
2014-01-01
Children with autism often have language impairments and degraded cortical responses to speech. Extensive behavioral interventions can improve language outcomes and cortical responses. Prenatal exposure to the antiepileptic drug valproic acid (VPA) increases the risk for autism and language impairment. Prenatal exposure to VPA also causes weaker and delayed auditory cortex responses in rats. In this study, we document speech sound discrimination ability in VPA exposed rats and document the effect of extensive speech training on auditory cortex responses. VPA exposed rats were significantly impaired at consonant, but not vowel, discrimination. Extensive speech training resulted in both stronger and faster anterior auditory field (AAF) responses compared to untrained VPA exposed rats, and restored responses to control levels. This neural response improvement generalized to non-trained sounds. The rodent VPA model of autism may be used to improve the understanding of speech processing in autism and contribute to improving language outcomes. PMID:25140133
ERIC Educational Resources Information Center
Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus
2010-01-01
Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent…
Barista: A Framework for Concurrent Speech Processing by USC-SAIL
Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.
2016-01-01
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, to be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0. PMID:27610047
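Barista itself is C++ built on libcppa, but the actor-network dataflow it describes can be illustrated in a few lines of Python using threads and queues. This is a conceptual sketch only, with invented actor roles, and does not reflect Barista's API:

```python
# Conceptual actor-model sketch: independent workers connected by message
# queues, each transforming data and forwarding it downstream, mimicking a
# feature-extraction -> decoding pipeline.
import queue, threading

def actor(inbox, outbox, transform):
    while True:
        msg = inbox.get()
        if msg is None:                      # poison pill shuts the actor down
            if outbox: outbox.put(None)
            break
        if outbox: outbox.put(transform(msg))
        else: print("decoded:", transform(msg))

q1, q2 = queue.Queue(), queue.Queue()
feat = threading.Thread(target=actor, args=(q1, q2, lambda x: [ord(c) for c in x]))
dec = threading.Thread(target=actor, args=(q2, None, lambda f: "".join(chr(v) for v in f)))
feat.start(); dec.start()
for utt in ["hello", "world"]:
    q1.put(utt)
q1.put(None)
feat.join(); dec.join()
```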
Optimal pattern synthesis for speech recognition based on principal component analysis
NASA Astrophysics Data System (ADS)
Korsun, O. N.; Poliyev, A. V.
2018-02-01
An algorithm for building an optimal pattern for automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. Forming the optimal pattern is based on decomposing an initial pattern into principal components, which makes it possible to reduce the dimensionality of the multi-parameter optimization problem. At the next step, training samples are introduced and optimal estimates of the principal-component decomposition coefficients are obtained by a numerical parameter-optimization algorithm. Finally, we consider experimental results that show the improvement in speech recognition achieved by the proposed optimization algorithm.
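A minimal sketch of the described approach follows, assuming a simple reconstruction-distance objective and an invented coordinate-descent refinement (the paper does not specify its optimization criterion, so these are illustrative choices):

```python
# Sketch: represent a reference pattern by its leading principal components
# and tune only those few coefficients against the training samples.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
train = rng.normal(size=(40, 64)) + np.sin(np.linspace(0, 6, 64))  # toy spectra

pca = PCA(n_components=3).fit(train)
coeffs = pca.transform(train).mean(axis=0)   # initial low-dimensional estimate

def mean_dist(c):
    """Mean distance from the reconstructed pattern to the training samples."""
    return np.linalg.norm(train - pca.inverse_transform(c[None]), axis=1).mean()

# Coordinate descent: nudge each coefficient while it reduces the objective.
for _ in range(100):
    for i in range(len(coeffs)):
        for step in (0.05, -0.05):
            trial = coeffs.copy()
            trial[i] += step
            if mean_dist(trial) < mean_dist(coeffs):
                coeffs = trial

optimal_pattern = pca.inverse_transform(coeffs[None])[0]
print(optimal_pattern.shape)
```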
Geiger, Martha
2012-01-01
The purpose of this paper is to provide a preliminary, qualitative review of an approach to training centre-based carers in supporting basic communication development and providing communication opportunities for the children with severe and profound disabilities in their care. In South Africa, these children are often the most neglected in terms of planning and providing appropriate interventions. For those with severe communication disabilities, an additional lack is in the area of the basic human right to meaningful interactions and communication. Sustainable strategies to provide opportunities for basic communication development of these children are urgently sought. Several effective international and local parent training programmes have been developed, but the urgent need remains to train centre-based carers who are taking care of groups of diversely disabled children in severely under-resourced settings. Non-profit organisations (NPOs) have been exploring practical centre-based approaches to skills sharing in physical rehabilitation, activities for daily living, feeding and support for basic communication development. As a freelance speech therapist contracted by four NPOs to implement hands-on training in basic communication for centre-based carers of non-verbal children, the author describes a training approach that evolved over three years, in collaboration with the carers and centre managements. Implications for training (for speech therapists and for community-based rehabilitation workers) and for further research are identified.
Using visible speech to train perception and production of speech for individuals with hearing loss.
Massaro, Dominic W; Light, Joanna
2004-04-01
The main goal of this study was to implement a computer-animated talking head, Baldi, as a language tutor for speech perception and production for individuals with hearing loss. Baldi can speak slowly; illustrate articulation by making the skin transparent to reveal the tongue, teeth, and palate; and show supplementary articulatory features, such as vibration of the neck to show voicing and turbulent airflow to show frication. Seven students with hearing loss between the ages of 8 and 13 were trained for 6 hours across 21 weeks on 8 categories of segments (4 voiced vs. voiceless distinctions, 3 consonant cluster distinctions, and 1 fricative vs. affricate distinction). Training included practice at the segment and the word level. Perception and production improved for each of the 7 children. Speech production also generalized to new words not included in the training lessons. Finally, speech production deteriorated somewhat after 6 weeks without training, indicating that the training method rather than some other experience was responsible for the improvement that was found.
Sheft, Stanley; Gygi, Brian; Ho, Kim Thien N.
2012-01-01
Perceptual training with spectrally degraded environmental sounds results in improved environmental sound identification, with benefits shown to extend to untrained speech perception as well. The present study extended those findings to examine longer-term training effects as well as effects of mere repeated exposure to sounds over time. Participants received two pretests (1 week apart) prior to a week-long environmental sound training regimen, which was followed by two posttest sessions, separated by another week without training. Spectrally degraded stimuli, processed with a four-channel vocoder, consisted of a 160-item environmental sound test, word and sentence tests, and a battery of basic auditory abilities and cognitive tests. Results indicated significant improvements in all speech and environmental sound scores between the initial pretest and the last posttest with performance increments following both exposure and training. For environmental sounds (the stimulus class that was trained), the magnitude of positive change that accompanied training was much greater than that due to exposure alone, with improvement for untrained sounds roughly comparable to the speech benefit from exposure. Additional tests of auditory and cognitive abilities showed that speech and environmental sound performance were differentially correlated with tests of spectral and temporal-fine-structure processing, whereas working memory and executive function were correlated with speech, but not environmental sound perception. These findings indicate generalizability of environmental sound training and provide a basis for implementing environmental sound training programs for cochlear implant (CI) patients. PMID:22891070
Learning-induced neural plasticity of speech processing before birth
Partanen, Eino; Kujala, Teija; Näätänen, Risto; Liitola, Auli; Sambeth, Anke; Huotilainen, Minna
2013-01-01
Learning, the foundation of adaptive and intelligent behavior, is based on plastic changes in neural assemblies, reflected by the modulation of electric brain responses. In infancy, auditory learning implicates the formation and strengthening of neural long-term memory traces, improving discrimination skills, in particular those forming the prerequisites for speech perception and understanding. Although previous behavioral observations show that newborns react differentially to unfamiliar sounds vs. familiar sound material that they were exposed to as fetuses, the neural basis of fetal learning has not thus far been investigated. Here we demonstrate direct neural correlates of human fetal learning of speech-like auditory stimuli. We presented variants of words to fetuses; unlike infants with no exposure to these stimuli, the exposed fetuses showed enhanced brain activity (mismatch responses) in response to pitch changes for the trained variants after birth. Furthermore, a significant correlation existed between the amount of prenatal exposure and brain activity, with greater activity being associated with a higher amount of prenatal speech exposure. Moreover, the learning effect was generalized to other types of similar speech sounds not included in the training material. Consequently, our results indicate neural commitment specifically tuned to the speech features heard before birth and their memory representations. PMID:23980148
Maruyama, Tsukasa; Takeuchi, Hikaru; Taki, Yasuyuki; Motoki, Kosuke; Jeong, Hyeonjeong; Kotozaki, Yuka; Nakagawa, Seishu; Nouchi, Rui; Iizuka, Kunio; Yokoyama, Ryoichi; Yamamoto, Yuki; Hanawa, Sugiko; Araki, Tsuyoshi; Sakaki, Kohei; Sasaki, Yukako; Magistro, Daniele; Kawashima, Ryuta
2018-01-01
Time-compressed speech is an artificial form of rapidly presented speech. Training with time-compressed speech in a second language (TCSSL) leads to adaptation toward TCSSL. Here, we investigated the effects of 4 weeks of training with TCSSL on diverse cognitive functions and neural systems in young adults, using magnetic resonance imaging to measure the fractional amplitude of spontaneous low-frequency fluctuations (fALFF), resting-state functional connectivity (RSFC) with the left superior temporal gyrus (STG), fractional anisotropy (FA), and regional gray matter volume (rGMV). There were no significant differences in the change in performance on measures of cognitive functions or second-language skills after training with TCSSL compared with the active control group. However, compared with the active control group, training with TCSSL was associated with increased fALFF, RSFC, and FA and decreased rGMV involving areas in the left STG. These results lacked evidence of a far-transfer effect of time-compressed speech training on a wide range of cognitive functions and second-language skills in young adults. However, they demonstrated effects of time-compressed speech training on gray and white matter structures as well as on resting-state intrinsic activity and connectivity involving the left STG, which plays a key role in listening comprehension.
Zhang, He-Hua; Yang, Liuyang; Liu, Yuchuan; Wang, Pin; Yin, Jun; Li, Yongming; Qiu, Mingguo; Zhu, Xueru; Yan, Fang
2016-11-16
The use of speech-based data in the classification of Parkinson disease (PD) has been shown to provide an effective, non-invasive mode of classification in recent years. Thus, there has been an increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classifications is to reduce noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined. In this study, a PD classification algorithm was proposed and examined that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm. First, the MENN algorithm is applied for selecting optimal training speech samples iteratively, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is trained on the selected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. This proposed method was examined using recently deposited public datasets and compared against other currently used algorithms for validation. Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithms examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit higher stability, particularly when combining the MENN and RF algorithms. This study showed that the proposed method can improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.
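The two-stage pipeline can be sketched as below; the editing rule, parameters, and synthetic data are illustrative assumptions rather than the authors' exact MENN formulation:

```python
# Sketch: nearest-neighbour sample editing to discard hard-to-separate
# training samples, followed by a random-forest ensemble classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# MENN-style editing: repeatedly drop training samples misclassified by a
# k-NN fit on the current training set, until the set stabilizes.
for _ in range(10):
    knn = KNeighborsClassifier(n_neighbors=3).fit(X_tr, y_tr)
    keep = knn.predict(X_tr) == y_tr
    if keep.all():
        break
    X_tr, y_tr = X_tr[keep], y_tr[keep]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", rf.score(X_te, y_te))
```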
Beliefs regarding the Impact of Accent within Speech-Language Pathology Practice Areas
ERIC Educational Resources Information Center
Levy, Erika S.; Crowley, Catherine J.
2012-01-01
With the demographic shifts in the United States, it is increasingly the case that speech-language pathologists (SLPs) come from different language backgrounds from those of their clients and have nonnative accents in their languages of service. An anonymous web-based survey was completed by students and clinic directors in SLP training programs…
An Australian survey of parent involvement in intervention for childhood speech sound disorders.
Sugden, Eleanor; Baker, Elise; Munro, Natalie; Williams, A Lynn; Trivette, Carol M
2017-08-17
To investigate how speech-language pathologists (SLPs) report involving parents in intervention for phonology-based speech sound disorders (SSDs), and to describe the home practice that they recommend. Further aims were to describe the training SLPs report providing to parents, to explore SLPs' beliefs and motivations for involving parents in intervention, and to determine whether SLPs' characteristics are associated with their self-reported practice. An online survey of 288 SLPs working with SSD in Australia was conducted. The majority of SLPs (96.4%) reported involving parents in intervention, most commonly in providing home practice. On average, these tasks were recommended to be completed five times per week for 10 min. SLPs reported training parents using a range of training methods, most commonly providing opportunities for parents to observe the SLP conduct the intervention. SLPs' place of work and years of experience were associated with how they involved and trained parents in intervention. Most (95.8%) SLPs agreed or strongly agreed that family involvement is essential for intervention to be effective. Parent involvement and home practice appear to be intricately linked within intervention for phonology-based SSDs in Australia. More high-quality research is needed to understand how to best involve parents within clinical practice.
Good, Arla; Gordon, Karen A.; Papsin, Blake C.; Nespoli, Gabe; Hopyan, Talar; Peretz, Isabelle; Russo, Frank A.
2017-01-01
Objectives: Children who use cochlear implants (CIs) have characteristic pitch processing deficits leading to impairments in music perception and in understanding emotional intention in spoken language. Music training for normal-hearing children has previously been shown to benefit perception of emotional prosody. The purpose of the present study was to assess whether deaf children who use CIs obtain similar benefits from music training. We hypothesized that music training would lead to gains in auditory processing and that these gains would transfer to emotional speech prosody perception. Design: Study participants were 18 child CI users (ages 6 to 15). Participants received either 6 months of music training (i.e., individualized piano lessons) or 6 months of visual art training (i.e., individualized painting lessons). Measures of music perception and emotional speech prosody perception were obtained pre-, mid-, and post-training. The Montreal Battery for Evaluation of Musical Abilities was used to measure five different aspects of music perception (scale, contour, interval, rhythm, and incidental memory). The emotional speech prosody task required participants to identify the emotional intention of a semantically neutral sentence under audio-only and audiovisual conditions. Results: Music training led to improved performance on tasks requiring the discrimination of melodic contour and rhythm, as well as incidental memory for melodies. These improvements were predominantly found from mid- to post-training. Critically, music training also improved emotional speech prosody perception. Music training was most advantageous in audio-only conditions. Art training did not lead to the same improvements. Conclusions: Music training can lead to improvements in perception of music and emotional speech prosody, and thus may be an effective supplementary technique for supporting auditory rehabilitation following cochlear implantation. PMID:28085739
Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.
Patel, Aniruddh D
2011-01-01
Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.
Jeon, Jin Yong; Hong, Joo Young; Jang, Hyung Suk; Kim, Jae Hyeon
2015-12-01
It is necessary to consider not only annoyance of interior noises but also speech privacy to achieve acoustic comfort in a passenger car of a high-speed train because speech from other passengers can be annoying. This study aimed to explore an optimal acoustic environment to satisfy speech privacy and reduce annoyance in a passenger car. Two experiments were conducted using speech sources and compartment noise of a high-speed train with varying speech-to-noise ratios (SNRA) and background noise levels (BNL). Speech intelligibility was tested in experiment I, and in experiment II, perceived speech privacy, annoyance, and acoustic comfort of combined sounds with speech and background noise were assessed. The results show that speech privacy and annoyance were significantly influenced by the SNRA. In particular, the acoustic comfort was evaluated as acceptable when the SNRA was less than -6 dB for both speech privacy and noise annoyance. In addition, annoyance increased significantly as the BNL exceeded 63 dBA, whereas the effect of the background noise level on speech privacy was not significant. These findings suggest that an optimal level of interior noise in a passenger car might exist between 59 and 63 dBA, taking normal speech levels into account.
Music and Speech Perception in Children Using Sung Speech
Nie, Yingjiu; Galvin, John J.; Morikawa, Michael; André, Victoria; Wheeler, Harley; Fu, Qian-Jie
2018-01-01
This study examined music and speech perception in normal-hearing children with some or no musical training. Thirty children (mean age = 11.3 years), 15 with and 15 without formal music training participated in the study. Music perception was measured using a melodic contour identification (MCI) task; stimuli were a piano sample or sung speech with a fixed timbre (same word for each note) or a mixed timbre (different words for each note). Speech perception was measured in quiet and in steady noise using a matrix-styled sentence recognition task; stimuli were naturally intonated speech or sung speech with a fixed pitch (same note for each word) or a mixed pitch (different notes for each word). Significant musician advantages were observed for MCI and speech in noise but not for speech in quiet. MCI performance was significantly poorer with the mixed timbre stimuli. Speech performance in noise was significantly poorer with the fixed or mixed pitch stimuli than with spoken speech. Across all subjects, age at testing and MCI performance were significantly correlated with speech performance in noise. MCI and speech performance in quiet was significantly poorer for children than for adults from a related study using the same stimuli and tasks; speech performance in noise was significantly poorer for young than for older children. Long-term music training appeared to benefit melodic pitch perception and speech understanding in noise in these pediatric listeners. PMID:29609496
Perceptual learning for speech in noise after application of binary time-frequency masks
Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.
2013-01-01
Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56 and 0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners, with the greatest improvement observed for the same materials used in training. PMID:23464038
Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil
2006-01-01
The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use by those with severe dysarthria due to limited and inconsistent proximity to "normal" articulatory patterns. Severe dysarthric output may also be characterized by a small mass of distinguishable phonetic tokens making the acoustic differentiation of target words difficult. Speaker dependent computer speech recognition using Hidden Markov Models was achieved by the identification of robust phonetic elements within the individual speaker output patterns. A new system of speech training using computer generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.
NASA Astrophysics Data System (ADS)
Nishiura, Takanobu; Nakamura, Satoshi
2003-10-01
Humans communicate with each other through speech by focusing on the target speech among environmental sounds in real acoustic environments. We can easily identify the target sound from other environmental sounds. For hands-free speech recognition, the identification of the target speech from environmental sounds is imperative. This mechanism may also be important for a self-moving robot to sense the acoustic environments and communicate with humans. Therefore, this paper first proposes hidden Markov model (HMM)-based environmental sound source identification. Environmental sounds are modeled by three states of HMMs and evaluated using 92 kinds of environmental sounds. The identification accuracy was 95.4%. This paper also proposes a new HMM composition method that composes speech HMMs and an HMM of categorized environmental sounds for robust environmental sound-added speech recognition. As a result of the evaluation experiments, we confirmed that the proposed HMM composition outperforms the conventional HMM composition with speech HMMs and a noise (environmental sound) HMM trained using noise periods prior to the target speech in a captured signal. [Work supported by Ministry of Public Management, Home Affairs, Posts and Telecommunications of Japan.]
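To make the identification scheme concrete, here is a hedged sketch using the hmmlearn package: one three-state Gaussian HMM per environmental-sound class, with a new sound assigned to the class whose model gives the highest log-likelihood. Random arrays stand in for the acoustic features, which the abstract does not specify.

```python
# Per-class 3-state HMMs for sound-source identification; a sketch, not the
# authors' system. Each sound is a (frames x features) array, e.g. MFCCs.
import numpy as np
from hmmlearn import hmm

def train_class_hmms(sounds_by_class, n_states=3):
    models = {}
    for label, sounds in sounds_by_class.items():
        X = np.vstack(sounds)               # concatenate all feature frames
        lengths = [len(s) for s in sounds]  # frame count of each recording
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=50, random_state=0)
        m.fit(X, lengths)
        models[label] = m
    return models

def identify(models, sound):
    # Maximum-likelihood classification over the trained class HMMs.
    return max(models, key=lambda label: models[label].score(sound))

rng = np.random.default_rng(0)
data = {"door": [rng.normal(0, 1, (40, 13)) for _ in range(5)],
        "phone": [rng.normal(2, 1, (40, 13)) for _ in range(5)]}
models = train_class_hmms(data)
print(identify(models, rng.normal(2, 1, (40, 13))))  # expected: "phone"
```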
V2S: Voice to Sign Language Translation System for Malaysian Deaf People
NASA Astrophysics Data System (ADS)
Mean Foong, Oi; Low, Tang Jung; La, Wai Wan
Learning and understanding sign language can be cumbersome for some; therefore, this paper proposes a solution to this problem by providing a voice (English language) to sign language translation system using speech and image processing techniques. Speech processing, which includes speech recognition, is the study of recognizing the words being spoken, regardless of who the speaker is. This project uses template-based recognition as the main approach, in which the V2S system first needs to be trained with speech patterns based on a generic spectral parameter set. These spectral parameter sets are then stored as templates in a database. The system performs the recognition process by matching the parameter set of the input speech with the stored templates and finally displays the sign language in video format. Empirical results show that the system has an 80.3% recognition rate.
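The abstract does not name the matching algorithm; dynamic time warping (DTW) is a standard way to compare a spoken input against stored spectral templates, so the sketch below uses it purely for illustration, with random arrays standing in for the spectral parameter sets.

```python
# Illustrative template matcher: nearest stored template under DTW distance.
import numpy as np

def dtw_distance(a, b):
    """DTW between two (frames x features) sequences, Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(templates, features):
    """Return the word whose template is nearest to the input sequence."""
    return min(templates, key=lambda w: dtw_distance(templates[w], features))

rng = np.random.default_rng(1)
templates = {"hello": rng.normal(0, 1, (30, 12)),
             "thanks": rng.normal(3, 1, (25, 12))}
print(recognize(templates, rng.normal(3, 1, (28, 12))))  # expected: "thanks"
```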
Effects of emotion on different phoneme classes
NASA Astrophysics Data System (ADS)
Lee, Chul Min; Yildirim, Serdar; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Abe; Lee, Sungbok; Narayanan, Shrikanth
2004-10-01
This study investigates the effects of emotion on different phoneme classes using short-term spectral features. In the research on emotion in speech, most studies have focused on prosodic features of speech. In this study, based on the hypothesis that different emotions have varying effects on the properties of the different speech sounds, we investigate the usefulness of phoneme-class level acoustic modeling for automatic emotion classification. Hidden Markov models (HMM) based on short-term spectral features for five broad phonetic classes are used for this purpose using data obtained from recordings of two actresses. Each speaker produces 211 sentences with four different emotions (neutral, sad, angry, happy). Using the speech material we trained and compared the performances of two sets of HMM classifiers: a generic set of "emotional speech" HMMs (one for each emotion) and a set of broad phonetic-class based HMMs (vowel, glide, nasal, stop, fricative) for each emotion type considered. Comparison of classification results indicates that different phoneme classes were affected differently by emotional change and that the vowel sounds are the most important indicator of emotions in speech. Detailed results and their implications on the underlying speech articulation will be discussed.
Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces
Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise
2016-01-01
Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open to future speech BCI applications using such articulatory-based speech synthesizer. PMID:27880768
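A minimal sketch of the articulatory-to-acoustic mapping stage, assuming a plain feed-forward network in PyTorch; the dimensions, layer sizes, and training loop are placeholders rather than the paper's DNN, and random arrays stand in for the parallel EMA/speech frames.

```python
# EMA coordinates per frame -> acoustic (vocoder) parameters per frame.
# Dimensions and architecture are assumptions for illustration only.
import torch
import torch.nn as nn

EMA_DIM, ACOUSTIC_DIM = 18, 25   # e.g. 6 sensors x 3 coords; vocoder params

model = nn.Sequential(
    nn.Linear(EMA_DIM, 256), nn.Tanh(),
    nn.Linear(256, 256), nn.Tanh(),
    nn.Linear(256, ACOUSTIC_DIM),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

ema = torch.randn(1024, EMA_DIM)            # stand-in articulatory frames
acoustic = torch.randn(1024, ACOUSTIC_DIM)  # stand-in acoustic targets
for epoch in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(ema), acoustic)
    loss.backward()
    optimizer.step()

# At run time each incoming EMA frame is mapped to acoustic parameters,
# which a vocoder then renders as audio.
with torch.no_grad():
    params = model(torch.randn(1, EMA_DIM))
```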
Interprofessional Peer-Assisted Learning as a Model of Instruction in Doctor of Audiology Programs.
Serpanos, Yula C; Senzer, Deborah; Gordon, Daryl M
2017-09-18
This study reports on interprofessional peer-assisted learning (PAL) as a model of instruction in the preparation of doctoral audiology students. Ten Doctor of Audiology (AuD) students provided training in audiologic screening for 53 graduate speech-language pathology students in 9 individual PAL sessions. Pre- and post-surveys assessed the peer teaching experience for AuD students in 5 areas of their confidence in audiologic screening: knowledge, skill, making a referral based on outcomes, teaching, and supervising. Pre- and post-learning outcomes in audiologic screening for the speech-language pathology student trainees determined the effectiveness of training by their AuD student peers. Survey outcomes revealed significant (p < .001) improvement in the overall confidence of AuD student peer instructors. Speech-language pathology students trained by their AuD peers exhibited significant (p = .003) improvements in their knowledge and skill and making outcome-based referrals in audiologic screening, supporting the effectiveness of the PAL paradigm. In addition to meeting required accreditation and professional certification competency standards, the PAL instructional model offers an innovative curricular approach in interprofessional education and in the teaching and supervisory preparation of students in doctoral audiology programs.
Perceptual Learning and Auditory Training in Cochlear Implant Recipients
Fu, Qian-Jie; Galvin, John J.
2007-01-01
Learning electrically stimulated speech patterns can be a new and difficult experience for cochlear implant (CI) recipients. Recent studies have shown that most implant recipients at least partially adapt to these new patterns via passive, daily-listening experiences. Gradually introducing a speech processor parameter (eg, the degree of spectral mismatch) may provide for more complete and less stressful adaptation. Although the implant device restores hearing sensation and the continued use of the implant provides some degree of adaptation, active auditory rehabilitation may be necessary to maximize the benefit of implantation for CI recipients. Currently, there are scant resources for auditory rehabilitation for adult, postlingually deafened CI recipients. We recently developed a computer-assisted speech-training program to provide the means to conduct auditory rehabilitation at home. The training software targets important acoustic contrasts among speech stimuli, provides auditory and visual feedback, and incorporates progressive training techniques, thereby maintaining recipients’ interest during the auditory training exercises. Our recent studies demonstrate the effectiveness of targeted auditory training in improving CI recipients’ speech and music perception. Provided with an inexpensive and effective auditory training program, CI recipients may find the motivation and momentum to get the most from the implant device. PMID:17709574
Speech Training for Inmate Rehabilitation.
ERIC Educational Resources Information Center
Parkinson, Michael G.; Dobkins, David H.
1982-01-01
Using a computerized content analysis, the authors demonstrate changes in speech behaviors of prison inmates. They conclude that two to four hours of public speaking training can have only limited effect on students who live in a culture in which "prison speech" is the expected and rewarded form of behavior. (PD)
Audiomotor Perceptual Training Enhances Speech Intelligibility in Background Noise.
Whitton, Jonathon P; Hancock, Kenneth E; Shannon, Jeffrey M; Polley, Daniel B
2017-11-06
Sensory and motor skills can be improved with training, but learning is often restricted to practice stimuli. As an exception, training on closed-loop (CL) sensorimotor interfaces, such as action video games and musical instruments, can impart a broad spectrum of perceptual benefits. Here we ask whether computerized CL auditory training can enhance speech understanding in levels of background noise that approximate a crowded restaurant. Elderly hearing-impaired subjects trained for 8 weeks on a CL game that, like a musical instrument, challenged them to monitor subtle deviations between predicted and actual auditory feedback as they moved their fingertip through a virtual soundscape. We performed our study as a randomized, double-blind, placebo-controlled trial by training other subjects in an auditory working-memory (WM) task. Subjects in both groups improved at their respective auditory tasks and reported comparable expectations for improved speech processing, thereby controlling for placebo effects. Whereas speech intelligibility was unchanged after WM training, subjects in the CL training group could correctly identify 25% more words in spoken sentences or digit sequences presented in high levels of background noise. Numerically, CL audiomotor training provided more than three times the benefit of our subjects' hearing aids for speech processing in noisy listening conditions. Gains in speech intelligibility could be predicted from gameplay accuracy and baseline inhibitory control. However, benefits did not persist in the absence of continuing practice. These studies employ stringent clinical standards to demonstrate that perceptual learning on a computerized audio game can transfer to "real-world" communication challenges. Copyright © 2017 Elsevier Ltd. All rights reserved.
Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals
1988-10-12
Terrence J…
…characteristics of speech from the visual speech signals. Neural networks have been trained on a database of vowels. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the…
Kleber, Boris; Veit, Ralf; Moll, Christina Valérie; Gaser, Christian; Birbaumer, Niels; Lotze, Martin
2016-06-01
In contrast to instrumental musicians, professional singers do not train on a specific instrument but perfect a motor system that has already been extensively trained during speech motor development. Previous functional imaging studies suggest that experience with singing is associated with enhanced somatosensory-based vocal motor control. However, experience-dependent structural plasticity in vocal musicians has rarely been studied. We investigated voxel-based morphometry (VBM) in 27 professional classical singers and compared gray matter volume in regions of the "singing-network" to an age-matched group of 28 healthy volunteers with no special singing experience. We found right hemispheric volume increases in professional singers in ventral primary somatosensory cortex (larynx S1) and adjacent rostral supramarginal gyrus (BA40), as well as in secondary somatosensory (S2) and primary auditory cortices (A1). Moreover, we found that earlier commencement with vocal training correlated with increased gray-matter volume in S1. However, in contrast to studies with instrumental musicians, this correlation only emerged in singers who began their formal training after the age of 14 years, when speech motor development has reached its first plateau. Structural data thus confirm and extend previous functional reports suggesting a pivotal role of somatosensation in vocal motor control with increased experience in singing. Results furthermore indicate a sensitive period for developing additional vocal skills after speech motor coordination has matured. Copyright © 2016 Elsevier Inc. All rights reserved.
Significance of parametric spectral ratio methods in detection and recognition of whispered speech
NASA Astrophysics Data System (ADS)
Mathur, Arpit; Reddy, Shankar M.; Hegde, Rajesh M.
2012-12-01
In this article the significance of a new parametric spectral ratio method that can be used to detect whispered speech segments within normally phonated speech is described. Adaptation methods based on the maximum likelihood linear regression (MLLR) are then used to realize a mismatched train-test style speech recognition system. This proposed parametric spectral ratio method computes a ratio spectrum of the linear prediction (LP) and the minimum variance distortion-less response (MVDR) methods. The smoothed ratio spectrum is then used to detect whispered segments of speech within neutral speech segments effectively. The proposed LP-MVDR ratio method exhibits robustness at different SNRs as indicated by the whisper diarization experiments conducted on the CHAINS and the cell phone whispered speech corpus. The proposed method also performs reasonably better than the conventional methods for whisper detection. In order to integrate the proposed whisper detection method into a conventional speech recognition engine with minimal changes, adaptation methods based on the MLLR are used herein. The hidden Markov models corresponding to neutral mode speech are adapted to the whispered mode speech data in the whispered regions as detected by the proposed ratio method. The performance of this method is first evaluated on whispered speech data from the CHAINS corpus. The second set of experiments are conducted on the cell phone corpus of whispered speech. This corpus is collected using a set up that is used commercially for handling public transactions. The proposed whisper speech recognition system exhibits reasonably better performance when compared to several conventional methods. The results shown indicate the possibility of a whispered speech recognition system for cell phone based transactions.
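For readers unfamiliar with the ratio spectrum, the sketch below computes a linear-prediction (LP) spectrum and an MVDR spectrum from a frame's autocorrelation and returns their ratio; whispered frames would then be flagged by thresholding a smoothed statistic of this ratio. The model orders, frequency grid, and threshold are assumptions, not the paper's parameterization.

```python
# LP/MVDR ratio spectrum of one windowed speech frame (illustrative orders).
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

def lp_mvdr_ratio(frame, order=12, n_freq=128):
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    r /= len(frame)                                  # biased autocorrelation
    a = solve_toeplitz(r[:order], -r[1:])            # LP coefficients a_1..a_p
    sigma2 = r[0] + a @ r[1:]                        # prediction-error power
    w = np.linspace(0, np.pi, n_freq)
    E = np.exp(-1j * np.outer(w, np.arange(order + 1)))  # steering vectors
    A = E[:, 0] + E[:, 1:] @ a                       # A(w) = 1 + sum a_k e^-jwk
    lp_spec = sigma2 / np.abs(A) ** 2
    Rinv = np.linalg.inv(toeplitz(r))
    mvdr_spec = 1.0 / np.real(np.einsum("fi,ij,fj->f", E.conj(), Rinv, E))
    return lp_spec / mvdr_spec

frame = np.hanning(400) * np.random.default_rng(0).normal(size=400)
print(lp_mvdr_ratio(frame).mean())   # smoothed statistics of this ratio
                                     # would feed the whisper detector
```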
Development of a Low-Cost, Noninvasive, Portable Visual Speech Recognition Program.
Kohlberg, Gavriel D; Gal, Ya'akov Kobi; Lalwani, Anil K
2016-09-01
Loss of speech following tracheostomy and laryngectomy severely limits communication to simple gestures and facial expressions that are largely ineffective. To facilitate communication in these patients, we seek to develop a low-cost, noninvasive, portable, and simple visual speech recognition program (VSRP) to convert articulatory facial movements into speech. A Microsoft Kinect-based VSRP was developed to capture spatial coordinates of lip movements and translate them into speech. The articulatory speech movements associated with 12 sentences were used to train an artificial neural network classifier. The accuracy of the classifier was then evaluated on a separate, previously unseen set of articulatory speech movements. The VSRP was successfully implemented and tested in 5 subjects. It achieved an accuracy rate of 77.2% (65.0%-87.6% for the 5 speakers) on a 12-sentence data set. The mean time to classify an individual sentence was 2.03 milliseconds (1.91-2.16). We have demonstrated the feasibility of a low-cost, noninvasive, portable VSRP based on Kinect to accurately predict speech from articulation movements in clinically trivial time. This VSRP could be used as a novel communication device for aphonic patients. © The Author(s) 2016.
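A toy version of the classification stage might resample each utterance's lip-landmark trajectory to a fixed length, flatten it, and train a small neural network over the 12 sentence classes. The landmark count, network size, and random stand-in data below are assumptions, not the authors' Kinect pipeline.

```python
# Flattened lip trajectories -> 12-way sentence classifier (illustrative).
import numpy as np
from sklearn.neural_network import MLPClassifier

N_FRAMES, N_LANDMARKS = 60, 18    # utterances time-normalized to 60 frames
rng = np.random.default_rng(0)

# 12 sentences x 20 repetitions of (frames x landmarks x 2D coords), flattened.
X = rng.normal(size=(240, N_FRAMES * N_LANDMARKS * 2))
y = np.repeat(np.arange(12), 20)

clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
clf.fit(X, y)
sentence_id = clf.predict(X[:1])[0]   # index into the 12 trained sentences
```

Time-normalizing every utterance to the same frame count before flattening is what lets a fixed-input classifier stand in for a sequence model here.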
Game-Based Augmented Visual Feedback for Enlarging Speech Movements in Parkinson's Disease.
Yunusova, Yana; Kearney, Elaine; Kulkarni, Madhura; Haworth, Brandon; Baljko, Melanie; Faloutsos, Petros
2017-06-22
The purpose of this pilot study was to demonstrate the effect of augmented visual feedback on acquisition and short-term retention of a relatively simple instruction to increase movement amplitude during speaking tasks in patients with dysarthria due to Parkinson's disease (PD). Nine patients diagnosed with PD, hypokinetic dysarthria, and impaired speech intelligibility participated in a training program aimed at increasing the size of their articulatory (tongue) movements during sentences. Two sessions were conducted: a baseline and training session, followed by a retention session 48 hr later. At baseline, sentences were produced at normal, loud, and clear speaking conditions. Game-based visual feedback regarding the size of the articulatory working space (AWS) was presented during training. Eight of nine participants benefited from training, increasing their sentence AWS to a greater degree following feedback as compared with the baseline loud and clear conditions. The majority of participants were able to demonstrate the learned skill at the retention session. This study demonstrated the feasibility of augmented visual feedback via articulatory kinematics for training movement enlargement in patients with hypokinesia due to PD. https://doi.org/10.23641/asha.5116840.
NASA Astrophysics Data System (ADS)
Wu, Bo; Yang, Minglei; Li, Kehuang; Huang, Zhen; Siniscalchi, Sabato Marco; Wang, Tong; Lee, Chin-Hui
2017-12-01
A reverberation-time-aware deep-neural-network (DNN)-based multi-channel speech dereverberation framework is proposed to handle a wide range of reverberation times (RT60s). There are three key steps in designing a robust system. First, to accomplish simultaneous speech dereverberation and beamforming, we propose a framework, namely DNNSpatial, by selectively concatenating log-power spectral (LPS) input features of reverberant speech from multiple microphones in an array and map them into the expected output LPS features of anechoic reference speech based on a single deep neural network (DNN). Next, the temporal auto-correlation function of received signals at different RT60s is investigated to show that RT60-dependent temporal-spatial contexts in feature selection are needed in the DNNSpatial training stage in order to optimize the system performance in diverse reverberant environments. Finally, the RT60 is estimated to select the proper temporal and spatial contexts before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. The experimental evidence gathered in this study indicates that the proposed framework outperforms the state-of-the-art signal processing dereverberation algorithm weighted prediction error (WPE) and conventional DNNSpatial systems without taking the reverberation time into account, even for extremely weak and severe reverberant conditions. The proposed technique generalizes well to unseen room size, array geometry and loudspeaker position, and is robust to reverberation time estimation error.
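The RT60-dependent context selection can be pictured as follows: log-power-spectral (LPS) frames from every microphone are stacked over a symmetric context window whose width grows with the estimated reverberation time, and the stacked vector is the DNN input. The RT60-to-width mapping below is an illustrative guess, not the paper's values.

```python
# Multi-channel LPS context stacking for the dereverberation DNN (sketch).
import numpy as np

def stack_context(lps_channels, context):
    """lps_channels: list of (frames x bins) LPS arrays, one per microphone.
    Returns (frames x channels*(2*context+1)*bins) DNN input features."""
    padded = [np.pad(x, ((context, context), (0, 0)), mode="edge")
              for x in lps_channels]
    frames = lps_channels[0].shape[0]
    return np.array([
        np.concatenate([p[t:t + 2 * context + 1].ravel() for p in padded])
        for t in range(frames)
    ])

def context_for_rt60(rt60):
    # Longer reverberation -> wider temporal context (illustrative mapping).
    return 3 if rt60 < 0.3 else 5 if rt60 < 0.6 else 7

mics = [np.random.default_rng(i).normal(size=(100, 257)) for i in range(4)]
X = stack_context(mics, context_for_rt60(0.5))   # shape (100, 4*11*257)
```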
Speech Perception in Older Hearing Impaired Listeners: Benefits of Perceptual Training
Woods, David L.; Doss, Zoe; Herron, Timothy J.; Arbogast, Tanya; Younus, Masood; Ettlinger, Marc; Yund, E. William
2015-01-01
Hearing aids (HAs) only partially restore the ability of older hearing impaired (OHI) listeners to understand speech in noise, due in large part to persistent deficits in consonant identification. Here, we investigated whether adaptive perceptual training would improve consonant-identification in noise in sixteen aided OHI listeners who underwent 40 hours of computer-based training in their homes. Listeners identified 20 onset and 20 coda consonants in 9,600 consonant-vowel-consonant (CVC) syllables containing different vowels (/ɑ/, /i/, or /u/) and spoken by four different talkers. Consonants were presented at three consonant-specific signal-to-noise ratios (SNRs) spanning a 12 dB range. Noise levels were adjusted over training sessions based on d’ measures. Listeners were tested before and after training to measure (1) changes in consonant-identification thresholds using syllables spoken by familiar and unfamiliar talkers, and (2) sentence reception thresholds (SeRTs) using two different sentence tests. Consonant-identification thresholds improved gradually during training. Laboratory tests of d’ thresholds showed an average improvement of 9.1 dB, with 94% of listeners showing statistically significant training benefit. Training normalized consonant confusions and improved the thresholds of some consonants into the normal range. Benefits were equivalent for onset and coda consonants, syllables containing different vowels, and syllables presented at different SNRs. Greater training benefits were found for hard-to-identify consonants and for consonants spoken by familiar than unfamiliar talkers. SeRTs, tested with simple sentences, showed less elevation than consonant-identification thresholds prior to training and failed to show significant training benefit, although SeRT improvements did correlate with improvements in consonant thresholds. We argue that the lack of SeRT improvement reflects the dominant role of top-down semantic processing in processing simple sentences and that greater transfer of benefit would be evident in the comprehension of more unpredictable speech material. PMID:25730330
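For reference, a d' measure of the kind used here to adjust noise levels is conventionally computed from hit and false-alarm rates via inverse-normal transforms; the clipping guard below is a common convention and not necessarily the authors' exact procedure.

```python
# Conventional d' from hit and false-alarm counts.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    h = hits / (hits + misses)
    fa = false_alarms / (false_alarms + correct_rejections)
    h = min(max(h, 0.01), 0.99)    # avoid infinite z-scores at 0 or 1
    fa = min(max(fa, 0.01), 0.99)
    return norm.ppf(h) - norm.ppf(fa)

print(d_prime(45, 5, 10, 40))   # ~2.12 for 90% hits, 20% false alarms
```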
Flippin, Michelle; Reszka, Stephanie; Watson, Linda R
2010-05-01
The Picture Exchange Communication System (PECS) is a popular communication-training program for young children with autism spectrum disorders (ASD). This meta-analysis reviews the current empirical evidence for PECS in affecting communication and speech outcomes for children with ASD. A systematic review of the literature on PECS written between 1994 and June 2009 was conducted. Quality of scientific rigor was assessed and used as an inclusion criterion in computation of effect sizes. Effect sizes were aggregated separately for single-subject and group studies for communication and speech outcomes. Eight single-subject experiments (18 participants) and 3 group studies (95 PECS participants, 65 in other intervention/control) were included. Results indicated that PECS is a promising but not yet established evidence-based intervention for facilitating communication in children with ASD ages 1-11 years. Small to moderate gains in communication were demonstrated following training. Gains in speech were small to negative. This meta-analysis synthesizes gains in communication and relative lack of gains made in speech across the PECS literature for children with ASD. Concerns about maintenance and generalization are identified. Emerging evidence of potential preintervention child characteristics is discussed. Phase IV was identified as a possibly influential program characteristic for speech outcomes.
Henshaw, Helen; Ferguson, Melanie A.
2013-01-01
Background: Auditory training involves active listening to auditory stimuli and aims to improve performance in auditory tasks. As such, auditory training is a potential intervention for the management of people with hearing loss. Objective: This systematic review (PROSPERO 2011: CRD42011001406) evaluated the published evidence-base for the efficacy of individual computer-based auditory training to improve speech intelligibility, cognition and communication abilities in adults with hearing loss, with or without hearing aids or cochlear implants. Methods: A systematic search of eight databases and key journals identified 229 articles published since 1996, 13 of which met the inclusion criteria. Data were independently extracted and reviewed by the two authors. Study quality was assessed using ten pre-defined scientific and intervention-specific measures. Results: Auditory training resulted in improved performance for trained tasks in 9/10 articles that reported on-task outcomes. Although significant generalisation of learning was shown to untrained measures of speech intelligibility (11/13 articles), cognition (1/1 articles) and self-reported hearing abilities (1/2 articles), improvements were small and not robust. Where reported, compliance with computer-based auditory training was high, and retention of learning was shown at post-training follow-ups. Published evidence was of very-low to moderate study quality. Conclusions: Our findings demonstrate that published evidence for the efficacy of individual computer-based auditory training for adults with hearing loss is not robust and therefore cannot be reliably used to guide intervention at this time. We identify a need for high-quality evidence to further examine the efficacy of computer-based auditory training for people with hearing loss. PMID:23675431
Cheng, Xiaoting; Liu, Yangwenyi; Shu, Yilai; Tao, Duo-Duo; Wang, Bing; Yuan, Yasheng; Galvin, John J; Fu, Qian-Jie; Chen, Bing
2018-01-01
Due to limited spectral resolution, cochlear implants (CIs) do not convey pitch information very well. Pitch cues are important for perception of music and tonal language; it is possible that music training may improve performance in both listening tasks. In this study, we investigated music training outcomes in terms of perception of music, lexical tones, and sentences in 22 young (4.8 to 9.3 years old), prelingually deaf Mandarin-speaking CI users. Music perception was measured using a melodic contour identification (MCI) task. Speech perception was measured for lexical tones and sentences presented in quiet. Subjects received 8 weeks of MCI training using pitch ranges not used for testing. Music and speech perception were measured at 2, 4, and 8 weeks after training was begun; follow-up measures were made 4 weeks after training was stopped. Mean baseline performance was 33.2%, 76.9%, and 45.8% correct for MCI, lexical tone recognition, and sentence recognition, respectively. After 8 weeks of MCI training, mean performance significantly improved by 22.9, 14.4, and 14.5 percentage points for MCI, lexical tone recognition, and sentence recognition, respectively (p < .05 in all cases). Four weeks after training was stopped, there was no significant change in posttraining music and speech performance. The results suggest that music training can significantly improve pediatric Mandarin-speaking CI users' music and speech perception.
Control of interior surface materials for speech privacy in high-speed train cabins.
Jang, H S; Lim, H; Jeon, J Y
2017-05-01
The effect of interior materials with various absorption coefficients on speech privacy was investigated in a 1:10 scale model of one high-speed train cabin geometry. The speech transmission index (STI) and privacy distance (rP) were measured in the train cabin to quantify speech privacy. Measurement cases were selected for the ceiling, sidewall, and front and back walls and were classified as high-, medium- and low-absorption coefficient cases. Interior materials with high absorption coefficients yielded a low rP, and the ceiling had the largest impact on both the STI and rP among the interior elements. Combinations of the three cases were measured, and the maximum reduction in rP by the absorptive surfaces was 2.4 m, which exceeds the space between two rows of chairs in the high-speed train. Additionally, the contribution of the interior elements to speech privacy was analyzed using recorded impulse responses and a multiple regression model for rP using the equivalent absorption area. The analysis confirmed that the ceiling was the most important interior element for improving speech privacy. These results can be used to find the relative decrease in rP in the acoustic design of interior materials to improve speech privacy in train cabins. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Influence of musical training on understanding voiced and whispered speech in noise.
Ruggles, Dorea R; Freyman, Richard L; Oxenham, Andrew J
2014-01-01
This study tested the hypothesis that the previously reported advantage of musicians over non-musicians in understanding speech in noise arises from more efficient or robust coding of periodic voiced speech, particularly in fluctuating backgrounds. Speech intelligibility was measured in listeners with extensive musical training, and in those with very little musical training or experience, using normal (voiced) or whispered (unvoiced) grammatically correct nonsense sentences in noise that was spectrally shaped to match the long-term spectrum of the speech, and was either continuous or gated with a 16-Hz square wave. Performance was also measured in clinical speech-in-noise tests and in pitch discrimination. Musicians exhibited enhanced pitch discrimination, as expected. However, no systematic or statistically significant advantage for musicians over non-musicians was found in understanding either voiced or whispered sentences in either continuous or gated noise. Musicians also showed no statistically significant advantage in the clinical speech-in-noise tests. Overall, the results provide no evidence for a significant difference between young adult musicians and non-musicians in their ability to understand speech in noise.
A communication-based intervention for nonverbal children with autism: what changes? Who benefits?
Gordon, Kate; Pasco, Greg; McElduff, Fiona; Wade, Angie; Howlin, Pat; Charman, Tony
2011-08-01
This article examines the form and function of spontaneous communication and outcome predictors in nonverbal children with autism following classroom-based intervention (Picture Exchange Communication System [PECS] training). 84 children from 15 schools participated in a randomized controlled trial (RCT) of PECS (P. Howlin, R. K. Gordon, G. Pasco, A. Wade, & T. Charman, 2007). They were aged 4-10 years (73 boys). Primary outcome measure was naturalistic observation of communication in the classroom. Multilevel Poisson regression was used to test for intervention effects and outcome predictors. Spontaneous communication using picture cards, speech, or both increased significantly following training (rate ratio [RR] = 1.90, 95% CI [1.46, 2.48], p < .001; RR = 1.77, 95% CI [1.35, 2.32], p < .001; RR = 3.74, 95% CI [2.19, 6.37], p < .001, respectively). Spontaneous communication to request objects significantly increased (RR = 2.17, 95% CI [1.75, 2.68], p < .001), but spontaneous requesting for social purposes did not (RR = 1.34, 95% CI [0.83, 2.18], p = .237). Only the effect on spontaneous speech persisted by follow-up (9 months later). Less severe baseline autism symptomatology (lower Autism Diagnosis Observation Schedule [ADOS] score; C. Lord et al., 2000) was associated with greater increase in spontaneous speech (RR = 0.90, 95% CI [0.83, 0.98], p = .011) and less severe baseline expressive language impairment (lower ADOS item A1 score), with larger increases in spontaneous use of speech and pictures together (RR = 0.62, 95% CI [0.44, 0.88], p = .008). Overall, PECS appeared to enhance children's spontaneous communication for instrumental requesting using pictures, speech, or a combination of both. Some effects of training were moderated by baseline factors. For example, PECS appears to have increased spontaneous speech in children who could talk a little at baseline.
Approximated mutual information training for speech recognition using myoelectric signals.
Guo, Hua J; Chan, A D C
2006-01-01
A new training algorithm called approximated maximum mutual information (AMMI) is proposed to improve the accuracy of myoelectric speech recognition using hidden Markov models (HMMs). Previous studies have demonstrated that automatic speech recognition can be performed using myoelectric signals from articulatory muscles of the face. Classification of facial myoelectric signals can be performed using HMMs that are trained using the maximum likelihood (ML) algorithm; however, this algorithm maximizes the likelihood of the observations in the training sequence, which is not directly associated with optimal classification accuracy. The AMMI training algorithm attempts to maximize the mutual information, thereby training the HMMs to optimize their parameters for discrimination. Our results show that AMMI training consistently reduces the error rates compared to those obtained with ML training, increasing the accuracy by approximately 3% on average.
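The maximum mutual information criterion that AMMI approximates can be written as the log-posterior of the correct class against all competing classes. The sketch below states that objective given per-utterance log-likelihoods from each class HMM; it shows the quantity being maximized, not the authors' parameter-update rule.

```python
# MMI objective over N utterances and C class HMMs (conceptual sketch).
import numpy as np
from scipy.special import logsumexp

def mmi_objective(log_liks, labels, log_priors):
    """log_liks[i, c] = log p(O_i | HMM_c); labels[i] = correct class."""
    num = log_liks[np.arange(len(labels)), labels] + log_priors[labels]
    den = logsumexp(log_liks + log_priors, axis=1)
    return np.sum(num - den)   # larger = better class discrimination

# ML (Baum-Welch) training maximizes only the numerator terms; MMI-style
# training raises the correct-class likelihood while suppressing rivals.
```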
Inservice Leadership Training for Speech-Language and Special Education Personnel. Final Report.
ERIC Educational Resources Information Center
McLean, James E.; And Others
This report describes a project which targeted two-person teams of leadership-level personnel in special education and speech/language pathology for training in child language to serve severely handicapped non-verbal children. A "pyramid" training model was used and these primary trainees conducted additional training with teachers and clinicians…
Air traffic controllers' long-term speech-in-noise training effects: A control group study.
Zaballos, Maria T P; Plasencia, Daniel P; González, María L Z; de Miguel, Angel R; Macías, Ángel R
2016-01-01
Speech perception in noise relies on the capacity of the auditory system to process complex sounds using sensory and cognitive skills. The possibility that these can be trained during adulthood is of special interest in auditory disorders, where speech-in-noise perception becomes compromised. Air traffic controllers (ATCs) are constantly exposed to radio communication, a situation that seems to produce auditory learning. The objective of this study was to quantify this effect. Nineteen ATCs and 19 normal-hearing individuals underwent a speech-in-noise test with three signal-to-noise ratios (SNRs): 5, 0, and -5 dB. Noise and speech were presented through two different loudspeakers in azimuth position. Speech tokens were presented at 65 dB SPL, while the white noise was presented at 60, 65, and 70 dB, respectively. Air traffic controllers outperformed the control group in all conditions (P<0.05 in ANOVA and Mann-Whitney U tests). Group differences were largest in the most difficult condition, SNR = -5 dB. However, no correlation between experience and performance was found for any of the conditions tested. The reason might be that ceiling performance is achieved much faster than the minimum experience time recorded, 5 years, although intrinsic cognitive abilities cannot be disregarded. ATCs demonstrated an enhanced ability to hear speech in challenging listening environments. This study provides evidence that long-term auditory training is indeed useful in achieving better speech-in-noise understanding even in adverse conditions, although good cognitive qualities are likely to be a basic requirement for this training to be effective.
Auditory Perceptual Learning for Speech Perception Can be Enhanced by Audiovisual Training.
Bernstein, Lynne E; Auer, Edward T; Eberhardt, Silvio P; Jiang, Jintao
2013-01-01
Speech perception under audiovisual (AV) conditions is well known to confer benefits to perception such as increased speed and accuracy. Here, we investigated how AV training might benefit or impede auditory perceptual learning of speech degraded by vocoding. In Experiments 1 and 3, participants learned paired associations between vocoded spoken nonsense words and nonsense pictures. In Experiment 1, paired-associates (PA) AV training of one group of participants was compared with audio-only (AO) training of another group. When tested under AO conditions, the AV-trained group was significantly more accurate than the AO-trained group. In addition, pre- and post-training AO forced-choice consonant identification with untrained nonsense words showed that AV-trained participants had learned significantly more than AO participants. The pattern of results pointed to their having learned at the level of the auditory phonetic features of the vocoded stimuli. Experiment 2, a no-training control with testing and re-testing on the AO consonant identification, showed that the controls were as accurate as the AO-trained participants in Experiment 1 but less accurate than the AV-trained participants. In Experiment 3, PA training alternated AV and AO conditions on a list-by-list basis within participants, and training was to criterion (92% correct). PA training with AO stimuli was reliably more effective than training with AV stimuli. We explain these discrepant results in terms of the so-called "reverse hierarchy theory" of perceptual learning and in terms of the diverse multisensory and unisensory processing resources available to speech perception. We propose that early AV speech integration can potentially impede auditory perceptual learning; but visual top-down access to relevant auditory features can promote auditory perceptual learning.
Polur, Prasad D; Miller, Gerald E
2005-01-01
Computer speech recognition of individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, a hidden Markov model (HMM) was constructed and conditions investigated that would provide improved performance for a dysarthric speech (isolated word) recognition system intended to act as an assistive/control tool. In particular, we investigated the effect of high-frequency spectral components on the recognition rate of the system to determine if they contributed useful additional information to the system. A small-size vocabulary spoken by three cerebral palsy subjects was chosen. Mel-frequency cepstral coefficients extracted with the use of 15 ms frames served as training input to an ergodic HMM setup. Subsequent results demonstrated that no significant useful information was available to the system for enhancing its ability to discriminate dysarthric speech above 5.5 kHz in the current set of dysarthric data. The level of variability in input dysarthric speech patterns limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor-impaired individuals such as cerebral palsy subjects holds sufficient promise.
ERIC Educational Resources Information Center
Ali, Saandia
2016-01-01
This paper reports on the early stages of a locally funded research and development project taking place at Rennes 2 university. It aims at developing a comprehensive pedagogical framework for pronunciation training for adult learners of English. This framework will combine a direct approach to pronunciation training (face-to-face teaching) with…
Suggested Outline for Auditory Perception Training.
ERIC Educational Resources Information Center
Kelley, Clare A.
Presented are suggestions for speech therapists to use in auditory perception training and screening of language handicapped children in kindergarten through grade 3. Directions are given for using the program, which is based on games. Each component is presented in terms of purpose, materials, a description of the game, and directions for…
ERIC Educational Resources Information Center
Kelley, M. E.; Shillingsburg, M.A.; Castro, M. J.; Addison, L. R.; LaRue, R. H., Jr.
2007-01-01
Many effective language-training programs are based on Skinner's (1957) analysis of verbal behavior. Skinner described several elementary verbal operants, including mands, tacts, intraverbals, and echoics. According to Skinner, responses that share the same topography may actually be functionally independent. Previous…
An optimization method for speech enhancement based on deep neural network
NASA Astrophysics Data System (ADS)
Sun, Haixia; Li, Sikun
2017-06-01
This paper puts forward a deep neural network (DNN) model with a more credible data set and a more robust structure. First, we apply two regularization techniques, dropout and a sparsity constraint, to strengthen the generalization ability of the model. In this way, the model not only reaches consistency between the pre-training model and the fine-tuning model but also reduces resource consumption. Network compression by weight sharing and quantization is then applied to reduce storage cost. Finally, we evaluate the quality of the reconstructed speech according to different criteria. The results show that the improved framework performs well on speech enhancement and meets the requirements of speech processing.
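As a hedged illustration of the two regularizers named above (not the authors' configuration), the PyTorch sketch below trains a feed-forward mask-estimation enhancement network, one common DNN formulation, with dropout layers and an L1 sparsity penalty on hidden activations. Architecture, dimensions, and the penalty weight are assumptions; the compression steps (weight sharing, quantization) are omitted.

```python
# Minimal sketch: speech-enhancement DNN with dropout and an L1 sparsity
# penalty on hidden activations. All sizes are illustrative.
import torch
import torch.nn as nn

class EnhancementDNN(nn.Module):
    def __init__(self, n_bins=257, hidden=1024, p_drop=0.2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
        )
        self.head = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, noisy_mag):
        h = self.body(noisy_mag)   # hidden activations (penalized below)
        return self.head(h), h     # mask in [0, 1], plus h for the penalty

model = EnhancementDNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
lam = 1e-4  # sparsity weight (hypothetical value)

def training_step(noisy_mag, clean_mag):
    mask, h = model(noisy_mag)
    loss = nn.functional.mse_loss(mask * noisy_mag, clean_mag)
    loss = loss + lam * h.abs().mean()  # sparsity constraint on activations
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```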
Training parents to use the natural language paradigm to increase their autistic children's speech.
Laski, K E; Charlop, M H; Schreibman, L
1988-01-01
Parents of four nonverbal and four echolalic autistic children were trained to increase their children's speech by using the Natural Language Paradigm (NLP), a loosely structured procedure conducted in a play environment with a variety of toys. Parents were initially trained to use the NLP in a clinic setting, with subsequent parent-child speech sessions occurring at home. The results indicated that following training, parents increased the frequency with which they required their children to speak (i.e., modeled words and phrases, prompted answers to questions). Correspondingly, all children increased the frequency of their verbalizations in three nontraining settings. Thus, the NLP appears to be an efficacious program for parents to learn and use in the home to increase their children's speech. PMID:3225256
A speech-controlled environmental control system for people with severe dysarthria.
Hawley, Mark S; Enderby, Pam; Green, Phil; Cunningham, Stuart; Brownsell, Simon; Carmichael, James; Parker, Mark; Hatzis, Athanassios; O'Neill, Peter; Palmer, Rebecca
2007-06-01
Automatic speech recognition (ASR) can provide a rapid means of controlling electronic assistive technology. Off-the-shelf ASR systems function poorly for users with severe dysarthria because of the increased variability of their articulations. We have developed a limited vocabulary speaker dependent speech recognition application which has greater tolerance to variability of speech, coupled with a computerised training package which assists dysarthric speakers to improve the consistency of their vocalisations and provides more data for recogniser training. These applications, and their implementation as the interface for a speech-controlled environmental control system (ECS), are described. The results of field trials to evaluate the training program and the speech-controlled ECS are presented. The user-training phase increased the recognition rate from 88.5% to 95.4% (p<0.001). Recognition rates were good for people with even the most severe dysarthria in everyday usage in the home (mean word recognition rate 86.9%). Speech-controlled ECS were less accurate (mean task completion accuracy 78.6% versus 94.8%) but were faster to use than switch-scanning systems, even taking into account the need to repeat unsuccessful operations (mean task completion time 7.7s versus 16.9s, p<0.001). It is concluded that a speech-controlled ECS is a viable alternative to switch-scanning systems for some people with severe dysarthria and would lead, in many cases, to more efficient control of the home.
Speech processing using conditional observable maximum likelihood continuity mapping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hogden, John; Nix, David
A computer-implemented method enables the recognition of speech and speech characteristics. Parameters are initialized of first probability density functions that map between the symbols in the vocabulary of one or more sequences of speech codes that represent speech sounds and a continuity map. Parameters are also initialized of second probability density functions that map between the elements in the vocabulary of one or more desired sequences of speech transcription symbols and the continuity map. The parameters of the probability density functions are then trained to maximize the probabilities of the desired sequences of speech-transcription symbols. A new sequence of speech codes is then input to the continuity map having the trained first and second probability function parameters. A smooth path is identified on the continuity map that has the maximum probability for the new sequence of speech codes. The probability of each speech transcription symbol for each input speech code can then be output.
Brief Training with Co-Speech Gesture Lends a Hand to Word Learning in a Foreign Language
ERIC Educational Resources Information Center
Kelly, Spencer D.; McDevitt, Tara; Esch, Megan
2009-01-01
Recent research in psychology and neuroscience has demonstrated that co-speech gestures are semantically integrated with speech during language comprehension and development. The present study explored whether gestures also play a role in language learning in adults. In Experiment 1, we exposed adults to a brief training session presenting novel…
Learning to Comprehend Foreign-Accented Speech by Means of Production and Listening Training
ERIC Educational Resources Information Center
Grohe, Ann-Kathrin; Weber, Andrea
2016-01-01
The effects of production and listening training on the subsequent comprehension of foreign-accented speech were investigated in a training-test paradigm. During training, German nonnative (L2) and English native (L1) participants listened to a story spoken by a German speaker who replaced all English /θ/s with /t/ (e.g., *"teft" for…
NASA Astrophysics Data System (ADS)
Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.
1996-01-01
A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.
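The processing algorithm itself is not specified in the abstract; as a loose flavor of how speech can be slowed without shifting pitch (far simpler than the study's selective prolongation and amplification of fast transitions), a uniform time stretch takes a few lines. The function and file names below are hypothetical.

```python
# Loose illustration only: uniformly time-stretch a recording so rapidly
# changing elements unfold more slowly, without altering pitch.
import librosa
import soundfile as sf

def slow_down(path_in, path_out, rate=0.67):
    """Stretch a recording to ~1.5x its original duration (rate < 1 slows)."""
    y, sr = librosa.load(path_in, sr=None)
    sf.write(path_out, librosa.effects.time_stretch(y, rate=rate), sr)
```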
Limited connected speech experiment
NASA Astrophysics Data System (ADS)
Landell, P. B.
1983-03-01
The purpose of this contract was to demonstrate that connected speech recognition (CSR) can be performed in real time on a vocabulary of one hundred words and to test the performance of the CSR system for twenty-five male and twenty-five female speakers. This report describes the contractor's real-time laboratory CSR system, the database and training software developed in accordance with the contract, and the results of the performance tests.
Coutinho, Eduardo; Schuller, Björn
2017-01-01
Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an Affective Sciences point of view, determining the degree of overlap between the two domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a machine-learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available for developing music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and Transfer Learning between these domains. We establish a comparative framework including intra-domain (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained in one modality and tested on the other). In the cross-domain context, we evaluated two strategies: direct transfer between domains, and the contribution of Transfer Learning techniques (feature-representation transfer based on Denoising Auto Encoders) for reducing the gap in the feature-space distributions. Our results demonstrate excellent cross-domain generalisation performance with and without feature-representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain.
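The feature-representation-transfer step can be pictured with a small denoising autoencoder trained on acoustic features pooled from both domains, whose bottleneck code then serves as a shared representation. The PyTorch sketch below is a minimal stand-in, not the authors' model; the feature dimensionality, layer sizes, and noise level are all assumptions.

```python
# Minimal sketch: a denoising autoencoder (DAE) whose bottleneck bridges
# the music and speech feature spaces. Dimensions are hypothetical.
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    def __init__(self, n_feats=88, bottleneck=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_feats, 64), nn.Tanh(),
                                     nn.Linear(64, bottleneck), nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 64), nn.Tanh(),
                                     nn.Linear(64, n_feats))

    def forward(self, x, noise_std=0.1):
        x_noisy = x + noise_std * torch.randn_like(x)  # corrupt the input
        z = self.encoder(x_noisy)
        return self.decoder(z), z                      # reconstruction + code

dae = DenoisingAutoencoder()
opt = torch.optim.Adam(dae.parameters(), lr=1e-3)

def train_step(x):  # x: feature batch pooled from music AND speech
    recon, _ = dae(x)
    loss = nn.functional.mse_loss(recon, x)  # reconstruct the clean input
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# After training, dae.encoder(x) yields domain-bridging features on which a
# cross-domain Arousal/Valence regressor can be trained.
```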
Dual-learning systems during speech category learning
Chandrasekaran, Bharath; Yi, Han-Gyol; Maddox, W. Todd
2013-01-01
Dual-systems models of visual category learning posit the existence of an explicit, hypothesis-testing ‘reflective’ system, as well as an implicit, procedural-based ‘reflexive’ system. The reflective and reflexive learning systems are competitive and neurally dissociable. Relatively little is known about the role of these domain-general learning systems in speech category learning. Given the multidimensional, redundant, and variable nature of acoustic cues in speech categories, our working hypothesis is that speech categories are learned reflexively. To this end, we examined the relative contribution of these learning systems to speech learning in adults. Native English speakers learned to categorize Mandarin tone categories over 480 trials. The training protocol involved trial-by-trial feedback and multiple talkers. Experiments 1 and 2 examined the effect of manipulating the timing (immediate vs. delayed) and information content (full vs. minimal) of feedback. Dual-systems models of visual category learning predict that delayed feedback and rich, informational feedback enhance reflective learning, while immediate and minimally informative feedback enhances reflexive learning. Across the two experiments, our results show that feedback manipulations targeting reflexive learning enhanced category-learning success. In Experiment 3, we examined the role of trial-to-trial talker information (mixed vs. blocked presentation) on speech category learning success. We hypothesized that the mixed condition would enhance reflexive learning by not allowing an association between talker-related acoustic cues and speech categories. Our results show that the mixed-talker condition led to relatively greater accuracies. Our experiments demonstrate that speech categories are optimally learned by training methods that target the reflexive learning system. PMID:24002965
McNeil, M.R.; Katz, W.F.; Fossett, T.R.D.; Garst, D.M.; Szuminsky, N.J.; Carter, G.; Lim, K.Y.
2010-01-01
Apraxia of speech (AOS) is a motor speech disorder characterized by disturbed spatial and temporal parameters of movement. Research on motor learning suggests that augmented feedback may provide a beneficial effect for training movement. This study examined the effects of the presence and frequency of online augmented visual kinematic feedback (AVKF) and clinician-provided perceptual feedback on speech accuracy in 2 adults with acquired AOS. Within a single-subject multiple-baseline design, AVKF was provided using electromagnetic midsagittal articulography (EMA) in 2 feedback conditions (50 or 100%). Articulator placement was specified for speech motor targets (SMTs). Treated and baselined SMTs were in the initial or final position of single-syllable words, in varying consonant-vowel or vowel-consonant contexts. SMTs were selected based on each participant's pre-assessed erred productions. Productions were digitally recorded, and online perceptual judgments of accuracy (including segment and intersegment distortions) were made. Inter- and intra-judge reliability for perceptual accuracy was high. Results measured by visual inspection and effect size revealed positive acquisition and generalization effects for both participants. Generalization occurred across vowel contexts and to untreated probes. Results of the frequency manipulation were confounded by presentation order. Maintenance of learned and generalized effects was demonstrated for 1 participant. These data provide support for the role of augmented feedback in treating speech movements that result in perceptually accurate speech production. Future investigations will explore the independent contributions of each feedback type (i.e., kinematic and perceptual) in producing efficient and effective training of SMTs in persons with AOS. PMID:20424468
NASA Technical Reports Server (NTRS)
Lokerson, D. C. (Inventor)
1977-01-01
A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
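A loose software analogue of the patented analyzer (not the patent's circuit): band-pass each formant region and emit a pulse at each zero crossing. Note that the patent derives the first-formant train from N signal-level bands rather than zero crossings, so the F1 branch here is only an approximation, and the band edges are illustrative rather than the patent's values.

```python
# Minimal sketch: per-formant-band zero-crossing pulse trains.
import numpy as np
from scipy.signal import butter, sosfilt

def bandpass(y, sr, lo, hi, order=4):
    """Band-pass filter a waveform between lo and hi Hz."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=sr, output="sos")
    return sosfilt(sos, y)

def zero_crossing_pulses(y):
    """Return a train with 1 at samples where the waveform crosses zero."""
    signs = np.signbit(y)
    return np.concatenate(([0], (signs[1:] != signs[:-1]).astype(int)))

def formant_pulse_trains(y, sr):
    """One pulse train per (illustrative) formant band."""
    bands = {"F1": (250, 900), "F2": (900, 2500), "F3": (2500, 3500)}
    return {name: zero_crossing_pulses(bandpass(y, sr, lo, hi))
            for name, (lo, hi) in bands.items()}
```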
Musical training sharpens and bonds ears and tongue to hear speech better.
Du, Yi; Zatorre, Robert J
2017-12-19
The idea that musical training improves speech perception in challenging listening environments is appealing and of clinical importance, yet the mechanisms of any such musician advantage are not well specified. Here, using functional magnetic resonance imaging (fMRI), we found that musicians outperformed nonmusicians in identifying syllables at varying signal-to-noise ratios (SNRs), which was associated with stronger activation of the left inferior frontal and right auditory regions in musicians compared with nonmusicians. Moreover, musicians showed greater specificity of phoneme representations in bilateral auditory and speech motor regions (e.g., premotor cortex) at higher SNRs and in the left speech motor regions at lower SNRs, as determined by multivoxel pattern analysis. Musical training also enhanced the intrahemispheric and interhemispheric functional connectivity between auditory and speech motor regions. Our findings suggest that improved speech in noise perception in musicians relies on stronger recruitment of, finer phonological representations in, and stronger functional connectivity between auditory and frontal speech motor cortices in both hemispheres, regions involved in bottom-up spectrotemporal analyses and top-down articulatory prediction and sensorimotor integration, respectively.
Cognitive Training and Initial Use of Referential Speech.
ERIC Educational Resources Information Center
Kahn, James V.
1984-01-01
Profoundly retarded 3-10 year olds (N=24) were divided into three groups: two cognitive training programs--object permanence or means-end--and language only. Results of pre- and posttests revealed that the cognitive training approaches were successful in enabling the majority of Ss to learn to use speech. (CL)
A robotic voice simulator and the interactive training for hearing-impaired people.
Sawada, Hideyuki; Kitani, Mitsuki; Hayashi, Yasumori
2008-01-01
A talking and singing robot that adaptively learns vocalization skills by means of an auditory feedback learning algorithm is being developed. The robot consists of motor-controlled vocal organs such as vocal cords, a vocal tract, and a nasal cavity to generate a natural voice imitating human vocalization. In this study, the robot is applied to a speech-articulation training system for the hearing-impaired, because the robot is able to reproduce their vocalizations and to show them how to improve them to generate clear speech. The paper briefly introduces the mechanical construction of the robot and how it autonomously acquires vocalization skill through auditory feedback learning by listening to human speech. The training system is then described, together with an evaluation of the speech training by hearing-impaired people.
Xiao, Bo; Imel, Zac E; Georgiou, Panayiotis G; Atkins, David C; Narayanan, Shrikanth S
2015-01-01
The technology for evaluating patient-provider interactions in psychotherapy (observational coding) has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks, including ASR (1,200 transcripts from heterogeneous psychotherapy sessions, and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally derived empathy ratings was evaluated against human ratings for each provider. Computationally derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and an F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracy, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies.
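A minimal stand-in for the text-based empathy model, assuming scikit-learn: TF-IDF word and bigram features feeding a ridge regressor over session transcripts. The study's actual system is more elaborate; this sketch only shows the general transcript-to-score shape, and the data variables named in the comments are hypothetical.

```python
# Minimal sketch: predict a session-level empathy rating from transcripts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Word/bigram TF-IDF features -> ridge-regressed empathy score.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2, sublinear_tf=True),
    Ridge(alpha=1.0),
)

# train_transcripts: one string per session (from ASR or human transcription);
# train_ratings: observer empathy codes, one per session (hypothetical names).
# model.fit(train_transcripts, train_ratings)
# predicted = model.predict(test_transcripts)
```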
Environmental Sound Training in Cochlear Implant Users
ERIC Educational Resources Information Center
Shafiro, Valeriy; Sheft, Stanley; Kuvadia, Sejal; Gygi, Brian
2015-01-01
Purpose: The study investigated the effect of a short computer-based environmental sound training regimen on the perception of environmental sounds and speech in experienced cochlear implant (CI) patients. Method: Fourteen CI patients with the average of 5 years of CI experience participated. The protocol consisted of 2 pretests, 1 week apart,…
Humor, Rapport, and Uncomfortable Moments in Interactions with Adults with Traumatic Brain Injury
ERIC Educational Resources Information Center
Kovarsky, Dana; Schiemer, Christine; Murray, Allison
2011-01-01
We examined uncomfortable moments that damaged rapport during group interactions between college students in training to become speech-language pathologists and adults with traumatic brain injury. The students worked as staff in a community-based program affiliated with a university training program that functioned as a recreational gathering…
Neuroscience-inspired computational systems for speech recognition under noisy conditions
NASA Astrophysics Data System (ADS)
Schafer, Phillip B.
Humans routinely recognize speech in challenging acoustic environments with background music, engine sounds, competing talkers, and other acoustic noise. However, today's automatic speech recognition (ASR) systems perform poorly in such environments. In this dissertation, I present novel methods for ASR designed to approach human-level performance by emulating the brain's processing of sounds. I exploit recent advances in auditory neuroscience to compute neuron-based representations of speech, and design novel methods for decoding these representations to produce word transcriptions. I begin by considering speech representations modeled on the spectrotemporal receptive fields of auditory neurons. These representations can be tuned to optimize a variety of objective functions, which characterize the response properties of a neural population. I propose an objective function that explicitly optimizes the noise invariance of the neural responses, and find that it gives improved performance on an ASR task in noise compared to other objectives. The method as a whole, however, fails to significantly close the performance gap with humans. I next consider speech representations that make use of spiking model neurons. The neurons in this method are feature detectors that selectively respond to spectrotemporal patterns within short time windows in speech. I consider a number of methods for training the response properties of the neurons. In particular, I present a method using linear support vector machines (SVMs) and show that this method produces spikes that are robust to additive noise. I compute the spectrotemporal receptive fields of the neurons for comparison with previous physiological results. To decode the spike-based speech representations, I propose two methods designed to work on isolated word recordings. The first method uses a classical ASR technique based on the hidden Markov model. The second method is a novel template-based recognition scheme that takes advantage of the neural representation's invariance in noise. The scheme centers on a speech similarity measure based on the longest common subsequence between spike sequences. The combined encoding and decoding scheme outperforms a benchmark system in extremely noisy acoustic conditions. Finally, I consider methods for decoding spike representations of continuous speech. To help guide the alignment of templates to words, I design a syllable detection scheme that robustly marks the locations of syllabic nuclei. The scheme combines SVM-based training with a peak selection algorithm designed to improve noise tolerance. By incorporating syllable information into the ASR system, I obtain strong recognition results in noisy conditions, although the performance in noiseless conditions is below the state of the art. The work presented here constitutes a novel approach to the problem of ASR that can be applied in the many challenging acoustic environments in which we use computer technologies today. The proposed spike-based processing methods can potentially be exploited in efficient hardware implementations and could significantly reduce the computational costs of ASR. The work also provides a framework for understanding the advantages of spike-based acoustic coding in the human brain.
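The template-matching step rests on the longest common subsequence (LCS) between spike sequences. A minimal sketch, assuming each utterance has already been reduced to an ordered list of neuron labels (a hypothetical encoding, not the dissertation's exact representation):

```python
# Minimal sketch: LCS-based similarity between spike sequences for
# template matching of isolated words.
def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic program for LCS length."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

def spike_similarity(seq, template):
    """Length-normalized LCS, tolerant of noise-driven insertions/deletions."""
    return lcs_length(seq, template) / max(len(seq), len(template))

def classify(seq, templates):
    """Pick the word whose spike template best matches the input sequence."""
    return max(templates, key=lambda w: spike_similarity(seq, templates[w]))

# templates = {"yes": [3, 17, 5], "no": [8, 2, 11]}  # hypothetical labels
```

Because a subsequence match skips over extra spikes rather than penalizing them positionally, this measure is naturally robust to the spurious and missing spikes that noise introduces, which is the property the decoding scheme exploits.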
ERIC Educational Resources Information Center
Dailey, K. Anne
Time-compressed speech (also called compressed speech, speeded speech, or accelerated speech) is an extension of the normal recording procedure for reproducing the spoken word. Compressed speech can be used to achieve dramatic reductions in listening time without significant loss in comprehension. The implications of such temporal reductions in…
Automated Speech Rate Measurement in Dysarthria
ERIC Educational Resources Information Center
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
2015-01-01
Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…
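The abstract truncates before the method details; as general background, automated speech-rate estimators are often built on syllable-nucleus detection in an intensity contour. A generic sketch of that family of methods, with illustrative thresholds and no claim to match the authors' Dutch-trained algorithm:

```python
# Minimal sketch: estimate speech rate as energy-peak (syllable-nucleus)
# count per second. All thresholds are illustrative.
import numpy as np
import librosa
from scipy.signal import find_peaks

def speech_rate(path):
    y, sr = librosa.load(path, sr=16000)
    # 25 ms frames with a 10 ms hop -> 100 frames per second
    rms = librosa.feature.rms(y=y, frame_length=400, hop_length=160)[0]
    db = librosa.amplitude_to_db(rms, ref=np.max)
    frame_rate = sr / 160
    # syllable nuclei ~ prominent energy peaks, at most ~5 per second
    peaks, _ = find_peaks(db, height=-25, prominence=2,
                          distance=int(frame_rate / 5))
    return len(peaks) / (len(y) / sr)  # syllables per second
```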
ERIC Educational Resources Information Center
Ihnat, Mary Ann
This study was designed to investigate whether the listening ability of second-grade students could be improved using compressed-speech training as compared to normal listening training. The subjects were 95 second-grade pupils in a low-to-middle class suburban community in central New Jersey. The plan was to expose an experimental group to…
Topouzkhanian, Sylvia; Mijiyawa, Moustafa
2013-02-01
In West Africa, as in Majority World countries, people with a communication disability are generally cut off from the normal development process. The long-term involvement of two partners (Orthophonistes du Monde and Handicap International) allowed the implementation in 2003 of the first speech-language pathology qualifying course in West Africa, within the Ecole Nationale des Auxiliaires Medicaux (ENAM, National School for Medical Auxiliaries) in Lome, Togo. It is a 3-year basic training program (after the baccalaureate) in the only academic training centre for medical assistants in Togo. The department has a regional purpose and aims at training French-speaking African students. French speech-language pathology lecturers had to adapt their courses to the local realities they discovered in Togo. It was important to introduce and develop knowledge and skills within the students' system of reference. African speech-language pathologists have to face many challenges: creating an African speech and language therapy, introducing language disorders and their possible treatment by means other than traditional therapies, and adapting all the evaluation tests and tools for speech-language pathology to each country, each culture, and each language. Creating an African speech-language pathology profession (according to its own standards) with a real influence in West Africa opens great opportunities for schooling and social and occupational integration of people with communication disabilities.
Training Parents to Use the Natural Language Paradigm to Increase Their Autistic Children's Speech.
ERIC Educational Resources Information Center
Laski, Karen E.; And Others
1988-01-01
Parents of four nonverbal and four echolalic autistic children, aged five-nine, were trained to increase their children's speech by using the Natural Language Paradigm. Following training, parents increased the frequency with which they required their children to speak, and children increased the frequency of their verbalizations in three…
Air Traffic Controllers’ Long-Term Speech-in-Noise Training Effects: A Control Group Study
Zaballos, María T.P.; Plasencia, Daniel P.; González, María L.Z.; de Miguel, Angel R.; Macías, Ángel R.
2016-01-01
Introduction: Speech perception in noise relies on the capacity of the auditory system to process complex sounds using sensory and cognitive skills. The possibility that these can be trained during adulthood is of special interest in auditory disorders, where speech-in-noise perception becomes compromised. Air traffic controllers (ATC) are constantly exposed to radio communication, a situation that seems to produce auditory learning. The objective of this study has been to quantify this effect. Subjects and Methods: 19 ATC and 19 normal-hearing individuals underwent a speech-in-noise test with three signal-to-noise ratios: 5, 0 and −5 dB. Noise and speech were presented through two different loudspeakers in azimuth position. Speech tokens were presented at 65 dB SPL, while white noise files were at 60, 65 and 70 dB, respectively. Results: Air traffic controllers outperformed the control group in all conditions (P<0.05 in ANOVA and Mann-Whitney U tests). Group differences were largest in the most difficult condition, SNR=−5 dB. However, no correlation between experience and performance was found for any of the conditions tested. The reason might be that ceiling performance is achieved much faster than the minimum experience time recorded, 5 years, although intrinsic cognitive abilities cannot be disregarded. Discussion: ATC demonstrated an enhanced ability to hear speech in challenging listening environments. This study provides evidence that long-term auditory training is indeed useful in achieving better speech-in-noise understanding even in adverse conditions, although good cognitive qualities are likely to be a basic requirement for this training to be effective. Conclusion: Our results show that ATC outperform the control group in all conditions. Thus, this study provides evidence that long-term auditory training is indeed useful in achieving better speech-in-noise understanding even in adverse conditions. PMID:27991470
Gabay, Yafit; Karni, Avi; Banai, Karen
2017-01-01
Speech perception can improve substantially with practice (perceptual learning) even in adults. Here we compared the effects of four training protocols that differed in whether and how task difficulty was changed during a training session, in terms of the gains attained and the ability to apply (transfer) these gains to previously un-encountered items (tokens) and to different talkers. Participants trained in judging the semantic plausibility of sentences presented as time-compressed speech and were tested on their ability to reproduce, in writing, the target sentences; trial-by-trial feedback was afforded in all training conditions. In two conditions task difficulty (low or high compression) was kept constant throughout the training session, whereas in the other two conditions task difficulty was changed in an adaptive manner (incrementally from easy to difficult, or using a staircase procedure). Compared to a control group (no training), all four protocols resulted in significant post-training improvement in the ability to reproduce the trained sentences accurately. However, training in the constant-high-compression protocol elicited the smallest gains in deciphering and reproducing trained items and in reproducing novel, untrained, items after training. Overall, these results suggest that training procedures that start off with relatively little signal distortion (“easy” items, not far removed from standard speech) may be advantageous compared to conditions wherein severe distortions are presented to participants from the very beginning of the training session. PMID:28545039
Sparse Forward-Backward for Fast Training of Conditional Random Fields
2006-01-01
…task, the NetTalk text-to-speech data set [5], we can now train a conditional random field (CRF) in about 6 hours, for which training previously…
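Only a fragment of this report's abstract survives, but the computation a sparse forward-backward method accelerates is the standard forward-backward recursion of a linear-chain CRF. For context, a dense, log-space numpy sketch of that baseline recursion (variable shapes illustrative, not the report's code):

```python
# Minimal sketch: dense log-space forward-backward for a linear-chain CRF.
# log_phi0: (L,) start scores; log_phi: (T-1, L, L) scores for moving from
# label i at position t to label j at position t+1.
import numpy as np
from scipy.special import logsumexp

def forward_backward(log_phi0, log_phi):
    T, L = log_phi.shape[0] + 1, log_phi0.shape[0]
    alpha = np.empty((T, L))
    beta = np.empty((T, L))
    alpha[0] = log_phi0
    for t in range(1, T):           # forward pass
        alpha[t] = logsumexp(alpha[t - 1][:, None] + log_phi[t - 1], axis=0)
    beta[T - 1] = 0.0
    for t in range(T - 2, -1, -1):  # backward pass
        beta[t] = logsumexp(log_phi[t] + beta[t + 1][None, :], axis=1)
    log_Z = logsumexp(alpha[-1])    # partition function
    marginals = np.exp(alpha + beta - log_Z)  # per-position label marginals
    return marginals, log_Z
```

A sparse variant prunes low-probability labels at each position, shrinking the L x L inner sums that dominate the cost of training.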
Orlov, Natasza D; Giampietro, Vincent; O'Daly, Owen; Lam, Sheut-Ling; Barker, Gareth J; Rubia, Katya; McGuire, Philip; Shergill, Sukhwinder S; Allen, Paul
2018-02-12
Neurocognitive models and previous neuroimaging work posit that auditory verbal hallucinations (AVH) arise due to increased activity in speech-sensitive regions of the left posterior superior temporal gyrus (STG). Here, we examined if patients with schizophrenia (SCZ) and AVH could be trained to down-regulate STG activity using real-time functional magnetic resonance imaging neurofeedback (rtfMRI-NF). We also examined the effects of rtfMRI-NF training on functional connectivity between the STG and other speech and language regions. Twelve patients with SCZ and treatment-refractory AVH were recruited to participate in the study and were trained to down-regulate STG activity using rtfMRI-NF, over four MRI scanner visits during a 2-week training period. STG activity and functional connectivity were compared pre- and post-training. Patients successfully learnt to down-regulate activity in their left STG over the rtfMRI-NF training. Post-training, patients showed increased functional connectivity between the left STG, the left inferior prefrontal gyrus (IFG) and the inferior parietal gyrus. The post-training increase in functional connectivity between the left STG and IFG was associated with a reduction in AVH symptoms over the training period. The speech-sensitive region of the left STG is a suitable target region for rtfMRI-NF in patients with SCZ and treatment-refractory AVH. Successful down-regulation of left STG activity can increase functional connectivity between speech motor and perception regions. These findings suggest that patients with AVH have the ability to alter activity and connectivity in speech and language regions, and raise the possibility that rtfMRI-NF training could present a novel therapeutic intervention in SCZ.
Working Memory Training and Speech in Noise Comprehension in Older Adults.
Wayne, Rachel V; Hamilton, Cheryl; Jones Huyck, Julia; Johnsrude, Ingrid S
2016-01-01
Understanding speech in the presence of background sound can be challenging for older adults. Speech comprehension in noise appears to depend on working memory and executive-control processes (e.g., Heald and Nusbaum, 2014), and their augmentation through training may have rehabilitative potential for age-related hearing loss. We examined the efficacy of adaptive working-memory training (Cogmed; Klingberg et al., 2002) in 24 older adults, assessing generalization to other working-memory tasks (near-transfer) and to other cognitive domains (far-transfer) using a cognitive test battery, including the Reading Span test, sensitive to working memory (e.g., Daneman and Carpenter, 1980). We also assessed far transfer to speech-in-noise performance, including a closed-set sentence task (Kidd et al., 2008). To examine the effect of cognitive training on benefit obtained from semantic context, we also assessed transfer to open-set sentences; half were semantically coherent (high-context) and half were semantically anomalous (low-context). Subjects completed 25 sessions (0.5-1 h each; 5 sessions/week) of both adaptive working memory training and placebo training over 10 weeks in a crossover design. Subjects' scores on the adaptive working-memory training tasks improved as a result of training. However, training did not transfer to other working memory tasks, nor to tasks recruiting other cognitive domains. We did not observe any training-related improvement in speech-in-noise performance. Measures of working memory correlated with the intelligibility of low-context, but not high-context, sentences, suggesting that sentence context may reduce the load on working memory. The Reading Span test significantly correlated only with a test of visual episodic memory, suggesting that the Reading Span test is not a pure-test of working memory, as is commonly assumed.
Knockdown of Dyslexia-Gene Dcdc2 Interferes with Speech Sound Discrimination in Continuous Streams.
Centanni, Tracy Michelle; Booker, Anne B; Chen, Fuyi; Sloan, Andrew M; Carraway, Ryan S; Rennaker, Robert L; LoTurco, Joseph J; Kilgard, Michael P
2016-04-27
Dyslexia is the most common developmental language disorder and is marked by deficits in reading and phonological awareness. One theory of dyslexia suggests that the phonological awareness deficit is due to abnormal auditory processing of speech sounds. Variants in DCDC2 and several other neural migration genes are associated with dyslexia and may contribute to auditory processing deficits. In the current study, we tested the hypothesis that RNAi suppression of Dcdc2 in rats causes abnormal cortical responses to sound and impaired speech sound discrimination. Rats were subjected in utero to RNA interference targeting of the gene Dcdc2 or a scrambled sequence. Primary auditory cortex (A1) responses were acquired from 11 rats (5 with Dcdc2 RNAi; DC-) before any behavioral training. A separate group of 8 rats (3 DC-) were trained on a variety of speech sound discrimination tasks, and auditory cortex responses were acquired following training. Dcdc2 RNAi nearly eliminated the ability of rats to identify specific speech sounds from a continuous train of speech sounds but did not impair performance during discrimination of isolated speech sounds. The neural responses to speech sounds in A1 were not degraded as a function of presentation rate before training. These results suggest that A1 is not directly involved in the impaired speech discrimination caused by Dcdc2 RNAi. This result contrasts earlier results using Kiaa0319 RNAi and suggests that different dyslexia genes may cause different deficits in the speech processing circuitry, which may explain differential responses to therapy. Although dyslexia is diagnosed through reading difficulty, there is a great deal of variation in the phenotypes of these individuals. The underlying neural and genetic mechanisms causing these differences are still widely debated. In the current study, we demonstrate that suppression of a candidate-dyslexia gene causes deficits on tasks of rapid stimulus processing. These animals also exhibited abnormal neural plasticity after training, which may be a mechanism for why some children with dyslexia do not respond to intervention. These results are in stark contrast to our previous work with a different candidate gene, which caused a different set of deficits. Our results shed some light on possible neural and genetic mechanisms causing heterogeneity in the dyslexic population.
Speech-to-Speech Relay Service
... are specifically trained in understanding a variety of speech disorders, which enables them to repeat what the caller says in a manner that makes the caller’s words clear and understandable to the ... people with speech disabilities cannot communicate by telephone because the parties ...
ERIC Educational Resources Information Center
Norton, Darryl E.
1975-01-01
The author discusses the role of speech pathology assistants (SPAs), responding to speech pathologists' fears regarding SPAs, describing the National Association for Hearing and Speech Action SPA training program, and pointing out the duties of SPAs and their benefits to speech pathologists. (LS)
Effects of vocal training and phonatory task on voice onset time.
McCrea, Christopher R; Morris, Richard J
2007-01-01
The purpose of this study was to examine the temporal-acoustic differences between trained singers and nonsingers during speech and singing tasks. Thirty male participants were separated into two groups of 15 according to level of vocal training (i.e., trained or untrained). The participants spoke and sang carrier phrases containing English voiced and voiceless bilabial stops, and voice onset time (VOT) was measured for the stop consonant productions. Mixed analyses of variance revealed a significant main effect between speech and singing for /p/ and /b/, with VOT durations longer during speech than singing for /p/, and the opposite true for /b/. Furthermore, a significant phonatory task by vocal training interaction was observed for /p/ productions. The results indicated that the type of phonatory task influences VOT and that these influences are most obvious in trained singers, secondary to the articulatory and phonatory adjustments learned during vocal training.
Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A
2013-02-01
As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.
Phonological mismatch makes aided speech recognition in noise cognitively taxing.
Rudner, Mary; Foo, Catharina; Rönnberg, Jerker; Lunner, Thomas
2007-12-01
The working memory framework for Ease of Language Understanding predicts that speech processing becomes more effortful, thus requiring more explicit cognitive resources, when there is a mismatch between speech input and phonological representations in long-term memory. To test this prediction, we changed the compression release settings in the hearing instruments of experienced users and allowed them to train for 9 weeks with the new settings. After training, aided speech recognition in noise was tested with both the trained settings and orthogonal settings. We postulated that training would lead to acclimatization to the trained setting, which in turn would involve establishment of new phonological representations in long-term memory. Further, we postulated that after training, testing with orthogonal settings would give rise to phonological mismatch, associated with more explicit cognitive processing. Thirty-two participants (mean=70.3 years, SD=7.7) with bilateral sensorineural hearing loss (pure-tone average=46.0 dB HL, SD=6.5), bilaterally fitted for more than 1 year with digital, two-channel, nonlinear signal processing hearing instruments and chosen from the patient population at the Linköping University Hospital, were randomly assigned to 9 weeks of training with new, fast (40 ms) or slow (640 ms), compression release settings in both channels. Aided speech recognition in noise performance was tested according to a design with three within-group factors: test occasion (T1, T2), test setting (fast, slow), and type of noise (unmodulated, modulated), and one between-group factor: experience setting (fast, slow), for two types of speech materials: the highly constrained Hagerman sentences and the less-predictable Hearing in Noise Test (HINT). Complex cognitive capacity was measured using the reading span and letter monitoring tests. PREDICTION: We predicted that speech recognition in noise at T2 with mismatched experience and test settings would be associated with more explicit cognitive processing and thus stronger correlations with complex cognitive measures, as well as poorer performance if complex cognitive capacity was exceeded. Under mismatch conditions, stronger correlations were found between performance on speech recognition with the Hagerman sentences and reading span, along with poorer speech recognition for participants with low reading span scores. No consistent mismatch effect was found with HINT. The mismatch prediction generated by the working memory framework for Ease of Language Understanding is supported for speech recognition in noise with the highly constrained Hagerman sentences but not the less-predictable HINT.
Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech
NASA Technical Reports Server (NTRS)
Hieronymus, James; Aist, Greg; Dowding, John
2006-01-01
An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them, and responds appropriately while ignoring the rest. This paper outlines a simple method for achieving this goal, which involves trading a slightly higher false rejection rate of in-domain utterances for a higher correct rejection rate of Out of Domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar which is specialized by Explanation Based Learning (EBL), so that it only uses rules which are seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system only recognizes utterances which reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good Out of Domain discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop, and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system, which is being tested on the International Space Station.
Yoshimasu, Hidemi; Sato, Yutaka; Mishimagi, Takashi; Negishi, Akihide
2015-01-01
Background: Velopharyngeal function is very important for patients with cleft palate to acquire good speech. For patients with velopharyngeal insufficiency, prosthetic speech appliances and speech therapy are applied first, and then pharyngeal flap surgery to improve velopharyngeal function is performed in our hospital. The folded pharyngeal flap operation was first reported by Isshiki and Morimoto in 1975. We usually use a modification of the original method. Purpose: The purpose of this research was to introduce our method of the folded pharyngeal flap operation and report the results. Materials and Methods: The folded pharyngeal flap operation was performed for 110 patients with velopharyngeal insufficiency from 1982 to 2010. Of these, the 97 whose postoperative speech function was evaluated are reported. The cases included 61 males and 36 females, ranging in age from 7 to 50 years. The time from surgery to speech assessment ranged from 5 months to 6 years. In order to evaluate preoperative velopharyngeal function, assessment of speech by a trained speech pathologist, nasopharyngoscopy, and cephalometric radiography with contrast media were performed before surgery, and then the appropriate surgery was selected and performed. Postoperative velopharyngeal function was assessed by a trained speech pathologist. Results: Of the 97 patients who underwent the folded pharyngeal flap operation, 85 (87.6%) showed velopharyngeal competence, 8 (8.2%) showed marginal velopharyngeal incompetence, and only 2 (2.1%) showed velopharyngeal incompetence; in 2 cases (2.1%), hyponasality was present. Approximately 95% of patients showed improved velopharyngeal function. Conclusions: The folded pharyngeal flap operation based on appropriate preoperative assessment has been shown to be an effective method for the treatment of cleft palate patients with velopharyngeal insufficiency. PMID:26389036
Speech comprehension training and auditory and cognitive processing in older adults.
Pichora-Fuller, M Kathleen; Levitt, Harry
2012-12-01
To provide a brief history of speech comprehension training systems and an overview of research on auditory and cognitive aging as background to recommendations for future directions for rehabilitation. Two distinct domains were reviewed: one concerning technological and the other concerning psychological aspects of training. Historical trends and advances in these 2 domains were interrelated to highlight converging trends and directions for future practice. Over the last century, technological advances have influenced both the design of hearing aids and training systems. Initially, training focused on children and those with severe loss for whom amplification was insufficient. Now the focus has shifted to older adults with relatively little loss but difficulties listening in noise. Evidence of brain plasticity from auditory and cognitive neuroscience provides new insights into how to facilitate perceptual (re-)learning by older adults. There is a new imperative to complement training to increase bottom-up processing of the signal with more ecologically valid training to boost top-down information processing based on knowledge of language and the world. Advances in digital technologies enable the development of increasingly sophisticated training systems incorporating complex meaningful materials such as music, audiovisual interactive displays, and conversation.
Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions
NASA Astrophysics Data System (ADS)
Wang, Longbiao; Minami, Kazue; Yamamoto, Kazumasa; Nakagawa, Seiichi
In this paper, we investigate the effectiveness of phase information for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods have been based on MFCCs, even in noisy conditions. For MFCCs, which dominantly capture vocal tract information, only the magnitude of the Fourier transform of time-domain speech frames is used, and phase information has been ignored. The phase information is expected to complement MFCCs well, because it includes rich voice-source information. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, we proposed a phase information extraction method that normalizes the variation in phase depending on the clipping position of the input speech, and the performance of the combination of the phase information and MFCCs was remarkably better than that of MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy/signal-to-noise (SN) ratio, and noisy-speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database with stationary/non-stationary noise added were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech. The individual result of the phase information was even better than that of MFCCs in many cases with clean-speech training models. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
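Of the noise countermeasures mentioned, spectral subtraction is the easiest to sketch: estimate the noise spectrum from leading frames assumed to be speech-free, subtract it from each frame's magnitude with a floor, and reuse the noisy phase, which is precisely the component the paper argues also carries speaker cues. The parameter values below are illustrative, not the paper's settings.

```python
# Minimal sketch: magnitude-domain spectral subtraction with a spectral floor.
import numpy as np
import librosa

def spectral_subtraction(y, sr, n_fft=512, hop=128, noise_frames=10, beta=0.02):
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(S), np.angle(S)           # keep the noisy phase
    noise = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - noise, beta * mag)   # floor to avoid musical noise
    return librosa.istft(clean * np.exp(1j * phase), hop_length=hop)
```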
A proposed mechanism for rapid adaptation to spectrally distorted speech.
Azadpour, Mahan; Balaban, Evan
2015-07-01
The mechanisms underlying perceptual adaptation to severely spectrally-distorted speech were studied by training participants to comprehend spectrally-rotated speech, which is obtained by inverting the speech spectrum. Spectral-rotation produces severe distortion confined to the spectral domain while preserving temporal trajectories. During five 1-hour training sessions, pairs of participants attempted to extract spoken messages from the spectrally-rotated speech of their training partner. Data on training-induced changes in comprehension of spectrally-rotated sentences and identification/discrimination of spectrally-rotated phonemes were used to evaluate the plausibility of three different classes of underlying perceptual mechanisms: (1) phonemic remapping (the formation of new phonemic categories that specifically incorporate spectrally-rotated acoustic information); (2) experience-dependent generation of a perceptual "inverse-transform" that compensates for spectral-rotation; and (3) changes in cue weighting (the identification of sets of acoustic cues least affected by spectral-rotation, followed by a rapid shift in perceptual emphasis to favour those cues, combined with the recruitment of the same type of "perceptual filling-in" mechanisms used to disambiguate speech-in-noise). Results exclusively support the third mechanism, which is the only one predicting that learning would specifically target temporally-dynamic cues that were transmitting phonetic information most stably in spite of spectral-distortion. No support was found for phonemic remapping or for inverse-transform generation.
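Spectral rotation itself is straightforward to sketch: band-limit the signal, then modulate it by a carrier at the band edge, which maps each frequency f to band - f within the retained band. This is a minimal sketch assuming a 16 kHz sampling rate and a 4 kHz band; published rotation stimuli typically add further filtering to equate long-term spectra.

    import numpy as np
    from scipy.signal import butter, filtfilt

    def spectrally_rotate(x, fs=16000, band=4000.0):
        """Mirror the spectrum of x within [0, band] (f -> band - f).

        Assumes fs >= 4 * band so the modulation products stay at or
        below the Nyquist frequency before the final low-pass filter.
        """
        b, a = butter(6, band / (fs / 2), btype="low")
        x_lp = filtfilt(b, a, x)                       # keep [0, band]
        t = np.arange(len(x_lp)) / fs
        rotated = x_lp * np.cos(2 * np.pi * band * t)  # f -> band -/+ f
        return filtfilt(b, a, rotated)                 # drop the f + band part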
Jungblut, Monika; Huber, Walter; Mais, Christiane
2014-01-01
Difficulties with temporal coordination or sequencing of speech movements are frequently reported in aphasia patients with concomitant apraxia of speech (AOS). Our major objective was to investigate the effects of specific rhythmic-melodic voice training on brain activation of those patients. Three patients with severe chronic nonfluent aphasia and AOS were included in this study. Before and after therapy, patients underwent the same fMRI procedure as 30 healthy control subjects in our prestudy, which investigated the neural substrates of sung vowel changes in untrained rhythm sequences. A main finding was that post- minus pretreatment imaging data yielded significant perilesional activations in all patients, for example in the left superior temporal gyrus, whereas the reverse subtraction revealed either no significant activation or right hemisphere activation. Likewise, pre- and posttreatment assessments of patients' vocal rhythm production, language, and speech motor performance yielded significant improvements for all patients. Our results suggest that changes in brain activation due to the applied training might indicate specific processes of reorganization, for example, improved temporal sequencing of sublexical speech components. In this context, training that focuses on rhythmic singing, with complexity levels that place different motor and cognitive demands, seems to help pave the way for speech. PMID:24977055
Sadness is unique: neural processing of emotions in speech prosody in musicians and non-musicians.
Park, Mona; Gutyrchik, Evgeny; Welker, Lorenz; Carl, Petra; Pöppel, Ernst; Zaytseva, Yuliya; Meindl, Thomas; Blautzik, Janusch; Reiser, Maximilian; Bao, Yan
2014-01-01
Musical training has been shown to have positive effects on several aspects of speech processing; however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions remain poorly understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in processing of emotional speech prosody between the two groups were only observed when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes. PMID:25688196
A Deep Ensemble Learning Method for Monaural Speech Separation.
Zhang, Xiao-Lei; Wang, DeLiang
2016-03-01
Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named multicontext networks, to address monaural speech separation. The first multicontext network averages the outputs of multiple DNNs whose inputs employ different window lengths. The second multicontext network is a stack of multiple DNNs. Each DNN in a module of the stack takes the concatenation of original acoustic features and expansion of the soft output of the lower module as its input, and predicts the ratio mask of the target speaker; the DNNs in the same module employ different contexts. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations.
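A toy sketch of the two ingredients named here, with random arrays standing in for real spectrograms: the ideal-ratio-mask training target, and an ensemble that averages mask estimates from regressors trained with different context window widths (sklearn MLPs stand in for the paper's DNNs; all sizes are assumed).

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def stack_context(Y, w):
        # concatenate +/- w neighbouring frames as DNN input context
        T, F = Y.shape
        pad = np.pad(Y, ((w, w), (0, 0)), mode="edge")
        return np.stack([pad[t:t + 2 * w + 1].ravel() for t in range(T)])

    def ideal_ratio_mask(S, N, beta=0.5):
        # training target: per-unit speech energy ratio
        return (S**2 / (S**2 + N**2 + 1e-8))**beta

    rng = np.random.default_rng(0)
    S, N = rng.random((200, 64)), rng.random((200, 64))  # toy spectrograms
    Y, M = S + N, ideal_ratio_mask(S, N)

    # one DNN per context width; the ensemble averages their mask estimates
    windows = [1, 3]
    models = [MLPRegressor(hidden_layer_sizes=(128,), max_iter=200)
              .fit(stack_context(Y, w), M) for w in windows]
    mask = np.mean([m.predict(stack_context(Y, w))
                    for m, w in zip(models, windows)], axis=0)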
Bieber, Rebecca E.; Gordon-Salant, Sandra
2017-01-01
Adaptation to speech with a foreign accent is possible through prior exposure to talkers with that same accent. For young listeners with normal hearing, short-term, accent-independent adaptation to a novel foreign accent is also facilitated through exposure training with multiple foreign accents. In the present study, accent-independent adaptation is examined in younger and older listeners with normal hearing and older listeners with hearing loss. Retention of training benefit is additionally explored. Stimuli for testing and training were HINT sentences recorded by talkers with nine distinctly different accents. Following two training sessions, all listener groups showed a similar increase in speech perception for a novel foreign accent. While no group retained this benefit at one week post-training, results of a secondary reaction time task revealed a decrease in reaction time following training, suggesting reduced listening effort. Examination of listeners' cognitive skills revealed a positive relationship between working memory and speech recognition ability. The present findings indicate that, while this no-feedback training paradigm for foreign-accented English is successful in promoting short-term adaptation for listeners, it is not sufficient to facilitate perceptual learning with lasting benefits for younger or older listeners. PMID:28464671
Schuller, Björn
2017-01-01
Music and speech exhibit striking similarities in the communication of emotions in the acoustic domain, in such a way that the communication of specific emotions is achieved, at least to a certain extent, by means of shared acoustic patterns. From an affective sciences point of view, determining the degree of overlap between both domains is fundamental to understanding the shared mechanisms underlying this phenomenon. From a machine learning perspective, the overlap between acoustic codes for emotional expression in music and speech opens new possibilities to enlarge the amount of data available for developing music and speech emotion recognition systems. In this article, we investigate time-continuous predictions of emotion (Arousal and Valence) in music and speech, and the transfer learning between these domains. We establish a comparative framework including intra-domain (i.e., models trained and tested on the same modality, either music or speech) and cross-domain experiments (i.e., models trained on one modality and tested on the other). In the cross-domain context, we evaluated two strategies: the direct transfer between domains, and the contribution of transfer learning techniques (feature-representation transfer based on denoising autoencoders) for reducing the gap in the feature space distributions. Our results demonstrate excellent cross-domain generalisation performance with and without feature representation transfer in both directions. In the case of music, cross-domain approaches outperformed intra-domain models for Valence estimation, whereas for speech, intra-domain models achieved the best performance. This is the first demonstration of shared acoustic codes for emotional expression in music and speech in the time-continuous domain. PMID:28658285
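The feature-representation-transfer step can be sketched with a small denoising autoencoder; everything below (dimensionality, noise level, training schedule, and the random stand-in features) is an assumption for illustration, not the study's configuration.

    import torch
    import torch.nn as nn

    class DenoisingAE(nn.Module):
        # feature-representation transfer: encode both domains with a DAE
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
            self.dec = nn.Linear(hidden, dim)
        def forward(self, x):
            return self.dec(self.enc(x))

    def fit_dae(X, noise=0.1, epochs=200, lr=1e-3):
        model = DenoisingAE(X.shape[1])
        opt = torch.optim.Adam(model.parameters(), lr=lr)
        for _ in range(epochs):
            noisy = X + noise * torch.randn_like(X)
            loss = nn.functional.mse_loss(model(noisy), X)
            opt.zero_grad(); loss.backward(); opt.step()
        return model

    # random stand-in for per-frame acoustic features (88 dims assumed)
    dae = fit_dae(torch.randn(500, 88))

The idea is to train the DAE on one domain's acoustic features and then encode both domains with it, so the downstream arousal/valence regressor sees more closely aligned feature distributions.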
Distributed neural signatures of natural audiovisual speech and music in the human auditory cortex.
Salmi, Juha; Koistinen, Olli-Pekka; Glerean, Enrico; Jylänki, Pasi; Vehtari, Aki; Jääskeläinen, Iiro P; Mäkelä, Sasu; Nummenmaa, Lauri; Nummi-Kuisma, Katarina; Nummi, Ilari; Sams, Mikko
2017-08-15
During a conversation or when listening to music, auditory and visual information are combined automatically into audiovisual objects. However, it is still poorly understood how specific types of visual information shape the neural processing of sounds in lifelike stimulus environments. Here we applied multi-voxel pattern analysis to investigate how naturally matching visual input modulates supratemporal cortex activity during the processing of naturalistic acoustic speech, singing, and instrumental music. Bayesian logistic regression classifiers with sparsity-promoting priors were trained to predict whether the stimulus was audiovisual or auditory, and whether it contained piano playing, speech, or singing. The predictive performance of the classifiers was tested by leaving out one participant at a time for testing and training the model on the remaining 15 participants. The signature patterns associated with unimodal auditory stimuli encompassed distributed locations mostly in the middle and superior temporal gyrus (STG/MTG). A pattern regression analysis, based on a continuous acoustic model, revealed that activity in some of these MTG and STG areas was associated with acoustic features present in speech and music stimuli. A concurrent visual stimulus modulated activity in the bilateral MTG (speech), the lateral aspect of the right anterior STG (singing), and the bilateral parietal opercular cortex (piano). Our results suggest that specific supratemporal brain areas are involved in processing complex natural speech, singing, and piano playing, and that other brain areas located in anterior (facial speech) and posterior (music-related hand actions) supratemporal cortex are influenced by related visual information. Those anterior and posterior supratemporal areas have been linked to stimulus identification and sensory-motor integration, respectively. Copyright © 2017 Elsevier Inc. All rights reserved.
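In code, the leave-one-participant-out scheme looks roughly like the following sketch, with L1-penalized logistic regression standing in for the Bayesian sparsity-promoting priors and random arrays standing in for trial-by-voxel patterns:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

    # X: trials x voxels, y: condition label (audiovisual vs auditory),
    # groups: participant ID per trial -- all shapes illustrative
    rng = np.random.default_rng(0)
    X = rng.standard_normal((160, 500))
    y = rng.integers(0, 2, 160)
    groups = np.repeat(np.arange(16), 10)

    # L1 logistic regression as a non-Bayesian stand-in for
    # sparsity-promoting priors over voxel weights
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    scores = cross_val_score(clf, X, y, groups=groups,
                             cv=LeaveOneGroupOut())
    print(scores.mean())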
Flaherty, Mary; Dent, Micheal L.; Sawusch, James R.
2017-01-01
The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with “d” or “t” and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal. PMID:28562597
Speech processing using maximum likelihood continuity mapping
Hogden, John E.
2000-01-01
A speech processing method is described that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning such a probabilistic mapping is described; it uses a set of training data composed only of speech sounds. This speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
An algorithm to improve speech recognition in noise for hearing-impaired listeners
Healy, Eric W.; Yoho, Sarah E.; Wang, Yuxuan; Wang, DeLiang
2013-01-01
Despite considerable effort, monaural (single-microphone) algorithms capable of increasing the intelligibility of speech in noise have remained elusive. Successful development of such an algorithm is especially important for hearing-impaired (HI) listeners, given their particular difficulty in noisy backgrounds. In the current study, an algorithm based on binary masking was developed to separate speech from noise. Unlike the ideal binary mask, which requires prior knowledge of the premixed signals, the masks used to segregate speech from noise in the current study were estimated by training the algorithm on speech not used during testing. Sentences were mixed with speech-shaped noise and with babble at various signal-to-noise ratios (SNRs). Testing using normal-hearing and HI listeners indicated that intelligibility increased following processing in all conditions. These increases were larger for HI listeners, for the modulated background, and for the least-favorable SNRs. They were also often substantial, allowing several HI listeners to improve intelligibility from scores near zero to values above 70%. PMID:24116438
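The ideal binary mask that such estimated masks approximate can be sketched directly when the premixed signals are available; here the STFT is done with librosa, and the -5 dB local criterion is an assumed value:

    import numpy as np
    import librosa

    def apply_ideal_binary_mask(speech, noise, lc_db=-5.0,
                                n_fft=512, hop=128):
        """Build the IBM from premixed signals and apply it to the mixture."""
        S = librosa.stft(speech, n_fft=n_fft, hop_length=hop)
        N = librosa.stft(noise, n_fft=n_fft, hop_length=hop)
        local_snr = 20 * np.log10(np.abs(S) / (np.abs(N) + 1e-10) + 1e-10)
        mask = (local_snr > lc_db).astype(float)  # 1 = speech-dominated unit
        mixture = S + N
        return librosa.istft(mixture * mask, hop_length=hop,
                             length=len(speech))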
Włodarczyk, Elżbieta; Szkiełkowska, Agata; Skarżyński, Henryk; Piłka, Adam
2011-01-01
To assess the effectiveness of auditory training in children with dyslalia and central auditory processing disorders. The material consisted of 50 children aged 7-9 years. Children with articulation disorders remained under long-term speech therapy care in the Auditory and Phoniatrics Clinic. All children were examined by a laryngologist and a phoniatrician. Assessment included tonal and impedance audiometry and consultations with speech therapists and a psychologist. Additionally, a set of electrophysiological examinations was performed (registration of the N2, P2, and P300 waves), together with a psychoacoustic test of central auditory function, the frequency pattern test (FPT). The children then took part in regular auditory training and attended speech therapy. Speech assessment followed treatment and therapy; the psychoacoustic tests were repeated and P300 cortical potentials were recorded. Statistical analyses were then performed. The analyses revealed that auditory training in patients with dyslalia and other central auditory disorders is very effective. Auditory training may be an efficient therapy supporting speech therapy in children suffering from dyslalia coexisting with articulation and central auditory disorders, and in children with educational problems of audiogenic origin. Copyright © 2011 Polish Otolaryngology Society. Published by Elsevier Urban & Partner (Poland). All rights reserved.
Military and Government Applications of Human-Machine Communication by Voice
NASA Astrophysics Data System (ADS)
Weinstein, Clifford J.
1995-10-01
This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs.
Li, Kan; Príncipe, José C.
2018-01-01
This paper presents a novel real-time dynamic framework for quantifying time-series structure in spoken words using spikes. Audio signals are converted into multi-channel spike trains using a biologically-inspired leaky integrate-and-fire (LIF) spike generator. These spike trains are mapped into a function space of infinite dimension, i.e., a Reproducing Kernel Hilbert Space (RKHS), using point-process kernels, where a state-space model learns the dynamics of the multidimensional spike input using gradient descent learning. This kernelized recurrent system is very parsimonious and achieves the necessary memory depth via feedback of its internal states when trained discriminatively, utilizing the full context of the phoneme sequence. A main advantage of modeling nonlinear dynamics using state-space trajectories in the RKHS is that it imposes no restriction on the relationship between the exogenous input and its internal state. We are free to choose the input representation with an appropriate kernel, and changing the kernel does not impact the system nor the learning algorithm. Moreover, we show that this novel framework can outperform both traditional hidden Markov model (HMM) speech processing and neuromorphic implementations based on spiking neural networks (SNNs), yielding accurate and ultra-low-power word spotters. As a proof of concept, we demonstrate its capabilities using the benchmark TI-46 digit corpus for isolated-word automatic speech recognition (ASR) or keyword spotting. Compared to an HMM using a mel-frequency cepstral coefficient (MFCC) front-end without time derivatives, our MFCC-KAARMA offered improved performance. For the spike-train front-end, spike-KAARMA also outperformed state-of-the-art SNN solutions. Furthermore, compared to MFCCs, spike trains provided enhanced noise robustness in certain low signal-to-noise ratio (SNR) regimes. PMID:29666568
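The LIF front-end can be sketched as a bandpass filterbank followed by half-wave rectification and per-channel leaky integrate-and-fire units; the band edges, time constant, and threshold below are illustrative rather than the paper's settings:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def lif_spike_trains(x, fs, n_ch=20, tau=0.01, thresh=0.05):
        """Leaky integrate-and-fire front-end: one spike train per band."""
        edges = np.geomspace(100, fs / 2 * 0.9, n_ch + 1)
        spikes = np.zeros((n_ch, len(x)), dtype=np.uint8)
        decay = np.exp(-1.0 / (tau * fs))
        for c in range(n_ch):
            sos = butter(2, [edges[c], edges[c + 1]], btype="band",
                         fs=fs, output="sos")
            drive = np.maximum(sosfiltfilt(sos, x), 0.0)  # half-wave rectify
            v = 0.0
            for n, d in enumerate(drive):
                v = v * decay + d          # leaky integration
                if v >= thresh:            # fire and reset
                    spikes[c, n] = 1
                    v = 0.0
        return spikes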
Miller, Sharon E; Zhang, Yang; Nelson, Peggy B
2016-02-01
This study implemented a pretest-intervention-posttest design to examine whether multiple-talker identification training enhanced phonetic perception of the /ba/-/da/ and /wa/-/ja/ contrasts in adult listeners who were deafened postlingually and have cochlear implants (CIs). Nine CI recipients completed 8 hours of identification training using a custom-designed training package. Perception of speech produced by familiar talkers (talkers used during training) and unfamiliar talkers (talkers not used during training) was measured before and after training. Five additional untrained CI recipients completed identical pre- and posttests over the same time course as the trainees to control for procedural learning effects. Perception of the speech contrasts produced by the familiar talkers significantly improved for the trained CI listeners, and effects of perceptual learning transferred to unfamiliar talkers. Such training-induced significant changes were not observed in the control group. The data provide initial evidence of the efficacy of the multiple-talker identification training paradigm for CI users who were deafened postlingually. This pattern of results is consistent with enhanced phonemic categorization of the trained speech sounds.
The sensorimotor and social sides of the architecture of speech.
Pezzulo, Giovanni; Barca, Laura; D'Ausilio, Alessando
2014-12-01
Speech is a complex skill to master. In addition to sophisticated phono-articulatory abilities, speech acquisition requires neuronal systems configured for vocal learning, with adaptable sensorimotor maps that couple heard speech sounds with motor programs for speech production; imitation and self-imitation mechanisms that can train the sensorimotor maps to reproduce heard speech sounds; and a "pedagogical" learning environment that supports tutor learning.
Envelope Responses in Single-Trial EEG Indicate Attended Speaker in a Cocktail Party
2013-06-20
users to modulate their brain activity, such as motor rhythms, in order to signal intent [13], but these often require considerable training. Other... BCIs forgo training and instead have subjects make choices by attending to one of multiple visual and/or auditory stimuli. By presenting each stimulus... modulated). An envelope-based BCI could operate on more naturalistic auditory stimuli, such as speech or music. For example, an envelope-based BCI...
Watch what you say, your computer might be listening: A review of automated speech recognition
NASA Technical Reports Server (NTRS)
Degennaro, Stephen V.
1991-01-01
Spoken language is the most convenient and natural means by which people interact with each other and is, therefore, a promising candidate for human-machine interactions. Speech also offers an additional channel for hands-busy applications, complementing the use of motor output channels for control. Current speech recognition systems vary considerably across a number of important characteristics, including vocabulary size, speaking mode, training requirements for new speakers, robustness to acoustic environments, and accuracy. Algorithmically, these systems range from rule-based techniques through more probabilistic or self-learning approaches such as hidden Markov modeling and neural networks. This tutorial begins with a brief summary of the relevant features of current speech recognition systems and the strengths and weaknesses of the various algorithmic approaches.
Xiao, Bo; Imel, Zac E.; Georgiou, Panayiotis G.; Atkins, David C.; Narayanan, Shrikanth S.
2015-01-01
The technology for evaluating patient-provider interactions in psychotherapy (observational coding) has not changed in 70 years. It is labor-intensive, error-prone, and expensive, limiting its use in evaluating psychotherapy in the real world. Engineering solutions from speech and language processing provide new methods for the automatic evaluation of provider ratings from session recordings. The primary data are 200 Motivational Interviewing (MI) sessions from a study on MI training methods with observer ratings of counselor empathy. Automatic Speech Recognition (ASR) was used to transcribe sessions, and the resulting words were used in a text-based predictive model of empathy. Two supporting datasets trained the speech processing tasks, including ASR (1200 transcripts from heterogeneous psychotherapy sessions, and 153 transcripts and session recordings from 5 MI clinical trials). The accuracy of computationally-derived empathy ratings was evaluated against human ratings for each provider. Computationally-derived empathy scores and classifications (high vs. low) were highly accurate against human-based codes and classifications, with a correlation of 0.65 and an F-score (a weighted average of sensitivity and specificity) of 0.86, respectively. Empathy prediction using human transcription as input (as opposed to ASR) resulted in a slight increase in prediction accuracy, suggesting that the fully automatic system with ASR is relatively robust. Using speech and language processing methods, it is possible to generate accurate predictions of provider performance in psychotherapy from audio recordings alone. This technology can support large-scale evaluation of psychotherapy for dissemination and process studies. PMID:26630392
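The text-based predictive model is, at its core, a supervised classifier over transcript words; a minimal sketch with toy transcripts (TF-IDF features plus logistic regression standing in for the study's exact model) follows:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # transcripts and binary empathy codes (high=1, low=0); toy examples
    transcripts = ["it sounds like this has been really hard for you",
                   "you just need to quit drinking",
                   "tell me more about what that was like",
                   "that excuse doesn't make sense"]
    codes = [1, 0, 1, 0]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(transcripts, codes)
    print(model.predict(["what was that like for you"]))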
McMurray, Bob; Jongman, Allard
2012-01-01
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are a model's informational assumptions: the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context-dependent. This study assessed the informational assumptions of several models of speech categorization, in particular the number of cues that form the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2880 fricative productions (Jongman, Wayland & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values, and manipulated the information in the training set to contrast (1) models based on a small number of invariant cues; (2) models using all cues without compensation; and (3) models in which cues underwent compensation for contextual factors. Compensation was modeled by Computing Cues Relative to Expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved accuracy similar to listeners' and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed. PMID:21417542
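C-CuRE can be sketched as residualization: regress each cue on the contextual factors (talker, vowel) and categorize from the residuals, i.e., the cue values relative to contextual expectations. All arrays below are random stand-ins for the fricative corpus:

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression
    from sklearn.preprocessing import OneHotEncoder

    rng = np.random.default_rng(1)
    n = 400
    cues = rng.standard_normal((n, 24))    # 24 acoustic cues per token
    context = rng.integers(0, 10, (n, 2))  # talker id, vowel id
    labels = rng.integers(0, 8, n)         # 8 fricative categories

    ctx = OneHotEncoder().fit_transform(context).toarray()

    # C-CuRE-style recoding: each cue becomes its residual after
    # regressing out talker/vowel expectations
    expected = LinearRegression().fit(ctx, cues).predict(ctx)
    relative_cues = cues - expected

    clf = LogisticRegression(max_iter=2000).fit(relative_cues, labels)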
Neumann, K; Holler-Zittlau, I; van Minnen, S; Sick, U; Zaretsky, Y; Euler, H A
2011-01-01
The German Kindersprachscreening (KiSS) is a universal speech and language screening test for large-scale identification of Hessian kindergarten children requiring special educational language training or clinical speech/language therapy. To calculate the procedural screening validity, 257 children (aged 4.0 to 4.5 years) were tested using KiSS and four language tests (Reynell Development Language Scales III, Patholinguistische Diagnostik, PLAKSS, AWST-R). The majority or consensus judgements of three speech-language professionals, based on the language test results, served as a reference criterion. The base (fail) rates of the professionals were either self-determined or preset based on known prevalence rates. Screening validity was higher for preset than for self-determined base rates due to higher inter-judge agreement. The confusion matrices relating the overall KiSS index classification (speech-language abnormalities with educational or clinical needs) to the fixed-base-rate expert judgement of language impairment, including fluency or voice disorders, yielded a sensitivity of 88% and a specificity of 78%; for language impairment alone, the values were 84% and 75%, respectively. Specificity for disorders requiring clinical diagnostics in the KiSS (language impairment alone or combined with fluency/voice disorders), relative to the test-based consensus expert judgement, was about 93%. Sensitivities were unsatisfactory because the differentiation between educational and clinical needs requires improvement. Since the judgement concordance between the speech-language professionals was only moderate, the development of a comprehensive German reference test for speech and language disorders with evidence-based algorithmic decision rules, rather than subjective clinical judgement, is advocated.
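For reference, the screening validity measures reported here reduce to simple confusion-matrix ratios; the counts in the example are hypothetical numbers chosen only to reproduce the reported percentages:

    def sensitivity_specificity(tp, fn, tn, fp):
        """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
        return tp / (tp + fn), tn / (tn + fp)

    # hypothetical counts consistent with 88% sensitivity, 78% specificity
    print(sensitivity_specificity(tp=44, fn=6, tn=156, fp=44))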
ERIC Educational Resources Information Center
Harris, Karen R.
To investigate task performance and the use of private speech and to examine the effects of a cognitive training approach, 30 learning disabled (LD) and 30 nonLD Ss (7 to 8 years old) were given a 17 piece wooden puzzle rigged so that it could not be completed correctly. Six variables were measured: (1) proportion of private speech that was task…
Decoding Articulatory Features from fMRI Responses in Dorsal Speech Regions.
Correia, Joao M; Jansma, Bernadette M B; Bonte, Milene
2015-11-11
The brain's circuitry for perceiving and producing speech may show a notable level of overlap that is crucial for normal development and behavior. The extent to which sensorimotor integration plays a role in speech perception remains highly controversial, however. Methodological constraints related to experimental designs and analysis methods have so far prevented the disentanglement of neural responses to acoustic versus articulatory speech features. Using a passive listening paradigm and multivariate decoding of single-trial fMRI responses to spoken syllables, we investigated brain-based generalization of articulatory features (place and manner of articulation, and voicing) beyond their acoustic (surface) form in adult human listeners. For example, we trained a classifier to discriminate place of articulation within stop syllables (e.g., /pa/ vs /ta/) and tested whether this training generalizes to fricatives (e.g., /fa/ vs /sa/). This novel approach revealed generalization of place and manner of articulation at multiple cortical levels within the dorsal auditory pathway, including auditory, sensorimotor, motor, and somatosensory regions, suggesting the representation of sensorimotor information. Additionally, generalization of voicing included the right anterior superior temporal sulcus associated with the perception of human voices as well as somatosensory regions bilaterally. Our findings highlight the close connection between brain systems for speech perception and production, and in particular, indicate the availability of articulatory codes during passive speech perception. Sensorimotor integration is central to verbal communication and provides a link between auditory signals of speech perception and motor programs of speech production. It remains highly controversial, however, to what extent the brain's speech perception system actively uses articulatory (motor), in addition to acoustic/phonetic, representations. In this study, we examine the role of articulatory representations during passive listening using carefully controlled stimuli (spoken syllables) in combination with multivariate fMRI decoding. Our approach enabled us to disentangle brain responses to acoustic and articulatory speech properties. In particular, it revealed articulatory-specific brain responses of speech at multiple cortical levels, including auditory, sensorimotor, and motor regions, suggesting the representation of sensorimotor information during passive speech perception. Copyright © 2015 the authors 0270-6474/15/3515015-11$15.00/0.
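The generalization logic, training a classifier on one syllable class and testing it on another, can be sketched with a linear classifier over trial-by-voxel patterns (random stand-ins below); above-chance cross-class accuracy is what indicates a representation of place that goes beyond the acoustic surface form:

    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(2)
    # trials x voxels patterns; labels = place of articulation (0/1)
    stops = rng.standard_normal((80, 300)); y_stops = rng.integers(0, 2, 80)
    frics = rng.standard_normal((80, 300)); y_frics = rng.integers(0, 2, 80)

    # train on stop syllables (/pa/ vs /ta/), test on fricatives (/fa/ vs /sa/)
    clf = LinearSVC(C=0.01).fit(stops, y_stops)
    print("cross-class accuracy:", clf.score(frics, y_frics))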
Celeste, Letícia Corrêa; Zanoni, Graziela; Queiroga, Bianca; Alves, Luciana Mendonça
2017-03-09
To map the profile of Brazilian speech therapists who report working in educational speech therapy, with regard to training, performance, and professional experience. This was a retrospective study based on secondary analysis of a Federal Council of Hearing and Speech Sciences database of questionnaires from professionals reporting work in the educational environment. A total of 312 questionnaires were completed, 93.3% of them by women aged 30-39 years. Most speech therapists continued their studies, opting mostly for specialization. Almost 50% of respondents have worked in the specialty for less than six years, most significantly in the public service (especially municipal) and in the private sector. The profile of speech therapists active in the educational area in Brazil is thus a predominantly female professional who values continuing her studies after graduation, seeking specialization mainly in Audiology and Orofacial Motricity. Most have up to 10 years of work experience, divided mainly between public (municipal) and private schools. The work of speech therapists in the educational area concentrates on elementary and primary school, with a varied workload.
ERIC Educational Resources Information Center
2000
This document contains six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities…
Speech-perception training for older adults with hearing loss impacts word recognition and effort.
Kuchinsky, Stefanie E; Ahlstrom, Jayne B; Cute, Stephanie L; Humes, Larry E; Dubno, Judy R; Eckert, Mark A
2014-10-01
The current pupillometry study examined the impact of speech-perception training on word recognition and cognitive effort in older adults with hearing loss. Trainees identified more words at the follow-up than at the baseline session. Training also resulted in an overall larger and faster peaking pupillary response, even when controlling for performance and reaction time. Perceptual and cognitive capacities affected the peak amplitude of the pupil response across participants but did not diminish the impact of training on the other pupil metrics. Thus, we demonstrated that pupillometry can be used to characterize training-related and individual differences in effort during a challenging listening task. Importantly, the results indicate that speech-perception training not only affects overall word recognition, but also a physiological metric of cognitive effort, which has the potential to be a biomarker of hearing loss intervention outcome. Copyright © 2014 Society for Psychophysiological Research.
Palmer, Rebecca; Enderby, Pam
2016-10-01
The speech-language pathology profession has explored a number of approaches to support efficient delivery of interventions for people with stroke-induced aphasia. This study aimed to explore the role of volunteers in supporting self-managed practice of computerised language exercises. A qualitative interview study of the volunteer support role was carried out alongside a pilot randomised controlled trial of computer aphasia therapy. Patients with aphasia practised computer exercises tailored for them by a speech-language pathologist at home regularly for 5 months. Eight of the volunteers who supported the intervention took part in semi-structured interviews. Interviews were audio recorded, transcribed verbatim and analysed thematically. Emergent themes included: training and support requirements; perception of the volunteer role; challenges facing the volunteer, in general and specifically related to supporting computer therapy exercises. The authors concluded that volunteers helped to motivate patients to practise their computer therapy exercises and also provided support to the carers. Training and ongoing structured support of therapy activity and conduct is required from a trained speech-language pathologist to ensure the successful involvement of volunteers supporting impairment-based computer exercises in patients' own homes.
ERIC Educational Resources Information Center
Lee, Andrew H.; Lyster, Roy
2016-01-01
This study investigated the effects of different types of corrective feedback (CF) provided during second language (L2) speech perception training. One hundred Korean learners of L2 English, randomly assigned to five groups (n = 20 per group), participated in eight computer-assisted perception training sessions targeting two minimal pairs of…
Auditory training changes temporal lobe connectivity in 'Wernicke's aphasia': a randomised trial.
Woodhead, Zoe Vj; Crinion, Jennifer; Teki, Sundeep; Penny, Will; Price, Cathy J; Leff, Alexander P
2017-07-01
Aphasia is one of the most disabling sequelae after stroke, occurring in 25%-40% of stroke survivors. However, there remains a lack of good evidence for the efficacy or mechanisms of speech comprehension rehabilitation. This within-subjects trial tested two concurrent interventions in 20 patients with chronic aphasia with speech comprehension impairment following left hemisphere stroke: (1) phonological training using 'Earobics' software and (2) a pharmacological intervention using donepezil, an acetylcholinesterase inhibitor. Donepezil was tested in a double-blind, placebo-controlled, cross-over design using block randomisation with bias minimisation. The primary outcome measure was speech comprehension score on the comprehensive aphasia test. Magnetoencephalography (MEG) with an established index of auditory perception, the mismatch negativity response, tested whether the therapies altered effective connectivity at the lower (primary) or higher (secondary) level of the auditory network. Phonological training improved speech comprehension abilities and was particularly effective for patients with severe deficits. No major adverse effects of donepezil were observed, but it had an unpredicted negative effect on speech comprehension. The MEG analysis demonstrated that phonological training increased synaptic gain in the left superior temporal gyrus (STG). Patients with more severe speech comprehension impairments also showed strengthening of bidirectional connections between the left and right STG. Phonological training resulted in a small but significant improvement in speech comprehension, whereas donepezil had a negative effect. The connectivity results indicated that training reshaped higher order phonological representations in the left STG and (in more severe patients) induced stronger interhemispheric transfer of information between higher levels of auditory cortex. Clinical trial registration: This trial was registered with EudraCT (2005-004215-30, https://eudract.ema.europa.eu/) and ISRCTN (68939136, http://www.isrctn.com/). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Improving the Child's Speech. Second Edition.
ERIC Educational Resources Information Center
Anderson, Virgil A.; Newby, Hayes A.
This is primarily a book for the classroom teacher, although it will also prove useful to parents, child-guidance workers, physicians, and others who are concerned with the development and training of children with speech handicaps. Contents include "Speech Improvement as an Educational Problem," "Recognizing Speech Disabilities," "The Development…
Idaho's Three-Tiered System for Speech-Language Paratherapist Training and Utilization.
ERIC Educational Resources Information Center
Longhurst, Thomas M.
1997-01-01
Discusses the development and current implementation of Idaho's three-tiered system of speech-language paratherapists. Support personnel providing speech-language services to learners with special communication needs in educational settings must obtain one of three certification levels: (1) speech-language aide, (2) associate degree…
Aroudi, Ali; Doclo, Simon
2017-07-01
To decode auditory attention from single-trial EEG recordings in an acoustic scenario with two competing speakers, a least-squares method has been recently proposed. This method however requires the clean speech signals of both the attended and the unattended speaker to be available as reference signals. Since in practice only the binaural signals consisting of a reverberant mixture of both speakers and background noise are available, in this paper we explore the potential of using these (unprocessed) signals as reference signals for decoding auditory attention in different acoustic conditions (anechoic, reverberant, noisy, and reverberant-noisy). In addition, we investigate whether it is possible to use these signals instead of the clean attended speech signal for filter training. The experimental results show that using the unprocessed binaural signals for filter training and for decoding auditory attention is feasible, with relatively high decoding performance, although for most acoustic conditions the decoding performance is significantly lower than when using the clean speech signals.
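A minimal numpy sketch of least-squares auditory attention decoding by stimulus reconstruction: a linear decoder maps time-lagged EEG to a reference envelope (clean attended speech or, per this paper, an unprocessed binaural mixture envelope), and attention is assigned to the speaker whose envelope correlates best with the reconstruction. The lag count is an assumed value, and in practice the decoder is trained and evaluated on separate trials.

    import numpy as np

    def lag_matrix(eeg, max_lag=32):
        """Stack time-lagged copies of EEG (time x channels) as regressors."""
        T, C = eeg.shape
        X = np.zeros((T, C * max_lag))
        for k in range(max_lag):
            X[k:, k * C:(k + 1) * C] = eeg[:T - k]
        return X

    def train_decoder(eeg, ref_envelope, max_lag=32):
        # least-squares fit of decoder weights to the reference envelope
        w, *_ = np.linalg.lstsq(lag_matrix(eeg, max_lag), ref_envelope,
                                rcond=None)
        return w

    def classify_attention(eeg, env_a, env_b, w, max_lag=32):
        recon = lag_matrix(eeg, max_lag) @ w
        r = [np.corrcoef(recon, e)[0, 1] for e in (env_a, env_b)]
        return int(np.argmax(r))  # 0 -> speaker A attended, 1 -> speaker B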
NASA Technical Reports Server (NTRS)
Wolf, Jared J.
1977-01-01
The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are described.
Community Health Workers perceptions in relation to speech and language disorders.
Knochenhauer, Carla Cristina Lins Santos; Vianna, Karina Mary de Paiva
2016-01-01
To know the perception of Community Health Workers (CHW) about speech and language disorders. This was a cross-sectional study involving a questionnaire on CHWs' knowledge of speech and language disorders. The research was carried out with CHWs allocated to the Centro Sanitary District of Florianópolis. We interviewed 35 CHWs, mostly (80%) female, with an average age of 47 years (standard deviation = 2.09 years). Of the professionals interviewed, 57% said that they knew the work of the speech therapist, 57% believed that there is no relationship between chronic diseases and speech therapy, and 97% considered the participation of speech, hearing and language sciences important in primary care. As for capacity development, 88% of the CHWs reported never having had any training delivered by a speech therapist; 75% stated they had done the Estratégia Amamenta e Alimenta Brasil training, 57% the Programa Capital Criança, and 41% the Programa Capital Idoso. The CHWs' knowledge about the work of a speech therapist is still limited, but the importance of speech and language disorders is recognized in primary care. This lack of knowledge may be related to the absence of training actions and/or continuing education courses that could prepare these professionals to identify disorders and better educate the population during their home visits. This study highlights the need for further research on training actions for these professionals.
Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai
2017-01-01
Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
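A rough sklearn sketch of the feature-learning-plus-SVM pipeline, with a single RBM standing in for the multi-layer DBN and random arrays standing in for the MFCC/pitch/formant/energy features (all sizes assumed):

    import numpy as np
    from sklearn.neural_network import BernoulliRBM
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(3)
    X = rng.random((300, 40))       # per-utterance acoustic features
    y = rng.integers(0, 6, 300)     # six emotion classes

    # RBM input must lie in [0, 1]; one RBM layer stands in for the DBN
    model = make_pipeline(
        MinMaxScaler(),
        BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20),
        SVC(kernel="rbf", C=10.0),
    )
    model.fit(X, y)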
Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.
2013-01-01
Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414
ERIC Educational Resources Information Center
Hardison, Debra M.; Sonchaeng, Chayawan
2005-01-01
This paper provides a sequence of specific techniques and examples for implementing theatre voice training and technology in teaching ESL/EFL oral skills. A layered approach is proposed based on information processing theory in which the focus of learner attention is shifted in stages from the physiological to the linguistic and then to the…
Loebach, Jeremy L.; Pisoni, David B.; Svirsky, Mario A.
2009-01-01
Objective The objective of this study was to assess whether training on speech processed with an 8-channel noise vocoder to simulate the output of a cochlear implant would produce transfer of auditory perceptual learning to the recognition of non-speech environmental sounds, the identification of speaker gender, and the discrimination of talkers by voice. Design Twenty-four normal hearing subjects were trained to transcribe meaningful English sentences processed with a noise vocoder simulation of a cochlear implant. An additional twenty-four subjects served as an untrained control group and transcribed the same sentences in their unprocessed form. All subjects completed pre- and posttest sessions in which they transcribed vocoded sentences to provide an assessment of training efficacy. Transfer of perceptual learning was assessed using a series of closed-set, nonlinguistic tasks: subjects identified talker gender, discriminated the identity of pairs of talkers, and identified ecologically significant environmental sounds from a closed set of alternatives. Results Although both groups of subjects showed significant pre- to posttest improvements, subjects who transcribed vocoded sentences during training performed significantly better at posttest than subjects in the control group. Both groups performed equally well on gender identification and talker discrimination. Subjects who received explicit training on the vocoded sentences, however, performed significantly better on environmental sound identification than the untrained subjects. Moreover, across both groups, pretest speech performance, and to a higher degree posttest speech performance, were significantly correlated with environmental sound identification. For both groups, environmental sounds that were characterized as having more salient temporal information were identified more often than environmental sounds that were characterized as having more salient spectral information. Conclusions Listeners trained to identify noise-vocoded sentences showed evidence of transfer of perceptual learning to the identification of environmental sounds. In addition, the correlation between environmental sound identification and sentence transcription indicates that subjects who were better able to utilize the degraded acoustic information to identify the environmental sounds were also better able to transcribe the linguistic content of novel sentences. Both trained and untrained groups performed equally well (~75% correct) on the gender identification task, indicating that training did not have an effect on the ability to identify the gender of talkers. Although better than chance, performance on the talker discrimination task was poor overall (~55%), suggesting that either explicit training is required to reliably discriminate talkers’ voices, or that additional information (perhaps spectral in nature) not present in the vocoded speech is required to excel in such tasks. Taken together, the results suggest that while transfer of auditory perceptual learning with spectrally degraded speech does occur, explicit task-specific training may be necessary for tasks that cannot rely on temporal information alone. PMID:19773659
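An 8-channel noise vocoder of the kind used here can be sketched as band-splitting, envelope extraction, and noise modulation; the band spacing and the 50 Hz envelope smoothing below are assumed values, as published simulations differ in these details.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def noise_vocode(x, fs, n_ch=8, lo=100.0, hi=7000.0):
        """Noise vocoder: band envelopes modulate band-limited noise.

        Assumes fs of 16 kHz or higher so the top band edge is valid.
        """
        edges = np.geomspace(lo, hi, n_ch + 1)
        env_lp = butter(4, 50.0, btype="low", fs=fs, output="sos")
        noise = np.random.default_rng(0).standard_normal(len(x))
        out = np.zeros(len(x))
        for c in range(n_ch):
            band = butter(4, [edges[c], edges[c + 1]], btype="band",
                          fs=fs, output="sos")
            env = sosfiltfilt(env_lp, np.abs(hilbert(sosfiltfilt(band, x))))
            out += np.clip(env, 0, None) * sosfiltfilt(band, noise)
        return out / (np.max(np.abs(out)) + 1e-9)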
A Survey of Speech-Language-Hearing Therapists' Career Situation and Challenges in Mainland China.
Lin, Qiang; Lu, Jianliang; Chen, Zhuoming; Yan, Jiajian; Wang, Hong; Ouyang, Hui; Mou, Zhiwei; Huang, Dongfeng; O'Young, Bryan
2016-01-01
The aim of this survey was to investigate the background of speech-language pathologists and their training needs to provide a profile of the current state of the profession in Mainland China. A survey was conducted of 293 speech-language therapists. The questionnaire asked about the respondents' career backgrounds and included a 24-item ranking scale covering almost all of the common speech-language-hearing disorders. A summary of the raw data was constructed by calculating the average ranking score for each answer choice in order to determine the academic training needs with the highest preference among the respondents. The majority of respondents were female, <35 years old, and had a total service time of <5 years. More than three quarters of the training needs with the highest preference among the 24 items involved basic-level knowledge of common speech-language-hearing disorders, such as diagnosis, assessment and conventional treatment, and seldom involved advanced techniques or current research developments. The results revealed that speech-language therapists in Mainland China tend to be young, with little overall working experience, and at the first stage of their careers. This may be due to the lack of systematic educational programs and national certification systems for speech-language therapists. © 2016 S. Karger AG, Basel.
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation
Banks, Briony; Gowen, Emma; Munro, Kevin J.; Adank, Patti
2015-01-01
Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker’s facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants’ eye gaze was recorded to verify that they looked at the speaker’s face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation. PMID:26283946
Speech intelligibility in complex acoustic environments in young children
NASA Astrophysics Data System (ADS)
Litovsky, Ruth
2003-04-01
While the auditory system undergoes tremendous maturation during the first few years of life, it has become clear that in complex scenarios when multiple sounds occur and when echoes are present, children's performance is significantly worse than that of their adult counterparts. The ability of children (3-7 years of age) to understand speech in a simulated multi-talker environment and to benefit from spatial separation of the target and competing sounds was investigated. In these studies, competing sources vary in number, location, and content (speech, modulated or unmodulated speech-shaped noise and time-reversed speech). The acoustic spaces were also varied in size and amount of reverberation. Finally, children with chronic otitis media who received binaural training were tested pre- and post-training on a subset of conditions. Results indicated the following. (1) Children experienced significantly more masking than adults, even in the simplest conditions tested. (2) When the target and competing sounds were spatially separated, speech intelligibility improved, but the amount varied with age, type of competing sound, and number of competitors. (3) In a large reverberant classroom there was no benefit of spatial separation. (4) Binaural training improved speech intelligibility performance in children with otitis media. Future work includes similar studies in children with unilateral and bilateral cochlear implants. [Work supported by NIDCD, DRF, and NOHR.]
Fels, S S; Hinton, G E
1998-01-01
Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
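A minimal sketch of the gating arrangement described above, in which a gating network weights the outputs of separate vowel and consonant networks to produce the synthesizer controls. The plain feedforward architecture and network shapes are assumptions for illustration, not the system's exact design.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer network; each net is passed as a (W1, b1, W2, b2) tuple."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def glove_to_controls(hand_features, vowel_net, consonant_net, gate_net):
    """Blend expert outputs into the formant-synthesizer control parameters."""
    v = mlp(hand_features, *vowel_net)        # vowel expert output
    c = mlp(hand_features, *consonant_net)    # consonant expert output
    g = 1.0 / (1.0 + np.exp(-mlp(hand_features, *gate_net)))  # gate in (0, 1)
    return g * v + (1.0 - g) * c              # gated mixture of experts
```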
Finch, Emma; Cameron, Ashley; Fleming, Jennifer; Lethlean, Jennifer; Hudson, Kyla; McPhail, Steven
2017-07-01
Aphasia is a common consequence of stroke. Despite receiving specialised training in communication, speech-language pathology students may lack confidence when communicating with People with Aphasia (PWA). This paper reports data from secondary outcome measures from a randomised controlled trial. The aim of the current study was to examine the effects of communication partner training on the communication skills of speech-language pathology students during conversations with PWA. Thirty-eight speech-language pathology students were randomly allocated to trained and untrained groups. The first group received a lecture about communication strategies for communicating with PWA then participated in a conversation with PWA (Trained group), while the second group of students participated in a conversation with the PWA without receiving the lecture (Untrained group). The conversations between the groups were analysed according to the Measure of skill in Supported Conversation (MSC) scales, Measure of Participation in Conversation (MPC) scales, types of strategies used in conversation, and the occurrence and repair of conversation breakdowns. The trained group received significantly higher MSC Revealing Competence scores, used significantly more props, and introduced significantly more new ideas into the conversation than the untrained group. The trained group also used more gesture and writing to facilitate the conversation, however, the difference was not significant. There was no significant difference between the groups according to MSC Acknowledging Competence scores, MPC Interaction or Transaction scores, or in the number of interruptions, minor or major conversation breakdowns, or in the success of strategies initiated to repair the conversation breakdowns. Speech-language pathology students may benefit from participation in communication partner training programs. Copyright © 2017 Elsevier Inc. All rights reserved.
Nowakowski, Matilda E; Antony, Martin M; Koerner, Naomi
2015-12-01
The present study investigated the effects of computerized interpretation training and cognitive restructuring on symptomatology, behavior, and physiological reactivity in an analogue social anxiety sample. Seventy-two participants with elevated social anxiety scores were randomized to one session of computerized interpretation training (n = 24), cognitive restructuring (n = 24), or an active placebo control condition (n = 24). Participants completed self-report questionnaires focused on interpretation biases and social anxiety symptomatology at pre and posttraining and a speech task at posttraining during which subjective, behavioral, and physiological measures of anxiety were assessed. Only participants in the interpretation training condition endorsed significantly more positive than negative interpretations of ambiguous social situations at posttraining. There was no evidence of generalizability of interpretation training effects to self-report measures of interpretation biases and symptomatology or the anxiety response during the posttraining speech task. Participants in the cognitive restructuring condition were rated as having higher quality speeches and showing fewer signs of anxiety during the posttraining speech task compared to participants in the interpretation training condition. The present study did not include baseline measures of speech performance or computer assessed interpretation biases. The results of the present study bring into question the generalizability of computerized interpretation training as well as the effectiveness of a single session of cognitive restructuring in modifying the full anxiety response. Clinical and theoretical implications are discussed. Copyright © 2015 Elsevier Ltd. All rights reserved.
Replacing maladaptive speech with verbal labeling responses: an analysis of generalized responding.
Foxx, R M; Faw, G D; McMorrow, M J; Kyle, M S; Bittle, R G
1988-01-01
We taught three mentally handicapped students to answer questions with verbal labels and evaluated the generalized effects of this training on their maladaptive speech (e.g., echolalia) and correct responding to untrained questions. The students received cues-pause-point training on an initial question set followed by generalization assessments on a different set in another setting. Probes were conducted on novel questions in three other settings to determine the strength and spread of the generalization effect. A multiple baseline across subjects design revealed that maladaptive speech was replaced with correct labels (answers) to questions in the training and all generalization settings. These results replicate and extend previous research that suggested that cues-pause-point procedures may be useful in replacing maladaptive speech patterns by teaching students to use their verbal labeling repertoires. PMID:3225258
The Voice of Emotion: Acoustic Properties of Six Emotional Expressions.
NASA Astrophysics Data System (ADS)
Baldwin, Carol May
Studies in the perceptual identification of emotional states suggested that listeners seemed to depend on a limited set of vocal cues to distinguish among emotions. Linguistics and speech science literatures have indicated that this small set of cues included intensity, fundamental frequency, and temporal properties such as speech rate and duration. Little research has been done, however, to validate these cues in the production of emotional speech, or to determine if specific dimensions of each cue are associated with the production of a particular emotion for a variety of speakers. This study addressed deficiencies in understanding of the acoustical properties of duration and intensity as components of emotional speech by means of speech science instrumentation. Acoustic data were conveyed in a brief sentence spoken by twelve English speaking adult male and female subjects, half with dramatic training, and half without such training. Simulated expressions included: happiness, surprise, sadness, fear, anger, and disgust. The study demonstrated that the acoustic property of mean intensity served as an important cue for a vocal taxonomy. Overall duration was rejected as an element for a general taxonomy due to interactions involving gender and role. Findings suggested a gender-related taxonomy, however, based on differences in the ways in which men and women use the duration cue in their emotional expressions. Results also indicated that speaker training may influence greater use of the duration cue in expressions of emotion, particularly for male actors. Discussion of these results provided linkages to (1) practical management of emotional interactions in clinical and interpersonal environments, (2) implications for differences in the ways in which males and females may be socialized to express emotions, and (3) guidelines for future perceptual studies of emotional sensitivity.
Parent-child interaction in motor speech therapy.
Namasivayam, Aravind Kumar; Jethava, Vibhuti; Pukonen, Margit; Huynh, Anna; Goshulak, Debra; Kroll, Robert; van Lieshout, Pascal
2018-01-01
This study measures the reliability and sensitivity of a modified Parent-Child Interaction Observation scale (PCIOs) used to monitor the quality of parent-child interaction. The scale is part of a home-training program employed with direct motor speech intervention for children with speech sound disorders. Eighty-four preschool age children with speech sound disorders were provided either high- (2×/week/10 weeks) or low-intensity (1×/week/10 weeks) motor speech intervention. Clinicians completed the PCIOs at the beginning, middle, and end of treatment. Inter-rater reliability (Kappa scores) was determined by an independent speech-language pathologist who assessed videotaped sessions at the midpoint of the treatment block. Intervention sensitivity of the scale was evaluated using a Friedman test for each item and then followed up with Wilcoxon pairwise comparisons where appropriate. We obtained fair-to-good inter-rater reliability (Kappa = 0.33-0.64) for the PCIOs using only video-based scoring. Child-related items were more strongly influenced by differences in treatment intensity than parent-related items, where a greater number of sessions positively influenced parent learning of treatment skills and child behaviors. The adapted PCIOs is reliable and sensitive to monitor the quality of parent-child interactions in a 10-week block of motor speech intervention with adjunct home therapy. Implications for rehabilitation Parent-centered therapy is considered a cost effective method of speech and language service delivery. However, parent-centered models may be difficult to implement for treatments such as developmental motor speech interventions that require a high degree of skill and training. For children with speech sound disorders and motor speech difficulties, a translated and adapted version of the parent-child observation scale was found to be sufficiently reliable and sensitive to assess changes in the quality of the parent-child interactions during intervention. In developmental motor speech interventions, high-intensity treatment (2×/week/10 weeks) facilitates greater changes in the parent-child interactions than low intensity treatment (1×/week/10 weeks). On one hand, parents may need to attend more than five sessions with the clinician to learn how to observe and address their child's speech difficulties. On the other hand, children with speech sound disorders may need more than 10 sessions to adapt to structured play settings even when activities and therapy materials are age-appropriate.
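For reference, Cohen's kappa (the inter-rater agreement statistic reported above) corrects raw agreement for chance agreement. A minimal sketch, with hypothetical ratings:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical item-level scale ratings from two observers:
print(cohens_kappa([1, 2, 2, 3, 1, 2], [1, 2, 3, 3, 1, 1]))  # ~0.52, "fair-to-good"
```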
Translation of incremental talk test responses to steady-state exercise training intensity.
Lyon, Ellen; Menke, Miranda; Foster, Carl; Porcari, John P; Gibson, Mark; Bubbers, Terresa
2014-01-01
The Talk Test (TT) is a submaximal, incremental exercise test that has been shown to be useful in prescribing exercise training intensity. It is based on a subject's ability to speak comfortably during exercise. This study defined the amount of reduction in absolute workload intensity from an incremental exercise test using the TT to give appropriate absolute training intensity for cardiac rehabilitation patients. Patients in an outpatient rehabilitation program (N = 30) performed an incremental exercise test with the TT administered at every 2-minute stage. Patients rated their speech comfort after reciting a standardized paragraph. Anything other than a "yes" response was considered the "equivocal" stage, while all preceding stages were "positive" stages. The last stage with the unequivocally positive ability to speak was the Last Positive (LP), and the two preceding stages were LP-1 and LP-2. Subsequently, three 20-minute steady-state training bouts were performed in random order at the absolute workload at the LP, LP-1, and LP-2 stages of the incremental test. Speech comfort, heart rate (HR), and rating of perceived exertion (RPE) were recorded every 5 minutes. The 20-minute exercise training bout was completed fully at the LP (n = 19), LP-1 (n = 28), and LP-2 (n = 30) workloads. Heart rate, RPE, and speech comfort were similar through the LP-1 and LP-2 tests, but the LP stage was markedly more difficult. Steady-state exercise training intensity was easily and appropriately prescribed at the intensities associated with the LP-1 and LP-2 stages of the TT. The LP stage may be too difficult for patients in a cardiac rehabilitation program.
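A minimal sketch of how the LP, LP-1, and LP-2 workloads could be read off the incremental test responses described above; the stage workloads and responses in the example are hypothetical.

```python
def talk_test_prescription(stages):
    """stages: list of (workload, response) pairs in test order.
    The last stage before any non-'yes' response is the Last Positive (LP)."""
    last_positive = -1
    for i, (_, response) in enumerate(stages):
        if response.strip().lower() != "yes":
            break                          # first equivocal/negative stage
        last_positive = i
    if last_positive < 2:
        raise ValueError("need at least three positive stages for LP-2")
    lp, lp1, lp2 = (stages[last_positive - k][0] for k in range(3))
    return {"LP": lp, "LP-1": lp1, "LP-2": lp2}

# Hypothetical responses collected at every 2-minute stage (workload in watts):
print(talk_test_prescription(
    [(50, "yes"), (75, "yes"), (100, "yes"), (125, "not sure")]))
# -> {'LP': 100, 'LP-1': 75, 'LP-2': 50}
```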
ERIC Educational Resources Information Center
McDonald, David; Proctor, Penny; Gill, Wendy; Heaven, Sue; Marr, Jane; Young, Jane
2015-01-01
Intensive Speech and Language Therapy (SLT) training courses for Early Childhood Educators (ECEs) can have a positive effect on their use of interaction strategies that support children's communication skills. The impact of brief SLT training courses is not yet clearly understood. The aims of these two studies were to assess the impact of a brief…
Auditory Training with Frequent Communication Partners
ERIC Educational Resources Information Center
Tye-Murray, Nancy; Spehar, Brent; Sommers, Mitchell; Barcroft, Joe
2016-01-01
Purpose: Individuals with hearing loss engage in auditory training to improve their speech recognition. They typically practice listening to utterances spoken by unfamiliar talkers but never to utterances spoken by their most frequent communication partner (FCP)--speech they most likely desire to recognize--under the assumption that familiarity…
Strait, Dana L.; Kraus, Nina
2011-01-01
Even in the quietest of rooms, our senses are perpetually inundated by a barrage of sounds, requiring the auditory system to adapt to a variety of listening conditions in order to extract signals of interest (e.g., one speaker's voice amidst others). Brain networks that promote selective attention are thought to sharpen the neural encoding of a target signal, suppressing competing sounds and enhancing perceptual performance. Here, we ask: does musical training benefit cortical mechanisms that underlie selective attention to speech? To answer this question, we assessed the impact of selective auditory attention on cortical auditory-evoked response variability in musicians and non-musicians. Outcomes indicate strengthened brain networks for selective auditory attention in musicians in that musicians but not non-musicians demonstrate decreased prefrontal response variability with auditory attention. Results are interpreted in the context of previous work documenting perceptual and subcortical advantages in musicians for the hearing and neural encoding of speech in background noise. Musicians’ neural proficiency for selectively engaging and sustaining auditory attention to language indicates a potential benefit of music for auditory training. Given the importance of auditory attention for the development and maintenance of language-related skills, musical training may aid in the prevention, habilitation, and remediation of individuals with a wide range of attention-based language, listening and learning impairments. PMID:21716636
Biological Impact of Music and Software-Based Auditory Training
ERIC Educational Resources Information Center
Kraus, Nina
2012-01-01
Auditory-based communication skills are developed at a young age and are maintained throughout our lives. However, some individuals--both young and old--encounter difficulties in achieving or maintaining communication proficiency. Biological signals arising from hearing sounds relate to real-life communication skills such as listening to speech in…
Characteristics of speaking style and implications for speech recognition.
Shinozaki, Takahiro; Ostendorf, Mari; Atlas, Les
2009-09-01
Differences in speaking style are associated with more or less spectral variability, as well as different modulation characteristics. The greater variation in some styles (e.g., spontaneous speech and infant-directed speech) poses challenges for recognition but possibly also opportunities for learning more robust models, as evidenced by prior work and motivated by child language acquisition studies. In order to investigate this possibility, this work proposes a new method for characterizing speaking style (the modulation spectrum), examines spontaneous, read, adult-directed, and infant-directed styles in this space, and conducts pilot experiments in style detection and sampling for improved speech recognizer training. Speaking style classification is improved by using the modulation spectrum in combination with standard pitch and energy variation. Speech recognition experiments on a small vocabulary conversational speech recognition task show that sampling methods for training with a small amount of data benefit from the new features.
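A minimal sketch of a modulation-spectrum feature of the kind proposed above: the spectrum of the slowly varying temporal envelope of the signal. The envelope cutoff and the 32 Hz analysis range are illustrative assumptions, not the paper's exact feature definition.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def modulation_spectrum(x, fs, env_cut=50.0):
    """Spectrum of the temporal envelope; returns (modulation freqs, magnitudes)."""
    env = np.abs(hilbert(np.asarray(x, dtype=float)))   # temporal envelope
    sos = butter(4, env_cut, btype="low", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)                         # smooth the envelope
    env = (env - env.mean()) * np.hanning(len(env))     # remove DC, taper
    n = int(2 ** np.ceil(np.log2(len(env))))            # zero-pad to power of two
    spec = np.abs(np.fft.rfft(env, n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = freqs <= 32.0        # speech modulation energy sits at low rates
    return freqs[keep], spec[keep]
```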
Leonard, Matthew K; Desai, Maansi; Hungate, Dylan; Cai, Ruofan; Singhal, Nilika S; Knowlton, Robert C; Chang, Edward F
2018-05-22
Music and speech are human-specific behaviours that share numerous properties, including the fine motor skills required to produce them. Given these similarities, previous work has suggested that music and speech may at least partially share neural substrates. To date, much of this work has focused on perception, and has not investigated the neural basis of production, particularly in trained musicians. Here, we report two rare cases of musicians undergoing neurosurgical procedures, where it was possible to directly stimulate the left hemisphere cortex during speech and piano/guitar music production tasks. We found that stimulation to left inferior frontal cortex, including pars opercularis and ventral pre-central gyrus, caused slowing and arrest for both speech and music, and note sequence errors for music. Stimulation to posterior superior temporal cortex only caused production errors during speech. These results demonstrate partially dissociable networks underlying speech and music production, with a shared substrate in frontal regions.
NASA Astrophysics Data System (ADS)
Kattoju, Ravi Kiran; Barber, Daniel J.; Abich, Julian; Harris, Jonathan
2016-05-01
With increasing necessity for intuitive Soldier-robot communication in military operations and advancements in interactive technologies, autonomous robots have transitioned from assistance tools to functional and operational teammates able to service an array of military operations. Despite improvements in gesture and speech recognition technologies, their effectiveness in supporting Soldier-robot communication is still uncertain. The purpose of the present study was to evaluate the performance of gesture and speech interface technologies to facilitate Soldier-robot communication during a spatial-navigation task with an autonomous robot. Gesture and speech semantically based spatial-navigation commands leveraged existing lexicons for visual and verbal communication from the U.S. Army field manual for visual signaling and a previously established Squad Level Vocabulary (SLV). Speech commands were recorded by a lapel microphone and Microsoft Kinect, and classified by commercial off-the-shelf automatic speech recognition (ASR) software. Visual signals were captured and classified using a custom wireless gesture glove and software. Participants in the experiment commanded a robot to complete a simulated ISR mission in a scaled-down urban scenario by delivering a sequence of gesture and speech commands, both individually and simultaneously, to the robot. Performance and reliability of gesture and speech hardware interfaces and recognition tools were analyzed and reported. Analysis of experimental results demonstrated that the employed gesture technology has significant potential for enabling bidirectional Soldier-robot team dialogue, based on the high classification accuracy and minimal training required to perform gesture commands.
Schmidt-Naylor, Anna C; Saunders, Kathryn J; Brady, Nancy C
2017-05-17
We explored alphabet supplementation as an augmentative and alternative communication strategy for adults with minimal literacy. Study 1's goal was to teach onset-letter selection with spoken words and assess generalization to untaught words, demonstrating the alphabetic principle. Study 2 incorporated alphabet supplementation within a naming task and then assessed effects on speech intelligibility. Three men with intellectual disabilities (ID) and low speech intelligibility participated. Study 1 used a multiple-probe design, across three 20-word sets, to show that our computer-based training improved onset-letter selection. We also probed generalization to untrained words. Study 2 taught onset-letter selection for 30 new words chosen for functionality. Five listeners transcribed speech samples of the 30 words in 2 conditions: speech only and speech with alphabet supplementation. Across studies 1 and 2, participants demonstrated onset-letter selection for at least 90 words. Study 1 showed evidence of the alphabetic principle for some but not all word sets. In study 2, participants readily used alphabet supplementation, enabling listeners to understand twice as many words. This is the first demonstration of alphabet supplementation in individuals with ID and minimal literacy. The large number of words learned holds promise both for improving communication and providing a foundation for improved literacy.
ERIC Educational Resources Information Center
Borrie, Stephanie A.; Schäfer, Martina C. M.
2017-01-01
Purpose: Intelligibility improvements immediately following perceptual training with dysarthric speech using lexical feedback are comparable to those observed when training uses somatosensory feedback (Borrie & Schäfer, 2015). In this study, we investigated if these lexical and somatosensory guided improvements in listener intelligibility of…
Preparing Speech Language Pathology Students to Work in Early Childhood
ERIC Educational Resources Information Center
Barton, Erin E.; Moore, Heather W.; Squires, Jane K.
2012-01-01
The shortage of highly qualified speech language pathologists (SLPs) with specialized training in early intervention and early childhood special education (EI/ECSE) is a pressing issue facing the field and dramatically impacts young children's social and academic success. SLP personnel preparation programs focused on training specialists in…
Context-Conditioned Generalization in Adaptation to Distorted Speech
ERIC Educational Resources Information Center
Dahan, Delphine; Mead, Rebecca L.
2010-01-01
People were trained to decode noise-vocoded speech by hearing monosyllabic stimuli in distorted and unaltered forms. When later presented with different stimuli, listeners were able to successfully generalize their experience. However, generalization was modulated by the degree to which testing stimuli resembled training stimuli: Testing stimuli's…
38 CFR 21.152 - Interpreter service for the hearing impaired.
Code of Federal Regulations, 2010 CFR
2010-07-01
... development and pursuit of a rehabilitation program. This service will be provided if: (1) A VA physician... determines that the veteran: (i) Can benefit from language and speech training; and (ii) Agrees to undertake language and speech training. (b) Periods during which interpreter service may be provided. Interpreter...
Earle, F Sayako; Myers, Emily B
2015-01-01
This investigation explored the generalization of phonetic learning across talkers following training on a nonnative (Hindi dental and retroflex) contrast. Participants were trained in two groups, either in the morning or in the evening. Discrimination and identification performance was assessed in the trained talker and an untrained talker three times over 24 h following training. Results suggest that overnight consolidation promotes generalization across talkers in identification, but not necessarily discrimination, of nonnative speech sounds.
A nationwide survey of nonspeech oral motor exercise use: implications for evidence-based practice.
Lof, Gregory L; Watson, Maggie M
2008-07-01
A nationwide survey was conducted to determine if speech-language pathologists (SLPs) use nonspeech oral motor exercises (NSOMEs) to address children's speech sound problems. For those SLPs who used NSOMEs, the survey also identified (a) the types of NSOMEs used by the SLPs, (b) the SLPs' underlying beliefs about why they use NSOMEs, (c) clinicians' training for these exercises, (d) the application of NSOMEs across various clinical populations, and (e) specific tasks/procedures/tools that are used for intervention. A total of 2,000 surveys were mailed to a randomly selected subgroup of SLPs, obtained from the American Speech-Language-Hearing Association (ASHA) membership roster, who self-identified that they worked in various settings with children who have speech sound problems. The questions required answers using both forced-choice and Likert-type scales. The response rate was 27.5% (537 out of 2,000). Of these respondents, 85% reported using NSOMEs to deal with children's speech sound production problems. Those SLPs reported that the research literature supports the use of NSOMEs, and that they learned to use these techniques from continuing education events. They also stated that NSOMEs can help improve the speech of children from disparate etiologies, and "warming up" and strengthening the articulators are important components of speech sound therapy. There are theoretical and research data that challenge both the use of NSOMEs and the efficacy of such exercises in resolving speech sound problems. SLPs need to follow the concepts of evidence-based practice in order to determine if these exercises are actually effective in bringing about changes in speech productions.
Nonhomogeneous transfer reveals specificity in speech motor learning.
Rochet-Capellan, Amélie; Richer, Lara; Ostry, David J
2012-03-01
Does motor learning generalize to new situations that are not experienced during training, or is motor learning essentially specific to the training situation? In the present experiments, we use speech production as a model to investigate generalization in motor learning. We tested for generalization from training to transfer utterances by varying the acoustical similarity between these two sets of utterances. During the training phase of the experiment, subjects received auditory feedback that was altered in real time as they repeated a single consonant-vowel-consonant utterance. Different groups of subjects were trained with different consonant-vowel-consonant utterances, which differed from a subsequent transfer utterance in terms of the initial consonant or vowel. During the adaptation phase of the experiment, we observed that subjects in all groups progressively changed their speech output to compensate for the perturbation (altered auditory feedback). After learning, we tested for generalization by having all subjects produce the same single transfer utterance while receiving unaltered auditory feedback. We observed limited transfer of learning, which depended on the acoustical similarity between the training and the transfer utterances. The gradients of generalization observed here are comparable to those observed in limb movement. The present findings are consistent with the conclusion that speech learning remains specific to individual instances of learning.
Music and speech distractors disrupt sensorimotor synchronization: effects of musical training.
Białuńska, Anita; Dalla Bella, Simone
2017-12-01
Humans display a natural tendency to move to the beat of music, more than to the rhythm of any other auditory stimulus. We typically move with music, but rarely with speech. This proclivity is apparent early during development and can be further developed over the years via joint dancing, singing, or instrument playing. Synchronization of movement to the beat can thus improve with age, but also with musical experience. In a previous study, we found that music perturbed synchronization with a metronome more than speech fragments did; this music superiority disappeared when distractors shared isochrony and the same meter (Dalla Bella et al., PLoS One 8(8):e71945, 2013). Here, we examined whether the interfering effect of music and speech distractors in a synchronization task is influenced by musical training. Musicians and non-musicians synchronized by producing finger force pulses to the sounds of a metronome while music and speech distractors were presented at one of various phase relationships with respect to the target. Distractors were familiar musical excerpts and fragments of children's poetry comparable in terms of beat/stress isochrony. Music perturbed synchronization with the metronome more than speech did in both groups. However, the difference in synchronization error between music and speech distractors was smaller for musicians than for non-musicians, especially at the time when peak movement force was reached. These findings point to a link between musical training and the timing of sensorimotor synchronization when reacting to music and speech distractors.
McKnight, Lindsay M; O'Malley-Keighran, Mary-Pat; Carroll, Clare
2016-11-01
There is evidence indicating that parent training programmes including interaction coaching of parents of children with autism spectrum disorders (ASD) can increase parental responsiveness, promote language development and social interaction skills in children with ASD. However, there is a lack of research exploring precisely how healthcare professionals use language in interaction coaching. To identify the speech acts of healthcare professionals during individual video-recorded interaction coaching sessions of a Hanen-influenced parent training programme with parents of children with ASD. This retrospective study used speech act analysis. Healthcare professional participants included two speech-language therapists and one occupational therapist. Sixteen videos were transcribed and a speech act analysis was conducted to identify the form and functions of the language used by the healthcare professionals. Descriptive statistics provided frequencies and percentages for the different speech acts used across the 16 videos. Six types of speech acts used by the healthcare professionals during coaching sessions were identified. These speech acts were, in order of frequency: Instructing, Modelling, Suggesting, Commanding, Commending and Affirming. The healthcare professionals were found to tailor their interaction coaching to the learning needs of the parents. A pattern was observed in which more direct speech acts were used in instances where indirect speech acts did not achieve the intended response. The study provides an insight into the nature of interaction coaching provided by healthcare professionals during a parent training programme. It identifies the types of language used during interaction coaching. It also highlights additional important aspects of interaction coaching such as the ability of healthcare professionals to adjust the directness of the coaching in order to achieve the intended parental response to the child's interaction. The findings may be used to increase the awareness of healthcare professionals about the types of speech acts used during interaction coaching as well as the manner in which coaching sessions are conducted. © 2016 Royal College of Speech and Language Therapists.
Sleep and Native Language Interference Affect Non-Native Speech Sound Learning
Earle, F. Sayako; Myers, Emily B.
2015-01-01
Adults learning a new language are faced with a significant challenge: non-native speech sounds that are perceptually similar to sounds in one’s native language can be very difficult to acquire. Sleep and native language interference, two factors that may help to explain this difficulty in acquisition, are addressed in three studies. Results of Experiment 1 showed that participants trained on a non-native contrast at night improved in discrimination 24 hours after training, while those trained in the morning showed no such improvement. Experiments 2 and 3 addressed the possibility that incidental exposure to perceptually similar native language speech sounds during the day interfered with maintenance in the morning group. Taken together, results show that the ultimate success of non-native speech sound learning depends not only on the similarity of learned sounds to the native language repertoire, but also to interference from native language sounds before sleep. PMID:26280264
Ward, Roslyn; Leitão, Suze; Strauss, Geoff
2014-08-01
This study evaluates perceptual changes in speech production accuracy in six children (3-11 years) with moderate-to-severe speech impairment associated with cerebral palsy before, during, and after participation in a motor-speech intervention program (Prompts for Restructuring Oral Muscular Phonetic Targets). An A1BCA2 single-subject research design was implemented. Subsequent to the baseline phase (phase A1), phase B targeted each participant's first intervention priority on the PROMPT motor-speech hierarchy. Phase C then targeted one level higher. Weekly speech probes were administered, containing trained and untrained words at the two levels of intervention, plus an additional level that served as a control goal. The speech probes were analysed for motor-speech movement parameters and perceptual accuracy. Analysis of the speech probe data showed that all participants recorded a statistically significant change. Between phases A1-B and B-C, 6/6 and 4/6 participants, respectively, recorded a statistically significant increase in performance on the motor-speech movement patterns targeted during that phase of intervention. The preliminary data presented in this study contribute evidence supporting the use of a treatment approach aligned with dynamic systems theory to improve the motor-speech movement patterns and speech production accuracy in children with cerebral palsy.
Noise Hampers Children’s Expressive Word Learning
Riley, Kristine Grohne; McGregor, Karla K.
2013-01-01
Purpose To determine the effects of noise and speech style on word learning in typically developing school-age children. Method Thirty-one participants ages 9;0 (years; months) to 10;11 attempted to learn 2 sets of 8 novel words and their referents. They heard all of the words 13 times each within meaningful narrative discourse. Signal-to-noise ratio (noise vs. quiet) and speech style (plain vs. clear) were manipulated such that half of the children heard the new words in broadband white noise and half heard them in quiet; within those conditions, each child heard one set of words produced in a plain speech style and another set in a clear speech style. Results Children who were trained in quiet learned to produce the word forms more accurately than those who were trained in noise. Clear speech resulted in more accurate word form productions than plain speech, whether the children had learned in noise or quiet. Learning from clear speech in noise and plain speech in quiet produced comparable results. Conclusion Noise limits expressive vocabulary growth in children, reducing the quality of word form representation in the lexicon. Clear speech input can aid expressive vocabulary growth in children, even in noisy environments. PMID:22411494
NASA Astrophysics Data System (ADS)
Mirkovic, Bojana; Debener, Stefan; Jaeger, Manuela; De Vos, Maarten
2015-08-01
Objective. Recent studies have provided evidence that temporal envelope driven speech decoding from high-density electroencephalography (EEG) and magnetoencephalography recordings can identify the attended speech stream in a multi-speaker scenario. The present work replicated the previous high density EEG study and investigated the necessary technical requirements for practical attended speech decoding with EEG. Approach. Twelve normal hearing participants attended to one out of two simultaneously presented audiobook stories, while high density EEG was recorded. An offline iterative procedure eliminating those channels contributing the least to decoding provided insight into the necessary channel number and optimal cross-subject channel configuration. Aiming towards the future goal of near real-time classification with an individually trained decoder, the minimum duration of training data necessary for successful classification was determined by using a chronological cross-validation approach. Main results. Close replication of the previously reported results confirmed the method robustness. Decoder performance remained stable from 96 channels down to 25. Furthermore, for less than 15 min of training data, the subject-independent (pre-trained) decoder performed better than an individually trained decoder did. Significance. Our study complements previous research and provides information suggesting that efficient low-density EEG online decoding is within reach.
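A minimal sketch of the envelope-driven decoding approach described above: a linear (ridge-regularized) decoder maps time-lagged EEG channels to the speech envelope, and the attended stream is taken to be the one whose envelope correlates best with the reconstruction. The lag range and regularization strength are assumptions.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of each channel: (T, C) -> (T, C * n_lags)."""
    T, C = eeg.shape
    X = np.zeros((T, C * n_lags))
    for k in range(n_lags):
        X[k:, k * C:(k + 1) * C] = eeg[:T - k]
    return X

def train_decoder(eeg, envelope, n_lags=16, lam=1e3):
    """Ridge regression from lagged EEG to the attended speech envelope."""
    X = lagged(eeg, n_lags)
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ envelope)

def attended_stream(eeg, env_a, env_b, w, n_lags=16):
    """Classify attention by correlating the reconstruction with each envelope."""
    rec = lagged(eeg, n_lags) @ w
    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    return "A" if corr(rec, env_a) > corr(rec, env_b) else "B"
```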
Moreno, Sylvain; Marques, Carlos; Santos, Andreia; Santos, Manuela; Castro, São Luís; Besson, Mireille
2009-03-01
We conducted a longitudinal study with 32 nonmusician children over 9 months to determine 1) whether functional differences between musician and nonmusician children reflect specific predispositions for music or result from musical training and 2) whether musical training improves nonmusical brain functions such as reading and linguistic pitch processing. Event-related brain potentials were recorded while 8-year-old children performed tasks designed to test the hypothesis that musical training improves pitch processing not only in music but also in speech. Following the first testing sessions nonmusician children were pseudorandomly assigned to music or to painting training for 6 months and were tested again after training using the same tests. After musical (but not painting) training, children showed enhanced reading and pitch discrimination abilities in speech. Remarkably, 6 months of musical training thus suffices to significantly improve behavior and to influence the development of neural processes as reflected in specific pattern of brain waves. These results reveal positive transfer from music to speech and highlight the influence of musical training. Finally, they demonstrate brain plasticity in showing that relatively short periods of training have strong consequences on the functional organization of the children's brain.
Computational validation of the motor contribution to speech perception.
Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio
2014-07-01
Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performances when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data). Copyright © 2014 Cognitive Science Society, Inc.
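A minimal sketch of one way motor data can be used only during training, in the spirit of the work reviewed above: an acoustic-to-articulatory mapping is learned on the training set, and predicted articulatory features are appended to the acoustic ones at both training and test time. This is an illustrative scheme, not necessarily the reviewed studies' exact pipeline; the function names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import LinearSVC

def fit_with_motor(acoustic_tr, motor_tr, labels_tr):
    """Train a phone classifier on acoustic + predicted articulatory features."""
    inverter = Ridge(alpha=1.0).fit(acoustic_tr, motor_tr)  # articulatory inversion
    X = np.hstack([acoustic_tr, inverter.predict(acoustic_tr)])
    clf = LinearSVC().fit(X, labels_tr)
    return inverter, clf

def predict_phones(acoustic_te, inverter, clf):
    """At test time only acoustics are available; motor features are predicted."""
    X = np.hstack([acoustic_te, inverter.predict(acoustic_te)])
    return clf.predict(X)
```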
Military and government applications of human-machine communication by voice.
Weinstein, C J
1995-01-01
This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will require a varying mix of advances in speech technology and in integration of the technology into applications environments. Applications that are described include (1) speech recognition and synthesis for mobile command and control; (2) speech processing for a portable multifunction soldier's computer; (3) speech- and language-based technology for naval combat team tactical training; (4) speech technology for command and control on a carrier flight deck; (5) control of auxiliary systems, and alert and warning generation, in fighter aircraft and helicopters; and (6) voice check-in, report entry, and communication for law enforcement agents or special forces. A phased approach for transfer of the technology into applications is advocated, where integration of applications systems is pursued in parallel with advanced research to meet future needs. PMID:7479718
Lohmander, Anette; Henriksson, Cecilia; Havstam, Christina
2010-12-01
The aim was to evaluate the effectiveness of electropalatography (EPG) in home training of persistent articulation errors in an 11-year-old Swedish girl born with isolated cleft palate. The /t/ and /s/ sounds were trained in a single-subject design across behaviours during an eight-month period using a portable training unit (PTU). Both EPG analysis and perceptual analysis showed an improvement in the production of /t/ and /s/ in words and sentences after therapy. Analysis of tongue-contact patterns showed that the participant had more normal articulatory patterns for /t/ and /s/ after just 2 months (approximately 8 hours) of training. No statistically significant transfer to intelligibility in connected speech was found. The present results show that EPG home training can be an effective method for treating persistent speech disorders associated with cleft palate. Methods for transfer from function (articulation) to activity (intelligibility) need to be explored.
Transfer of Training between Music and Speech: Common Processing, Attention, and Memory.
Besson, Mireille; Chobert, Julie; Marie, Céline
2011-01-01
After a brief historical perspective of the relationship between language and music, we review our work on transfer of training from music to speech that aimed at testing the general hypothesis that musicians should be more sensitive than non-musicians to speech sounds. In light of recent results in the literature, we argue that when long-term experience in one domain influences acoustic processing in the other domain, results can be interpreted as common acoustic processing. But when long-term experience in one domain influences the building-up of abstract and specific percepts in another domain, results are taken as evidence for transfer of training effects. Moreover, we also discuss the influence of attention and working memory on transfer effects and we highlight the usefulness of the event-related potentials method to disentangle the different processes that unfold in the course of music and speech perception. Finally, we give an overview of an on-going longitudinal project with children aimed at testing transfer effects from music to different levels and aspects of speech processing.
1996-01-01
These guidelines are an official statement of the American Speech-Language-Hearing Association. They provide guidance on the training, credentialing, use, and supervision of one category of support personnel in speech-language pathology: speech-language pathology assistants. Guidelines are not official standards of the Association. They were developed by the Task Force on Support Personnel: Dennis J. Arnst, Kenneth D. Barker, Ann Olsen Bird, Sheila Bridges, Linda S. DeYoung, Katherine Formichella, Nena M. Germany, Gilbert C. Hanke, Ann M. Horton, DeAnne M. Owre, Sidney L. Ramsey, Cathy A. Runnels, Brenda Terrell, Gerry W. Werven, Denise West, Patricia A. Mercaitis (consultant), Lisa C. O'Connor (consultant), Frederick T. Spahr (coordinator), Diane Paul-Brown (associate coordinator), Ann L. Carey (Executive Board liaison). The 1994 guidelines supersede the 1981 guidelines entitled, "Guidelines for the Employment and Utilization of Supportive Personnel" (Asha, March 1981, 165-169). Refer to the 1995 position statement on the "Training, Credentialing, Use, and Supervision of Support Personnel in Speech-Language Pathology" (Asha, 37 [Suppl. 14], 21).
Jürgens, Tim; Brand, Thomas
2009-11-01
This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in two respects for this model. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing and a simple dynamic-time-warp speech recognizer. The model is evaluated while presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures which focus mainly on small perceptual distances and neglect outliers.
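A minimal sketch of the dynamic-time-warp comparison at the core of such a template recognizer: after auditory preprocessing, each test utterance is assigned the label of the training template with the smallest warped distance. The Euclidean frame distance and length normalization are assumptions.

```python
import numpy as np

def dtw(a, b):
    """a, b: (frames, features) internal representations; returns DTW cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)   # length-normalized warped path cost

def recognize(test_rep, templates):
    """templates: list of (label, representation) pairs from training."""
    return min(templates, key=lambda t: dtw(test_rep, t[1]))[0]
```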
Automatic lip reading by using multimodal visual features
NASA Astrophysics Data System (ADS)
Takahashi, Shohei; Ohya, Jun
2013-12-01
Speech recognition has been researched for a long time, but it does not work well in noisy places such as cars or trains. In addition, people who are hearing-impaired or have difficulty hearing cannot benefit from speech recognition. Visual information is also important for recognizing speech automatically: people understand speech not only from audio information but also from visual information such as temporal changes in lip shape. A vision-based speech recognition method could work well in noisy places and could also be useful for people with hearing disabilities. In this paper, we propose an automatic lip-reading method that recognizes speech from multimodal visual information alone, without using any audio information. First, an Active Shape Model (ASM) is used to track and detect the face and lips in a video sequence. Second, the shape, optical flow, and spatial frequencies of the lip features are extracted from the lip region detected by the ASM. Next, the extracted multimodal features are ordered chronologically and a Support Vector Machine is trained on them to learn and classify the spoken words. Experiments on classifying several words show promising results for the proposed method.
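As a rough illustration of the classification stage described here, the following sketch concatenates per-frame lip features (shape, optical flow, spatial frequency) chronologically and trains a Support Vector Machine on the resulting vectors. It is a hedged reconstruction, not the authors' code: the feature dimensions and synthetic data are placeholders, and it assumes each utterance has been normalized to a fixed number of frames.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def utterance_vector(shape_feats, flow_feats, freq_feats):
    """Concatenate per-frame lip features in chronological order into one vector."""
    frames = np.hstack([shape_feats, flow_feats, freq_feats])  # (n_frames, n_feats)
    return frames.ravel()  # fixed length assumes a normalized frame count

# Synthetic stand-ins for extracted features: 40 samples, 20 frames each,
# with hypothetical shape (4-dim), flow (2-dim), and frequency (3-dim) features.
X = np.stack([
    utterance_vector(rng.normal(size=(20, 4)),
                     rng.normal(size=(20, 2)),
                     rng.normal(size=(20, 3)))
    for _ in range(40)
])
y = rng.integers(0, 4, size=40)  # four hypothetical word classes

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(clf.predict(X[:5]))
```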
Kumar, Prawin; Anil, Sam Publius; Grover, Vibhu; Sanju, Himanshu Kumar; Sinha, Sachchidanand
2017-02-01
Most trained musicians engage in rigorous practice for several years to achieve a high level of proficiency. Musicians are therefore an ideal group in which to research changes or modifications in brain structures and functions across several information-processing systems. This study aimed to investigate cortical and subcortical processing of short-duration speech stimuli in trained rock musicians and non-musicians. Two groups of participants (experimental and control) in the age range of 18-25 years were selected for the study. The experimental group comprised 15 rock musicians who had a minimum of 5 years of professional training in rock music and performed rock music regularly for at least 15 h a week. Fifteen age-matched participants without any formal music training served as non-musicians in the control group. The speech-evoked ABR (S-ABR) and speech-evoked ALLR (S-LLR) were elicited in both groups with the short-duration synthetic speech stimulus /da/. Different measures were analyzed for the S-ABR and S-LLR. For the S-ABR, MANOVA revealed a significant main effect of group on the latencies of wave V and wave A and on the amplitude of the V/A slope. Similarly, a Kruskal-Wallis test showed significantly higher F0 amplitude in rock musicians compared with non-musicians. For the S-LLR, MANOVA showed statistically significant differences for the latencies of waves P2 and N2 and for the P2-N2 amplitude measure. This study indicated better neural processing of short-duration speech stimuli at the subcortical as well as the cortical level among rock musicians compared with non-musicians.
Auditory Learning in Children with Cochlear Implants
ERIC Educational Resources Information Center
Mishra, Srikanta K.; Boddupally, Shiva P.; Rayapati, Deeksha
2015-01-01
Purpose: The purpose of this study was to examine and characterize the training-induced changes in speech-in-noise perception in children with congenital deafness who have cochlear implants (CIs). Method: Twenty-seven children with congenital deafness who have CIs were studied. Eleven children with CIs were trained on a speech-in-noise task,…
ERIC Educational Resources Information Center
O'Donoghue, Cynthia R.; Dean-Claytor, Ashli
2008-01-01
Purpose: The number of children requiring dysphagia management in the schools is increasing. This article reports survey findings relative to speech-language pathologists' (SLPs') training and self-rated confidence to treat children with swallowing and feeding disorders in the schools. Method: Surveys were completed by 222 SLPs representing…
Business Communication Students Learn to Hear a Bad Speech Habit
ERIC Educational Resources Information Center
Bell, Reginald L.; Liang-Bell, Lei Paula; Deselle, Bettye
2006-01-01
Students were trained to perceive filled pauses (FP) as a bad speech habit. In a series of classroom sensitivity training activities, followed by students being rewarded to observe twenty minutes of live television from the public media, no differences between male and female Business Communication students were revealed. The practice of teaching…
Barcroft, Joe; Sommers, Mitchell S; Tye-Murray, Nancy; Mauzé, Elizabeth; Schroy, Catherine; Spehar, Brent
2011-11-01
Our long-term objective is to develop an auditory training program that will enhance speech recognition in those situations where patients most want improvement. As a first step, the current investigation trained participants using either a single talker or multiple talkers to determine if auditory training leads to transfer-appropriate gains. The experiment implemented a 2 × 2 × 2 mixed design, with training condition as a between-participants variable and testing interval and test version as repeated-measures variables. Participants completed a computerized six-week auditory training program wherein they heard either the speech of a single talker or the speech of six talkers. Training gains were assessed with single-talker and multi-talker versions of the Four-choice discrimination test. Participants in both groups were tested on both versions. Sixty-nine adult hearing-aid users were randomly assigned to either single-talker or multi-talker auditory training. Both groups showed significant gains on both test versions. Participants who trained with multiple talkers showed greater improvement on the multi-talker version whereas participants who trained with a single talker showed greater improvement on the single-talker version. Transfer-appropriate gains occurred following auditory training, suggesting that auditory training can be designed to target specific patient needs.
Elmer, Stefan; Klein, Carina; Kühnis, Jürg; Liem, Franziskus; Meyer, Martin; Jäncke, Lutz
2014-10-01
In this study, we used high-density EEG to evaluate whether speech and music expertise has an influence on the categorization of expertise-related and unrelated sounds. With this purpose in mind, we compared the categorization of speech, music, and neutral sounds between professional musicians, simultaneous interpreters (SIs), and controls in response to morphed speech-noise, music-noise, and speech-music continua. Our hypothesis was that music and language expertise will strengthen the memory representations of prototypical sounds, which act as a perceptual magnet for morphed variants; that is, the prototype "attracts" variants. This so-called magnet effect should be manifested by an increased assignment of morphed items to the trained category, by a reduced maximal slope of the psychometric function, as well as by differential event-related brain responses reflecting memory comparison processes (i.e., N400 and P600 responses). As a main result, we provide the first evidence for a domain-specific behavioral bias of musicians and SIs toward the trained categories, namely music and speech. In addition, SIs showed a bias toward musical items, indicating that interpreting training has a generic influence on the cognitive representation of spectrotemporal signals with acoustic properties similar to speech sounds. Notably, EEG measurements revealed clearly distinct N400 and P600 responses to both prototypical and ambiguous items between the three groups at anterior, central, and posterior scalp sites. These differential N400 and P600 responses represent synchronous activity occurring across widely distributed brain networks and indicate a dynamic recruitment of memory processes that varies as a function of training and expertise.
Sensory-Cognitive Interaction in the Neural Encoding of Speech in Noise: A Review
Anderson, Samira; Kraus, Nina
2011-01-01
Background: Speech-in-noise (SIN) perception is one of the most complex tasks faced by listeners on a daily basis. Although listening in noise presents challenges for all listeners, background noise inordinately affects speech perception in older adults and in children with learning disabilities. Hearing thresholds are an important factor in SIN perception, but they are not the only factor. For successful comprehension, the listener must perceive and attend to relevant speech features, such as the pitch, timing, and timbre of the target speaker's voice. Here, we review recent studies linking SIN and brainstem processing of speech sounds. Purpose: To review recent work that has examined the ability of the auditory brainstem response to complex sounds (cABR), which reflects the nervous system's transcription of pitch, timing, and timbre, to be used as an objective neural index for hearing-in-noise abilities. Study Sample: We examined speech-evoked brainstem responses in a variety of populations, including children who are typically developing, children with language-based learning impairment, young adults, older adults, and auditory experts (i.e., musicians). Data Collection and Analysis: In a number of studies, we recorded brainstem responses in quiet and babble noise conditions to the speech syllable /da/ in all age groups, as well as in a variable condition in children in which /da/ was presented in the context of seven other speech sounds. We also measured speech-in-noise perception using the Hearing-in-Noise Test (HINT) and the Quick Speech-in-Noise Test (QuickSIN). Results: Children and adults with poor SIN perception have deficits in the subcortical spectrotemporal representation of speech, including low-frequency spectral magnitudes and the timing of transient response peaks. Furthermore, auditory expertise, as engendered by musical training, provides both behavioral and neural advantages for processing speech in noise. Conclusions: These results have implications for future assessment and management strategies for young and old populations whose primary complaint is difficulty hearing in background noise. The cABR provides a clinically applicable metric for objective assessment of individuals with SIN deficits, for determination of the biologic nature of disorders affecting SIN perception, for evaluation of appropriate hearing aid algorithms, and for monitoring the efficacy of auditory remediation and training. PMID:21241645
Horn, Nynne Thorup; Sørensen, Stine Derdau; McGregor, William B.; Wallentin, Mikkel
2015-01-01
Models of speech learning suggest that adaptations to foreign language sound categories take place within 6 to 12 months of exposure to a foreign language. Results from laboratory language training show effects of very targeted training on nonnative speech contrasts within only 1 to 4 weeks of training. Results from immersion studies are inconclusive, but some suggest continued effects on nonnative speech perception after 6 to 8 years of experience. We investigated this apparent discrepancy in the timing of adaptation to foreign speech sounds in a longitudinal study of foreign language learning. We examined two groups of Danish language officer cadets learning either Arabic (Modern Standard Arabic and Egyptian Arabic) or Dari (Afghan Farsi) through intensive multifaceted language training. We conducted two experiments (identification and discrimination) with the cadets who were tested four times: at the start (T0), after 3 weeks (T1), 6 months (T2), and 19 months (T3). We used a phonemic Arabic contrast (pharyngeal vs. glottal frication) and a phonemic Dari contrast (sibilant voicing) as stimuli. We observed an effect of learning on the Dari learners’ identification of the Dari stimuli already after 3 weeks of language training, which was sustained, but not improved, after 6 and 19 months. The changes in the Dari learners’ identification functions were positively correlated with their grades after 6 months. We observed no other learning effects at the group level. We discuss the results in the light of predictions from speech learning models. PMID:27551355
Teaching Research Methods in Communication Disorders: "A Problem-Based Learning Approach"
ERIC Educational Resources Information Center
Greenwald, Margaret L.
2006-01-01
A critical professional issue in speech-language pathology and audiology is the current shortage of researchers. In this context, the most effective methods for training graduate students in research must be identified and implemented. This article describes a problem-based approach to teaching research methods. In this approach, the instructor…
White-Schwoch, Travis; Woodruff Carr, Kali; Anderson, Samira; Strait, Dana L; Kraus, Nina
2013-11-06
Aging results in pervasive declines in nervous system function. In the auditory system, these declines include neural timing delays in response to fast-changing speech elements; this causes older adults to experience difficulty understanding speech, especially in challenging listening environments. These age-related declines are not inevitable, however: older adults with a lifetime of music training do not exhibit neural timing delays. Yet many people play an instrument for a few years without making a lifelong commitment. Here, we examined neural timing in a group of human older adults who had nominal amounts of music training early in life, but who had not played an instrument for decades. We found that a moderate amount (4-14 years) of music training early in life is associated with faster neural timing in response to speech later in life, long after training stopped (>40 years). We suggest that early music training sets the stage for subsequent interactions with sound. These experiences may interact over time to sustain sharpened neural processing in central auditory nuclei well into older age.
Using the Pecha Kucha Speech to Analyze and Train Humor Skills
ERIC Educational Resources Information Center
Waisanen, Don
2018-01-01
Courses: Public speaking; communication courses requiring speeches. Objective: Students will learn how to apply humor principles to speeches through a slideshow method supportive of this goal, and to become more discerning about the possibilities and pitfalls of humorous communication.
Awan, S N
1993-03-01
This study details a comparison of the speaking F0 and intensity values of young male and female adults with and without vocal training, as well as the superimposition of the speaking F0 and intensity data upon phonetograms. Results indicated that (a) trained vocalists have mean speaking F0s similar to those of untrained vocalists, but exhibit significantly greater speaking F0 ranges; (b) trained vocalists exhibit significantly greater mean intensity levels in speech, as well as significantly greater speaking intensity ranges, than do untrained vocalists; (c) the mean speaking F0 for both trained and untrained vocalists was found in the vicinity of the 5-7% frequency level of the entire phonational F0 range (in Hz), equivalent to 12-16% of the phonational F0 range in semitones; (d) the overall speech area (mean speaking F0 and minimum and maximum speaking F0 peaks) was found in the lower 23-31% of the entire phonational F0 range (in semitones), with the untrained subjects utilizing the lower 25% of the phonational range (in semitones) and the trained subjects extending this area to the lower 28-31%; and (e) significant correlations were observed between the total intensity range and the intensity range used in speech in trained female vocalists, and between the total F0 range and the speaking F0 range in the combined trained male and female group. These results have important implications for the use of the phonetogram, as well as for the clinical applicability of vocal training exercises in various speech and voice therapy cases.
Sound Classification in Hearing Aids Inspired by Auditory Scene Analysis
NASA Astrophysics Data System (ADS)
Büchler, Michael; Allegro, Silvia; Launer, Stefan; Dillier, Norbert
2005-12-01
A sound classification system for the automatic recognition of the acoustic environment in a hearing aid is discussed. The system distinguishes the four sound classes "clean speech," "speech in noise," "noise," and "music." A number of features that are inspired by auditory scene analysis are extracted from the sound signal. These features describe amplitude modulations, spectral profile, harmonicity, amplitude onsets, and rhythm. They are evaluated together with different pattern classifiers. Simple classifiers, such as rule-based and minimum-distance classifiers, are compared with more complex approaches, such as Bayes classifier, neural network, and hidden Markov model. Sounds from a large database are employed for both training and testing of the system. The achieved recognition rates are very high except for the class "speech in noise." Problems arise in the classification of compressed pop music, strongly reverberated speech, and tonal or fluctuating noises.
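A minimum-distance classifier of the kind compared in this study can be sketched in a few lines: each of the four sound classes is represented by the mean of its training feature vectors, and a new acoustic scene is assigned to the nearest class mean. The feature values below are synthetic placeholders; the actual auditory-scene features (modulation, spectral profile, harmonicity, onsets, rhythm) would come from a separate extraction stage.

```python
import numpy as np

rng = np.random.default_rng(1)
CLASSES = ["clean speech", "speech in noise", "noise", "music"]

def train_minimum_distance(features_by_class):
    """Represent each sound class by the mean of its training feature vectors."""
    return {c: feats.mean(axis=0) for c, feats in features_by_class.items()}

def classify(feature_vector, class_means):
    """Assign the acoustic scene to the nearest class mean."""
    return min(class_means, key=lambda c: np.linalg.norm(feature_vector - class_means[c]))

# Synthetic stand-ins for 5-dimensional auditory-scene feature vectors,
# one cluster per class (these numbers are illustrative only).
training = {c: rng.normal(loc=i, size=(50, 5)) for i, c in enumerate(CLASSES)}
means = train_minimum_distance(training)
print(classify(rng.normal(loc=2, size=5), means))
```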
Across-site patterns of modulation detection: Relation to speech recognition
Garadat, Soha N.; Zwolan, Teresa A.; Pfingst, Bryan E.
2012-01-01
The aim of this study was to identify across-site patterns of modulation detection thresholds (MDTs) in subjects with cochlear implants and to determine if removal of sites with the poorest MDTs from speech processor programs would result in improved speech recognition. Five hundred millisecond trains of symmetric-biphasic pulses were modulated sinusoidally at 10 Hz and presented at a rate of 900 pps using monopolar stimulation. Subjects were asked to discriminate a modulated pulse train from an unmodulated pulse train for all electrodes in quiet and in the presence of an interleaved unmodulated masker presented on the adjacent site. Across-site patterns of masked MDTs were then used to construct two 10-channel MAPs such that one MAP consisted of sites with the best masked MDTs and the other MAP consisted of sites with the worst masked MDTs. Subjects’ speech recognition skills were compared when they used these two different MAPs. Results showed that MDTs were variable across sites and were elevated in the presence of a masker by various amounts across sites. Better speech recognition was observed when the processor MAP consisted of sites with best masked MDTs, suggesting that temporal modulation sensitivity has important contributions to speech recognition with a cochlear implant. PMID:22559376
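The MAP-construction step described above reduces to sorting sites by their masked MDTs. The following sketch is illustrative only; the electrode count and threshold values are placeholders, not study data (lower, i.e., more negative, thresholds indicate better modulation sensitivity).

```python
import numpy as np

rng = np.random.default_rng(0)
masked_mdt_db = rng.uniform(-25.0, -5.0, size=22)  # hypothetical per-electrode masked MDTs (dB)
order = np.argsort(masked_mdt_db)                  # ascending: most sensitive sites first
best_map = np.sort(order[:10])                     # 10 sites with the best masked MDTs
worst_map = np.sort(order[-10:])                   # 10 sites with the worst masked MDTs
print("best-MDT MAP:", best_map, "\nworst-MDT MAP:", worst_map)
```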
Telephone speech comprehension in children with multichannel cochlear implants.
Aronson, L; Estienne, P; Arauz, S L; Pallante, S A
1997-11-01
Telephone speech comprehension was evaluated in six prelingually deaf children implanted with the Nucleus 22 prosthesis fitted with the Speak strategy. All of them had at least 1.5 years of experience with their implant. When the tests began, they had already had at least 2 months' experience with the same map in their speech processor. The children were trained in the use of the telephone as part of the rehabilitation program. None of them used it regularly, but rather as a game that they found very entertaining. A special battery, the Bate-fon (batería para teléfono = telephone battery), was designed for training and evaluation purposes. It includes the five Spanish vowels in isolation, diphthongs, onomatopoetic animal voices, and two-syllable and three-syllable words. The tests were administered 1.5-2 years after the switch-on of the speech processor. Standard acoustic telephone coupling was used. The speech material was presented to the child on colored cards. Stimuli were presented twice, and children were informed when a response was incorrect. Averaged results indicated that the percentages of correct responses for all the speech material increased in the second presentation. All children showed some degree of telephone communication ability. As a result of the training, some of the children are using the telephone to communicate with their families.
NASA Astrophysics Data System (ADS)
Faizel Amri, Umar; Nur Wahidah Nik Hashim, Nik; Hazrin Hany Mohamad Hanif, Noor
2017-11-01
In the department of engineering, students are required to attend at least 80 percent of classes. The conventional method requires each student to sign his or her initials on the attendance sheet. However, this method is prone to cheating, with one student signing for a classmate who is absent. We develop our hypothesis according to a verse in the Holy Qur'an (95:4), "We have created men in the best of mould". Based on this verse, we believe that each psychological characteristic of a human being is unique, and thus each person's speech characteristics should be unique. In this paper we present the development of a speech biometric-based attendance system. The system requires the user's voice to be enrolled as training data, which is saved in the system to register the user. Subsequent recordings of the user's voice serve as test data to be verified against the trained data stored in the system. The system uses PSD (Power Spectral Density) and Transition Parameters as the feature extraction methods for the voices. Euclidean and Mahalanobis distances are used to verify the user's voice. For this research, ten subjects, five female and five male, were chosen to test the performance of the system. The system performance in terms of recognition rate was found to be 60% correct identification of individuals.
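The verification step of such a system can be sketched as follows: enrollment vectors (e.g., PSD-based features) define a per-user mean and covariance, and a test vector is accepted when its Euclidean or Mahalanobis distance to the enrolled mean falls below a threshold. This is a generic reconstruction under stated assumptions, not the authors' code; the threshold value is illustrative.

```python
import numpy as np

def enroll(training_vectors):
    """Build a per-user model from enrollment feature vectors (e.g., PSD-based)."""
    mu = training_vectors.mean(axis=0)
    # Pseudo-inverse guards against a singular covariance with few enrollments.
    cov_inv = np.linalg.pinv(np.cov(training_vectors, rowvar=False))
    return mu, cov_inv

def verify(test_vector, mu, cov_inv, threshold=3.0, metric="mahalanobis"):
    """Accept the claimed identity if the distance to the enrolled mean is small."""
    diff = test_vector - mu
    if metric == "euclidean":
        distance = np.linalg.norm(diff)
    else:
        distance = float(np.sqrt(diff @ cov_inv @ diff))
    return distance < threshold  # threshold is a hypothetical operating point
```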
Automatic speech recognition using a predictive echo state network classifier.
Skowronski, Mark D; Harris, John G
2007-04-01
We have combined an echo state network (ESN) with a competitive state machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that the model was significantly more noise robust than a hidden Markov model in noisy speech classification experiments, by 8 ± 1 dB signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.
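To give a flavor of the approach, the sketch below implements a plain echo state network sequence classifier: a fixed random reservoir maps each feature sequence to a state trajectory, and only a linear readout over the time-averaged states is trained, here by ridge regression. The predictive/competitive state machine framework of the actual classifier is beyond this minimal sketch, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class SimpleESN:
    def __init__(self, n_inputs, n_reservoir=200, spectral_radius=0.9):
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Rescale so the spectral radius is < 1 (echo state property).
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W

    def states(self, sequence):
        """Time-averaged reservoir state for one (n_frames, n_inputs) sequence."""
        x = np.zeros(len(self.W))
        trajectory = []
        for u in sequence:
            x = np.tanh(self.W_in @ u + self.W @ x)
            trajectory.append(x)
        return np.mean(trajectory, axis=0)

def train_readout(esn, sequences, labels, n_classes, ridge=1e-2):
    """Ridge-regression readout mapping averaged states to one-hot class targets."""
    S = np.stack([esn.states(s) for s in sequences])
    Y = np.eye(n_classes)[np.asarray(labels)]
    return np.linalg.solve(S.T @ S + ridge * np.eye(S.shape[1]), S.T @ Y)

def predict(esn, W_out, sequence):
    return int(np.argmax(esn.states(sequence) @ W_out))
```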
Effects of speech and language treatment on recovery from aphasia.
Shewan, C M; Kertesz, A
1984-11-01
Language recovery in aphasic patients who received one of three types of speech and language treatment was compared with that in aphasic patients who received no treatment. One hundred aphasic patients were followed from 2 to 4 weeks postonset for 1 year or until recovery, using a standardized test battery administered at systematic intervals. Both treatment methods provided by trained speech-language pathologists were efficacious, while the method provided by trained nonprofessionals approached statistical significance. Small group size prevented resolution of the question of whether one type of treatment was superior to another.
Oba, Sandra I.; Galvin, John J.; Fu, Qian-Jie
2014-01-01
Auditory training has been shown to significantly improve cochlear implant (CI) users’ speech and music perception. However, it is unclear whether post-training gains in performance were due to improved auditory perception or to generally improved attention, memory and/or cognitive processing. In this study, speech and music perception, as well as auditory and visual memory were assessed in ten CI users before, during, and after training with a non-auditory task. A visual digit span (VDS) task was used for training, in which subjects recalled sequences of digits presented visually. After the VDS training, VDS performance significantly improved. However, there were no significant improvements for most auditory outcome measures (auditory digit span, phoneme recognition, sentence recognition in noise, digit recognition in noise), except for small (but significant) improvements in vocal emotion recognition and melodic contour identification. Post-training gains were much smaller with the non-auditory VDS training than observed in previous auditory training studies with CI users. The results suggest that post-training gains observed in previous studies were not solely attributable to improved attention or memory, and were more likely due to improved auditory perception. The results also suggest that CI users may require targeted auditory training to improve speech and music perception. PMID:23516087
Loebach, Jeremy L; Pisoni, David B; Svirsky, Mario A
2009-12-01
The objective of this study was to assess whether training on speech processed with an eight-channel noise vocoder to simulate the output of a cochlear implant would produce transfer of auditory perceptual learning to the recognition of nonspeech environmental sounds, the identification of speaker gender, and the discrimination of talkers by voice. Twenty-four normal-hearing subjects were trained to transcribe meaningful English sentences processed with a noise vocoder simulation of a cochlear implant. An additional 24 subjects served as an untrained control group and transcribed the same sentences in their unprocessed form. All subjects completed pre- and post-test sessions in which they transcribed vocoded sentences to provide an assessment of training efficacy. Transfer of perceptual learning was assessed using a series of closed-set, nonlinguistic tasks: subjects identified talker gender, discriminated the identity of pairs of talkers, and identified ecologically significant environmental sounds from a closed set of alternatives. Although both groups of subjects showed significant pre- to post-test improvements, subjects who transcribed vocoded sentences during training performed significantly better at post-test than those in the control group. Both groups performed equally well on gender identification and talker discrimination. Subjects who received explicit training on the vocoded sentences, however, performed significantly better on environmental sound identification than the untrained subjects. Moreover, across both groups, pre-test speech performance and, to a higher degree, post-test speech performance, were significantly correlated with environmental sound identification. For both groups, environmental sounds that were characterized as having more salient temporal information were identified more often than environmental sounds that were characterized as having more salient spectral information. Listeners trained to identify noise-vocoded sentences showed evidence of transfer of perceptual learning to the identification of environmental sounds. In addition, the correlation between environmental sound identification and sentence transcription indicates that subjects who were better able to use the degraded acoustic information to identify the environmental sounds were also better able to transcribe the linguistic content of novel sentences. Both trained and untrained groups performed equally well (approximately 75% correct) on the gender-identification task, indicating that training did not have an effect on the ability to identify the gender of talkers. Although better than chance, performance on the talker discrimination task was poor overall (approximately 55%), suggesting that either explicit training is required to discriminate talkers' voices reliably or that additional information (perhaps spectral in nature) not present in the vocoded speech is required to excel in such tasks. Taken together, the results suggest that although transfer of auditory perceptual learning with spectrally degraded speech does occur, explicit task-specific training may be necessary for tasks that cannot rely on temporal information alone.
Investigating Holistic Measures of Speech Prosody
ERIC Educational Resources Information Center
Cunningham, Dana Aliel
2012-01-01
Speech prosody is a multi-faceted dimension of speech which can be measured and analyzed in a variety of ways. In this study, the speech prosody of Mandarin L1 speakers, English L2 speakers, and English L1 speakers was assessed by trained raters who listened to sound clips of the speakers responding to a graph prompt and reading a short passage.…
Brazelton Neonatal Assessment for School Psychologists.
ERIC Educational Resources Information Center
Stoudt, Calvin L.
This speech addresses the "What,""Why," and "How" of Brazelton Neonatal Assessment Training for school psychologists. "What" concerns the Brazelton Neonatal Behavioral Assessment Scale, its administration, and what it assesses. Based on the best performance, the infant's score on this scale is scored in the…
How musical expertise shapes speech perception: evidence from auditory classification images.
Varnet, Léo; Wang, Tianyun; Peter, Chloe; Meunier, Fanny; Hoen, Michel
2015-09-24
It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians' higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions used by participants when performing phonemic categorizations in noise. Here we used this technique on 19 non-musicians and 19 professional musicians. We found that both groups used very similar listening strategies, but the musicians relied more heavily on the two main acoustic cues: the onset of the first formant and the onsets of the second and third formants. Additionally, they responded more consistently to stimuli. These observations provide a direct visualization of auditory plasticity resulting from extensive musical training and shed light on the level of functional transfer between auditory processing and speech perception.
Analyzing Distributional Learning of Phonemic Categories in Unsupervised Deep Neural Networks
Räsänen, Okko; Nagamine, Tasha; Mesgarani, Nima
2017-01-01
Infants’ speech perception adapts to the phonemic categories of their native language, a process assumed to be driven by the distributional properties of speech. This study investigates whether deep neural networks (DNNs), the current state-of-the-art in distributional feature learning, are capable of learning phoneme-like representations of speech in an unsupervised manner. We trained DNNs with unlabeled and labeled speech and analyzed the activations of each layer with respect to the phones in the input segments. The analyses reveal that the emergence of phonemic invariance in DNNs is dependent on the availability of phonemic labeling of the input during the training. No increased phonemic selectivity of the hidden layers was observed in the purely unsupervised networks despite successful learning of low-dimensional representations for speech. This suggests that additional learning constraints or more sophisticated models are needed to account for the emergence of phone-like categories in distributional learning operating on natural speech. PMID:29359204
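The layer-wise analysis described here can be approximated with a linear probe: given hidden-layer activations extracted from a trained network, the phonemic selectivity of each layer is estimated as the cross-validated accuracy of a linear classifier predicting the phone label of each input frame. This sketch assumes the activations are already available as arrays; it is an approximation of the analysis, not the authors' code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def layer_phone_selectivity(layer_activations, phone_labels):
    """layer_activations: {layer_name: (n_frames, n_units) array};
    phone_labels: (n_frames,) array of phone identities.
    Returns cross-validated linear-probe accuracy per layer."""
    scores = {}
    for name, acts in layer_activations.items():
        probe = LogisticRegression(max_iter=1000)
        scores[name] = cross_val_score(probe, acts, phone_labels, cv=5).mean()
    # Rising accuracy across layers would suggest emerging phonemic invariance.
    return scores
```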
Tremblay, Marie-Claude; Sabourin, Laura
2012-11-01
The aim of the experiment was to determine whether language learning experience contributes to the development of enhanced speech perception abilities. Monolinguals, bilinguals and multilinguals were compared in their ability to discriminate a non-native contrast behaviorally using an AX task. The experiment was based on a "pre-test-training-post-test" design and performance was tested before and after receiving training on the voiceless aspirated dental/retroflex stop contrast. At post-test, participants were also tested on their ability to transfer training to a similar contrast (i.e., voiceless unaspirated dental/retroflex stop contrast). While no group differences were found at pre-test, analyses of the trained-on contrast at post-test revealed that multilinguals were more accurate than monolinguals and that both the multilingual and bilingual groups were more accurate than a control group that received no training. The results of the experiment not only suggest that multilinguals and bilinguals have enhanced speech perception abilities compared to monolinguals, but they also indicate that bi-/multilingualism helps develop superior learning abilities. This provides support for the idea that learning more than one language has positive effects on the cognitive development of an individual (e.g., Bialystok et al., 2004).
Musical Training during Early Childhood Enhances the Neural Encoding of Speech in Noise
ERIC Educational Resources Information Center
Strait, Dana L.; Parbery-Clark, Alexandra; Hittner, Emily; Kraus, Nina
2012-01-01
For children, learning often occurs in the presence of background noise. As such, there is growing desire to improve a child's access to a target signal in noise. Given adult musicians' perceptual and neural speech-in-noise enhancements, we asked whether similar effects are present in musically-trained children. We assessed the perception and…
ERIC Educational Resources Information Center
Eissa, Mourad Ali
2013-01-01
Phonological awareness is the ability to manipulate the individual speech sounds that make up connected speech. Little information is reported on the acquisition of phonological awareness in special populations. The purpose of this study was to explore the effectiveness of a phonological awareness training intervention on pre-reading skills of…
ERIC Educational Resources Information Center
Casey, Amanda Faith; Emes, Claudia
2011-01-01
Reduced respiratory muscle strength in individuals with Down syndrome (DS) may affect speech respiratory variables such as maximum phonation duration (MPD), initiation volume, and expired mean airflow. Researchers randomly assigned adolescents with DS (N = 28) to either 12 weeks of swim training (DS-ST) or a control group (DS-NT). Repeated…
ERIC Educational Resources Information Center
Carranza, Mario
2016-01-01
This paper addresses the process of transcribing and annotating spontaneous non-native speech with the aim of compiling a training corpus for the development of Computer Assisted Pronunciation Training (CAPT) applications, enhanced with Automatic Speech Recognition (ASR) technology. To better adapt ASR technology to CAPT tools, the recognition…
Actor Vocal Training for the Habilitation of Speech in Adolescent Users of Cochlear Implants
ERIC Educational Resources Information Center
Holt, Colleen M.; Dowell, Richard C.
2011-01-01
This study examined changes to speech production in adolescents with hearing impairment following a period of actor vocal training. In addition to vocal parameters, the study also investigated changes to psychosocial factors such as confidence, self-esteem, and anxiety. The group were adolescent users of cochlear implants (mean age at commencement…
Speech-language pathology students' self-reports on voice training: easier to understand or to do?
Lindhe, Christina; Hartelius, Lena
2009-01-01
The aim of the study was to describe the subjective ratings of the course 'Training of the student's own voice and speech', from a student-centred perspective. A questionnaire was completed after each of the six individual sessions. Six speech and language pathology (SLP) students rated how they perceived the practical exercises in terms of doing and understanding. The results showed that five of the six participants rated the exercises as significantly easier to understand than to do. The exercises were also rated as easier to do over time. Results are interpreted within a theoretical framework of approaches to learning. The findings support the importance of both the physical and reflective aspects of the voice training process.
Development of the Russian matrix sentence test.
Warzybok, Anna; Zokoll, Melanie; Wardenga, Nina; Ozimek, Edward; Boboshko, Maria; Kollmeier, Birger
2015-01-01
To develop the Russian matrix sentence test for speech intelligibility measurements in noise. Test development included recordings, optimization of the speech material, and evaluation to investigate the equivalency of the test lists and training effects. For each of the 500 test items, the speech intelligibility function, speech reception threshold (SRT: the signal-to-noise ratio, SNR, that provides 50% speech intelligibility), and slope were obtained. The speech material was homogenized by applying level corrections. In evaluation measurements, speech intelligibility was measured at two fixed SNRs to compare list-specific intelligibility functions. To investigate the training effect and establish reference data, speech intelligibility was measured adaptively. Overall, 77 normal-hearing native Russian listeners participated. The optimization procedure decreased the spread in SRTs across words from 2.8 to 0.6 dB. Evaluation measurements confirmed that the 16 test lists were equivalent, with a mean SRT of -9.5 ± 0.2 dB and a slope of 13.8 ± 1.6%/dB. The reference SRT, -8.8 ± 0.8 dB for the open-set and -9.4 ± 0.8 dB for the closed-set format, increased slightly for noise levels above 75 dB SPL. The Russian matrix sentence test is suitable for accurate and reliable speech intelligibility measurements in noise.
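The SRT and slope quantities used throughout this test can be recovered by fitting a logistic psychometric function to measured (SNR, proportion-correct) points and reading off the SNR at 50% intelligibility. The sketch below uses invented data; only the functional form and the definitions of SRT and slope follow the abstract.

```python
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr, srt, slope):
    # Logistic intelligibility function; `slope` is the steepness at the
    # 50% point in proportion correct per dB, and `srt` is the SNR at 50%.
    return 1.0 / (1.0 + np.exp(-4.0 * slope * (snr - srt)))

snr = np.array([-14.0, -11.5, -9.5, -7.5, -5.0])       # test SNRs in dB (invented)
p_correct = np.array([0.08, 0.21, 0.52, 0.81, 0.97])   # hypothetical scores
(srt, slope), _ = curve_fit(psychometric, snr, p_correct, p0=(-9.0, 0.14))
print(f"SRT = {srt:.1f} dB SNR, slope = {100 * slope:.1f} %/dB")
```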
Ferguson, Melanie A; Henshaw, Helen; Clark, Daniel P A; Moore, David R
2014-01-01
The aims of this study were to (i) evaluate the efficacy of phoneme discrimination training for the hearing and cognitive abilities of adults aged 50 to 74 years with mild sensorineural hearing loss who were not users of hearing aids, and (ii) determine participant compliance with a self-administered, computer-delivered, home- and game-based auditory training program. This study was a randomized controlled trial with repeated measures and a crossover design. Participants were trained and tested over an 8- to 12-week period. One group (Immediate Training) trained between weeks 1 and 4. A second, waitlisted group (Delayed Training) did no training between weeks 1 and 4 but then trained between weeks 5 and 8. On-task (phoneme discrimination) and transferable outcome measures (speech perception, cognition, self-report of hearing disability) for both groups were obtained during weeks 0, 4, and 8, and for the Delayed Training group only at week 12. Robust phoneme discrimination learning was found for both groups, with the largest improvements in threshold shown for those with the poorest initial thresholds. Between weeks 1 and 4, the Immediate Training group showed moderate, significant improvements in self-report of hearing disability, divided attention, and working memory, specifically for conditions or situations that were more complex and therefore more challenging. Training did not result in consistent improvements in speech perception in noise. There was no evidence of any test-retest effects between weeks 1 and 4 for the Delayed Training group. Retention of benefit at 4 weeks post-training was shown for phoneme discrimination, divided attention, working memory, and self-report of hearing disability. Improved divided attention and reduced self-reported hearing difficulties were highly correlated. It was observed that phoneme discrimination training benefits some but not all people with mild hearing loss. Evidence presented here, together with that of other studies that used different training stimuli, suggests that auditory training may facilitate cognitive skills that index executive function and the self-perception of hearing difficulty in challenging situations. The development of cognitive skills may be more important than the development of sensory skills for improving communication and speech perception in everyday life. However, improvements were modest. Outcome measures need to be appropriately challenging to be sensitive to the effects of the relatively small amount of training performed.
Yoon, Sung Hoon; Nam, Kyoung Won; Yook, Sunhyun; Cho, Baek Hwan; Jang, Dong Pyo; Hong, Sung Hwa; Kim, In Young
2017-03-01
In an effort to improve hearing aid users' satisfaction, recent studies on trainable hearing aids have attempted to incorporate one or two environmental factors into training. However, it would be more beneficial to train the device based on the owner's personal preferences across a wider range of environmental acoustic conditions. Our study aimed to develop a trainable hearing aid algorithm that can reflect the user's individual preferences across extensive environmental acoustic conditions (ambient sound level, listening situation, and degree of noise suppression), and we evaluated the perceptual benefit of the proposed algorithm. Ten normal-hearing subjects participated in this study. Each subject trained the algorithm to their personal preference, and the trained data were used to record test sounds in three different settings, which were then used to evaluate the perceptual benefit of the proposed algorithm with the Comparison Mean Opinion Score test. Statistical analysis revealed that, of the 10 subjects, four showed significant differences in amplification-constant settings between the noise-only and speech-in-noise situations (P<0.05), and one subject also showed a significant difference between the speech-only and speech-in-noise situations (P<0.05). Additionally, every subject preferred different β settings for beamforming across the different input sound levels. These positive findings suggest that the proposed algorithm has the potential to improve hearing aid users' personal satisfaction under various ambient situations.
Karipidis, Iliana I.; Pleisch, Georgette; Röthlisberger, Martina; Hofstetter, Christoph; Dornbierer, Dario; Stämpfli, Philipp; Brem, Silvia
2017-02-01
Learning letter-speech sound correspondences is a major step in reading acquisition and is severely impaired in children with dyslexia. Up to now, it remains largely unknown how quickly neural networks adopt specific functions during audiovisual integration of linguistic information when prereading children learn letter-speech sound correspondences. Here, we simulated the process of learning letter-speech sound correspondences in 20 prereading children (6.13-7.17 years) at varying risk for dyslexia by training artificial letter-speech sound correspondences within a single experimental session. Subsequently, we simultaneously acquired event-related potentials (ERPs) and functional magnetic resonance imaging (fMRI) scans during implicit audiovisual presentation of trained and untrained pairs. Audiovisual integration of trained pairs correlated with individual learning rates in right superior temporal, left inferior temporal, and bilateral parietal areas, and with phonological awareness in left temporal areas. Correspondingly, a differential left-lateralized parieto-occipito-temporal ERP at 400 ms for trained pairs correlated with learning achievement and familial risk. Finally, a late (650 ms) posterior negativity indicating audiovisual congruency of trained pairs was associated with increased fMRI activation in the left occipital cortex. Taken together, a short (<30 min) letter-speech sound training initializes audiovisual integration in neural systems that are responsible for processing linguistic information in proficient readers. To conclude, the ability to learn grapheme-phoneme correspondences, the familial history of reading disability, and the phonological awareness of prereading children account for the degree of audiovisual integration in a distributed brain network. Such findings on emerging linguistic audiovisual integration could allow for distinguishing between children with typical and atypical reading development. Hum Brain Mapp 38:1038-1055, 2017. © 2016 Wiley Periodicals, Inc.
Schmidt-Naylor, Anna C.; Brady, Nancy C.
2017-01-01
Purpose: We explored alphabet supplementation as an augmentative and alternative communication strategy for adults with minimal literacy. Study 1's goal was to teach onset-letter selection with spoken words and assess generalization to untaught words, demonstrating the alphabetic principle. Study 2 incorporated alphabet supplementation within a naming task and then assessed effects on speech intelligibility. Method: Three men with intellectual disabilities (ID) and low speech intelligibility participated. Study 1 used a multiple-probe design, across three 20-word sets, to show that our computer-based training improved onset-letter selection. We also probed generalization to untrained words. Study 2 taught onset-letter selection for 30 new words chosen for functionality. Five listeners transcribed speech samples of the 30 words in 2 conditions: speech only and speech with alphabet supplementation. Results: Across studies 1 and 2, participants demonstrated onset-letter selection for at least 90 words. Study 1 showed evidence of the alphabetic principle for some but not all word sets. In study 2, participants readily used alphabet supplementation, enabling listeners to understand twice as many words. Conclusions: This is the first demonstration of alphabet supplementation in individuals with ID and minimal literacy. The large number of words learned holds promise both for improving communication and providing a foundation for improved literacy. PMID:28474087
Wang, Ling; Liu, Shao-ming; Liu, Min; Li, Bao-jun; Hui, Zhen-liang; Gao, Xiang
2011-06-01
To assess the clinical efficacy of acupuncture and psychological intervention combined with rehabilitation training for post-stroke speech disorder. A multi-center randomized controlled design was adopted. One hundred and twenty cases of brain stroke were divided into a speech rehabilitation group (control group), a speech rehabilitation plus acupuncture group (observation group 1), and a speech rehabilitation plus acupuncture combined with psychotherapy group (observation group 2), with 40 cases in each group. The rehabilitation training was conducted by a professional speech trainer. In the acupuncture treatment, the speech function area in scalp acupuncture, Jinjin (EX-HN 12) and Yuye (EX-HN 13) in tongue acupuncture, and Lianquan (CV 23) were the basic points. Supplementary points were selected according to syndrome differentiation. A bloodletting method was used in combination with acupuncture. Psychotherapy was applied by a physician in the psychiatric department of the hospital. The corresponding program was used in each group. The Examination of Aphasia of Chinese of Beijing Hospital was adopted to observe oral speech expression, listening comprehension, and reading and writing ability. After 21 days of treatment, the total effective rate was 92.5% (37/40) in observation group 1, 97.5% (39/40) in observation group 2, and 87.5% (35/40) in the control group; the efficacies were similar across the 3 groups. The remarkably effective rate was 15.0% (6/40) in observation group 1, 50.0% (20/40) in observation group 2, and 2.5% (1/40) in the control group; the result in observation group 2 was superior to the other two groups (P<0.01, P<0.001). In comparison of the improvements in oral expression, listening comprehension, and reading and writing ability, all 3 groups achieved improvements to different extents after treatment (P<0.01, P<0.001), and the results in observation group 2 were better than those in observation group 1 and the control group. Acupuncture and psychological intervention combined with rehabilitation training is clearly advantageous in the treatment of post-stroke speech disorder.
Music training and speech perception: a gene-environment interaction.
Schellenberg, E Glenn
2015-03-01
Claims of beneficial side effects of music training are made for many different abilities, including verbal and visuospatial abilities, executive functions, working memory, IQ, and speech perception in particular. Such claims assume that music training causes the associations even though children who take music lessons are likely to differ from other children in music aptitude, which is associated with many aspects of speech perception. Music training in childhood is also associated with cognitive, personality, and demographic variables, and it is well established that IQ and personality are determined largely by genetics. Recent evidence also indicates that the role of genetics in music aptitude and music achievement is much larger than previously thought. In short, music training is an ideal model for the study of gene-environment interactions but far less appropriate as a model for the study of plasticity. Children seek out environments, including those with music lessons, that are consistent with their predispositions; such environments exaggerate preexisting individual differences. © 2015 New York Academy of Sciences.
Perceptual context effects of speech and nonspeech sounds: The role of auditory categories
Aravamudhan, Radhika; Lotto, Andrew J.; Hawks, John W.
2008-01-01
Williams [(1986). “Role of dynamic information in the perception of coarticulated vowels,” Ph.D. thesis, University of Connecticut, Standford, CT] demonstrated that nonspeech contexts had no influence on pitch judgments of nonspeech targets, whereas context effects were obtained when instructed to perceive the sounds as speech. On the other hand, Holt et al. [(2000). “Neighboring spectral content influences vowel identification,” J. Acoust. Soc. Am. 108, 710–722] showed that nonspeech contexts were sufficient to elicit context effects in speech targets. The current study was to test a hypothesis that could explain the varying effectiveness of nonspeech contexts: Context effects are obtained only when there are well-established perceptual categories for the target stimuli. Experiment 1 examined context effects in speech and nonspeech signals using four series of stimuli: steady-state vowels that perceptually spanned from ∕ʊ∕-∕ɪ∕ in isolation and in the context of ∕w∕ (with no steady-state portion) and two nonspeech sine-wave series that mimicked the acoustics of the speech series. In agreement with previous work context effects were obtained for speech contexts and targets but not for nonspeech analogs. Experiment 2 tested predictions of the hypothesis by testing for nonspeech context effects after the listeners had been trained to categorize the sounds. Following training, context-dependent categorization was obtained for nonspeech stimuli in the training group. These results are presented within a general perceptual-cognitive framework for speech perception research. PMID:19045660
Siupsinskiene, Nora; Lycke, Hugo
2011-07-01
This prospective cross-sectional study examines the effects of voice training on vocal capabilities in vocally healthy, age- and gender-differentiated groups, measured by the voice range profile (VRP) and speech range profile (SRP). Frequency and intensity measurements of the VRP and SRP using standard singing and speaking voice protocols were derived from 161 trained choir singers (21 males, 59 females, and 81 prepubescent children) and from 188 nonsingers (38 males, 89 females, and 61 children). When compared with nonsingers, both genders of trained adult and child singers exhibited increased mean pitch range, highest frequency, and VRP area in the high frequencies (P<0.05). Female singers and child singers also showed significantly increased mean maximum voice intensity, intensity range, and total VRP area. Logistic regression analysis showed that the VRP pitch range, highest frequency, maximum voice intensity, and maximum-minimum intensity range, together with the SRP slope of the speaking curve, were the key predictors of voice training. Age-, gender-, and voice-training-differentiated norms for VRP and SRP parameters are presented. A significant positive effect of voice training on vocal capabilities, mostly for the singing voice, was confirmed. The presented norms for trained singers, with key parameters differentiated by gender and age, are suggested for the clinical practice of otolaryngologists and speech-language pathologists. Copyright © 2011 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Clinical Linguistics--Retrospect and Prospect.
ERIC Educational Resources Information Center
Grunwell, Pamela
In the past 20 years, linguistics has gained a prominent position in speech and language pathology in Britain, evolving into a new field, clinical linguistics. It includes three related areas of activity: training of speech pathologists/therapists; professional practice; and research. Linguistics and speech/language pathology have developed as…
Operant Procedures in Remedial Speech and Language Training.
ERIC Educational Resources Information Center
MacAulay, Barbara D., Ed.; Sloane, Howard N., Jr., Ed.
Intended for speech therapists, teachers of the mentally retarded, and others in special education, the collection contains reports by various authors on speech and language modification attempts that have utilized operant conditioning procedures, as well as several papers on background topics. Background papers on teaching treat environmental…
Varying irrelevant phonetic features hinders learning of the feature being trained.
Antoniou, Mark; Wong, Patrick C M
2016-01-01
Learning to distinguish nonnative words that differ in a critical phonetic feature can be difficult. Speech training studies typically employ methods that explicitly direct the learner's attention to the relevant nonnative feature to be learned. However, studies on vision have demonstrated that perceptual learning may occur implicitly, by exposing learners to stimulus features, even if they are irrelevant to the task, and it has recently been suggested that this task-irrelevant perceptual learning framework also applies to speech. In this study, subjects took part in a seven-day training regimen to learn to distinguish one of two nonnative features, namely, voice onset time or lexical tone, using explicit training methods consistent with most speech training studies. Critically, half of the subjects were exposed to stimuli that varied not only in the relevant feature, but in the irrelevant feature as well. The results showed that subjects who were trained with stimuli that varied in the relevant feature and held the irrelevant feature constant achieved the best learning outcomes. Varying both features hindered learning and generalization to new stimuli.
A lab-controlled simulation of a letter-speech sound binding deficit in dyslexia.
Aravena, Sebastián; Snellings, Patrick; Tijms, Jurgen; van der Molen, Maurits W
2013-08-01
Dyslexic and non-dyslexic readers engaged in a short training aimed at learning eight basic letter-speech sound correspondences within an artificial orthography. We examined whether a letter-speech sound binding deficit is behaviorally detectable within the initial steps of learning a novel script. Both letter knowledge and word reading ability within the artificial script were assessed. An additional goal was to investigate the influence of instructional approach on the initial learning of letter-speech sound correspondences. We assigned children from both groups to one of three different training conditions: (a) explicit instruction, (b) implicit associative learning within a computer game environment, or (c) a combination of (a) and (b) in which explicit instruction is followed by implicit learning. Our results indicated that dyslexics were outperformed by the controls on a time-pressured binding task and a word reading task within the artificial orthography, providing empirical support for the view that a letter-speech sound binding deficit is a key factor in dyslexia. A combination of explicit instruction and implicit techniques proved to be a more powerful tool in the initial teaching of letter-sound correspondences than implicit training alone. Copyright © 2013 Elsevier Inc. All rights reserved.
[Modeling developmental aspects of sensorimotor control of speech production].
Kröger, B J; Birkholz, P; Neuschaefer-Rube, C
2007-05-01
Detailed knowledge of the neurophysiology of speech acquisition is important for understanding the developmental aspects of speech perception and production and for understanding developmental disorders of speech perception and production. A computer-implemented neural model of sensorimotor control of speech production was developed. The model is capable of demonstrating the neural functions of different cortical areas during speech production in detail. (i) Two sensory and two motor maps or neural representations and the appertaining neural mappings or projections establish the sensorimotor feedback control system. These maps and mappings are already formed and trained during the prelinguistic phase of speech acquisition. (ii) The feedforward sensorimotor control system comprises the lexical map (representations of sounds, syllables, and words of the first language) and the mappings from lexical to sensory and to motor maps. The training of the appertaining mappings forms the linguistic phase of speech acquisition. (iii) Three prelinguistic learning phases--i.e., silent mouthing, quasi-stationary vocalic articulation, and realisation of articulatory protogestures--can be defined on the basis of our simulation studies using the computational neural model. These learning phases can be associated with temporal phases of prelinguistic speech acquisition obtained from natural data. The neural model illuminates the detailed function of specific cortical areas during speech production. In particular, it can be shown that developmental disorders of speech production may result from a delayed or incorrect process within one of the prelinguistic learning phases defined by the neural model.
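As a rough illustration of the prelinguistic (babbling) idea only — this is not the authors' model — a sketch of learning an inverse auditory-to-motor mapping from random articulations, with a made-up forward function standing in for the vocal tract:

```python
# Minimal sketch of babbling-phase inverse-model learning. The
# simulated "vocal tract" below is an arbitrary stand-in.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))          # arbitrary fixed forward mapping

def vocal_tract(motor):              # hypothetical forward model:
    return np.tanh(motor @ W)        # motor parameters -> auditory features

motor_babble = rng.uniform(-1, 1, size=(5000, 3))  # random articulations
auditory = vocal_tract(motor_babble)               # heard consequences

# Learn the inverse mapping: which motor pattern yields a target sound?
inverse = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000)
inverse.fit(auditory, motor_babble)
```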
Krzok, Franziska; Rieger, Verena; Niemann, Katharina; Nobis-Bosch, Ruth; Radermacher, Irmgard; Huber, Walter; Willmes, Klaus; Abel, Stefanie
2018-03-01
SAPS--'Sprachsystematisches Aphasiescreening'--is a novel language-systematic aphasia screening developed for the German language, which had already been evaluated positively. It offers a fast assessment of modality-specific psycholinguistic components at different levels of complexity and the derivation of impairment-based treatment foci from the individual performance profile. However, SAPS has not yet been evaluated in combination with the new SAPS-based treatment. To replicate the practicality of SAPS and to investigate the effectiveness of a SAPS-based face-to-face therapy combined with computerised home training in a feasibility study. To examine the soundness of the treatment design, to determine treatment-induced changes in patient performance as measured by SAPS, to assess parallel changes in communicative abilities, and to differentiate therapy effects achieved by face-to-face therapy versus add-on effects achieved by later home training. Sixteen participants with post-stroke aphasia (PWAs) were included in the study. They were administered the SAPS and communicative testing before and after the treatment regimen. Each PWA received one therapy session followed by home training per day, with the individual treatment foci determined according to the initial SAPS profile, and the duration of treatment and possible change of focus dependent on performance assessed by continuous therapy monitoring. The combination of therapy and home training based on the SAPS was effective for all participants. We showed significant improvements in impairment-based SAPS performance and, with high inter-individual variability, in everyday communication. These two main targets of speech and language therapy were correlated, and SAPS improvements after therapy were significantly higher than after home training. SAPS offers the assessment of an individual performance profile in order to derive sufficiently diversified, well-founded and specific treatment foci and to follow up changes in performance. The corresponding treatment regimen was shown to be effective for our participants. Thus, the study demonstrated the feasibility of our approach. © 2017 Royal College of Speech and Language Therapists.
Speaker verification using committee neural networks.
Reddy, Narender P; Buch, Ojas A
2003-10-01
Security is a major problem in web-based or remote access to databases. In the present study, the technique of committee neural networks was developed for speech-based speaker verification. Speech data from the designated speaker and several imposters were obtained. Several parameters were extracted in the time and frequency domains and fed to neural networks. Several neural networks were trained, and the five best-performing networks were recruited into the committee. The committee decision was based on majority voting of the member networks. The committee opinion was evaluated with further testing data. The committee correctly identified the designated speaker in 100% (50 out of 50) of the cases and rejected imposters in 100% (150 out of 150) of the cases. The committee decision was not unanimous in the majority of the cases tested.
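A minimal sketch of the committee idea (illustrative; the network sizes, candidate pool, and validation split below are assumptions, not the paper's settings):

```python
# Sketch: train several MLPs, keep the best few on held-out data, and
# verify a speaker by majority vote of the committee members.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_committee(X, y, n_candidates=9, keep=5, seed=0):
    """Train several MLPs; keep the `keep` best on a held-out split."""
    rng = np.random.default_rng(seed)
    split = int(0.8 * len(X))
    nets = [MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                          random_state=int(rng.integers(1_000_000)))
            .fit(X[:split], y[:split]) for _ in range(n_candidates)]
    nets.sort(key=lambda n: n.score(X[split:], y[split:]), reverse=True)
    return nets[:keep]

def committee_verify(nets, x):
    votes = [int(net.predict(x.reshape(1, -1))[0]) for net in nets]
    return sum(votes) > len(nets) / 2       # majority vote: accept?

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 12))              # stand-in voice features
y = rng.integers(0, 2, size=200)            # 1 = designated speaker
committee = train_committee(X, y)
print(committee_verify(committee, X[0]))
```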
Building Searchable Collections of Enterprise Speech Data.
ERIC Educational Resources Information Center
Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret
The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…
The Learning of Complex Speech Act Behaviour.
ERIC Educational Resources Information Center
Olshtain, Elite; Cohen, Andrew
1990-01-01
Pre- and posttraining measurement of adult English-as-a-Second-Language learners' (N=18) apology speech act behavior found no clear-cut quantitative improvement after training, although there was an obvious qualitative approximation of native-like speech act behavior in terms of types of intensification and downgrading, choice of strategy, and…
Effectiveness of Speech Therapy in Adults with Intellectual Disabilities
ERIC Educational Resources Information Center
Terband, Hayo; Coppens-Hofman, Marjolein C.; Reffeltrath, Maaike; Maassen, Ben A. M.
2018-01-01
Background: This study investigated the effect of speech therapy in a heterogeneous group of adults with intellectual disability. Method: Thirty-six adults with mild and moderate intellectual disabilities (IQs 40-70; age 18-40 years) with reported poor speech intelligibility received tailored training in articulation and listening skills delivered…
Techniques for decoding speech phonemes and sounds: A concept
NASA Technical Reports Server (NTRS)
Lokerson, D. C.; Holby, H. G.
1975-01-01
Techniques studied involve conversion of speech sounds into machine-compatible pulse trains. (1) Voltage-level quantizer produces number of output pulses proportional to amplitude characteristics of vowel-type phoneme waveforms. (2) Pulses produced by quantizer of first speech formants are compared with pulses produced by second formants.
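As a rough software analogue of technique (1) — purely hypothetical; the original is a hardware concept — a quantizer emitting a pulse count proportional to each frame's amplitude:

```python
# Hypothetical sketch of a voltage-level quantizer: the number of
# output pulses per frame is proportional to the frame's peak amplitude.
import numpy as np

def pulse_train(signal, frame_len=160, levels=8):
    peaks = [np.max(np.abs(signal[i:i + frame_len]))
             for i in range(0, len(signal), frame_len)]
    scale = max(peaks) or 1.0
    return [int(levels * p / scale) for p in peaks]  # pulses per frame

print(pulse_train(np.sin(np.linspace(0, 40, 1600))))
```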
Perceptual Adaptation to Sinewave-Vocoded Speech across Languages
ERIC Educational Resources Information Center
Bent, Tessa; Loebach, Jeremy L.; Phillips, Lawrence; Pisoni, David B.
2011-01-01
Listeners rapidly adapt to many forms of degraded speech. What level of information drives this adaptation, however, remains unresolved. The current study exposed listeners to sinewave-vocoded speech in one of three languages, which manipulated the type of information shared between the training languages (German, Mandarin, or English) and the…
A Review of Training Opportunities for Singing Voice Rehabilitation Specialists.
Gerhard, Julia
2016-05-01
Training opportunities for singing voice rehabilitation specialists are growing and changing. This is happening despite a lack of agreed-on guidelines or an accredited certification acknowledged by the governing bodies in the fields of speech-language pathology and vocal pedagogy, the American Speech-Language Hearing Association and the National Association of Teachers of Singing, respectively. The roles of the speech-language pathologist, the singing teacher, and the person who bridges this gap, the singing voice rehabilitation specialist, are now becoming better defined and more common among the voice care community. To that end, this article aims to review the current opportunities for training in the field of singing voice rehabilitation. A review of available university training programs, private training programs and mentorships, clinical fellowships, professional organizations, conferences, vocal training across genres, and self-study opportunities was conducted. All institutional listings are with permission from program leaders. Although many avenues are available for training of singing voice rehabilitation specialists, there is no accredited comprehensive training program at this point. This review gathers information on current training opportunities from across various modalities. The listings are not intended to be comprehensive but rather representative of possibilities for interested practitioners. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Tactual Hearing Experiment with Deaf and Hearing Subjects. Research Bulletin Vol. 14, No. 5.
ERIC Educational Resources Information Center
Engelmann, Siegfried; Rosov, Robert J.
Four hearing Ss (20 to 30 years old) and 4 deaf Ss (8 to 14 years old) trained in speech discrimination using a vocoder (a device which converts speech into tactual vibrations received through the skin). Hearing Ss (artificially deafened by white noise transmitted through headphones) received from 20 to 80 hours of training in isolated words…
ERIC Educational Resources Information Center
San Jose State Univ., CA.
This final report discusses the activities and outcomes of a project designed to train specialists to work collaboratively across settings to improve the outcomes of young children with language and learning disabilities. It provided education for trainees that led to a Masters degree in speech-language pathology with a specialty in early…
Pitch and Time Processing in Speech and Tones: The Effects of Musical Training and Attention
ERIC Educational Resources Information Center
Sares, Anastasia G.; Foster, Nicholas E. V.; Allen, Kachina; Hyde, Krista L.
2018-01-01
Purpose: Musical training is often linked to enhanced auditory discrimination, but the relative roles of pitch and time in music and speech are unclear. Moreover, it is unclear whether pitch and time processing are correlated across individuals and how they may be affected by attention. This study aimed to examine pitch and time processing in…
Multilingual Data Selection for Low Resource Speech Recognition
2016-09-12
(Body garbled in extraction; only fragments survive, including figure captions.) Recoverable content: language clusters are identified using scores from a language identification (LID) network trained on the languages used in the Base and OP1 evaluation periods of the Babel program; posterior scores over frames are combined at test time to produce a 10-dimensional language representation. Surviving figure captions: "Figure 1: Identification of language clusters using scores from an LID system"; "Figure 3: Identification of language clusters using scores from individually…"
Post interaural neural net-based vowel recognition
NASA Astrophysics Data System (ADS)
Jouny, Ismail I.
2001-10-01
Interaural head-related transfer functions are used to process speech signatures prior to neural-net-based recognition. Data representing the head-related transfer function of a dummy head were collected at MIT and made available on the Internet. These data are used to preprocess vowel signatures to mimic the effects of the human ear on speech perception. Signatures representing various vowels of the English language are then presented to a multi-layer perceptron trained with the backpropagation algorithm for recognition purposes. The focus of this paper is to assess the effects of the human interaural system on vowel recognition performance, particularly when using a classification system that mimics the human brain, such as a neural net.
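A minimal, self-contained sketch of that pipeline (not the paper's code; the impulse response and waveforms below are synthetic stand-ins for the MIT dummy-head measurements):

```python
# Sketch: filter vowel audio through a head-related impulse response
# (HRIR), extract coarse log-spectral features, and train an MLP to
# label the vowel. All data below are synthetic stand-ins.
import numpy as np
from scipy.signal import fftconvolve
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
hrir = rng.normal(size=128) * np.exp(-np.arange(128) / 20.0)  # stand-in

def ear_features(waveform, n_bins=64):
    binaural = fftconvolve(waveform, hrir)[:len(waveform)]
    return np.log1p(np.abs(np.fft.rfft(binaural))[:n_bins])

waveforms = [rng.normal(size=4000) for _ in range(50)]  # stand-in vowels
labels = ["a", "e", "i", "o", "u"] * 10                 # stand-in labels
X = np.stack([ear_features(w) for w in waveforms])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, labels)
```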
Auditory verbal habilitation is associated with improved outcome for children with cochlear implant.
Percy-Smith, Lone; Tønning, Tenna Lindbjerg; Josvassen, Jane Lignel; Mikkelsen, Jeanette Hølledig; Nissen, Lena; Dieleman, Eveline; Hallstrøm, Maria; Cayé-Thomasen, Per
2018-01-01
To study the impact of (re)habilitation strategy on speech-language outcomes for early cochlear-implanted children enrolled in different intervention programmes post implant. Data relate to a total of 130 children representing two pediatric cohorts consisting of 94 and 36 subjects, respectively. The two cohorts had different speech and language intervention following cochlear implantation, i.e. standard habilitation vs. auditory verbal (AV) intervention. Three tests of speech and language were applied, covering the language areas of receptive and productive vocabulary and language understanding. Children in AV intervention outperformed children in standard habilitation on all three tests of speech and language. When the effect of intervention was adjusted for other covariates, children in AV intervention still had higher odds of performing at age-equivalent speech and language levels. Compared to standard intervention, AV intervention is associated with improved outcome for children with CI. Based on this finding, we recommend that all children with hearing impairment be offered this intervention; it is therefore highly relevant that national boards of health and social affairs recommend basing habilitation on principles from AV practice. It should be noted that a minority of children use spoken language with sign support. For this group it is, however, still important that educational services provide auditory skills training.
The Motor Theory of Speech Perception Revised
Liberman, A. M.; Mattingly, I. G.
1985-10-01
(Body garbled in extraction; only fragments survive.) Recoverable content: the paper concerns speech perception specifically, in contrast to accounts dealing with other perceptual processes, and discusses whether perception involves a process of learned equivalence. Surviving reference fragment: Anderson, V. A. (1942). Training the Speaking Voice. New York: Oxford University Press.
Real-time classification of auditory sentences using evoked cortical activity in humans
NASA Astrophysics Data System (ADS)
Moses, David A.; Leonard, Matthew K.; Chang, Edward F.
2018-06-01
Objective. Recent research has characterized the anatomical and functional basis of speech perception in the human auditory cortex. These advances have made it possible to decode speech information from activity in brain regions like the superior temporal gyrus, but no published work has demonstrated this ability in real time, which is necessary for neuroprosthetic brain-computer interfaces. Approach. Here, we introduce a real-time neural speech recognition (rtNSR) software package, which was used to classify spoken input from high-resolution electrocorticography signals in real time. We tested the system with two human subjects implanted with electrode arrays over the lateral brain surface. Subjects listened to multiple repetitions of ten sentences, and rtNSR classified what was heard in real time from neural activity patterns, using direct sentence-level and HMM-based phoneme-level classification schemes. Main results. We observed single-trial sentence classification accuracies of 90% or higher for each subject with less than 7 minutes of training data, demonstrating the ability of rtNSR to use cortical recordings to perform accurate real-time speech decoding in a limited vocabulary setting. Significance. Further development and testing of the package with different speech paradigms could influence the design of future speech neuroprosthetic applications.
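For intuition, a toy offline sketch of the sentence-level classification task (illustrative only; this is not the rtNSR package, and the features below are random stand-ins for high-gamma activity):

```python
# Sketch: classify which of ten known sentences was heard from
# trial-wise cortical features, using synthetic stand-in data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_electrodes, n_frames = 200, 64, 50
# Stand-in for high-gamma activity: trials x electrodes x time frames.
ecog = rng.normal(size=(n_trials, n_electrodes, n_frames))
sentence_id = rng.integers(0, 10, size=n_trials)  # 10 known sentences

X = ecog.reshape(n_trials, -1)       # flatten the spatiotemporal pattern
clf = LogisticRegression(max_iter=2000)
print(cross_val_score(clf, X, sentence_id, cv=5).mean())
```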
A comparative intelligibility study of single-microphone noise reduction algorithms.
Hu, Yi; Loizou, Philipos C
2007-09-01
An evaluation of the intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise (babble, car, street, and train) at two signal-to-noise ratio levels (0 and 5 dB) and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms that were found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
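For readers unfamiliar with the spectral-subtractive class, a minimal sketch (illustrative; this is a textbook variant, not one of the eight methods the study tested, and the noise-estimation window is an assumption):

```python
# Sketch of basic spectral subtraction: estimate the noise magnitude
# spectrum from a leading noise-only segment and subtract it, with a
# spectral floor to limit musical-noise artifacts.
import numpy as np
from scipy.signal import istft, stft

def spectral_subtraction(noisy, fs, noise_seconds=0.25):
    f, t, S = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(S), np.angle(S)
    n_frames = max(1, int(noise_seconds * fs / 256))   # hop = nperseg/2
    noise_mag = mag[:, :n_frames].mean(axis=1, keepdims=True)
    clean_mag = np.maximum(mag - noise_mag, 0.05 * mag)  # spectral floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return enhanced

fs = 16000
noisy = np.random.default_rng(0).normal(size=fs)   # stand-in 1-s signal
enhanced = spectral_subtraction(noisy, fs)
```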
Improving Acoustic Models by Watching Television
NASA Technical Reports Server (NTRS)
Witbrock, Michael J.; Hauptmann, Alexander G.
1998-01-01
Obtaining sufficient labelled training data is a persistent difficulty for speech recognition research. Although well-transcribed data are expensive to produce, closed-captioned television broadcasts provide a constant stream of challenging speech data with rough transcriptions. We describe a reliable unsupervised method for identifying accurately transcribed sections of these broadcasts, and show how these segments can be used to train a recognition system. Starting from acoustic models trained on the Wall Street Journal database, a single iteration of our training method reduced the word error rate on an independent broadcast television news test set from 62.2% to 59.5%.
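The selection step can be pictured as filtering on caption-hypothesis agreement. A sketch under that assumption (hypothetical interfaces; not the authors' system):

```python
# Sketch: keep only broadcast segments where the recognizer's
# hypothesis agrees closely with the closed caption, then use those
# segments as acoustic training data.
def word_error_rate(ref, hyp):
    """Levenshtein distance over words, normalized by reference length."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        for j in range(len(h) + 1):
            if i == 0 or j == 0:
                d[i][j] = i + j
            else:
                sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / max(1, len(r))

def select_training_segments(segments, max_wer=0.2):
    # segments: (caption_text, recognizer_hypothesis, audio) triples
    return [s for s in segments if word_error_rate(s[0], s[1]) <= max_wer]

print(word_error_rate("the cat sat", "the cat sat down"))  # ~0.33
```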
The influence of speaking rate on nasality in the speech of hearing-impaired individuals.
Dwyer, Claire H; Robb, Michael P; O'Beirne, Greg A; Gilbert, Harvey R
2009-10-01
The purpose of this study was to determine whether deliberate increases in speaking rate would serve to decrease the amount of nasality in the speech of severely hearing-impaired individuals. The participants were 11 severely to profoundly hearing-impaired students, ranging in age from 12 to 19 years (M = 16 years). Each participant provided a baseline speech sample (R1) followed by 3 training sessions during which participants were trained to increase their speaking rate. Following the training sessions, a second speech sample was obtained (R2). Acoustic and perceptual analyses of the speech samples obtained at R1 and R2 were undertaken. The acoustic analysis focused on changes in first (F1) and second (F2) formant frequency and formant bandwidths. The perceptual analysis involved listener ratings of the speech samples (at R1 and R2) for perceived nasality. Findings indicated a significant increase in speaking rate at R2. In addition, significantly narrower F2 bandwidth and lower perceptual rating scores of nasality were obtained at R2 across all participants, suggesting a decrease in nasality as speaking rate increases. The nasality demonstrated by hearing-impaired individuals is amenable to change when speaking rate is increased. The influences of speaking rate changes on the perception and production of nasality in hearing-impaired individuals are discussed.
Mapping the cortical representation of speech sounds in a syllable repetition task.
Markiewicz, Christopher J; Bohland, Jason W
2016-11-01
Speech repetition relies on a series of distributed cortical representations and functional pathways. A speaker must map auditory representations of incoming sounds onto learned speech items, maintain an accurate representation of those items in short-term memory, interface that representation with the motor output system, and fluently articulate the target sequence. A "dorsal stream" consisting of posterior temporal, inferior parietal and premotor regions is thought to mediate auditory-motor representations and transformations, but the nature and activation of these representations for different portions of speech repetition tasks remains unclear. Here we mapped the correlates of phonetic and/or phonological information related to the specific phonemes and syllables that were heard, remembered, and produced using a series of cortical searchlight multi-voxel pattern analyses trained on estimates of BOLD responses from individual trials. Based on responses linked to input events (auditory syllable presentation), predictive vowel-level information was found in the left inferior frontal sulcus, while syllable prediction revealed significant clusters in the left ventral premotor cortex and central sulcus and the left mid superior temporal sulcus. Responses linked to output events (the GO signal cueing overt production) revealed strong clusters of vowel-related information bilaterally in the mid to posterior superior temporal sulcus. For the prediction of onset and coda consonants, input-linked responses yielded distributed clusters in the superior temporal cortices, which were further informative for classifiers trained on output-linked responses. Output-linked responses in the Rolandic cortex made strong predictions for the syllables and consonants produced, but their predictive power was reduced for vowels. The results of this study provide a systematic survey of how cortical response patterns covary with the identity of speech sounds, which will help to constrain and guide theoretical models of speech perception, speech production, and phonological working memory. Copyright © 2016 Elsevier Inc. All rights reserved.
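Conceptually, each searchlight fits a classifier on the response patterns within a small cortical neighborhood. A stripped-down sketch of that loop (illustrative; the authors' pipeline is far more involved, and the data below are synthetic):

```python
# Stripped-down searchlight MVPA: cross-validated decoding accuracy
# within each small neighborhood of vertices, on stand-in data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_trials, n_vertices = 80, 30
betas = rng.normal(size=(n_trials, n_vertices))  # single-trial estimates
labels = rng.integers(0, 2, size=n_trials)       # e.g., vowel identity
# neighbors[v]: vertex indices within the searchlight "radius" of v.
neighbors = [np.arange(max(0, v - 2), min(n_vertices, v + 3))
             for v in range(n_vertices)]

accuracy_map = np.array([
    cross_val_score(LinearSVC(), betas[:, idx], labels, cv=5).mean()
    for idx in neighbors])                       # one accuracy per vertex
```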
Efficacy of continuous positive airway pressure for treatment of hypernasality.
Kuehn, David P; Imrey, Peter B; Tomes, Lucrezia; Jones, David L; O'Gara, Mary M; Seaver, Earl J; Smith, Bonnie E; Van Demark, D R; Wachtel, Jayne M
2002-05-01
To determine whether speech hypernasality in subjects born with cleft palate can be reduced by graded velopharyngeal resistance training against continuous positive airway pressure (CPAP). Pretreatment versus immediate posttreatment comparison study. Eight university and hospital speech clinics. Forty-three subjects born with cleft palate, aged 3 years 10 months to 23 years 8 months, diagnosed with speech hypernasality. Eight weeks of 6-days-per-week in-home speech exercise sessions, increasing from 10 to 24 minutes, speaking against transnasal CPAP increasing from 4 to 8.5 cm H2O. The main outcome measure was the pretreatment to immediate posttherapy change in perceptual nasality score, based on blinded comparisons of subjects' speech samples to standard reference samples by six expert clinician-investigators. Participating clinical centers treated from two to nine eligible subjects, and results differed significantly across centers (interaction p = .004). Overall, there was a statistically significant reduction in mean nasality score after 8 weeks of CPAP therapy, whether weighted equally across patients (mean reduction = 0.20 units on a scale of 1.0 to 7.0, p = .016) or across clinical centers (mean = 0.19, p = .046). This change was about one-sixth the maximum possible reduction from pretreatment. Nine patients showed reductions of at least half the maximum possible, but the hypernasality of eight patients increased at least 30% above the pretreatment level. Most improvement was seen during the second month, when therapy was more intense (p = .045 for nonlinearity). No interactions with age or sex were detected. Patients receiving 8 weeks of velopharyngeal CPAP resistance training showed a net overall reduction in speech hypernasality, although response was quite variable across patients and clinical centers. The net reduction in hypernasality is not readily explainable by random variability, subject maturation, placebo effect, or regression to the mean. CPAP appears capable of substantially reducing speech hypernasality for some subjects with cleft palate.
NASA Technical Reports Server (NTRS)
Olorenshaw, Lex; Trawick, David
1991-01-01
The purpose was to develop a speech recognition system able to detect speech that is pronounced incorrectly, given that the text of the spoken utterance is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of score normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In continuous speech, the system provided above 80% correct acceptance of words while correctly rejecting over 80% of incorrectly pronounced words.
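One common form of score normalization for such accept/reject decisions compares a word's acoustic score against a background or filler model of the same audio; a hypothetical sketch (the scores, normalization, and threshold below are illustrative assumptions, not the report's method):

```python
# Sketch of score-normalized pronunciation verification: accept a word
# when its per-frame normalized acoustic score clears a tuned threshold.
def accept_word(word_log_score, background_log_score, n_frames,
                threshold=-0.5):
    # Per-frame normalized score, akin to a log-likelihood ratio.
    normalized = (word_log_score - background_log_score) / n_frames
    return normalized >= threshold

# A well-pronounced word scores close to the background model:
print(accept_word(-420.0, -400.0, 80))   # -0.25 per frame -> accepted
```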
ERIC Educational Resources Information Center
Natalicio, Diana S.; Williams, Frederick
This paper reports the attempt to see which characteristics of the speech of Black and Mexican American children would be reliably evaluated by experts specializing in dialect study. Presumably, if selected characteristics were evaluated with consistency and bases for these evaluations were given, such results could serve in training teachers to…
Robust recognition of loud and Lombard speech in the fighter cockpit environment
NASA Astrophysics Data System (ADS)
Stanton, Bill J., Jr.
1988-08-01
There are a number of challenges associated with incorporating speech recognition technology into the fighter cockpit. One of the major problems is the wide range of variability in the pilot's voice, which can result from changing levels of stress and workload. Increasing the training set to include abnormal speech is not an attractive option because of the innumerable conditions that would have to be represented and the inordinate amount of time needed to collect such a training set. A more promising approach is to study subsets of abnormal speech that have been produced under controlled cockpit conditions, with the purpose of characterizing reliable shifts that occur relative to normal speech. That was the aim of this research. Analyses were conducted for 18 features on 17,671 phoneme tokens across eight speakers for normal, loud, and Lombard speech. A consistent migration of energy in the sonorants was discovered. This discovery of reliable energy shifts led to the development of a method to reduce or eliminate the effect of these shifts on the Euclidean distances between LPC log-magnitude spectra, which significantly improved recognition performance for loud and Lombard speech. Discrepancies in recognition error rates between normal and abnormal speech were reduced by approximately 50 percent for all eight speakers combined.
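A sketch of the underlying distance measure — Euclidean distance between LPC log-magnitude spectra — under standard autocorrelation-method assumptions (illustrative; the compensation method itself is not reproduced here):

```python
# Sketch: LPC coefficients via the autocorrelation method, then the
# Euclidean distance between two frames' LPC log-magnitude spectra.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_log_spectrum(frame, order=12, n_fft=256):
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)  # A(z) on unit circle
    return -20.0 * np.log10(np.abs(A) + 1e-12)           # log |1/A(z)|

def spectral_distance(frame1, frame2):
    return np.linalg.norm(lpc_log_spectrum(frame1) - lpc_log_spectrum(frame2))

rng = np.random.default_rng(0)
print(spectral_distance(rng.normal(size=400), rng.normal(size=400)))
```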
Fels, S S; Hinton, G E
1997-01-01
Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
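The vowel/consonant gating can be written compactly; a toy sketch with stand-in networks (illustrative, not the Glove-Talk II implementation):

```python
# Sketch of the gating idea: a gating network weights the outputs of a
# vowel network and a consonant network to produce the ten synthesizer
# control parameters. The "networks" below are arbitrary stand-ins.
import numpy as np

def gated_controls(hand_features, gate_net, vowel_net, consonant_net):
    g = gate_net(hand_features)            # scalar in [0, 1]
    return (g * vowel_net(hand_features)
            + (1 - g) * consonant_net(hand_features))

rng = np.random.default_rng(0)
Wv, Wc, wg = (rng.normal(size=(3, 10)), rng.normal(size=(3, 10)),
              rng.normal(size=3))
controls = gated_controls(
    rng.normal(size=3),
    lambda x: 1 / (1 + np.exp(-x @ wg)),   # gate: sigmoid
    lambda x: x @ Wv,                      # "vowel network"
    lambda x: x @ Wc)                      # "consonant network"
print(controls)                            # ten control parameters
```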
Yeend, Ingrid; Beach, Elizabeth Francis; Sharma, Mridula; Dillon, Harvey
2017-09-01
Recent animal research has shown that exposure to single episodes of intense noise causes cochlear synaptopathy without affecting hearing thresholds. It has been suggested that the same may occur in humans. If so, it is hypothesized that this would result in impaired encoding of sound and lead to difficulties hearing at suprathreshold levels, particularly in challenging listening environments. The primary aim of this study was to investigate the effect of noise exposure on auditory processing, including the perception of speech in noise, in adult humans. A secondary aim was to explore whether musical training might improve some aspects of auditory processing and thus counteract or ameliorate any negative impacts of noise exposure. In a sample of 122 participants (63 female) aged 30-57 years with normal or near-normal hearing thresholds, we conducted audiometric tests, including tympanometry, audiometry, acoustic reflexes, otoacoustic emissions and medial olivocochlear responses. We also assessed temporal and spectral processing, by determining thresholds for detection of amplitude modulation and temporal fine structure. We assessed speech-in-noise perception, and conducted tests of attention, memory and sentence closure. We also calculated participants' accumulated lifetime noise exposure and administered questionnaires to assess self-reported listening difficulty and musical training. The results showed no clear link between participants' lifetime noise exposure and performance on any of the auditory processing or speech-in-noise tasks. Musical training was associated with better performance on the auditory processing tasks, but not on the speech-in-noise perception tasks. The results indicate that sentence closure skills, working memory, attention, extended high frequency hearing thresholds and medial olivocochlear suppression strength are important factors that are related to the ability to process speech in noise. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
Pérez Zaballos, María Teresa; Ramos de Miguel, Ángel; Pérez Plasencia, Daniel; Zaballos González, María Luisa; Ramos Macías, Ángel
2015-12-01
To evaluate 1) if air traffic controllers (ATCs) perform better than non-air traffic controllers in an open-set speech-in-noise test because of their experience with radio communications, and 2) if high-frequency information (>8000 Hz) substantially improves speech-in-noise perception across populations. The control group comprised 28 normal-hearing subjects, and the target group comprised 48 ATCs aged between 19 and 55 years who were native Spanish speakers. The hearing-in-noise abilities of the two groups were characterized under two signal conditions: 1) speech tokens and white noise sampled at 44.1 kHz (unfiltered condition) and 2) speech tokens plus white noise, each passed through a 4th-order Butterworth filter with 70 and 8000 Hz low and high cutoffs (filtered condition). These tests were performed at signal-to-noise ratios of +5, 0, and -5 dB SNR. The ATCs outperformed the control group in all conditions. The differences were statistically significant in all cases, and the largest difference was observed under the most difficult condition (-5 dB SNR). Overall, scores were higher for both groups when high-frequency components were not suppressed, although statistically significant differences were not observed for the control group at 0 dB SNR. The results indicate that ATCs are more capable of identifying speech in noise. This may be due to the effect of their training. On the other hand, performance seems to decrease when the high-frequency components of speech are removed, regardless of training.
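The filtered condition is straightforward to reproduce from the stated parameters; a sketch (the input signal below is a stand-in, and causal filtering is assumed):

```python
# Sketch of the filtered condition: a 4th-order Butterworth band-pass
# from 70 Hz to 8000 Hz applied to a 44.1 kHz signal.
import numpy as np
from scipy.signal import butter, sosfilt

fs = 44100
sos = butter(4, [70, 8000], btype="bandpass", fs=fs, output="sos")

rng = np.random.default_rng(0)
speech_plus_noise = rng.normal(size=fs)     # stand-in 1-s signal
filtered = sosfilt(sos, speech_plus_noise)  # causal band-pass filtering
```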
ERIC Educational Resources Information Center
Sayeski, Kristin L.; Earle, Gentry A.; Eslinger, R. Paige; Whitenton, Jessy N.
2017-01-01
Matching phonemes (speech sounds) to graphemes (letters and letter combinations) is an important aspect of decoding (translating print to speech) and encoding (translating speech to print). Yet, many teacher candidates do not receive explicit training in phoneme-grapheme correspondence. Difficulty with accurate phoneme production and/or lack of…
Instructional Uses of Videotape: A Symposium.
ERIC Educational Resources Information Center
Nelson, Harold E.; And Others
This collection of seven articles for the college teacher of speech relates specific ways that videotape has been used in training teachers and in teaching the fundamentals of speech, advanced public speaking, and discussion. Included are articles by (1) Harold E. Nelson, who explains how videotape is used in college speech classes to aid in…
Federal Register 2010, 2011, 2012, 2013, 2014
2011-10-27
... areas of hearing and balance; smell and taste; and voice, speech, and language. The Strategic Plan... research training in the normal and disordered processes of hearing, balance, smell, taste, voice, speech... into three program areas: Hearing and balance; smell and taste; and voice, speech, and language. The...
ERIC Educational Resources Information Center
Schloff, Rose-Laurie; Martinez, Silvia
Guidelines for writing assessments of the English language skills of minority, bilingual, preschool and elementary school children are presented for monolingual speech-language pathologists. In addition, a project (Project Communicate) providing direct client services and training of speech-language pathologists is briefly described. With regard…
ERIC Educational Resources Information Center
Coleman, Thomas; Langberg, George
An experimental public school speech therapy program is described, which offers automated, programed instruction in sound production and auditory training. The experiment includes self-teaching methods, as well as utilization of paraprofessional personnel under the supervision of a qualified speech therapist. Although the automated program was…
Subauditory Speech Recognition based on EMG/EPG Signals
NASA Technical Reports Server (NTRS)
Jorgensen, Charles; Lee, Diana Dee; Agabon, Shane; Lau, Sonie (Technical Monitor)
2003-01-01
Sub-vocal electromyogram/electropalatogram (EMG/EPG) signal classification is demonstrated as a method for silent speech recognition. Recorded electrode signals from the larynx and sublingual areas below the jaw are noise-filtered and transformed into features using complex dual quad tree wavelet transforms. Feature sets for six sub-vocally pronounced words are trained using a trust-region scaled conjugate gradient neural network. Real-time signals for previously unseen patterns are classified into categories suitable for primitive control of graphic objects. Feature construction, recognition accuracy, and an approach for extending the technique to a variety of real-world application areas are presented.
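A toy version of that pipeline (illustrative only; a plain discrete wavelet transform stands in for the paper's complex dual quad tree wavelets, the training algorithm differs, and the signals are synthetic):

```python
# Sketch: wavelet sub-band energy features from an EMG channel,
# classified with a small neural network. All data are stand-ins.
import numpy as np
import pywt
from sklearn.neural_network import MLPClassifier

def emg_features(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Energy per sub-band as a compact feature vector.
    return np.array([np.sum(c ** 2) for c in coeffs])

rng = np.random.default_rng(0)
X = np.stack([emg_features(rng.normal(size=1024)) for _ in range(120)])
y = np.repeat(np.arange(6), 20)          # six word classes (stand-in)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000).fit(X, y)
```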
The Picture Exchange Communication System.
Bondy, A; Frost, L
2001-10-01
The Picture Exchange Communication System (PECS) is an alternative/augmentative communication system that was developed to teach functional communication to children with limited speech. The approach is unique in that it teaches children to initiate communicative interactions within a social framework. This article describes the advantages to implementing PECS over traditional approaches. The PECS training protocol is described wherein children are taught to exchange a single picture for a desired item and eventually to construct picture-based sentences and use a variety of attributes in their requests. The relationship of PECS's implementation to the development of speech in previously nonvocal students is reviewed.
ERIC Educational Resources Information Center
Thiemann-Bourque, Kathy S.; McGuff, Sara; Goldstein, Howard
2017-01-01
Purpose: This study examined effects of a peer-mediated intervention that provided training on the use of a speech-generating device for preschoolers with severe autism spectrum disorder (ASD) and peer partners. Method: Effects were examined using a multiple probe design across 3 children with ASD and limited to no verbal skills. Three peers…
ERIC Educational Resources Information Center
Kaplan, Harriet, Comp.; Lloyd, Lyle L., Comp.
Programs of agencies within the Department of Health, Education, and Welfare that support research, training, and clinical service projects in hearing, speech, and language development are reviewed. Information on each program usually includes areas of communication development and disorders specific to each agency; the funding mechanism used by…
Developing a corpus of clinical notes manually annotated for part-of-speech.
Pakhomov, Serguei V; Coden, Anni; Chute, Christopher G
2006-06-01
This paper presents a project whose main goal is to construct a corpus of clinical text manually annotated for part-of-speech (POS) information. We describe and discuss the process of training three domain experts to perform linguistic annotation. Three domain experts were trained to perform manual annotation of a corpus of clinical notes. A part of this corpus was combined with the Penn Treebank corpus of general-purpose English text, and another part was set aside for testing. The corpora were then used for training and testing statistical part-of-speech taggers. We list some of the challenges as well as encouraging results pertaining to inter-rater agreement and consistency of annotation. We used the Trigrams'n'Tags (TnT) tagger [T. Brants, TnT--a statistical part-of-speech tagger, in: Proceedings of the NAACL/ANLP-2000 Symposium, 2000] trained on general English data to achieve 89.79% correctness. The same tagger trained on a portion of the medical data annotated for this project improved the performance to 94.69%. Furthermore, we find that discriminating between different types of discourse represented by different sections of clinical text may be very beneficial for improving the correctness of POS tagging.
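The general-domain baseline is easy to reproduce with NLTK's TnT implementation and the bundled Penn Treebank sample (a sketch only; the clinical corpus itself is not publicly available, and the split below is arbitrary):

```python
# Sketch: train NLTK's TnT tagger on Penn Treebank sentences and
# measure tagging accuracy on a held-out portion.
import nltk
from nltk.corpus import treebank
from nltk.tag import tnt

nltk.download("treebank", quiet=True)
sents = treebank.tagged_sents()
train, test = sents[:3000], sents[3000:]

tagger = tnt.TnT()
tagger.train(train)                      # general-English training data

right = total = 0
for sent in test:
    words = [w for w, _ in sent]
    for (_, gold), (_, pred) in zip(sent, tagger.tag(words)):
        right += gold == pred
        total += 1
print(f"accuracy: {right / total:.3f}")
```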
Brain dynamics that correlate with effects of learning on auditory distance perception.
Wisniewski, Matthew G; Mercado, Eduardo; Church, Barbara A; Gramann, Klaus; Makeig, Scott
2014-01-01
Accuracy in auditory distance perception can improve with practice and varies for sounds differing in familiarity. Here, listeners were trained to judge the distances of English, Bengali, and backwards speech sources pre-recorded at near (2-m) and far (30-m) distances. Listeners' accuracy was tested before and after training. Improvements from pre-test to post-test were greater for forward speech, demonstrating a learning advantage for forward speech sounds. Independent component (IC) processes identified in electroencephalographic (EEG) data collected during pre- and post-testing revealed three clusters of ICs across subjects with stimulus-locked spectral perturbations related to learning and accuracy. One cluster exhibited a transient stimulus-locked increase in 4-8 Hz power (theta event-related synchronization; ERS) that was smaller after training and largest for backwards speech. For a left temporal cluster, 8-12 Hz decreases in power (alpha event-related desynchronization; ERD) were greatest for English speech and less prominent after training. In contrast, a cluster of IC processes centered at or near anterior portions of the medial frontal cortex showed learning-related enhancement of sustained increases in 10-16 Hz power (upper-alpha/low-beta ERS). The degree of this enhancement was positively correlated with the degree of behavioral improvements. Results suggest that neural dynamics in non-auditory cortical areas support distance judgments. Further, frontal cortical networks associated with attentional and/or working memory processes appear to play a role in perceptual learning for source distance.
How much is a word? Predicting ease of articulation planning from apraxic speech error patterns.
Ziegler, Wolfram; Aichert, Ingrid
2015-08-01
According to intuitive concepts, 'ease of articulation' is influenced by factors like word length or the presence of consonant clusters in an utterance. Imaging studies of speech motor control use these factors to systematically tax the speech motor system. Evidence from apraxia of speech, a disorder supposed to result from speech motor planning impairment after lesions to speech motor centers in the left hemisphere, supports the relevance of these and other factors in disordered speech planning and the genesis of apraxic speech errors. Yet, there is no unified account of the structural properties rendering a word easy or difficult to pronounce. To model the motor planning demands of word articulation by a nonlinear regression model trained to predict the likelihood of accurate word production in apraxia of speech. We used a tree-structure model in which vocal tract gestures are embedded in hierarchically nested prosodic domains to derive a recursive set of terms for the computation of the likelihood of accurate word production. The model was trained with accuracy data from a set of 136 words averaged over 66 samples from apraxic speakers. In a second step, the model coefficients were used to predict a test dataset of accuracy values for 96 new words, averaged over 120 samples produced by a different group of apraxic speakers. Accurate modeling of the first dataset was achieved in the training study (adjusted R² = .71). In the cross-validation, the test dataset was predicted with high accuracy as well (adjusted R² = .67). The model shape, as reflected by the coefficient estimates, was consistent with current phonetic theories and with clinical evidence. In accordance with phonetic and psycholinguistic work, a strong influence of word stress on articulation errors was found. The proposed model provides a unified and transparent account of the motor planning requirements of word articulation. Copyright © 2015 Elsevier Ltd. All rights reserved.
Assistive Software Tools for Secondary-Level Students with Literacy Difficulties
ERIC Educational Resources Information Center
Lange, Alissa A.; McPhillips, Martin; Mulhern, Gerry; Wylie, Judith
2006-01-01
The present study assessed the compensatory effectiveness of four assistive software tools (speech synthesis, spellchecker, homophone tool, and dictionary) on literacy. Secondary-level students (N = 93) with reading difficulties completed computer-based tests of literacy skills. Training on their respective software followed for those assigned to…
Rohlfing, Katharina J.; Nachtigäller, Kerstin
2016-01-01
The learning of spatial prepositions is assumed to be based on experience in space. In a slow mapping study, we investigated whether 31 German 28-month-old children could robustly learn the German spatial prepositions hinter [behind] and neben [next to] from pictures, and whether a narrative input can compensate for a lack of immediate experience in space. One group of children received pictures with a narrative input as a training to understand spatial prepositions. In two further groups, we controlled (a) for the narrative input by providing unconnected speech during the training and (b) for the learning material by training the children on toys rather than pictures. We assessed children’s understanding of spatial prepositions at three different time points: pretest, immediate test, and delayed posttest. Results showed improved word retention in children from the narrative but not the control group receiving unconnected speech. Neither of the trained groups succeeded in generalization to novel referents. Finally, all groups were instructed to deal with untrained material in the test to investigate the robustness of learning across tasks. None of the groups succeeded in this task transfer. PMID:27471479
GraphoGame – a catalyst for multi-level promotion of literacy in diverse contexts
Ojanen, Emma; Ronimus, Miia; Ahonen, Timo; Chansa-Kabali, Tamara; February, Pamela; Jere-Folotiya, Jacqueline; Kauppinen, Karri-Pekka; Ketonen, Ritva; Ngorosho, Damaris; Pitkänen, Mikko; Puhakka, Suzanne; Sampa, Francis; Walubita, Gabriel; Yalukanda, Christopher; Pugh, Ken; Richardson, Ulla; Serpell, Robert; Lyytinen, Heikki
2015-01-01
GraphoGame (GG) is originally a technology-based intervention method for supporting children with reading difficulties. It is now known that children who face problems in reading acquisition have difficulties in learning to differentiate and manipulate speech sounds and consequently, in connecting these sounds to corresponding letters. GG was developed to provide intensive training in matching speech sounds and larger units of speech to their written counterparts. GG has been shown to benefit children with reading difficulties and the game is now available for all Finnish school children for literacy support. Presently millions of children in Africa fail to learn to read despite years of primary school education. As many African languages have transparent writing systems similar in structure to Finnish, it was hypothesized that GG-based training of letter-sound correspondences could also be effective in supporting children’s learning in African countries. In this article we will describe how GG has been developed from a Finnish dyslexia prevention game to an intervention method that can be used not only to improve children’s reading performance but also to raise teachers’ and parents’ awareness of the development of reading skill and effective reading instruction methods. We will also provide an overview of the GG activities in Zambia, Kenya, Tanzania, and Namibia, and the potential to promote education for all with a combination of scientific research and mobile learning. PMID:26113825
Georgieva, Dobrinka; Woźniak, Tomasz; Topbaş, Seyhun; Vitaskova, Katerina; Vukovic, Mile; Zemva, Nada; Duranovic, Mirela
2014-01-01
To provide an overview of student training in speech and language therapy/logopedics (SLT) in selected Central and Southeastern European countries (Poland, Slovenia, Bulgaria, the Czech Republic, Serbia, Bosnia and Herzegovina, and Turkey). Data were collected using a special questionnaire developed by Söderpalm and supplemented by Georgieva. Results from 23 SLT programs in the seven countries were collected and organized. In all these countries, SLT has roots in special education or health and is centralized in the university environment. The training programs have positive accreditation provided by the national agencies of accreditation and evaluation. Results were examined specifically for evidence of the new paradigm of evidence-based practice (EBP) according to the revised International Association of Logopedics and Phoniatrics (IALP) guidelines and the application of research-based teaching in SLT. The professional bodies that govern clinical practice in public health and/or educational fields are in the process of EBP implementation. Most speech and language therapists/logopedists in the selected countries work in an educational setting, clinical organization and/or hospital, as well as in social day care centers. Except in Turkey, private practices are not regulated by law. In the seven countries examined in this survey, SLT is progressing as a professional discipline but must be supported by government funding of SLT education and services to relevant populations. © 2015 S. Karger AG, Basel.
Adults with Specific Language Impairment fail to consolidate speech sounds during sleep.
Earle, F Sayako; Landi, Nicole; Myers, Emily B
2018-02-14
Specific Language Impairment (SLI) is a common learning disability that is associated with poor speech sound representations. These differences in representational quality are thought to impose a burden on spoken language processing. The underlying mechanism to account for impoverished speech sound representations remains in debate. Previous findings that implicate sleep as important for building speech representations, combined with reports of atypical sleep in SLI, motivate the current investigation into a potential consolidation mechanism as a source of impoverished representations in SLI. In the current study, we trained individuals with SLI on a new (nonnative) set of speech sounds and tracked their perceptual accuracy and neural responses to these sounds over two days. Adults with SLI achieved performance comparable to typical controls during training, but demonstrated a distinct lack of overnight gains the next day. We propose that those with SLI may be impaired in the consolidation of acoustic-phonetic information. Published by Elsevier B.V.
Segmenting words from natural speech: subsegmental variation in segmental cues.
Rytting, C Anton; Brew, Chris; Fosler-Lussier, Eric
2010-06-01
Most computational models of word segmentation are trained and tested on transcripts of speech, rather than the speech itself, and assume that speech is converted into a sequence of symbols prior to word segmentation. We present a way of representing speech corpora that avoids this assumption, and preserves acoustic variation present in speech. We use this new representation to re-evaluate a key computational model of word segmentation. One finding is that high levels of phonetic variability degrade the model's performance. While robustness to phonetic variability may be intrinsically valuable, this finding needs to be complemented by parallel studies of the actual abilities of children to segment phonetically variable speech.
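For context, a classic transcript-based segmentation cue is the transitional probability between adjacent units, with a boundary posited where the probability dips; a sketch of that symbolic baseline (illustrative; not the specific model the paper re-evaluates):

```python
# Sketch: transitional-probability word segmentation over symbol
# sequences (e.g., syllables), the kind of transcript-level model the
# paper argues should be tested against acoustically variable input.
from collections import Counter

def transitional_probabilities(utterances):
    pair, first = Counter(), Counter()
    for utt in utterances:                 # utt: list of syllables
        for a, b in zip(utt, utt[1:]):
            pair[(a, b)] += 1
            first[a] += 1
    return {k: v / first[k[0]] for k, v in pair.items()}

def segment(utt, tp, threshold=0.5):
    """Posit a word boundary wherever TP drops below the threshold."""
    words, cur = [], [utt[0]]
    for a, b in zip(utt, utt[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append(cur)
            cur = []
        cur.append(b)
    words.append(cur)
    return words

corpus = [["ba", "by", "go", "es"], ["ba", "by", "eats"]] * 10
tp = transitional_probabilities(corpus)
print(segment(["ba", "by", "go", "es"], tp))
```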
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
These proceedings discuss human factors issues related to aerospace systems, aging, communications, computer systems, consumer products, education and forensic topics, environmental design, industrial ergonomics, international technology transfer, organizational design and management, personality and individual differences in human performance, safety, system development, test and evaluation, training, and visual performance. Particular attention is given to HUDs, attitude indicators, and sensor displays; human factors of space exploration; behavior and aging; the design and evaluation of phone-based interfaces; knowledge acquisition and expert systems; handwriting, speech, and other input techniques; interface design for text, numerics, and speech; and human factors issues in medicine. Also discussed are cumulative trauma disorders, industrial safety, evaluative techniques for automation impacts on human operators, visual issues in training, and the interpretation and organization of human factors concepts and information.
Actor vocal training for the habilitation of speech in adolescent users of cochlear implants.
Holt, Colleen M; Dowell, Richard C
2011-01-01
This study examined changes to speech production in adolescents with hearing impairment following a period of actor vocal training. In addition to vocal parameters, the study also investigated changes to psychosocial factors such as confidence, self-esteem, and anxiety. The group were adolescent users of cochlear implants (mean age at commencement of training 15.9 years), with approximately half of the group wearing a hearing aid in the contralateral ear. The mean age of implantation of the group was 7.6 years and the participants displayed a range of speech production abilities. Evaluation of posttraining outcomes was performed via a combination of perceptual and acoustic analyses. Significant posttraining changes to vocal parameters included increased pitch range and variability and decreased speaking rate. From a psychosocial perspective, posttraining stress levels were significantly lowered. This study suggested that actor vocal training may benefit young people with hearing impairment, both in the way in which they use their voices and in the way in which they view themselves.
EMG-based speech recognition using hidden markov models with global control variables.
Lee, Ki-Seung
2008-03-01
It is well known that a strong relationship exists between human voices and the movement of articulatory facial muscles. In this paper, we utilize this knowledge to implement an automatic speech recognition scheme which uses solely surface electromyogram (EMG) signals. The sequence of EMG signals for each word is modelled by a hidden Markov model (HMM) framework. The main objective of the work involves building a model for state observation density when multichannel observation sequences are given. The proposed model reflects the dependencies between each of the EMG signals, which are described by introducing a global control variable. We also develop an efficient model training method, based on a maximum likelihood criterion. In a preliminary study, 60 isolated words were used as recognition variables. EMG signals were acquired from three articulatory facial muscles. The findings indicate that such a system may have the capacity to recognize speech signals with an accuracy of up to 87.07%, which is superior to the independent probabilistic model.
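A minimal sketch of per-word HMMs for such a recognizer using hmmlearn (illustrative; the paper's global-control-variable extension is a custom model and is not implemented here, and all signals are synthetic):

```python
# Sketch: train one Gaussian HMM per word on multichannel EMG-like
# sequences, then recognize a test sequence by maximum likelihood.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)

def make_word_data(n_seqs=20, seq_len=40, n_channels=3, shift=0.0):
    seqs = [rng.normal(loc=shift, size=(seq_len, n_channels))
            for _ in range(n_seqs)]
    return np.concatenate(seqs), [seq_len] * n_seqs

models = {}
for word, shift in [("yes", 0.0), ("no", 1.0)]:   # stand-in vocabulary
    X, lengths = make_word_data(shift=shift)
    m = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=20)
    m.fit(X, lengths)
    models[word] = m

test, _ = make_word_data(n_seqs=1, shift=1.0)     # an unseen "no"
print(max(models, key=lambda w: models[w].score(test)))
```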
Tjaden, Kris; Wilding, Greg
2011-01-01
The primary purpose of this study was to investigate how speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS) accomplish voluntary reductions in speech rate. A group of talkers with no history of neurological disease was included for comparison. This study was motivated by the idea that knowledge of how speakers with dysarthria voluntarily accomplish a reduced speech rate would contribute toward a descriptive model of speaking rate change in dysarthria. Such a model has the potential to assist in identifying rate control strategies to receive focus in clinical treatment programs and also would advance understanding of global speech timing in dysarthria. All speakers read a passage in Habitual and Slow conditions. Speech rate, articulation rate, pause duration, and pause frequency were measured. All speaker groups adjusted articulation time as well as pause time to reduce overall speech rate. Group differences in how voluntary rate reduction was accomplished were primarily one of quantity or degree. Overall, a slower-than-normal rate was associated with a reduced articulation rate, shorter speech runs that included fewer syllables, and longer more frequent pauses. Taken together, these results suggest that existing skills or strategies used by patients should be emphasized in dysarthria training programs focusing on rate reduction. Results further suggest that a model of voluntary speech rate reduction based on neurologically normal speech shows promise as being applicable for mild to moderate dysarthria. The reader will be able to: (1) describe the importance of studying voluntary adjustments in speech rate in dysarthria, (2) discuss how speakers with Parkinson's disease and Multiple Sclerosis adjust articulation time and pause time to slow speech rate. Copyright © 2011 Elsevier Inc. All rights reserved.
Speech: An Opening or a Dead End for Married Women.
ERIC Educational Resources Information Center
Forusz, Judith Pulin
For the American woman, educated and trained in the speech profession, marriage and motherhood induce a shock for which she is unprepared. Society still expects the main responsibility of child rearing to be that of the mother, while the speech profession, which has prepared all students to be teachers and scholars, is uncooperative in providing…
The Influence of Child-Directed Speech on Word Learning and Comprehension
ERIC Educational Resources Information Center
Foursha-Stevenson, Cassandra; Schembri, Taylor; Nicoladis, Elena; Eriksen, Cody
2017-01-01
This paper describes an investigation into the function of child-directed speech (CDS) across development. In the first experiment, 10-21-month-olds were presented with familiar words in CDS and trained on novel words in CDS or adult-directed speech (ADS). All children preferred the matching display for familiar words. However, only older toddlers…
ERIC Educational Resources Information Center
Ambler, Bob
The University of Tennessee (Knoxville) offers, as a special section of the public speaking curriculum, a "speech anxiety" program taught by faculty and graduate students from the speech and theatre and educational psychology departments and staff from the counseling services center. The students spend the first few weeks of the special…
ERIC Educational Resources Information Center
Seitz, Aaron R.; Protopapas, Athanassios; Tsushima, Yoshiaki; Vlahou, Eleni L.; Gori, Simone; Grossberg, Stephen; Watanabe, Takeo
2010-01-01
Learning a second language as an adult is particularly effortful when new phonetic representations must be formed. Therefore the processes that allow learning of speech sounds are of great theoretical and practical interest. Here we examined whether perception of single formant transitions, that is, sound components critical in speech perception,…
ERIC Educational Resources Information Center
Bernhardt, B. May; Bacsfalvi, Penelope; Adler-Bock, Marcy; Shimizu, Reiko; Cheney, Audrey; Giesbrecht, Nathan; O'Connell, Maureen; Sirianni, Jason; Radanov, Bosko
2008-01-01
Ultrasound has shown promise as a visual feedback tool in speech therapy. Rural clients, however, often have minimal access to new technologies. The purpose of the current study was to evaluate consultative treatment using ultrasound in rural communities. Two speech-language pathologists (SLPs) trained in ultrasound use provided consultation with…
Open-set speaker identification with diverse-duration speech data
NASA Astrophysics Data System (ADS)
Karadaghi, Rawande; Hertlein, Heinz; Ariyaeeinia, Aladdin
2015-05-01
The concern in this paper is an important category of applications of open-set speaker identification in criminal investigation, which involves operating with speech of short and varied duration. The study presents investigations into the adverse effects of such an operating condition on the accuracy of open-set speaker identification, based on both GMM-UBM and i-vector approaches. The experiments are conducted using a protocol developed for the identification task, based on the NIST speaker recognition evaluation corpus of 2008. In order to closely cover the real-world operating conditions in the considered application area, the study includes experiments with various combinations of training and testing data durations. The paper details the characteristics of the experimental investigations conducted and provides a thorough analysis of the results obtained.
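For orientation, here is a minimal sketch of open-set identification in the GMM-UBM style mentioned above: each enrolled speaker is scored by a log-likelihood ratio against a universal background model, and a test utterance whose best ratio falls below a threshold is rejected as out-of-set. The sketch trains per-speaker GMMs directly (full systems typically MAP-adapt the UBM instead), and the component count, threshold, and names are illustrative assumptions.

```python
# Minimal GMM-UBM open-set identification sketch (illustrative; the paper's
# protocol, features, and thresholds are not reproduced here).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_features, n_components=64):
    # background_features: (N, D) frames pooled over many background speakers
    ubm = GaussianMixture(n_components=n_components, covariance_type='diag')
    ubm.fit(background_features)
    return ubm

def openset_identify(test_feats, speaker_gmms, ubm, threshold=0.0):
    # score() returns the average per-frame log-likelihood
    ubm_ll = ubm.score(test_feats)
    llr = {spk: gmm.score(test_feats) - ubm_ll for spk, gmm in speaker_gmms.items()}
    best = max(llr, key=llr.get)
    # open-set decision: reject as "unknown" if even the best ratio is too low
    return best if llr[best] >= threshold else None
```

Short test utterances give fewer frames for the average log-likelihood, which is one intuition for why duration degrades open-set accuracy.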
Moradi, Shahram; Wahlin, Anna; Hällgren, Mathias; Rönnberg, Jerker; Lidestam, Björn
2017-01-01
This study aimed to examine the efficacy and maintenance of short-term (one-session) gated audiovisual speech training for improving auditory sentence identification in noise in experienced elderly hearing-aid users. Twenty-five hearing aid users (16 men and 9 women), with an average age of 70.8 years, were randomly divided into an experimental (audiovisual training, n = 14) and a control (auditory training, n = 11) group. Participants underwent gated speech identification tasks comprising Swedish consonants and words presented at 65 dB sound pressure level with a 0 dB signal-to-noise ratio (steady-state broadband noise), in audiovisual or auditory-only training conditions. The Hearing-in-Noise Test was employed to measure participants' auditory sentence identification in noise before the training (pre-test), promptly after training (post-test), and 1 month after training (one-month follow-up). The results showed that audiovisual training improved auditory sentence identification in noise promptly after the training (post-test vs. pre-test scores); furthermore, this improvement was maintained 1 month after the training (one-month follow-up vs. pre-test scores). Such improvement was not observed in the control group, either promptly after the training or at the one-month follow-up. However, neither a significant between-groups difference nor a group-by-session interaction was observed. Audiovisual training may be considered in the aural rehabilitation of hearing aid users to improve listening capabilities in noisy conditions. However, the lack of a significant between-groups effect (audiovisual vs. auditory) or of a group-by-session interaction calls for further research.
Polur, Prasad D; Miller, Gerald E
2006-10-01
Computer speech recognition of individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, the application of a 10-state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure for a dysarthric speech (isolated word) recognition system, intended to act as an assistive tool, was investigated. A small vocabulary spoken by three cerebral palsy subjects was chosen. The effect of such a structure on the recognition rate of the system was investigated by comparing it with an ergodic hidden Markov model as a control tool. This was done in order to determine if the modified technique contributed to enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz. Mel frequency cepstral coefficients were extracted from the recordings using 15 ms frames and served as training input to the hybrid model setup. The subsequent results demonstrated that the hybrid model structure was quite robust in its ability to handle the large variability and non-conformity of dysarthric speech. The level of variability in input dysarthric speech patterns sometimes limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor-impaired individuals holds sufficient promise.
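The front end described here (MFCCs from 11 kHz speech in 15 ms frames) can be sketched in a few lines. The hop length, coefficient count, and filename below are assumptions not stated in the abstract.

```python
# Sketch of the MFCC front end: 11 kHz speech, 15 ms analysis frames.
import librosa

y, sr = librosa.load('utterance.wav', sr=11000)  # resample to 11 kHz, mono
frame = int(0.015 * sr)                           # 15 ms -> 165 samples
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=frame, hop_length=frame // 2)
# mfcc.shape == (13, n_frames); these vectors would feed the HMM/ANN hybrid.
```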
McCreery, Ryan W.; Alexander, Joshua; Brennan, Marc A.; Hoover, Brenda; Kopun, Judy; Stelmachowicz, Patricia G.
2014-01-01
Objective: The primary goal of nonlinear frequency compression (NFC) and other frequency-lowering strategies is to increase the audibility of high-frequency sounds that are not otherwise audible with conventional hearing-aid processing due to the degree of hearing loss, limited hearing aid bandwidth, or a combination of both factors. The aim of the current study was to compare estimates of the audibility of speech processed by NFC to improvements in speech recognition for a group of children and adults with high-frequency hearing loss. Design: Monosyllabic word recognition was measured in noise for twenty-four adults and twelve children with mild to severe sensorineural hearing loss. Stimuli were amplified based on each listener's audiogram with conventional processing (CP) with amplitude compression or with NFC, and presented under headphones using a software-based hearing aid simulator. A modification of the speech intelligibility index (SII) was used to estimate the audibility of information in frequency-lowered bands. The mean improvement in SII was compared to the mean improvement in speech recognition. Results: All but two listeners experienced improvements in speech recognition with NFC compared to CP, consistent with the small increase in audibility that was estimated using the modification of the SII. Children and adults had similar improvements in speech recognition with NFC. Conclusion: Word recognition with NFC was higher than with CP for children and adults with mild to severe hearing loss. The average improvement in speech recognition with NFC (7%) was consistent with the modified SII, which indicated that listeners experienced an increase in audibility with NFC compared to CP. Further studies are necessary to determine if changes in audibility with NFC are related to speech recognition with NFC for listeners with greater degrees of hearing loss, with a greater variety of compression settings, and using auditory training. PMID:24535558
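The audibility estimate referred to here builds on the speech intelligibility index, which in its basic form is a band-importance-weighted sum of band audibilities. The toy sketch below illustrates that idea only; the weights and audibility values are placeholders, not the study's modified-SII parameters.

```python
# Basic SII idea: SII = sum_i I_i * A_i, with band-importance weights I_i
# (summing to 1) and band audibility A_i in [0, 1]. Values are placeholders.
def sii(importance, audibility):
    assert abs(sum(importance) - 1.0) < 1e-6
    return sum(I * max(0.0, min(1.0, A)) for I, A in zip(importance, audibility))

# Frequency lowering can raise audibility in the bands that carry the lowered
# information, which raises the weighted sum (illustrative numbers only):
print(sii([0.2, 0.3, 0.3, 0.2], [0.9, 0.8, 0.4, 0.0]))   # conventional processing
print(sii([0.2, 0.3, 0.3, 0.2], [0.9, 0.8, 0.5, 0.15]))  # with frequency lowering
```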
Hidden Markov models in automatic speech recognition
NASA Astrophysics Data System (ADS)
Wrzoskowicz, Adam
1993-11-01
This article describes a method for constructing an automatic speech recognition (ASR) system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals, and provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The article describes the specific components of the system and the procedures used to model and recognize speech, and discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. Different options for the choice of speech signal segments, and their consequences for the ASR process, are presented, with special attention to the use of lexical, syntactic, and semantic information for improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS, discusses the results of experiments on the effect of noise on the performance of the ASR system, and describes methods of constructing HMMs designed to operate in a noisy environment. Finally, the author describes a language for human-robot communications, defined as a complex multilevel network built from an HMM model of speech sounds geared towards Polish inflections, with mandatory lexical and syntactic rules added to the system for its communications vocabulary.
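To make the recognition step concrete, here is a log-space forward-algorithm sketch for a bank of per-class HMMs of the kind described: each candidate model scores the observed feature sequence, and the highest-likelihood model wins. Discrete observations are assumed purely for brevity.

```python
# Forward algorithm in log space, plus maximum-likelihood model selection.
import numpy as np

def log_forward(log_pi, log_A, log_B, obs):
    """log_pi: (S,) initial log probs; log_A: (S, S) transition log probs;
    log_B: (S, V) emission log probs over a discrete alphabet; obs: symbol list."""
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j' = log_B[j, o] + logsumexp_i(alpha_i + log_A[i, j])
        alpha = log_B[:, o] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
    return np.logaddexp.reduce(alpha)  # total log-likelihood of the sequence

def recognize(models, obs):
    # models: {label: (log_pi, log_A, log_B)}; returns the most likely label
    return max(models, key=lambda w: log_forward(*models[w], obs))
```

Training the per-class parameters is typically done with Baum-Welch (expectation-maximization), which the sketch omits.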
Teaming for Speech and Auditory Training.
ERIC Educational Resources Information Center
Nussbaum, Debra B.; Waddy-Smith, Bettie
1985-01-01
The article suggests three strategies for the audiologist and speech/communication specialist to use in assisting the preschool teacher to implement a student's individualized education program: (1) demonstration teaming, (2) dual teaming, and (3) rotation teaming. (CL)
Rate and rhythm control strategies for apraxia of speech in nonfluent primary progressive aphasia.
Beber, Bárbara Costa; Berbert, Monalise Costa Batista; Grawer, Ruth Siqueira; Cardoso, Maria Cristina de Almeida Freitas
2018-01-01
The nonfluent/agrammatic variant of primary progressive aphasia is characterized by apraxia of speech and agrammatism. Apraxia of speech limits patients' communication due to slow speaking rate, sound substitutions, articulatory groping, false starts and restarts, segmentation of syllables, and increased difficulty with increasing utterance length. Speech and language therapy is known to benefit individuals with apraxia of speech due to stroke, but little is known about its effects in primary progressive aphasia. This is a case report of a 72-year-old, illiterate housewife, who was diagnosed with nonfluent primary progressive aphasia and received speech and language therapy for apraxia of speech. Rate and rhythm control strategies for apraxia of speech were trained to improve initiation of speech. We discuss the importance of these strategies to alleviate apraxia of speech in this condition and the future perspectives in the area.
Soli, Sigfrid D; Amano-Kusumoto, Akiko; Clavier, Odile; Wilbur, Jed; Casto, Kristen; Freed, Daniel; Laroche, Chantal; Vaillancourt, Véronique; Giguère, Christian; Dreschler, Wouter A; Rhebergen, Koenraad S
2018-05-01
Validate use of the Extended Speech Intelligibility Index (ESII) for prediction of speech intelligibility in non-stationary real-world noise environments, and define a means of using these predictions for objective occupational hearing screening for hearing-critical public safety and law enforcement jobs. Analyses of predicted and measured speech intelligibility in recordings of real-world noise environments were performed in two studies using speech recognition thresholds (SRTs) and intelligibility measures. ESII analyses of the recordings were used to predict intelligibility. Noise recordings were made in prison environments and at US Army facilities for training ground and airborne forces. Speech materials included full-bandwidth sentences and bandpass-filtered sentences that simulated radio transmissions. A total of 22 adults with normal hearing (NH) and 15 with mild-to-moderate hearing impairment (HI) participated in the two studies. Average intelligibility predictions for individual NH and HI subjects were accurate in both studies (r² ≥ 0.94); pooled predictions were slightly less accurate (0.78 ≤ r² ≤ 0.92). An individual's SRT and audiogram can accurately predict the likelihood of effective speech communication in noise environments with known ESII characteristics, where essential hearing-critical tasks are performed. These predictions provide an objective means of occupational hearing screening.
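The ESII extends the standard speech intelligibility index to fluctuating noise by computing the index in short time frames and averaging, so momentary dips in the masker contribute to predicted intelligibility. The sketch below captures only that frame-and-average idea; the frame length and the crude SNR-to-audibility mapping are simplifying assumptions, not the published ESII procedure.

```python
# Frame-and-average sketch of the extended-SII idea for non-stationary noise.
import numpy as np

def esii(speech_band_levels, noise_band_levels, importance, dyn_range_db=30.0):
    """Levels: (T, B) dB per time frame and frequency band; importance: (B,)
    band-importance weights summing to 1."""
    snr = speech_band_levels - noise_band_levels                 # (T, B) dB
    # crude mapping of frame-wise SNR onto [0, 1] audibility (an assumption)
    audibility = np.clip((snr + 15.0) / dyn_range_db, 0.0, 1.0)
    return float(np.mean(audibility @ np.asarray(importance)))   # average over frames
```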
Burk, Matthew H; Humes, Larry E; Amos, Nathan E; Strauser, Lauren E
2006-06-01
The objective of this study was to evaluate the effectiveness of a training program for hearing-impaired listeners to improve their speech-recognition performance within a background noise when listening to amplified speech. Both noise-masked young normal-hearing listeners, used to model the performance of elderly hearing-impaired listeners, and a group of elderly hearing-impaired listeners participated in the study. Of particular interest was whether training on an isolated word list presented by a standardized talker can generalize to everyday speech communication across novel talkers. Word-recognition performance was measured for both young normal-hearing (n = 16) and older hearing-impaired (n = 7) adults. Listeners were trained on a set of 75 monosyllabic words spoken by a single female talker over a 9- to 14-day period. Performance for the familiar (trained) talker was measured before and after training in both open-set and closed-set response conditions. Performance on the trained words of the familiar talker was then compared with performance on those same words spoken by three novel talkers and on a second set of untrained words presented by both the familiar and unfamiliar talkers. The hearing-impaired listeners returned 6 months after their initial training to examine retention of the trained words as well as their ability to transfer any knowledge gained from word training to sentences containing both trained and untrained words. Both young normal-hearing and older hearing-impaired listeners performed significantly better on the word list on which they were trained than on a second, untrained list presented by the same talker. Improvements on the untrained words were small but significant, indicating some generalization to novel words. The large increase in performance on the trained words, however, was maintained across novel talkers, pointing to the listeners' greater reliance on lexical memorization of the words rather than on talker-specific acoustic characteristics. On returning after 6 months, listeners performed significantly better on the trained words relative to their initial baseline performance. Although the listeners performed significantly better on trained versus untrained words in isolation, once the trained words were embedded in sentences, no improvement in recognition over untrained words within the same sentences was shown. Older hearing-impaired listeners were able to significantly improve their word-recognition abilities through training with one talker, and to the same degree as young normal-hearing listeners. The improved performance was maintained across talkers and across time. This might imply that training a listener using a standardized list and talker may still provide benefit when these same words are presented by novel talkers outside the clinic. However, training on isolated words was not sufficient to transfer to fluent speech for the specific sentence materials used within this study. Further investigation is needed regarding approaches to improve a hearing aid user's speech understanding in everyday communication situations.
rTMS treatments combined with speech training for a conduction aphasia patient
Zhang, Hui; Chen, Ying; Hu, Ruiping; Yang, Liqing; Wang, Mengxing; Zhang, Jilei; Lu, Haifeng; Wu, Yi; Du, Xiaoxia
2017-01-01
Rationale: To date, little is known regarding the neural mechanisms of the functional recovery of language after repetitive transcranial magnetic stimulation (rTMS) in aphasia. Our aim was to investigate the mechanism that underlies rTMS and speech training in a case report. Patient concerns and diagnoses: We report the case of a 39-year-old woman who was initially diagnosed with conduction aphasia following a left hemisphere stroke. Interventions: The rTMS site was the left Broca's area, stimulated at a frequency of 5 Hz for 20 min/d for 10 days during a 2-week period. She had received speech rehabilitation training 1 month after the stroke. Functional magnetic resonance imaging (fMRI) and diffusion tensor imaging were used to investigate the functional and microstructural changes before and after rTMS treatment. Outcomes: The results demonstrated that the Western Aphasia Battery scores for language ability significantly improved at 2 weeks post-treatment, and the gains had increased further at 2.5 months post-treatment. The fMRI results indicated a more focused activation pattern and showed significant activation in the left dominant hemisphere relative to the right hemisphere, especially in the perilesional areas, post-treatment during 2 language tasks compared with pretreatment. Moreover, fractional anisotropy increased in the left superior temporal gyrus, an important area involved in language processing. Lessons: Our findings suggest that rTMS combined with speech training improved the speech-language ability of this chronic conduction aphasia patient and enhanced cerebral functional and microstructural reorganization. PMID:28796033
The Americleft Speech Project: A Training and Reliability Study.
Chapman, Kathy L; Baylis, Adriane; Trost-Cardamone, Judith; Cordero, Kelly Nett; Dixon, Angela; Dobbelsteyn, Cindy; Thurmes, Anna; Wilson, Kristina; Harding-Bell, Anne; Sweeney, Triona; Stoddard, Gregory; Sell, Debbie
2016-01-01
To describe the results of two reliability studies and to assess the effect of training on interrater reliability scores. The first study (1) examined interrater and intrarater reliability scores (weighted and unweighted kappas) and (2) compared interrater reliability scores before and after training on the use of the Cleft Audit Protocol for Speech-Augmented (CAPS-A) with British English-speaking children. The second study examined interrater and intrarater reliability on a modified version of the CAPS-A (CAPS-A Americleft Modification) with American and Canadian English-speaking children. Finally, comparisons were made between the interrater and intrarater reliability scores obtained for Study 1 and Study 2. The participants were speech-language pathologists from the Americleft Speech Project. In Study 1, interrater reliability scores improved for 6 of the 13 parameters following training on the CAPS-A protocol. Comparison of the reliability results for the two studies indicated lower scores for Study 2 compared with Study 1. However, this appeared to be an artifact of the kappa statistic that occurred due to insufficient variability in the reliability samples for Study 2. When percent agreement scores were also calculated, the ratings appeared similar across Study 1 and Study 2. The findings of this study suggested that improvements in interrater reliability could be obtained following a program of systematic training. However, improvements were not uniform across all parameters. Acceptable levels of reliability were achieved for those parameters most important for evaluation of velopharyngeal function.
Rao, Aparna; Rishiq, Dania; Yu, Luodi; Zhang, Yang; Abrams, Harvey
The objectives of this study were to investigate the effects of hearing aid use and the effectiveness of ReadMyQuips (RMQ), an auditory training program, on speech perception performance and auditory selective attention using electrophysiological measures. RMQ is an audiovisual training program designed to improve speech perception in everyday noisy listening environments. Participants were adults with mild to moderate hearing loss who were first-time hearing aid users. After 4 weeks of hearing aid use, the experimental group completed RMQ training in 4 weeks, and the control group received listening practice on audiobooks during the same period. Cortical late event-related potentials (ERPs) and the Hearing in Noise Test (HINT) were administered at pre-fitting, pre-training, and post-training to assess effects of hearing aid use and RMQ training. An oddball paradigm allowed tracking of changes in P3a and P3b ERPs to distractors and targets, respectively. Behavioral measures were also obtained while ERPs were recorded from participants. After 4 weeks of hearing aid use but before auditory training, HINT results did not show a statistically significant change, but there was a significant P3a reduction. This reduction in P3a was correlated with improvement in d prime (d') in the selective attention task. Increased P3b amplitudes were also correlated with improvement in d' in the selective attention task. After training, this correlation between P3b and d' remained in the experimental group, but not in the control group. Similarly, HINT testing showed improved speech perception post-training only in the experimental group. The criterion calculated in the auditory selective attention task showed a reduction only in the experimental group after training. ERP measures in the auditory selective attention task did not show any changes related to training. Hearing aid use was associated with a decrement in involuntary attention switching to distractors in the auditory selective attention task. RMQ training led to gains in speech perception in noise and improved listener confidence in the auditory selective attention task.
ERIC Educational Resources Information Center
McAllen, Audrey E.
This book gives teachers an understanding of speech training through specially selected exercises. The book's exercises aim to help develop clear speaking in the classroom. Methodically and perceptively used, the book will assist those concerned with the creative powers of speech as a teaching art. In Part 1, there are sections on the links…
ERIC Educational Resources Information Center
Lousada, M.; Jesus, Luis M. T.; Capelas, S.; Margaca, C.; Simoes, D.; Valente, A.; Hall, A.; Joffe, V. L.
2013-01-01
Background: In Portugal, the routine clinical practice of speech and language therapists (SLTs) in treating children with all types of speech sound disorder (SSD) continues to be articulation therapy (AT). There is limited use of phonological therapy (PT) or phonological awareness training in Portugal. Additionally, at an international level there…
ERIC Educational Resources Information Center
Yeni-Komshian, Grace; And Others
This study was designed to compare children and adults on their initial ability to identify and reproduce novel speech sounds and to evaluate their performance after receiving several training sessions in producing these sounds. The novel speech sounds used were two voiceless fricatives which are consonant phonemes in Arabic but which are…
A Comparison of the Interpersonal Orientations of Speech Anxious and Non Speech Anxious Students.
ERIC Educational Resources Information Center
Ambler, Bob
A special section of a public speaking class at the University of Tennessee was developed in the spring of 1977 for speech anxious students. The course was designed to incorporate the basic spirit of the regular classes and to provide special training in techniques for reducing nervousness about speaking and in methods for coping with the…
A Demonstration Project of Speech Training for the Preschool Cleft Palate Child. Final Report.
ERIC Educational Resources Information Center
Harrison, Robert J.
To ascertain the efficacy of a program of language and speech stimulation for the preschool cleft palate child, a research and demonstration project was conducted using 137 subjects (ages 18 to 72 months) with defects involving the soft palate. Their language and speech skills were matched with those of a noncleft peer group, revealing that the…
ERIC Educational Resources Information Center
Friedman, Herbert L.; Johnson, Raymond L.
Research in training subjects to comprehend compressed speech has led to deeper studies of basic listening skills. The connected discourse is produced by a technique which deletes segments of the speech record and joins the remainder together without pitch distortion. The two problems dealt with were the sources of individual differences in the…
Using Visible Speech to Train Perception and Production of Speech for Individuals with Hearing Loss.
ERIC Educational Resources Information Center
Massaro, Dominic W.; Light, Joanna
2004-01-01
The main goal of this study was to implement a computer-animated talking head, Baldi, as a language tutor for speech perception and production for individuals with hearing loss. Baldi can speak slowly; illustrate articulation by making the skin transparent to reveal the tongue, teeth, and palate; and show supplementary articulatory features, such…
2016-07-01
[Fragmentary record] The recoverable fragments describe training with music of varying complexities ("We did observe improvement from the first to the last lesson and the subject expressed appreciation for the training"); a task list including collecting pre- and post-operative hearing threshold data, collecting pre- and post-operative speech perception data, collecting music appraisal and pitch data, and administering training; collection of localization, music, quality of life, and functional questionnaire data; and a Figure 2 showing post-operative speech results.
Lee, Jong-Seok; Park, Cheol Hoon
2010-08-01
We propose a novel stochastic optimization algorithm, hybrid simulated annealing (SA), to train hidden Markov models (HMMs) for visual speech recognition. In our algorithm, SA is combined with a local optimization operator that substitutes a better solution for the current one to improve the convergence speed and the quality of solutions. We mathematically prove that the sequence of the objective values converges in probability to the global optimum in the algorithm. The algorithm is applied to train HMMs that are used as visual speech recognizers. While the popular training method of HMMs, the expectation-maximization algorithm, achieves only local optima in the parameter space, the proposed method can perform global optimization of the parameters of HMMs and thereby obtain solutions yielding improved recognition performance. The superiority of the proposed algorithm to the conventional ones is demonstrated via isolated word recognition experiments.
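A simulated-annealing loop with an added local-improvement step, in the spirit of the hybrid algorithm described above, can be sketched briefly. The acceptance schedule, operator names, and cooling factor below are illustrative assumptions; the paper's actual proposal is not reproduced. In HMM training, `objective` would be the negative log-likelihood of a parameterisation on the training data.

```python
# Hybrid simulated annealing sketch: random neighbour move, then a local
# optimization operator refines it; uphill moves accepted with Boltzmann prob.
import math
import random

def hybrid_sa(objective, neighbor, local_opt, x0, T0=1.0, cooling=0.95, iters=1000):
    x, fx = x0, objective(x0)
    best, fbest, T = x, fx, T0
    for _ in range(iters):
        y = local_opt(neighbor(x))  # local operator substitutes a better solution
        fy = objective(y)
        # always accept improvements; accept worsenings with probability exp(-dF/T)
        if fy < fx or random.random() < math.exp((fx - fy) / max(T, 1e-12)):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        T *= cooling  # geometric cooling schedule (an assumption)
    return best, fbest
```

The local step is what distinguishes this from plain SA: it speeds convergence while the stochastic acceptance rule preserves the chance of escaping the local optima that expectation-maximization gets stuck in.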
Lai, Ying-Hui; Tsao, Yu; Lu, Xugang; Chen, Fei; Su, Yu-Ting; Chen, Kuang-Chao; Chen, Yu-Hsuan; Chen, Li-Ching; Po-Hung Li, Lieber; Lee, Chin-Hui
2018-01-20
We investigate the clinical effectiveness of a novel deep learning-based noise reduction (NR) approach under noisy conditions with challenging noise types at low signal-to-noise ratio (SNR) levels for Mandarin-speaking cochlear implant (CI) recipients. The deep learning-based NR approach used in this study consists of two modules, a noise classifier (NC) and a deep denoising autoencoder (DDAE), and is thus termed NC + DDAE. In a series of comprehensive experiments, we conduct qualitative and quantitative analyses of the NC module and the overall NC + DDAE approach. Moreover, we evaluate the speech recognition performance of the NC + DDAE NR and classical single-microphone NR approaches for Mandarin-speaking CI recipients under different noisy conditions. The testing set contains Mandarin sentences corrupted by two types of maskers, two-talker babble noise and construction jackhammer noise, at 0 and 5 dB SNR levels. Two conventional NR techniques and the proposed deep learning-based approach are used to process the noisy utterances. We qualitatively compare the NR approaches by the amplitude envelope and spectrogram plots of the processed utterances. Quantitative objective measures include (1) the normalized covariance measure, to test the intelligibility of the utterances processed by each of the NR approaches; and (2) speech recognition tests conducted by nine Mandarin-speaking CI recipients, who used their own clinical speech processors during testing. The experimental results of the objective evaluation and the listening test indicate that under challenging listening conditions, the proposed NC + DDAE NR approach yields higher intelligibility scores than the two classical NR techniques, under both matched and mismatched training-testing conditions. Compared with the two well-known conventional NR techniques under challenging listening conditions, the proposed NC + DDAE NR approach has superior noise suppression capabilities and introduces less distortion of the key speech envelope information, thus improving speech recognition more effectively for Mandarin-speaking CI recipients. The results suggest that the proposed deep learning-based NR approach can potentially be integrated into existing CI signal processors to overcome the degradation of speech perception caused by noise.
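A minimal sketch of the DDAE module, assuming the common formulation in which the network maps noisy log-magnitude spectra to clean ones; in the NC + DDAE scheme a noise classifier would first select a DDAE trained for the detected noise type. The layer sizes, bin count, and training data below are placeholders, not the paper's architecture.

```python
# Minimal deep denoising autoencoder for spectral noise reduction (illustrative).
import torch
import torch.nn as nn

class DDAE(nn.Module):
    def __init__(self, n_bins=257, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins),  # reconstruct clean log-magnitude spectrum
        )

    def forward(self, noisy_logmag):
        return self.net(noisy_logmag)

model = DDAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
# one training step on paired (noisy, clean) spectra; random placeholders here
noisy, clean = torch.randn(32, 257), torch.randn(32, 257)
loss = loss_fn(model(noisy), clean)
opt.zero_grad()
loss.backward()
opt.step()
```

At inference, the enhanced magnitude is recombined with the noisy phase before resynthesis or before feeding the CI processing chain.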
Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina
2014-12-01
In auditory-only conditions, for example when we listen to someone on the phone, it is essential to recognize quickly and accurately what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this, we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developing controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned from a video showing their face, and three others were learned in a matched control condition without a face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without a face. The ASD group lacked such a performance benefit; for the ASD group, auditory-only speech recognition was even worse for speakers known by face than for speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group, independent of whether the speakers were learned with or without a face. Two additional visual experiments showed that the ASD group performed worse in lip-reading, whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms, and indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.
Howell, Peter; Sackin, Stevie; Glenn, Kazan
2007-01-01
This program of work is intended to develop automatic recognition procedures to locate and assess stuttered dysfluencies. This article and the following one together develop and test recognizers for repetitions and prolongations. The automatic recognizers classify the speech in two stages: in the first, the speech is segmented, and in the second, the segments are categorized. The units that are segmented are words. Here, assessments by human judges on the speech of 12 children who stutter are described using a corresponding procedure. The accuracy of word boundary placement across judges, the categorization of the words as fluent, repetition, or prolongation, and the durations of the different fluency categories are reported. These measures allow reliable instances of repetitions and prolongations to be selected for training and assessing the recognizers in the subsequent paper. PMID:9328878
Visual Speech-Training Aid for the Deaf
NASA Technical Reports Server (NTRS)
Miller, Robert J.
1987-01-01
Teaching the deaf to speak is aided by an electronic system that provides a striking colored, pictorial representation of sound: energy at different frequencies as a function of time. Other modalities, such as nasality, intra-oral pressure, and lip-muscle contraction, are pictorialized simultaneously. Use of standard components, including a personal microcomputer, helps reduce cost below prior voice-training systems. In the speech-training system, microphone output is separated by filters into narrow frequency bands, changed into digital signals, formatted by the computer, and displayed on a television screen. Output from other sensors is displayed simultaneously, or the screen is split to allow the sound produced by the student to be compared with that of the teacher.
Speech entrainment enables patients with Broca’s aphasia to produce fluent speech
Hubbard, H. Isabel; Hudspeth, Sarah Grace; Holland, Audrey L.; Bonilha, Leonardo; Fromm, Davida; Rorden, Chris
2012-01-01
A distinguishing feature of Broca’s aphasia is non-fluent halting speech typically involving one to three words per utterance. Yet, despite such profound impairments, some patients can mimic audio-visual speech stimuli enabling them to produce fluent speech in real time. We call this effect ‘speech entrainment’ and reveal its neural mechanism as well as explore its usefulness as a treatment for speech production in Broca’s aphasia. In Experiment 1, 13 patients with Broca’s aphasia were tested in three conditions: (i) speech entrainment with audio-visual feedback where they attempted to mimic a speaker whose mouth was seen on an iPod screen; (ii) speech entrainment with audio-only feedback where patients mimicked heard speech; and (iii) spontaneous speech where patients spoke freely about assigned topics. The patients produced a greater variety of words using audio-visual feedback compared with audio-only feedback and spontaneous speech. No difference was found between audio-only feedback and spontaneous speech. In Experiment 2, 10 of the 13 patients included in Experiment 1 and 20 control subjects underwent functional magnetic resonance imaging to determine the neural mechanism that supports speech entrainment. Group results with patients and controls revealed greater bilateral cortical activation for speech produced during speech entrainment compared with spontaneous speech at the junction of the anterior insula and Brodmann area 47, in Brodmann area 37, and unilaterally in the left middle temporal gyrus and the dorsal portion of Broca’s area. Probabilistic white matter tracts constructed for these regions in the normal subjects revealed a structural network connected via the corpus callosum and ventral fibres through the extreme capsule. Unilateral areas were connected via the arcuate fasciculus. In Experiment 3, all patients included in Experiment 1 participated in a 6-week treatment phase using speech entrainment to improve speech production. Behavioural and functional magnetic resonance imaging data were collected before and after the treatment phase. Patients were able to produce a greater variety of words with and without speech entrainment at 1 and 6 weeks after training. Treatment-related decrease in cortical activation associated with speech entrainment was found in areas of the left posterior-inferior parietal lobe. We conclude that speech entrainment allows patients with Broca’s aphasia to double their speech output compared with spontaneous speech. Neuroimaging results suggest that speech entrainment allows patients to produce fluent speech by providing an external gating mechanism that yokes a ventral language network that encodes conceptual aspects of speech. Preliminary results suggest that training with speech entrainment improves speech production in Broca’s aphasia providing a potential therapeutic method for a disorder that has been shown to be particularly resistant to treatment. PMID:23250889
Revisiting the "enigma" of musicians with dyslexia: Auditory sequencing and speech abilities.
Zuk, Jennifer; Bishop-Liebler, Paula; Ozernov-Palchik, Ola; Moore, Emma; Overy, Katie; Welch, Graham; Gaab, Nadine
2017-04-01
Previous research has suggested a link between musical training and auditory processing skills. Musicians have shown enhanced perception of auditory features critical to both music and speech, suggesting that this link extends beyond basic auditory processing. It remains unclear to what extent musicians who also have dyslexia show these specialized abilities, considering the often-observed persistent deficits that coincide with reading impairments. The present study evaluated auditory sequencing and speech discrimination in 52 adults comprising musicians with dyslexia, nonmusicians with dyslexia, and typical musicians. An auditory sequencing task measuring perceptual acuity for tone sequences of increasing length was administered. Furthermore, subjects were asked to discriminate synthesized syllable continua varying in acoustic components of speech necessary for intraphonemic discrimination, which included spectral (formant frequency) and temporal (voice onset time [VOT] and amplitude envelope) features. Results indicate that musicians with dyslexia did not significantly differ from typical musicians and performed better than nonmusicians with dyslexia for auditory sequencing as well as for discrimination of spectral and VOT cues within syllable continua. However, typical musicians demonstrated superior performance relative to both groups with dyslexia for discrimination of syllables varying in amplitude information. These findings suggest a distinct profile of speech processing abilities in musicians with dyslexia, with specific weaknesses in discerning amplitude cues within speech. Because these difficulties seem to remain persistent in adults with dyslexia despite musical training, this study only partly supports the potential for musical training to enhance the auditory processing skills known to be crucial for literacy in individuals with dyslexia. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Speak on time! Effects of a musical rhythmic training on children with hearing loss.
Hidalgo, Céline; Falk, Simone; Schön, Daniele
2017-08-01
This study investigates temporal adaptation in speech interaction in children with normal hearing and in children with cochlear implants (CIs) and/or hearing aids (HAs). We also address the question of whether musical rhythmic training can improve these skills in children with hearing loss (HL). Children named pictures presented on the screen in alternation with a virtual partner. Alternation rate (fast or slow) and the temporal predictability (match vs mismatch of stress occurrences) were manipulated. One group of children with normal hearing (NH) and one with HL were tested. The latter group was tested twice: once after 30 min of speech therapy and once after 30 min of musical rhythmic training. Both groups of children (NH and with HL) can adjust their speech production to the rate of alternation of the virtual partner. Moreover, while children with normal hearing benefit from the temporal regularity of stress occurrences, children with HL become sensitive to this manipulation only after rhythmic training. Rhythmic training may help children with HL to structure the temporal flow of their verbal interactions. Copyright © 2017 Elsevier B.V. All rights reserved.
36 CFR 1192.61 - Public information system.
Code of Federal Regulations, 2011 CFR
2011-07-01
[Fragmentary excerpt] The recoverable portions of this regulation, part of the Architectural and Transportation Barriers Compliance Board's Americans with Disabilities Act (ADA) Accessibility Guidelines for Transportation Vehicles, concern a public information system using transportation system personnel or recorded or digitized human speech messages to announce stations, trains, and routes and to provide other passenger information.
Speech Acquisition and Automatic Speech Recognition for Integrated Spacesuit Audio Systems
NASA Technical Reports Server (NTRS)
Huang, Yiteng; Chen, Jingdong; Chen, Shaoyan
2010-01-01
A voice-command human-machine interface system has been developed for spacesuit extravehicular activity (EVA) missions. A multichannel acoustic signal processing method has been created for distant speech acquisition in noisy and reverberant environments. This technology reduces noise by exploiting differences in the statistical nature of signal (i.e., speech) and noise that exist in the spatial and temporal domains. As a result, the automatic speech recognition (ASR) accuracy can be improved to the level at which crewmembers would find the speech interface useful. The developed speech human/machine interface will enable both crewmember usability and operational efficiency. It offers a fast rate of data/text entry, a small overall size, and light weight, and it frees the hands and eyes of a suited crewmember. The system components and steps include beamforming/multichannel noise reduction, single-channel noise reduction, speech feature extraction, feature transformation and normalization, feature compression, model adaptation, ASR HMM (hidden Markov model) training, and ASR decoding. A state-of-the-art phoneme recognizer can obtain an accuracy rate of 65 percent when the training and testing data are free of noise. When it is used in spacesuits, the rate drops to about 33 percent. With the developed microphone array speech-processing technologies, the performance is improved and the phoneme recognition accuracy rate rises to 44 percent. The recognizer can be further improved by combining the microphone array and HMM model adaptation techniques and by using speech samples collected from inside spacesuits. In addition, arithmetic complexity models for the major HMM-based ASR components were developed; they can help real-time ASR system designers select proper tasks when facing constraints in computational resources.
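The first pipeline stage, beamforming/multichannel noise reduction, can be illustrated with the simplest variant, delay-and-sum: channels are time-aligned toward the talker and averaged, so coherent speech adds constructively while spatially diffuse noise partially cancels. This is a generic sketch, not the system's actual beamformer, and delay estimation (e.g., by cross-correlation) is omitted.

```python
# Delay-and-sum beamforming sketch over C microphone channels.
import numpy as np

def delay_and_sum(channels, delays_samples):
    """channels: (C, N) microphone signals; delays_samples: (C,) integer delays
    that time-align each channel to the look direction."""
    C, N = channels.shape
    out = np.zeros(N)
    for c in range(C):
        # advance each channel by its delay; np.roll wraps at the edges, which
        # is acceptable for a sketch but would be zero-padded in practice
        out += np.roll(channels[c], -int(delays_samples[c]))
    return out / C
```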
ERIC Educational Resources Information Center
Wray, Denise; Flexer, Carol
2010-01-01
A collaborative team of faculty from The University of Akron (UA) in Akron, Ohio, and Kent State University (KSU) in Kent, Ohio, were awarded a federal grant from the U.S. Department of Education to develop a specialty area in the graduate speech-language pathology (SLP) programs of UA and KSU that would train a total of 32 SLP students (trainees)…
Schultz-Coulon, H J; Borghorst, U
1982-03-01
Acoustic signals of low frequencies can be perceived by the tactile sense as vibrations, to a limited extent. In educating deaf children, efforts are made to combine tactile and visual speech perception in order to improve speech discrimination. The question of this study was whether an improvement in tactile discrimination can be achieved even by patients with late-acquired deafness. In a 45-year-old female patient who had become deaf after adolescence, the tactile discrimination of instrumental sounds (electric organ) within the frequency range c3-c4 (131-262 cps), as well as of speech sounds (30 mono- and multisyllabic words), was trained by means of the SIEMENS Fonator. After two training courses of ten 45-minute sessions each, the patient was not only able to recognize pitch differences of two half steps and more, as well as the tones of the scale, with only a few errors, but could also identify the words to a high percentage: she reached an identification rate of 75.6% for monosyllables and 85% for trisyllabic words. Additionally, a marked improvement in speech discrimination by lip reading was observed when using the Fonator. Accordingly, even in patients with late-acquired deafness it appears worthwhile to train the tactile discrimination of vibration stimuli to support lip reading.
ERIC Educational Resources Information Center
Brown, Paula M.; Quenin, Cathy
2010-01-01
The specialty preparation program within the speech-language pathology master's degree program at Nazareth College in Rochester, New York, was designed to train speech-language pathologists to work with children who are deaf and hard of hearing, ages 0 to 21. The program is offered in collaboration with the Rochester Institute of Technology,…
Auditory Training: Evidence for Neural Plasticity in Older Adults
Anderson, Samira; Kraus, Nina
2014-01-01
Improvements in digital amplification, cochlear implants, and other innovations have extended the potential for improving hearing function; yet, there remains a need for further hearing improvement in challenging listening situations, such as when trying to understand speech in noise or when listening to music. Here, we review evidence from animal and human models of plasticity in the brain's ability to process speech and other meaningful stimuli. We considered studies targeting populations of younger through older adults, emphasizing studies that have employed randomized controlled designs and have made connections between neural and behavioral changes. Overall results indicate that the brain remains malleable through older adulthood, provided that treatment algorithms have been modified to allow for changes in learning with age. Improvements in speech-in-noise perception and cognitive function accompany neural changes in auditory processing. The training-related improvements noted across studies support the need to consider auditory training strategies in the management of individuals who express concerns about hearing in difficult listening situations. Given evidence from studies engaging the brain's reward centers, future research should consider how these centers can be naturally activated during training. PMID:25485037
Naval Computer-Based Instruction: Cost, Implementation and Effectiveness Issues.
1988-03-01
[Fragmentary record] The recoverable fragments describe intelligent tutoring systems as a logical follow-on to MITIPAC and an attempt to apply artificial intelligence (AI) techniques to computer-based training, including Steamer, a tutor for the principles of steam plant operation and maintenance written in LISP on a LISP machine. The fragments also cite "Artificial Intelligence and Speech Technology", Electronic Learning, September 1987, and name Montague, William E., code 5, Navy Personnel Research...
Yu, Jyaehyoung; Jeon, Hanjae; Song, Changgeun; Han, Woojae
2017-01-01
The goal of the present study was to develop an auditory training program using a mobile device and to test its efficacy by applying it to older adults suffering from moderate-to-severe sensorineural hearing loss. Among the 20 elderly hearing-impaired listeners who participated, 10 were randomly assigned to a training group (TG) and 10 to a non-training group (NTG) as a control. As a baseline, all participants were measured with vowel, consonant, and sentence tests. In the experiment, the TG trained for 4 weeks using a mobile program consisting of 10 Korean nonsense syllables across four levels, with each level completed in 1 week; traditional auditory training was provided for the NTG during the same period. To evaluate whether a training effect was achieved, both groups repeated the baseline tests after completing the experiment. The results showed that performance on the consonant and sentence tests increased significantly in the TG compared with the NTG, and the improved speech perception scores were retained 2 weeks after the training was completed. However, vowel scores did not change after the 4-week training in either the TG or the NTG. This result pattern suggests that a moderate amount of auditory training using a mobile device, which is cost-effective and requires minimal supervision, is useful for improving the speech understanding of older adults with hearing loss. Geriatr Gerontol Int 2017; 17: 61-68. © 2015 Japan Geriatrics Society.
Silva, Regiane Serafim Abreu; Simões-Zenari, Marcia; Nemr, Nair Kátia
2012-01-01
To analyze the impact of auditory training on the auditory-perceptual assessments carried out by speech-language pathology undergraduate students. Over two semesters, 17 undergraduate students enrolled in theoretical courses on phonation (Phonation/Phonation Disorders) analyzed samples of altered and unaltered voices (selected for this purpose) using the GRBAS scale. All subjects received auditory training over nine 15-minute meetings. In each meeting, a different parameter was presented using the sample of different voices, with the trained aspect predominating in each session. Assessment of the sample using the scale was carried out before and after training, and on four other occasions throughout the meetings. The students' assessments were compared with an assessment carried out by three voice-expert speech-language pathologists, who served as the judges. To verify training effectiveness, Friedman's test and the kappa index were used. The rate of correct answers before training was considered between regular and good. The number of correct answers was maintained throughout the assessments for most of the scale parameters. After training, the students showed improvements in the analysis of asthenia, a parameter that was emphasized during training after the students reported difficulties analyzing it. There was a decrease in the number of correct answers for the roughness parameter after it was approached as segmented into hoarseness and harshness and observed in association with different diagnoses and acoustic parameters. Auditory training enhances students' initial abilities to perform the evaluation, besides guiding adjustments in the dynamics of the university course.
Perceptual restoration of degraded speech is preserved with advancing age.
Saija, Jefta D; Akyürek, Elkan G; Andringa, Tjeerd C; Başkent, Deniz
2014-02-01
Cognitive skills, such as processing speed, memory functioning, and the ability to divide attention, are known to diminish with aging. The present study shows that, despite these changes, older adults can successfully compensate for degradations in speech perception. Critically, the older participants of this study were not pre-selected for high performance on cognitive tasks, but only screened for normal hearing. We measured the compensation for speech degradation using phonemic restoration, where intelligibility of degraded speech is enhanced using top-down repair mechanisms. Linguistic knowledge, Gestalt principles of perception, and expectations based on situational and linguistic context are used to effectively fill in the inaudible masked speech portions. A positive compensation effect was previously observed only with young normal hearing people, but not with older hearing-impaired populations, leaving the question whether the lack of compensation was due to aging or due to age-related hearing problems. Older participants in the present study showed poorer intelligibility of degraded speech than the younger group, as expected from previous reports of aging effects. However, in conditions that induce top-down restoration, a robust compensation was observed. Speech perception by the older group was enhanced, and the enhancement effect was similar to that observed with the younger group. This effect was even stronger with slowed-down speech, which gives more time for cognitive processing. Based on previous research, the likely explanations for these observations are that older adults can overcome age-related cognitive deterioration by relying on linguistic skills and vocabulary that they have accumulated over their lifetime. Alternatively, or simultaneously, they may use different cerebral activation patterns or exert more mental effort. This positive finding on top-down restoration skills by the older individuals suggests that new cognitive training methods can teach older adults to effectively use compensatory mechanisms to cope with the complex listening environments of everyday life.
Chang, Ming; Iizuka, Hiroyuki; Kashioka, Hideki; Naruse, Yasushi; Furukawa, Masahiro; Ando, Hideyuki; Maeda, Taro
2017-01-01
When people learn foreign languages, they find it difficult to perceive speech sounds that are nonexistent in their native language, and extensive training is consequently necessary. Our previous studies have shown that by using neurofeedback based on the mismatch negativity event-related brain potential, participants could unconsciously achieve learning in the auditory discrimination of pure tones that could not be consciously discriminated without the neurofeedback. Here, we examined whether mismatch negativity neurofeedback is effective for helping someone to perceive new speech sounds in foreign language learning. We developed a task for training native Japanese speakers to discriminate between 'l' and 'r' sounds in English, as they usually cannot discriminate between these two sounds. Without participants attending to auditory stimuli or being aware of the nature of the experiment, neurofeedback training helped them to achieve significant improvement in unconscious auditory discrimination and recognition of the target words 'light' and 'right'. There was also improvement in the recognition of other words containing 'l' and 'r' (e.g., 'blight' and 'bright'), even though these words had not been presented during training. This method could be used to facilitate foreign language learning and can be extended to other fields of auditory and clinical research and even other senses.
Voiceprints: Hearing for Those Who Can't?
ERIC Educational Resources Information Center
Science News, 1979
1979-01-01
An electrical engineer and phoneticist has taught himself to read voiceprints, spectrograms that display a plot of the sound frequency of speech against time. This technique may prove useful in training the deaf to read voiceprints or to mimic natural speech. (BB)
Automatic intelligibility classification of sentence-level pathological speech
Kim, Jangwon; Kumar, Naveen; Tsiartas, Andreas; Li, Ming; Narayanan, Shrikanth S.
2014-01-01
Pathological speech usually refers to the condition of speech distortion resulting from atypicalities in voice and/or in the articulatory mechanisms owing to disease, illness or other physical or biological insult to the production system. Although automatic evaluation of speech intelligibility and quality could come in handy in these scenarios to assist experts in diagnosis and treatment design, the many sources and types of variability often make it a very challenging computational processing problem. In this work we propose novel sentence-level features to capture abnormal variation in the prosodic, voice quality and pronunciation aspects in pathological speech. In addition, we propose a post-classification posterior smoothing scheme which refines the posterior of a test sample based on the posteriors of other test samples. Finally, we perform feature-level fusions and subsystem decision fusion for arriving at a final intelligibility decision. The performances are tested on two pathological speech datasets, the NKI CCRT Speech Corpus (advanced head and neck cancer) and the TORGO database (cerebral palsy or amyotrophic lateral sclerosis), by evaluating classification accuracy without overlapping subjects’ data among training and test partitions. Results show that the feature sets of each of the voice quality subsystem, prosodic subsystem, and pronunciation subsystem, offer significant discriminating power for binary intelligibility classification. We observe that the proposed posterior smoothing in the acoustic space can further reduce classification errors. The smoothed posterior score fusion of subsystems shows the best classification performance (73.5% for unweighted, and 72.8% for weighted, average recalls of the binary classes). PMID:25414544
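The post-classification posterior smoothing step lends itself to a compact illustration. The sketch below assumes one plausible reading of the scheme, averaging each test sample's posterior with those of its nearest neighbors in the acoustic feature space; the function name, the k-nearest-neighbor rule, and the mixing weight are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def smooth_posteriors(features, posteriors, k=5, alpha=0.5):
    """Refine each test sample's class posterior using the posteriors of its
    k nearest neighbours in the acoustic feature space (illustrative sketch).

    features:   (n_samples, n_dims) acoustic features of the test samples
    posteriors: (n_samples, n_classes) classifier posteriors
    alpha:      weight kept on the sample's own posterior
    """
    smoothed = np.empty_like(posteriors)
    for i in range(len(features)):
        dists = np.linalg.norm(features - features[i], axis=1)
        dists[i] = np.inf                       # exclude the sample itself
        neighbours = np.argsort(dists)[:k]      # k closest other test samples
        smoothed[i] = (alpha * posteriors[i]
                       + (1 - alpha) * posteriors[neighbours].mean(axis=0))
    return smoothed / smoothed.sum(axis=1, keepdims=True)  # renormalise rows
```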
Establishing a Public School Dysphagia Program: A Model for Administration and Service Provision
ERIC Educational Resources Information Center
Homer, Emily M.
2008-01-01
Purpose: Many school-based speech-language pathologists (SLPs) are hampered in participating in managing children with dysphagia by their school systems' lack of supportive policies and procedures. A need exists to better define the dysphagia-trained SLP's role and clarify the district's responsibility. The purpose of this article is to address…
ERIC Educational Resources Information Center
Treurniet, William
A study applied artificial neural networks, trained with the back-propagation learning algorithm, to modelling phonemes extracted from the DARPA TIMIT multi-speaker, continuous speech data base. A number of proposed network architectures were applied to the phoneme classification task, ranging from the simple feedforward multilayer network to more…
Goldrick, Matthew; Keshet, Joseph; Gustafson, Erin; Heller, Jordana; Needle, Jeremy
2016-04-01
Traces of the cognitive mechanisms underlying speaking can be found within subtle variations in how we pronounce sounds. While speech errors have traditionally been seen as categorical substitutions of one sound for another, acoustic/articulatory analyses show they partially reflect the intended sound. When "pig" is mispronounced as "big," the resulting /b/ sound differs from correct productions of "big," moving towards intended "pig"-revealing the role of graded sound representations in speech production. Investigating the origins of such phenomena requires detailed estimation of speech sound distributions; this has been hampered by reliance on subjective, labor-intensive manual annotation. Computational methods can address these issues by providing for objective, automatic measurements. We develop a novel high-precision computational approach, based on a set of machine learning algorithms, for measurement of elicited speech. The algorithms are trained on existing manually labeled data to detect and locate linguistically relevant acoustic properties with high accuracy. Our approach is robust, is designed to handle mis-productions, and overall matches the performance of expert coders. It allows us to analyze a very large dataset of speech errors (containing far more errors than the total in the existing literature), illuminating properties of speech sound distributions previously impossible to reliably observe. We argue that this provides novel evidence that two sources both contribute to deviations in speech errors: planning processes specifying the targets of articulation and articulatory processes specifying the motor movements that execute this plan. These findings illustrate how a much richer picture of speech provides an opportunity to gain novel insights into language processing. Copyright © 2016 Elsevier B.V. All rights reserved.
Connected word recognition using a cascaded neuro-computational model
NASA Astrophysics Data System (ADS)
Hoya, Tetsuya; van Leeuwen, Cees
2016-10-01
We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.
49 CFR 382.603 - Training for supervisors.
Code of Federal Regulations, 2010 CFR
2010-10-01
... minutes of training on controlled substances use. The training will be used by the supervisors to determine whether reasonable suspicion exists to require a driver to undergo testing under § 382.307. The training shall include the physical, behavioral, speech, and performance indicators of probable alcohol...
Sadakata, Makiko; McQueen, James M
2013-08-01
This study reports effects of a high-variability training procedure on nonnative learning of a Japanese geminate-singleton fricative contrast. Thirty native speakers of Dutch took part in a 5-day training procedure in which they identified geminate and singleton variants of the Japanese fricative /s/. Participants were trained with either many repetitions of a limited set of words recorded by a single speaker (low-variability training) or with fewer repetitions of a more variable set of words recorded by multiple speakers (high-variability training). Both types of training enhanced identification of speech but not of nonspeech materials, indicating that learning was domain specific. High-variability training led to superior performance in identification but not in discrimination tests, and supported better generalization of learning as shown by transfer from the trained fricatives to the identification of untrained stops and affricates. Variability thus helps nonnative listeners to form abstract categories rather than to enhance early acoustic analysis.
Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.
Gebru, Israel D; Ba, Sileye; Li, Xiaofei; Horaud, Radu
2018-05-01
Speaker diarization consists of assigning speech signals to people engaged in a dialogue. An audio-visual spatiotemporal diarization model is proposed. The model is well suited for challenging scenarios that consist of several participants engaged in multi-party interaction while they move around and turn their heads towards the other participants rather than facing the cameras and the microphones. Multiple-person visual tracking is combined with multiple speech-source localization in order to tackle the speech-to-person association problem. The latter is solved within a novel audio-visual fusion method on the following grounds: binaural spectral features are first extracted from a microphone pair, then a supervised audio-visual alignment technique maps these features onto an image, and finally a semi-supervised clustering method assigns binaural spectral features to visible persons. The main advantage of this method over previous work is that it processes, in a principled way, speech signals uttered simultaneously by multiple persons. The diarization itself is cast into a latent-variable temporal graphical model that infers speaker identities and speech turns, based on the output of an audio-visual association process, executed at each time slice, and on the dynamics of the diarization variable itself. The proposed formulation yields an efficient exact inference procedure. A novel dataset, which contains audio-visual training data as well as a number of scenarios involving several participants engaged in formal and informal dialogue, is introduced. The proposed method is thoroughly tested and benchmarked with respect to several state-of-the-art diarization algorithms.
The Asia Pacific Rebalance: Tipping the Scale with Landpower
2013-04-01
Citation fragments recovered from the source document: Defense.gov Speech: Shangri-La Security Dialogue, as delivered by Secretary of Defense Leon E. Panetta, Shangri-La Hotel, Singapore, June 02, 2012, linked from the U.S. Department of Defense web site at http://www.defense.gov/speeches/speech.aspx...; Training and Doctrine Command, "Operational Environments to 2028: The Strategic Environment for..."
Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav
2018-04-01
A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
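At test time, the SNR-matched modelling idea reduces to picking the pre-trained model whose training SNR is closest to the estimated SNR of the input. A minimal sketch, assuming a hypothetical dictionary of models keyed by their training SNR in dB:

```python
def select_model(models_by_snr, estimated_snr_db):
    """Return the model trained at the SNR level closest to the estimate.
    models_by_snr maps training SNR (dB) to a pre-trained model object."""
    best_snr = min(models_by_snr, key=lambda snr: abs(snr - estimated_snr_db))
    return models_by_snr[best_snr]

# Usage with a lattice of models trained at -10, 0, +10, and +20 dB:
#   model = select_model({-10: m_n10, 0: m_0, 10: m_10, 20: m_20}, 3.2)
```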
Speech Enhancement Using Gaussian Scale Mixture Models
Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.
2011-01-01
This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows these to be treated as two random variables, both to be estimated from the noisy signals. Expectation-maximization (EM) was used to train the GSMM and Bayesian inference was used to compute the posterior signal distribution. Because exact inference of this full probabilistic model is computationally intractable, we developed two approaches to enhance the efficiency: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided higher signal-to-noise ratio (SNR) and those reconstructed from the estimated log-spectra produced lower word recognition error rate because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress. PMID:21359139
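The generative model just described can be summarized in a few lines. The sketch below samples from a Gaussian scale mixture as defined in the abstract: a log-spectrum drawn from a GMM acts as the scaling factor, and the frequency coefficient is zero-mean Gaussian with variance equal to the exponential of that log-spectrum. Parameter values are illustrative; the EM training and the Laplace/variational inference steps are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_gsmm(weights, means, variances, n):
    """Draw n frequency coefficients x from the GSMM:
    s ~ GMM(weights, means, variances) in the log-spectral domain,
    then x | s ~ N(0, exp(s))."""
    comp = rng.choice(len(weights), size=n, p=weights)     # mixture component
    s = rng.normal(means[comp], np.sqrt(variances[comp]))  # log-spectra
    x = rng.normal(0.0, np.sqrt(np.exp(s)))                # variance = exp(s)
    return x, s

# Example: a two-component model of quiet and loud spectral states.
x, s = sample_gsmm(np.array([0.7, 0.3]), np.array([-2.0, 1.0]),
                   np.array([0.5, 0.5]), n=1000)
```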
Lockart, Rebekah; McLeod, Sharynne
2013-08-01
To investigate speech-language pathology students' ability to identify errors and transcribe typical and atypical speech in Cantonese, a nonnative language. Thirty-three English-speaking speech-language pathology students completed 3 tasks in an experimental within-subjects design. Task 1 (baseline) involved transcribing English words. In Task 2, students transcribed 25 words spoken by a Cantonese adult. An average of 59.1% consonants was transcribed correctly (72.9% when Cantonese-English transfer patterns were allowed). There was higher accuracy on shared English and Cantonese syllable-initial consonants /m,n,f,s,h,j,w,l/ and syllable-final consonants. In Task 3, students identified consonant errors and transcribed 100 words spoken by Cantonese-speaking children under 4 additive conditions: (1) baseline, (2) +adult model, (3) +information about Cantonese phonology, and (4) all variables (2 and 3 were counterbalanced). There was a significant improvement in the students' identification and transcription scores for conditions 2, 3, and 4, with a moderate effect size. Increased skill was not based on listeners' proficiency in speaking another language, perceived transcription skill, musicality, or confidence with multilingual clients. Speech-language pathology students, with no exposure to or specific training in Cantonese, have some skills to identify errors and transcribe Cantonese. Provision of a Cantonese-adult model and information about Cantonese phonology increased students' accuracy in transcribing Cantonese speech.
Relationships between music training, speech processing, and word learning: a network perspective.
Elmer, Stefan; Jäncke, Lutz
2018-03-15
Numerous studies have documented the behavioral advantages conferred on professional musicians and children undergoing music training in processing speech sounds varying in the spectral and temporal dimensions. These beneficial effects have previously often been associated with local functional and structural changes in the auditory cortex (AC). However, this perspective is oversimplified, in that it does not take into account the intrinsic organization of the human brain, namely, neural networks and oscillatory dynamics. Therefore, we propose a new framework for extending these previous findings to a network perspective by integrating multimodal imaging, electrophysiology, and neural oscillations. In particular, we provide concrete examples of how functional and structural connectivity can be used to model simple neural circuits exerting a modulatory influence on AC activity. In addition, we describe how such a network approach can be used for better comprehending the beneficial effects of music training on more complex speech functions, such as word learning. © 2018 New York Academy of Sciences.
Sleep duration predicts behavioral and neural differences in adult speech sound learning.
Earle, F Sayako; Landi, Nicole; Myers, Emily B
2017-01-01
Sleep is important for memory consolidation and contributes to the formation of new perceptual categories. This study examined sleep as a source of variability in typical learners' ability to form new speech sound categories. We trained monolingual English speakers to identify a set of non-native speech sounds at 8PM, and assessed their ability to identify and discriminate between these sounds immediately after training, and at 8AM on the following day. We tracked sleep duration overnight, and found that light sleep duration predicted gains in identification performance, while total sleep duration predicted gains in discrimination ability. Participants obtained an average of less than 6h of sleep, pointing to the degree of sleep deprivation as a potential factor. Behavioral measures were associated with ERP indexes of neural sensitivity to the learned contrast. These results demonstrate that the relative success in forming new perceptual categories depends on the duration of post-training sleep. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Lim, Sung-joo; Holt, Lori L
2011-01-01
Although speech categories are defined by multiple acoustic dimensions, some are perceptually weighted more than others and there are residual effects of native-language weightings in non-native speech perception. Recent research on nonlinguistic sound category learning suggests that the distribution characteristics of experienced sounds influence perceptual cue weights: Increasing variability across a dimension leads listeners to rely upon it less in subsequent category learning (Holt & Lotto, 2006). The present experiment investigated the implications of this among native Japanese learning English /r/-/l/ categories. Training was accomplished using a videogame paradigm that emphasizes associations among sound categories, visual information, and players' responses to videogame characters rather than overt categorization or explicit feedback. Subjects who played the game for 2.5h across 5 days exhibited improvements in /r/-/l/ perception on par with 2-4 weeks of explicit categorization training in previous research and exhibited a shift toward more native-like perceptual cue weights. Copyright © 2011 Cognitive Science Society, Inc.
Gagnon, Bernadine; Miozzo, Michele
2017-01-01
Purpose This study aimed to test whether an approach to distinguishing errors arising in phonological processing from those arising in motor planning also predicts the extent to which repetition-based training can lead to improved production of difficult sound sequences. Method Four individuals with acquired speech production impairment who produced consonant cluster errors involving deletion were examined using a repetition task. We compared the acoustic details of productions with deletion errors in target consonant clusters to singleton consonants. Changes in accuracy over the course of the study were also compared. Results Two individuals produced deletion errors consistent with a phonological locus of the errors, and 2 individuals produced errors consistent with a motoric locus of the errors. The 2 individuals who made phonologically driven errors showed no change in performance on a repetition training task, whereas the 2 individuals with motoric errors improved in their production of both trained and untrained items. Conclusions The results extend previous findings about a metric for identifying the source of sound production errors in individuals with both apraxia of speech and aphasia. In particular, this work may provide a tool for identifying predominant error types in individuals with complex deficits. PMID:28655044
Collaboration between human and nonhuman players in Night Vision Tactical Trainer-Shadow
NASA Astrophysics Data System (ADS)
Berglie, Stephen T.; Gallogly, James J.
2016-05-01
The Night Vision Tactical Trainer - Shadow (NVTT-S) is a U.S. Army-developed training tool designed to improve critical Manned-Unmanned Teaming (MUMT) communication skills for payload operators in Unmanned Aerial Sensor (UAS) crews. The trainer is composed of several Government Off-The-Shelf (GOTS) simulation components and takes the trainee through a series of escalating engagements using tactically relevant, realistically complex scenarios involving a variety of manned, unmanned, aerial, and ground-based assets. The trainee is the only human player in the game and he must collaborate, from his web-based mock operating station, with various non-human players via spoken natural language over simulated radio in order to execute the training missions successfully. Non-human players are modeled in two complementary layers: OneSAF provides basic background behaviors for entities while NVTT provides higher level models that control entity actions based on intent extracted from the trainee's spoken natural dialog with game entities. Dialog structure is modeled based on Army standards for communication and verbal protocols. This paper presents an architecture that integrates the U.S. Army's Night Vision Image Generator (NVIG), One Semi-Automated Forces (OneSAF), a flight dynamics model, as well as Commercial Off The Shelf (COTS) speech recognition and text-to-speech products to effect an environment with sufficient entity counts and fidelity to enable meaningful teaching and reinforcement of critical communication skills. It further demonstrates the model dynamics and synchronization mechanisms employed to execute purpose-built training scenarios, and to achieve ad-hoc collaboration on-the-fly between human and non-human players in the simulated environment.
Konig, Alexandra; Satt, Aharon; Sorin, Alex; Hoory, Ran; Derreumaux, Alexandre; David, Renaud; Robert, Phillippe H
2018-01-01
Various types of dementia and Mild Cognitive Impairment (MCI) are manifested as irregularities in human speech and language, which have proven to be strong predictors of disease presence and progression. Therefore, automatic speech analytics provided by a mobile application may be a useful tool in providing additional indicators for assessment and detection of early stage dementia and MCI. 165 participants (subjects with subjective cognitive impairment (SCI), MCI patients, Alzheimer's disease (AD) and mixed dementia (MD) patients) were recorded with a mobile application while performing several short vocal cognitive tasks during a regular consultation. These tasks included verbal fluency, picture description, counting down and a free speech task. The voice recordings were processed in two steps: in the first step, vocal markers were extracted using speech signal processing techniques; in the second, the vocal markers were tested to assess their 'power' to distinguish between SCI, MCI, AD and MD. The second step included training automatic classifiers for detecting MCI and AD, based on machine learning methods, and testing the detection accuracy. The fluency and free speech tasks obtained the highest accuracy rates in classifying AD vs. MD vs. MCI vs. SCI. Using the data, we demonstrated classification accuracy as follows: SCI vs. AD = 92% accuracy; SCI vs. MD = 92% accuracy; SCI vs. MCI = 86% accuracy and MCI vs. AD = 86%. Our results indicate the potential value of vocal analytics and the use of a mobile application for accurate automatic differentiation between SCI, MCI and AD. This tool can provide the clinician with meaningful information for assessment and monitoring of people with MCI and AD based on a non-invasive, simple and low-cost method. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
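The second processing step, training classifiers on the extracted vocal markers, can be sketched generically. The snippet below assumes scikit-learn and a precomputed marker matrix; the marker set, model choice, and cross-validation setup are illustrative rather than the study's actual pipeline.

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pairwise_accuracy(X, y):
    """Cross-validated accuracy for one pairwise contrast (e.g. SCI vs. AD).

    X: (n_participants, n_markers) vocal markers from the recorded tasks
       (e.g. pause rate, speech tempo); y: binary diagnostic labels.
    """
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return cross_val_score(clf, X, y, cv=5).mean()
```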
Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.
2011-01-01
Previous research has shown that familiarity with a talker’s voice can improve linguistic processing (herein, “Familiar Talker Advantage”), but this benefit is constrained by the context in which the talker’s voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers’ voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers’ voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. PMID:22225059
Oliveira Barrichelo, V M; Heuer, R J; Dean, C M; Sataloff, R T
2001-09-01
Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.
Randomized clinical trial: the use of SpeechEasy® in stuttering treatment.
Ritto, Ana Paula; Juste, Fabiola Staróbole; Stuart, Andrew; Kalinowski, Joseph; de Andrade, Claudia Regina Furquim
2016-11-01
Numerous studies have demonstrated the benefit of devices delivering altered auditory feedback (AAF) as a therapeutic alternative for those who stutter. The effectiveness of a device delivering AAF (SpeechEasy®) was compared with behavioural techniques in the treatment of stuttering in a randomized clinical trial. Two groups of adults who stutter participated: group 1 consisted of 10 men and one woman aged 21-42 years (mean = 30.0). Group 2 consisted of six men and one woman aged 20-50 years (mean = 35.6). Participants in group 1 were fit with a SpeechEasy® device and were not given any additional training (i.e., supplementary fluency enhancing techniques). Participants used the device daily for 6 months. Participants in group 2 received treatment in the form of a 12-week fluency promotion protocol with techniques based on both fluency shaping and stuttering modification. There were no statistically significant differences (p > .05) between groups in participants' stuttered syllables following treatment. That is, both therapeutic protocols achieved approximately 40% reduction in number of stuttered syllables from baseline measures, with no significant relapse after 3 or 6 months post-treatment. The results suggest that the SpeechEasy® device can be a viable option for the treatment of stuttering. © 2016 Royal College of Speech and Language Therapists.
Two Stage Data Augmentation for Low Resourced Speech Recognition (Author’s Manuscript)
2016-09-12
Keywords: speech recognition, deep neural networks, data augmentation. Introduction (fragment): "When training data is limited—whether it be audio or text—the obvious..." The remainder of this record consists of truncated reference fragments.
[Early auditory training of children with auditory deficiencies].
Herman, N
1988-01-01
The author stresses the importance of early diagnosis and hearing training for the young deaf child, and presents some of the new possibilities that technology offers for approaching the deaf child through hearing and speech training.
Akanuma, Kyoko; Meguro, Kenichi; Satoh, Masayuki; Tashiro, Manabu; Itoh, Masatoshi
2016-01-01
Clinically, we know that some aphasic patients can sing well despite their speech disturbances. Herein, we report 10 patients with non-fluent aphasia, half of whom improved their speech function after singing training. We studied 10 patients with non-fluent aphasia complaining of difficulty finding words. All had lesions in the left basal ganglia or temporal lobe. They selected melodies they knew well, but which they could not sing. We wrote new lyrics for a familiar melody using words they could not name. The singing training using these new lyrics was performed for 30 minutes once a week for 10 weeks. Before and after the training, their speech functions were assessed by language tests. At baseline, 6 of them received positron emission tomography to evaluate glucose metabolism. Five patients exhibited improvements after intervention; all but one exhibited intact right basal ganglia and left temporal lobes, but all exhibited left basal ganglia lesions. Among them, three subjects exhibited preserved glucose metabolism in the right temporal lobe. We concluded that intact right basal ganglia and left temporal lobes, together with preserved right-hemispheric glucose metabolism, might be an indicator of the effectiveness of singing therapy.
Putter-Katz, Hanna; Adi-Bensaid, Limor; Feldman, Irit; Hildesheimer, Minka
2008-01-01
Twenty children with central auditory processing disorders [(C)APD] were subjected to a structured intervention program of listening skills in quiet and in noise. Their performance was compared to that of a control group of 10 children with (C)APD with no special treatment. Pretests were conducted in quiet and in degraded listening conditions (speech noise and competing speech). The (C)APD management approach was integrative and included top-down and bottom-up strategies. It focused on environmental modifications, remediation techniques, and compensatory strategies. Training was conducted with monosyllabic and polysyllabic words, sentences and phrases in quiet and in noise. Comparisons of pre- and post-management measures indicated increase in speech recognition performance in background noise and competing speech for the treatment group. This improvement was exhibited for both ears. Following intervention, a significant difference between ears was found: the left ear showed improvement in both the short and long versions of the competing-sentence tests, whereas the right ear performed better only in the long competing sentences. No changes were documented for the control group. These findings add to a growing body of literature suggesting that interactive auditory training can improve listening skills.
Miles, Anna; Friary, Philippa; Jackson, Bianca; Sekula, Julia; Braakhuis, Andrea
2016-06-01
This study evaluated hospital readiness and interprofessional clinical reasoning in speech-language pathology and dietetics students following a simulation-based teaching package. Thirty-one students participated in two half-day simulation workshops. The training included orientation to the hospital setting, part-task skill learning and immersive simulated cases. Students completed workshop evaluation forms. They filled in a 10-question survey regarding confidence, knowledge and preparedness for working in a hospital environment before and immediately after the workshops. Students completed written 15-min clinical vignettes at 1 month prior to training, immediately prior to training and immediately after training. A marking rubric was devised to evaluate the responses to the clinical vignettes within a framework of interprofessional education. The simulation workshops were well received by all students. There was a significant increase in students' self-ratings of confidence, preparedness and knowledge following the study day (p < .001). There was a significant increase in student overall scores in clinical vignettes after training with the greatest increase in clinical reasoning (p < .001). Interprofessional simulation-based training has benefits in developing hospital readiness and clinical reasoning in allied health students.
Consolidation and transfer of learning after observing hand gesture.
Cook, Susan Wagner; Duffy, Ryan G; Fenn, Kimberly M
2013-01-01
Children who observe gesture while learning mathematics perform better than children who do not, when tested immediately after training. How does observing gesture influence learning over time? Children (n = 184, ages = 7-10) were instructed with a videotaped lesson on mathematical equivalence and tested immediately after training and 24 hr later. The lesson either included speech and gesture or only speech. Children who saw gesture performed better overall and performance improved after 24 hr. Children who only heard speech did not improve after the delay. The gesture group also showed stronger transfer to different problem types. These findings suggest that gesture enhances learning of abstract concepts and affects how learning is consolidated over time. © 2013 The Authors. Child Development © 2013 Society for Research in Child Development, Inc.
ERIC Educational Resources Information Center
Lennox, Maria; Garvis, Susanne; Westerveld, Marleen
2017-01-01
This paper explores teachers' and teacher assistants' self-efficacy of delivering PrepSTART, a classroom based, oral language and early literacy program for five-year-old students. In the current study, speech pathologists developed, provided training and monitored program implementation. Teachers and teacher assistants (n = 17) shared their…
ERIC Educational Resources Information Center
Brebner, Chris; Attrill, Stacie; Marsh, Claire; Coles, Lilienne
2017-01-01
Professional development can provide opportunities to develop new skills and knowledge, and to apply them to practice in a sustainable way. However, delivery of professional development needs to consider the philosophies and pedagogies of training recipients, and activities should be tailored to meet their needs. This article reports on an…
ERIC Educational Resources Information Center
Campbell-Thrane, Lucille
An overview of cooperation between CETA (Comprehensive Employment and Training Act) and vocational education is presented in this speech, including a look at data on legislation, history, and funding sources. In light of CETA legislation's specificity on how local sponsors are to work with vocational educators, the speech gives excerpts and…
A behavior analytic analogue of learning to use synonyms, syntax, and parts of speech.
Chase, Philip N; Ellenwood, David W; Madden, Gregory
2008-01-01
Matching-to-sample and sequence training procedures were used to develop responding to stimulus classes that were considered analogous to 3 aspects of verbal behavior: identifying synonyms and parts of speech, and using syntax. Matching-to-sample procedures were used to train 12 paired associates from among 24 stimuli. These pairs were analogous to synonyms. Then, sequence characteristics were trained to 6 of the stimuli. The result was the formation of 3 classes of 4 stimuli, with the classes controlling a sequence response analogous to a simple ordering syntax: first, second, and third. Matching-to-sample procedures were then used to add 4 stimuli to each class. These stimuli, without explicit sequence training, also began to control the same sequence responding as the other members of their class. Thus, three 8-member functionally equivalent sequence classes were formed. These classes were considered to be analogous to parts of speech. Further testing revealed three 8-member equivalence classes and 512 different sequences of first, second, and third. The study indicated that behavior analytic procedures may be used to produce some generative aspects of verbal behavior related to simple syntax and semantics.
Automated Speech Rate Measurement in Dysarthria.
Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc
2015-06-01
In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. The new algorithm was trained and tested using Dutch speech samples of 36 speakers with no history of speech impairment and 40 speakers with mild to moderate dysarthria. We tested the algorithm under various conditions: according to speech task type (sentence reading, passage reading, and storytelling) and algorithm optimization method (speaker group optimization and individual speaker optimization). Correlations between automated and human SR determination were calculated for each condition. High correlations between automated and human SR determination were found in the various testing conditions. The new algorithm measures SR in a sufficiently reliable manner. It is currently being integrated in a clinical software tool for assessing and managing prosody in dysarthric speech. Further research is needed to fine-tune the algorithm to severely dysarthric speech, to make the algorithm less sensitive to background noise, and to evaluate how the algorithm deals with syllabic consonants.
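The abstract does not spell out the algorithm itself, but a common baseline for automated SR measurement counts syllable nuclei as peaks of the smoothed intensity envelope. The sketch below is such a baseline under stated assumptions (mono float signal, thresholds chosen ad hoc); it is not the authors' algorithm.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def speech_rate(signal, fs, min_gap_s=0.15):
    """Estimate syllables per second by peak-picking the intensity envelope."""
    envelope = np.abs(np.asarray(signal, dtype=float))
    b, a = butter(2, 10 / (fs / 2))          # ~10 Hz low-pass smooths the envelope
    smooth = filtfilt(b, a, envelope)
    peaks, _ = find_peaks(smooth,
                          height=0.3 * smooth.max(),   # ignore low-energy ripples
                          distance=int(min_gap_s * fs))
    return len(peaks) / (len(signal) / fs)   # syllables per second
```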
Francis, Alexander L; Driscoll, Courtney
2006-09-01
We examined the effect of perceptual training on a well-established hemispheric asymmetry in speech processing. Eighteen listeners were trained to use a within-category difference in voice onset time (VOT) to cue talker identity. Successful learners (n=8) showed faster response times for stimuli presented only to the left ear than for those presented only to the right. The development of a left-ear/right-hemisphere advantage for processing a prototypically phonetic cue supports a model of speech perception in which lateralization is driven by functional demands (talker identification vs. phonetic categorization) rather than by acoustic stimulus properties alone.
DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1
NASA Astrophysics Data System (ADS)
Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.
1993-02-01
The Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus of read speech has been designed to provide speech data for the acquisition of acoustic-phonetic knowledge and for the development and evaluation of automatic speech recognition systems. TIMIT contains speech from 630 speakers representing 8 major dialect divisions of American English, each speaking 10 phonetically-rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions, as well as speech waveform data for each spoken sentence. The release of TIMIT contains several improvements over the Prototype CD-ROM released in December, 1988: (1) full 630-speaker corpus, (2) checked and corrected transcriptions, (3) word-alignment transcriptions, (4) NIST SPHERE-headered waveform files and header manipulation software, (5) phonemic dictionary, (6) new test and training subsets balanced for dialectal and phonetic coverage, and (7) more extensive documentation.
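The time-aligned transcriptions mentioned above are plain-text files: each line of a TIMIT .phn (phonetic) or .wrd (word) file holds a begin sample, an end sample, and a label, with audio sampled at 16 kHz. A minimal reader:

```python
def read_timit_alignment(path, sample_rate=16000):
    """Parse a TIMIT .phn or .wrd file into (start_s, end_s, label) triples."""
    segments = []
    with open(path) as f:
        for line in f:
            begin, end, label = line.split()
            segments.append((int(begin) / sample_rate,
                             int(end) / sample_rate, label))
    return segments

# Usage: read_timit_alignment("TRAIN/DR1/FCJF0/SA1.PHN")
```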
Speech-Language Pathologists' Opinions on Communication Disorders and Violence
ERIC Educational Resources Information Center
Sanger, Dixie; Moore-Brown, Barbara J.; Montgomery, Judy; Hellerich, Susan
2004-01-01
Purpose: This study investigated the opinions of speech-language pathologists (SLPs) regarding their role, education, and training in serving students with communication disorders who have been involved in violence. Method: A survey consisting of 26 items was given to 598 SLPs from eight states representing geographic regions of the United…
Effective Vocal Production in Performance.
ERIC Educational Resources Information Center
King, Robert G.
If speech instructors are to teach students to recreate for an audience an author's intellectual and emotional meanings, they must teach them to use human voice effectively. Seven essential elements of effective vocal production that often pose problems for oral interpretation students should be central to any speech training program: (1)…
Some Generalization and Follow-Up Measures on Autistic Children in Behavior Therapy.
ERIC Educational Resources Information Center
Lovaas, O. Ivar; And Others
Reported was a behavior therapy program emphasizing language training for 20 autistic children who variously exhibited apparent sensory deficit, severe affect isolation, self stimulatory behavior, mutism, echolalic speech, absence of receptive speech and social and self help behaviors, and self destructive tendencies. The treatment emphasized…
Dickinson, Ann-Marie; Baker, Richard; Siciliano, Catherine; Munro, Kevin J
2014-10-01
To identify which training approach, if any, is most effective for improving perception of frequency-compressed speech. A between-subject design using repeated measures. Forty young adults with normal hearing were randomly allocated to one of four groups: a training group (sentence or consonant) or a control group (passive exposure or test-only). Test and training material differed in terms of material and speaker. On average, sentence training and passive exposure led to significantly improved sentence recognition (11.0% and 11.7%, respectively) compared with the consonant training group (2.5%) and test-only group (0.4%), whilst consonant training led to significantly improved consonant recognition (8.8%) compared with the sentence training group (1.9%), passive exposure group (2.8%), and test-only group (0.8%). Sentence training led to improved sentence recognition, whilst consonant training led to improved consonant recognition. This suggests learning transferred between speakers and material but not stimuli. Passive exposure to sentence material led to an improvement in sentence recognition that was equivalent to gains from active training. This suggests that it may be possible to adapt passively to frequency-compressed speech.
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor
NASA Astrophysics Data System (ADS)
Heracleous, Panikos; Kaino, Tomomi; Saruwatari, Hiroshi; Shikano, Kiyohiro
2006-12-01
We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transformation, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved high word accuracy on a 20 k dictation task for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.
Intensive treatment of speech disorders in Robin sequence: a case report.
Pinto, Maria Daniela Borro; Pegoraro-Krook, Maria Inês; Andrade, Laura Katarine Félix de; Correa, Ana Paula Carvalho; Rosa-Lugo, Linda Iris; Dutka, Jeniffer de Cássia Rillo
2017-10-23
To describe the speech of a patient with Pierre Robin Sequence (PRS) and severe speech disorders before and after participating in an Intensive Speech Therapy Program (ISTP). The ISTP consisted of two daily sessions of therapy over a 36-week period, resulting in a total of 360 therapy sessions. The sessions included the phases of establishment, generalization, and maintenance. A combination of strategies, such as modified contrast therapy and speech sound perception training, were used to elicit adequate place of articulation. The ISTP addressed correction of place of production of oral consonants and maximization of movement of the pharyngeal walls with a speech bulb reduction program. Therapy targets were addressed at the phonetic level with a gradual increase in the complexity of the productions hierarchically (e.g., syllables, words, phrases, conversation) while simultaneously addressing the velopharyngeal hypodynamism with speech bulb reductions. Re-evaluation after the ISTP revealed normal speech resonance and articulation with the speech bulb. Nasoendoscopic assessment indicated consistent velopharyngeal closure for all oral sounds with the speech bulb in place. Intensive speech therapy, combined with the use of the speech bulb, yielded positive outcomes in the rehabilitation of a clinical case with severe speech disorders associated with velopharyngeal dysfunction in Pierre Robin Sequence.
A novel method of language modeling for automatic captioning in TC video teleconferencing.
Zhang, Xiaojia; Zhao, Yunxin; Schopp, Laura
2007-05-01
We are developing an automatic captioning system for teleconsultation video teleconferencing (TC-VTC) in telemedicine, based on large vocabulary conversational speech recognition. In TC-VTC, doctors' speech contains a large number of infrequently used medical terms in spontaneous styles. Due to insufficiency of data, we adopted mixture language modeling, with models trained from several datasets of medical and nonmedical domains. This paper proposes novel modeling and estimation methods for the mixture language model (LM). Component LMs are trained from individual datasets, with class n-gram LMs trained from in-domain datasets and word n-gram LMs trained from out-of-domain datasets, and they are interpolated into a mixture LM. For class LMs, semantic categories are used for class definition on medical terms, names, and digits. The interpolation weights of a mixture LM are estimated by a greedy algorithm of forward weight adjustment (FWA). The proposed mixing of in-domain class LMs and out-of-domain word LMs, the semantic definitions of word classes, as well as the weight-estimation algorithm of FWA are effective on the TC-VTC task. As compared with using mixtures of word LMs with weights estimated by the conventional expectation-maximization algorithm, the proposed methods led to a 21% reduction of perplexity on test sets of five doctors, which translated into improvements of captioning accuracy.
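The interpolation at the heart of the mixture LM, and a greedy weight search in the spirit of FWA, can be sketched compactly. Component probabilities are assumed precomputed on held-out text; the step size and stopping rule below are illustrative assumptions, not the paper's exact FWA procedure.

```python
import numpy as np

def mixture_logprob(word_probs, weights):
    """Held-out log-probability under the interpolated mixture LM.
    word_probs[i, j] = P_j(word_i | history) from component LM j."""
    return float(np.log(word_probs @ weights).sum())

def greedy_weights(word_probs, step=0.05, iters=100):
    """Greedy forward adjustment (sketch): repeatedly shift weight mass
    toward the component that most improves held-out log-probability."""
    n = word_probs.shape[1]
    w = np.full(n, 1.0 / n)
    best = mixture_logprob(word_probs, w)
    for _ in range(iters):
        improved = False
        for j in range(n):
            cand = (1 - step) * w
            cand[j] += step                  # convex move toward component j
            score = mixture_logprob(word_probs, cand)
            if score > best:
                w, best, improved = cand, score, True
        if not improved:
            break                            # local optimum reached
    return w
```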
Howlin, Patricia; Gordon, R Kate; Pasco, Greg; Wade, Angie; Charman, Tony
2007-05-01
To assess the effectiveness of expert training and consultancy for teachers of children with autism spectrum disorder in the use of the Picture Exchange Communication System (PECS). Group randomised, controlled trial (3 groups: immediate treatment, delayed treatment, no treatment). 84 elementary school children, mean age 6.8 years. A 2-day PECS workshop for teachers plus 6 half-day, school-based training sessions with expert consultants over 5 months. Outcome measures were rates of communicative initiations, use of PECS, and speech in the classroom; Autism Diagnostic Observation Schedule-Generic (ADOS-G) domain scores for Communication and Reciprocal Social Interaction; and scores on formal language tests. Controlling for baseline age, developmental quotient (DQ) and language, rates of initiations and PECS usage increased significantly immediately post-treatment (odds ratio (OR) of being in a higher ordinal rate category 2.72, 95% confidence interval 1.22-6.09, p < .05, and OR 3.90 (95% CI 1.75-8.68), p < .001, respectively). There were no increases in frequency of speech, or improvements in ADOS-G ratings or language test scores. The results indicate modest effectiveness of PECS teacher training/consultancy. Rates of pupils' initiations and use of symbols in the classroom increased, although there was no evidence of improvement in other areas of communication. Treatment effects were not maintained once active intervention ceased.
Loss tolerant speech decoder for telecommunications
NASA Technical Reports Server (NTRS)
Prieto, Jr., Jaime L. (Inventor)
1999-01-01
A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.
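The decoding flow described above (buffer decoded parameters during normal operation; on a lost frame, substitute a one-step extrapolation) can be sketched compactly. A fixed-tap weighted average stands in here for the trained FIR feed-forward network, so the predictor is an illustrative simplification, and the class and method names are assumptions.

```python
from collections import deque
import numpy as np

class FrameConcealer:
    """Buffer past codec-parameter vectors; extrapolate when a frame is lost.
    Assumes at least one frame has been decoded before the first loss."""

    def __init__(self, history_len=4):
        self.history = deque(maxlen=history_len)
        self.taps = np.array([0.1, 0.2, 0.3, 0.4])  # favour recent frames

    def frame_decoded(self, params):
        self.history.append(np.asarray(params, dtype=float))  # normal path

    def frame_lost(self):
        h = np.stack(self.history)          # (n_history, n_params)
        taps = self.taps[-len(h):]
        est = (taps[:, None] * h).sum(axis=0) / taps.sum()
        self.history.append(est)            # extrapolated frame joins history
        return est                          # handed to the decoder transparently
```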
Estimating psycho-physiological state of a human by speech analysis
NASA Astrophysics Data System (ADS)
Ronzhin, A. L.
2005-05-01
Adverse effects of intoxication, fatigue and boredom could degrade performance of highly trained operators of complex technical systems with potentially catastrophic consequences. Existing physiological fitness for duty tests are time consuming, costly, invasive, and highly unpopular. Known non-physiological tests constitute a secondary task and interfere with the busy workload of the tested operator. Various attempts to assess the current status of the operator by processing of "normal operational data" often lead to excessive amount of computations, poorly justified metrics, and ambiguity of results. At the same time, speech analysis presents a natural, non-invasive approach based upon well-established efficient data processing. In addition, it supports both behavioral and physiological biometrics. This paper presents an approach facilitating robust speech analysis/understanding process in spite of natural speech variability and background noise. Automatic speech recognition is suggested as a technique for the detection of changes in the psycho-physiological state of a human that typically manifest themselves as changes in the characteristics of the vocal tract and in the semantic-syntactic connectivity of conversation. Preliminary tests have confirmed that a statistically significant correlation between the error rate of automatic speech recognition and the extent of alcohol intoxication does exist. In addition, the obtained data allowed exploring some interesting correlations and establishing some quantitative models. It is proposed to utilize this approach as a part of fitness for duty test and compare its efficiency with analyses of iris, face geometry, thermography and other popular non-invasive biometric techniques.
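The reported relationship between recognition error rate and intoxication is, at its core, a correlation between two paired series. A toy check of that kind of claim, with hypothetical numbers in place of the study's data:

```python
from scipy.stats import pearsonr

# Hypothetical paired measurements per recording session:
wer = [0.08, 0.11, 0.10, 0.17, 0.22, 0.25]   # ASR word error rate
bac = [0.00, 0.02, 0.03, 0.05, 0.08, 0.10]   # blood alcohol concentration

r, p = pearsonr(wer, bac)
print(f"Pearson r = {r:.2f}, p = {p:.3f}")
```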
Soto, Gloria; Clarke, Michael T
2017-07-12
This study was conducted to evaluate the effects of a conversation-based intervention on the expressive vocabulary and grammatical skills of children with severe motor speech disorders and expressive language delay who use augmentative and alternative communication. Eight children aged from 8 to 13 years participated in the study. After a baseline period, a conversation-based intervention was provided for each participant, in which they were supported to learn and use linguistic structures essential for the formation of clauses and the grammaticalization of their utterances, such as pronouns, verbs, and bound morphemes, in the context of personally meaningful and scaffolded conversations with trained clinicians. The conversations were videotaped, transcribed, and analyzed using the Systematic Analysis of Language Transcripts (SALT; Miller & Chapman, 1991). Results indicate that participants showed improvements in their use of spontaneous clauses, and a greater use of pronouns, verbs, and bound morphemes. These improvements were sustained and generalized to conversations with familiar partners. The results demonstrate the positive effects of the conversation-based intervention for improving the expressive vocabulary and grammatical skills of children with severe motor speech disorders and expressive language delay who use augmentative and alternative communication. Clinical and theoretical implications of conversation-based interventions are discussed and future research needs are identified. https://doi.org/10.23641/asha.5150113.
Guo, Ruiling; Bain, Barbara A.; Willer, Janene
2008-01-01
Objectives: The research assesses the information needs of speech-language pathologists (SLPs) and audiologists in Idaho and identifies specific needs for training in evidence-based practice (EBP) principles and searching EBP resources. Methods: A survey was developed to assess knowledge and skills in accessing information. Questionnaires were distributed to 217 members of the Idaho Speech-Language-Hearing Association, who were given multiple options to return the assessment survey (web, email, mail). Data were analyzed descriptively and statistically. Results: The total response rate was 38.7% (84/217). Of the respondents, 87.0% (73/84) indicated insufficient knowledge and skills to search PubMed. Further, 47.6% (40/84) indicated limited knowledge of EBP. Of professionals responding, 52.4% (44/84) reported interest in learning more about EBP and 47.6% (40/84) reported interest in learning to search PubMed. SLPs and audiologists who graduated within the last 10 years were more likely to respond online, while those graduating prior to that time preferred to respond via hard copy. Discussions/Conclusion: More effort should be made to ensure that SLPs and audiologists develop skills in locating information to support their practice. Results from this information needs assessment were used to design a training and outreach program on EBP and EBP database searching for SLPs and audiologists in Idaho. PMID:18379669
NASA Astrophysics Data System (ADS)
Gjaja, Marin N.
1997-11-01
Neural networks for supervised and unsupervised learning are developed and applied to problems in remote sensing, continuous map learning, and speech perception. Adaptive Resonance Theory (ART) models are real-time neural networks for category learning, pattern recognition, and prediction. Unsupervised fuzzy ART networks synthesize fuzzy logic and neural networks, and supervised ARTMAP networks incorporate ART modules for prediction and classification. New ART and ARTMAP methods resulting from analyses of data structure, parameter specification, and category selection are developed. Architectural modifications providing flexibility for a variety of applications are also introduced and explored. A new methodology for automatic mapping from Landsat Thematic Mapper (TM) and terrain data, based on fuzzy ARTMAP, is developed. System capabilities are tested on a challenging remote sensing problem, prediction of vegetation classes in the Cleveland National Forest from spectral and terrain features. After training at the pixel level, performance is tested at the stand level, using sites not seen during training. Results are compared to those of maximum likelihood classifiers, back-propagation neural networks, and K-nearest neighbor algorithms. Best performance is obtained using a hybrid system based on a convex combination of fuzzy ARTMAP and maximum likelihood predictions. This work forms the foundation for additional studies exploring fuzzy ARTMAP's capability to estimate class mixture composition for non-homogeneous sites. Exploratory simulations apply ARTMAP to the problem of learning continuous multidimensional mappings. A novel system architecture retains basic ARTMAP properties of incremental and fast learning in an on-line setting while adding components to solve this class of problems. The perceptual magnet effect is a language-specific phenomenon arising early in infant speech development that is characterized by a warping of speech sound perception. An unsupervised neural network model is proposed that embodies two principal hypotheses supported by experimental data: that sensory experience guides language-specific development of an auditory neural map, and that a population vector can predict psychological phenomena based on map cell activities. Model simulations show how a nonuniform distribution of map cell firing preferences can develop from language-specific input and give rise to the magnet effect.
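The hybrid step lends itself to a compact sketch. Assuming two arrays of per-class posterior estimates (stand-ins for the fuzzy ARTMAP and maximum-likelihood outputs, which are not implemented here), the convex combination and a mixing weight chosen on held-out data look like this:

```python
# Convex combination of two classifiers' class-probability estimates.
import numpy as np

def blend(p_artmap, p_ml, alpha):
    """Convex combination: alpha weights ARTMAP, (1 - alpha) weights ML."""
    return alpha * p_artmap + (1.0 - alpha) * p_ml

def pick_alpha(p_artmap, p_ml, labels, grid=np.linspace(0, 1, 21)):
    """Choose the mixing weight that maximizes accuracy on validation data."""
    accs = [np.mean(blend(p_artmap, p_ml, a).argmax(1) == labels) for a in grid]
    return grid[int(np.argmax(accs))]

# Hypothetical validation posteriors for 5 samples over 3 vegetation classes.
rng = np.random.default_rng(0)
pa = rng.dirichlet(np.ones(3), size=5)   # stand-in for fuzzy ARTMAP outputs
pm = rng.dirichlet(np.ones(3), size=5)   # stand-in for maximum-likelihood outputs
y = rng.integers(0, 3, size=5)
alpha = pick_alpha(pa, pm, y)
predictions = blend(pa, pm, alpha).argmax(axis=1)
```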
Lina, Xu; Feng, Li; Yanyun, Zhang; Nan, Gao; Mingfang, Hu
2016-12-01
To explore the phonological characteristics and rehabilitation training of abnormal velar articulation in patients with functional articulation disorders (FAD). The phonological characteristics of velar articulation were observed in 87 patients with FAD, and 72 patients with abnormal velar articulation received speech training. Correlation and simple linear regression analyses were carried out on abnormal velar articulation and age. The articulation disorder of /g/ mainly showed replacement by /d/ or /b/, or omission; /k/ mainly showed replacement by /d/, /t/, /g/, /p/, or /b/; and /h/ mainly showed replacement by /g/, /f/, /p/, or /b/, or omission. The common erroneous articulation forms of /g/, /k/, and /h/ were fronting of the tongue and replacement by bilabial consonants. When a velar was combined with vowels containing /a/ and /e/, the main error was fronting of the tongue; when combined with vowels containing /u/, the errors tended toward replacement by bilabial consonants. After 3 to 10 speech-training sessions, the number of erroneous words decreased from 40.28±6.08 before training to 6.24±2.61, a statistically significant difference (Z=-7.379, P=0.000). The number of erroneous words was negatively correlated with age (r=-0.691, P=0.000). Simple linear regression yielded a determination coefficient of 0.472. The articulation disorder of velars mainly shows replacement and varies with the accompanying vowels; the targeted rehabilitation training established here is significantly effective, and age plays an important role in the outcome of velar training.
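As a consistency check on the reported statistics (a worked identity, not a reanalysis of the study's data): in simple linear regression the determination coefficient is the squared correlation,

\[
R^2 = r^2 = (-0.691)^2 \approx 0.477,
\]

which matches the reported 0.472 up to rounding of r; that is, age accounts for roughly 47% of the variance in erroneous-word counts.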
Impact of human emotions on physiological characteristics
NASA Astrophysics Data System (ADS)
Partila, P.; Voznak, M.; Peterek, T.; Penhaker, M.; Novak, V.; Tovarek, J.; Mehic, Miralem; Vojtech, L.
2014-05-01
Emotional states of humans and their impact on physiological and neurological characteristics are discussed in this paper, a problem that many research teams have addressed. Nowadays, it is necessary to increase the accuracy of methods for obtaining information about correlations between emotional state and physiological changes. To record these changes, we focused on the two majority emotional states: subjects were psychologically stimulated first to a neutral, calm state and then to a stress state. Electrocardiography, electroencephalography, and blood pressure provided the neurological and physiological samples collected under the patients' stimulated conditions. Speech activity was recorded while each patient read a selected text, and features were extracted through speech-processing operations. A classifier based on a Gaussian mixture model (GMM) was trained and tested using mel-frequency cepstral coefficients (MFCCs) extracted from the patients' speech. All measurements were performed in an electromagnetic-compatibility chamber. The article discusses a method for determining the influence of the stress state on human physiological and neurological characteristics.
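A minimal sketch of the described pipeline, assuming hypothetical recordings and off-the-shelf tools (librosa for MFCC extraction, scikit-learn for the GMM; the paper's exact feature set and model orders are not specified here): one GMM is fit per emotional state, and a test utterance is assigned to the state whose model gives it the higher likelihood.

```python
# Sketch: neutral-vs-stress classification with per-class GMMs over MFCCs.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_frames(path):
    """Load an utterance and return its MFCC frames (one row per frame)."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

def fit_state_model(paths, n_components=8):
    """Fit one GMM on all MFCC frames pooled from a state's recordings."""
    frames = np.vstack([mfcc_frames(p) for p in paths])
    return GaussianMixture(n_components, covariance_type="diag").fit(frames)

# Hypothetical file lists for the two stimulated conditions.
gmm_neutral = fit_state_model(["neutral_01.wav", "neutral_02.wav"])
gmm_stress = fit_state_model(["stress_01.wav", "stress_02.wav"])

def classify(path):
    """Pick the state whose GMM assigns the higher average log-likelihood."""
    x = mfcc_frames(path)
    return "stress" if gmm_stress.score(x) > gmm_neutral.score(x) else "neutral"
```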
Speech recognition technology: an outlook for human-to-machine interaction.
Erdel, T; Crooks, S
2000-01-01
Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are high error rates and steep training curves. Even in the face of such negative perceptions, however, there remain significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article identifies opportunities for the inclusion of speech-recognition technology in the healthcare setting and examines the major categories of speech-recognition software: continuous speech recognition, command and control, and text-to-speech. We discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.
Chobert, Julie; François, Clément; Velay, Jean-Luc; Besson, Mireille
2014-04-01
Musical training has been shown to positively influence linguistic abilities. To follow the developmental dynamics of this transfer effect at the preattentive level, we conducted a longitudinal study over 2 school years with nonmusician children randomly assigned to music or to painting training. We recorded the mismatch negativity (MMN), a cortical correlate of preattentive mismatch detection, to syllables that differed in vowel frequency, vowel duration, and voice onset time (VOT), using a test-training-retest procedure with three test times: before training, after 6 months of training, and after 12 months of training. While no between-group differences were found before training, enhanced preattentive processing of syllabic duration and VOT (as reflected by greater MMN amplitude), but not of frequency, was found after 12 months of training in the music group only. These results demonstrate neuroplasticity in the child brain and suggest that active musical training, rather than innate predispositions for music, yielded the improvements in musically trained children. They also highlight the influence of musical training on duration perception in speech and on the development of phonological representations in normally developing children, supporting the importance of music-based training programs for children's education and opening new remediation strategies for children with language-based learning impairments.
Brinca, Lilia; Batista, Ana Paula; Tavares, Ana Inês; Pinto, Patrícia N; Araújo, Lara
2015-11-01
The main objective of the present study was to investigate whether the type of voice stimulus (sustained vowel, oral reading, or connected speech) results in good intrarater and interrater agreement/reliability. A short-term panel study was performed. Voice samples from 30 native European Portuguese speakers were used. The speech materials were (1) the sustained vowel /a/, (2) oral reading of the European Portuguese version of "The Story of Arthur the Rat," and (3) connected speech. After extensive training with textual and auditory anchors, the judges were asked to rate the severity of the dysphonic voice stimuli using the phonation dimensions G, R, and B from the GRBAS scale. The voice samples were judged 6 months and 1 year after the training. Intrarater agreement and reliability were generally very good for all phonation dimensions and voice stimuli. The highest interrater reliability was obtained with the oral reading stimulus, particularly for the phonation dimensions grade (G) and breathiness (B). Roughness (R) was the most difficult voice quality to evaluate, leading to interrater unreliability in all voice quality ratings. Extensive training using textual and auditory anchors, together with the use of anchors during the voice evaluations, appears to be a good method for the auditory-perceptual evaluation of dysphonic voices. The best interrater reliability was obtained with the oral reading stimulus, and breathiness appears to be a voice quality that is easier to evaluate than roughness. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
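One way to make such agreement figures concrete (a sketch with hypothetical GRBAS ratings on the usual 0-3 ordinal scale; the study's own data and statistics are not reproduced here): intrarater agreement can be computed from one judge's two rating sessions, and interrater agreement from two judges' ratings, for example with a weighted kappa.

```python
# Sketch: weighted kappa for intrarater and interrater agreement on GRBAS (0-3).
from sklearn.metrics import cohen_kappa_score

# Hypothetical grade (G) ratings for 10 voice samples.
judge1_6mo = [0, 1, 2, 3, 1, 2, 0, 3, 2, 1]   # judge 1, 6 months post-training
judge1_1yr = [0, 1, 2, 3, 2, 2, 0, 3, 2, 1]   # judge 1, 1 year post-training
judge2_6mo = [0, 2, 2, 3, 1, 1, 0, 3, 2, 2]   # judge 2, 6 months post-training

intra = cohen_kappa_score(judge1_6mo, judge1_1yr, weights="linear")
inter = cohen_kappa_score(judge1_6mo, judge2_6mo, weights="linear")
print(f"intrarater kappa = {intra:.2f}, interrater kappa = {inter:.2f}")
```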
Sadakata, Makiko; McQueen, James M.
2014-01-01
Although the high-variability training method can enhance learning of non-native speech categories, this can depend on individuals’ aptitude. The current study asked how general the effects of perceptual aptitude are by testing whether they occur with training materials spoken by native speakers and whether they depend on the nature of the to-be-learned material. Forty-five native Dutch listeners took part in a 5-day training procedure in which they identified bisyllabic Mandarin pseudowords (e.g., asa) pronounced with different lexical tone combinations. The training materials were presented to different groups of listeners at three levels of variability: low (many repetitions of a limited set of words recorded by a single speaker), medium (fewer repetitions of a more variable set of words recorded by three speakers), and high (similar to medium but with five speakers). Overall, variability did not influence learning performance, but this was due to an interaction with individuals’ perceptual aptitude: increasing variability hindered improvements in performance for low-aptitude perceivers while it helped improvements in performance for high-aptitude perceivers. These results show that the previously observed interaction between individuals’ aptitude and effects of degree of variability extends to natural tokens of Mandarin speech. This interaction was not found, however, in a closely matched study in which native Dutch listeners were trained on the Japanese geminate/singleton consonant contrast. This may indicate that the effectiveness of high-variability training depends not only on individuals’ aptitude in speech perception but also on the nature of the categories being acquired. PMID:25505434
Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure
Berisha, Visar; Wisler, Alan; Hero, Alfred O.; Spanias, Andreas
2015-01-01
Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks. PMID:26807014
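A sketch of the kind of empirical estimator behind such bounds, under the assumption that it follows the usual Friedman-Rafsky construction (a Euclidean minimum spanning tree over the pooled samples, with the divergence estimated from the fraction of edges linking the two samples); the paper's exact normalization may differ:

```python
# Sketch: MST-based (Friedman-Rafsky) estimate of a nonparametric divergence
# between two samples X ~ f and Y ~ g.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import cdist

def dp_estimate(X, Y):
    n, m = len(X), len(Y)
    pooled = np.vstack([X, Y])
    label = np.r_[np.zeros(n), np.ones(m)]            # 0 marks X, 1 marks Y
    mst = minimum_spanning_tree(cdist(pooled, pooled))
    i, j = mst.nonzero()
    crossings = np.sum(label[i] != label[j])          # edges joining X to Y
    return max(0.0, 1.0 - crossings * (n + m) / (2.0 * n * m))

# Identical distributions give an estimate near 0; well-separated ones near 1.
rng = np.random.default_rng(1)
print(dp_estimate(rng.normal(0, 1, (200, 2)), rng.normal(0, 1, (200, 2))))
print(dp_estimate(rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))))
```

Intuitively, the more MST edges that cross between the two samples, the more the distributions overlap, and the harder any classifier must find the task; this is what links the statistic to bounds on the probability of error.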
Gold, Rinat; Gold, Azgad
2018-02-06
The purpose of this study was to examine the attitudes, feelings, and practice characteristics of speech-language pathologists (SLPs) in Israel regarding the delivery of bad news. One hundred seventy-three Israeli SLPs answered an online survey. Respondents represented SLPs in Israel at all stages of vocational experience, with varying academic degrees, from a variety of employment settings. The survey addressed the emotions involved in the process of delivering bad news, training on this subject, and background information about the respondents. Frequency distributions of participants' responses were determined, and Pearson correlations were computed to determine the relation between years of occupational experience and the following variables: frequency of delivering bad news, opinions regarding training, and emotions experienced during bad news delivery. Our survey showed that bad news delivery is a task that most participants confront from the very beginning of their careers. Participants regarded training in delivering bad news as important but, at the same time, reported receiving relatively little training on the subject. In addition, our survey showed that negative emotions are involved in the process of delivering bad news. Training SLPs in specific techniques is required for successfully delivering bad news, and the emotional burden associated with breaking bad news in the field of speech-language pathology should be recognized and addressed.
ERIC Educational Resources Information Center
Franchak, Stephen J., Ed.
The document contains the proceedings of a conference on manpower research at the Federal, State, and local levels. Two papers from the opening general session focus on the interrelationship of economic, demographic, and educational changes and their effects on occupational education. J. C. Pettinger's keynote speech presented issues regarding…
2015-01-01
Table 2: Segregation results in terms of STOI on a variety of novel noises (SNR = -2 dB): Babble-20, Cafeteria, Factory, Babble-100, Living Room, Cafe, and Park. …noises from the NOISEX-92 corpus [13], and a living room, a cafe, and a park noise from the DEMAND corpus [12]. To put the performance of the noise-independent model in…
Cameron, Ashley; Hudson, Kyla; Finch, Emma; Fleming, Jennifer; Lethlean, Jennifer; McPhail, Steven
2018-06-05
Communication partner training (CPT) has been used to support communication partners to interact successfully with people with aphasia (PWA); successful CPT interactions notably improve PWA's access to healthcare. The present study sought to build on prior studies by investigating the experiences of individuals with aphasia and healthcare providers, to ascertain what they deemed beneficial about CPT and what could be refined or improved depending on the setting and skill set of those participating. The aims were to gain an understanding of the experiences of PWA involved in the provision of CPT to health professional (HP) students, and to investigate the experiences of HP students who participated in the CPT programme. Eight PWA who had completed a CPT programme participated in a focus group/semi-structured interview, and 77 HP students participated in feedback sessions, each moderated by two speech-language pathologists (SLPs). These sessions were recorded (audio and video), transcribed verbatim including non-verbal communication, and analyzed using qualitative content analysis. Overall, the study sought to understand experiences of the training. Both the PWA and the HP students reported positive experiences of CPT. PWA discussed their perception that CPT improved HPs' and HP students' understanding and interactions when conversing with them, and emphasized the need for training and education across all health-related professions. HP students enjoyed the opportunity to interact with PWA without being 'assessed' and felt it consolidated their learning from lecture content. Inclusive and accessible healthcare is paramount to ensure the engagement of patients and providers. Based on the experiences and feedback of the participants in this study, CPT offers a salient and practical training method with the potential to improve practice. Participants perceived CPT to be beneficial and validated the need for the training to support PWA accessing healthcare. © 2018 Royal College of Speech and Language Therapists.
Fast phonetic learning occurs already in 2-to-3-month old infants: an ERP study
Wanrooij, Karin; Boersma, Paul; van Zuijen, Titia L.
2014-01-01
An important mechanism for learning speech sounds in the first year of life is “distributional learning,” i.e., learning by simply listening to the frequency distributions of the speech sounds in the environment. In the lab, fast distributional learning has been reported for infants in the second half of the first year; the present study examined whether it can also be demonstrated at a much younger age, long before the onset of language-specific speech perception (which roughly emerges between 6 and 12 months). To investigate this, Dutch infants aged 2 to 3 months were presented with either a unimodal or a bimodal vowel distribution based on the English /æ/~/ε/ contrast, for only 12 minutes. Subsequently, mismatch responses (MMRs) were measured in an oddball paradigm, where one half of the infants in each group heard a representative [æ] as the standard and a representative [ε] as the deviant, and the other half heard the same reversed. The results (from the combined MMRs during wakefulness and active sleep) disclosed a larger MMR, implying better discrimination of [æ] and [ε], for bimodally than unimodally trained infants, thus extending an effect of distributional training found in previous behavioral research to a much younger age when speech perception is still universal rather than language-specific, and to a new method (using event-related potentials). Moreover, the analysis revealed a robust interaction between the distribution (unimodal vs. bimodal) and the identity of the standard stimulus ([æ] vs. [ε]), which provides evidence for an interplay between a perceptual asymmetry and distributional learning. The outcomes show that distributional learning can affect vowel perception already in the first months of life. PMID:24701203
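The unimodal/bimodal manipulation reduces to the shape of the presentation-frequency distribution over a stimulus continuum. A sketch under assumed values (the study's actual token counts, continuum steps, and frequencies are not reproduced here):

```python
# Sketch: unimodal vs. bimodal stimulus distributions along a hypothetical
# 8-step /ae/-/E/ continuum (token 1 = clear /ae/, token 8 = clear /E/).
import numpy as np

tokens = np.arange(1, 9)
# Presentation frequencies per token (hypothetical; normalized to sum to 1).
unimodal = np.array([5, 10, 15, 20, 20, 15, 10, 5], float)   # one central peak
bimodal = np.array([10, 20, 15, 5, 5, 15, 20, 10], float)    # peaks at 2 and 7
unimodal /= unimodal.sum()
bimodal /= bimodal.sum()

rng = np.random.default_rng(42)
# A familiarization stream for each condition, e.g. 400 tokens in 12 minutes.
stream_uni = rng.choice(tokens, size=400, p=unimodal)
stream_bi = rng.choice(tokens, size=400, p=bimodal)
```

The bimodal stream implicitly signals two categories (two frequency peaks), whereas the unimodal stream signals one; distributional learning is the inference of that category structure from exposure alone.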
McKinnon, David H; McLeod, Sharynne; Reilly, Sheena
2007-01-01
The aims of this study were threefold: to report teachers' estimates of the prevalence of speech disorders (specifically, stuttering, voice, and speech-sound disorders); to consider correspondence between the prevalence of speech disorders and gender, grade level, and socioeconomic status; and to describe the level of support provided to schoolchildren with speech disorders. Students with speech disorders were identified from 10,425 students in Australia using a 4-stage process: training in the data collection process, teacher identification, confirmation by a speech-language pathologist, and consultation with district special needs advisors. The prevalence of students with speech disorders was estimated; specifically, 0.33% of students were identified as stuttering, 0.12% as having a voice disorder, and 1.06% as having a speech-sound disorder. There was a higher prevalence of speech disorders in males than in females. As grade level increased, the prevalence of speech disorders decreased. There was no significant difference in the pattern of prevalence across the three speech disorders and four socioeconomic groups; however, students who were identified with a speech disorder were more likely to be in the higher socioeconomic groups. Finally, there was a difference between the perceived and actual level of support that was provided to these students. These prevalence figures are lower than those using initial identification by speech-language pathologists and similar to those using parent report.
Van Dort, Sandra; Coyle, Julia; Wilson, Linda; Ibrahim, Hasherah Mohd
2013-02-01
The lead article by Wylie, McAllister, Davidson, and Marshall (2013) puts forward pertinent issues, raised by the World Report on Disability, facing the speech-language pathology profession. This paper continues the discussion by reporting on a capacity-building action research study on the development, implementation, and evaluation of a new approach to early-intervention speech-language pathology through clinical education in Malaysia. The research evaluated a student-led service in community-based rehabilitation that supplemented existing, more typical institution-based services. A Malaysian community-based rehabilitation project was chosen because of its emphasis on increasing the equitability and accessibility of services for people with disabilities, an emphasis that was a catalyst for this research; expanding awareness-building, education, and training activities about communication disability was also important. The intention was to provide students with experience of working in such settings and to facilitate their development as advocates for broadening the scope of practice of speech-language pathology services in Malaysia. This article focuses on the findings pertaining to the collaborative process and the learning experiences of the adult participants. Through reflection on the positive achievements, as well as some failures, it aims to provide a deeper understanding of the use of such a model.