Auditory Speech Perception Tests in Relation to the Coding Strategy in Cochlear Implant.
Bazon, Aline Cristine; Mantello, Erika Barioni; Gonçales, Alina Sanches; Isaac, Myriam de Lima; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa
2016-07-01
The objective of the evaluation of auditory perception in cochlear implant users is to determine how the acoustic signal is processed, leading to the recognition and understanding of sound. To investigate the differences in the process of auditory speech perception in individuals with postlingual hearing loss wearing a cochlear implant, using two different speech coding strategies, and to analyze speech perception and handicap perception in relation to the strategy used. This is a prospective, cross-sectional, descriptive study. We selected ten cochlear implant users, who were characterized by hearing thresholds and by the application of speech perception tests and the Hearing Handicap Inventory for Adults. There was no significant difference when comparing the variables subject age, age at acquisition of hearing loss, etiology, time of hearing deprivation, time of cochlear implant use, and mean hearing threshold with the cochlear implant against the shift in speech coding strategy. There was no relationship between lack of handicap perception and improvement in speech perception with either of the speech coding strategies used. There was no significant difference between the strategies evaluated, and no relation was observed between them and the variables studied.
Improving Speech Perception in Noise with Current Focusing in Cochlear Implant Users
Srinivasan, Arthi G.; Padilla, Monica; Shannon, Robert V.; Landsberger, David M.
2013-01-01
Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation, and therefore, reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits in noise task and a mean improvement of 3 dB in a sentences in noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical. PMID:23467170
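The abstract does not give the current weighting used in the experimental strategy; as background, partial tripolar stimulation is commonly described by a compensation coefficient σ that splits the return current between the two flanking electrodes and a distant extracochlear ground. A minimal sketch of that weighting (the function name and the σ convention are illustrative, not taken from the study):

```python
# Sketch of partial tripolar (pTP) current weighting, assuming a
# compensation coefficient sigma in [0, 1]; sigma = 0 reduces to
# monopolar stimulation, sigma = 1 to full tripolar stimulation.

def partial_tripolar_currents(center_current, sigma):
    """Return (flank, center, flank, extracochlear) currents.

    A fraction `sigma` of the center-electrode current returns through
    the two adjacent flanking electrodes (half each, opposite polarity);
    the remainder (1 - sigma) returns through the distant extracochlear
    ground, as in monopolar mode.
    """
    flank = -sigma * center_current / 2.0
    ground = -(1.0 - sigma) * center_current
    return flank, center_current, flank, ground

# Example: 1 mA on the center electrode with sigma = 0.75.
# The four currents always sum to zero (charge balance).
f1, c, f2, g = partial_tripolar_currents(1.0, 0.75)
```

Narrowing the current spread this way is what is hypothesized to reduce channel interaction, at the cost of needing higher center-electrode currents for equal loudness.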
Effect of technological advances on cochlear implant performance in adults.
Lenarz, Minoo; Joseph, Gert; Sönmez, Hasibe; Büchner, Andreas; Lenarz, Thomas
2011-12-01
To evaluate the effect of technological advances over the past 20 years on the hearing performance of a large cohort of adult cochlear implant (CI) patients. Individual, retrospective, cohort study. According to technological developments in electrode design and speech-processing strategies, we defined five virtual intervals on the time scale between 1984 and 2008. A cohort of 1,005 postlingually deafened adults was selected for this study, and their hearing performance with a CI was evaluated retrospectively according to these five technological intervals. The test battery was composed of four standard German speech tests: the Freiburger monosyllabic test, a speech tracking test, the Hochmair-Schulz-Moser (HSM) sentence test in quiet, and the HSM sentence test in 10 dB noise. The direct comparison of speech perception in postlingually deafened adults implanted during different technological periods reveals an obvious improvement in patients who benefited from the recent electrode designs and speech-processing strategies. The major influence of technological advances on CI performance appears to be on speech perception in noise. Better speech perception in noisy surroundings is strong evidence of the success of new electrode designs and speech-processing strategies. Standard (internationally comparable) speech tests in noise should become an obligatory part of the postoperative test battery for adult CI patients. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.
Coding strategies for cochlear implants under adverse environments
NASA Astrophysics Data System (ADS)
Tahmina, Qudsia
Cochlear implants are electronic prosthetic devices that restore partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quiet listening conditions, limitations remain on speech perception under adverse environments such as background noise, reverberation, and band-limited channels. We propose strategies that improve the intelligibility of speech transmitted over telephone networks, reverberated speech, and speech in the presence of background noise. For telephone-processed speech, we examine the effects of adding low-frequency and high-frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high-frequency information; this study therefore supports the design of algorithms that extend the bandwidth toward higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing-impaired listeners. Reverberated sound consists of the direct sound, early reflections, and late reflections. Late reflections are known to be detrimental to speech intelligibility. In this study, we propose a reverberation suppression strategy based on spectral subtraction (SS) to suppress the reverberant energy from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3 s and 1.0 s) indicated significant improvement when stimuli were processed with the SS strategy.
The proposed strategy operates with little to no prior information on the signal and the room characteristics and can therefore potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulation in cochlear implants. The proposed strategy is based on harmonic modeling and uses a synthesis-driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work on the development of algorithms that regenerate the harmonics of voiced segments in the presence of noise.
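The spectral-subtraction dereverberation approach is only named above; a minimal sketch of the idea, assuming the late-reverberant power in each frame is approximated as a scaled, delayed copy of the short-time power spectrogram (the scaling factor, delay, and spectral floor are illustrative; real systems tie them to the room's RT60):

```python
import numpy as np

def suppress_late_reverb(power_spec, alpha=0.4, delay_frames=7, floor=0.01):
    """Per-frame spectral subtraction of a late-reverberation estimate.

    power_spec: (frames, bins) short-time power spectrogram.
    The late-reverberant power in frame t is estimated as
    alpha * power_spec[t - delay_frames] and subtracted, with a
    spectral floor to avoid negative power (and musical-noise blowup).
    """
    out = power_spec.copy()
    for t in range(delay_frames, power_spec.shape[0]):
        late_estimate = alpha * power_spec[t - delay_frames]
        out[t] = np.maximum(power_spec[t] - late_estimate,
                            floor * power_spec[t])
    return out
```

The cleaned spectrogram would then drive the CI envelope channels in place of the raw one; only the subtraction stage is shown here.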
ERIC Educational Resources Information Center
Beatty, Michael J.
1988-01-01
Examines the choice-making processes of students engaged in the selection of speech introduction strategies. Finds that the frequency of students making decision-making errors was a positive function of public speaking apprehension. (MS)
A speech processing study using an acoustic model of a multiple-channel cochlear implant
NASA Astrophysics Data System (ADS)
Xu, Ying
1998-10-01
A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that current speech-processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented to normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model-processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished with higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels.
The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and channel numbers were not beneficial. Manipulations of stimulation rate and number of activated channels did not appreciably affect consonant recognition. These results suggest that overall speech performance may improve by appropriately increasing stimulation rate and number of activated channels. Future revision of this acoustic model is necessary to provide more accurate amplitude representation of speech.
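The defining step of the SPEAK strategy is "n-of-m" channel selection: in each analysis frame, only the channels with the largest envelopes are stimulated. A minimal sketch of that selection stage (filter bank, envelope extraction, and pulse generation are omitted; the function is illustrative, not the clinical implementation):

```python
import numpy as np

def select_maxima(channel_envelopes, n_select):
    """SPEAK-style n-of-m selection.

    channel_envelopes: (frames, m_channels) envelope magnitudes.
    In each frame, keep the n_select largest channel envelopes
    and zero the rest; only the kept channels would be stimulated.
    """
    env = np.asarray(channel_envelopes, dtype=float)
    out = np.zeros_like(env)
    for t in range(env.shape[0]):
        top = np.argsort(env[t])[-n_select:]   # indices of n largest
        out[t, top] = env[t, top]
    return out
```

Varying `n_select` (number of activated channels) and the frame rate is the kind of manipulation the vowel and consonant tests above explore.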
A novel speech-processing strategy incorporating tonal information for cochlear implants.
Lan, N; Nie, K B; Gao, S K; Zeng, F G
2004-05-01
Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only the envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve performance in cochlear implant users who speak tonal languages, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to the stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variation of the four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in the perception of Chinese tones, phrases, and sentences. This novel processor, with dynamic modulation of both frequency and amplitude, is encouraging for the design of cochlear implant devices for sensorineurally deaf patients who speak tonal languages.
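The paper's F0 extraction algorithm is not described in the abstract; a generic autocorrelation-based stand-in illustrates the kind of per-frame F0 estimate such a strategy needs before it can modulate pulse rate (the search range and all parameters are illustrative):

```python
import numpy as np

def f0_autocorrelation(frame, fs, fmin=80.0, fmax=400.0):
    """Estimate F0 of a voiced frame by autocorrelation peak picking.

    Searches for the autocorrelation maximum at lags corresponding to
    the [fmin, fmax] pitch range; a generic stand-in, not the paper's
    extraction algorithm.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # One-sided autocorrelation: index 0 corresponds to lag 0.
    ac = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return fs / lag
```

In a strategy like the one above, this per-frame estimate would set the pulse rate on each electrode while the band envelopes set pulse amplitude.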
The Effects of Pre-processing Strategies for Pediatric Cochlear Implant Recipients
Rakszawski, Bernadette; Wright, Rose; Cadieux, Jamie H.; Davidson, Lisa S.; Brenner, Christine
2016-01-01
Background: Cochlear implants (CIs) have been shown to improve children’s speech recognition over traditional amplification when severe to profound sensorineural hearing loss is present. Despite improvements, understanding speech at low intensity levels or in the presence of background noise remains difficult. In an effort to improve speech understanding in challenging environments, Cochlear Ltd. offers pre-processing strategies that apply various algorithms prior to mapping the signal to the internal array. Two of these strategies are Autosensitivity Control™ (ASC) and Adaptive Dynamic Range Optimization (ADRO®). Based on previous research, the manufacturer’s default pre-processing strategy for pediatric users’ everyday programs combines ASC+ADRO®. Purpose: The purpose of this study is to compare pediatric speech perception performance across various pre-processing strategies while applying a specific programming protocol that uses increased threshold (T) levels to ensure access to very low-level sounds. Research Design: This was a prospective, cross-sectional, observational study. Participants completed speech perception tasks in four pre-processing conditions: no pre-processing, ADRO®, ASC, and ASC+ADRO®. Study Sample: Eleven pediatric Cochlear Ltd. cochlear implant users were recruited: six bilateral, one unilateral, and four bimodal. Intervention: Four programs, with the participants’ everyday map, were loaded into the processor with a different pre-processing strategy applied in each of the four positions: no pre-processing, ADRO®, ASC, and ASC+ADRO®. Data Collection and Analysis: Participants repeated CNC words presented at 50 and 70 dB SPL in quiet and HINT sentences presented adaptively with competing R-Space noise at 60 and 70 dB SPL. Each measure was completed as participants listened with each of the four pre-processing strategies listed above. Test order and condition were randomized.
A repeated-measures analysis of variance (ANOVA) was used to compare each pre-processing strategy across group data. Critical differences were utilized to determine significant score differences between each pre-processing strategy for individual participants. Results: For CNC words presented at 50 dB SPL, the group data revealed significantly better scores using ASC+ADRO® compared to all other pre-processing conditions, while ASC resulted in poorer scores compared to ADRO® and ASC+ADRO®. Group data for HINT sentences presented in 70 dB SPL of R-Space noise revealed significantly improved scores using ASC and ASC+ADRO® compared to no pre-processing, with ASC+ADRO® scores being better than ADRO® alone. Group data for CNC words presented at 70 dB SPL and adaptive HINT sentences presented in 60 dB SPL of R-Space noise showed no significant differences among conditions. Individual data showed that the pre-processing strategy yielding the best scores varied across measures and participants. Conclusions: Group data reveal an advantage for ASC+ADRO® for speech presented at lower levels and in higher levels of background noise. Individual data revealed that the optimal pre-processing strategy varied among participants, indicating that a variety of pre-processing strategies should be explored for each CI user considering his or her performance in challenging listening environments. PMID:26905529
Influence of signal processing strategy in auditory abilities.
Melo, Tatiana Mendes de; Bevilacqua, Maria Cecília; Costa, Orozimbo Alves; Moret, Adriane Lima Mortari
2013-01-01
The signal processing strategy is a parameter that may influence the auditory performance of cochlear implant users, and it is important to optimize this parameter to provide better speech perception, especially in difficult listening situations. To evaluate individuals' auditory performance using two different signal processing strategies. Prospective study with 11 prelingually deafened children with open-set speech recognition. A within-subjects design was used to compare performance with standard HiRes and HiRes 120 at three different moments. During test sessions, subjects' performance was evaluated by warble-tone sound-field thresholds and speech perception evaluation, in quiet and in noise. In quiet, children S1, S4, S5, and S7 showed better performance with the HiRes 120 strategy, and children S2, S9, and S11 showed better performance with the HiRes strategy. In noise, it was also observed that some children performed better using the HiRes 120 strategy and others with HiRes. Not all children presented the same pattern of response to the different strategies used in this study, which reinforces the need to look at optimizing cochlear implant clinical programming.
Lopez-Poveda, Enrique A; Eustaquio-Martín, Almudena; Stohl, Joshua S; Wolford, Robert D; Schatzer, Reinhold; Gorospe, José M; Ruiz, Santiago Santa Cruz; Benito, Fernando; Wilson, Blake S
2017-05-01
We have recently proposed a binaural cochlear implant (CI) sound processing strategy inspired by the contralateral medial olivocochlear reflex (the MOC strategy) and shown that it improves intelligibility in steady-state noise (Lopez-Poveda et al., 2016, Ear Hear 37:e138-e148). The aim here was to evaluate possible speech-reception benefits of the MOC strategy for speech maskers, a more natural type of interferer. Speech reception thresholds (SRTs) were measured in six bilateral and two single-sided deaf CI users with the MOC strategy and with a standard (STD) strategy. SRTs were measured in unilateral and bilateral listening conditions, and for target and masker stimuli located at azimuthal angles of (0°, 0°), (-15°, +15°), and (-90°, +90°). Mean SRTs were 2-5 dB better with the MOC than with the STD strategy for spatially separated target and masker sources. For bilateral CI users, the MOC strategy (1) facilitated the intelligibility of speech in competition with spatially separated speech maskers in both unilateral and bilateral listening conditions; and (2) led to an overall improvement in spatial release from masking in the two listening conditions. Insofar as speech is a more natural type of interferer than steady-state noise, the present results suggest that the MOC strategy holds potential for promising outcomes for CI users. Copyright © 2017. Published by Elsevier B.V.
Blackie, Rebecca A; Kocovski, Nancy L
2016-01-01
According to cognitive models, post-event processing (PEP) is a key factor in the maintenance of social anxiety. Given that decreasing PEP can be challenging for socially anxious individuals, it is important to identify potentially useful strategies. Although distraction may help to decrease PEP, the findings have been equivocal. The primary purpose of this study was to examine whether a brief distraction period immediately following a speech would lead to less PEP the next day. The secondary aim was to examine the effect of distraction following an initial speech on anticipatory anxiety for a second speech, via reductions in PEP. Participants (N = 77 undergraduates with elevated social anxiety; 67.53% female) delivered a speech and were randomly assigned to a distraction, rumination, or control condition. The following day, participants reported levels of PEP in relation to the first speech, as well as anxiety regarding a second, upcoming speech. As expected, those in the distraction condition reported less PEP than those in the rumination and control conditions. Additionally, distraction following the first speech was indirectly related to anticipatory anxiety for the second speech, via PEP. Distraction may represent a potentially useful strategy for reducing PEP and other maladaptive processes that may maintain social anxiety.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.
Borrie, Stephanie A
2015-03-01
This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut
2014-01-01
Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users, electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high-definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification, HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a shallow insertion depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues beyond F0, and was poorest when a shallow insertion was simulated.
How musical expertise shapes speech perception: evidence from auditory classification images.
Varnet, Léo; Wang, Tianyun; Peter, Chloe; Meunier, Fanny; Hoen, Michel
2015-09-24
It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians' higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions used by participants when performing phonemic categorizations in noise. Here we used this technique on 19 non-musicians and 19 professional musicians. We found that both groups used very similar listening strategies, but the musicians relied more heavily on the two main acoustic cues: at the onset of the first formant and at the onsets of the second and third formants. Additionally, they responded more consistently to stimuli. These observations provide a direct visualization of auditory plasticity resulting from extensive musical training and shed light on the level of functional transfer between auditory processing and speech perception.
Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Litovsky, Ruth Y
2014-09-01
Most contemporary cochlear implant (CI) processing strategies discard acoustic temporal fine structure (TFS) information, and this may contribute to the observed deficits in bilateral CI listeners' ability to localize sounds when compared to normal hearing listeners. Additionally, for best speech envelope representation, most contemporary speech processing strategies use high-rate carriers (≥900 Hz) that exceed the limit for interaural pulse timing to provide useful binaural information. Many bilateral CI listeners are sensitive to interaural time differences (ITDs) in low-rate (<300 Hz) constant-amplitude pulse trains. This study explored the trade-off between superior speech temporal envelope representation with high-rate carriers and binaural pulse timing sensitivity with low-rate carriers. The effects of carrier pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition in quiet were examined in eight bilateral CI listeners. Stimuli consisted of speech tokens processed at different electrical stimulation rates, and pulse timings that either preserved or did not preserve acoustic TFS cues. Results showed that CI listeners were able to use low-rate pulse timing cues derived from acoustic TFS when presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli.
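One illustrative way to derive low-rate pulse timing from acoustic temporal fine structure, as the stimuli above do, is to mark the positive-going zero crossings of a band-limited signal and place pulses at those instants (this is a generic sketch, not the authors' processing):

```python
import numpy as np

def tfs_pulse_times(band_signal, fs):
    """Derive pulse times from acoustic temporal fine structure.

    Marks positive-going zero crossings of a band-limited signal;
    for a low-frequency band this yields a low-rate pulse train whose
    timing preserves the TFS (and hence interaural timing) cues that
    fixed high-rate carriers discard.
    """
    s = np.asarray(band_signal, dtype=float)
    crossings = np.where((s[:-1] < 0) & (s[1:] >= 0))[0] + 1
    return crossings / fs
```

Applied independently to the left- and right-ear signals, such timing carries the interaural time differences that the discrimination and lateralization tasks above measure.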
Dole, Marjorie; Hoen, Michel; Meunier, Fanny
2012-06-01
Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type, presenting single target-words against backgrounds made of cocktail party sounds, modulated speech-derived noise or stationary noise. We also evaluated the effect of three listening configurations differing in terms of the amount of spatial processing required. In a monaural condition, signal and noise were presented to the same ear while in a dichotic situation, target and concurrent sound were presented to two different ears, finally in a spatialised configuration, target and competing signals were presented as if they originated from slightly differing positions in the auditory scene. Our results confirm the presence of a speech-in-noise perception deficit in dyslexic adults, in particular when the competing signal is also speech, and when both signals are presented to the same ear, an observation potentially relating to phonological accounts of dyslexia. However, adult dyslexics demonstrated better levels of spatial release of masking than normal reading controls when the background was speech, suggesting that they are well able to rely on denoising strategies based on spatial auditory scene analysis strategies. Copyright © 2012 Elsevier Ltd. All rights reserved.
Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies.
Mitra, Vikramjit; Nam, Hosung; Espy-Wilson, Carol Y; Saltzman, Elliot; Goldstein, Louis
2010-09-13
Many different studies have claimed that articulatory information can be used to improve the performance of automatic speech recognition systems. Unfortunately, such articulatory information is not readily available in typical speaker-listener situations. Consequently, such information has to be estimated from the acoustic signal in a process which is usually termed "speech-inversion." This study aims to propose and compare various machine learning strategies for speech inversion: Trajectory mixture density networks (TMDNs), feedforward artificial neural networks (FF-ANN), support vector regression (SVR), autoregressive artificial neural network (AR-ANN), and distal supervised learning (DSL). Further, using a database generated by the Haskins Laboratories speech production model, we test the claim that information regarding constrictions produced by the distinct organs of the vocal tract (vocal tract variables) is superior to flesh-point information (articulatory pellet trajectories) for the inversion process.
Application of the Envelope Difference Index to Spectrally Sparse Speech
ERIC Educational Resources Information Center
Souza, Pamela; Hoover, Eric; Gallun, Frederick
2012-01-01
Purpose: Amplitude compression is a common hearing aid processing strategy that can improve speech audibility and loudness comfort but also has the potential to alter important cues carried by the speech envelope. In previous work, a measure of envelope change, the Envelope Difference Index (EDI; Fortune, Woodruff, & Preves, 1994), was moderately…
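The EDI itself can be sketched; one common formulation (after Fortune, Woodruff, & Preves, 1994, as typically implemented; treat the exact normalization as an assumption) scales both envelopes to the same mean, then normalizes the summed absolute difference so identical envelopes give 0 and non-overlapping envelopes give 1:

```python
import numpy as np

def envelope_difference_index(env_a, env_b):
    """Envelope Difference Index between two amplitude envelopes.

    Both envelopes (equal length, nonnegative) are scaled to mean 1,
    then EDI = sum(|a - b|) / (2 * N): 0 for identical envelope
    shapes, approaching 1 for envelopes that never overlap in time.
    """
    a = np.asarray(env_a, dtype=float)
    b = np.asarray(env_b, dtype=float)
    a = a / a.mean()
    b = b / b.mean()
    return float(np.abs(a - b).sum() / (2.0 * len(a)))
```

Because of the mean normalization, the index is insensitive to overall gain and reflects only envelope-shape change, e.g. that introduced by amplitude compression.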
Lopez-Poveda, Enrique A.; Eustaquio-Martín, Almudena; Stohl, Joshua S.; Wolford, Robert D.; Schatzer, Reinhold; Wilson, Blake S.
2016-01-01
Objectives: In natural hearing, cochlear mechanical compression is dynamically adjusted via the efferent medial olivocochlear reflex (MOCR). These adjustments probably help understanding speech in noisy environments and are not available to the users of current cochlear implants (CIs). The aims of the present study are to: (1) present a binaural CI sound processing strategy inspired by the control of cochlear compression provided by the contralateral MOCR in natural hearing; and (2) assess the benefits of the new strategy for understanding speech presented in competition with steady noise with a speech-like spectrum in various spatial configurations of the speech and noise sources. Design: Pairs of CI sound processors (one per ear) were constructed to mimic or not mimic the effects of the contralateral MOCR on compression. For the nonmimicking condition (standard strategy or STD), the two processors in a pair functioned similarly to standard clinical processors (i.e., with fixed back-end compression and independently of each other). When configured to mimic the effects of the MOCR (MOC strategy), the two processors communicated with each other and the amount of back-end compression in a given frequency channel of each processor in the pair decreased/increased dynamically (so that output levels dropped/increased) with increases/decreases in the output energy from the corresponding frequency channel in the contralateral processor. Speech reception thresholds in speech-shaped noise were measured for 3 bilateral CI users and 2 single-sided deaf unilateral CI users. Thresholds were compared for the STD and MOC strategies in unilateral and bilateral listening conditions and for three spatial configurations of the speech and noise sources in simulated free-field conditions: speech and noise sources colocated in front of the listener, speech on the left ear with noise in front of the listener, and speech on the left ear with noise on the right ear. 
In both bilateral and unilateral listening, the electrical stimulus delivered to the test ear(s) was always calculated as if the listeners were wearing bilateral processors. Results: In both unilateral and bilateral listening conditions, mean speech reception thresholds were comparable with the two strategies for colocated speech and noise sources, but were at least 2 dB lower (better) with the MOC than with the STD strategy for spatially separated speech and noise sources. In unilateral listening conditions, mean thresholds improved with increasing spatial separation between the speech and noise sources regardless of strategy, but the improvement was significantly greater with the MOC strategy. In bilateral listening conditions, thresholds improved significantly with increasing speech-noise spatial separation only with the MOC strategy. Conclusions: The MOC strategy (1) significantly improved the intelligibility of speech presented in competition with a spatially separated noise source, both in unilateral and bilateral listening conditions; (2) produced significant spatial release from masking in bilateral listening conditions, something that did not occur with fixed compression; and (3) enhanced spatial release from masking in unilateral listening conditions. The MOC strategy as implemented here, or a modified version of it, may be usefully applied in CIs and in hearing aids. PMID:26862711
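The dynamic back-end compression described above can be sketched in a few lines. The log-shaped compression function and the contralateral control rule below follow the general behavior reported in the abstract; the parameter names and values (`c_max`, `alpha`, the energy scale) are illustrative assumptions, not the authors' fitted settings:

```python
import math

def backend_map(x, c):
    """Clinical-style back-end compression: map a normalized channel
    envelope x in [0, 1] to a normalized electric level in [0, 1].
    Larger c means stronger compression (more gain for weak inputs)."""
    return math.log(1.0 + c * x) / math.log(1.0 + c)

def moc_controlled_c(contra_energy, c_max=1000.0, alpha=500.0):
    """MOC-inspired control rule: the compression parameter of a channel
    shrinks as the output energy of the matching channel in the
    contralateral processor grows, so output levels drop. c_max and
    alpha are illustrative assumptions."""
    return c_max / (1.0 + alpha * contra_energy)

x = 0.2                                      # weak envelope in this channel
std = backend_map(x, 1000.0)                 # STD: fixed compression
moc = backend_map(x, moc_controlled_c(0.5))  # MOC: active contralateral channel
assert moc < std    # contralateral energy inhibits this channel's output
```

With a quiet contralateral channel the two strategies coincide; the difference appears only when the two processors carry unequal energy, which is exactly the spatially separated speech-and-noise case where the abstract reports the benefit.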
Relationship Among Signal Fidelity, Hearing Loss, and Working Memory for Digital Noise Suppression.
Arehart, Kathryn; Souza, Pamela; Kates, James; Lunner, Thomas; Pedersen, Michael Syskind
2015-01-01
This study considered speech modified by additive babble combined with noise-suppression processing. The purpose was to determine the relative importance of the signal modifications, individual peripheral hearing loss, and individual cognitive capacity for speech intelligibility and speech quality. The participant group consisted of 31 individuals with moderate high-frequency hearing loss ranging in age from 51 to 89 years (mean = 69.6 years). Speech intelligibility and speech quality were measured using low-context sentences presented in babble at several signal-to-noise ratios. Speech stimuli were processed with a binary mask noise-suppression strategy with systematic manipulations of two parameters (error rate and attenuation values). The cumulative effects of signal modification produced by babble and signal processing were quantified using an envelope-distortion metric. Working memory capacity was assessed with a reading span test. Analysis of variance was used to determine the effects of signal processing parameters on perceptual scores. Hierarchical linear modeling was used to determine the role of degree of hearing loss and working memory capacity in individual listener response to the processed noisy speech. The model also considered improvements in envelope fidelity caused by the binary mask and the degradations to the envelope caused by error and noise. The participants showed significant benefits in terms of intelligibility scores and quality ratings for noisy speech processed by the ideal binary mask noise-suppression strategy. This benefit was observed across a range of signal-to-noise ratios and persisted when up to a 30% error rate was introduced into the processing. Average intelligibility scores and average quality ratings were well predicted by an objective metric of envelope fidelity.
Degree of hearing loss and working memory capacity were significant factors in explaining individual listeners' intelligibility scores for binary mask processing applied to speech in babble. Degree of hearing loss and working memory capacity did not predict listeners' quality ratings. The results indicate that envelope fidelity is a primary factor in determining the combined effects of noise and binary mask processing for intelligibility and quality of speech presented in babble noise. Degree of hearing loss and working memory capacity are significant factors in explaining variability in listeners' speech intelligibility scores but not in quality ratings.
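A minimal sketch of the binary-mask manipulation studied here (criterion level, attenuation value, injected error rate) might look like this; the function name and its defaults are assumptions for illustration, not the authors' implementation:

```python
import random

def binary_mask_suppress(mixture_tf, snr_tf, lc_db=0.0, atten_db=20.0,
                         error_rate=0.0, seed=0):
    """Sketch of binary-mask noise suppression over time-frequency cells.

    mixture_tf -- 2D list of noisy-speech magnitudes, [time][frequency]
    snr_tf     -- matching local SNR estimates in dB (an 'ideal' mask
                  uses the true SNR, known only in the laboratory)
    Cells whose local SNR falls below the criterion lc_db are attenuated
    by atten_db; a fraction error_rate of mask decisions is flipped to
    model the estimation errors manipulated in the study.
    """
    rng = random.Random(seed)
    gain_cut = 10.0 ** (-atten_db / 20.0)
    out = []
    for mag_row, snr_row in zip(mixture_tf, snr_tf):
        row = []
        for mag, snr in zip(mag_row, snr_row):
            keep = snr >= lc_db
            if rng.random() < error_rate:   # simulated mask error
                keep = not keep
            row.append(mag if keep else mag * gain_cut)
        out.append(row)
    return out

# Error-free ideal mask: the speech-dominated cell is kept,
# the noise-dominated cell is attenuated by 20 dB (x0.1).
masked = binary_mask_suppress([[1.0, 1.0]], [[10.0, -10.0]])
```

Raising `error_rate` degrades the envelope fidelity of the output, which is the quantity the study's distortion metric tracks.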
Azadpour, Mahan; McKay, Colette M
2014-01-01
Auditory brainstem implants (ABI) use the same processing strategy as was developed for cochlear implants (CI). However, the cochlear nucleus (CN), the stimulation site of ABIs, is anatomically and physiologically more complex than the auditory nerve and consists of neurons with differing roles in auditory processing. The aim of this study was to evaluate the hypotheses that ABI users are less able than CI users to access speech spectro-temporal information delivered by the existing strategies and that the sites stimulated by different locations of CI and ABI electrode arrays differ in encoding of temporal patterns in the stimulation. Six CI users and four ABI users of Nucleus implants with the ACE processing strategy participated in this study. Closed-set perception of aCa syllables (16 consonants) and bVd words (11 vowels) was evaluated via experimental processing strategies that activated one, two, or four of the electrodes of the array in a CIS manner, as well as via subjects' clinical strategies. Three single-channel strategies presented the overall temporal envelope variations of the signal on a single implant electrode located in the high-, medium-, or low-frequency region of the array. Implantees' ability to discriminate within-electrode temporal patterns of stimulation for phoneme perception and their ability to make use of spectral information presented by an increased number of active electrodes were assessed in the single- and multiple-channel strategies, respectively. Overall percentages and information transmission of phonetic features were obtained for each experimental program. Phoneme perception performance of three ABI users was within the range of CI users in most of the experimental strategies and improved as the number of active electrodes increased. One ABI user performed close to chance with all the single and multiple electrode strategies.
There was no significant difference between apical, basal, and middle CI electrodes in transmitting speech temporal information, except for a trend that the voicing feature was the least transmitted by the basal electrode. A similar electrode-location pattern could be observed in most ABI subjects. Although the number of tested ABI subjects was small, their wide range of phoneme perception performance was consistent with previous reports of overall speech perception in ABI patients. The better-performing ABI users had access to speech temporal and spectral information that was comparable to that of the average CI user. The poor-performing ABI user did not have access to within-channel speech temporal information and did not benefit from an increased number of spectral channels. The within-subject variability between different ABI electrodes was less than the variability across users in transmission of speech temporal information. The difference in the performance of ABI users could be related to the location of their electrode array on the CN, anatomy, and physiology of their CN or the damage to their auditory brainstem due to tumor or surgery.
Higgins, Paul; Searchfield, Grant; Coad, Gavin
2012-06-01
The aim of this study was to determine which level-dependent hearing aid digital signal-processing strategy (DSP) participants preferred when listening to music and/or performing a speech-in-noise task. Two receiver-in-the-ear hearing aids were compared: one using 32-channel adaptive dynamic range optimization (ADRO) and the other wide dynamic range compression (WDRC) incorporating dual fast (4 channel) and slow (15 channel) processing. The manufacturers' first-fit settings based on participants' audiograms were used in both cases. Results were obtained from 18 participants on a quick speech-in-noise (QuickSIN; Killion, Niquette, Gudmundsen, Revit, & Banerjee, 2004) task and for 3 music listening conditions (classical, jazz, and rock). Participants preferred the quality of music and performed better at the QuickSIN task using the hearing aids with ADRO processing. A potential reason for the better performance of the ADRO hearing aids was less fluctuation in output with change in sound dynamics. ADRO processing has advantages for both music quality and speech recognition in noise over the multichannel WDRC processing that was used in the study. Further evaluations of which DSP aspects contribute to listener preference are required.
Perception of temporally modified speech in auditory neuropathy.
Hassan, Dalia Mohamed
2011-01-01
Disrupted auditory nerve activity in auditory neuropathy (AN) significantly impairs the sequential processing of auditory information, resulting in poor speech perception. This study investigated the ability of AN subjects to perceive temporally modified consonant-vowel (CV) pairs and shed light on their phonological awareness skills. Four Arabic CV pairs were selected: /ki/-/gi/, /to/-/do/, /si/-/sti/ and /so/-/zo/. The formant transitions in consonants and the pauses between CV pairs were prolonged. Rhyming, segmentation and blending skills were tested using words at a natural rate of speech and with prolongation of the speech stream. Fourteen adult AN subjects were compared to a matched group of cochlear-impaired patients in their perception of acoustically processed speech. The AN group distinguished the CV pairs at a low speech rate, in particular with modification of the consonant duration. Phonological awareness skills deteriorated in adult AN subjects but improved with prolongation of the speech inter-syllabic time interval. A rehabilitation program for AN should consider temporal modification of speech, training for auditory temporal processing and the use of devices with innovative signal processing schemes. Verbal modifications as well as visual imaging appear to be promising compensatory strategies for remediating the affected phonological processing skills.
Investigation of potential cognitive tests for use with older adults in audiology clinics.
Vaughan, Nancy; Storzbach, Daniel; Furukawa, Izumi
2008-01-01
Cognitive declines in working memory and processing speed are hallmarks of aging. Deficits in speech understanding are also seen in aging individuals. A clinical test to determine whether these cognitive changes contribute to age-related speech understanding difficulties would be helpful for determining rehabilitation strategies in audiology clinics. To identify a clinical neurocognitive test or battery of tests that could be used in audiology clinics to help explain deficits in speech recognition in some older listeners. A correlational study examining the association between certain cognitive test scores and speech recognition performance. Speeded (time-compressed) speech was used to increase the cognitive processing load. Two hundred twenty-five adults aged 50 through 75 years participated in this study. Two test batteries were administered to all participants in two separate sessions: a selected battery of neurocognitive tests and a time-compressed speech recognition test battery using various rates of speech. Principal component analysis was used to extract the important component factors from each set of tests, and regression models were constructed to examine the association between tests and to identify the neurocognitive test most strongly associated with speech recognition performance. A sequencing working memory test (Letter-Number Sequencing [LNS]) was most strongly associated with rapid speech understanding. The association between the LNS test results and the compressed sentence recognition scores (CSRS) was strong even when age and hearing loss were controlled. The LNS is a sequencing test that provides information about temporal processing at the cognitive level and may prove useful in the diagnosis of speech understanding problems and in the development of aural rehabilitation and training strategies.
Co-occurrence statistics as a language-dependent cue for speech segmentation.
Saksida, Amanda; Langus, Alan; Nespor, Marina
2017-05-01
To what extent can language acquisition be explained in terms of different associative learning mechanisms? It has been hypothesized that distributional regularities in spoken languages are strong enough to elicit statistical learning about dependencies among speech units. Distributional regularities could be a useful cue for word learning even without rich language-specific knowledge. However, it is not clear how strong and reliable the distributional cues are that humans might use to segment speech. We investigate cross-linguistic viability of different statistical learning strategies by analyzing child-directed speech corpora from nine languages and by modeling possible statistics-based speech segmentations. We show that languages vary as to which statistical segmentation strategies are most successful. The variability of the results can be partially explained by systematic differences between languages, such as rhythmical differences. The results confirm previous findings that different statistical learning strategies are successful in different languages and suggest that infants may have to primarily rely on non-statistical cues when they begin their process of speech segmentation. © 2016 John Wiley & Sons Ltd.
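A minimal sketch of one such statistics-based strategy, forward transitional probabilities with an absolute segmentation threshold, is shown below; the function names, the toy stream, and the threshold value are illustrative, not the paper's models:

```python
from collections import Counter

def forward_tps(syllables):
    """Forward transitional probabilities P(next | current) over a
    syllable stream -- one of the statistical cues whose cross-linguistic
    usefulness such corpus analyses compare. Real models also consider
    backward TPs and mutual information; this is a minimal sketch."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

def segment(syllables, tps, threshold=0.75):
    """Posit a word boundary wherever the forward TP drops below an
    absolute threshold (one of several thresholding strategies;
    the threshold value is illustrative)."""
    words, word = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append(word)
            word = []
        word.append(b)
    words.append(word)
    return words

stream = ["go", "la", "bu", "ki", "go", "la", "tu", "pi"]  # words: gola, buki, tupi
tps = forward_tps(stream)
words = segment(stream, tps)
```

On this toy stream the strategy recovers the first and last words but merges the middle ones (a cross-word transition happens to be deterministic), a small-scale illustration of why purely statistical segmentation can be unreliable and language-dependent.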
ERIC Educational Resources Information Center
Dole, Marjorie; Hoen, Michel; Meunier, Fanny
2012-01-01
Developmental dyslexia is associated with impaired speech-in-noise perception. The goal of the present research was to further characterize this deficit in dyslexic adults. In order to specify the mechanisms and processing strategies used by adults with dyslexia during speech-in-noise perception, we explored the influence of background type,…
Speech perception in individuals with auditory dys-synchrony.
Kumar, U A; Jayaram, M
2011-03-01
This study aimed to evaluate the effect of lengthening the transition duration of selected speech segments upon the perception of those segments in individuals with auditory dys-synchrony. Thirty individuals with auditory dys-synchrony participated in the study, along with 30 age-matched normal hearing listeners. Eight consonant-vowel syllables were used as auditory stimuli. Two experiments were conducted. Experiment one measured the 'just noticeable difference' time: the smallest prolongation of the speech sound transition duration which was noticeable by the subject. In experiment two, speech sounds were modified by lengthening the transition duration by multiples of the just noticeable difference time, and subjects' speech identification scores for the modified speech sounds were assessed. Subjects with auditory dys-synchrony demonstrated poor processing of temporal auditory information. Lengthening of speech sound transition duration improved these subjects' perception of both the placement and voicing features of the speech syllables used. These results suggest that innovative speech processing strategies which enhance temporal cues may benefit individuals with auditory dys-synchrony.
Lorens, Artur; Zgoda, Małgorzata; Obrycka, Anita; Skarżynski, Henryk
2010-12-01
Presently, there are only a few studies examining the benefits of fine structure information in coding strategies. Against this background, this study aims to assess the objective and subjective performance of children experienced with the C40+ cochlear implant using the CIS+ coding strategy who were upgraded to the OPUS 2 processor using FSP and HDCIS. In this prospective study, 60 children with more than 3.5 years of experience with the C40+ cochlear implant were upgraded to the OPUS 2 processor and fit and tested with HDCIS (Interval I). After 3 months of experience with HDCIS, they were fit with the FSP coding strategy (Interval II) and tested with all strategies (FSP, HDCIS, CIS+). After an additional 3-4 months, they were assessed on all three strategies and asked to choose their take-home strategy (Interval III). The children were tested using the Adaptive Auditory Speech Test, which measures speech reception threshold (SRT) in quiet and noise at each test interval. The children were also asked to rate on a Visual Analogue Scale their satisfaction and coding strategy preference when listening to speech and a pop song. However, since not all tests could be performed at one single visit, some children were not able to complete all tests at all intervals. At the study endpoint, speech in quiet showed a significant difference in SRT of 1.0 dB between FSP and HDCIS, with FSP performing better. FSP also proved better than CIS+, with SRTs lower by 5.2 dB. Speech in noise tests showed FSP to be significantly better than CIS+ by 0.7 dB, and HDCIS to be significantly better than CIS+ by 0.8 dB. Both satisfaction and coding strategy preference ratings also revealed that the FSP and HDCIS strategies were better than the CIS+ strategy when listening to speech and music. FSP was better than HDCIS when listening to speech.
This study demonstrates that long-term pediatric users of the COMBI 40+ are able to upgrade to a newer processor and coding strategy without compromising their listening performance and even improving their performance with FSP after a short time of experience. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Language familiarity modulates relative attention to the eyes and mouth of a talker.
Barenholtz, Elan; Mavica, Lauren; Lewkowicz, David J
2016-02-01
We investigated whether the audiovisual speech cues available in a talker's mouth elicit greater attention when adults have to process speech in an unfamiliar language vs. a familiar language. Participants performed a speech-encoding task while watching and listening to videos of a talker in a familiar language (English) or an unfamiliar language (Spanish or Icelandic). Attention to the mouth increased in monolingual subjects in response to an unfamiliar language condition but did not in bilingual subjects when the task required speech processing. In the absence of an explicit speech-processing task, subjects attended equally to the eyes and mouth in response to both familiar and unfamiliar languages. Overall, these results demonstrate that language familiarity modulates selective attention to the redundant audiovisual speech cues in a talker's mouth in adults. When our findings are considered together with similar findings from infants, they suggest that this attentional strategy emerges very early in life. Copyright © 2015 Elsevier B.V. All rights reserved.
Integrating cognitive and peripheral factors in predicting hearing-aid processing effectiveness
Kates, James M.; Arehart, Kathryn H.; Souza, Pamela E.
2013-01-01
Individual factors beyond the audiogram, such as age and cognitive abilities, can influence speech intelligibility and speech quality judgments. This paper develops a neural network framework for combining multiple subject factors into a single model that predicts speech intelligibility and quality for a nonlinear hearing-aid processing strategy. The nonlinear processing approach used in the paper is frequency compression, which is intended to improve the audibility of high-frequency speech sounds by shifting them to lower frequency regions where listeners with high-frequency loss have better hearing thresholds. An ensemble averaging approach is used for the neural network to avoid the problems associated with overfitting. Models are developed for two subject groups, one having nearly normal hearing and the other mild-to-moderate sloping losses. PMID:25669257
Impact of Noise Reduction Algorithm in Cochlear Implant Processing on Music Enjoyment.
Kohlberg, Gavriel D; Mancuso, Dean M; Griffin, Brianna M; Spitzer, Jaclyn B; Lalwani, Anil K
2016-06-01
A noise reduction algorithm (NRA) in the speech processing strategy has a positive impact on speech perception among cochlear implant (CI) listeners. We sought to evaluate the effect of NRA on music enjoyment. Prospective analysis of music enjoyment. Academic medical center. Normal-hearing (NH) adults (N = 16) and CI listeners (N = 9). Subjective rating of music excerpts. NH and CI listeners evaluated a country music piece on three enjoyment modalities: pleasantness, musicality, and naturalness. Participants listened to the original version and 20 modified, less complex versions created by including subsets of musical instruments from the original song. NH participants listened to the segments through CI simulation and CI listeners listened to the segments with their usual speech processing strategy, with and without NRA. Decreasing the number of instruments was significantly associated with an increase in pleasantness and naturalness in both NH and CI subjects (p < 0.05). However, there was no difference in music enjoyment with or without NRA for either NH listeners with CI simulation or CI listeners across all three modalities of pleasantness, musicality, and naturalness (p > 0.05): this was true for the original and the modified music segments with one to three instruments (p > 0.05). NRA does not affect music enjoyment in CI listeners or NH individuals with CI simulation. This suggests that strategies to enhance speech processing will not necessarily have a positive impact on music enjoyment. However, reducing the complexity of music shows promise in enhancing music enjoyment and should be further explored.
Single and Multiple Microphone Noise Reduction Strategies in Cochlear Implants
Azimi, Behnam; Hu, Yi; Friedland, David R.
2012-01-01
To restore hearing sensation, cochlear implants deliver electrical pulses to the auditory nerve by relying on sophisticated signal processing algorithms that convert acoustic inputs to electrical stimuli. Although individuals fitted with cochlear implants perform well in quiet, their speech intelligibility is far more susceptible to background noise than that of normal-hearing listeners. Traditionally, to increase performance in noise, single-microphone noise reduction strategies have been used. More recently, a number of approaches have suggested that speech intelligibility in noise can be improved further by making use of two or more microphones instead. Processing strategies based on multiple microphones can better exploit the spatial diversity of speech and noise because such strategies rely mostly on spatial information about the relative position of competing sound sources. In this article, we identify and elucidate the most significant theoretical aspects that underpin single- and multi-microphone noise reduction strategies for cochlear implants. More analytically, we focus on strategies of both types that have been shown to be promising for use in current-generation implant devices. We present data from past and more recent studies, and furthermore we outline the direction that future research in the area of noise reduction for cochlear implants could follow. PMID:22923425
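As a toy illustration of the spatial information that multi-microphone strategies exploit, a two-microphone delay-and-sum beamformer (the simplest such strategy; the target's inter-microphone delay is assumed known here) passes the aligned target at full amplitude while partially cancelling misaligned interference:

```python
def delay_and_sum(mic1, mic2, delay_samples):
    """Two-microphone delay-and-sum sketch: delay the second channel so
    the target wavefront lines up across microphones, then average.
    The target (aligned) passes at full amplitude; a source arriving
    from another direction (misaligned) is attenuated. delay_samples is
    the target's inter-microphone delay, assumed known or estimated."""
    out = []
    for n in range(len(mic1)):
        j = n - delay_samples
        m2 = mic2[j] if 0 <= j < len(mic2) else 0.0
        out.append(0.5 * (mic1[n] + m2))
    return out

# Target arrives at mic2 two samples before mic1; aligning restores it.
target_m1 = [0.0, 0.0, 1.0, -1.0, 1.0, 0.0]
target_m2 = [1.0, -1.0, 1.0, 0.0, 0.0, 0.0]
aligned = delay_and_sum(target_m1, target_m2, delay_samples=2)
```

Real CI front ends use adaptive beamformers rather than a fixed delay, but the underlying principle, coherent addition for the target versus incoherent addition for spatially separated noise, is the same.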
Putter-Katz, Hanna; Adi-Bensaid, Limor; Feldman, Irit; Hildesheimer, Minka
2008-01-01
Twenty children with central auditory processing disorders [(C)APD] were subjected to a structured intervention program of listening skills in quiet and in noise. Their performance was compared to that of a control group of 10 children with (C)APD with no special treatment. Pretests were conducted in quiet and in degraded listening conditions (speech noise and competing speech). The (C)APD management approach was integrative and included top-down and bottom-up strategies. It focused on environmental modifications, remediation techniques, and compensatory strategies. Training was conducted with monosyllabic and polysyllabic words, sentences and phrases in quiet and in noise. Comparisons of pre- and post-management measures indicated increase in speech recognition performance in background noise and competing speech for the treatment group. This improvement was exhibited for both ears. A significant difference between ears was found with the left ear showing improvement in both the short and the long versions of competing sentence tests and the right ear performing better in the long competing sentences only following intervention. No changes were documented for the control group. These findings add to a growing body of literature suggesting that interactive auditory training can improve listening skills.
Kumar, U A; Jayaram, M
2013-07-01
The purpose of this study was to evaluate the effect of lengthening of voice onset time and burst duration of selected speech stimuli on perception by individuals with auditory dys-synchrony. This is the second of a series of articles reporting the effect of signal enhancing strategies on speech perception by such individuals. Two experiments were conducted: (1) assessment of the 'just-noticeable difference' for voice onset time and burst duration of speech sounds; and (2) assessment of speech identification scores when speech sounds were modified by lengthening the voice onset time and the burst duration in units of one just-noticeable difference, both in isolation and in combination with each other plus transition duration modification. Lengthening of voice onset time as well as burst duration improved perception of voicing. However, the effect of voice onset time modification was greater than that of burst duration modification. Although combined lengthening of voice onset time, burst duration and transition duration resulted in improved speech perception, the improvement was less than that due to lengthening of transition duration alone. These results suggest that innovative speech processing strategies that enhance temporal cues may benefit individuals with auditory dys-synchrony.
Restoring speech perception with cochlear implants by spanning defective electrode contacts.
Frijns, Johan H M; Snel-Bongers, Jorien; Vellinga, Dirk; Schrage, Erik; Vanpoucke, Filiep J; Briaire, Jeroen J
2013-04-01
Even with six defective contacts, spanning can largely restore speech perception with the HiRes 120 speech processing strategy to the level supported by an intact electrode array. Moreover, the sound quality is not degraded. Previous studies have demonstrated reduced speech perception scores (SPS) with defective contacts in HiRes 120. This study investigated whether replacing defective contacts by spanning, i.e. current steering on non-adjacent contacts, is able to restore speech recognition to the level supported by an intact electrode array. Ten adult cochlear implant recipients (HiRes90K, HiFocus1J) with experience with HiRes 120 participated in this study. Three different defective electrode arrays were simulated (six separate defective contacts, three pairs or two triplets). The participants received three take-home strategies and were asked to evaluate the sound quality in five predefined listening conditions. After 3 weeks, SPS were evaluated with monosyllabic words in quiet and in speech-shaped background noise. The participants rated the sound quality equal for all take-home strategies. SPS with background noise were equal for all conditions tested. However, SPS in quiet (85% phonemes correct on average with the full array) decreased significantly with increasing spanning distance, with a 3% decrease for each spanned contact.
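The spanning idea can be sketched as a remapping of each adjacent steering pair, with any defective endpoint replaced by the nearest working contact beyond it; the indexing and default array size below are illustrative assumptions, not the clinical fitting software:

```python
def spanning_pair(i, defective, n_contacts=16):
    """HiRes 120-style current steering normally stimulates the adjacent
    contact pair (i, i + 1); with spanning, a defective endpoint is
    replaced by the nearest working contact beyond it, widening the pair
    so current is steered between non-adjacent contacts. Returns the pair
    of contact indices actually stimulated (a sketch, not the clinical
    algorithm)."""
    lo, hi = i, i + 1
    while lo in defective and lo > 0:
        lo -= 1
    while hi in defective and hi < n_contacts - 1:
        hi += 1
    return lo, hi

# Intact array: the pair is unchanged. With contacts 5 and 6 defective,
# current is steered between contacts 4 and 7 (spanning distance of two),
# the kind of widened pair the study links to a small SPS-in-quiet cost.
pairs = [spanning_pair(5, set()), spanning_pair(5, {5, 6})]
```

Wider spanned pairs coarsen the spectral sampling in that region, which is consistent with the reported ~3% phoneme-score decrease per spanned contact in quiet.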
Abdeltawwab, Mohamed M; Khater, Ahmed; El-Anwar, Mohammad W
2016-01-01
The combination of acoustic and electric stimulation as a way to enhance speech recognition performance in cochlear implant (CI) users has generated considerable interest in recent years. The purpose of this study was to evaluate the bimodal advantage of the FS4 speech processing strategy in combination with hearing aids (HA) as a means to improve low-frequency resolution in CI patients. Nineteen postlingual CI adults were selected to participate in this study. All patients wore implants on one side and HA on the contralateral side with residual hearing. Monosyllabic word recognition, speech in noise, and emotion and talker identification were assessed using CI with fine structure processing/FS4 and high-definition continuous interleaved sampling strategies, HA alone, and a combination of CI and HA. The bimodal stimulation showed improvement in speech performance and emotion identification for the question/statement/order tasks, which was statistically significant compared to CI alone, but there were no statistically significant differences in intragender talker discrimination and emotion identification for the happy/angry/neutral tasks. The poorest performance was obtained with the HA alone, significantly worse than the other modalities. The bimodal stimulation showed enhanced speech performance in CI patients, mitigating the limitations of electric or acoustic stimulation alone. © 2016 S. Karger AG, Basel.
A novel speech processing algorithm based on harmonicity cues in cochlear implant
NASA Astrophysics Data System (ADS)
Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng
2017-08-01
This paper proposed a novel speech processing algorithm for cochlear implants, which used harmonicity cues to enhance tonal information in Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank. The frequency ranges of the four bands were 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each pass band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full-wave rectification and low-pass filtering. The TEPCs were modulated onto a sinusoidal carrier whose frequency was the harmonic of the fundamental frequency (F0) closest to the center frequency of each band. Signals from each band were combined to obtain the output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions were tested for the widely used continuous interleaved sampling (CIS) strategy and the novel F0-harmonic algorithm. The results showed that the F0-harmonic algorithm performed consistently better than the CIS strategy in Mandarin tone, word, and sentence recognition. In addition, sentence recognition rates were higher than word recognition rates, owing to the contextual information in sentences. Moreover, tones 3 and 4 were recognized better than tones 1 and 2, as their features are more easily identified. In conclusion, the F0-harmonic algorithm can enhance tonal information in cochlear implant speech processing through the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further work will test the F0-harmonic algorithm in noisy listening conditions.
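As a rough illustration, the band-split / envelope-extraction / carrier-remodulation pipeline described in this abstract can be sketched as follows. This is a toy sketch, not the authors' implementation: it assumes a fixed F0 of 200 Hz (the actual algorithm tracks F0 over time), uses brick-wall FFT filters in place of the paper's band-pass filter bank, and takes the arithmetic band midpoint as the center frequency.

```python
import numpy as np

# Band edges in Hz, taken from the abstract above.
BANDS = [(300, 621), (621, 1285), (1285, 2657), (2657, 5499)]

def _bandlimit(x, fs, lo, hi):
    # Brick-wall FFT filter standing in for the paper's band-pass filter bank.
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, len(x))

def f0_harmonic_vocoder(x, fs, f0=200.0, env_cutoff=400.0):
    # Per band: extract TEPCs (rectified band signal low-passed below 400 Hz)
    # and remodulate them onto the F0 harmonic nearest the band center.
    # f0 is fixed here for simplicity; the real algorithm tracks F0 over time.
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in BANDS:
        band = _bandlimit(x, fs, lo, hi)
        env = _bandlimit(np.abs(band), fs, 0.0, env_cutoff)  # TEPCs < 400 Hz
        fc = (lo + hi) / 2.0
        k = max(1, int(round(fc / f0)))  # harmonic index nearest band center
        out += env * np.sin(2 * np.pi * k * f0 * t)
    return out
```

A CIS-style reference processor would differ only in the carrier choice (fixed-rate pulse trains or band-center sinusoids instead of F0 harmonics), which is exactly the contrast the study tests.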
Language-learning disabilities: Paradigms for the nineties.
Wiig, E H
1991-01-01
We are beginning a decade during which many traditional paradigms in education, special education, and speech-language pathology will undergo change. Among the paradigms considered promising for speech-language pathology in the schools are collaborative language intervention and strategy training for language and communication. This presentation introduces management models for developing a collaborative language intervention process, among them the Deming Management Method for Total Quality (TQ) (Deming 1986). Implementation models for language assessment and IEP planning and multicultural issues are also introduced (Damico and Nye 1990; Secord and Wiig in press). While attention to the processes involved in developing and implementing collaborative language intervention is paramount, content should not be neglected. To this end, strategy training for language and communication is introduced as a viable paradigm. Macro- and micro-level process models for strategy training are featured and general issues are discussed (Ellis, Deshler, and Schumaker 1989; Swanson 1989; Wiig 1989).
Lexical and sublexical units in speech perception.
Giroux, Ibrahima; Rey, Arnaud
2009-03-01
Saffran, Newport, and Aslin (1996a) found that human infants are sensitive to statistical regularities corresponding to lexical units when hearing an artificial spoken language. Two sorts of segmentation strategies have been proposed to account for this early word-segmentation ability: bracketing strategies, in which infants are assumed to insert boundaries into continuous speech, and clustering strategies, in which infants are assumed to group certain speech sequences together into units (Swingley, 2005). In the present study, we test the predictions of two computational models instantiating each of these strategies (simple recurrent networks: Elman, 1990; and Parser: Perruchet & Vinter, 1998) in an experiment where we compare the lexical and sublexical recognition performance of adults after hearing 2 or 10 min of an artificial spoken language. The results are consistent with Parser's predictions and the clustering approach, showing that performance on words is better than performance on part-words only after 10 min. This result suggests that word segmentation abilities are not merely due to stronger associations between sublexical units but to the emergence of stronger lexical representations during the development of speech perception processes. Copyright © 2009, Cognitive Science Society, Inc.
Carlyon, Robert P.; Monstrey, Jolijn; Deeks, John M.; Macherey, Olivier
2014-01-01
Objective To evaluate a speech-processing strategy in which the lowest frequency channel is conveyed using an asymmetric pulse shape and “phantom stimulation”, where current is injected into one intra-cochlear electrode and where the return current is shared between an intra-cochlear and an extra-cochlear electrode. This strategy is expected to provide more selective excitation of the cochlear apex, compared to a standard strategy where the lowest-frequency channel is conveyed by symmetric pulses in monopolar mode. In both strategies all other channels were conveyed by monopolar stimulation. Design Within-subjects comparison between the two strategies. Four experiments: (1) discrimination between the strategies, controlling for loudness differences, (2) consonant identification, (3) recognition of lowpass-filtered sentences in quiet, (4) sentence recognition in the presence of a competing speaker. Study sample Eight users of the Advanced Bionics CII/Hi-Res 90k cochlear implant. Results Listeners could easily discriminate between the two strategies but no consistent differences in performance were observed. Conclusions The proposed method does not improve speech perception, at least in the short term. PMID:25358027
Interactive Processing of Words in Connected Speech in L1 and L2.
ERIC Educational Resources Information Center
Hayashi, Takuo
1991-01-01
A study exploring the differences between first- and second-language word recognition strategies revealed that second-language listeners used more higher level information than native language listeners, when access to higher level information was not hindered by a competence-ceiling effect, indicating that word processing strategy is a function…
Strategies of Clarification in Judges' Use of Language: From the Written to the Spoken.
ERIC Educational Resources Information Center
Philips, Susan U.
1985-01-01
Reports on a study of judges' strategies in clarifying their verbal explanations of constitutional rights to criminal defendants. Identifies six clarification processes and compares them with other studies of clarification processes and with the properties of simplified registers, particularly speech addressed to first- and second-language…
Planning of Hiatus-Breaking Inserted /?/ in the Speech of Australian English-Speaking Children
ERIC Educational Resources Information Center
Yuen, Ivan; Cox, Felicity; Demuth, Katherine
2017-01-01
Purpose: Non-rhotic varieties of English often use /?/ insertion as a connected speech process to separate heterosyllabic V1.V2 hiatus contexts. However, there has been little research on children's development of this strategy. This study investigated whether children use /?/ insertion and, if so, whether hiatus-breaking /?/ can be considered…
A Framework for Speech Activity Detection Using Adaptive Auditory Receptive Fields.
Carlin, Michael A; Elhilali, Mounya
2015-12-01
One of the hallmarks of sound processing in the brain is the ability of the nervous system to adapt to changing behavioral demands and surrounding soundscapes. It can dynamically shift sensory and cognitive resources to focus on relevant sounds. Neurophysiological studies indicate that this ability is supported by adaptively retuning the shapes of cortical spectro-temporal receptive fields (STRFs) to enhance features of target sounds while suppressing those of task-irrelevant distractors. Because an important component of human communication is the ability of a listener to dynamically track speech in noisy environments, the solution obtained by auditory neurophysiology implies a useful adaptation strategy for speech activity detection (SAD). SAD is an important first step in a number of automated speech processing systems, and performance is often reduced in highly noisy environments. In this paper, we describe how task-driven adaptation is induced in an ensemble of neurophysiological STRFs, and show how speech-adapted STRFs reorient themselves to enhance spectro-temporal modulations of speech while suppressing those associated with a variety of nonspeech sounds. We then show how an adapted ensemble of STRFs can better detect speech in unseen noisy environments compared to an unadapted ensemble and a noise-robust baseline. Finally, we use a stimulus reconstruction task to demonstrate how the adapted STRF ensemble better captures the spectrotemporal modulations of attended speech in clean and noisy conditions. Our results suggest that a biologically plausible adaptation framework can be applied to speech processing systems to dynamically adapt feature representations for improving noise robustness.
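For context, the kind of simple noise-sensitive baseline that STRF-based SAD systems aim to improve on can be sketched as a frame-energy detector. This is a generic illustration, not the paper's method; the frame sizes and the -35 dB threshold are arbitrary assumptions.

```python
import numpy as np

def energy_sad(x, fs, frame_ms=25, hop_ms=10, thresh_db=-35.0):
    # Flag each frame as "active" when its short-time RMS level, relative to
    # the loudest frame, exceeds a fixed threshold. Robust feature learning
    # (e.g. adapted STRF ensembles) replaces exactly this kind of heuristic.
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    rms = []
    for i in range(0, len(x) - frame + 1, hop):
        seg = x[i:i + frame]
        rms.append(np.sqrt(np.mean(seg ** 2)))
    rms = np.array(rms) + 1e-12
    level_db = 20 * np.log10(rms / rms.max())
    return level_db > thresh_db
```

In stationary noise this detector degrades quickly because the relative-energy assumption breaks, which is the failure mode motivating adaptive, speech-tuned features.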
Stathopoulos, Elaine T; Huber, Jessica E; Richardson, Kelly; Kamphaus, Jennifer; DeCicco, Devan; Darling, Meghan; Fulcher, Katrina; Sussman, Joan E
2014-01-01
The objective of the present study was to investigate whether speakers with hypophonia, secondary to Parkinson's disease (PD), would increase their vocal intensity when speaking in a noisy environment (Lombard effect). A second objective was to examine the underlying laryngeal and respiratory strategies used to increase vocal intensity. Thirty-three participants with PD were included in the study. Each participant was fitted with the SpeechVive™ device, which played multi-talker babble noise into one ear during speech. Using acoustic, aerodynamic, and respiratory kinematic techniques, the simultaneous laryngeal and respiratory mechanisms used to regulate vocal intensity were examined. Group results showed that most speakers with PD (26/33) successfully increased their vocal intensity when speaking in multi-talker babble noise. They were able to support their increased vocal intensity and subglottal pressure with combined strategies from both the laryngeal and respiratory mechanisms. Individual speaker analysis indicated that the particular laryngeal and respiratory interactions differed among speakers. The SpeechVive™ device elicited higher vocal intensities from patients with PD. Speakers used different combinations of laryngeal and respiratory physiologic mechanisms to increase vocal intensity, suggesting that the disease process does not uniformly affect the speech subsystems. Readers will be able to: (1) identify speech characteristics of people with Parkinson's disease (PD), (2) identify typical respiratory strategies for increasing sound pressure level (SPL), (3) identify typical laryngeal strategies for increasing SPL, and (4) define the Lombard effect. Copyright © 2014 Elsevier Inc. All rights reserved.
Lebib, Riadh; Papo, David; Douiri, Abdel; de Bode, Stella; Gillon Dowens, Margaret; Baudonnière, Pierre-Marie
2004-11-30
Lipreading reliably improves speech perception during face-to-face conversation. Within the range of good dubbing, however, adults tolerate some audiovisual (AV) discrepancies, and lipreading can then give rise to confusion. We used event-related brain potentials (ERPs) to study the perceptual strategies governing the intermodal processing of dynamic, bimodal speech stimuli, either congruently dubbed or not. Electrophysiological analyses revealed that non-coherent audiovisual dubbings modulated the amplitude of an endogenous ERP component, the N300, which we compared to an 'N400-like effect' reflecting the difficulty of integrating these conflicting pieces of information. This result adds further support for the existence of a cerebral system underlying 'integrative processes' lato sensu. Further studies should take advantage of this 'N400-like effect' with AV speech stimuli to open new perspectives in psycholinguistics.
Reversal of age-related neural timing delays with training
Anderson, Samira; White-Schwoch, Travis; Parbery-Clark, Alexandra; Kraus, Nina
2013-01-01
Neural slowing is commonly noted in older adults, with consequences for sensory, motor, and cognitive domains. One of the deleterious effects of neural slowing is impairment of temporal resolution; older adults, therefore, have reduced ability to process the rapid events that characterize speech, especially in noisy environments. Although hearing aids provide increased audibility, they cannot compensate for deficits in auditory temporal processing. Auditory training may provide a strategy to address these deficits. To that end, we evaluated the effects of auditory-based cognitive training on the temporal precision of subcortical processing of speech in noise. After training, older adults exhibited faster neural timing and experienced gains in memory, speed of processing, and speech-in-noise perception, whereas a matched control group showed no changes. Training was also associated with decreased variability of brainstem response peaks, suggesting a decrease in temporal jitter in response to a speech signal. These results demonstrate that auditory-based cognitive training can partially restore age-related deficits in temporal processing in the brain; this plasticity in turn promotes better cognitive and perceptual skills. PMID:23401541
Benkendorf, J L; Prince, M B; Rose, M A; De Fina, A; Hamilton, H E
2001-01-01
To date, research examining adherence to genetic counseling principles has focused on specific counseling activities such as the giving or withholding of information and responding to client requests for advice. We audiotaped 43 prenatal genetic counseling sessions and used data-driven, qualitative, sociolinguistic methodologies to investigate how language choices facilitate or hinder the counseling process. Transcripts of each session were prepared for sociolinguistic analysis of the emergent discourse that included studying conversational style, speaker-listener symmetry, directness, and other interactional patterns. Analysis of our data demonstrates that: 1) indirect speech, marked by the use of hints, hedges, and other politeness strategies, facilitates rapport and mitigates the tension between a client-centered relationship and a counselor-driven agenda; 2) direct speech, or speaking literally, is an effective strategy for providing information and education; and 3) confusion exists between the use of indirect speech and the intent to provide nondirective counseling, especially when facilitating client decision-making. Indirect responses to client questions, such as those that include the phrases "some people" or "most people," helped to maintain counselor neutrality; however, this well-intended indirectness, used to preserve client autonomy, may have obstructed direct explorations of client needs. We argue that the genetic counseling process requires increased flexibility in the use of direct and indirect speech and provide new insights into how "talk" affects the work of genetic counselors.
Cooke, Martin; Lu, Youyi
2010-10-01
Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.
Utianski, Rene L; Caviness, John N; Liss, Julie M
2015-01-01
High-density electroencephalography was used to evaluate cortical activity during speech comprehension via a sentence verification task. Twenty-four participants assigned true or false to sentences produced with 3 noise-vocoded channel levels (1--unintelligible, 6--decipherable, 16--intelligible), during simultaneous EEG recording. Participant data were sorted into higher- (HP) and lower-performing (LP) groups. The identification of a late-event related potential for LP listeners in the intelligible condition and in all listeners when challenged with a 6-Ch signal supports the notion that this induced potential may be related to either processing degraded speech, or degraded processing of intelligible speech. Different cortical locations are identified as neural generators responsible for this activity; HP listeners are engaging motor aspects of their language system, utilizing an acoustic-phonetic based strategy to help resolve the sentence, while LP listeners do not. This study presents evidence for neurophysiological indices associated with more or less successful speech comprehension performance across listening conditions. Copyright © 2014 Elsevier Inc. All rights reserved.
Orthography and Modality Influence Speech Production in Adults and Children.
Saletta, Meredith; Goffman, Lisa; Hogan, Tiffany P
2016-12-01
The acquisition of literacy skills influences the perception and production of spoken language. We examined if orthography influences implicit processing in speech production in child readers and in adult readers with low and high reading proficiency. Children (n = 17), adults with typical reading skills (n = 17), and adults demonstrating low reading proficiency (n = 18) repeated or read aloud nonwords varying in orthographic transparency. Analyses of implicit linguistic processing (segmental accuracy and speech movement stability) were conducted. The accuracy and articulatory stability of productions of the nonwords were assessed before and after repetition or reading. Segmental accuracy results indicate that all 3 groups demonstrated greater learning when they were able to read, rather than just hear, the nonwords. Speech movement results indicate that, for adults with poor reading skills, exposure to the nonwords in a transparent spelling reduces the articulatory variability of speech production. Reading skill was correlated with speech movement stability in the groups of adults. In children and adults, orthography interacts with speech production; all participants integrate orthography into their lexical representations. Adults with poor reading skills do not use the same reading or speaking strategies as children with typical reading skills.
Rate and rhythm control strategies for apraxia of speech in nonfluent primary progressive aphasia.
Beber, Bárbara Costa; Berbert, Monalise Costa Batista; Grawer, Ruth Siqueira; Cardoso, Maria Cristina de Almeida Freitas
2018-01-01
The nonfluent/agrammatic variant of primary progressive aphasia is characterized by apraxia of speech and agrammatism. Apraxia of speech limits patients' communication due to slow speaking rate, sound substitutions, articulatory groping, false starts and restarts, segmentation of syllables, and increased difficulty with increasing utterance length. Speech and language therapy is known to benefit individuals with apraxia of speech due to stroke, but little is known about its effects in primary progressive aphasia. This is a case report of a 72-year-old, illiterate housewife, who was diagnosed with nonfluent primary progressive aphasia and received speech and language therapy for apraxia of speech. Rate and rhythm control strategies for apraxia of speech were trained to improve initiation of speech. We discuss the importance of these strategies to alleviate apraxia of speech in this condition and the future perspectives in the area.
Temporally selective attention supports speech processing in 3- to 5-year-old children.
Astheimer, Lori B; Sanders, Lisa D
2012-01-01
Recent event-related potential (ERP) evidence demonstrates that adults employ temporally selective attention to preferentially process the initial portions of words in continuous speech. Doing so is an effective listening strategy since word-initial segments are highly informative. Although the development of this process remains unexplored, directing attention to word onsets may be important for speech processing in young children who would otherwise be overwhelmed by the rapidly changing acoustic signals that constitute speech. We examined the use of temporally selective attention in 3- to 5-year-old children listening to stories by comparing ERPs elicited by attention probes presented at four acoustically matched times relative to word onsets: concurrently with a word onset, 100 ms before, 100 ms after, and at random control times. By 80 ms, probes presented at and after word onsets elicited a larger negativity than probes presented before word onsets or at control times. The latency and distribution of this effect is similar to temporally and spatially selective attention effects measured in adults and, despite differences in polarity, spatially selective attention effects measured in children. These results indicate that, like adults, preschool aged children modulate temporally selective attention to preferentially process the initial portions of words in continuous speech. Copyright © 2011 Elsevier Ltd. All rights reserved.
Using Flanagan's phase vocoder to improve cochlear implant performance
NASA Astrophysics Data System (ADS)
Zeng, Fan-Gang
2004-10-01
The cochlear implant has restored partial hearing to more than 100
Ortmann, Magdalene; Zwitserlood, Pienie; Knief, Arne; Baare, Johanna; Brinkheetker, Stephanie; am Zehnhoff-Dinnesen, Antoinette; Dobel, Christian
2017-01-01
Cochlear implants provide individuals who are deaf with access to speech. Although substantial advancements have been made by novel technologies, there is still high variability in language development during childhood, depending on adaptation and neural plasticity. These factors have often been investigated in the auditory domain, with the mismatch negativity (MMN) as an index of sensory and phonological processing. Several studies have demonstrated that the MMN is an electrophysiological correlate of hearing improvement with cochlear implants. In this study, two groups of cochlear implant users, both with very good basic hearing abilities but with non-overlapping speech performance (very good or very poor speech performance), were matched according to device experience and age at implantation. We tested the perception of phonemes in the context of specific other phonemes from which they were very hard to discriminate (e.g., the vowels in /bu/ vs. /bo/). The most difficult pair was individually determined for each participant. Using behavioral measures, both cochlear implant groups performed worse than matched controls, and the good performers performed better than the poor performers. Cochlear implant groups and controls did not differ during time intervals typically used for the mismatch negativity, but earlier: source analyses revealed increased activity in the region of the right supramarginal gyrus (220–260 ms) in good performers. Poor performers showed increased activity in the left occipital cortex (220–290 ms), which may be an index of cross-modal perception. The time course and the neural generators differ from data from our earlier studies, in which the same phonemes were assessed in an easy-to-discriminate context. The results demonstrate that the groups used different language processing strategies, depending on the success of language development and the particular language context.
Overall, our data emphasize the role of neural plasticity and use of adaptive strategies for successful language development with cochlear implants. PMID:28056017
Language/culture/mind/brain. Progress at the margins between disciplines.
Kuhl, P K; Tsao, F M; Liu, H M; Zhang, Y; De Boer, B
2001-05-01
At the forefront of research on language are new data demonstrating infants' strategies in the early acquisition of language. The data show that infants perceptually "map" critical aspects of ambient language in the first year of life, before they can speak. Statistical and abstract properties of speech are picked up through exposure to ambient language. Moreover, linguistic experience alters infants' perception of speech, warping perception in a way that enhances native-language speech processing. Infants' strategies are unexpected and unpredicted by historical views. At the same time, research in three additional disciplines is contributing to our understanding of language and its acquisition by children. Cultural anthropologists are demonstrating the universality of adult speech behavior when addressing infants and children across cultures, and this is creating a new view of the role adult speakers play in bringing about language in the child. Neuroscientists, using the techniques of modern brain imaging, are revealing the temporal and structural aspects of language processing by the brain and suggesting new views of the critical period for language. Computer scientists, modeling the computational aspects of children's language acquisition, are meeting success using biologically inspired neural networks. Although a consilient view cannot yet be offered, the cross-disciplinary interaction now seen among scientists pursuing one of humans' greatest achievements, language, is quite promising.
Verbal Processing Speed and Executive Functioning in Long-Term Cochlear Implant Users
Pisoni, David B.; Kronenberger, William G.
2015-01-01
Purpose The purpose of this study was to report how verbal rehearsal speed (VRS), a form of covert speech used to maintain verbal information in working memory, and another verbal processing speed measure, perceptual encoding speed, are related to 3 domains of executive function (EF) at risk in cochlear implant (CI) users: verbal working memory, fluency-speed, and inhibition-concentration. Method EF, speech perception, and language outcome measures were obtained from 55 prelingually deaf, long-term CI users and matched controls with normal hearing (NH controls). Correlational analyses were used to assess relations between VRS (articulation rate), perceptual encoding speed (digit and color naming), and the outcomes in each sample. Results CI users displayed slower verbal processing speeds than NH controls. Verbal rehearsal speed was related to 2 EF domains in the NH sample but was unrelated to EF outcomes in CI users. Perceptual encoding speed was related to all EF domains in both groups. Conclusions Verbal rehearsal speed may be less influential for EF quality in CI users than for NH controls, whereas rapid automatized labeling skills and EF are closely related in both groups. CI users may develop processing strategies in EF tasks that differ from the covert speech strategies routinely employed by NH individuals. PMID:25320961
Na, Wondo; Kim, Gibbeum; Kim, Gungu; Han, Woojae; Kim, Jinsook
2017-01-01
The current study aimed to evaluate hearing-related changes in terms of speech-in-noise processing, fast-rate speech processing, and working memory, and to identify which of these three factors is significantly affected by age-related hearing loss. One hundred subjects aged 65-84 years participated in the study. They were classified into four groups ranging from normal hearing to moderate-to-severe hearing loss. All the participants were tested for speech perception in quiet and noisy conditions and for speech perception with time alteration in quiet conditions. Forward- and backward-digit span tests were also conducted to measure the participants' working memory. 1) As the level of background noise increased, speech perception scores systematically decreased in all the groups. This pattern was more noticeable in the three hearing-impaired groups than in the normal hearing group. 2) As the speech rate increased, speech perception scores decreased. A significant interaction was found between speech rate and hearing loss. In particular, sentences time-compressed by 30% revealed a clear differentiation between moderate and moderate-to-severe hearing loss. 3) Although all the groups showed a longer span on the forward-digit span test than the backward-digit span test, there was no significant difference as a function of hearing loss. The degree of hearing loss strongly affects the recognition of babble-masked and time-compressed speech in the elderly but does not affect working memory. We expect these results to be applied to appropriate rehabilitation strategies for hearing-impaired elderly people who experience difficulty in communication.
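The time-alteration condition above uses time-compressed speech. A minimal sketch of uniform time compression by windowed overlap-add is shown below; real test materials are usually generated with pitch-preserving methods such as PSOLA or WSOLA, so this is only an approximation of the idea, with the frame size chosen arbitrarily.

```python
import numpy as np

def time_compress(x, rate, frame=1024):
    # Shorten a signal to `rate` of its duration (e.g. rate=0.7 keeps ~70%)
    # by reading Hann-windowed frames with a large analysis hop and writing
    # them with a smaller synthesis hop, then normalizing the window overlap.
    hop_syn = frame // 2
    hop_ana = int(round(hop_syn / rate))  # rate < 1 -> read faster than write
    win = np.hanning(frame)
    n = max(1, (len(x) - frame) // hop_ana + 1)
    out = np.zeros(n * hop_syn + frame)
    norm = np.zeros_like(out)
    for i in range(n):
        seg = x[i * hop_ana:i * hop_ana + frame] * win
        out[i * hop_syn:i * hop_syn + frame] += seg
        norm[i * hop_syn:i * hop_syn + frame] += win
    norm[norm < 1e-8] = 1.0  # avoid division by zero at the edges
    return out / norm
```

Because frames are pasted without waveform-similarity alignment, phase discontinuities cause audible artifacts; WSOLA adds a local search around each analysis position to fix exactly that.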
An Intrinsically Digital Amplification Scheme for Hearing Aids
NASA Astrophysics Data System (ADS)
Blamey, Peter J.; Macfarlane, David S.; Steele, Brenton R.
2005-12-01
Results for linear and wide-dynamic range compression were compared with a new 64-channel digital amplification strategy in three separate studies. The new strategy addresses the requirements of the hearing aid user with efficient computations on an open-platform digital signal processor (DSP). The new amplification strategy is not modeled on prior analog strategies like compression and linear amplification, but uses statistical analysis of the signal to optimize the output dynamic range in each frequency band independently. Using the open-platform DSP also provided the opportunity for blind trial comparisons of the different processing schemes in BTE and ITE devices of a high commercial standard. The speech perception scores and questionnaire results show that it is possible to provide improved audibility for sound in many narrow frequency bands while simultaneously improving comfort, speech intelligibility in noise, and sound quality.
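The per-band statistical optimization described above can be sketched as a percentile-driven gain rule: steer running level statistics of each band into a comfortable output window. The percentile choices, targets, and gain limit below are illustrative assumptions, not the published strategy's actual rules:

```python
import numpy as np

def adaptive_band_gain(band_db, target_low=55.0, target_high=85.0,
                       p_low=30, p_high=90, max_gain=30.0):
    """Sketch of a statistics-driven amplification rule for one frequency band.

    Percentiles of the recent band level (in dB) are steered into an output
    window: the comfort rule lowers gain if loud sounds exceed the ceiling,
    otherwise the audibility rule raises gain (up to `max_gain`) to lift
    soft sounds into range.  All parameter values here are illustrative.
    """
    lo = np.percentile(band_db, p_low)
    hi = np.percentile(band_db, p_high)
    gain = 0.0
    if hi > target_high:                  # comfort: keep loud sounds below ceiling
        gain = target_high - hi
    elif lo < target_low:                 # audibility: lift soft sounds into range
        gain = min(target_low - lo, max_gain)
    return gain
```

Because each band runs this rule independently, soft high-frequency consonant energy can be amplified while intense low-frequency bands are simultaneously attenuated.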
Everyday listening questionnaire: correlation between subjective hearing and objective performance.
Brendel, Martina; Frohne-Buechner, Carolin; Lesinski-Schiedat, Anke; Lenarz, Thomas; Buechner, Andreas
2014-01-01
Clinical experience has demonstrated that speech understanding by cochlear implant (CI) recipients has improved over recent years with the development of new technology. The Everyday Listening Questionnaire 2 (ELQ 2) was designed to collect information regarding the challenges faced by CI recipients in everyday listening. The aim of this study was to compare self-assessment of CI users using ELQ 2 with objective speech recognition measures and to compare results between users of older and newer coding strategies. During their regular clinical review appointments a group of representative adult CI recipients implanted with the Advanced Bionics implant system were asked to complete the questionnaire. The first 100 patients who agreed to participate in this survey were recruited independent of processor generation and speech coding strategy. Correlations between subjectively scored hearing performance in everyday listening situations and objectively measured speech perception abilities were examined relative to the speech coding strategies used. When subjects were grouped by strategy there were significant differences between users of older 'standard' strategies and users of the newer, currently available strategies (HiRes and HiRes 120), especially in the categories of telephone use and music perception. Significant correlations were found between certain subjective ratings and the objective speech perception data in noise. There is a good correlation between subjective and objective data. Users of more recent speech coding strategies tend to have fewer problems in difficult hearing situations.
Donaldson, Gail S; Dawson, Patricia K; Borden, Lamar Z
2011-01-01
Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many cochlear implant (CI) users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 wks during the main study; a subset of five subjects used Fidelity120 for three additional months after the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency, vowel F2 frequency, and consonant place of articulation; overall transmitted information for vowels and consonants; and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle, and basal regions of the implanted array using a psychophysical pitch-ranking task. With one exception, there was no effect of strategy (HiRes versus Fidelity120) on the speech measures tested, either during the main study (N = 10) or after extended use of Fidelity120 (N = 5). The exception was a small but significant advantage for HiRes over Fidelity120 for consonant perception during the main study. 
Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 wks or longer experience with Fidelity120. Another three subjects exhibited initial decrements in spectral cue perception with Fidelity120 at the 8-wk time point; however, evidence from one subject suggested that such decrements may resolve with additional experience. Place-pitch thresholds were inversely related to improvements in vowel F2 frequency perception with Fidelity120 relative to HiRes. However, no relationship was observed between place-pitch thresholds and the other spectral measures (vowel F1 frequency or consonant place of articulation). Findings suggest that Fidelity120 supports small improvements in the perception of spectral speech cues in some Advanced Bionics CI users; however, many users show no clear benefit. Benefits are more likely to occur for vowel spectral cues (related to F1 and F2 frequency) than for consonant spectral cues (related to place of articulation). There was an inconsistent relationship between place-pitch sensitivity and improvements in spectral cue perception with Fidelity120 relative to HiRes. This may partly reflect the small number of sites at which place-pitch thresholds were measured. Contrary to some previous reports, there was no clear evidence that Fidelity120 supports improved sentence recognition in noise.
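Current steering of the kind Fidelity120 implements divides a pulse's current between two adjacent electrodes to shift the locus of excitation between their physical sites. A minimal sketch of the weighting (loudness compensation and pulse timing omitted; illustrative only):

```python
def steer_current(total_current_ua, alpha):
    """Split current between two adjacent electrodes to steer the locus of
    excitation (the basis of 'virtual channels' in current-steering
    strategies such as Fidelity120).  alpha=0 stimulates the first
    electrode alone, alpha=1 the second; intermediate values shift the
    pitch percept between them.  Illustrative sketch only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * total_current_ua, alpha * total_current_ua
```

Stepping alpha through several values per electrode pair is what yields the additional pitch percepts whose discriminability the place-pitch task probes.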
Bradlow, Ann R; Alexander, Jennifer A
2007-04-01
Previous research has shown that speech recognition differences between native and proficient non-native listeners emerge under suboptimal conditions. Current evidence has suggested that the key deficit that underlies this disproportionate effect of unfavorable listening conditions for non-native listeners is their less effective use of compensatory information at higher levels of processing to recover from information loss at the phoneme identification level. The present study investigated whether this non-native disadvantage could be overcome if enhancements at various levels of processing were presented in combination. Native and non-native listeners were presented with English sentences in which the final word varied in predictability and which were produced in either plain or clear speech. Results showed that, relative to the low-predictability-plain-speech baseline condition, non-native listener final word recognition improved only when both semantic and acoustic enhancements were available (high-predictability-clear-speech). In contrast, the native listeners benefited from each source of enhancement separately and in combination. These results suggest that native and non-native listeners apply similar strategies for speech-in-noise perception: The crucial difference is in the signal clarity required for contextual information to be effective, rather than in an inability of non-native listeners to take advantage of this contextual information per se.
Spectro-temporal cues enhance modulation sensitivity in cochlear implant users
Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y.
2018-01-01
Although speech understanding is highly variable among cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband “ripple” stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects’ spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. PMID:28601530
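The spectral, temporal, and joint spectro-temporal ripple conditions described above can all be generated from one modulator: a sinusoid drifting jointly across log-frequency and time. A sketch, with parameter values that are illustrative rather than those of the study:

```python
import numpy as np

def ripple_envelope(freqs_hz, t, spectral_density=1.0, temporal_rate=4.0,
                    depth=1.0):
    """Spectro-temporal 'ripple' modulator: a sinusoid drifting jointly
    across log-frequency (cycles/octave) and time (Hz).  temporal_rate=0
    gives a purely spectral ripple, spectral_density=0 a purely temporal
    one; applied to broadband noise carriers it yields the three stimulus
    classes used in ripple-detection tasks.  Values are illustrative.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    octaves = np.log2(freqs / freqs[0])          # spectral axis, re lowest band
    phase = 2 * np.pi * (spectral_density * octaves[None, :]
                         + temporal_rate * np.asarray(t, dtype=float)[:, None])
    return 1.0 + depth * np.sin(phase)           # shape: (len(t), len(freqs))
```

Decomposing sensitivity along the two axes, as the study does, amounts to comparing thresholds along the `spectral_density` and `temporal_rate` dimensions of this modulator separately and jointly.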
Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age
Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik
2015-01-01
Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259
A Survey of Practices and Strategies for Marketing Communication Majors.
ERIC Educational Resources Information Center
Gray, Philip A.; Wilson, Gerald L.
Fifty college speech departments responded to a survey intended to discover some of the common practices and strategies for marketing undergraduate speech communication majors. The results indicated that the most frequent name for the departments responding was "Communication" rather than "Speech Communication," completely the opposite of what was…
The Role of Corticostriatal Systems in Speech Category Learning
Yi, Han-Gyol; Maddox, W. Todd; Mumford, Jeanette A.; Chandrasekaran, Bharath
2016-01-01
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. Keywords: category learning, fMRI, corticostriatal systems, speech, putamen PMID:25331600
McCormack, Jane; Easton, Catherine; Morkel-Kingsbury, Lenni
2014-01-01
The landscape of tertiary education is changing. Developments in information and communications technology have created new ways of engaging with subject material and supporting students on their learning journeys. Therefore, it is timely to reconsider and re-imagine the education of speech-language pathology (SLP) students within this new learning space. In this paper, we outline the design of a new Master of Speech Pathology course being offered by distance education at Charles Sturt University (CSU) in Australia. We discuss the catalyst for the course and the commitments of the SLP team at CSU, then describe the curriculum design process, focusing on the pedagogical approach and the learning and teaching strategies utilised in the course delivery. We explain how the learning and teaching strategies have been selected to support students' online learning experience and enable greater interaction between students and the subject material, with students and subject experts, and among student groups. Finally, we highlight some of the challenges in designing and delivering a distance education SLP program and identify future directions for educating students in an online world. © 2015 S. Karger AG, Basel.
McCreery, Ryan W.; Alexander, Joshua; Brennan, Marc A.; Hoover, Brenda; Kopun, Judy; Stelmachowicz, Patricia G.
2014-01-01
Objective The primary goal of nonlinear frequency compression (NFC) and other frequency lowering strategies is to increase the audibility of high-frequency sounds that are not otherwise audible with conventional hearing-aid processing due to the degree of hearing loss, limited hearing aid bandwidth or a combination of both factors. The aim of the current study was to compare estimates of speech audibility processed by NFC to improvements in speech recognition for a group of children and adults with high-frequency hearing loss. Design Monosyllabic word recognition was measured in noise for twenty-four adults and twelve children with mild to severe sensorineural hearing loss. Stimuli were amplified based on each listener’s audiogram with conventional processing (CP) with amplitude compression or with NFC and presented under headphones using a software-based hearing aid simulator. A modification of the speech intelligibility index (SII) was used to estimate audibility of information in frequency-lowered bands. The mean improvement in SII was compared to the mean improvement in speech recognition. Results All but two listeners experienced improvements in speech recognition with NFC compared to CP, consistent with the small increase in audibility that was estimated using the modification of the SII. Children and adults had similar improvements in speech recognition with NFC. Conclusion Word recognition with NFC was higher than CP for children and adults with mild to severe hearing loss. The average improvement in speech recognition with NFC (7%) was consistent with the modified SII, which indicated that listeners experienced an increase in audibility with NFC compared to CP. Further studies are necessary to determine if changes in audibility with NFC are related to speech recognition with NFC for listeners with greater degrees of hearing loss, with a greater variety of compression settings, and using auditory training. PMID:24535558
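The weighting step at the core of any SII-based audibility estimate is simple; the sketch below shows only that step, not the full ANSI S3.5 procedure or the study's NFC-specific modification. The audibility values and importance weights are assumed inputs:

```python
def speech_intelligibility_index(band_audibility, band_importance):
    """Simplified SII: an importance-weighted sum of per-band audibility
    values in [0, 1].  The full ANSI S3.5 procedure derives audibility
    from speech, noise, and threshold spectra and applies corrections;
    this sketch shows only the weighting step that audibility-based
    predictions of speech recognition build on."""
    if abs(sum(band_importance) - 1.0) > 1e-6:
        raise ValueError("band importance weights must sum to 1")
    return sum(a * w for a, w in zip(band_audibility, band_importance))
```

Under this framing, frequency lowering raises the index by moving otherwise-inaudible high-frequency bands (audibility near 0) into regions where their content becomes partly audible.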
Donaldson, Gail S.; Dawson, Patricia K.; Borden, Lamar Z.
2010-01-01
Objectives Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many CI users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral-cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. Design A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 weeks during the main study; a subset of five subjects used Fidelity120 for 3 additional months following the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency (Vow F1), vowel F2 frequency (Vow F2) and consonant place of articulation (Con PLC); overall transmitted information for vowels (Vow STIM) and consonants (Con STIM); and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle and basal regions of the implanted array using a psychophysical pitch-ranking task. Results With one exception, there was no effect of strategy (HiRes vs. Fidelity120) on the speech measures tested, either during the main study (n=10) or after extended use of Fidelity120 (n=5). The exception was a small but significant advantage for HiRes over Fidelity120 for the Con STIM measure during the main study. 
Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 weeks or longer experience with Fidelity120. Another 3 subjects exhibited initial decrements in spectral cue perception with Fidelity120 at the 8 week time point; however, evidence from one subject suggested that such decrements may resolve with additional experience. Place-pitch thresholds were inversely related to improvements in Vow F2 perception with Fidelity120 relative to HiRes. However, no relationship was observed between place-pitch thresholds and the other spectral measures (Vow F1 or Con PLC). Conclusions Findings suggest that Fidelity120 supports small improvements in the perception of spectral speech cues in some Advanced Bionics CI users; however, many users show no clear benefit. Benefits are more likely to occur for vowel spectral cues (related to F1 and F2 frequency) than for consonant spectral cues (related to place of articulation). There was an inconsistent relationship between place-pitch sensitivity and improvements in spectral cue perception with Fidelity120 relative to HiRes. This may partly reflect the small number of sites at which place-pitch thresholds were measured. Contrary to some previous reports, there was no clear evidence that Fidelity120 supports improved sentence recognition in noise. PMID:21084987
Rhetorical and Linguistic Analysis of Bush's Second Inaugural Speech
ERIC Educational Resources Information Center
Sameer, Imad Hayif
2017-01-01
This study attempts to analyze Bush's second inaugural speech. It aims at investigating the use of linguistic strategies in it. It draws on two models, Aristotle's and Atkinson's (1984), to direct attention toward linguistic strategies. The analysis shows that Bush's second inaugural speech is successful…
NASA Astrophysics Data System (ADS)
Green, Tim; Faulkner, Andrew; Rosen, Stuart; Macherey, Olivier
2005-07-01
Standard continuous interleaved sampling processing, and a modified processing strategy designed to enhance temporal cues to voice pitch, were compared on tests of intonation perception and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400 Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude modulation with a sawtooth-like waveform whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as question or statement was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.
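The modified strategy's channel-level computation (slow envelope multiplied by an F0-rate modulator during voicing) can be sketched as follows. The one-pole filter and falling-sawtooth shape are crude stand-ins for the study's actual filters and waveform:

```python
import numpy as np

def one_pole_lowpass(x, fs, cutoff_hz):
    """Crude one-pole low-pass, a stand-in for the 32-Hz envelope filter."""
    a = np.exp(-2 * np.pi * cutoff_hz / fs)
    y = np.empty_like(x)
    acc = 0.0
    for i, v in enumerate(x):
        acc = a * acc + (1 - a) * v
        y[i] = acc
    return y

def modified_cis_levels(band_envelope, fs, f0_hz, voiced=True, lp_cutoff=32.0):
    """Sketch of the modified-strategy channel levels: the slow (<32 Hz)
    envelope carries dynamic spectral variation; during voiced speech it
    is multiplied by a 100%-modulated sawtooth-like waveform at F0 to
    restore temporal pitch cues.  Shapes here are illustrative.
    """
    slow = one_pole_lowpass(np.asarray(band_envelope, dtype=float), fs, lp_cutoff)
    if not voiced:
        return slow
    t = np.arange(len(slow)) / fs
    saw = 1.0 - np.mod(t * f0_hz, 1.0)      # falling sawtooth in (0, 1]
    return slow * saw
```

During unvoiced segments only the slow envelope drives the channel, so the F0 modulation is confined to voiced speech, as the strategy specifies.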
ERIC Educational Resources Information Center
Sovilla, J. Buttet, Ed.; de Weck, G., Ed.
1998-01-01
These articles on scaffolding in language and speech pathology/therapy are included in this issue: "Strategies d'etayage avec des enfants disphasiques: sont-elles specifiques?" ("Scaffolding Strategies for Dysphasic Children: Are They Specific?") (Genevieve de Weck); "Comparaison des strategies discursives d'etayage dans un conte et un recit…
Speech perception of young children using nucleus 22-channel or CLARION cochlear implants.
Young, N M; Grohne, K M; Carrasco, V N; Brown, C
1999-04-01
This study compares the auditory perceptual skill development of 23 congenitally deaf children who received the Nucleus 22-channel cochlear implant with the SPEAK speech coding strategy, and 20 children who received the CLARION Multi-Strategy Cochlear Implant with the Continuous Interleaved Sampler (CIS) speech coding strategy. All were under 5 years old at implantation. Preimplantation, there were no significant differences between the groups in age, length of hearing aid use, or communication mode. Auditory skills were assessed at 6 months and 12 months after implantation. Postimplantation, the mean scores on all speech perception tests were higher for the Clarion group. These differences were statistically significant for the pattern perception and monosyllable subtests of the Early Speech Perception battery at 6 months, and for the Glendonald Auditory Screening Procedure at 12 months. Multiple regression analysis revealed that device type accounted for the greatest variance in performance after 12 months of implant use. We conclude that children using the CIS strategy implemented in the Clarion implant may develop better auditory perceptual skills during the first year postimplantation than children using the SPEAK strategy with the Nucleus device.
NASA Astrophysics Data System (ADS)
Gonzalez, Julio; Oliver, Juan C.
2005-07-01
Considerable research on speech intelligibility for cochlear-implant users has been conducted using acoustic simulations with normal-hearing subjects. However, some relevant topics about perception through cochlear implants remain largely unexplored. The present study examined the perception by normal-hearing subjects of gender and identity of a talker as a function of the number of channels in spectrally reduced speech. Two simulation strategies were compared. They were implemented by two different processors that presented signals as either the sum of sine waves at the center of the channels or as the sum of noise bands. In Experiment 1, 15 subjects determined the gender of 40 talkers (20 males + 20 females) from a natural utterance processed through 3, 4, 5, 6, 8, 10, 12, and 16 channels with both processors. In Experiment 2, 56 subjects matched a natural sentence uttered by 10 talkers with the corresponding simulation replicas processed through 3, 4, 8, and 16 channels for each processor. In Experiment 3, 72 subjects performed the same task but different sentences were used for natural and processed stimuli. A control Experiment 4 was conducted to equate the processing steps between the two simulation strategies. Results showed that gender and talker identification was better for the sine-wave processor, and that performance through the noise-band processor was more sensitive to the number of channels. Implications and possible explanations for the superiority of sine-wave simulations are discussed.
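A single channel of the two simulation strategies compared above differs only in its carrier: the band envelope modulates either a sine at the channel center frequency or a noise band. A sketch, using a crude FFT brick-wall filter to band-limit the noise (the study's processors used proper analysis/synthesis filters):

```python
import numpy as np

rng = np.random.default_rng(0)

def vocoder_channel(envelope, fs, f_center, f_lo, f_hi, carrier="sine"):
    """One channel of a sine-wave or noise-band CI simulation: the band
    envelope modulates either a sine at the channel center frequency or
    band-limited noise.  The FFT brick-wall band-limiting here is a crude
    stand-in for real synthesis filters."""
    n = len(envelope)
    if carrier == "sine":
        t = np.arange(n) / fs
        c = np.sin(2 * np.pi * f_center * t)
    else:                                        # noise-band carrier
        spec = np.fft.rfft(rng.standard_normal(n))
        f = np.fft.rfftfreq(n, 1 / fs)
        spec[(f < f_lo) | (f > f_hi)] = 0.0
        c = np.fft.irfft(spec, n)
        c /= np.max(np.abs(c)) or 1.0            # avoid divide-by-zero on empty bands
    return np.asarray(envelope) * c
```

Summing such channels across bands reconstructs the spectrally reduced speech; only the carrier choice distinguishes the two simulation strategies being compared.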
Spectro-temporal cues enhance modulation sensitivity in cochlear implant users.
Zheng, Yi; Escabí, Monty; Litovsky, Ruth Y
2017-08-01
Although speech understanding is highly variable among cochlear implant (CI) subjects, the remarkably high speech recognition performance of many CI users is unexpected and not well understood. Numerous factors, including neural health and degradation of the spectral information in the speech signal of CIs, likely contribute to speech understanding. We studied the ability to use spectro-temporal modulations, which may be critical for speech understanding and discrimination, and hypothesized that CI users adopt a different perceptual strategy than normal-hearing (NH) individuals, whereby they rely more heavily on joint spectro-temporal cues to enhance detection of auditory cues. Modulation detection sensitivity was studied in CI users and NH subjects using broadband "ripple" stimuli that were modulated spectrally, temporally, or jointly, i.e., spectro-temporally. The spectro-temporal modulation transfer functions of CI users and NH subjects were decomposed into spectral and temporal dimensions and compared to those subjects' spectral-only and temporal-only modulation transfer functions. In CI users, the joint spectro-temporal sensitivity was better than that predicted by spectral-only and temporal-only sensitivity, indicating a heightened spectro-temporal sensitivity. Such an enhancement through the combined integration of spectral and temporal cues was not observed in NH subjects. The unique use of spectro-temporal cues by CI patients can yield benefits for use of cues that are important for speech understanding. This finding has implications for developing sound processing strategies that may rely on joint spectro-temporal modulations to improve speech comprehension of CI users, and the findings of this study may be valuable for developing clinical assessment tools to optimize CI processor performance. Copyright © 2017 Elsevier B.V. All rights reserved.
Strategies to Improve Regeneration of the Soft Palate Muscles After Cleft Palate Repair
Carvajal Monroy, Paola L.; Grefte, Sander; Kuijpers-Jagtman, Anne Marie; Wagener, Frank A.D.T.G.
2012-01-01
Children with a cleft in the soft palate have difficulties with speech, swallowing, and sucking. These patients are unable to separate the nasal from the oral cavity leading to air loss during speech. Although surgical repair ameliorates soft palate function by joining the clefted muscles of the soft palate, optimal function is often not achieved. The regeneration of muscles in the soft palate after surgery is hampered because of (1) their low intrinsic regenerative capacity, (2) the muscle properties related to clefting, and (3) the development of fibrosis. Adjuvant strategies based on tissue engineering may improve the outcome after surgery by approaching these specific issues. Therefore, this review will discuss myogenesis in the noncleft and cleft palate, the characteristics of soft palate muscles, and the process of muscle regeneration. Finally, novel therapeutic strategies based on tissue engineering to improve soft palate function after surgical repair are presented. PMID:22697475
Strategies to improve regeneration of the soft palate muscles after cleft palate repair.
Carvajal Monroy, Paola L; Grefte, Sander; Kuijpers-Jagtman, Anne Marie; Wagener, Frank A D T G; Von den Hoff, Johannes W
2012-12-01
Children with a cleft in the soft palate have difficulties with speech, swallowing, and sucking. These patients are unable to separate the nasal from the oral cavity leading to air loss during speech. Although surgical repair ameliorates soft palate function by joining the clefted muscles of the soft palate, optimal function is often not achieved. The regeneration of muscles in the soft palate after surgery is hampered because of (1) their low intrinsic regenerative capacity, (2) the muscle properties related to clefting, and (3) the development of fibrosis. Adjuvant strategies based on tissue engineering may improve the outcome after surgery by approaching these specific issues. Therefore, this review will discuss myogenesis in the noncleft and cleft palate, the characteristics of soft palate muscles, and the process of muscle regeneration. Finally, novel therapeutic strategies based on tissue engineering to improve soft palate function after surgical repair are presented.
Strategies for Treating Compensatory Articulation in Patients with Cleft Palate
Del Carmen Pamplona, Maria; Ysunza, Antonio; Morales, Santiago
2014-01-01
Patients with cleft palate frequently show compensatory articulation (CA). CA requires a prolonged period of speech intervention. Some scaffolding strategies can be useful for correcting placement and manner of articulation in these cases. The purpose of this paper was to study whether the use of specific strategies of speech pathology can be more effective if applied according to the level of severity of CA. Ninety patients with CA were studied in two groups. One group was treated using strategies specific for their level of severity of articulation, whereas in the other group all strategies were used indistinctively. The degree of severity of CA was compared at the end of the speech intervention. After the speech therapy intervention, the group of patients in which the strategies were used selectively showed a significantly greater decrease in the severity of CA, as compared with the patients in whom all the strategies were used indistinctively. An assessment of the severity of CA can be useful for selecting the strategies, which can be more effective for correcting the compensatory errors. PMID:24711749
Use of listening strategies for the speech of individuals with dysarthria and cerebral palsy.
Hustad, Katherine C; Dardis, Caitlin M; Kramper, Amy J
2011-03-01
This study examined listeners' endorsement of cognitive, linguistic, segmental, and suprasegmental strategies employed when listening to speakers with dysarthria. The study also examined whether strategy endorsement differed between listeners who earned the highest and lowest intelligibility scores. Speakers were eight individuals with dysarthria and cerebral palsy. Listeners were 80 individuals who transcribed speech stimuli and rated their use of each of 24 listening strategies on a 4-point scale. Results showed that cognitive and linguistic strategies were most highly endorsed. Use of listening strategies did not differ between listeners with the highest and lowest intelligibility scores. Results suggest that there may be a core of strategies common to listeners of speakers with dysarthria that may be supplemented by additional strategies, based on characteristics of the speaker and speech signal.
Investigations in mechanisms and strategies to enhance hearing with cochlear implants
NASA Astrophysics Data System (ADS)
Churchill, Tyler H.
Cochlear implants (CIs) produce hearing sensations by stimulating the auditory nerve (AN) with current pulses whose amplitudes are modulated by filtered acoustic temporal envelopes. While this technology has provided hearing for many CI recipients, even bilaterally implanted listeners have more difficulty understanding speech in noise and localizing sounds than normal-hearing (NH) listeners. Three studies reported here have explored ways to improve electric hearing abilities. Vocoders are often used to simulate CIs for NH listeners. Study 1 was a psychoacoustic vocoder study examining the effects of harmonic carrier phase dispersion and simulated CI current spread on speech intelligibility in noise. Results showed that simulated current spread was detrimental to speech understanding and that speech vocoded with carriers whose components' starting phases were equal was the least intelligible. Cross-correlogram analyses of AN model simulations confirmed that carrier component phase dispersion resulted in better neural envelope representation. Localization abilities rely on binaural processing mechanisms in the brainstem and midbrain that are not fully understood. In Study 2, several potential mechanisms were evaluated based on the ability of metrics extracted from stereo AN simulations to predict azimuthal locations. Results suggest that unique across-frequency patterns of binaural cross-correlation may provide a strong cue set for lateralization and that interaural level differences alone cannot explain NH sensitivity to lateral position. While it is known that many bilateral CI users are sensitive to interaural time differences (ITDs) in low-rate pulsatile stimulation, most contemporary CI processing strategies use high-rate, constant-rate pulse trains. In Study 3, we examined the effects of pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition by bilateral CI listeners.
Results showed that listeners were able to use low-rate pulse timing cues presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli even when mixed with high rates on other electrodes. These results have contributed to a better understanding of those aspects of the auditory system that support speech understanding and binaural hearing, suggested vocoder parameters that may simulate aspects of electric hearing, and shown that redundant, low-rate pulse timing supports improved spatial hearing for bilateral CI listeners.
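The vocoder manipulation described in Study 1 can be illustrated with a minimal envelope vocoder in which channel interaction is simulated by smearing band envelopes across neighboring channels. Everything here is an illustrative assumption rather than the study's actual processing: the log-spaced 100-8000 Hz band layout, the brick-wall FFT filters, the rectification-only envelope extraction, the tone carriers, and the `spread_db_per_band` parameter controlling how steeply the simulated spread falls off per channel.

```python
import numpy as np

def vocode_with_spread(signal, fs, n_bands=8, spread_db_per_band=8.0):
    """Tone-carrier envelope vocoder with simulated current spread.

    Each band's envelope leaks into neighboring channels with an
    attenuation of `spread_db_per_band` per channel step; 0 dB/band
    means complete smearing, large values approach independent channels.
    """
    edges = np.geomspace(100, 8000, n_bands + 1)         # assumed band layout
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    envelopes = np.zeros((n_bands, len(signal)))
    for b in range(n_bands):
        mask = (freqs >= edges[b]) & (freqs < edges[b + 1])
        band = np.fft.irfft(spec * mask, n=len(signal))  # brick-wall filter
        envelopes[b] = np.abs(band)                      # crude envelope
    # channel-interaction matrix: exponential decay across channels
    atten = 10 ** (-spread_db_per_band / 20)
    smear = atten ** np.abs(np.subtract.outer(np.arange(n_bands),
                                              np.arange(n_bands)))
    smeared = smear @ envelopes
    # resynthesize with tone carriers at the geometric band centers
    centers = np.sqrt(edges[:-1] * edges[1:])
    t = np.arange(len(signal)) / fs
    out = sum(smeared[b] * np.sin(2 * np.pi * centers[b] * t)
              for b in range(n_bands))
    return out / (np.max(np.abs(out)) + 1e-12)
```

Decreasing `spread_db_per_band` mixes the band envelopes more strongly, mimicking broader current spread.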
NASA Astrophysics Data System (ADS)
Athaudage, Chandranath R. N.; Bradley, Alan B.; Lech, Margaret
2003-12-01
A dynamic programming-based optimization strategy for a temporal decomposition (TD) model of speech and its application to low-rate speech coding in storage and broadcasting is presented. In previous work with the spectral stability-based event localizing (SBEL) TD algorithm, event localization was performed based on a spectral stability criterion. Although this approach gave reasonably good results, there was no assurance of the optimality of the event locations. In the present work, we have optimized the event localizing task using a dynamic programming-based optimization strategy. Simulation results show that an improved TD model accuracy can be achieved. A methodology for incorporating the optimized TD algorithm within the standard MELP speech coder for the efficient compression of speech spectral information is also presented. The performance evaluation results revealed that the proposed speech coding scheme achieves 50%-60% compression of speech spectral information with negligible degradation in the decoded speech quality.
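The optimization idea can be illustrated with a generic dynamic-programming segmentation: instead of placing event boundaries by a local stability criterion, DP finds the set of boundaries that globally minimizes an accumulated cost, here taken as within-segment scatter of the spectral parameters. This is a sketch of the principle only; the cost function, feature representation, and interface are assumptions, not the SBEL/TD algorithm itself.

```python
import numpy as np

def dp_segment(features, n_segments):
    """Globally optimal segmentation of a (T, D) feature sequence into
    `n_segments` contiguous segments minimizing total within-segment
    scatter, via dynamic programming. Returns the boundary indices
    (start frames of segments 2..n_segments)."""
    T = len(features)
    csum = np.cumsum(features, axis=0)
    csq = np.cumsum(np.asarray(features, dtype=float) ** 2, axis=0)

    def seg_cost(i, j):
        # within-segment scatter of frames i..j (inclusive)
        n = j - i + 1
        s = csum[j] - (csum[i - 1] if i > 0 else 0)
        sq = csq[j] - (csq[i - 1] if i > 0 else 0)
        return float(np.sum(sq - s ** 2 / n))

    INF = float("inf")
    best = np.full((n_segments + 1, T), INF)
    back = np.zeros((n_segments + 1, T), dtype=int)
    for j in range(T):
        best[1][j] = seg_cost(0, j)
    for k in range(2, n_segments + 1):
        for j in range(k - 1, T):
            for i in range(k - 1, j + 1):
                c = best[k - 1][i - 1] + seg_cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i
    # backtrack the optimal boundaries
    bounds, j = [], T - 1
    for k in range(n_segments, 1, -1):
        i = back[k][j]
        bounds.append(i)
        j = i - 1
    return sorted(bounds)
```

On a sequence with an abrupt spectral change, the DP recovers the change point exactly, which a purely local criterion is not guaranteed to do.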
Comparing Binaural Pre-processing Strategies III
Warzybok, Anna; Ernst, Stephan M. A.
2015-01-01
A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and the individual benefits were objectively estimated using the binaural speech intelligibility model. Ten NH listeners and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio than NH listeners to obtain 50% intelligibility, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in the single competing talker condition). Predictions of the binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level. PMID:26721922
Advanced Persuasive Speaking, English, Speech: 5114.112.
ERIC Educational Resources Information Center
Dade County Public Schools, Miami, FL.
Developed as a high school quinmester unit on persuasive speaking, this guide provides the teacher with teaching strategies for a course which analyzes speeches from "Vital Speeches of the Day," political speeches, TV commercials, and other types of speeches. Practical use of persuasive methods for school, community, county, state, and…
ERIC Educational Resources Information Center
Holden, Laura K.; Vandali, Andrew E.; Skinner, Margaret W.; Fourakis, Marios S.; Holden, Timothy A.
2005-01-01
One of the difficulties faced by cochlear implant (CI) recipients is perception of low-intensity speech cues. A. E. Vandali (2001) has developed the transient emphasis spectral maxima (TESM) strategy to amplify short-duration, low-level sounds. The aim of the present study was to determine whether speech scores would be significantly higher with…
Preisig, Basil C; Eggenberger, Noëmi; Zito, Giuseppe; Vanbellingen, Tim; Schumacher, Rahel; Hopfner, Simone; Nyffeler, Thomas; Gutbrod, Klemens; Annoni, Jean-Marie; Bohlhalter, Stephan; Müri, René M
2015-03-01
Co-speech gestures are part of nonverbal communication during conversations. They either support the verbal message or provide the interlocutor with additional information. Furthermore, as nonverbal cues, they prompt the cooperative process of turn taking. In the present study, we investigated the influence of co-speech gestures on the perception of dyadic dialogue in aphasic patients. In particular, we analysed the impact of co-speech gestures on gaze direction (towards speaker or listener) and fixation of body parts. We hypothesized that aphasic patients, who are restricted in verbal comprehension, adapt their visual exploration strategies. Sixteen aphasic patients and 23 healthy control subjects participated in the study. Visual exploration behaviour was measured by means of a contact-free infrared eye-tracker while subjects were watching videos depicting spontaneous dialogues between two individuals. Cumulative fixation duration and mean fixation duration were calculated for the factors co-speech gesture (present and absent), gaze direction (to the speaker or to the listener), and region of interest (ROI), including hands, face, and body. Both aphasic patients and healthy controls mainly fixated the speaker's face. We found a significant co-speech gesture × ROI interaction, indicating that the presence of a co-speech gesture encouraged subjects to look at the speaker. Further, there was a significant gaze direction × ROI × group interaction revealing that aphasic patients showed reduced cumulative fixation duration on the speaker's face compared to healthy controls. Co-speech gestures guide the observer's attention towards the speaker, the source of semantic input. It is discussed whether an underlying semantic processing deficit or a deficit in integrating audio-visual information may cause aphasic patients to explore the speaker's face less. Copyright © 2014 Elsevier Ltd. All rights reserved.
ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies.
Agrawal, Deepashri; Timm, Lydia; Viola, Filipa Campos; Debener, Stefan; Büchner, Andreas; Dengler, Reinhard; Wittfoth, Matthias
2012-09-20
Emotionally salient information in spoken language can be provided by variations in speech melody (prosody) or by emotional semantics. Emotional prosody is essential to convey feelings through speech. In sensorineural hearing loss, impaired speech perception can be improved by cochlear implants (CIs). The aim of this study was to investigate the performance of normal-hearing (NH) participants on the perception of emotional prosody with vocoded stimuli. Semantically neutral sentences with emotional (happy, angry and neutral) prosody were used. Sentences were manipulated to simulate two CI speech-coding strategies: the Advanced Combination Encoder (ACE) and the newly developed Psychoacoustic Advanced Combination Encoder (PACE). Twenty NH adults were asked to recognize emotional prosody from ACE and PACE simulations. Performance was assessed using behavioral tests and event-related potentials (ERPs). Behavioral data revealed superior performance with original stimuli compared to the simulations. For the simulations, better recognition was observed for happy and angry prosody than for neutral prosody. Irrespective of simulated or unsimulated stimulus type, a significantly larger P200 event-related potential was observed after sentence onset for happy prosody than for the other two emotions. Further, the P200 amplitude was significantly more positive for the PACE strategy than for the ACE strategy. The results suggest the P200 peak as an indicator of active differentiation and recognition of emotional prosody. The larger P200 peak amplitude for happy prosody indicates the importance of fundamental frequency (F0) cues in prosody processing. The advantage of PACE over ACE highlights a privileged role of the psychoacoustic masking model in improving prosody perception. Taken together, the study emphasizes the importance of vocoded simulation for better understanding the prosodic cues that CI users may be utilizing.
Impact of personality on the cerebral processing of emotional prosody.
Brück, Carolin; Kreifelts, Benjamin; Kaza, Evangelia; Lotze, Martin; Wildgruber, Dirk
2011-09-01
While several studies have focused on identifying common brain mechanisms governing the decoding of emotional speech melody, interindividual variations in the cerebral processing of prosodic information have, in comparison, received only little attention to date: although differences in personality among individuals have been shown to modulate emotional brain responses, personality influences on the neural basis of prosody decoding have not yet been investigated systematically. Thus, the present study aimed at delineating relationships between interindividual differences in personality and hemodynamic responses evoked by emotional speech melody. To determine personality-dependent modulations of brain reactivity, fMRI activation patterns during the processing of emotional speech cues were acquired from 24 healthy volunteers and subsequently correlated with individual trait measures of extraversion and neuroticism obtained for each participant. Whereas correlation analysis did not indicate any link between brain activation and extraversion, strong positive correlations between measures of neuroticism and hemodynamic responses of the right amygdala, the left postcentral gyrus as well as medial frontal structures including the right anterior cingulate cortex emerged, suggesting that brain mechanisms mediating the decoding of emotional speech melody may vary depending on differences in neuroticism among individuals. The observed trait-specific modulations are discussed in the light of processing biases as well as differences in emotion control or task strategies that may be associated with the personality trait of neuroticism. Copyright © 2011 Elsevier Inc. All rights reserved.
Communication Supports for People with Motor Speech Disorders
ERIC Educational Resources Information Center
Hanson, Elizabeth K.; Fager, Susan K.
2017-01-01
Communication supports for people with motor speech disorders can include strategies and technologies to supplement natural speech efforts, resolve communication breakdowns, and replace natural speech when necessary to enhance participation in all communicative contexts. This article emphasizes communication supports that can enhance…
Age Estimation Based on Children's Voice: A Fuzzy-Based Decision Fusion Strategy
Ting, Hua-Nong
2014-01-01
Automatic estimation of a speaker's age is a challenging research topic in the area of speech analysis. In this paper, a novel approach to estimate a speaker's age is presented. The method features a “divide and conquer” strategy wherein the speech data are divided into six groups based on the vowel classes. There are two reasons behind this strategy. First, reducing the complexity of the distribution of the processing data improves the classifier's learning performance. Second, different vowel classes contain complementary information for age estimation. Mel-frequency cepstral coefficients are computed for each group, and single-layer feed-forward neural networks based on a self-adaptive extreme learning machine are applied to the features to make a primary decision. Subsequently, fuzzy data fusion is employed to provide an overall decision by aggregating the classifiers' outputs. The results are then compared with a number of state-of-the-art age estimation methods. Experiments conducted on six age groups, including children aged between 7 and 12 years, revealed that fuzzy fusion of the classifiers' outputs resulted in a considerable improvement of up to 53.33% in age estimation accuracy. Moreover, the fuzzy fusion of decisions aggregated the complementary information about a speaker's age from various speech sources. PMID:25006595
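The fusion stage can be sketched as a weighted aggregation of the per-group classifiers' membership scores, with weights derived from each classifier's output entropy so that peaked (confident) outputs count more. The paper's exact fuzzy operator is not reproduced here; the entropy-based weighting below is an illustrative stand-in.

```python
import numpy as np

def entropy_confidence(posteriors, eps=1e-12):
    """Confidence per classifier from output entropy:
    peaked (low-entropy) outputs receive weights near 1,
    flat (high-entropy) outputs receive smaller weights."""
    p = np.asarray(posteriors, dtype=float)
    H = -np.sum(p * np.log(p + eps), axis=1)
    return np.exp(-H)

def fuzzy_fuse(posteriors, confidences):
    """Aggregate per-group classifier outputs into one decision.

    `posteriors` is (n_groups, n_classes), each row one classifier's
    class-membership scores; `confidences` weights each classifier.
    Returns the fused class index and the fused membership vector.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()
    fused = w @ posteriors          # weighted-average membership
    return int(np.argmax(fused)), fused
```

A confident classifier thus dominates a near-uniform one, which captures the intuition that the vowel classes carry complementary, unequally reliable evidence about age.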
Speaker recognition with temporal cues in acoustic and electric hearing
NASA Astrophysics Data System (ADS)
Vongphoe, Michael; Zeng, Fan-Gang
2005-08-01
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
Comparing Binaural Pre-processing Strategies I: Instrumental Evaluation.
Baumgärtel, Regina M; Krawczyk-Becker, Martin; Marquardt, Daniel; Völker, Christoph; Hu, Hongmei; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Ernst, Stephan M A; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias
2015-12-30
In a collaborative research project, several monaural and binaural noise reduction algorithms have been comprehensively evaluated. In this article, eight selected noise reduction algorithms were assessed using instrumental measures, with a focus on the instrumental evaluation of speech intelligibility. Four distinct, reverberant scenarios were created to reflect everyday listening situations: a stationary speech-shaped noise, a multitalker babble noise, a single interfering talker, and a realistic cafeteria noise. Three instrumental measures were employed to assess predicted speech intelligibility and predicted sound quality: the intelligibility-weighted signal-to-noise ratio, the short-time objective intelligibility measure, and the perceptual evaluation of speech quality. The results show substantial improvements in predicted speech intelligibility as well as sound quality for the proposed algorithms. The evaluated coherence-based noise reduction algorithm was able to provide improvements in predicted audio signal quality. For the tested single-channel noise reduction algorithm, improvements in intelligibility-weighted signal-to-noise ratio were observed in all but the nonstationary cafeteria ambient noise scenario. Binaural minimum variance distortionless response beamforming algorithms performed particularly well in all noise scenarios. © The Author(s) 2015.
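The first of those instrumental measures, the intelligibility-weighted SNR, combines per-band SNRs using band-importance weights. A minimal sketch with placeholder band edges, uniform weights rather than the standardized band-importance function, and a ±15 dB per-band limit (a common convention, assumed here):

```python
import numpy as np

def intelligibility_weighted_snr(clean, noise, fs, bands=None, weights=None):
    """Intelligibility-weighted SNR: per-band SNRs (dB) combined with
    band-importance weights. Band edges and weights are illustrative
    placeholders, not a standardized band-importance function."""
    if bands is None:
        bands = np.geomspace(200, 6000, 7)          # 6 assumed bands
    if weights is None:
        weights = np.ones(len(bands) - 1) / (len(bands) - 1)
    cs = np.abs(np.fft.rfft(clean)) ** 2            # clean power spectrum
    ns = np.abs(np.fft.rfft(noise)) ** 2            # noise power spectrum
    freqs = np.fft.rfftfreq(len(clean), 1 / fs)
    snr_db = []
    for lo, hi in zip(bands[:-1], bands[1:]):
        m = (freqs >= lo) & (freqs < hi)
        snr = 10 * np.log10((cs[m].sum() + 1e-12) / (ns[m].sum() + 1e-12))
        snr_db.append(np.clip(snr, -15, 15))        # limit band contribution
    return float(np.dot(weights, snr_db))
```

Comparing this value for the unprocessed and processed noise yields the predicted intelligibility-weighted SNR improvement reported for each algorithm.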
A minority perspective in the diagnosis of child language disorders.
Seymour, H N; Bland, L
1991-01-01
The effective diagnosis and treatment of persons from diverse minority language backgrounds has become an important issue in the field of speech and language pathology. Yet many speech-language pathologists (SLPs) have had little or no formal training in minority languages, there is a paucity of normative data on language acquisition in minority groups, and there are few standardized speech and language tests appropriate for these groups. We describe a diagnostic process that addresses these problems. The diagnostic protocol we propose for a child from a Black English-speaking background characterizes many of the major issues in treating minority children. In summary, we propose four assessment strategies: gathering referral source data; making direct observations; using standardized tests of non-speech and language behavior (cognition, perception, motor, etc.); and eliciting language samples and probes.
AuBuchon, Angela M.; Pisoni, David B.; Kronenberger, William G.
2015-01-01
Objectives: To determine whether early-implanted, long-term cochlear implant (CI) users display delays in verbal short-term and working memory capacity when processes related to audibility and speech production are eliminated. Design: Twenty-three long-term CI users and 23 normal-hearing controls each completed forward and backward digit span tasks under testing conditions which differed in presentation modality (auditory or visual) and response output (spoken recall or manual pointing). Results: Normal-hearing controls reproduced more lists of digits than the CI users, even when the test items were presented visually and the responses were made manually via touchscreen response. Conclusions: Short-term and working memory delays observed in CI users are not due to greater demands from peripheral sensory processes such as audibility or from overt speech-motor planning and response output organization. Instead, CI users are less efficient at encoding and maintaining phonological representations in verbal short-term memory utilizing phonological and linguistic strategies during memory tasks. PMID:26496666
Barone, Pascal; Chambaudie, Laure; Strelnikov, Kuzma; Fraysse, Bernard; Marx, Mathieu; Belin, Pascal; Deguine, Olivier
2016-10-01
Due to signal distortion, speech comprehension in cochlear-implanted (CI) patients relies strongly on visual information, a compensatory strategy supported by important cortical crossmodal reorganisations. Though crossmodal interactions are evident for speech processing, it is unclear whether a visual influence is observed in CI patients during non-linguistic visual-auditory processing, such as face-voice interactions, which are important in social communication. We analyse and compare visual-auditory interactions in CI patients and normal-hearing subjects (NHS) at equivalent auditory performance levels. Proficient CI patients and NHS performed a voice-gender categorisation in the visual-auditory modality from a morphing-generated voice continuum between male and female speakers, while ignoring the presentation of a male or female visual face. Our data show that during the face-voice interaction, CI deaf patients are strongly influenced by visual information when performing an auditory gender categorisation task, in spite of maximum recovery of auditory speech. No such effect is observed in NHS, even in situations of CI simulation. Our hypothesis is that the functional crossmodal reorganisation that occurs in deafness could influence nonverbal processing, such as face-voice interaction, which is important for the patient's internal supramodal representation. Copyright © 2016 Elsevier Ltd. All rights reserved.
Carey, Daniel; Mercure, Evelyne; Pizzioli, Fabrizio; Aydelott, Jennifer
2014-12-01
The effects of ear of presentation and competing speech on N400s to spoken words in context were examined in a dichotic sentence priming paradigm. Auditory sentence contexts with a strong or weak semantic bias were presented in isolation to the right or left ear, or with a competing signal presented in the other ear at an SNR of -12 dB. Target words were congruent or incongruent with the sentence meaning. Competing speech attenuated N400s to both congruent and incongruent targets, suggesting that the demand imposed by a competing signal disrupts the engagement of semantic comprehension processes. Bias strength affected N400 amplitudes differentially depending upon ear of presentation: weak contexts presented to the left ear/right hemisphere (le/RH) produced a more negative N400 response to targets than strong contexts, whereas no significant effect of bias strength was observed for sentences presented to the right ear/left hemisphere (re/LH). The results are consistent with a model of semantic processing in which the RH relies on integrative processing strategies in the interpretation of sentence-level meaning. Copyright © 2014 Elsevier Ltd. All rights reserved.
Acoustic properties of naturally produced clear speech at normal speaking rates
NASA Astrophysics Data System (ADS)
Krause, Jean C.; Braida, Louis D.
2004-01-01
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low-frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
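The two global-level properties identified above can be measured directly from a recording. A sketch, assuming a 100 frames/s RMS intensity envelope and a 10 Hz upper limit for "low-frequency" modulations (both illustrative choices not specified in the abstract):

```python
import numpy as np

def clear_speech_metrics(signal, fs):
    """Two global properties linked to clear/normal speech:
    (1) fraction of long-term spectral energy in the 1000-3000 Hz band,
    (2) modulation depth of low-frequency (<= 10 Hz, assumed cutoff)
        fluctuations of the intensity envelope."""
    spec = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    band = (freqs >= 1000) & (freqs <= 3000)
    band_fraction = spec[band].sum() / (spec.sum() + 1e-12)

    # intensity envelope: frame-wise RMS at 100 frames/s
    hop = fs // 100
    n_frames = len(signal) // hop
    frames = signal[: n_frames * hop].reshape(n_frames, hop)
    env = np.sqrt(np.mean(frames ** 2, axis=1))

    # modulation spectrum of the envelope; depth = low-freq AC vs. DC
    env_spec = np.abs(np.fft.rfft(env - env.mean()))
    mod_freqs = np.fft.rfftfreq(n_frames, 1 / 100)
    low = (mod_freqs > 0) & (mod_freqs <= 10)
    mod_depth = env_spec[low].sum() / (n_frames * env.mean() + 1e-12)
    return band_fraction, mod_depth
```

On a clear/normal utterance, both values would be expected to exceed those of the same talker's conversational speech.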
ERIC Educational Resources Information Center
Schaadt, Gesa; Männel, Claudia; van der Meer, Elke; Pannekamp, Ann; Friederici, Angela D.
2016-01-01
Successful communication in everyday life crucially involves the processing of auditory and visual components of speech. Viewing our interlocutor and processing visual components of speech facilitates speech processing by triggering auditory processing. Auditory phoneme processing, analyzed by event-related brain potentials (ERP), has been shown…
Design and Evaluation of a Cochlear Implant Strategy Based on a “Phantom” Channel
Nogueira, Waldo; Litvak, Leonid M.; Saoji, Aniket A.; Büchner, Andreas
2015-01-01
Unbalanced bipolar stimulation, delivered using charge-balanced pulses, was used to produce “Phantom stimulation”, stimulation beyond the most apical contact of a cochlear implant’s electrode array. The Phantom channel was allocated audio frequencies below 300 Hz in a speech coding strategy, conveying energy some two octaves lower than the clinical strategy and hence delivering the fundamental frequency of speech and of many musical tones. A group of 12 Advanced Bionics cochlear implant recipients took part in a chronic study investigating the fitting of the Phantom strategy and speech and music perception when using Phantom. The evaluation of speech in noise was performed immediately after fitting Phantom for the first time (Session 1) and after one month of take-home experience (Session 2). A repeated-measures analysis of variance (ANOVA) with the within-subject factors strategy (Clinical, Phantom) and time (Session 1, Session 2) revealed a significant time × strategy interaction. Phantom obtained a significant improvement in speech intelligibility after one month of use. Furthermore, a trend towards better performance with Phantom (48%) with respect to F120 (37%) after 1 month of use failed to reach significance after type 1 error correction. Questionnaire results show a preference for Phantom when listening to music, likely driven by an improved balance between high and low frequencies. PMID:25806818
Baumgärtel, Regina M; Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias
2015-12-30
Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users. © The Author(s) 2015.
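The core of every MVDR beamformer evaluated here is the same closed-form weight rule: minimize output noise power subject to a distortionless constraint toward the target. A narrowband toy sketch with an assumed half-wavelength-spaced array and known steering vectors (real binaural implementations estimate covariances from the microphone signals and operate per frequency band):

```python
import numpy as np

def mvdr_weights(R, d):
    """MVDR beamformer weights: w = R^{-1} d / (d^H R^{-1} d).

    R is the (Hermitian) noise covariance matrix and d the target
    steering vector; the constraint w^H d = 1 keeps the target
    undistorted while the output noise power is minimized."""
    Rinv_d = np.linalg.solve(R, d)
    return Rinv_d / (d.conj() @ Rinv_d)
```

With a strong interferer represented in R, the resulting weights place a spatial null toward it while passing the target with unit gain, which is why these beamformers gave the largest SRT50 improvements in the spatially separated single-talker scenario.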
Speech processing using maximum likelihood continuity mapping
Hogden, John E.
2000-01-01
Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. This speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
The Role of Corticostriatal Systems in Speech Category Learning.
Yi, Han-Gyol; Maddox, W Todd; Mumford, Jeanette A; Chandrasekaran, Bharath
2016-04-01
One of the most difficult category learning problems for humans is learning nonnative speech categories. While feedback-based category training can enhance speech learning, the mechanisms underlying these benefits are unclear. In this functional magnetic resonance imaging study, we investigated neural and computational mechanisms underlying feedback-dependent speech category learning in adults. Positive feedback activated a large corticostriatal network including the dorsolateral prefrontal cortex, inferior parietal lobule, middle temporal gyrus, caudate, putamen, and the ventral striatum. Successful learning was contingent upon the activity of domain-general category learning systems: the fast-learning reflective system, involving the dorsolateral prefrontal cortex that develops and tests explicit rules based on the feedback content, and the slow-learning reflexive system, involving the putamen in which the stimuli are implicitly associated with category responses based on the reward value in feedback. Computational modeling of response strategies revealed significant use of reflective strategies early in training and greater use of reflexive strategies later in training. Reflexive strategy use was associated with increased activation in the putamen. Our results demonstrate a critical role for the reflexive corticostriatal learning system as a function of response strategy and proficiency during speech category learning. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Moharir, Madhavi; Barnett, Noel; Taras, Jillian; Cole, Martha; Ford-Jones, E Lee; Levin, Leo
2014-01-01
Failure to recognize and intervene early in speech and language delays can lead to multifaceted and potentially severe consequences for early child development and later literacy skills. While routine evaluations of speech and language during well-child visits are recommended, there is no standardized (office) approach to facilitate this. Furthermore, extensive wait times for speech and language pathology consultation represent valuable lost time for the child and family. Using speech and language expertise, and paediatric collaboration, key content for an office-based tool was developed. The tool aimed to help physicians achieve three main goals: early and accurate identification of speech and language delays as well as children at risk for literacy challenges; appropriate referral to speech and language services when required; and teaching and, thus, empowering parents to create rich and responsive language environments at home. Using this tool, in combination with the Canadian Paediatric Society's Read, Speak, Sing and Grow Literacy Initiative, physicians will be better positioned to offer practical strategies to caregivers to enhance children's speech and language capabilities. The tool represents a strategy to evaluate speech and language delays. It depicts age-specific linguistic/phonetic milestones and suggests interventions. The tool represents a practical interim treatment while the family is waiting for formal speech and language therapy consultation.
The Segmentation Problem in the Study of Impromptu Speech.
ERIC Educational Resources Information Center
Loman, Bengt
A fundamental problem in the study of spontaneous speech is how to segment it for analysis. The segments should be relevant for the study of linguistic structures, speech planning, speech production, or communication strategies. Operational rules for segmentation should consider a wide variety of criteria and be hierarchically ordered. This is…
Ma, Ning; Morris, Saffron; Kitterick, Pádraig Thomas
2016-01-01
Objectives: This study used vocoder simulations with normal-hearing (NH) listeners to (1) measure their ability to integrate speech information from an NH ear and a simulated cochlear implant (CI), and (2) investigate whether binaural integration is disrupted by a mismatch in the delivery of spectral information between the ears arising from a misalignment in the mapping of frequency to place. Design: Eight NH volunteers participated in the study and listened to sentences embedded in background noise via headphones. Stimuli presented to the left ear were unprocessed. Stimuli presented to the right ear (referred to as the CI-simulation ear) were processed using an eight-channel noise vocoder with one of the three processing strategies. An Ideal strategy simulated a frequency-to-place map across all channels that matched the delivery of spectral information between the ears. A Realistic strategy created a misalignment in the mapping of frequency to place in the CI-simulation ear where the size of the mismatch between the ears varied across channels. Finally, a Shifted strategy imposed a similar degree of misalignment in all channels, resulting in consistent mismatch between the ears across frequency. The ability to report key words in sentences was assessed under monaural and binaural listening conditions and at signal to noise ratios (SNRs) established by estimating speech-reception thresholds in each ear alone. The SNRs ensured that the monaural performance of the left ear never exceeded that of the CI-simulation ear. The advantages of binaural integration were calculated by comparing binaural performance with monaural performance using the CI-simulation ear alone. Thus, these advantages reflected the additional use of the experimentally constrained left ear and were not attributable to better-ear listening. Results: Binaural performance was as accurate as, or more accurate than, monaural performance with the CI-simulation ear alone. 
When both ears supported a similar level of monaural performance (50%), binaural integration advantages were found regardless of whether a mismatch was simulated or not. When the CI-simulation ear supported a superior level of monaural performance (71%), evidence of binaural integration was absent when a mismatch was simulated using both the Realistic and the Ideal processing strategies. This absence of integration could not be accounted for by ceiling effects or by changes in SNR. Conclusions: If generalizable to unilaterally deaf CI users, the results of the current simulation study would suggest that benefits to speech perception in noise can be obtained by integrating information from an implanted ear and an NH ear. A mismatch in the delivery of spectral information between the ears due to a misalignment in the mapping of frequency to place may disrupt binaural integration in situations where both ears cannot support a similar level of monaural speech understanding. Previous studies that have measured the speech perception of unilaterally deaf individuals after CI but with nonindividualized frequency-to-electrode allocations may therefore have underestimated the potential benefits of providing binaural hearing. However, it remains unclear whether the size and nature of the potential incremental benefits from individualized allocations are sufficient to justify the time and resources required to derive them based on cochlear imaging or pitch-matching tasks. PMID:27116049
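The Ideal, Realistic, and Shifted strategies above differ only in how analysis frequencies are mapped to simulated cochlear place. A hedged sketch of such a mapping follows, assuming the commonly used Greenwood place-frequency function; the study's actual map and shift sizes are not stated in the abstract:

```python
import numpy as np

def greenwood_freq(place_mm):
    """Greenwood map: place along the cochlea (mm from the apex) to
    characteristic frequency in Hz, with standard human constants."""
    return 165.4 * (10 ** (0.06 * place_mm) - 1)

def carrier_edges(analysis_edges_hz, shift_mm=0.0):
    """Map analysis-band edges to carrier-band edges at a cochlear
    place shifted basally by shift_mm (0 = a matched, 'Ideal' map)."""
    # invert the Greenwood map to get place, shift it, and map back
    place = np.log10(np.asarray(analysis_edges_hz) / 165.4 + 1) / 0.06
    return greenwood_freq(place + shift_mm)

edges = [250, 500, 1000, 2000, 4000]          # illustrative band edges
matched = carrier_edges(edges, shift_mm=0.0)
shifted = carrier_edges(edges, shift_mm=3.0)  # ~3 mm basal shift
assert np.allclose(matched, edges)  # a zero shift reproduces the edges
assert np.all(shifted > matched)    # a basal shift raises every edge
```

A Shifted-style strategy would apply one `shift_mm` to every channel; a Realistic-style strategy would vary the shift across channels.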
Freedom of Speech Newsletter, Volume 4, Number 1, October 1977.
ERIC Educational Resources Information Center
Kelley, Michael P., Ed.
This newsletter features an essay, "Anticipatory Democracy and Citizen Involvement: Strategies for Communication Education in the Future," which discusses strategies for improving citizen involvement and examines ways in which educators can prepare students for constructive citizen involvement. Notes on Speech Communication Association meetings…
Evolution of crossmodal reorganization of the voice area in cochlear-implanted deaf patients.
Rouger, Julien; Lagleyre, Sébastien; Démonet, Jean-François; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal
2012-08-01
Psychophysical and neuroimaging studies in both animal and human subjects have clearly demonstrated that cortical plasticity following sensory deprivation leads to a brain functional reorganization that favors the spared modalities. In postlingually deaf patients, the use of a cochlear implant (CI) allows a recovery of the auditory function, which will probably counteract the cortical crossmodal reorganization induced by hearing loss. To study the dynamics of such reversed crossmodal plasticity, we designed a longitudinal neuroimaging study involving the follow-up of 10 postlingually deaf adult CI users engaged in a visual speechreading task. While speechreading activates Broca's area in normally hearing subjects (NHS), the activity level elicited in this region in CI patients is abnormally low and increases progressively with post-implantation time. Furthermore, speechreading in CI patients induces abnormal crossmodal activations in right anterior regions of the superior temporal cortex normally devoted to processing human voice stimuli (temporal voice-sensitive areas, TVA). These abnormal activity levels diminish with post-implantation time and tend towards the levels observed in NHS. First, our study revealed that the neuroplasticity after cochlear implantation involves not only auditory but also visual and audiovisual speech processing networks. Second, our results suggest that during deafness, the functional links between cortical regions specialized in face and voice processing are reallocated to support speech-related visual processing through cross-modal reorganization. Such reorganization allows a more efficient audiovisual integration of speech after cochlear implantation. These compensatory sensory strategies are later completed by the progressive restoration of the visuo-audio-motor speech processing loop, including Broca's area. Copyright © 2011 Wiley Periodicals, Inc.
Van Lancker Sidtis, Diana
2004-01-01
Although interest in the language sciences was previously focused on newly created sentences, more recently much attention has turned to the importance of formulaic expressions in normal and disordered communication. Non-propositional language, also referred to as formulaic language and made up of speech formulas, idioms, expletives, serial and memorized speech, slang, sayings, clichés, and conventional expressions, forms a large proportion of every speaker's competence and may be differentially disturbed in neurological disorders. This review aims to examine non-propositional speech with respect to linguistic descriptions, psycholinguistic experiments, sociolinguistic studies, child language development, clinical language disorders, and neurological studies. Evidence from numerous sources reveals differentiated and specialized roles for novel and formulaic verbal functions, and suggests that generation of novel sentences and management of prefabricated expressions represent two legitimate and separable processes in language behaviour. A preliminary model of language behaviour that encompasses unitary and compositional properties and their integration in everyday language use is proposed. Integration and synchronizing of two disparate processes in language behaviour, formulaic and novel, characterizes normal communicative function and contributes to creativity in language. This dichotomy is supported by studies arising from other disciplines in neurology and psychology. Further studies are necessary to determine in what ways the various categories of formulaic expressions are related, and how these categories are processed by the brain. Better understanding of how non-propositional categories of speech are stored and processed in the brain can lead to better informed treatment strategies in language disorders.
Audiovisual speech perception development at varying levels of perceptual processing
Lalonde, Kaylah; Holt, Rachael Frush
2016-01-01
This study used the auditory evaluation framework [Erber (1982). Auditory Training (Alexander Graham Bell Association, Washington, DC)] to characterize the influence of visual speech on audiovisual (AV) speech perception in adults and children at multiple levels of perceptual processing. Six- to eight-year-old children and adults completed auditory and AV speech perception tasks at three levels of perceptual processing (detection, discrimination, and recognition). The tasks differed in the level of perceptual processing required to complete them. Adults and children demonstrated visual speech influence at all levels of perceptual processing. Whereas children demonstrated the same visual speech influence at each level of perceptual processing, adults demonstrated greater visual speech influence on tasks requiring higher levels of perceptual processing. These results support previous research demonstrating multiple mechanisms of AV speech processing (general perceptual and speech-specific mechanisms) with independent maturational time courses. The results suggest that adults rely on both general perceptual mechanisms that apply to all levels of perceptual processing and speech-specific mechanisms that apply when making phonetic decisions and/or accessing the lexicon. Six- to eight-year-old children seem to rely only on general perceptual mechanisms across levels. As expected, developmental differences in AV benefit on this and other recognition tasks likely reflect immature speech-specific mechanisms and phonetic processing in children. PMID:27106318
Berenstein, Carlo K; Mens, Lucas H M; Mulder, Jef J S; Vanpoucke, Filiep J
2008-04-01
To compare the effects of Monopole (Mono), Tripole (Tri), and "Virtual channel" (Vchan) electrode configurations on spectral resolution and speech perception in a crossover design. Nine experienced adults who received an Advanced Bionics CII/90K cochlear implant participated in a crossover design using three experimental strategies for 2 wk each. Three strategies were compared: (1) Mono; (2) Tri with current partly returning to adjacent electrodes and partly (25 or 75%) to the extracochlear reference; and (3) a monopolar "Vchan" strategy creating seven intermediate channels between two contacts. Each strategy was a variant of the standard "HiRes" processing strategy using 14 channels and 1105 pulses/sec/channel, and a pulse duration of 32 microsec/phase. Spectral resolution was measured using broadband noise with a sinusoidally rippled spectral envelope with peaks evenly spaced on a logarithmic frequency scale. Speech perception was measured for monosyllables in quiet and in steady-state and fluctuating noises. Subjective comments on music experience and preferences in everyday use were assessed through questionnaires. Thresholds and most comfortable levels with Mono and Vchan were both significantly lower than levels with Tri. Spectral resolution was significantly higher with Tri than with Mono; spectral resolution with Vchan did not differ significantly from the other configurations. Moderate but significant correlations between word recognition and spectral resolution were found in speech in quiet and fluctuating noise. For speech in quiet, word recognition was best with Mono and worst with Vchan; Tri did not significantly differ from the other configurations. Pooled across the noise conditions, word recognition was best with Tri and worst with Vchan (Mono did not significantly differ from the other configurations). 
These differences were small and insufficient to result in a clear increase in performance across subjects if the result from the best configuration per subject was compared with the result from Mono. Across all subjects, music appreciation and satisfaction in everyday use did not clearly differ between configurations. (1) Although spectral resolution was improved with the tripolar configuration, differences in speech performance were too small in this limited group of subjects to justify clinical introduction. (2) Overall spectral resolution remained extremely poor compared with normal hearing; it remains to be seen whether further manipulations of the electrical field will be more effective.
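The spectral-resolution stimulus described above can be synthesized by imposing a sinusoidal envelope on a log-frequency axis in the spectral domain. The sketch below shows one way to do this; the ripple density, depth, and passband are assumed values, not those of the study:

```python
import numpy as np

def rippled_noise(n_samples, fs, ripples_per_octave, phase=0.0,
                  f_lo=100.0, f_hi=5000.0, depth_db=30.0, seed=0):
    """Broadband noise whose spectral envelope is sinusoidal on a
    log-frequency axis, so ripple peaks are evenly spaced in octaves."""
    rng = np.random.default_rng(seed)
    n_bins = n_samples // 2 + 1
    # complex Gaussian spectrum gives flat-spectrum noise
    spec = rng.standard_normal(n_bins) + 1j * rng.standard_normal(n_bins)
    freqs = np.fft.rfftfreq(n_samples, 1 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    ripple = np.zeros(n_bins)
    ripple[band] = np.sin(2 * np.pi * ripples_per_octave
                          * np.log2(freqs[band] / f_lo) + phase)
    # peak-to-valley envelope contrast of depth_db decibels
    gain = np.where(band, 10 ** (depth_db * ripple / 40), 0.0)
    return np.fft.irfft(spec * gain, n_samples)

x = rippled_noise(4096, 16000, ripples_per_octave=1.0)
assert len(x) == 4096 and np.isrealobj(x)
```

Ripple-discrimination tests typically compare a stimulus like this against one with the ripple phase inverted; better spectral resolution lets a listener tell them apart at higher ripple densities.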
McCormack, Jane; McLeod, Sharynne; McAllister, Lindy; Harrison, Linda J
2010-10-01
The purpose of this article was to understand the experience of speech impairment (speech sound disorders) in everyday life as described by children with speech impairment and their communication partners. Interviews were undertaken with 13 preschool children with speech impairment (mild to severe) and 21 significant others (family members and teachers). A phenomenological analysis of the interview transcripts revealed 2 global themes regarding the experience of living with speech impairment for these children and their families. The first theme encompassed the problems experienced by participants, namely (a) the child's inability to "speak properly," (b) the communication partner's failure to "listen properly," and (c) frustration caused by the speaking and listening problems. The second theme described the solutions participants used to overcome the problems. Solutions included (a) strategies to improve the child's speech accuracy (e.g., home practice, speech-language pathology) and (b) strategies to improve the listener's understanding (e.g., using gestures, repetition). Both short- and long-term solutions were identified. Successful communication is dependent on the skills of speakers and listeners. Intervention with children who experience speech impairment needs to reflect this reciprocity by supporting both the speaker and the listener and by addressing the frustration they experience.
Speech effort measurement and stuttering: investigating the chorus reading effect.
Ingham, Roger J; Warner, Allison; Byrd, Anne; Cotton, John
2006-06-01
The purpose of this study was to investigate chorus reading's (CR's) effect on speech effort during oral reading by adult stuttering speakers and control participants. The effect of a speech effort measurement highlighting strategy was also investigated. Twelve persistent stuttering (PS) adults and 12 normally fluent control participants completed 1-min base rate readings (BR-nonchorus) and CRs within a BR/CR/BR/CR/BR experimental design. Participants self-rated speech effort using a 9-point scale after each reading trial. Stuttering frequency, speech rate, and speech naturalness measures were also obtained. Instructions highlighting speech effort ratings during BR and CR phases were introduced after the first CR. CR improved speech effort ratings for the PS group, but the control group showed a reverse trend. The two groups' effort ratings did not differ significantly during CR phases, but the PS group's ratings remained significantly poorer than the control group's during BR phases. The highlighting strategy did not significantly change effort ratings. The findings show that CR will produce not only stutter-free and natural-sounding speech but also reliable reductions in speech effort. However, these reductions do not reach effort levels equivalent to those achieved by normally fluent speakers, thereby qualifying its use as a gold standard of achievable normal fluency by PS speakers.
Real-time spectrum estimation–based dual-channel speech-enhancement algorithm for cochlear implant
2012-01-01
Background: Improvement of the cochlear implant (CI) front-end signal acquisition is needed to increase speech recognition in noisy environments. To suppress directional noise, we introduce a speech-enhancement algorithm based on microphone-array beamforming and spectral estimation. The experimental results indicate that this method is robust to directional mobile noise and strongly enhances the desired speech, thereby improving the performance of CI devices in a noisy environment. Methods: Spectrum estimation and array beamforming were combined to suppress the ambient noise. The directivity coefficient was estimated in the noise-only intervals and updated to track the mobile noise. Results: The proposed algorithm was realized in the CI speech strategy. For the actual parameters, we use a Maxflat filter to obtain fractional sampling points and a cepstrum method to differentiate speech frames from noise frames. Broadband adjustment coefficients were added to compensate for the energy loss in the low-frequency band. Discussion: The approximation of the directivity coefficient is tested and the errors are discussed. We also analyze the algorithm's constraints for noise estimation and distortion in CI processing. The performance of the proposed algorithm is analyzed and compared with other prevalent methods. Conclusions: A hardware platform was constructed for the experiments. The speech-enhancement results showed that the algorithm suppresses non-stationary noise with a high SNR improvement, and it performed well in both the speech-enhancement experiments and mobile testing. Signal distortion results indicate that the algorithm is robust, with a high SNR improvement and low speech distortion. PMID:23006896
ERIC Educational Resources Information Center
Manik, Sondang; Hutagaol, Juniati
2015-01-01
This study aims to find out the politeness strategies used by the teachers and how the politeness affects to the student's compliance. The focus is on directive and expressive speech acts. The subjects of this study were two teachers and the students of class II-A and II-B at SD 024184 Binjai Timur Binjai. The data was gathered by video audio…
Sensory-Cognitive Interaction in the Neural Encoding of Speech in Noise: A Review
Anderson, Samira; Kraus, Nina
2011-01-01
Background Speech-in-noise (SIN) perception is one of the most complex tasks faced by listeners on a daily basis. Although listening in noise presents challenges for all listeners, background noise inordinately affects speech perception in older adults and in children with learning disabilities. Hearing thresholds are an important factor in SIN perception, but they are not the only factor. For successful comprehension, the listener must perceive and attend to relevant speech features, such as the pitch, timing, and timbre of the target speaker’s voice. Here, we review recent studies linking SIN and brainstem processing of speech sounds. Purpose To review recent work that has examined the ability of the auditory brainstem response to complex sounds (cABR), which reflects the nervous system’s transcription of pitch, timing, and timbre, to be used as an objective neural index for hearing-in-noise abilities. Study Sample We examined speech-evoked brainstem responses in a variety of populations, including children who are typically developing, children with language-based learning impairment, young adults, older adults, and auditory experts (i.e., musicians). Data Collection and Analysis In a number of studies, we recorded brainstem responses in quiet and babble noise conditions to the speech syllable /da/ in all age groups, as well as in a variable condition in children in which /da/ was presented in the context of seven other speech sounds. We also measured speech-in-noise perception using the Hearing-in-Noise Test (HINT) and the Quick Speech-in-Noise Test (QuickSIN). Results Children and adults with poor SIN perception have deficits in the subcortical spectrotemporal representation of speech, including low-frequency spectral magnitudes and the timing of transient response peaks. Furthermore, auditory expertise, as engendered by musical training, provides both behavioral and neural advantages for processing speech in noise. 
Conclusions These results have implications for future assessment and management strategies for young and old populations whose primary complaint is difficulty hearing in background noise. The cABR provides a clinically applicable metric for objective assessment of individuals with SIN deficits, for determination of the biologic nature of disorders affecting SIN perception, for evaluation of appropriate hearing aid algorithms, and for monitoring the efficacy of auditory remediation and training. PMID:21241645
Goehring, Tobias; Bolner, Federico; Monaghan, Jessica J M; van Dijk, Bas; Zarowski, Andrzej; Bleeck, Stefan
2017-02-01
Speech understanding in noisy environments is still one of the major challenges for cochlear implant (CI) users in everyday life. We evaluated a speech enhancement algorithm based on neural networks (NNSE) for improving speech intelligibility in noise for CI users. The algorithm decomposes the noisy speech signal into time-frequency units, extracts a set of auditory-inspired features, and feeds them to the neural network to produce an estimate of which frequency channels contain more perceptually important information (higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated and retain speech-dominated CI channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 14 CI users using three types of background noise. Two NNSE algorithms were compared: a speaker-dependent algorithm, trained on the target speaker used for testing, and a speaker-independent algorithm, trained on different speakers. Significant improvements in the intelligibility of speech in stationary and fluctuating noises were found relative to the unprocessed condition for the speaker-dependent algorithm in all noise types and for the speaker-independent algorithm in 2 out of 3 noise types. The NNSE algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of SNRs. The proposed algorithm has the potential to improve the intelligibility of speech in noise for CI users while meeting the requirements of low computational complexity and processing delay for application in CI devices. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
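The channel-selection step described in this abstract can be sketched as SNR-driven n-of-m gating. The sketch below is a minimal illustration, not the authors' trained system: the `snr_estimates` array stands in for the neural network's per-channel output, and the attenuation depth is an assumed value.

```python
import numpy as np

def n_of_m_selection(envelopes, snr_estimates, n=8, attenuation_db=20.0):
    """Per frame, retain the n channels with the highest estimated SNR
    and attenuate the rest, as in n-of-m CI coding strategies.
    Illustrative only; the real NNSE estimate comes from a neural net."""
    m, frames = envelopes.shape
    gains = np.full((m, frames), 10 ** (-attenuation_db / 20.0))
    top = np.argsort(snr_estimates, axis=0)[-n:, :]  # n best channels/frame
    for t in range(frames):
        gains[top[:, t], t] = 1.0
    return envelopes * gains

# toy example: 16 channels, 4 frames, random envelopes and SNR estimates
rng = np.random.default_rng(0)
env = rng.random((16, 4))
snr = rng.normal(size=(16, 4))
out = n_of_m_selection(env, snr, n=8)
```

In a real strategy, the retained channel envelopes would then drive the electrode stimulation pattern; here the gating simply passes 8 of 16 channels per frame unchanged.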
Manasse, N J; Hux, K; Rankin-Erickson, J L
2000-11-01
Impairments in motor functioning, language processing, and cognitive status may impact the written language performance of traumatic brain injury (TBI) survivors. One strategy to minimize the impact of these impairments is to use a speech recognition system. The purpose of this study was to explore the effect of mild dysarthria and mild cognitive-communication deficits secondary to TBI on a 19-year-old survivor's mastery and use of such a system; specifically, Dragon NaturallySpeaking. Data included the percentage of the participant's words accurately perceived by the system over time, the participant's accuracy over time in using commands for navigation and error correction, and quantitative and qualitative changes in the participant's written texts generated with and without the use of the speech recognition system. Results showed that Dragon NaturallySpeaking was approximately 80% accurate in perceiving words spoken by the participant, and the participant quickly and easily mastered all navigation and error correction commands presented. Quantitatively, the participant produced a greater amount of text using traditional word processing and a standard keyboard than using the speech recognition system. Minimal qualitative differences appeared between writing samples. Discussion of factors that may have contributed to the obtained results and that may affect the generalization of the findings to other TBI survivors is provided.
Taking It to the Classroom: Strategies for Re-Claiming Citizenship through Communication Education.
ERIC Educational Resources Information Center
Kelley, Colleen E.
Speech communication in general and rhetoric in particular address suggestions for protecting the "space" in which public conversation occurs in a democracy. Although the framers of the Constitution clearly envisioned significant citizen involvement in policy making, citizen access to the process has greatly declined, especially at the…
ERIC Educational Resources Information Center
Olesh, Ryan
2016-01-01
Language learners need to understand and apply appropriate discourse as part of the process of attaining "communicative competence" (Canale, 1983) needed to fulfill academic and social-adaptive functions. Students who are able to apply discourse strategies within the classroom demonstrate higher levels of metacognition and critical…
Souza, Pamela; Arehart, Kathryn; Neher, Tobias
2015-01-01
Working memory—the ability to process and store information—has been identified as an important aspect of speech perception in difficult listening environments. Working memory can be envisioned as a limited-capacity system which is engaged when an input signal cannot be readily matched to a stored representation or template. This “mismatch” is expected to occur more frequently when the signal is degraded. Because working memory capacity varies among individuals, those with smaller capacity are expected to demonstrate poorer speech understanding when speech is degraded, such as in background noise. However, it is less clear whether (and how) working memory should influence practical decisions, such as hearing treatment. Here, we consider the relationship between working memory capacity and response to specific hearing aid processing strategies. Three types of signal processing are considered, each of which will alter the acoustic signal: fast-acting wide-dynamic range compression, which smooths the amplitude envelope of the input signal; digital noise reduction, which may inadvertently remove speech signal components as it suppresses noise; and frequency compression, which alters the relationship between spectral peaks. For fast-acting wide-dynamic range compression, a growing body of data suggests that individuals with smaller working memory capacity may be more susceptible to such signal alterations, and may receive greater amplification benefit with “low alteration” processing. While the evidence for a relationship between wide-dynamic range compression and working memory appears robust, the effects of working memory on perceptual response to other forms of hearing aid signal processing are less clear cut. We conclude our review with a discussion of the opportunities (and challenges) in translating information on individual working memory into clinical treatment, including clinically feasible measures of working memory. PMID:26733899
Computational validation of the motor contribution to speech perception.
Badino, Leonardo; D'Ausilio, Alessandro; Fadiga, Luciano; Metta, Giorgio
2014-07-01
Action perception and recognition are core abilities fundamental for human social interaction. A parieto-frontal network (the mirror neuron system) matches visually presented biological motion information onto observers' motor representations. This process of matching the actions of others onto our own sensorimotor repertoire is thought to be important for action recognition, providing a non-mediated "motor perception" based on a bidirectional flow of information along the mirror parieto-frontal circuits. State-of-the-art machine learning strategies for hand action identification have shown better performances when sensorimotor data, as opposed to visual information only, are available during learning. As speech is a particular type of action (with acoustic targets), it is expected to activate a mirror neuron mechanism. Indeed, in speech perception, motor centers have been shown to be causally involved in the discrimination of speech sounds. In this paper, we review recent neurophysiological and machine learning-based studies showing (a) the specific contribution of the motor system to speech perception and (b) that automatic phone recognition is significantly improved when motor data are used during training of classifiers (as opposed to learning from purely auditory data). Copyright © 2014 Cognitive Science Society, Inc.
Objective Quality and Intelligibility Prediction for Users of Assistive Listening Devices
Falk, Tiago H.; Parsa, Vijay; Santos, João F.; Arehart, Kathryn; Hazrati, Oldooz; Huber, Rainer; Kates, James M.; Scollie, Susan
2015-01-01
This article presents an overview of twelve existing objective speech quality and intelligibility prediction tools. Two classes of algorithms are presented, namely intrusive and non-intrusive, with the former requiring the use of a reference signal, while the latter does not. Investigated metrics include both those developed for normal hearing listeners, as well as those tailored particularly for hearing impaired (HI) listeners who are users of assistive listening devices (i.e., hearing aids, HAs, and cochlear implants, CIs). Representative examples of those optimized for HI listeners include the speech-to-reverberation modulation energy ratio, tailored to hearing aids (SRMR-HA) and to cochlear implants (SRMR-CI); the modulation spectrum area (ModA); the hearing aid speech quality (HASQI) and perception indices (HASPI); and the PErception MOdel - hearing impairment quality (PEMO-Q-HI). The objective metrics are tested on three subjectively-rated speech datasets covering reverberation-alone, noise-alone, and reverberation-plus-noise degradation conditions, as well as degradations resultant from nonlinear frequency compression and different speech enhancement strategies. The advantages and limitations of each measure are highlighted and recommendations are given for suggested uses of the different tools under specific environmental and processing conditions. PMID:26052190
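The intrusive/non-intrusive distinction in this overview can be made concrete with a toy reference-based score. The sketch below correlates short-time magnitude envelopes of a clean reference with a degraded signal; the function name and frame length are assumptions, and this is reference-based "in spirit" like STOI or HASPI but is not any of the published metrics above.

```python
import numpy as np

def intrusive_score(clean, degraded, frame=256):
    """Toy intrusive metric: mean per-frame correlation between the clean
    (reference) and degraded magnitude envelopes. Requires access to the
    clean signal, which is what makes it 'intrusive'. Not a published
    algorithm; illustrative only."""
    n = min(len(clean), len(degraded)) // frame * frame
    c = np.abs(clean[:n]).reshape(-1, frame)
    d = np.abs(degraded[:n]).reshape(-1, frame)
    scores = [np.corrcoef(ci, di)[0, 1] for ci, di in zip(c, d)]
    return float(np.mean(scores))

rng = np.random.default_rng(0)
speech = rng.normal(size=4096)                # stand-in for a clean signal
noisy = speech + 2.0 * rng.normal(size=4096)  # heavily degraded copy
```

A non-intrusive metric, by contrast, would have to predict quality or intelligibility from `noisy` alone, which is why that class of algorithms is harder to design.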
Auditory Training Effects on the Listening Skills of Children With Auditory Processing Disorder.
Loo, Jenny Hooi Yin; Rosen, Stuart; Bamiou, Doris-Eva
2016-01-01
Children with auditory processing disorder (APD) typically present with "listening difficulties," including problems understanding speech in noisy environments. The authors examined, in a group of such children, whether a 12-week computer-based auditory training program with speech material improved the perception of speech-in-noise test performance, and functional listening skills as assessed by parental and teacher listening and communication questionnaires. The authors hypothesized that after the intervention, (1) trained children would show greater improvements in speech-in-noise perception than untrained controls; (2) this improvement would correlate with improvements in observer-rated behaviors; and (3) the improvement would be maintained for at least 3 months after the end of training. This was a prospective randomized controlled trial of 39 children with normal nonverbal intelligence, ages 7 to 11 years, all diagnosed with APD. This diagnosis required a normal pure-tone audiogram and deficits in at least two clinical auditory processing tests. The APD children were randomly assigned to (1) a control group that received only the current standard treatment for children diagnosed with APD, employing various listening/educational strategies at school (N = 19); or (2) an intervention group that undertook a 3-month 5-day/week computer-based auditory training program at home, consisting of a wide variety of speech-based listening tasks with competing sounds, in addition to the current standard treatment. All 39 children were assessed for language and cognitive skills at baseline and on three outcome measures at baseline and immediate postintervention. Outcome measures were repeated 3 months postintervention in the intervention group only, to assess the sustainability of treatment effects.
The outcome measures were (1) the mean speech reception threshold obtained from the four subtests of the listening in specialized noise test that assesses sentence perception in various configurations of masking speech, and in which the target speakers and test materials were unrelated to the training materials; (2) the Children's Auditory Performance Scale that assesses listening skills, completed by the children's teachers; and (3) the Clinical Evaluation of Language Fundamental-4 pragmatic profile that assesses pragmatic language use, completed by parents. All outcome measures significantly improved at immediate postintervention in the intervention group only, with effect sizes ranging from 0.76 to 1.7. Improvements in speech-in-noise performance correlated with improved scores in the Children's Auditory Performance Scale questionnaire in the trained group only. Baseline language and cognitive assessments did not predict better training outcome. Improvements in speech-in-noise performance were sustained 3 months postintervention. Broad speech-based auditory training led to improved auditory processing skills as reflected in speech-in-noise test performance and in better functional listening in real life. The observed correlation between improved functional listening with improved speech-in-noise perception in the trained group suggests that improved listening was a direct generalization of the auditory training.
Chung, King
2004-01-01
This review discusses the challenges in hearing aid design and fitting and the recent developments in advanced signal processing technologies to meet these challenges. The first part of the review discusses the basic concepts and the building blocks of digital signal processing algorithms, namely, the signal detection and analysis unit, the decision rules, and the time constants involved in the execution of the decision. In addition, mechanisms and the differences in the implementation of various strategies used to reduce the negative effects of noise are discussed. These technologies include the microphone technologies that take advantage of the spatial differences between speech and noise and the noise reduction algorithms that take advantage of the spectral difference and temporal separation between speech and noise. The specific technologies discussed in this paper include first-order directional microphones, adaptive directional microphones, second-order directional microphones, microphone matching algorithms, array microphones, multichannel adaptive noise reduction algorithms, and synchrony detection noise reduction algorithms. Verification data for these technologies, if available, are also summarized. PMID:15678225
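The first-order directional microphones discussed in this review are classically formed by delaying the rear omnidirectional signal by the acoustic travel time across the port spacing and subtracting it from the front signal, which places a null toward the rear. The sketch below uses assumed spacing and sampling values; real hearing aids use fractional-sample delays and adaptive variants.

```python
import numpy as np

def delay_and_subtract(front, rear, fs, spacing=0.012, c=343.0):
    """First-order directional processing (textbook delay-and-subtract):
    delay the rear omni signal by the acoustic travel time across the
    port spacing, then subtract. Rear-arriving sounds are cancelled,
    giving a cardioid-like null; frontal sounds pass through."""
    delay = int(round(spacing / c * fs))       # internal delay in samples
    rear_delayed = np.concatenate([np.zeros(delay), rear[:-delay]])
    return front - rear_delayed

# simulate a sound arriving from the rear: it reaches the rear port
# first, then the front port after the same acoustic travel time
fs = 48000
delay = int(round(0.012 / 343.0 * fs))         # 2 samples at 48 kHz
rng = np.random.default_rng(0)
src = rng.normal(size=4800)
rear_sig = src
front_sig = np.concatenate([np.zeros(delay), src[:-delay]])
out = delay_and_subtract(front_sig, rear_sig, fs)
```

Because the internal delay exactly matches the external acoustic delay in this simulation, the rear-arriving signal cancels; a frontal source (front leading the rear) would instead survive the subtraction.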
Auditory Training: Evidence for Neural Plasticity in Older Adults
Anderson, Samira; Kraus, Nina
2014-01-01
Improvements in digital amplification, cochlear implants, and other innovations have extended the potential for improving hearing function; yet, there remains a need for further hearing improvement in challenging listening situations, such as when trying to understand speech in noise or when listening to music. Here, we review evidence from animal and human models of plasticity in the brain’s ability to process speech and other meaningful stimuli. We considered studies targeting populations of younger through older adults, emphasizing studies that have employed randomized controlled designs and have made connections between neural and behavioral changes. Overall results indicate that the brain remains malleable through older adulthood, provided that treatment algorithms have been modified to allow for changes in learning with age. Improvements in speech-in-noise perception and cognition function accompany neural changes in auditory processing. The training-related improvements noted across studies support the need to consider auditory training strategies in the management of individuals who express concerns about hearing in difficult listening situations. Given evidence from studies engaging the brain’s reward centers, future research should consider how these centers can be naturally activated during training. PMID:25485037
Congdon, Eliza L; Novack, Miriam A; Brooks, Neon; Hemani-Lopez, Naureen; O'Keefe, Lucy; Goldin-Meadow, Susan
2017-08-01
When teachers gesture during instruction, children retain and generalize what they are taught (Goldin-Meadow, 2014). But why does gesture have such a powerful effect on learning? Previous research shows that children learn most from a math lesson when teachers present one problem-solving strategy in speech while simultaneously presenting a different, but complementary, strategy in gesture (Singer & Goldin-Meadow, 2005). One possibility is that gesture is powerful in this context because it presents information simultaneously with speech. Alternatively, gesture may be effective simply because it involves the body, in which case the timing of information presented in speech and gesture may be less important for learning. Here we find evidence for the importance of simultaneity: 3rd-grade children retain and generalize what they learn from a math lesson better when given instruction containing simultaneous speech and gesture than when given instruction containing sequential speech and gesture. Interpreting these results in the context of theories of multimodal learning, we find that gesture capitalizes on its synchrony with speech to promote learning that lasts and can be generalized.
Hazan, Valerie; Tuomainen, Outi; Pettinato, Michèle
2016-12-01
This study investigated the acoustic characteristics of spontaneous speech by talkers aged 9-14 years and their ability to adapt these characteristics to maintain effective communication when intelligibility was artificially degraded for their interlocutor. Recordings were made for 96 children (50 female participants, 46 male participants) engaged in a problem-solving task with a same-sex friend; recordings for 20 adults were used as reference. The task was carried out in good listening conditions (normal transmission) and in degraded transmission conditions. Articulation rate, median fundamental frequency (f0), f0 range, and relative energy in the 1- to 3-kHz range were analyzed. With increasing age, children significantly reduced their median f0 and f0 range, became faster talkers, and reduced their mid-frequency energy in spontaneous speech. Children produced similar clear speech adaptations (in degraded transmission conditions) as adults, but only children aged 11-14 years increased their f0 range, an unhelpful strategy not transmitted via the vocoder. Changes made by children were consistent with a general increase in vocal effort. Further developments in speech production take place during later childhood. Children use clear speech strategies to benefit an interlocutor facing intelligibility problems but may not be able to attune these strategies to the same degree as adults.
ERIC Educational Resources Information Center
Turney, Michael T.; And Others
This report on speech research contains papers describing experiments involving both information processing and speech production. The papers concerned with information processing cover such topics as peripheral and central processes in vision, separate speech and nonspeech processing in dichotic listening, and dichotic fusion along an acoustic…
Neurophysiological Influence of Musical Training on Speech Perception
Shahin, Antoine J.
2011-01-01
Does musical training affect our perception of speech? For example, does learning to play a musical instrument modify the neural circuitry for auditory processing in a way that improves one's ability to perceive speech more clearly in noisy environments? If so, can speech perception in individuals with hearing loss (HL), who struggle in noisy situations, benefit from musical training? While music and speech exhibit some specialization in neural processing, there is evidence suggesting that skills acquired through musical training for specific acoustical processes may transfer to, and thereby improve, speech perception. The neurophysiological mechanisms underlying the influence of musical training on speech processing and the extent of this influence remains a rich area to be explored. A prerequisite for such transfer is the facilitation of greater neurophysiological overlap between speech and music processing following musical training. This review first establishes a neurophysiological link between musical training and speech perception, and subsequently provides further hypotheses on the neurophysiological implications of musical training on speech perception in adverse acoustical environments and in individuals with HL. PMID:21716639
ERIC Educational Resources Information Center
Blood, Gordon W.; Boyle, Michael P.; Blood, Ingrid M.; Nalesnik, Gina R.
2010-01-01
Bullying in school-age children is a global epidemic. School personnel play a critical role in eliminating this problem. The goals of this study were to examine speech-language pathologists' (SLPs) perceptions of bullying, endorsement of potential strategies for dealing with bullying, and associations among SLPs' responses and specific demographic…
ERIC Educational Resources Information Center
Yang, Tae-Kyoung
2002-01-01
Examines how apology speech act strategies frequently used in daily life are transferred in the framework of interlanguage pragmatics and sociolinguistics and how they are influenced by sociolinguistic variations such as social status, social distance, severity of offense, and formal or private relationships. (Author/VWL)
Apology Strategies Employed by Saudi EFL Teachers
ERIC Educational Resources Information Center
Alsulayyi, Marzouq Nasser
2016-01-01
This study examines the apology strategies used by 30 Saudi EFL teachers in Najran, the Kingdom of Saudi Arabia (KSA), paying special attention to variables such as social distance and power and offence severity. The study also delineates gender differences in the respondents' speech as opposed to studies that only examined speech act output by…
Speech Intelligibility Predicted from Neural Entrainment of the Speech Envelope.
Vanthornhout, Jonas; Decruy, Lien; Wouters, Jan; Simon, Jonathan Z; Francart, Tom
2018-04-01
Speech intelligibility is currently measured by scoring how well a person can identify a speech signal. The results of such behavioral measures reflect neural processing of the speech signal, but are also influenced by language processing, motivation, and memory. Very often, electrophysiological measures of hearing give insight into the neural processing of sound. However, in most methods, non-speech stimuli are used, making it hard to relate the results to behavioral measures of speech intelligibility. The use of natural running speech as a stimulus in electrophysiological measures of hearing is a paradigm shift that makes it possible to bridge the gap between behavioral and electrophysiological measures. Here, by decoding the speech envelope from the electroencephalogram, and correlating it with the stimulus envelope, we demonstrate an electrophysiological measure of neural processing of running speech. We show that behaviorally measured speech intelligibility is strongly correlated with our electrophysiological measure. Our results pave the way towards an objective and automatic way of assessing neural processing of speech presented through auditory prostheses, reducing confounds such as attention and cognitive capabilities. We anticipate that our electrophysiological measure will allow better differential diagnosis of the auditory system, and will allow the development of closed-loop auditory prostheses that automatically adapt to individual users.
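The decode-and-correlate idea in this abstract can be sketched as a linear backward model: learn spatial weights that map multichannel EEG to the speech envelope, then score reconstruction by Pearson correlation. This is a simplified illustration on synthetic data (no time-lag matrix, unlike typical published decoders), with assumed regularization and channel counts.

```python
import numpy as np

def train_linear_decoder(eeg, envelope, reg=1e-3):
    """Regularized least-squares spatial decoder mapping multichannel EEG
    (channels x time) to the speech envelope. Illustrative: real backward
    models also include a range of time lags per channel."""
    X = eeg.T                                  # time x channels
    return np.linalg.solve(X.T @ X + reg * np.eye(X.shape[1]),
                           X.T @ envelope)

def reconstruction_score(eeg, envelope, w):
    """Pearson correlation between reconstructed and actual envelope."""
    rec = eeg.T @ w
    return float(np.corrcoef(rec, envelope)[0, 1])

# synthetic check: an envelope linearly embedded in noisy 'EEG'
rng = np.random.default_rng(1)
env = np.abs(rng.normal(size=2000))            # stand-in speech envelope
mix = rng.normal(size=(32, 1))                 # per-channel weights
eeg = mix @ env[None, :] + 0.5 * rng.normal(size=(32, 2000))
w = train_linear_decoder(eeg, env)
r = reconstruction_score(eeg, env, w)
```

In the study's framing, higher correlation between the decoded and stimulus envelopes indicates stronger neural tracking of the running speech, which is what is related to behavioral intelligibility.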
Asad, Areej Nimer; Purdy, Suzanne C; Ballard, Elaine; Fairgray, Liz; Bowen, Caroline
2018-04-27
In this descriptive study, phonological processes were examined in the speech of children aged 5;0-7;6 (years; months) with mild to profound hearing loss using hearing aids (HAs) and cochlear implants (CIs), in comparison to their peers. A second aim was to compare phonological processes of HA and CI users. Children with hearing loss (CWHL, N = 25) were compared to children with normal hearing (CWNH, N = 30) with similar age, gender, linguistic, and socioeconomic backgrounds. Speech samples obtained from a list of 88 words, derived from three standardized speech tests, were analyzed using the CASALA (Computer Aided Speech and Language Analysis) program to evaluate participants' phonological systems, based on lax (a process appeared at least twice in the speech of at least two children) and strict (a process appeared at least five times in the speech of at least two children) counting criteria. Developmental phonological processes were eliminated in the speech of younger and older CWNH while eleven developmental phonological processes persisted in the speech of both age groups of CWHL. CWHL showed a similar trend of age of elimination to CWNH, but at a slower rate. Children with HAs and CIs produced similar phonological processes. Final consonant deletion, weak syllable deletion, backing, and glottal replacement were present in the speech of HA users, affecting their overall speech intelligibility. Developmental and non-developmental phonological processes persist in the speech of children with mild to profound hearing loss compared to their peers with typical hearing. The findings indicate that it is important for clinicians to consider phonological assessment in pre-school CWHL and the use of evidence-based speech therapy in order to reduce non-developmental and non-age-appropriate developmental processes, thereby enhancing their speech intelligibility. Copyright © 2018 Elsevier Inc. All rights reserved.
Shriberg, Lawrence D; Strand, Edythe A; Fourakis, Marios; Jakielski, Kathy J; Hall, Sheryl D; Karlsson, Heather B; Mabie, Heather L; McSweeny, Jane L; Tilkens, Christie M; Wilson, David L
2017-04-14
Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. PM and other scores were obtained for 264 participants in 6 groups: CAS in idiopathic, neurogenetic, and complex neurodevelopmental disorders; adult-onset apraxia of speech (AAS) consequent to stroke and primary progressive apraxia of speech; and idiopathic speech delay. Participants with CAS and AAS had significantly lower scores than typically speaking reference participants and speech delay controls on measures posited to assess representational and transcoding processes. Representational deficits differed between CAS and AAS groups, with support for both underspecified linguistic representations and memory/access deficits in CAS, but for only the latter in AAS. CAS-AAS similarities in the age-sex standardized percentages of occurrence of the most frequent type of inappropriate pauses (abrupt) and significant differences in the standardized occurrence of appropriate pauses were consistent with speech processing findings. Results support the hypotheses of core representational and transcoding speech processing deficits in CAS and theoretical coherence of the PM's pause-speech elements with these deficits.
Shamma, Shihab; Lorenzi, Christian
2013-05-01
There is much debate on how the spectrotemporal modulations of speech (or its spectrogram) are encoded in the responses of the auditory nerve, and whether speech intelligibility is best conveyed via the "envelope" (E) or "temporal fine-structure" (TFS) of the neural responses. Wide use of vocoders to resolve this question has commonly assumed that manipulating the amplitude-modulation and frequency-modulation components of the vocoded signal alters the relative importance of E or TFS encoding on the nerve, thus facilitating assessment of their relative importance to intelligibility. Here we argue that this assumption is incorrect, and that the vocoder approach is ineffective in differentially altering the neural E and TFS. In fact, we demonstrate using a simplified model of early auditory processing that both neural E and TFS encode the speech spectrogram with constant and comparable relative effectiveness regardless of the vocoder manipulations. However, we also show that neural TFS cues are less vulnerable than their E counterparts under severe noisy conditions, and hence should play a more prominent role in cochlear stimulation strategies.
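The envelope (E) versus temporal fine structure (TFS) decomposition at issue here is conventionally computed per frequency band with the Hilbert transform. The sketch below is the standard textbook decomposition on a single synthetic band, not the authors' auditory-nerve model; the modulation and carrier frequencies are assumed values.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_and_tfs(band_signal):
    """Split a band-limited signal into its Hilbert envelope (E) and
    temporal fine structure (TFS): magnitude and cosine of the phase of
    the analytic signal, respectively."""
    analytic = hilbert(band_signal)
    env = np.abs(analytic)
    tfs = np.cos(np.angle(analytic))
    return env, tfs

# toy band: 1 kHz carrier with 10 Hz amplitude modulation at fs = 16 kHz
fs = 16000
t = np.arange(fs) / fs
modulator = 1.0 + 0.8 * np.sin(2 * np.pi * 10 * t)
x = modulator * np.sin(2 * np.pi * 1000 * t)
env, tfs = envelope_and_tfs(x)
```

Vocoder experiments of the kind critiqued in this abstract keep one of these components (e.g., E) per band and replace the other, which is precisely the manipulation the authors argue does not cleanly separate the corresponding neural codes.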
ERIC Educational Resources Information Center
Perrachione, Tyler K.; Wong, Patrick C. M.
2007-01-01
Brain imaging studies of voice perception often contrast activation from vocal and verbal tasks to identify regions uniquely involved in processing voice. However, such a strategy precludes detection of the functional relationship between speech and voice perception. In a pair of experiments involving identifying voices from native and foreign…
ERIC Educational Resources Information Center
Kraut, Rachel
2015-01-01
Morphological awareness facilitates many reading processes. For this reason, L1 and L2 learners of English are often directly taught to use their knowledge of English morphology as a useful reading strategy for determining parts of speech and meaning of novel words. Over time, use of morphological awareness skills while reading develops into an…
Music and Literacy: Strategies Using "Comprehension Connections" by Tanny McGregor
ERIC Educational Resources Information Center
Frasher, Kathleen Diane
2014-01-01
Music and literacy share many of the same skills; therefore, it is no surprise that music and literacy programs can be used together to help children learn to read. Music study can help promote literacy skills such as vocabulary, articulation, pronunciation, grammar, fluency, writing, sentence patterns, rhythm/parts of speech, auditory processing,…
The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise
Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath
2017-01-01
Purpose: Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing varying degrees of linguistic information (2-talker babble or pink noise). Method: Twenty-nine monolingual English speakers were instructed to ignore the lexical status of spoken syllables (e.g., gift vs. kift) and to only categorize the initial phonemes (/g/ vs. /k/). The same participants then performed speech recognition tasks in the presence of 2-talker babble or pink noise in audio-only and audiovisual conditions. Results: Individuals who demonstrated greater lexical influences on phonemic processing experienced greater speech processing difficulties in 2-talker babble than in pink noise. These selective difficulties were present across audio-only and audiovisual conditions. Conclusion: Individuals with greater reliance on lexical processes during speech perception exhibit impaired speech recognition in listening conditions in which competing talkers introduce audible linguistic interferences. Future studies should examine the locus of lexical influences/interferences on phonemic processing and speech-in-speech processing. PMID:28586824
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E.; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z.
2015-01-01
In the last decade, the debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. However, the exact role of the motor system in auditory speech processing remains elusive. Here we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. The patient’s spontaneous speech was marked by frequent phonological/articulatory errors, and those errors were caused, at least in part, by motor-level impairments in speech production. We found that the patient showed a normal phonemic categorical boundary when discriminating two nonwords that form a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the nonword stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labeling impairment. These data suggest that the identification (i.e., labeling) of nonword speech sounds may involve the speech motor system, but that the perception of speech sounds (i.e., discrimination) does not require the motor system. This means that motor processes are not causally involved in perception of the speech signal, and suggests that the motor system may be used when other cues (e.g., meaning, context) are not available. PMID:25951749
The Learning of Complex Speech Act Behaviour.
ERIC Educational Resources Information Center
Olshtain, Elite; Cohen, Andrew
1990-01-01
Pre- and posttraining measurement of adult English-as-a-Second-Language learners' (N=18) apology speech act behavior found no clear-cut quantitative improvement after training, although there was an obvious qualitative approximation of native-like speech act behavior in terms of types of intensification and downgrading, choice of strategy, and…
Speech Effort Measurement and Stuttering: Investigating the Chorus Reading Effect
ERIC Educational Resources Information Center
Ingham, Roger J.; Warner, Allison; Byrd, Anne; Cotton, John
2006-01-01
Purpose: The purpose of this study was to investigate chorus reading's (CR's) effect on speech effort during oral reading by adult stuttering speakers and control participants. The effect of a speech effort measurement highlighting strategy was also investigated. Method: Twelve persistent stuttering (PS) adults and 12 normally fluent control…
Yoder, Paul J.; Molfese, Dennis; Murray, Micah M.; Key, Alexandra P. F.
2013-01-01
Typically developing (TD) preschoolers and age-matched preschoolers with specific language impairment (SLI) had event-related potentials (ERPs) recorded in response to four monosyllabic speech sounds prior to treatment and, in the SLI group, after 6 months of grammatical treatment. Before treatment, the TD group processed speech sounds faster than the SLI group. The SLI group increased the speed of their speech processing after treatment. Post-treatment speed of speech processing predicted later impairment in comprehending phrase elaboration in the SLI group. During the treatment phase, change in speed of speech processing predicted growth rate of grammar in the SLI group. PMID:24219693
An Investigation of Refusal Strategies as Used by Bahdini Kurdish and Syriac Aramaic Speakers
ERIC Educational Resources Information Center
Shareef, Dilgash M.; Qyrio, Marina Isteefan; Ali, Chiman Nadheer
2018-01-01
For the purpose of achieving a successful communication, issues such as the appropriateness of speech acts and face saving become essential. Therefore, it is very important to achieve a high level of pragmatic competence in speech acts. Bearing this in mind, this study was conducted to investigate the preferred refusal strategies Kurdish and…
ERIC Educational Resources Information Center
Donai, Jeremy J.; Schwartz, Jeremy C.
2016-01-01
Clinical Question: What high-frequency amplification strategy maximizes speech recognition performance among adult hearing-impaired listeners with mild sloping to moderately severe sensorineural hearing loss? Method: Quick response review. Study Sources: EBSCO, PubMed, Google Scholar, as well as journals from the American Speech-Language-Hearing…
Derix, Johanna; Iljina, Olga; Weiske, Johanna; Schulze-Bonhage, Andreas; Aertsen, Ad; Ball, Tonio
2014-01-01
Exchange of thoughts by means of expressive speech is fundamental to human communication. However, the neuronal basis of real-life communication in general, and of verbal exchange of ideas in particular, has rarely been studied until now. Here, our aim was to establish an approach for exploring the neuronal processes related to cognitive “idea” units (IUs) in conditions of non-experimental speech production. We investigated whether such units corresponding to single, coherent chunks of speech with syntactically-defined borders, are useful to unravel the neuronal mechanisms underlying real-world human cognition. To this aim, we employed simultaneous electrocorticography (ECoG) and video recordings obtained in pre-neurosurgical diagnostics of epilepsy patients. We transcribed non-experimental, daily hospital conversations, identified IUs in transcriptions of the patients' speech, classified the obtained IUs according to a previously-proposed taxonomy focusing on memory content, and investigated the underlying neuronal activity. In each of our three subjects, we were able to collect a large number of IUs which could be assigned to different functional IU subclasses with a high inter-rater agreement. Robust IU-onset-related changes in spectral magnitude could be observed in high gamma frequencies (70–150 Hz) on the inferior lateral convexity and in the superior temporal cortex regardless of the IU content. A comparison of the topography of these responses with mouth motor and speech areas identified by electrocortical stimulation showed that IUs might be of use for extraoperative mapping of eloquent cortex (average sensitivity: 44.4%, average specificity: 91.1%). High gamma responses specific to memory-related IU subclasses were observed in the inferior parietal and prefrontal regions. IU-based analysis of ECoG recordings during non-experimental communication thus elicits topographically- and functionally-specific effects. 
We conclude that segmentation of spontaneous real-world speech in linguistically-motivated units is a promising strategy for elucidating the neuronal basis of mental processing during non-experimental communication. PMID:24982625
Reaction times of normal listeners to laryngeal, alaryngeal, and synthetic speech.
Evitts, Paul M; Searl, Jeff
2006-12-01
The purpose of this study was to compare listener processing demands when decoding alaryngeal compared to laryngeal speech. Fifty-six listeners were presented with single words produced by 1 proficient speaker from 5 different modes of speech: normal, tracheoesophageal (TE), esophageal (ES), electrolaryngeal (EL), and synthetic speech (SS). Cognitive processing load was indexed by listener reaction time (RT). To account for significant durational differences among the modes of speech, an RT ratio was calculated (stimulus duration divided by RT). Results indicated that the cognitive processing load was greater for ES and EL relative to normal speech. TE and normal speech did not differ in terms of RT ratio, suggesting fairly comparable cognitive demands placed on the listener. SS required greater cognitive processing load than normal and alaryngeal speech. The results are discussed relative to alaryngeal speech intelligibility and the role of the listener. Potential clinical applications and directions for future research are also presented.
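The duration-normalized measure described above (RT ratio = stimulus duration divided by reaction time) is simple enough to sketch. The trial values and mode labels below are illustrative, not data from the study.

```python
# Sketch of the duration-normalized reaction-time measure described above:
# RT ratio = stimulus duration / reaction time, so that longer stimuli
# (e.g., esophageal speech) are not penalized simply for lasting longer.
# The trial values below are hypothetical, not from the study.

def rt_ratio(stimulus_duration_ms: float, reaction_time_ms: float) -> float:
    """Return stimulus duration divided by listener reaction time."""
    if reaction_time_ms <= 0:
        raise ValueError("reaction time must be positive")
    return stimulus_duration_ms / reaction_time_ms

# Hypothetical single-word trials: (mode, stimulus duration ms, listener RT ms)
trials = [
    ("normal", 600, 750),
    ("esophageal", 900, 1500),
]
for mode, dur, rt in trials:
    print(mode, round(rt_ratio(dur, rt), 2))  # larger ratio = lower relative load
```

A larger ratio means the listener responded quickly relative to how long the stimulus lasted, i.e., a lower relative processing load.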
Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
2017-01-01
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other scores were obtained for 264 participants in 6 groups: CAS in idiopathic, neurogenetic, and complex neurodevelopmental disorders; adult-onset apraxia of speech (AAS) consequent to stroke and primary progressive apraxia of speech; and idiopathic speech delay. Results: Participants with CAS and AAS had significantly lower scores than typically speaking reference participants and speech delay controls on measures posited to assess representational and transcoding processes. Representational deficits differed between CAS and AAS groups, with support for both underspecified linguistic representations and memory/access deficits in CAS, but for only the latter in AAS. CAS–AAS similarities in the age–sex standardized percentages of occurrence of the most frequent type of inappropriate pauses (abrupt) and significant differences in the standardized occurrence of appropriate pauses were consistent with speech processing findings. Conclusions: Results support the hypotheses of core representational and transcoding speech processing deficits in CAS and theoretical coherence of the PM's pause-speech elements with these deficits. PMID:28384751
SPAIDE: A Real-time Research Platform for the Clarion CII/90K Cochlear Implant
NASA Astrophysics Data System (ADS)
Van Immerseel, L.; Peeters, S.; Dykmans, P.; Vanpoucke, F.; Bracke, P.
2005-12-01
SPAIDE (sound-processing algorithm integrated development environment) is a real-time platform of Advanced Bionics Corporation (Sylmar, Calif, USA) to facilitate advanced research on sound-processing and electrical-stimulation strategies with the Clarion CII and 90K implants. The platform is meant for testing in the laboratory. SPAIDE is conceptually based on a clear separation of the sound-processing and stimulation strategies and, specifically, on the distinction between sound-processing and stimulation channels and electrode contacts. The development environment has a user-friendly interface for specifying sound-processing and stimulation strategies, and includes the possibility of simulating the electrical stimulation. SPAIDE allows for real-time sound capturing from file or audio input on a PC, sound processing and application of the stimulation strategy, and streaming of the results to the implant. The platform is able to cover a broad range of research applications, ranging from noise reduction and mimicking of normal hearing, through complex (simultaneous) stimulation strategies, to psychophysics. The hardware setup consists of a personal computer, an interface board, and a speech processor. The software is both expandable and to a great extent reusable in other applications.
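The sound-processing/stimulation split that SPAIDE is organized around can be illustrated with a minimal, textbook-style continuous-interleaved-sampling (CIS) sketch: a filterbank yields per-channel envelopes (sound processing), which are then compressed into stimulation levels per electrode contact. This is a generic illustration, not SPAIDE's actual API; the filter settings, channel edges, and threshold/comfort levels are all illustrative assumptions.

```python
# Minimal CIS-style sketch of the sound-processing / stimulation split:
# a filterbank produces per-channel envelopes (sound processing), which
# are then log-compressed into current levels on electrode contacts
# (stimulation). All parameters are illustrative, not SPAIDE's.
import numpy as np
from scipy.signal import butter, sosfilt

FS = 16000  # sampling rate (Hz)

def channel_envelopes(signal, edges):
    """Bandpass each channel and extract its envelope (rectify + low-pass)."""
    envs = []
    lp = butter(2, 200 / (FS / 2), btype="low", output="sos")
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo / (FS / 2), hi / (FS / 2)], btype="band", output="sos")
        band = sosfilt(sos, signal)
        envs.append(sosfilt(lp, np.abs(band)))
    return np.array(envs)

def map_to_current(envs, threshold=0.01, comfort=1.0):
    """Compress envelopes into a 0..1 stimulation level per electrode contact."""
    clipped = np.clip(envs, threshold, comfort)
    return np.log(clipped / threshold) / np.log(comfort / threshold)

t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 1000 * t)       # 1 kHz test tone, 1 s
edges = [300, 700, 1500, 3000, 6000]      # 4 analysis channels (Hz)
levels = map_to_current(channel_envelopes(tone, edges))
print(levels.mean(axis=1).round(2))       # the 700-1500 Hz channel dominates
```

The point of the separation is visible in the code: `channel_envelopes` knows nothing about electrodes, and `map_to_current` knows nothing about acoustics, so either side can be swapped out independently.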
Content-based TV sports video retrieval using multimodal analysis
NASA Astrophysics Data System (ADS)
Yu, Yiqing; Liu, Huayong; Wang, Hongbin; Zhou, Dongru
2003-09-01
In this paper, we propose content-based video retrieval, i.e., retrieval based on semantic content. Because video data is composed of multimodal information streams (visual, auditory, and textual), we describe a strategy that uses multimodal analysis for automatically parsing sports video. The paper first defines the basic structure of a sports video database system, and then introduces a new approach that integrates visual stream analysis, speech recognition, speech signal processing, and text extraction to realize video retrieval. The experimental results for TV sports video of football games indicate that multimodal analysis is effective for video retrieval, allowing users to quickly browse tree-like video clips or to input keywords within a predefined domain.
Kiernan, Barbara; Gray, Shelley
2013-05-01
Talk It Out was developed by speech-language pathologists to teach young children, especially those with speech and language impairments, to recognize problems, use words to solve them, and verbally negotiate solutions. One of the very successful by-products is that these same strategies help children avoid harming their voice. Across a school year, Talk It Out provides teaching and practice in predictable contexts so that children become competent problem solvers. It is especially powerful when implemented as part of the tier 1 preschool curriculum. The purpose of this article is to help school-based speech-language pathologists (1) articulate the need and rationale for early implementation of conflict resolution programs, (2) develop practical skills to implement Talk It Out strategies in their programs, and (3) transfer this knowledge to classroom teachers who can use and reinforce these strategies on a daily basis with the children they serve.
Coelho, Ana Cristina; Brasolotto, Alcione Ghedini; Bevilacqua, Maria Cecília
2015-06-01
To compare some perceptual and acoustic characteristics of the voices of children who use the advanced combination encoder (ACE) or fine structure processing (FSP) speech coding strategies, and to investigate whether these characteristics differ from those of children with normal hearing. Acoustic analysis of the sustained vowel /a/ was performed using the Multi-Dimensional Voice Program (MDVP). Analyses of sequential and spontaneous speech were performed using the Real-Time Pitch software. Perceptual analyses of these samples were performed using visual-analog scales of pre-selected parameters. Seventy-six children from three years to five years and 11 months of age participated. Twenty-eight were users of ACE, 23 were users of FSP, and 25 were children with normal hearing. Although both cochlear implant (CI) groups presented with some deviated vocal features, the users of ACE presented with voice quality more like that of children with normal hearing than the users of FSP. Sound processing of ACE appeared to provide better conditions for auditory monitoring of the voice and, consequently, for better control of voice production. However, these findings need to be further investigated, due to the lack of published comparative studies, to understand exactly which attributes of sound processing are responsible for differences in performance.
ERIC Educational Resources Information Center
Hickok, Gregory
2012-01-01
Speech recognition is an active process that involves some form of predictive coding. This statement is relatively uncontroversial. What is less clear is the source of the prediction. The dual-stream model of speech processing suggests that there are two possible sources of predictive coding in speech perception: the motor speech system and the…
The Timing and Effort of Lexical Access in Natural and Degraded Speech
Wagner, Anita E.; Toffanin, Paolo; Başkent, Deniz
2016-01-01
Understanding speech is effortless in ideal situations, and although adverse conditions, such as those caused by hearing impairment, often render it an effortful task, they do not necessarily suspend speech comprehension. A prime example of this is speech perception by cochlear implant users, whose hearing prostheses transmit speech as a significantly degraded signal. It is as yet unknown how mechanisms of speech processing deal with such degraded signals, and whether they are affected by effortful processing of speech. This paper compares the automatic process of lexical competition between natural and degraded speech, and combines gaze fixations, which capture the course of lexical disambiguation, with pupillometry, which quantifies the mental effort involved in processing speech. Listeners’ ocular responses were recorded during disambiguation of lexical embeddings with matching and mismatching durational cues. Durational cues were selected due to their substantial role in listeners’ quick limitation of the number of lexical candidates for lexical access in natural speech. Results showed that lexical competition increased mental effort in processing natural stimuli, in particular in the presence of mismatching cues. Signal degradation reduced listeners’ ability to quickly integrate durational cues in lexical selection, and delayed and prolonged lexical competition. The effort of processing degraded speech was increased overall, and because it had its sources at the pre-lexical level, this effect can be attributed to listening to degraded speech rather than to lexical disambiguation. In sum, the course of lexical competition was largely comparable for natural and degraded speech, but showed crucial shifts in timing, and different sources of increased mental effort. 
We argue that well-timed progress of information from sensory to pre-lexical and lexical stages of processing, which is the result of perceptual adaptation during speech development, is the reason why in ideal situations speech is perceived as an undemanding task. Degradation of the signal or the receiver channel can quickly bring this well-adjusted timing out of balance and lead to an increase in mental effort. Incomplete and effortful processing at the early pre-lexical stages has consequences for lexical processing, as it adds uncertainty to the forming and revising of lexical hypotheses. PMID:27065901
Comparing Binaural Pre-processing Strategies II
Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias
2015-01-01
Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users. PMID:26721921
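The SRT50 values above were measured with an adaptive procedure whose details the abstract does not give. A generic 1-down/1-up staircase, which by design converges on the 50%-correct point, conveys the idea; the simulated listener, step size, and trial count below are all illustrative assumptions, not the study's method.

```python
# Hedged sketch of a generic 1-down/1-up adaptive staircase, which converges
# on the 50% speech reception threshold (SRT50). The simulated listener and
# all parameters are illustrative; the study's exact procedure is not given.
import math
import random

def simulated_listener(snr_db, true_srt=-4.0, slope=1.0, rng=None):
    """Correct-response probability follows a logistic psychometric function."""
    p = 1.0 / (1.0 + math.exp(-slope * (snr_db - true_srt)))
    return (rng or random).random() < p

def staircase_srt50(trials=200, start_snr=10.0, step=2.0, seed=1):
    rng = random.Random(seed)
    snr, reversals, last_dir = start_snr, [], 0
    for _ in range(trials):
        correct = simulated_listener(snr, rng=rng)
        direction = -1 if correct else +1      # 1-down/1-up tracks 50% correct
        if last_dir and direction != last_dir:
            reversals.append(snr)              # record track reversals
        last_dir = direction
        snr += direction * step
    return sum(reversals[-8:]) / len(reversals[-8:])  # mean of last reversals

print(round(staircase_srt50(), 1))  # estimate near the simulated SRT of -4 dB
```

An SRT50 "improvement" in the study corresponds to this estimate shifting downward (less favorable SNR still yields 50% intelligibility) when an algorithm is switched on.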
Speech systems research at Texas Instruments
NASA Technical Reports Server (NTRS)
Doddington, George R.
1977-01-01
An assessment of automatic speech processing technology is presented. Fundamental problems in the development and the deployment of automatic speech processing systems are defined and a technology forecast for speech systems is presented.
Pinheiro, Ana P; Rezaii, Neguine; Nestor, Paul G; Rauber, Andréia; Spencer, Kevin M; Niznikiewicz, Margaret
2016-02-01
During speech comprehension, multiple cues need to be integrated at a millisecond speed, including semantic information, as well as voice identity and affect cues. A processing advantage has been demonstrated for self-related stimuli when compared with non-self stimuli, and for emotional relative to neutral stimuli. However, very few studies have investigated self-other speech discrimination and, in particular, how emotional valence and voice identity interactively modulate speech processing. In the present study we probed how the processing of words' semantic valence is modulated by speaker's identity (self vs. non-self voice). Sixteen healthy subjects listened to 420 prerecorded adjectives differing in voice identity (self vs. non-self) and semantic valence (neutral, positive, and negative), while electroencephalographic data were recorded. Participants were instructed to decide whether the speech they heard was their own (self-speech condition), someone else's (non-self speech), or if they were unsure. The ERP results demonstrated interactive effects of speaker's identity and emotional valence on both early (N1, P2) and late (Late Positive Potential - LPP) processing stages: compared with non-self speech, self-speech with neutral valence elicited more negative N1 amplitude, self-speech with positive valence elicited more positive P2 amplitude, and self-speech with both positive and negative valence elicited more positive LPP. ERP differences between self and non-self speech occurred in spite of similar accuracy in the recognition of both types of stimuli. Together, these findings suggest that emotion and speaker's identity interact during speech processing, in line with observations of partially dependent processing of speech and speaker information.
Speech perception as an active cognitive process
Heald, Shannon L. M.; Nusbaum, Howard C.
2014-01-01
One view of speech perception is that acoustic signals are transformed into representations for pattern matching to determine linguistic structure. This process can be taken as a statistical pattern-matching problem, assuming relatively stable linguistic categories are characterized by neural representations related to auditory properties of speech that can be compared to speech input. This kind of pattern matching can be termed a passive process, which implies rigidity of processing with few demands on cognitive processing. An alternative view is that speech recognition, even in early stages, is an active process in which speech analysis is attentionally guided. Note that this does not mean consciously guided, but that information-contingent changes in early auditory encoding can occur as a function of context and experience. Active processing assumes that attention, plasticity, and listening goals are important in considering how listeners cope with adverse circumstances that impair hearing, such as masking noise in the environment or hearing loss. Although theories of speech perception have begun to incorporate some active processing, they seldom treat early speech encoding as plastic and attentionally guided. Recent research has suggested that speech perception is the product of both feedforward and feedback interactions between a number of brain regions that include descending projections perhaps as far downstream as the cochlea. It is important to understand how the ambiguity of the speech signal and constraints of context dynamically determine cognitive resources recruited during perception, including focused attention, learning, and working memory. Theories of speech perception need to go beyond the current corticocentric approach in order to account for the intrinsic dynamics of the auditory encoding of speech. In doing so, they may provide new insights into ways in which hearing disorders and loss may be treated either through augmentation or therapy. 
PMID:24672438
Sentence-Level Movements in Parkinson's Disease: Loud, Clear, and Slow Speech
ERIC Educational Resources Information Center
Kearney, Elaine; Giles, Renuka; Haworth, Brandon; Faloutsos, Petros; Baljko, Melanie; Yunusova, Yana
2017-01-01
Purpose: To further understand the effect of Parkinson's disease (PD) on articulatory movements in speech and to expand our knowledge of therapeutic treatment strategies, this study examined movements of the jaw, tongue blade, and tongue dorsum during sentence production with respect to speech intelligibility and compared the effect of varying…
Speech Communication Anxiety: An Impediment to Academic Achievement in the University Classroom.
ERIC Educational Resources Information Center
Boohar, Richard K.; Seiler, William J.
1982-01-01
The achievement levels of college students taking a bioethics course who demonstrated high and low degrees of speech anxiety were studied. Students with high speech anxiety interacted less with instructors and did not achieve as well as other students. Strategies instructors can use to help students are suggested. (Authors/PP)
Private Speech Moderates the Effects of Effortful Control on Emotionality
ERIC Educational Resources Information Center
Day, Kimberly L.; Smith, Cynthia L.; Neal, Amy; Dunsmore, Julie C.
2018-01-01
Research Findings: In addition to being a regulatory strategy, children's private speech may enhance or interfere with their effortful control used to regulate emotion. The goal of the current study was to investigate whether children's private speech during a selective attention task moderated the relations of their effortful control to their…
Modulation, Adaptation, and Control of Orofacial Pathways in Healthy Adults
ERIC Educational Resources Information Center
Estep, Meredith E.
2009-01-01
Although the healthy adult possesses a large repertoire of coordinative strategies for oromotor behaviors, a range of nonverbal, speech-like movements can be observed during speech. The extent of overlap among sensorimotor speech and nonspeech neural correlates and the role of neuromodulatory inputs generated during oromotor behaviors are unknown.…
Business Speech, Language Arts, Business English: 5128.21.
ERIC Educational Resources Information Center
Dade County Public Schools, Miami, FL.
Developed as part of a high school quinmester unit on business speech, this guide provides the teacher with teaching strategies for a course designed to help people in the business world. The course covers the preparation and delivery of a speech and other business situations which require skill in speaking (sales techniques, committee and group…
Mock Trial: A Window to Free Speech Rights and Abilities
ERIC Educational Resources Information Center
Schwartz, Sherry
2010-01-01
This article provides some strategies to alleviate the current tensions between personal responsibility and freedom of speech rights in the public school classroom. The article advocates the necessity of making sure students understand the points and implications of the first amendment by providing a mock trial unit concerning free speech rights.…
NASA Astrophysics Data System (ADS)
Franck, Bas A. M.; Dreschler, Wouter A.; Lyzenga, Johannes
2004-12-01
In this study we investigated the reliability and convergence characteristics of an adaptive multidirectional pattern search procedure, relative to a nonadaptive multidirectional pattern search procedure. The procedure was designed to optimize three speech-processing strategies. These comprise noise reduction, spectral enhancement, and spectral lift. The search is based on a paired-comparison paradigm, in which subjects evaluated the listening comfort of speech-in-noise fragments. The procedural and nonprocedural factors that influence the reliability and convergence of the procedure are studied using various test conditions. The test conditions combine different tests, initial settings, background noise types, and step size configurations. Seven normal-hearing subjects participated in this study. The results indicate that the reliability of the optimization strategy may benefit from the use of an adaptive step size. Decreasing the step size increases accuracy, while increasing the step size can be beneficial to create clear perceptual differences in the comparisons. The reliability also depends on starting point, stop criterion, step size constraints, background noise, algorithms used, as well as the presence of drifting cues and suboptimal settings. There appears to be a trade-off between reliability and convergence, i.e., when the step size is enlarged the reliability improves, but the convergence deteriorates.
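The adaptive step-size idea discussed above can be sketched as a pattern search over three processing parameters, driven by paired comparisons: probe each parameter up and down, keep any preferred setting, and halve the step once no probe wins. The preference model standing in for a listener below is a hypothetical distance-to-optimum rule, not the study's paradigm, and the parameter names are illustrative only.

```python
# Illustrative adaptive-step-size pattern search over three processing
# parameters (here labeled like the study's: noise reduction, spectral
# enhancement, spectral lift). Pairwise preferences come from a stand-in
# listener model, NOT from the study's paired-comparison procedure.
import numpy as np

OPTIMUM = np.array([0.6, 0.3, 0.8])  # hidden "most comfortable" setting

def prefers(candidate, current):
    """Simulated paired comparison: prefer whichever is closer to the optimum."""
    return np.linalg.norm(candidate - OPTIMUM) < np.linalg.norm(current - OPTIMUM)

def pattern_search(start, step=0.4, min_step=0.01):
    x = np.array(start, dtype=float)
    while step > min_step:
        improved = False
        for i in range(len(x)):                  # probe +/- step on each axis
            for delta in (+step, -step):
                cand = x.copy()
                cand[i] = np.clip(cand[i] + delta, 0.0, 1.0)
                if prefers(cand, x):
                    x, improved = cand, True
        if not improved:
            step /= 2.0   # adaptive step size: refine once no probe is preferred
    return x

best = pattern_search([0.0, 0.0, 0.0])
print(best.round(2))
```

The trade-off the abstract describes shows up directly here: a large step produces candidate pairs that are easy to tell apart (reliable judgments) but a coarse final answer, while a small step converges finely but asks the listener to compare nearly indistinguishable settings.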
A Proposed Process for Managing the First Amendment Aspects of Campus Hate Speech.
ERIC Educational Resources Information Center
Kaplan, William A.
1992-01-01
A carefully structured process for campus administrative decision making concerning hate speech is proposed and suggestions for implementation are offered. In addition, criteria for evaluating hate speech processes are outlined, and First Amendment principles circumscribing the institution's discretion to regulate hate speech are discussed.…
Left Lateralized Enhancement of Orofacial Somatosensory Processing Due to Speech Sounds
ERIC Educational Resources Information Center
Ito, Takayuki; Johns, Alexis R.; Ostry, David J.
2013-01-01
Purpose: Somatosensory information associated with speech articulatory movements affects the perception of speech sounds and vice versa, suggesting an intimate linkage between speech production and perception systems. However, it is unclear which cortical processes are involved in the interaction between speech sounds and orofacial somatosensory…
Speech Processing and Recognition (SPaRe)
2011-01-01
Reports results in the areas of automatic speech recognition (ASR), speech processing, machine translation (MT), natural language processing (NLP), and information retrieval (IR). As shown in Figure 9, the IOC was only expected to provide document submission and search, and automatic speech recognition for English, Spanish, Arabic, and…
Stochastic Online Learning in Dynamic Networks under Unknown Models
2016-08-02
Repeated Game with Incomplete Information, IEEE International Conference on Acoustics, Speech, and Signal Processing, 20 March 2016, Shanghai, China. Examines stochastic online learning in a game-theoretic framework for the application of multi-seller dynamic pricing with unknown demand models. We formulated the problem as an infinitely repeated game with incomplete information and developed a dynamic pricing strategy referred to as Competitive and Cooperative Demand Learning.
Döge, Julia; Baumann, Uwe; Weissgerber, Tobias; Rader, Tobias
2017-12-01
To assess auditory localization accuracy and speech reception threshold (SRT) in complex noise conditions in adult patients with acquired single-sided deafness, after intervention with a cochlear implant (CI) in the deaf ear. Nonrandomized, open, prospective patient series. Tertiary referral university hospital. Eleven patients with late-onset single-sided deafness (SSD) and normal hearing in the unaffected ear, who received a CI. All patients were experienced CI users. Unilateral cochlear implantation. Speech perception was tested in a complex multitalker equivalent noise field consisting of multiple sound sources. Speech reception thresholds in noise were determined in aided (with CI) and unaided conditions. Localization accuracy was assessed in complete darkness. Acoustic stimuli were radiated by multiple loudspeakers distributed in the frontal horizontal plane between -60 and +60 degrees. In the aided condition, results show slightly improved speech reception scores compared with the unaided condition in most of the patients. For 8 of the 11 subjects, SRT improved by between 0.37 and 1.70 dB. Three of the 11 subjects showed deteriorations of between 1.22 and 3.24 dB SRT. Median localization error decreased significantly, by 12.9 degrees, compared with the unaided condition. CI in single-sided deafness is an effective treatment to improve auditory localization accuracy. Speech reception in complex noise conditions improved to a lesser extent, in 73% of the participating CI SSD patients. However, the absence of true binaural interaction effects (summation, squelch) impedes further improvements. The development of speech processing strategies that respect binaural interaction seems to be mandatory to advance speech perception in demanding listening situations in SSD patients.
Restle, Julia; Murakami, Takenobu; Ziemann, Ulf
2012-07-01
The posterior part of the inferior frontal gyrus (pIFG) in the left hemisphere is thought to form part of the putative human mirror neuron system and is assigned a key role in mapping sensory perception onto motor action. Accordingly, the pIFG is involved in motor imitation of the observed actions of others, but it is not known to what extent speech repetition of auditory-presented sentences is also a function of the pIFG. Here we applied fMRI-guided facilitating intermittent theta-burst transcranial magnetic stimulation (iTBS), depressant continuous TBS (cTBS), or intermediate TBS (imTBS) over the left pIFG of healthy subjects and compared speech repetition accuracy of foreign Japanese sentences before and after TBS. We found that repetition accuracy improved after iTBS and, to a lesser extent, after imTBS, but remained unchanged after cTBS. In a control experiment, iTBS was applied over the left middle occipital gyrus (MOG), a region not involved in sensorimotor processing of auditory-presented speech. Repetition accuracy remained unchanged after iTBS of MOG. We argue that the stimulation-type- and stimulation-site-specific facilitating effect of iTBS over the left pIFG on speech repetition accuracy indicates a causal role of the human left-hemispheric pIFG in the translation of phonological perception to motor articulatory output for repetition of speech. This effect may prove useful in rehabilitation strategies that combine repetitive speech training with iTBS of the left pIFG in speech disorders, such as aphasia after cerebral stroke. Copyright © 2012 Elsevier Ltd. All rights reserved.
Speech endpoint detection with non-language speech sounds for generic speech processing applications
NASA Astrophysics Data System (ADS)
McClain, Matthew; Romanowski, Brian
2009-05-01
Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types, such as filled pauses, will require future research.
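A minimal sketch of the frame-classification idea described above: a two-state hidden Markov model whose states are LSS and NLSS, decoded with the Viterbi algorithm over a toy log-energy feature. The Gaussian emission parameters and transition probabilities below are illustrative assumptions, not the authors' trained model.

```python
import numpy as np

def viterbi_2state(obs, means, stds, trans, priors):
    """Most likely state path (0=LSS, 1=NLSS) for a 1-D feature sequence."""
    n = len(obs)
    # Per-frame Gaussian log-likelihoods (constant term dropped).
    log_b = np.array([
        [-0.5 * ((x - means[s]) / stds[s]) ** 2 - np.log(stds[s]) for s in (0, 1)]
        for x in obs
    ])
    log_t = np.log(trans)
    delta = np.log(priors) + log_b[0]
    back = np.zeros((n, 2), dtype=int)
    for t in range(1, n):
        scores = delta[:, None] + log_t          # scores[i, j]: state i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_b[t]
    path = np.zeros(n, dtype=int)
    path[-1] = delta.argmax()
    for t in range(n - 2, -1, -1):               # backtrace
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy feature: frame log-energy; speech frames are loud, a breath is quieter.
obs = np.array([5.0, 5.2, 4.9, 1.1, 0.9, 1.0, 5.1, 5.3])
path = viterbi_2state(obs,
                      means=[5.0, 1.0], stds=[0.5, 0.5],
                      trans=np.array([[0.9, 0.1], [0.1, 0.9]]),
                      priors=np.array([0.5, 0.5]))
print(path.tolist())  # low-energy middle frames decoded as NLSS (state 1)
```

In a real system the 1-D energy feature would be replaced by the paper's language-agnostic acoustic feature vectors, and the parameters would be trained rather than hand-set.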
Sadness is unique: neural processing of emotions in speech prosody in musicians and non-musicians.
Park, Mona; Gutyrchik, Evgeny; Welker, Lorenz; Carl, Petra; Pöppel, Ernst; Zaytseva, Yuliya; Meindl, Thomas; Blautzik, Janusch; Reiser, Maximilian; Bao, Yan
2014-01-01
Musical training has been shown to have positive effects on several aspects of speech processing, however, the effects of musical training on the neural processing of speech prosody conveying distinct emotions are yet to be better understood. We used functional magnetic resonance imaging (fMRI) to investigate whether the neural responses to speech prosody conveying happiness, sadness, and fear differ between musicians and non-musicians. Differences in processing of emotional speech prosody between the two groups were only observed when sadness was expressed. Musicians showed increased activation in the middle frontal gyrus, the anterior medial prefrontal cortex, the posterior cingulate cortex and the retrosplenial cortex. Our results suggest an increased sensitivity of emotional processing in musicians with respect to sadness expressed in speech, possibly reflecting empathic processes.
The influence of speech rate and accent on access and use of semantic information.
Sajin, Stanislav M; Connine, Cynthia M
2017-04-01
Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.
Comparison of formant detection methods used in speech processing applications
NASA Astrophysics Data System (ADS)
Belean, Bogdan
2013-11-01
The paper describes time-frequency representations of the speech signal together with the significance of formants in speech processing applications. Speech formants can be used in emotion recognition, sex discrimination, or diagnosing different neurological diseases. Given the various applications of formant detection in speech signals, two methods for detecting formants are presented. First, the poles resulting from a complex analysis of the LPC coefficients are used for formant detection. The second approach uses a Kalman filter for formant prediction along the speech signal. Results are presented for both approaches on real-life speech spectrograms. A comparison of the features of the proposed methods is also performed, in order to establish which method is more suitable for different speech processing applications.
Schierholz, Irina; Finke, Mareike; Kral, Andrej; Büchner, Andreas; Rach, Stefan; Lenarz, Thomas; Dengler, Reinhard; Sandmann, Pascale
2017-04-01
There is substantial variability in speech recognition ability across patients with cochlear implants (CIs), auditory brainstem implants (ABIs), and auditory midbrain implants (AMIs). To better understand how this variability is related to central processing differences, the current electroencephalography (EEG) study compared hearing abilities and auditory-cortex activation in patients with electrical stimulation at different sites of the auditory pathway. Three different groups of patients with auditory implants (Hannover Medical School; ABI: n = 6, CI: n = 6; AMI: n = 2) performed a speeded response task and a speech recognition test with auditory, visual, and audio-visual stimuli. Behavioral performance and cortical processing of auditory and audio-visual stimuli were compared between groups. ABI and AMI patients showed prolonged response times on auditory and audio-visual stimuli compared with normal-hearing (NH) listeners and CI patients. This was confirmed by prolonged N1 latencies and reduced N1 amplitudes in ABI and AMI patients. However, patients with central auditory implants showed a remarkable gain in performance when visual and auditory input was combined, in both speech and non-speech conditions, which was reflected by a strong visual modulation of auditory-cortex activation in these individuals. In sum, the results suggest that the behavioral improvement for audio-visual conditions in central auditory implant patients is based on enhanced audio-visual interactions in the auditory cortex. These findings may provide important implications for the optimization of electrical stimulation and rehabilitation strategies in patients with central auditory prostheses. Hum Brain Mapp 38:2206-2225, 2017. © 2017 Wiley Periodicals, Inc.
Cochlear implants: a remarkable past and a brilliant future
Wilson, Blake S.; Dorman, Michael F.
2013-01-01
The aims of this paper are to (i) provide a brief history of cochlear implants; (ii) present a status report on the current state of implant engineering and the levels of speech understanding enabled by that engineering; (iii) describe limitations of current signal processing strategies; and (iv) suggest new directions for research. With current technology the “average” implant patient, when listening to predictable conversations in quiet, is able to communicate with relative ease. However, in an environment typical of a workplace the average patient has a great deal of difficulty. Patients who are “above average” in terms of speech understanding can achieve 100% correct scores on the most difficult tests of speech understanding in quiet but still have significant difficulty when signals are presented in noise. The major factors in these outcomes appear to be (i) a loss of low-frequency, fine structure information possibly due to the envelope extraction algorithms common to cochlear implant signal processing; (ii) a limitation in the number of effective channels of stimulation due to overlap in electric fields from electrodes, and (iii) central processing deficits, especially for patients with poor speech understanding. Two recent developments, bilateral implants and combined electric and acoustic stimulation, have promise to remediate some of the difficulties experienced by patients in noise and to reinstate low-frequency fine structure information. If other possibilities are realized, e.g., electrodes that emit drugs to inhibit cell death following trauma and to induce the growth of neurites toward electrodes, then the future is very bright indeed. PMID:18616994
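The envelope extraction identified in factor (i) can be illustrated with a single CIS-style analysis channel: band-pass filter, full-wave rectify, low-pass smooth. The filter edges, smoothing window, and test signal below are illustrative assumptions, not any manufacturer's clinical parameters; note how the slow amplitude envelope survives while the carrier fine structure is discarded.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude FFT-mask band-pass filter (stand-in for a clinical filterbank)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    X[(f < lo) | (f > hi)] = 0
    return np.fft.irfft(X, n=len(x))

def channel_envelope(x, fs, lo, hi, smooth_ms=5.0):
    """One analysis channel: band-pass, rectify, moving-average low-pass."""
    band = bandpass_fft(x, fs, lo, hi)
    rect = np.abs(band)                          # full-wave rectification
    w = max(1, int(fs * smooth_ms / 1000))
    kernel = np.ones(w) / w                      # ~200 Hz envelope smoothing
    return np.convolve(rect, kernel, mode="same")

# 4 Hz amplitude modulation of a 1 kHz carrier.
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
am = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))       # slow envelope
x = am * np.sin(2 * np.pi * 1000 * t)            # fast fine structure
env = channel_envelope(x, fs, 500, 1500)         # tracks am, not the carrier
```

The extracted `env` would then modulate the pulse amplitudes on one electrode; the 1 kHz fine structure, including any low-frequency temporal fine structure cues in real speech, is exactly what this step throws away.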
Lawton, Michelle; Sage, Karen; Haddock, Gillian; Conroy, Paul; Serrant, Laura
2018-05-01
Therapeutic alliance refers to the interactional and relational processes operating during therapeutic interventions. It has been shown to be a strong determinant of treatment efficacy in psychotherapy, and evidence is emerging from a range of healthcare and medical disciplines to suggest that the construct of therapeutic alliance may in fact be a variable component of treatment outcome, engagement and satisfaction. Although this construct appears to be highly relevant to aphasia rehabilitation, no research to date has attempted to explore this phenomenon and thus consider its potential utility as a mechanism for change. To explore speech and language therapists' perceptions and experiences of developing and maintaining therapeutic alliances in aphasia rehabilitation post-stroke. Twenty-two, in-depth, semi-structured interviews were conducted with speech and language therapists working with people with aphasia post-stroke. Qualitative data were analysed using inductive thematic analysis. Analysis resulted in the emergence of three overarching themes: laying the groundwork; augmenting cohesion; and contextual shapers. Recognizing personhood, developing shared expectations of therapy and establishing therapeutic ownership were central to laying the groundwork for therapeutic delivery. Augmenting cohesion was perceived to be dependent on the therapists' responsiveness and ability to resolve both conflict and resistance, as part of an ongoing active process. These processes were further moulded by contextual shapers such as the patient's family, relational continuity and organizational drivers. The findings suggest that therapists used multiple, complex, relational strategies to establish and manage alliances with people with aphasia, which were reliant on a fluid interplay of verbal and non-verbal skills. The data highlight the need for further training to support therapists to forge purposive alliances. 
Training should develop: therapeutic reflexivity; inclusivity in goal setting, relational strategies; and motivational enhancement techniques. The conceptualization of therapeutic alliance, however, is only provisional. Further research is essential to elucidate the experiences and perceptions of alliance development for people with aphasia undergoing rehabilitation. © 2018 The Authors International Journal of Language & Communication Disorders published by John Wiley & Sons Ltd on behalf of Royal College of Speech and Language Therapists.
Issues in developing valid assessments of speech pathology students' performance in the workplace.
McAllister, Sue; Lincoln, Michelle; Ferguson, Alison; McAllister, Lindy
2010-01-01
Workplace-based learning is a critical component of professional preparation in speech pathology. A validated assessment of this learning is seen to be 'the gold standard', but it is difficult to develop because of design and validation issues. These issues include the role and nature of judgement in assessment, challenges in measuring quality, and the relationship between assessment and learning. Valid assessment of workplace-based performance needs to capture the development of competence over time and account for both occupation-specific and generic competencies. This paper reviews important conceptual issues in the design of valid and reliable workplace-based assessments of competence including assessment content, process, impact on learning, measurement issues, and validation strategies. It then goes on to share what has been learned about quality assessment and validation of a workplace-based performance assessment using competency-based ratings. The outcomes of a four-year national development and validation of an assessment tool are described. A literature review of issues in conceptualizing, designing, and validating workplace-based assessments was conducted. Key factors to consider in the design of a new tool were identified and built into the cycle of design, trialling, and data analysis in the validation stages of the development process. This paper provides an accessible overview of factors to consider in the design and validation of workplace-based assessment tools. It presents strategies used in the development and national validation of a tool, COMPASS, which is used in every speech pathology programme in Australia, New Zealand, and Singapore. The paper also describes Rasch analysis, a model-based statistical approach which is useful for establishing validity and reliability of assessment tools. 
Through careful attention to conceptual and design issues in the development and trialling of workplace-based assessments, it has been possible to develop the world's first valid and reliable national assessment tool for the assessment of performance in speech pathology.
Rhythmic Priming Enhances the Phonological Processing of Speech
ERIC Educational Resources Information Center
Cason, Nia; Schon, Daniele
2012-01-01
While natural speech does not possess the same degree of temporal regularity found in music, there is recent evidence to suggest that temporal regularity enhances speech processing. The aim of this experiment was to examine whether speech processing would be enhanced by the prior presentation of a rhythmical prime. We recorded electrophysiological…
Visual and Auditory Input in Second-Language Speech Processing
ERIC Educational Resources Information Center
Hardison, Debra M.
2010-01-01
The majority of studies in second-language (L2) speech processing have involved unimodal (i.e., auditory) input; however, in many instances, speech communication involves both visual and auditory sources of information. Some researchers have argued that multimodal speech is the primary mode of speech perception (e.g., Rosenblum 2005). Research on…
Electrophysiologic Assessment of Auditory Training Benefits in Older Adults
Anderson, Samira; Jenkins, Kimberly
2015-01-01
Older adults often exhibit speech perception deficits in difficult listening environments. At present, hearing aids or cochlear implants are the main options for therapeutic remediation; however, they only address audibility and do not compensate for central processing changes that may accompany aging and hearing loss or declines in cognitive function. It is unknown whether long-term hearing aid or cochlear implant use can restore changes in central encoding of temporal and spectral components of speech or improve cognitive function. Therefore, consideration should be given to auditory/cognitive training that targets auditory processing and cognitive declines, taking advantage of the plastic nature of the central auditory system. The demonstration of treatment efficacy is an important component of any training strategy. Electrophysiologic measures can be used to assess training-related benefits. This article will review the evidence for neuroplasticity in the auditory system and the use of evoked potentials to document treatment efficacy. PMID:27587912
Woodruff Carr, Kali; Fitzroy, Ahren B; Tierney, Adam; White-Schwoch, Travis; Kraus, Nina
2017-01-01
Speech communication involves integration and coordination of sensory perception and motor production, requiring precise temporal coupling. Beat synchronization, the coordination of movement with a pacing sound, can be used as an index of this sensorimotor timing. We assessed adolescents' synchronization and capacity to correct asynchronies when given online visual feedback. Variability of synchronization while receiving feedback predicted phonological memory and reading sub-skills, as well as maturation of cortical auditory processing; less variable synchronization during the presence of feedback tracked with maturation of cortical processing of sound onsets and resting gamma activity. We suggest the ability to incorporate feedback during synchronization is an index of intentional, multimodal timing-based integration in the maturing adolescent brain. Precision of temporal coding across modalities is important for speech processing and literacy skills that rely on dynamic interactions with sound. Synchronization employing feedback may prove useful as a remedial strategy for individuals who struggle with timing-based language learning impairments. Copyright © 2016 Elsevier Inc. All rights reserved.
Hadely, Kathleen A; Power, Emma; O'Halloran, Robyn
2014-03-06
Communication and swallowing disorders are a common consequence of stroke. Clinical practice guidelines (CPGs) have been created to assist health professionals to put research evidence into clinical practice and can improve stroke care outcomes. However, CPGs are often not successfully implemented in clinical practice and research is needed to explore the factors that influence speech pathologists' implementation of stroke CPGs. This study aimed to describe speech pathologists' experiences and current use of guidelines, and to identify what factors influence speech pathologists' implementation of stroke CPGs. Speech pathologists working in stroke rehabilitation who had used a stroke CPG were invited to complete a 39-item online survey. Content analysis and descriptive and inferential statistics were used to analyse the data. 320 participants from all states and territories of Australia were surveyed. Almost all speech pathologists had used a stroke CPG and had found the guideline "somewhat useful" or "very useful". Factors that speech pathologists perceived influenced CPG implementation included the: (a) guideline itself, (b) work environment, (c) aspects related to the speech pathologist themselves, (d) patient characteristics, and (e) types of implementation strategies provided. There are many different factors that can influence speech pathologists' implementation of CPGs. The factors that influenced the implementation of CPGs can be understood in terms of knowledge creation and implementation frameworks. Speech pathologists should continue to adapt the stroke CPG to their local work environment and evaluate their use. To enhance guideline implementation, they may benefit from a combination of educational meetings and resources, outreach visits, support from senior colleagues, and audit and feedback strategies.
A software tool for analyzing multichannel cochlear implant signals.
Lai, Wai Kong; Bögli, Hans; Dillier, Norbert
2003-10-01
A useful and convenient means to analyze the radio frequency (RF) signals being sent by a speech processor to a cochlear implant would be to actually capture and display them with appropriate software. This is particularly useful for development or diagnostic purposes. sCILab (Swiss Cochlear Implant Laboratory) is such a PC-based software tool intended for the Nucleus family of Multichannel Cochlear Implants. Its graphical user interface provides a convenient and intuitive means for visualizing and analyzing the signals encoding speech information. Both numerical and graphic displays are available for detailed examination of the captured CI signals, as well as an acoustic simulation of these CI signals. sCILab has been used in the design and verification of new speech coding strategies, and has also been applied as an analytical tool in studies of how different parameter settings of existing speech coding strategies affect speech perception. As a diagnostic tool, it is also useful for troubleshooting problems with the external equipment of the cochlear implant systems.
Stoppelman, Nadav; Harpaz, Tamar; Ben-Shachar, Michal
2013-05-01
Speech processing engages multiple cortical regions in the temporal, parietal, and frontal lobes. Isolating speech-sensitive cortex in individual participants is of major clinical and scientific importance. This task is complicated by the fact that responses to sensory and linguistic aspects of speech are tightly packed within the posterior superior temporal cortex. In functional magnetic resonance imaging (fMRI), various baseline conditions are typically used in order to isolate speech-specific from basic auditory responses. Using a short, continuous sampling paradigm, we show that reversed ("backward") speech, a commonly used auditory baseline for speech processing, removes much of the speech responses in frontal and temporal language regions of adult individuals. On the other hand, signal-correlated noise (SCN) serves as an effective baseline for removing primary auditory responses while maintaining strong signals in the same language regions. We show that the response to reversed speech in the left inferior frontal gyrus decays significantly faster than the response to speech, thus suggesting that this response reflects bottom-up activation of speech analysis followed by top-down attenuation once the signal is classified as nonspeech. The results overall favor SCN as an auditory baseline for speech processing.
Patient Fatigue during Aphasia Treatment: A Survey of Speech-Language Pathologists
ERIC Educational Resources Information Center
Riley, Ellyn A.
2017-01-01
The purpose of this study was to measure speech-language pathologists' (SLPs) perceptions of fatigue in clients with aphasia and identify strategies used to manage client fatigue during speech and language therapy. SLPs completed a short online survey containing a series of questions related to their perceptions of patient fatigue. Of 312…
Walking to Medjugorje: Serving Children Who Are Deaf and Hard-of- Hearing in Bosnia-Herzegovina.
ERIC Educational Resources Information Center
Miller, Kevin J.
2002-01-01
This article discusses the experiences of an American team who worked with the Bosnia Speech and Hearing Project. The team collaborated with Bosnian teachers of children with deafness and speech-language pathologists to share therapy ideas and model strategies that parents could use to promote speech and language development. (Author/CR)
Tjaden, Kris; Wilding, Greg
2011-01-01
The primary purpose of this study was to investigate how speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS) accomplish voluntary reductions in speech rate. A group of talkers with no history of neurological disease was included for comparison. This study was motivated by the idea that knowledge of how speakers with dysarthria voluntarily accomplish a reduced speech rate would contribute toward a descriptive model of speaking rate change in dysarthria. Such a model has the potential to assist in identifying rate control strategies to receive focus in clinical treatment programs and also would advance understanding of global speech timing in dysarthria. All speakers read a passage in Habitual and Slow conditions. Speech rate, articulation rate, pause duration, and pause frequency were measured. All speaker groups adjusted articulation time as well as pause time to reduce overall speech rate. Group differences in how voluntary rate reduction was accomplished were primarily a matter of quantity or degree. Overall, a slower-than-normal rate was associated with a reduced articulation rate, shorter speech runs that included fewer syllables, and longer, more frequent pauses. Taken together, these results suggest that existing skills or strategies used by patients should be emphasized in dysarthria training programs focusing on rate reduction. Results further suggest that a model of voluntary speech rate reduction based on neurologically normal speech shows promise as being applicable for mild to moderate dysarthria. The reader will be able to: (1) describe the importance of studying voluntary adjustments in speech rate in dysarthria, (2) discuss how speakers with Parkinson's disease and Multiple Sclerosis adjust articulation time and pause time to slow speech rate. Copyright © 2011 Elsevier Inc. All rights reserved.
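The two rate measures used above differ only in whether pause time counts toward the denominator, which is why a speaker can lower speech rate by pausing more without articulating any slower. A toy calculation with invented numbers:

```python
# Illustrative values only: a 120-syllable passage read with 40 s of
# articulation and 20 s of silent pausing.
syllables = 120
articulation_time = 40.0   # seconds spent actually articulating
pause_time = 20.0          # seconds of silent pauses

# Speech rate includes pause time; articulation rate excludes it.
speech_rate = syllables / (articulation_time + pause_time)   # syllables/s overall
articulation_rate = syllables / articulation_time            # syllables/s while talking

print(speech_rate, articulation_rate)  # 2.0 3.0
```

Doubling the pause time here would drop the speech rate to 1.5 syllables/s while leaving the articulation rate at 3.0, the kind of dissociation the measures above are designed to expose.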
Song, Jae-Jin; Lee, Hyo-Jeong; Kang, Hyejin; Lee, Dong Soo; Chang, Sun O; Oh, Seung Ha
2015-03-01
While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H2(15)O positron emission tomography (PET) scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users activated primarily the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with bilateral visual cortices regardless of congruency. 
Taken together these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed more by visual distractors when confronted with incongruent audiovisual stimuli. To cope with this multimodal conflict, CI users activate the left inferior frontal gyrus to adopt a top-down cognitive modulation pathway, whereas normal hearing individuals primarily adopt a bottom-up strategy.
ERIC Educational Resources Information Center
McDonald, David; Proctor, Penny; Gill, Wendy; Heaven, Sue; Marr, Jane; Young, Jane
2015-01-01
Intensive Speech and Language Therapy (SLT) training courses for Early Childhood Educators (ECEs) can have a positive effect on their use of interaction strategies that support children's communication skills. The impact of brief SLT training courses is not yet clearly understood. The aims of these two studies were to assess the impact of a brief…
The Effect of Speech-to-Text Technology on Learning a Writing Strategy
ERIC Educational Resources Information Center
Haug, Katrina N.; Klein, Perry D.
2018-01-01
Previous research has shown that speech-to-text (STT) software can support students in producing a given piece of writing. This is the 1st study to investigate the use of STT to teach a writing strategy. We pretested 45 Grade 5 students on argument writing and trained them to use STT. Students participated in 4 lessons on an argument writing…
Verbalization and imagery in the process of formation of operator labor skills
NASA Technical Reports Server (NTRS)
Mistyuk, V. V.
1975-01-01
Sensorimotor control tests show that operational skills are mastered under conditions that stimulate the operator to independent, active analysis and summarization of current information, with the goal of clarifying the signs and the integral images that form a model of the situation. Goal-directed determination of such an image requires inner and external speech; it activates and improves the thinking of the operator, accelerates the training process, increases its effectiveness, and enables the formation of strategies for anticipating the course of events.
Enhanced procedural learning of speech sound categories in a genetic variant of FOXP2.
Chandrasekaran, Bharath; Yi, Han-Gyol; Blanco, Nathaniel J; McGeary, John E; Maddox, W Todd
2015-05-20
A mutation of the forkhead box protein P2 (FOXP2) gene is associated with severe deficits in human speech and language acquisition. In rodents, the humanized form of FOXP2 promotes faster switching from declarative to procedural learning strategies when the two learning systems compete. Here, we examined a polymorphism of FOXP2 (rs6980093) in humans (214 adults; 111 females) for associations with non-native speech category learning success. Neurocomputational modeling results showed that individuals with the GG genotype shifted faster to procedural learning strategies, which are optimal for the task. These findings support an adaptive role for the FOXP2 gene in modulating the function of neural learning systems that have a direct bearing on human speech category learning. Copyright © 2015 the authors.
Cortical oscillations and entrainment in speech processing during working memory load.
Hjortkjaer, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A; Dau, Torsten
2018-02-02
Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise. Increasing WM load at higher n-back levels was associated with a decrease in posterior alpha power as well as increased pupil dilations. Frontal theta power increased at the start of the trial and increased additionally with higher n-back level. The observed alpha-theta power changes are consistent with visual n-back paradigms suggesting general oscillatory correlates of WM processing load. Speech entrainment was measured as a linear mapping between the envelope of the speech signal and low-frequency cortical activity (< 13 Hz). We found that increases in both types of WM load (background noise and n-back level) decreased cortical speech envelope entrainment. Although entrainment persisted under high load, our results suggest a top-down influence of WM processing on cortical speech entrainment. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
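The entrainment measure described in this abstract, a linear mapping between the speech envelope and low-frequency (< 13 Hz) cortical activity, can be sketched in a few lines. The sketch below is illustrative rather than the authors' pipeline: it assumes a Hilbert-transform envelope, a 13 Hz low-pass, and plain Pearson correlation as the mapping; the 4 Hz modulated toy stimulus and the function names are ours.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def speech_envelope(audio, fs, cutoff=13.0):
    """Broadband amplitude envelope, low-pass filtered below ~13 Hz,
    the band in which cortical speech entrainment is typically measured."""
    env = np.abs(hilbert(audio))
    b, a = butter(3, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, env)

def entrainment_index(envelope, signal):
    """Pearson correlation between the speech envelope and a single
    cortical channel: the simplest form of a linear envelope mapping."""
    e = envelope - envelope.mean()
    s = signal - signal.mean()
    return float(np.dot(e, s) / (np.linalg.norm(e) * np.linalg.norm(s)))

if __name__ == "__main__":
    fs = 250                                           # typical EEG rate
    t = np.arange(0, 10, 1 / fs)
    modulation = 1 + 0.8 * np.sin(2 * np.pi * 4 * t)   # syllabic-rate AM
    audio = modulation * np.sin(2 * np.pi * 60 * t)    # toy "speech"
    env = speech_envelope(audio, fs)
    rng = np.random.default_rng(0)
    eeg_low_load = modulation + 0.3 * rng.standard_normal(t.size)
    eeg_high_load = modulation + 3.0 * rng.standard_normal(t.size)
    # More noise (a stand-in for load-related decoupling) weakens tracking.
    print(entrainment_index(env, eeg_low_load)
          > entrainment_index(env, eeg_high_load))     # -> True
```

In practice, entrainment is usually estimated with regularized regression over multiple channels and time lags; the correlation here is the single-channel, zero-lag special case.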
The influence of (central) auditory processing disorder in speech sound disorders.
Barrozo, Tatiane Faria; Pagan-Neves, Luciana de Oliveira; Vilela, Nadia; Carvallo, Renata Mota Mamede; Wertzner, Haydée Fiszbein
2016-01-01
Considering the importance of auditory information for the acquisition and organization of phonological rules, the assessment of (central) auditory processing contributes to both the diagnosis and targeting of speech therapy in children with speech sound disorders. To study phonological measures and (central) auditory processing of children with speech sound disorder. Clinical and experimental study, with 21 subjects with speech sound disorder aged between 7.0 and 9.11 years, divided into two groups according to their (central) auditory processing disorder. The assessment comprised tests of phonology, speech inconsistency, and metalinguistic abilities. The group with (central) auditory processing disorder demonstrated greater severity of speech sound disorder. The cutoff value obtained for the process density index was the one that best characterized the occurrence of phonological processes for children above 7 years of age. The comparison among the tests evaluated between the two groups showed differences in some phonological and metalinguistic abilities. Children with an index value above 0.54 demonstrated strong tendencies towards presenting a (central) auditory processing disorder, and this measure was effective to indicate the need for evaluation in children with speech sound disorder. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
ERIC Educational Resources Information Center
Shriberg, Lawrence D.; Strand, Edythe A.; Fourakis, Marios; Jakielski, Kathy J.; Hall, Sheryl D.; Karlsson, Heather B.; Mabie, Heather L.; McSweeny, Jane L.; Tilkens, Christie M.; Wilson, David L.
2017-01-01
Purpose: Previous articles in this supplement described rationale for and development of the pause marker (PM), a diagnostic marker of childhood apraxia of speech (CAS), and studies supporting its validity and reliability. The present article assesses the theoretical coherence of the PM with speech processing deficits in CAS. Method: PM and other…
Increasing motivation changes subjective reports of listening effort and choice of coping strategy.
Picou, Erin M; Ricketts, Todd A
2014-06-01
The purpose of this project was to examine the effect of changing motivation on subjective ratings of listening effort and on the likelihood that a listener chooses either a controlling or an avoidance coping strategy. Two experiments were conducted, one with auditory-only (AO) and one with auditory-visual (AV) stimuli, both using the same speech recognition in noise materials. Four signal-to-noise ratios (SNRs) were used, two in each experiment. The two SNRs targeted 80% and 50% correct performance. Motivation was manipulated by either having participants listen carefully to the speech (low motivation), or listen carefully to the speech and then answer quiz questions about the speech (high motivation). Sixteen participants with normal hearing participated in each experiment. Eight randomly selected participants participated in both. Using AO and AV stimuli, motivation generally increased subjective ratings of listening effort and tiredness. In addition, using auditory-visual stimuli, motivation generally increased listeners' willingness to do something to improve the situation, and decreased their willingness to avoid the situation. These results suggest a listener's mental state may influence listening effort and choice of coping strategy.
Auditory-Motor Processing of Speech Sounds
Möttönen, Riikka; Dutton, Rebekah; Watkins, Kate E.
2013-01-01
The motor regions that control movements of the articulators activate during listening to speech and contribute to performance in demanding speech recognition and discrimination tasks. Whether the articulatory motor cortex modulates auditory processing of speech sounds is unknown. Here, we aimed to determine whether the articulatory motor cortex affects the auditory mechanisms underlying discrimination of speech sounds in the absence of demanding speech tasks. Using electroencephalography, we recorded responses to changes in sound sequences, while participants watched a silent video. We also disrupted the lip or the hand representation in left motor cortex using transcranial magnetic stimulation. Disruption of the lip representation suppressed responses to changes in speech sounds, but not piano tones. In contrast, disruption of the hand representation had no effect on responses to changes in speech sounds. These findings show that disruptions within, but not outside, the articulatory motor cortex impair automatic auditory discrimination of speech sounds. The findings provide evidence for the importance of auditory-motor processes in efficient neural analysis of speech sounds. PMID:22581846
Galilee, Alena; Stefanidou, Chrysi; McCleery, Joseph P
2017-01-01
Previous event-related potential (ERP) research utilizing oddball stimulus paradigms suggests diminished processing of speech versus non-speech sounds in children with an Autism Spectrum Disorder (ASD). However, brain mechanisms underlying these speech processing abnormalities, and to what extent they are related to poor language abilities in this population remain unknown. In the current study, we utilized a novel paired repetition paradigm in order to investigate ERP responses associated with the detection and discrimination of speech and non-speech sounds in 4- to 6-year old children with ASD, compared with gender and verbal age matched controls. ERPs were recorded while children passively listened to pairs of stimuli that were either both speech sounds, both non-speech sounds, speech followed by non-speech, or non-speech followed by speech. Control participants exhibited N330 match/mismatch responses measured from temporal electrodes, reflecting speech versus non-speech detection, bilaterally, whereas children with ASD exhibited this effect only over temporal electrodes in the left hemisphere. Furthermore, while the control groups exhibited match/mismatch effects at approximately 600 ms (central N600, temporal P600) when a non-speech sound was followed by a speech sound, these effects were absent in the ASD group. These findings suggest that children with ASD fail to activate right hemisphere mechanisms, likely associated with social or emotional aspects of speech detection, when distinguishing non-speech from speech stimuli. Together, these results demonstrate the presence of atypical speech versus non-speech processing in children with ASD when compared with typically developing children matched on verbal age. PMID:28738063
Power Spectral Density Error Analysis of Spectral Subtraction Type of Speech Enhancement Methods
NASA Astrophysics Data System (ADS)
Händel, Peter
2006-12-01
A theoretical framework for analysis of speech enhancement algorithms is introduced for performance assessment of spectral subtraction type of methods. The quality of the enhanced speech is related to physical quantities of the speech and noise (such as stationarity time and spectral flatness), as well as to design variables of the noise suppressor. The derived theoretical results are compared with the outcome of subjective listening tests as well as successful design strategies, performed by independent research groups.
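The family of methods analyzed in this abstract can be made concrete with a minimal magnitude-domain spectral subtraction routine. This is a generic textbook sketch rather than the paper's framework; the over-subtraction factor alpha and spectral floor beta stand in for the "design variables of the noise suppressor" referred to above.

```python
import numpy as np

def spectral_subtraction(frame, noise_psd, alpha=2.0, beta=0.01):
    """Magnitude-domain spectral subtraction on one analysis frame.
    alpha (over-subtraction) and beta (spectral floor) are the kind of
    design variables whose effect a theoretical analysis quantifies."""
    spec = np.fft.rfft(frame)
    power = np.abs(spec) ** 2
    # Subtract a scaled noise-power estimate; clamp to a floor so the
    # power stays positive (the clamp is a source of "musical noise").
    clean_power = np.maximum(power - alpha * noise_psd, beta * power)
    gain = np.sqrt(clean_power / np.maximum(power, 1e-12))
    # A real-valued gain keeps the noisy phase, as is standard here.
    return np.fft.irfft(gain * spec, n=frame.size)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, n = 8000, 512
    t = np.arange(n) / fs
    # Noise PSD estimated by averaging power spectra of noise-only frames.
    noise_psd = np.mean(
        np.abs(np.fft.rfft(rng.standard_normal((50, n)), axis=1)) ** 2, axis=0)
    noisy = np.sin(2 * np.pi * 440 * t) + rng.standard_normal(n)
    cleaned = spectral_subtraction(noisy, noise_psd)
    print(cleaned.shape)  # -> (512,)
```

A complete enhancer would window successive frames and overlap-add the outputs; the noise PSD estimate would be updated during detected speech pauses.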
NASA Astrophysics Data System (ADS)
Jelinek, H. J.
1986-01-01
This is the Final Report of Electronic Design Associates on its Phase I SBIR project. The purpose of this project was to develop a method for correcting helium speech, as experienced in diver-surface communication. The goal of the Phase I study was to design, prototype, and evaluate a real-time helium speech corrector system based upon digital signal processing techniques. The general approach was to develop hardware (an IBM PC board) to digitize helium speech and software (a LAMBDA computer based simulation) to translate the speech. As planned in the study proposal, this initial prototype may now be used to assess expected performance from a self-contained real-time system which uses an identical algorithm. The Final Report details the work carried out to produce the prototype system. Four major project tasks were completed: a signal processing scheme for converting helium speech to normal-sounding speech was generated; the scheme was simulated on a general-purpose (LAMBDA) computer, with actual helium speech supplied to the simulation and the converted speech generated; an IBM PC based 14-bit data input/output board was designed and built; and a bibliography of references on speech processing was generated.
Processing changes when listening to foreign-accented speech
Romero-Rivas, Carlos; Martin, Clara D.; Costa, Albert
2015-01-01
This study investigates the mechanisms responsible for fast changes in processing foreign-accented speech. Event Related brain Potentials (ERPs) were obtained while native speakers of Spanish listened to native and foreign-accented speakers of Spanish. We observed a less positive P200 component for foreign-accented speech relative to native speech comprehension. This suggests that the extraction of spectral information and other important acoustic features was hampered during foreign-accented speech comprehension. However, the amplitude of the N400 component for foreign-accented speech comprehension decreased across the experiment, suggesting the use of a higher level, lexical mechanism. Furthermore, during native speech comprehension, semantic violations in the critical words elicited an N400 effect followed by a late positivity. During foreign-accented speech comprehension, semantic violations only elicited an N400 effect. Overall, our results suggest that, despite a lack of improvement in phonetic discrimination, native listeners experience changes at lexical-semantic levels of processing after brief exposure to foreign-accented speech. Moreover, these results suggest that lexical access, semantic integration and linguistic re-analysis processes are permeable to external factors, such as the accent of the speaker. PMID:25859209
Cybersecurity: Current Legislation, Executive Branch Initiatives, and Options for Congress
2009-09-30
responsibilities of cybersecurity stakeholders. Privacy and civil liberties—maintaining privacy and freedom of speech protections on the Internet...securing networks before tackling the attendant issues such as freedom of speech, privacy, and civil liberty protections as they pertain to the Internet...legislation to mandate privacy and freedom of speech protections to be incorporated into a national strategy. • Assessing current congressional
Pausing Preceding and Following "Que" in the Production of Native Speakers of French
ERIC Educational Resources Information Center
Genc, Bilal; Mavasoglu, Mustafa; Bada, Erdogan
2011-01-01
Pausing strategies in read and spontaneous speech have been of significant interest for researchers, since the literature shows that pausing patterns in read and spontaneous speech display considerable differences. This, at least, is the case in the English language as produced by native speakers. As to what may be the…
The Hierarchical Cortical Organization of Human Speech Processing
de Heer, Wendy A.; Huth, Alexander G.; Griffiths, Thomas L.
2017-01-01
Speech comprehension requires that the brain extract semantic meaning from the spectral features represented at the cochlea. To investigate this process, we performed an fMRI experiment in which five men and two women passively listened to several hours of natural narrative speech. We then used voxelwise modeling to predict BOLD responses based on three different feature spaces that represent the spectral, articulatory, and semantic properties of speech. The amount of variance explained by each feature space was then assessed using a separate validation dataset. Because some responses might be explained equally well by more than one feature space, we used a variance partitioning analysis to determine the fraction of the variance that was uniquely explained by each feature space. Consistent with previous studies, we found that speech comprehension involves hierarchical representations starting in primary auditory areas and moving laterally on the temporal lobe: spectral features are found in the core of A1, mixtures of spectral and articulatory in STG, mixtures of articulatory and semantic in STS, and semantic in STS and beyond. Our data also show that both hemispheres are equally and actively involved in speech perception and interpretation. Further, responses as early in the auditory hierarchy as in STS are more correlated with semantic than spectral representations. These results illustrate the importance of using natural speech in neurolinguistic research. Our methodology also provides an efficient way to simultaneously test multiple specific hypotheses about the representations of speech without using block designs and segmented or synthetic speech. 
SIGNIFICANCE STATEMENT To investigate the processing steps performed by the human brain to transform natural speech sound into meaningful language, we used models based on a hierarchical set of speech features to predict BOLD responses of individual voxels recorded in an fMRI experiment while subjects listened to natural speech. Both cerebral hemispheres were actively involved in speech processing in large and equal amounts. Also, the transformation from spectral features to semantic elements occurs early in the cortical speech-processing stream. Our experimental and analytical approaches are important alternatives and complements to standard approaches that use segmented speech and block designs, which report more laterality in speech processing and associated semantic processing to higher levels of cortex than reported here. PMID:28588065
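The variance partitioning analysis described above reduces, for two feature spaces, to comparing the held-out R^2 of joint and single-space models. The sketch below is a simplified illustration on synthetic data: it uses ordinary least squares in place of the ridge regression typical of voxelwise modeling, handles only two feature spaces rather than the study's three, and the function names are ours.

```python
import numpy as np

def held_out_r2(X_train, y_train, X_test, y_test):
    """Held-out R^2 of a linear encoding model (OLS here for brevity;
    voxelwise modeling studies typically use ridge regression)."""
    w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    resid = y_test - X_test @ w
    return 1.0 - resid.var() / y_test.var()

def unique_variance(A, B, y, split=0.8):
    """Two-space variance partitioning: the variance uniquely explained
    by A is R^2(A+B) minus R^2(B alone), and symmetrically for B; the
    shared part is what both single-space models explain in common."""
    n = int(split * len(y))
    fit = lambda X: held_out_r2(X[:n], y[:n], X[n:], y[n:])
    r2_joint, r2_a, r2_b = fit(np.hstack([A, B])), fit(A), fit(B)
    return {"unique_A": r2_joint - r2_b,
            "unique_B": r2_joint - r2_a,
            "shared": r2_a + r2_b - r2_joint}

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((500, 3))      # e.g. spectral features
    B = rng.standard_normal((500, 3))      # e.g. semantic features
    y = A @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.standard_normal(500)
    parts = unique_variance(A, B, y)       # response driven by A only
    print(parts["unique_A"] > parts["unique_B"])   # -> True
```

With correlated feature spaces, the "shared" term captures variance that either space could explain alone, which is exactly the ambiguity the partitioning is designed to expose.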
Barista: A Framework for Concurrent Speech Processing by USC-SAIL
Can, Doğan; Gibson, James; Vaz, Colin; Georgiou, Panayiotis G.; Narayanan, Shrikanth S.
2016-01-01
We present Barista, an open-source framework for concurrent speech processing based on the Kaldi speech recognition toolkit and the libcppa actor library. With Barista, we aim to provide an easy-to-use, extensible framework for constructing highly customizable concurrent (and/or distributed) networks for a variety of speech processing tasks. Each Barista network specifies a flow of data between simple actors, concurrent entities communicating by message passing, modeled after Kaldi tools. Leveraging the fast and reliable concurrency and distribution mechanisms provided by libcppa, Barista lets demanding speech processing tasks, such as real-time speech recognizers and complex training workflows, be scheduled and executed on parallel (and/or distributed) hardware. Barista is released under the Apache License v2.0. PMID:27610047
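The actor architecture described here, simple processing stages connected by message passing, can be illustrated in miniature. The sketch below uses Python threads and queues rather than libcppa and C++, and the three toy stages merely stand in for Kaldi tools; nothing in it reflects Barista's actual API.

```python
import queue
import threading

class Actor(threading.Thread):
    """A minimal actor: drains its mailbox, applies one processing step,
    and forwards the result downstream. Each stage stands in for a tool
    in a speech pipeline; actors share no state, only messages."""
    def __init__(self, process, downstream=None):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        self.process = process
        self.downstream = downstream

    def send(self, msg):
        self.mailbox.put(msg)

    def run(self):
        while True:
            msg = self.mailbox.get()
            if msg is None:                  # poison pill: shut down network
                if self.downstream:
                    self.downstream.send(None)
                break
            result = self.process(msg)
            if self.downstream:
                self.downstream.send(result)

# Three-stage network: framing -> toy "feature extraction" -> collection.
results = []
sink = Actor(results.append)
features = Actor(lambda frame: sum(frame) / len(frame), downstream=sink)
framer = Actor(lambda samples: samples[:4], downstream=features)
for actor in (sink, features, framer):
    actor.start()

framer.send([1, 2, 3, 4, 5, 6])              # message passing, no shared state
framer.send(None)
for actor in (framer, features, sink):
    actor.join()
print(results)  # -> [2.5]
```

Because each actor owns its mailbox and runs independently, stages execute concurrently and the same topology could be distributed across machines, which is the property Barista exploits at scale.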
Teaming for Speech and Auditory Training.
ERIC Educational Resources Information Center
Nussbaum, Debra B.; Waddy-Smith, Bettie
1985-01-01
The article suggests three strategies for the audiologist and speech/communication specialist to use in assisting the preschool teacher to implement student's individualized education program: (1) demonstration teaming, (2) dual teaming; and (3) rotation teaming. (CL)
Neural pathways for visual speech perception
Bernstein, Lynne E.; Liebenthal, Einat
2014-01-01
This paper examines the questions, what levels of speech can be perceived visually, and how is visual speech represented by the brain? Review of the literature leads to the conclusions that every level of psycholinguistic speech structure (i.e., phonetic features, phonemes, syllables, words, and prosody) can be perceived visually, although individuals differ in their abilities to do so; and that there are visual modality-specific representations of speech qua speech in higher-level vision brain areas. That is, the visual system represents the modal patterns of visual speech. The suggestion that the auditory speech pathway receives and represents visual speech is examined in light of neuroimaging evidence on the auditory speech pathways. We outline the generally agreed-upon organization of the visual ventral and dorsal pathways and examine several types of visual processing that might be related to speech through those pathways, specifically, face and body, orthography, and sign language processing. In this context, we examine the visual speech processing literature, which reveals widespread diverse patterns of activity in posterior temporal cortices in response to visual speech stimuli. We outline a model of the visual and auditory speech pathways and make several suggestions: (1) The visual perception of speech relies on visual pathway representations of speech qua speech. (2) A proposed site of these representations, the temporal visual speech area (TVSA) has been demonstrated in posterior temporal cortex, ventral and posterior to multisensory posterior superior temporal sulcus (pSTS). (3) Given that visual speech has dynamic and configural features, its representations in feedforward visual pathways are expected to integrate these features, possibly in TVSA. PMID:25520611
Toni, Ivan; Hagoort, Peter; Kelly, Spencer D.; Özyürek, Aslı
2015-01-01
Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech–gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts. PMID:24652857
ERIC Educational Resources Information Center
Thistle, Jennifer J.; McNaughton, David
2015-01-01
Purpose: This study examined the effect of instruction in an active listening strategy on the communication skills of pre-service speech-language pathologists (SLPs). Method: Twenty-three pre-service SLPs in their 2nd year of graduate study received a brief strategy instruction in active listening skills. Participants were videotaped during a…
Terband, H; Maassen, B; Guenther, F H; Brumberg, J
2014-01-01
Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model. In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model. Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired. These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. The reader will be able to: (1) identify the difficulties in studying disordered speech motor development; (2) describe the differences in speech motor characteristics between SSD and subtype CAS; (3) describe the different types of learning that occur in the sensory-motor system during babbling and early speech acquisition; (4) identify the neural control subsystems involved in speech production; (5) describe the potential role of auditory self-monitoring in developmental speech disorders. Copyright © 2014 Elsevier Inc. All rights reserved.
Nigam, Ravi; Schlosser, Ralf W; Lloyd, Lyle L
2006-09-01
Matrix strategies employing parts of speech arranged in systematic language matrices and milieu language teaching strategies have been successfully used to teach word combining skills to children who have cognitive disabilities and some functional speech. The present study investigated the acquisition and generalized production of two-term semantic relationships in a new population using new types of symbols. Three children with cognitive disabilities and little or no functional speech were taught to combine graphic symbols. The matrix strategy and the mand-model procedure were used concomitantly as intervention procedures. A multiple probe design across sets of action-object combinations with generalization probes of untrained combinations was used to teach the production of graphic symbol combinations. Results indicated that two of the three children learned the early syntactic-semantic rule of combining action-object symbols and demonstrated generalization to untrained action-object combinations and generalization across trainers. The results and future directions for research are discussed.
Time-compressed speech test in the elderly.
Arceno, Rayana Silva; Scharlach, Renata Coelho
2017-09-28
The present study aimed to evaluate the performance of elderly people in the time-compressed speech test according to the variables ear and order of presentation, and to analyze the types of errors made by the volunteers. This is an observational, descriptive, quantitative, analytical and primary cross-sectional study involving 22 elderly people with normal hearing or mild sensorineural hearing loss between the ages of 60 and 80. The elderly were submitted to the time-compressed speech test with a compression ratio of 60%, using the electromechanical time-compression method. A list of 50 disyllables was applied to each ear, and the starting side was chosen at random. Regarding test performance, the elderly fell short in relation to adults, and there was no statistical difference between the ears. Statistical evidence was found of better performance for the second ear tested. The most frequently mistaken words were those beginning with the phonemes /p/ and /d/. The presence of a consonant cluster in a word also increased the occurrence of mistakes. The elderly show worse performance in the auditory closure ability when assessed by the time-compressed speech test compared to adults. This result suggests that elderly people have difficulty recognizing speech when it is produced at faster rates. Therefore, strategies must be used to facilitate the communicative process, regardless of the presence of hearing loss.
2014-01-01
Background Communication and swallowing disorders are a common consequence of stroke. Clinical practice guidelines (CPGs) have been created to assist health professionals to put research evidence into clinical practice and can improve stroke care outcomes. However, CPGs are often not successfully implemented in clinical practice and research is needed to explore the factors that influence speech pathologists’ implementation of stroke CPGs. This study aimed to describe speech pathologists’ experiences and current use of guidelines, and to identify what factors influence speech pathologists’ implementation of stroke CPGs. Methods Speech pathologists working in stroke rehabilitation who had used a stroke CPG were invited to complete a 39-item online survey. Content analysis and descriptive and inferential statistics were used to analyse the data. Results 320 participants from all states and territories of Australia were surveyed. Almost all speech pathologists had used a stroke CPG and had found the guideline “somewhat useful” or “very useful”. Factors that speech pathologists perceived influenced CPG implementation included the: (a) guideline itself, (b) work environment, (c) aspects related to the speech pathologist themselves, (d) patient characteristics, and (e) types of implementation strategies provided. Conclusions There are many different factors that can influence speech pathologists’ implementation of CPGs. The factors that influenced the implementation of CPGs can be understood in terms of knowledge creation and implementation frameworks. Speech pathologists should continue to adapt the stroke CPG to their local work environment and evaluate their use. To enhance guideline implementation, they may benefit from a combination of educational meetings and resources, outreach visits, support from senior colleagues, and audit and feedback strategies. PMID:24602148
Cortical Auditory Evoked Potentials Recorded From Nucleus Hybrid Cochlear Implant Users.
Brown, Carolyn J; Jeon, Eun Kyung; Chiou, Li-Kuei; Kirby, Benjamin; Karsten, Sue A; Turner, Christopher W; Abbas, Paul J
2015-01-01
Nucleus Hybrid Cochlear Implant (CI) users hear low-frequency sounds via acoustic stimulation and high-frequency sounds via electrical stimulation. This within-subject study compares three different methods of coordinating programming of the acoustic and electrical components of the Hybrid device. Speech perception and cortical auditory evoked potentials (CAEP) were used to assess differences in outcome. The goals of this study were to determine whether (1) the evoked potential measures could predict which programming strategy resulted in better outcome on the speech perception task or was preferred by the listener, and (2) CAEPs could be used to predict which subjects benefitted most from having access to the electrical signal provided by the Hybrid implant. CAEPs were recorded from 10 Nucleus Hybrid CI users. Study participants were tested using three different experimental processor programs (MAPs) that differed in terms of how much overlap there was between the range of frequencies processed by the acoustic component of the Hybrid device and range of frequencies processed by the electrical component. The study design included allowing participants to acclimatize for a period of up to 4 weeks with each experimental program prior to speech perception and evoked potential testing. Performance using the experimental MAPs was assessed using both a closed-set consonant recognition task and an adaptive test that measured the signal-to-noise ratio that resulted in 50% correct identification of a set of 12 spondees presented in background noise. Long-duration, synthetic vowels were used to record both the cortical P1-N1-P2 "onset" response and the auditory "change" response (also known as the auditory change complex [ACC]). Correlations between the evoked potential measures and performance on the speech perception tasks are reported. Differences in performance using the three programming strategies were not large. 
Peak-to-peak amplitude of the ACC was not found to be sensitive enough to accurately predict the programming strategy that resulted in the best performance on either measure of speech perception. All 10 Hybrid CI users had residual low-frequency acoustic hearing. For all 10 subjects, allowing them to use both the acoustic and electrical signals provided by the implant improved performance on the consonant recognition task. For most subjects, it also resulted in slightly larger cortical change responses. However, the impact that listening mode had on the cortical change responses was small, and again, the correlation between the evoked potential and speech perception results was not significant. CAEPs can be successfully measured from Hybrid CI users. The responses that are recorded are similar to those recorded from normal-hearing listeners. The goal of this study was to see if CAEPs might play a role either in identifying the experimental program that resulted in best performance on a consonant recognition task or in documenting benefit from the use of the electrical signal provided by the Hybrid CI. At least for the stimuli and specific methods used in this study, no such predictive relationship was found.
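The adaptive spondee-in-noise test described above tracks the signal-to-noise ratio that yields 50% correct identification. A simple 1-down/1-up staircase converges on exactly that point; the sketch below is a generic illustration of the procedure, not the study's protocol (the step size, trial count, and toy psychometric listener are all assumptions).

```python
import random

def staircase_snr(trial, start_snr=10.0, step=2.0, n_trials=40, seed=1):
    """1-down/1-up adaptive track: a correct response lowers the SNR,
    an incorrect one raises it, so the track hovers around the SNR
    giving ~50% correct. Threshold = mean SNR at the last reversals."""
    random.seed(seed)
    snr = start_snr
    reversals = []
    last_dir = None
    for _ in range(n_trials):
        correct = trial(snr)
        direction = -1 if correct else +1  # harder after a hit, easier after a miss
        if last_dir is not None and direction != last_dir:
            reversals.append(snr)
        last_dir = direction
        snr += direction * step
    if not reversals:                      # degenerate track: fall back to last SNR
        return snr
    tail = reversals[-6:]
    return sum(tail) / len(tail)

# toy listener whose psychometric function has its 50% point at 4 dB SNR
def listener(snr):
    return random.random() < 1 / (1 + pow(10, -(snr - 4.0) / 4.0))

print(round(staircase_snr(listener), 1))
```

With enough trials the estimate clusters near the listener's true 50% point (4 dB here); clinical implementations typically also shrink the step size after early reversals.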
Vandewalle, Ellen; Boets, Bart; Ghesquière, Pol; Zink, Inge
2012-01-01
This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of children aged 6 years 3 months to 6 years 8 months attending grade 1: (1) children with specific language impairment (SLI) and literacy delay (n = 8), (2) children with SLI and normal literacy (n = 10) and (3) typically developing children (n = 14). Moreover, the relations between these auditory processing and speech perception skills and oral language and literacy skills in grade 1 and grade 3 were analyzed. The SLI group with literacy delay scored significantly lower than both other groups on speech perception, but not on temporal auditory processing. The two normal-reading groups did not differ in speech perception or auditory processing. Speech perception was significantly related to reading and spelling in grades 1 and 3 and made a unique predictive contribution to reading growth in grade 3, even after controlling for reading level, phonological ability, auditory processing and oral language skills in grade 1. These findings indicated that speech perception also had a unique direct impact on reading development, not only through its relation with phonological awareness. Moreover, speech perception seemed to be more associated with the development of literacy skills and less with oral language ability. Copyright © 2011 Elsevier Ltd. All rights reserved.
Communication skills and thalamic lesion: Strategies of rehabilitation.
Amaddii, Luisa; Centorrino, Santi; Cambi, Jacopo; Passali, Desiderio
2014-01-01
To describe the speech rehabilitation history of patients with thalamic lesions. Thalamic lesions can affect speech and language depending on which thalamic nuclei are involved. Because of the strategic functional position of the thalamus within cognitive networks, a thalamic lesion can also interfere with other cognitive processes, such as attention, memory and executive functions. Alterations of these cognitive domains contribute significantly to language deficits, leading to communicative inefficacy, and this must be considered in rehabilitation. Because the evaluation of cognitive functions and communicative efficiency differs from that of aphasic disorders, treatment should also differ. Treatment must be focused on the specific cognitive deficits, with the aim of regaining communicative ability, as in the therapy of pragmatic disorders after traumatic brain injury: attention process training, mnemotechnics and prospective memory training. In our experience: (a) there is a close correlation between cognitive processes and communication skills; (b) alterations of attention, memory and executive functions cause a loss of efficiency in language use; and (c) appropriate cognitive treatment improves pragmatic competence and therefore the linguistic disorder. When planning speech therapy, it is important to consider the relationship between cognitive functions and communication. Cognitive/behavioral treatment confirms its therapeutic efficacy for thalamic lesions. Copyright © 2014 Polish Otorhinolaryngology - Head and Neck Surgery Society. Published by Elsevier Urban & Partner Sp. z.o.o. All rights reserved.
Neben, Nicole; Lenarz, Thomas; Schuessler, Mark; Harpel, Theo; Buechner, Andreas
2013-05-01
Speech recognition in noise tests using a new research coding strategy designed to introduce the virtual channel effect showed no advantage over MP3000™. Although statistically significantly smaller just noticeable differences (JNDs) were obtained for pitch ranking, this finding proved to have little clinical impact. The aim of this study was to explore whether modifications to MP3000 that include sequential virtual channel stimulation would lead to further improvements in hearing, particularly for speech recognition in background noise and in competing-talker conditions; to compare results for pitch perception and melody recognition; and to informally collect subjective impressions of strategy preference. Nine experienced cochlear implant subjects were recruited for this prospective study. Two variants of the experimental strategy were compared with MP3000. The study design was a single-blinded ABCCBA cross-over paradigm with 3 weeks of take-home experience for each user condition. In pitch ranking, a significantly reduced JND was identified. No significant effect of coding strategy on speech understanding in noise or with competing-talker materials was found. Melody recognition skills were the same under all user conditions.
Speech Perception as a Cognitive Process: The Interactive Activation Model.
ERIC Educational Resources Information Center
Elman, Jeffrey L.; McClelland, James L.
Research efforts to model speech perception in terms of a processing system in which knowledge and processing are distributed over large numbers of highly interactive--but computationally primitive--elements are described in this report. After discussing the properties of speech that demand a parallel interactive processing system, the report…
Alm, Magnus; Behne, Dawn
2015-01-01
Gender and age have been found to affect adults’ audio-visual (AV) speech perception. However, research on adult aging focuses on adults over 60 years, who have an increasing likelihood of cognitive and sensory decline, which may confound positive effects of age-related AV experience and its interaction with gender. Observed age and gender differences in AV speech perception may also depend on measurement sensitivity and AV task difficulty. Consequently, both AV benefit and visual influence were used to measure visual contribution for gender-balanced groups of young (20–30 years) and middle-aged adults (50–60 years), with task difficulty varied using AV syllables from different talkers in alternative auditory backgrounds. Females had better speech-reading performance than males. Whereas no gender differences in AV benefit or visual influence were observed for young adults, visually influenced responses were significantly greater for middle-aged females than middle-aged males. That speech-reading performance did not influence AV benefit may be explained by visual speech extraction and AV integration constituting independent abilities. Contrastingly, the gender difference in visually influenced responses in middle adulthood may reflect an experience-related shift in females’ general AV perceptual strategy. Although young females’ speech-reading proficiency may not readily contribute to greater visual influence, between young and middle adulthood recurrent confirmation of the contribution of visual cues induced by speech-reading proficiency may gradually shift females’ AV perceptual strategy toward more visually dominated responses. PMID:26236274
Directive Speech Act of Imamu in Katoba Discourse of Muna Ethnic
NASA Astrophysics Data System (ADS)
Ardianto, Ardianto; Hadirman, Hardiman
2018-05-01
One of the traditions of the Muna ethnic group is the katoba ritual. Katoba is a tradition rooted in local knowledge that has been maintained for generations to the present day. It is a ritual of becoming a Muslim, of repentance, and of forming the character of a child (male or female, aged 6-11 years) who is about to enter adulthood, carried out by means of directive speech. In the katoba ritual, the child undergoing katoba is introduced to the teachings of Islam, to customs, to manners toward parents and siblings, and to behaviour towards others, which the child is expected to practice in daily life. This study aims to describe and explain the directive speech acts of the imamu in the katoba discourse of the Muna ethnic group. The research uses a qualitative approach. Data were collected in a natural setting, namely katoba speech discourses, and consist of two types: (a) speech data and (b) field-note data. Data were analyzed using an interactive model with four stages: (1) data collection, (2) data reduction, (3) data display, and (4) conclusion and verification. The results show, first, that the directive speech acts take declarative and imperative forms; second, that their functions include teaching, explaining, suggesting, and expecting; and third, that their strategies include both direct and indirect strategies. The results of this study could inform the development of character-education learning materials in schools and can form part of the local content (mulok) curriculum.
1983-09-30
determines, in part, what the infant says; and if perception is to guide production, the two processes must be, in some sense, isomorphic.
Action planning and predictive coding when speaking
Wang, Jun; Mathalon, Daniel H.; Roach, Brian J.; Reilly, James; Keedy, Sarah; Sweeney, John A.; Ford, Judith M.
2014-01-01
Across the animal kingdom, sensations resulting from an animal's own actions are processed differently from sensations resulting from external sources, with self-generated sensations being suppressed. A forward model has been proposed to explain this process across sensorimotor domains. During vocalization, reduced processing of one's own speech is believed to result from a comparison of speech sounds to corollary discharges of intended speech production generated from efference copies of commands to speak. Until now, anatomical and functional evidence validating this model in humans has been indirect. Using EEG with anatomical MRI to facilitate source localization, we demonstrate that inferior frontal gyrus activity during the 300ms before speaking was associated with suppressed processing of speech sounds in auditory cortex around 100ms after speech onset (N1). These findings indicate that an efference copy from speech areas in prefrontal cortex is transmitted to auditory cortex, where it is used to suppress processing of anticipated speech sounds. About 100ms after N1, a subsequent auditory cortical component (P2) was not suppressed during talking. The combined N1 and P2 effects suggest that although sensory processing is suppressed as reflected in N1, perceptual gaps are filled as reflected in the lack of P2 suppression, explaining the discrepancy between sensory suppression and preserved sensory experiences. These findings, coupled with the coherence between relevant brain regions before and during speech, provide new mechanistic understanding of the complex interactions between action planning and sensory processing that provide for differentiated tagging and monitoring of one's own speech, processes disrupted in neuropsychiatric disorders. PMID:24423729
Stasenko, Alena; Bonn, Cory; Teghipco, Alex; Garcea, Frank E; Sweet, Catherine; Dombovy, Mary; McDonough, Joyce; Mahon, Bradford Z
2015-01-01
The debate about the causal role of the motor system in speech perception has been reignited by demonstrations that motor processes are engaged during the processing of speech sounds. Here, we evaluate which aspects of auditory speech processing are affected, and which are not, in a stroke patient with dysfunction of the speech motor system. We found that the patient showed a normal phonemic categorical boundary when discriminating two non-words that differ by a minimal pair (e.g., ADA-AGA). However, using the same stimuli, the patient was unable to identify or label the non-word stimuli (using a button-press response). A control task showed that he could identify speech sounds by speaker gender, ruling out a general labelling impairment. These data suggest that while the motor system is not causally involved in perception of the speech signal, it may be used when other cues (e.g., meaning, context) are not available.
Visual Feedback of Tongue Movement for Novel Speech Sound Learning
Katz, William F.; Mehta, Sonya
2015-01-01
Pronunciation training studies have yielded important information concerning the processing of audiovisual (AV) information. Second language (L2) learners show increased reliance on bottom-up, multimodal input for speech perception (compared to monolingual individuals). However, little is known about the role of viewing one's own speech articulation processes during speech training. The current study investigated whether real-time, visual feedback for tongue movement can improve a speaker's learning of non-native speech sounds. An interactive 3D tongue visualization system based on electromagnetic articulography (EMA) was used in a speech training experiment. Native speakers of American English produced a novel speech sound (/ɖ/; a voiced, coronal, palatal stop) before, during, and after trials in which they viewed their own speech movements using the 3D model. Talkers' productions were evaluated using kinematic (tongue-tip spatial positioning) and acoustic (burst spectra) measures. The results indicated a rapid gain in accuracy associated with visual feedback training. The findings are discussed with respect to neural models for multimodal speech processing. PMID:26635571
Multipath search coding of stationary signals with applications to speech
NASA Astrophysics Data System (ADS)
Fehn, H. G.; Noll, P.
1982-04-01
This paper deals with the application of multipath search coding (MSC) concepts to the coding of stationary memoryless and correlated sources, and of speech signals, at a rate of one bit per sample. Three MSC classes are used: (1) codebook coding, or vector quantization; (2) tree coding; and (3) trellis coding. The paper evaluates the performance of these coders and compares them both with conventional coders and with rate-distortion bounds. The potential of MSC coding strategies is demonstrated with illustrations. The paper also reports results of MSC coding of speech in which both adaptive quantization and adaptive prediction were included in the coder design.
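Codebook coding (vector quantization), the first of the three MSC classes, can be illustrated with a minimal sketch: the signal is cut into fixed-size blocks and each block is replaced by the index of its nearest codeword under squared-error distortion. A 4-entry codebook over 2-sample blocks gives the paper's rate of one bit per sample; the codebook values below are made up for illustration, not taken from the paper.

```python
def vq_encode(signal, codebook, dim=2):
    """Split the signal into dim-sample blocks and replace each block
    with the index of the nearest codeword (squared-error distortion).
    With 2**dim codewords this costs one bit per sample."""
    indices = []
    for i in range(0, len(signal) - dim + 1, dim):
        block = signal[i:i + dim]
        best = min(range(len(codebook)),
                   key=lambda j: sum((b - c) ** 2
                                     for b, c in zip(block, codebook[j])))
        indices.append(best)
    return indices

def vq_decode(indices, codebook):
    """Reconstruction: concatenate the selected codewords."""
    out = []
    for j in indices:
        out.extend(codebook[j])
    return out

# hypothetical 4-entry codebook for 2-sample blocks (1 bit/sample)
codebook = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
sig = [0.9, 1.1, -0.8, 0.7, 1.2, -1.3]
idx = vq_encode(sig, codebook)
print(idx, vq_decode(idx, codebook))   # → [3, 1, 2] [1.0, 1.0, -1.0, 1.0, 1.0, -1.0]
```

In a real coder the codebook would be trained on the source statistics (e.g., by the LBG algorithm) rather than fixed by hand.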
ERIC Educational Resources Information Center
Katongo, Emily Mwamba; Ndhlovu, Daniel
2015-01-01
This study sought to establish the role of music in speech intelligibility of learners with Post Lingual Hearing Impairment (PLHI) and strategies teachers used to enhance speech intelligibility in learners with PLHI in selected special units for the deaf in Lusaka district. The study used a descriptive research design. Qualitative and quantitative…
ERIC Educational Resources Information Center
Masouleh, Fatemeh Abdollahizadeh; Arjmandi, Masoumeh; Vahdany, Fereydoon
2014-01-01
This study deals with the application of pragmatics research to EFL teaching. Its significance lies in the need for language learners to master speech acts such as requests, which involve a series of strategies. Although the definition of different speech acts has been established since the 1960s, recently there has been a shift towards…
McGettigan, Carolyn; Rosen, Stuart; Scott, Sophie K.
2014-01-01
Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds “like a harsh whisper” (Scott et al., 2000, p. 2401). This process simulates a cochlear implant, where the activity of many thousand hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically-arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon, 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival) but some may reflect differences in higher-order linguistic processing. In order to explore this possibility, we used noise-vocoding to explore speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, consequent to learning, by the second session there was a more uniform covariance pattern concerning all stimulus types. A further analysis of phonetic feature recognition allowed greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). 
We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximize speech recognition performance in the absence of typical cues. PMID:24616669
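A minimal one-channel sketch of the noise-vocoding transformation described above: discard the temporal fine structure, keep the slowly varying amplitude envelope, and use it to modulate white noise. A real vocoder (and the stimuli in such studies) does this independently in several band-pass channels and then sums the channels; the envelope window length and toy input here are assumptions for illustration.

```python
import math, random

def envelope(x, win=32):
    """Crude envelope extraction: rectify, then smooth with a
    trailing moving average (stand-in for a low-pass filter)."""
    rect = [abs(v) for v in x]
    return [sum(rect[max(0, i - win):i + 1]) / (i + 1 - max(0, i - win))
            for i in range(len(rect))]

def noise_vocode(x, seed=0):
    """One-channel noise vocoder: the extracted envelope modulates
    white noise, eliminating periodicity and fine structure."""
    random.seed(seed)
    env = envelope(x)
    return [e * random.uniform(-1.0, 1.0) for e in env]

# amplitude-modulated tone as a toy input signal
x = [math.sin(2 * math.pi * 0.1 * n) * (0.5 + 0.5 * math.sin(2 * math.pi * 0.005 * n))
     for n in range(400)]
y = noise_vocode(x)
print(len(y))   # → 400
```

The output preserves the slow amplitude fluctuations that carry much of the speech information while replacing the carrier with noise, which is why vocoded speech sounds "like a harsh whisper".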
Is complex signal processing for bone conduction hearing aids useful?
Kompis, Martin; Kurz, Anja; Pfiffner, Flurin; Senn, Pascal; Arnold, Andreas; Caversaccio, Marco
2014-05-01
To establish whether complex signal processing is beneficial for users of bone-anchored hearing aids. Review and analysis of two studies from our own group, each comparing a speech processor with basic digital signal processing (either Baha Divino or Baha Intenso) and a processor with complex digital signal processing (either Baha BP100 or Baha BP110 power). The main differences between basic and complex signal processing are the number of audiologist-accessible frequency channels and the availability and complexity of the directional multi-microphone noise reduction and loudness compression systems. Both studies show a small, statistically non-significant improvement in speech understanding in quiet with the complex digital signal processing. The average improvement for speech in noise is +0.9 dB if speech and noise are both emitted from the front of the listener. If noise is emitted from the rear and speech from the front of the listener, the advantage of the devices with complex digital signal processing over those with basic signal processing increases, on average, to +3.2 dB (range +2.3 to +5.1 dB, p ≤ 0.0032). Complex digital signal processing does indeed improve speech understanding, especially in noise coming from the rear. This finding is supported by another study, recently published by a different research group. When compared to basic digital signal processing, complex digital signal processing can increase speech understanding of users of bone-anchored hearing aids. The benefit is most significant for speech understanding in noise.
ERIC Educational Resources Information Center
Haskins Labs., New Haven, CT.
This report on speech research contains 21 papers describing research conducted on a variety of topics concerning speech perception, processing, and production. The initial two reports deal with brain function in speech; several others concern ear function, both in terms of perception and information processing. A number of reports describe…
Walenski, Matthew; Swinney, David
2009-01-01
The central question underlying this study revolves around how children process co-reference relationships—such as those evidenced by pronouns (him) and reflexives (himself)—and how a slowed rate of speech input may critically affect this process. Previous studies of child language processing have demonstrated that typical language developing (TLD) children as young as 4 years of age process co-reference relations in a manner similar to adults on-line. In contrast, off-line measures of pronoun comprehension suggest a developmental delay for pronouns (relative to reflexives). The present study examines dependency relations in TLD children (ages 5–13) and investigates how a slowed rate of speech input affects the unconscious (on-line) and conscious (off-line) parsing of these constructions. For the on-line investigations (using a cross-modal picture priming paradigm), results indicate that at a normal rate of speech TLD children demonstrate adult-like syntactic reflexes. At a slowed rate of speech the typical language developing children displayed a breakdown in automatic syntactic parsing (again, similar to the pattern seen in unimpaired adults). As demonstrated in the literature, our off-line investigations (sentence/picture matching task) revealed that these children performed much better on reflexives than on pronouns at a regular speech rate. However, at the slow speech rate, performance on pronouns was substantially improved, whereas performance on reflexives was not different than at the regular speech rate. We interpret these results in light of a distinction between fast automatic processes (relied upon for on-line processing in real time) and conscious reflective processes (relied upon for off-line processing), such that slowed speech input disrupts the former, yet improves the latter. PMID:19343495
Engaged listeners: shared neural processing of powerful political speeches
Häcker, Frank E. K.; Honey, Christopher J.; Hasson, Uri
2015-01-01
Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners’ brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages. PMID:25653012
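Inter-subject correlation analysis, as used above, reduces to averaging the pairwise Pearson correlations of the listeners' response time courses: the higher the average, the more tightly the audience's brain responses are coupled. A toy sketch, with simulated listeners who share a common response plus idiosyncratic noise:

```python
import random

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb)

def intersubject_correlation(timecourses):
    """Mean pairwise Pearson correlation across all subject pairs."""
    rs = [pearson(timecourses[i], timecourses[j])
          for i in range(len(timecourses))
          for j in range(i + 1, len(timecourses))]
    return sum(rs) / len(rs)

# three toy listeners = one shared response + per-listener noise
random.seed(0)
common = [random.gauss(0, 1) for _ in range(200)]
subs = [[c + random.gauss(0, 0.5) for c in common] for _ in range(3)]
print(round(intersubject_correlation(subs), 2))
```

Increasing the per-listener noise lowers the ISC, mimicking the more heterogeneous processing the study reports for rhetorically weaker speeches.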
Processing of speech signals for physical and sensory disabilities.
Levitt, H
1995-01-01
Assistive technology involving voice communication is used primarily by people who are deaf, hard of hearing, or who have speech and/or language disabilities. It is also used to a lesser extent by people with visual or motor disabilities. A very wide range of devices has been developed for people with hearing loss. These devices can be categorized not only by the modality of stimulation [i.e., auditory, visual, tactile, or direct electrical stimulation of the auditory nerve (auditory-neural)] but also in terms of the degree of speech processing that is used. At least four such categories can be distinguished: assistive devices (a) that are not designed specifically for speech, (b) that take the average characteristics of speech into account, (c) that process articulatory or phonetic characteristics of speech, and (d) that embody some degree of automatic speech recognition. Assistive devices for people with speech and/or language disabilities typically involve some form of speech synthesis or symbol generation for severe forms of language disability. Speech synthesis is also used in text-to-speech systems for sightless persons. Other applications of assistive technology involving voice communication include voice control of wheelchairs and other devices for people with mobility disabilities. PMID:7479816
Loss tolerant speech decoder for telecommunications
NASA Technical Reports Server (NTRS)
Prieto, Jr., Jaime L. (Inventor)
1999-01-01
A method and device for extrapolating past signal-history data for insertion into missing data segments in order to conceal digital speech frame errors. The extrapolation method uses past-signal history that is stored in a buffer. The method is implemented with a device that utilizes a finite-impulse response (FIR) multi-layer feed-forward artificial neural network that is trained by back-propagation for one-step extrapolation of speech compression algorithm (SCA) parameters. Once a speech connection has been established, the speech compression algorithm device begins sending encoded speech frames. As the speech frames are received, they are decoded and converted back into speech signal voltages. During the normal decoding process, pre-processing of the required SCA parameters will occur and the results stored in the past-history buffer. If a speech frame is detected to be lost or in error, then extrapolation modules are executed and replacement SCA parameters are generated and sent as the parameters required by the SCA. In this way, the information transfer to the SCA is transparent, and the SCA processing continues as usual. The listener will not normally notice that a speech frame has been lost because of the smooth transition between the last-received, lost, and next-received speech frames.
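The concealment scheme can be sketched as follows: decoded frame parameters accumulate in a past-history buffer, and when a frame is flagged lost or in error, a replacement is extrapolated from that buffer and handed to the speech compression algorithm as if it had been received. The invention uses a trained FIR feed-forward neural network for the one-step extrapolation; the sketch below substitutes a plain linear predictor so the example stays self-contained.

```python
from collections import deque

class Concealer:
    """Frame-loss concealment sketch: keep a short history of decoded
    frame parameters; on a lost frame, extrapolate a replacement.
    A linear predictor stands in for the patent's trained FIR
    neural network."""
    def __init__(self, depth=3):
        self.history = deque(maxlen=depth)   # past-history buffer

    def receive(self, frame, lost=False):
        if lost:
            frame = self._extrapolate()
        self.history.append(frame)
        return frame

    def _extrapolate(self):
        h = list(self.history)
        if len(h) < 2:                       # too little history: repeat last frame
            return h[-1] if h else [0.0]
        # one-step linear extrapolation, per parameter
        return [2 * a - b for a, b in zip(h[-1], h[-2])]

c = Concealer()
print(c.receive([1.0, 10.0]))
print(c.receive([2.0, 20.0]))
print(c.receive(None, lost=True))   # → [3.0, 30.0]
```

Because the replacement is delivered through the same interface as a received frame, the downstream decoder is unaware of the loss, which is what makes the concealment transparent to the listener.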
Modeling Driving Performance Using In-Vehicle Speech Data From a Naturalistic Driving Study.
Kuo, Jonny; Charlton, Judith L; Koppel, Sjaan; Rudin-Brown, Christina M; Cross, Suzanne
2016-09-01
We aimed to (a) describe the development and application of an automated approach for processing in-vehicle speech data from a naturalistic driving study (NDS), (b) examine the influence of child passenger presence on driving performance, and (c) model this relationship using in-vehicle speech data. Parent drivers frequently engage in child-related secondary behaviors, but the impact on driving performance is unknown. Applying automated speech-processing techniques to NDS audio data would facilitate the analysis of in-vehicle driver-child interactions and their influence on driving performance. Speech activity detection and speaker diarization algorithms were applied to audio data from a Melbourne-based NDS involving 42 families. Multilevel models were developed to evaluate the effect of speech activity and the presence of child passengers on driving performance. Speech activity was significantly associated with velocity and steering angle variability. Child passenger presence alone was not associated with changes in driving performance. However, speech activity in the presence of two child passengers was associated with the most variability in driving performance. The effects of in-vehicle speech on driving performance in the presence of child passengers appear to be heterogeneous, and multiple factors may need to be considered in evaluating their impact. This goal can potentially be achieved within large-scale NDS through the automated processing of observational data, including speech. Speech-processing algorithms enable new perspectives on driving performance to be gained from existing NDS data, and variables that were once labor-intensive to process can be readily utilized in future research. © 2016, Human Factors and Ergonomics Society.
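Speech activity detection, the first stage of the pipeline described above, can be illustrated with a bare-bones energy detector: frame the audio and flag frames whose mean squared amplitude exceeds a threshold. Production NDS pipelines use trained detectors plus speaker diarization; the frame length and threshold here are arbitrary illustrative values.

```python
import math

def detect_speech(samples, frame_len=160, threshold=0.01):
    """Minimal energy-based speech activity detector: returns one
    boolean per complete frame, True where frame energy exceeds
    the threshold."""
    active = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        active.append(energy > threshold)
    return active

# silence, a speech-like tone burst, then silence again
silence = [0.0] * 160
speechy = [0.3 * math.sin(0.3 * n) for n in range(160)]
print(detect_speech(silence + speechy + silence))   # → [False, True, False]
```

Diarization would then assign each active frame to a speaker (driver vs. child passengers), enabling the per-speaker speech variables used in the multilevel models.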
Visemic Processing in Audiovisual Discrimination of Natural Speech: A Simultaneous fMRI-EEG Study
ERIC Educational Resources Information Center
Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle
2012-01-01
In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…
Vilela, Nadia; Barrozo, Tatiane Faria; Pagan-Neves, Luciana de Oliveira; Sanches, Seisse Gabriela Gandolfi; Wertzner, Haydée Fiszbein; Carvallo, Renata Mota Mamede
2016-02-01
To identify a cutoff value based on the Percentage of Consonants Correct-Revised index that could indicate the likelihood of a child with a speech-sound disorder also having a (central) auditory processing disorder. Language, audiological and (central) auditory processing evaluations were administered. The participants were 27 subjects with speech-sound disorders, aged 7 years to 10 years and 11 months, who were divided into two groups according to their (central) auditory processing evaluation results. When a (central) auditory processing disorder was present in association with a speech disorder, the children tended to have lower scores on phonological assessments. Greater severity of the speech disorder was related to a greater probability of the child having a (central) auditory processing disorder. The use of a cutoff value for the Percentage of Consonants Correct-Revised index successfully distinguished between children with and without a (central) auditory processing disorder. The severity of speech-sound disorder in children was influenced by the presence of a (central) auditory processing disorder. The attempt to identify a cutoff value based on a severity index was successful.
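The screening logic this study aims at can be sketched in a few lines: compute the PCC-R score and compare it against a cutoff. The cutoff value used below is illustrative only; the paper derives its own value from the study sample.

```python
def pccr(correct_consonants, total_consonants):
    """Percentage of Consonants Correct-Revised: the share of intended
    consonants produced correctly (the revised index scores
    distortions as correct)."""
    return 100.0 * correct_consonants / total_consonants

def flag_apd_risk(pccr_score, cutoff=85.0):
    """Screening rule of the kind the study derives: scores below the
    cutoff suggest a higher likelihood of a co-occurring (central)
    auditory processing disorder. The 85.0 here is an assumed,
    illustrative cutoff, not the value reported in the paper."""
    return pccr_score < cutoff

score = pccr(78, 100)
print(score, flag_apd_risk(score))   # → 78.0 True
```

In practice the cutoff would be chosen from the sample (e.g., by maximizing sensitivity and specificity on an ROC curve) before being used as a referral criterion.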
Issues in Perceptual Speech Analysis in Cleft Palate and Related Disorders: A Review
ERIC Educational Resources Information Center
Sell, Debbie
2005-01-01
Perceptual speech assessment is central to the evaluation of speech outcomes associated with cleft palate and velopharyngeal dysfunction. However, the complexity of this process is perhaps sometimes underestimated. To draw together the many different strands in the complex process of perceptual speech assessment and analysis, and make…
The right hemisphere is highlighted in connected natural speech production and perception.
Alexandrou, Anna Maria; Saarinen, Timo; Mäkelä, Sasu; Kujala, Jan; Salmelin, Riitta
2017-05-15
Current understanding of the cortical mechanisms of speech perception and production stems mostly from studies that focus on single words or sentences. However, it has been suggested that processing of real-life connected speech may rely on additional cortical mechanisms. In the present study, we examined the neural substrates of natural speech production and perception with magnetoencephalography by modulating three central features related to speech: amount of linguistic content, speaking rate and social relevance. The amount of linguistic content was modulated by contrasting natural speech production and perception to speech-like non-linguistic tasks. Meaningful speech was produced and perceived at three speaking rates: normal, slow and fast. Social relevance was probed by having participants attend to speech produced by themselves and an unknown person. These speech-related features were each associated with distinct spatiospectral modulation patterns that involved cortical regions in both hemispheres. Natural speech processing markedly engaged the right hemisphere in addition to the left. In particular, the right temporo-parietal junction, previously linked to attentional processes and social cognition, was highlighted in the task modulations. The present findings suggest that its functional role extends to active generation and perception of meaningful, socially relevant speech. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Nittrouer, Susan; Burton, Lisa Thuente
2005-01-01
This study tested the hypothesis that early language experience facilitates the development of language-specific perceptual weighting strategies believed to be critical for accessing phonetic structure. In turn, that structure allows for efficient storage and retrieval of words in verbal working memory, which is necessary for sentence…
Review of Visual Speech Perception by Hearing and Hearing-Impaired People: Clinical Implications
ERIC Educational Resources Information Center
Woodhouse, Lynn; Hickson, Louise; Dodd, Barbara
2009-01-01
Background: Speech perception is often considered specific to the auditory modality, despite convincing evidence that speech processing is bimodal. The theoretical and clinical roles of speech-reading for speech perception, however, have received little attention in speech-language therapy. Aims: The role of speech-read information for speech…
Method and apparatus for obtaining complete speech signals for speech recognition applications
NASA Technical Reports Server (NTRS)
Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)
2009-01-01
The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.
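The circular-buffer recording described in this abstract can be sketched with a fixed-capacity frame queue. The class name and interface below are illustrative assumptions, and the endpoint-detection (Hidden Markov Model) stage is omitted:

```python
from collections import deque

class CircularFrameBuffer:
    """Continuously record audio frames, keeping only the most recent `capacity`."""
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)  # oldest frames are evicted automatically

    def push(self, frame):
        self.frames.append(frame)

    def last(self, n):
        """Return the `n` most recent frames, e.g. to prepend audio captured
        just before a user's 'start recognition' command."""
        return list(self.frames)[-n:]
```

Because the buffer always holds the most recent audio, speech that began shortly before the user command can still be included in the augmented signal passed to the recognizer.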
Choi, Ja Young; Hu, Elly R; Perrachione, Tyler K
2018-04-01
The nondeterministic relationship between speech acoustics and abstract phonemic representations imposes a challenge for listeners to maintain perceptual constancy despite the highly variable acoustic realization of speech. Talker normalization facilitates speech processing by reducing the degrees of freedom for mapping between encountered speech and phonemic representations. While this process has been proposed to facilitate the perception of ambiguous speech sounds, it is currently unknown whether talker normalization is affected by the degree of potential ambiguity in acoustic-phonemic mapping. We explored the effects of talker normalization on speech processing in a series of speeded classification paradigms, parametrically manipulating the potential for inconsistent acoustic-phonemic relationships across talkers for both consonants and vowels. Listeners identified words with varying potential acoustic-phonemic ambiguity across talkers (e.g., beet/boat vs. boot/boat) spoken by single or mixed talkers. Auditory categorization of words was always slower when listening to mixed talkers compared to a single talker, even when there was no potential acoustic ambiguity between target sounds. Moreover, the processing cost imposed by mixed talkers was greatest when words had the most potential acoustic-phonemic overlap across talkers. Models of acoustic dissimilarity between target speech sounds did not account for the pattern of results. These results suggest (a) that talker normalization incurs the greatest processing cost when disambiguating highly confusable sounds and (b) that talker normalization appears to be an obligatory component of speech perception, taking place even when the acoustic-phonemic relationships across sounds are unambiguous.
EEG oscillations entrain their phase to high-level features of speech sound.
Zoefel, Benedikt; VanRullen, Rufin
2016-01-01
Phase entrainment of neural oscillations, the brain's adjustment to rhythmic stimulation, is a central component in recent theories of speech comprehension: the alignment between brain oscillations and speech sound improves speech intelligibility. However, phase entrainment to everyday speech sound could also be explained by oscillations passively following the low-level periodicities (e.g., in sound amplitude and spectral content) of auditory stimulation-and not by an adjustment to the speech rhythm per se. Recently, using novel speech/noise mixture stimuli, we have shown that behavioral performance can entrain to speech sound even when high-level features (including phonetic information) are not accompanied by fluctuations in sound amplitude and spectral content. In the present study, we report that neural phase entrainment might underlie our behavioral findings. We observed phase-locking between electroencephalogram (EEG) and speech sound in response not only to original (unprocessed) speech but also to our constructed "high-level" speech/noise mixture stimuli. Phase entrainment to original speech and speech/noise sound did not differ in the degree of entrainment, but rather in the actual phase difference between EEG signal and sound. Phase entrainment was not abolished when speech/noise stimuli were presented in reverse (which disrupts semantic processing), indicating that acoustic (rather than linguistic) high-level features play a major role in the observed neural entrainment. Our results provide further evidence for phase entrainment as a potential mechanism underlying speech processing and segmentation, and for the involvement of high-level processes in the adjustment to the rhythm of speech. Copyright © 2015 Elsevier Inc. All rights reserved.
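Phase-locking between EEG and a speech stimulus, as measured here, is commonly summarized by a phase-locking value computed over instantaneous phases. The following is an illustrative sketch of that general metric, not necessarily the authors' exact analysis; phase extraction (e.g. via the Hilbert transform of band-passed signals) is assumed to have been done upstream:

```python
import cmath

def phase_locking_value(phases_a, phases_b):
    """Phase-locking value between two series of instantaneous phases (radians).
    Returns 1.0 for a perfectly constant phase lag, near 0 for random lags."""
    diffs = [cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b)]
    return abs(sum(diffs) / len(diffs))
```

A consistent phase difference between the EEG signal and the speech sound yields a value near 1 even when the absolute phase lag differs across conditions, which matches the abstract's distinction between degree of entrainment and actual phase difference.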
Cameron, Ashley; McPhail, Steven; Hudson, Kyla; Fleming, Jennifer; Lethlean, Jennifer; Tan, Ngang Ju; Finch, Emma
2018-06-01
The aim of the study was to describe and compare the confidence and knowledge of health professionals (HPs) with and without specialized speech-language training for communicating with people with aphasia (PWA) in a metropolitan hospital setting. Ninety HPs from multidisciplinary teams completed a customized survey to identify their demographic information, knowledge of aphasia, current use of supported conversation strategies and overall communication confidence when interacting with PWA using a 100 mm visual analogue scale (VAS) to rate open-ended questions. Conventional descriptive statistics were used to examine the demographic information. Descriptive statistics and the Mann-Whitney U test were used to analyse VAS confidence rating data. The responses to the open-ended survey questions were grouped into four previously identified key categories. The HPs consisted of 22 (24.4%) participants who were speech-language pathologists and 68 (75.6%) participants from other disciplines (non-speech-language pathology HPs, non-SLP HPs). The non-SLP HPs reported significantly lower confidence levels (U = 159.0, p < 0.001, two-tailed) and identified fewer strategies for communicating effectively with PWA than the trained speech-language pathologists. The non-SLP HPs identified a median of two strategies identified [interquartile range (IQR) 1-3] in contrast to the speech-language pathologists who identified a median of eight strategies (IQR 7-12). These findings suggest that HPs, particularly those without specialized communication education, are likely to benefit from formal training to enhance their confidence, skills and ability to successfully communicate with PWA in their work environment. This may in turn increase the involvement of PWA in their health care decisions. 
Implications for Rehabilitation: Interventions to remediate health professionals' (particularly non-speech-language pathology health professionals') lower levels of confidence and ability to communicate with PWA may ultimately help ensure equal access for PWA, promote informed collaborative decision-making, and foster patient-centred care within the health care setting.
Masked speech perception across the adult lifespan: Impact of age and hearing impairment.
Goossens, Tine; Vercammen, Charlotte; Wouters, Jan; van Wieringen, Astrid
2017-02-01
As people grow older, speech perception difficulties become highly prevalent, especially in noisy listening situations. Moreover, it is assumed that speech intelligibility is more affected in the event of background noises that induce a higher cognitive load, i.e., noises that result in informational versus energetic masking. There is ample evidence showing that speech perception problems in aging persons are partly due to hearing impairment and partly due to age-related declines in cognition and suprathreshold auditory processing. In order to develop effective rehabilitation strategies, it is indispensable to know how these different degrading factors act upon speech perception. This implies disentangling effects of hearing impairment versus age and examining the interplay between both factors in different background noises of everyday settings. To that end, we investigated open-set sentence identification in six participant groups: a young (20-30 years), middle-aged (50-60 years), and older cohort (70-80 years), each including persons who had normal audiometric thresholds up to at least 4 kHz, on the one hand, and persons who were diagnosed with elevated audiometric thresholds, on the other hand. All participants were screened for (mild) cognitive impairment. We applied stationary and amplitude modulated speech-weighted noise, which are two types of energetic maskers, and unintelligible speech, which causes informational masking in addition to energetic masking. By means of these different background noises, we could look into speech perception performance in listening situations with a low and high cognitive load, respectively. Our results indicate that, even when audiometric thresholds are within normal limits up to 4 kHz, irrespective of threshold elevations at higher frequencies, and there is no indication of even mild cognitive impairment, masked speech perception declines by middle age and decreases further into older age.
The impact of hearing impairment is as detrimental for young and middle-aged as it is for older adults. When the background noise becomes cognitively more demanding, there is a larger decline in speech perception, due to age or hearing impairment. Hearing impairment seems to be the main factor underlying speech perception problems in background noises that cause energetic masking. However, in the event of informational masking, which induces a higher cognitive load, age appears to explain a significant part of the communicative impairment as well. We suggest that the degrading effect of age is mediated by deficiencies in temporal processing and central executive functions. This study may contribute to the improvement of auditory rehabilitation programs aiming to prevent aging persons from missing out on conversations, which, in turn, will improve their quality of life. Copyright © 2016 Elsevier B.V. All rights reserved.
Scaling and universality in the human voice.
Luque, Jordi; Luque, Bartolo; Lacasa, Lucas
2015-04-06
Speech is a distinctive complex feature of human capabilities. In order to understand the physics underlying speech production, in this work we empirically analyse the statistics of large human speech datasets spanning several languages. We first show that during speech the energy is unevenly released and power-law distributed, reporting a universal robust Gutenberg-Richter-like law in speech. We further show that such 'earthquakes in speech' show temporal correlations, as the interevent statistics are again power-law distributed. As this feature takes place in the intraphoneme range, we conjecture that the process responsible for this complex phenomenon is not cognitive, but resides in the physiological (mechanical) mechanisms of speech production. Moreover, we show that these waiting time distributions are scale invariant under a renormalization group transformation, suggesting that the process of speech generation is indeed operating close to a critical point. These results are contrasted with current paradigms in speech processing, which point towards low-dimensional deterministic chaos as the origin of nonlinear traits in speech fluctuations. As these latter fluctuations are indeed the aspects that humanize synthetic speech, these findings may have an impact on future speech synthesis technologies. The results are robust and independent of the communication language or the number of speakers, pointing towards a universal pattern and yet another hint of complexity in human speech. © 2015 The Author(s) Published by the Royal Society. All rights reserved.
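Power-law distributed event sizes and inter-event times of the kind reported here are typically quantified by a maximum-likelihood fit of the exponent. The sketch below uses the standard continuous-tail (Hill) estimator; this choice of estimator is an assumption for illustration, not the authors' stated method:

```python
import math

def powerlaw_mle_exponent(values, xmin):
    """Maximum-likelihood exponent for a continuous power-law tail p(x) ~ x**(-alpha),
    fitted over the tail x >= xmin (the Hill estimator):
    alpha = 1 + n / sum(ln(x / xmin))."""
    tail = [x for x in values if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
```

In practice the cutoff `xmin` is itself estimated (e.g. by minimizing a Kolmogorov-Smirnov distance) before fitting the exponent.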
NASA Astrophysics Data System (ADS)
Tallal, Paula; Miller, Steve L.; Bedi, Gail; Byma, Gary; Wang, Xiaoqin; Nagarajan, Srikantan S.; Schreiner, Christoph; Jenkins, William M.; Merzenich, Michael M.
1996-01-01
A speech processing algorithm was developed to create more salient versions of the rapidly changing elements in the acoustic waveform of speech that have been shown to be deficiently processed by language-learning impaired (LLI) children. LLI children received extensive daily training, over a 4-week period, with listening exercises in which all speech was translated into this synthetic form. They also received daily training with computer "games" designed to adaptively drive improvements in temporal processing thresholds. Significant improvements in speech discrimination and language comprehension abilities were demonstrated in two independent groups of LLI children.
Don’t speak too fast! Processing of fast rate speech in children with specific language impairment
Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique
2018-01-01
Background: Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Method: Sixteen French children with SLI (8–13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d′) to semantically incongruent sentence-final words was measured. Results: Overall, children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally and artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. Conclusion: In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception. PMID:29373610
Brozovich, Faith A; Heimberg, Richard G
2013-12-01
The present study investigated whether post-event processing (PEP) involving mental imagery about a past speech is particularly detrimental for socially anxious individuals who are currently anticipating giving a speech. One hundred fourteen high and low socially anxious participants were told they would give a 5 min impromptu speech at the end of the experimental session. They were randomly assigned to one of three manipulation conditions: post-event processing about a past speech incorporating imagery (PEP-Imagery), semantic post-event processing about a past speech (PEP-Semantic), or a control condition, (n=19 per experimental group, per condition [high vs low socially anxious]). After the condition inductions, individuals' anxiety, their predictions of performance in the anticipated speech, and their interpretations of other ambiguous social events were measured. Consistent with predictions, high socially anxious individuals in the PEP-Imagery condition displayed greater anxiety than individuals in the other conditions immediately following the induction and before the anticipated speech task. They also interpreted ambiguous social scenarios in a more socially anxious manner than socially anxious individuals in the control condition. High socially anxious individuals made more negative predictions about their upcoming speech performance than low anxious participants in all conditions. The impact of imagery during post-event processing in social anxiety and its implications are discussed. © 2013.
Central Presbycusis: A Review and Evaluation of the Evidence
Humes, Larry E.; Dubno, Judy R.; Gordon-Salant, Sandra; Lister, Jennifer J.; Cacace, Anthony T.; Cruickshanks, Karen J.; Gates, George A.; Wilson, Richard H.; Wingfield, Arthur
2018-01-01
Background: The authors reviewed the evidence regarding the existence of age-related declines in central auditory processes and the consequences of any such declines for everyday communication. Purpose: This report summarizes the review process and presents its findings. Data Collection and Analysis: The authors reviewed 165 articles germane to central presbycusis. Of the 165 articles, 132 articles with a focus on human behavioral measures for either speech or nonspeech stimuli were selected for further analysis. Results: For 76 smaller-scale studies of speech understanding in older adults reviewed, the following findings emerged: (1) the three most commonly studied behavioral measures were speech in competition, temporally distorted speech, and binaural speech perception (especially dichotic listening); (2) for speech in competition and temporally degraded speech, hearing loss proved to have a significant negative effect on performance in most of the laboratory studies; (3) significant negative effects of age, unconfounded by hearing loss, were observed in most of the studies of speech in competing speech, time-compressed speech, and binaural speech perception; and (4) the influence of cognitive processing on speech understanding has been examined much less frequently, but when included, significant positive associations with speech understanding were observed. For 36 smaller-scale studies of the perception of nonspeech stimuli by older adults reviewed, the following findings emerged: (1) the three most frequently studied behavioral measures were gap detection, temporal discrimination, and temporal-order discrimination or identification; (2) hearing loss was seldom a significant factor; and (3) negative effects of age were almost always observed. 
For 18 studies reviewed that made use of test batteries and medium-to-large sample sizes, the following findings emerged: (1) all studies included speech-based measures of auditory processing; (2) 4 of the 18 studies included nonspeech stimuli; (3) for the speech-based measures, monaural speech in a competing-speech background, dichotic speech, and monaural time-compressed speech were investigated most frequently; (4) the most frequently used tests were the Synthetic Sentence Identification (SSI) test with Ipsilateral Competing Message (ICM), the Dichotic Sentence Identification (DSI) test, and time-compressed speech; (5) many of these studies using speech-based measures reported significant effects of age, but most of these studies were confounded by declines in hearing, cognition, or both; (6) for nonspeech auditory-processing measures, the focus was on measures of temporal processing in all four studies; (7) effects of cognition on nonspeech measures of auditory processing have been studied less frequently, with mixed results, whereas the effects of hearing loss on performance were minimal due to judicious selection of stimuli; and (8) there is a paucity of observational studies using test batteries and longitudinal designs. Conclusions: Based on this review of the scientific literature, there is insufficient evidence to confirm the existence of central presbycusis as an isolated entity. On the other hand, recent evidence has been accumulating in support of the existence of central presbycusis as a multifactorial condition that involves age- and/or disease-related changes in the auditory system and in the brain. Moreover, there is a clear need for additional research in this area. PMID:22967738
"Look What I Did!": Student Conferences with Text-to-Speech Software
ERIC Educational Resources Information Center
Young, Chase; Stover, Katie
2014-01-01
The authors describe a strategy that empowers students to edit and revise their own writing. Students input their writing into text-to-speech software that rereads the text aloud. While listening, students make necessary revisions and edits.
ERIC Educational Resources Information Center
Richard, Gail J.; Hoge, Debra Reichert
Designed for practicing speech-language pathologists, this book discusses different syndrome disabilities, pertinent speech-language characteristics, and goals and strategies to begin intervention efforts at a preschool level. Chapters address: (1) Angelman syndrome; (2) Asperger syndrome; (3) Down syndrome; (4) fetal alcohol syndrome; (5) fetal…
Optimal speech motor control and token-to-token variability: a Bayesian modeling approach.
Patri, Jean-François; Diard, Julien; Perrier, Pascal
2015-12-01
The remarkable capacity of the speech motor system to adapt to various speech conditions is due to an excess of degrees of freedom, which enables the production of similar acoustical properties with different sets of control strategies. To explain how the central nervous system selects one of the possible strategies, a common approach, in line with optimal motor control theories, is to model speech motor planning as the solution of an optimality problem based on cost functions. Despite the success of this approach, one of its drawbacks is the intrinsic contradiction between the concept of optimality and the observed experimental intra-speaker token-to-token variability. The present paper proposes an alternative approach by formulating feedforward optimal control in a probabilistic Bayesian modeling framework. This is illustrated by controlling a biomechanical model of the vocal tract for speech production and by comparing it with an existing optimal control model (GEPPETO). The essential elements of this optimal control model are presented first. From these, the Bayesian model is constructed progressively. Performance of the Bayesian model is evaluated based on computer simulations and compared to the optimal control model. This approach is shown to be appropriate for solving the speech planning problem while accounting for variability in a principled way.
Automatic Speech Recognition from Neural Signals: A Focused Review.
Herff, Christian; Schultz, Tanja
2016-01-01
Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to loud environments, the risk of disturbing bystanders, or an inability to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable not to speak but to simply envision oneself saying words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques applied to neural signals, we discuss the Brain-to-text system.
Schoof, Tim; Rosen, Stuart
2014-01-01
Normal-hearing older adults often experience increased difficulties understanding speech in noise. In addition, they benefit less from amplitude fluctuations in the masker. These difficulties may be attributed to an age-related auditory temporal processing deficit. However, a decline in cognitive processing likely also plays an important role. This study examined the relative contribution of declines in both auditory and cognitive processing to the speech in noise performance in older adults. Participants included older (60–72 years) and younger (19–29 years) adults with normal hearing. Speech reception thresholds (SRTs) were measured for sentences in steady-state speech-shaped noise (SS), 10-Hz sinusoidally amplitude-modulated speech-shaped noise (AM), and two-talker babble. In addition, auditory temporal processing abilities were assessed by measuring thresholds for gap, amplitude-modulation, and frequency-modulation detection. Measures of processing speed, attention, working memory, Text Reception Threshold (a visual analog of the SRT), and reading ability were also obtained. Of primary interest was the extent to which the various measures correlate with listeners' abilities to perceive speech in noise. SRTs were significantly worse for older adults in the presence of two-talker babble but not SS and AM noise. In addition, older adults showed some cognitive processing declines (working memory and processing speed) although no declines in auditory temporal processing. However, working memory and processing speed did not correlate significantly with SRTs in babble. Despite declines in cognitive processing, normal-hearing older adults do not necessarily have problems understanding speech in noise as SRTs in SS and AM noise did not differ significantly between the two groups. Moreover, while older adults had higher SRTs in two-talker babble, this could not be explained by age-related cognitive declines in working memory or processing speed. PMID:25429266
Near-Term Fetuses Process Temporal Features of Speech
ERIC Educational Resources Information Center
Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie
2011-01-01
The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…
The Downside of Greater Lexical Influences: Selectively Poorer Speech Perception in Noise
ERIC Educational Resources Information Center
Lam, Boji P. W.; Xie, Zilong; Tessmer, Rachel; Chandrasekaran, Bharath
2017-01-01
Purpose: Although lexical information influences phoneme perception, the extent to which reliance on lexical information enhances speech processing in challenging listening environments is unclear. We examined the extent to which individual differences in lexical influences on phonemic processing impact speech processing in maskers containing…
Key considerations in designing a speech brain-computer interface.
Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Chabardès, Stéphan; Yvert, Blaise
2016-11-01
Restoring communication in case of aphasia is a key challenge for neurotechnologies. To this end, brain-computer strategies can be envisioned to allow artificial speech synthesis from the continuous decoding of neural signals underlying speech imagination. Such speech brain-computer interfaces do not yet exist, and their design should consider three key choices that need to be made: the choice of appropriate brain regions to record neural activity from, the choice of an appropriate recording technique, and the choice of a neural decoding scheme in association with an appropriate speech synthesis method. These key considerations are discussed here in light of (1) the current understanding of the functional neuroanatomy of cortical areas underlying overt and covert speech production, (2) the available literature making use of a variety of brain recording techniques to better characterize and address the challenge of decoding cortical speech signals, and (3) the different speech synthesis approaches that can be considered depending on the level of speech representation (phonetic, acoustic or articulatory) envisioned to be decoded at the core of a speech BCI paradigm. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.
Sleep Disrupts High-Level Speech Parsing Despite Significant Basic Auditory Processing.
Makov, Shiri; Sharon, Omer; Ding, Nai; Ben-Shachar, Michal; Nir, Yuval; Zion Golumbic, Elana
2017-08-09
The extent to which the sleeping brain processes sensory information remains unclear. This is particularly true for continuous and complex stimuli such as speech, in which information is organized into hierarchically embedded structures. Recently, novel metrics for assessing the neural representation of continuous speech have been developed using noninvasive brain recordings that have thus far only been tested during wakefulness. Here we investigated, for the first time, the sleeping brain's capacity to process continuous speech at different hierarchical levels using a newly developed Concurrent Hierarchical Tracking (CHT) approach that allows monitoring the neural representation and processing-depth of continuous speech online. Speech sequences were compiled with syllables, words, phrases, and sentences occurring at fixed time intervals such that different linguistic levels correspond to distinct frequencies. This enabled us to distinguish their neural signatures in brain activity. We compared the neural tracking of intelligible versus unintelligible (scrambled and foreign) speech across states of wakefulness and sleep using high-density EEG in humans. We found that neural tracking of stimulus acoustics was comparable across wakefulness and sleep and similar across all conditions regardless of speech intelligibility. In contrast, neural tracking of higher-order linguistic constructs (words, phrases, and sentences) was only observed for intelligible speech during wakefulness and could not be detected at all during nonrapid eye movement or rapid eye movement sleep. These results suggest that, whereas low-level auditory processing is relatively preserved during sleep, higher-level hierarchical linguistic parsing is severely disrupted, thereby revealing the capacity and limits of language processing during sleep. 
SIGNIFICANCE STATEMENT Despite the persistence of some sensory processing during sleep, it is unclear whether high-level cognitive processes such as speech parsing are also preserved. We used a novel approach for studying the depth of speech processing across wakefulness and sleep while tracking neuronal activity with EEG. We found that responses to the auditory sound stream remained intact; however, the sleeping brain did not show signs of hierarchical parsing of the continuous stream of syllables into words, phrases, and sentences. The results suggest that sleep imposes a functional barrier between basic sensory processing and high-level cognitive processing. This paradigm also holds promise for studying residual cognitive abilities in a wide array of unresponsive states. Copyright © 2017 the authors 0270-6474/17/377772-10$15.00/0.
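The frequency-tagging logic behind the CHT approach — each linguistic level occurring at a fixed rate so that it appears at its own spectral peak — can be sketched as follows. This is an illustrative simulation only: the rates (4 Hz syllables grouped into 2 Hz phrases and 1 Hz sentences) are assumed for the example, not taken from the study's materials.

```python
import numpy as np

# Assumed tagging rates: 4 Hz syllables grouped into 2 Hz phrases and 1 Hz sentences
rates = {"sentence": 1.0, "phrase": 2.0, "syllable": 4.0}

fs, dur = 100.0, 10.0                    # simulated neural signal: 100 Hz, 10 s
t = np.arange(0, dur, 1 / fs)
rng = np.random.default_rng(0)

# Toy "neural tracking" signal: one spectral component per tracked level, plus noise
tracking = sum(np.cos(2 * np.pi * f * t) for f in rates.values())
tracking = tracking + 0.1 * rng.standard_normal(t.size)

# With a 10 s window the spectral resolution is 0.1 Hz, so every tagged rate
# falls on an exact FFT bin and shows up as a separable peak
power = np.abs(np.fft.rfft(tracking)) ** 2
freqs = np.fft.rfftfreq(t.size, 1 / fs)
top3 = np.sort(freqs[np.argsort(power)[-3:]])
print("three strongest frequencies (Hz):", top3)   # the tagged rates 1, 2 and 4 Hz
```

In the study's analysis the same readout is applied to EEG rather than to a synthetic signal: a peak at the sentence or phrase rate can only arise if the brain groups syllables into those higher-order units.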
ERIC Educational Resources Information Center
Jerger, Susan; Damian, Markus F.; McAlpine, Rachel P.; Abdi, Herve
2018-01-01
To communicate, children must discriminate and identify speech sounds. Because visual speech plays an important role in this process, we explored how visual speech influences phoneme discrimination and identification by children. Critical items had intact visual speech (e.g. baez) coupled to non-intact (excised onsets) auditory speech (signified…
Temporal processing of speech in a time-feature space
NASA Astrophysics Data System (ADS)
Avendano, Carlos
1997-09-01
The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming for alleviation of the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts for exploiting the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or the RASTA processing, used ad hoc designed processing and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. 
In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of reverberation for speech enhancement, and improvements on automatic recognition of speech degraded by linear distortions and reverberation.
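The central quantity of the dissertation, the modulation spectrum, can be sketched in a few lines: take the time trajectory of a short-time envelope and compute its spectrum. The sketch below uses a noise carrier amplitude-modulated at the average syllabic rate; all parameter values (sampling rate, frame length, modulation depth) are illustrative, not the author's settings.

```python
import numpy as np

fs = 8000                          # audio sampling rate (Hz); illustrative
t = np.arange(int(fs * 4.0)) / fs  # 4 s of signal
rng = np.random.default_rng(1)

# Toy signal: noise carrier amplitude-modulated at the average syllabic rate (~4 Hz)
signal = (1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)) * rng.standard_normal(t.size)

# Short-time envelope: RMS over 10 ms frames -> envelope sampled at 100 Hz
frame = int(0.010 * fs)
m = t.size // frame
env = np.sqrt(np.mean(signal[: m * frame].reshape(m, frame) ** 2, axis=1))

# Modulation spectrum: spectrum of the (mean-removed) envelope trajectory
mod = np.abs(np.fft.rfft(env - env.mean())) ** 2
mod_freqs = np.fft.rfftfreq(m, frame / fs)

peak = mod_freqs[np.argmax(mod)]
print(f"modulation-spectrum peak near {peak:.2f} Hz")
```

A temporal-processing scheme in this domain (e.g. RASTA-style filtering) would attenuate modulation frequencies outside the speech-dominant range around this peak while leaving the peak itself intact.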
Multi-sensory learning and learning to read.
Blomert, Leo; Froyen, Dries
2010-09-01
The basis of literacy acquisition in alphabetic orthographies is the learning of the associations between the letters and the corresponding speech sounds. In spite of this primacy in learning to read, there is only scarce knowledge on how this audiovisual integration process works and which mechanisms are involved. Recent electrophysiological studies of letter-speech sound processing have revealed that normally developing readers take years to automate these associations and dyslexic readers hardly exhibit automation of these associations. It is argued that the reason for this effortful learning may reside in the nature of the audiovisual process that is recruited for the integration of in principle arbitrarily linked elements. It is shown that letter-speech sound integration does not resemble the processes involved in the integration of natural audiovisual objects such as audiovisual speech. The automatic symmetrical recruitment of the assumedly uni-sensory visual and auditory cortices in audiovisual speech integration does not occur for letter and speech sound integration. It is also argued that letter-speech sound integration only partly resembles the integration of arbitrarily linked unfamiliar audiovisual objects. Letter-sound integration and artificial audiovisual objects share the necessity of a narrow time window for integration to occur. However, they differ from these artificial objects, because they constitute an integration of partly familiar elements which acquire meaning through the learning of an orthography. 
Although letter-speech sound pairs share similarities with audiovisual speech processing as well as with unfamiliar, arbitrary objects, it seems that letter-speech sound pairs develop into unique audiovisual objects that furthermore have to be processed in a unique way in order to enable fluent reading and thus very likely recruit other neurobiological learning mechanisms than the ones involved in learning natural or arbitrary unfamiliar audiovisual associations. Copyright 2010 Elsevier B.V. All rights reserved.
Temporal factors affecting somatosensory–auditory interactions in speech processing
Ito, Takayuki; Gracco, Vincent L.; Ostry, David J.
2014-01-01
Speech perception is known to rely on both auditory and visual information. However, sound-specific somatosensory input has been shown also to influence speech perceptual processing (Ito et al., 2009). In the present study, we further examined the relationship between somatosensory information and speech perceptual processing by testing the hypothesis that the temporal relationship between orofacial movement and sound processing contributes to somatosensory–auditory interaction in speech perception. We examined the changes in event-related potentials (ERPs) in response to multisensory synchronous (simultaneous) and asynchronous (90 ms lag and lead) somatosensory and auditory stimulation compared to individual unisensory auditory and somatosensory stimulation alone. We used a robotic device to apply facial skin somatosensory deformations that were similar in timing and duration to those experienced in speech production. Following synchronous multisensory stimulation, the amplitude of the ERP was reliably different from the two unisensory potentials. More importantly, the magnitude of the ERP difference varied as a function of the relative timing of the somatosensory–auditory stimulation. Event-related activity change due to stimulus timing was seen between 160 and 220 ms following somatosensory onset, mostly around the parietal area. The results demonstrate a dynamic modulation of somatosensory–auditory convergence and suggest that the contribution of somatosensory information to speech processing depends on the specific temporal ordering of sensory inputs in speech production. PMID:25452733
Neural Tuning to Low-Level Features of Speech throughout the Perisylvian Cortex.
Berezutskaya, Julia; Freudenburg, Zachary V; Güçlü, Umut; van Gerven, Marcel A J; Ramsey, Nick F
2017-08-16
Despite a large body of research, we continue to lack a detailed account of how auditory processing of continuous speech unfolds in the human brain. Previous research showed the propagation of low-level acoustic features of speech from posterior superior temporal gyrus toward anterior superior temporal gyrus in the human brain (Hullett et al., 2016). In this study, we investigate what happens to these neural representations past the superior temporal gyrus and how they engage higher-level language processing areas such as inferior frontal gyrus. We used low-level sound features to model neural responses to speech outside of the primary auditory cortex. Two complementary imaging techniques were used with human participants (both males and females): electrocorticography (ECoG) and fMRI. Both imaging techniques showed tuning of the perisylvian cortex to low-level speech features. With ECoG, we found evidence of propagation of the temporal features of speech sounds along the ventral pathway of language processing in the brain toward inferior frontal gyrus. Increasingly coarse temporal features of speech spreading from posterior superior temporal cortex toward inferior frontal gyrus were associated with linguistic features such as voice onset time, duration of the formant transitions, and phoneme, syllable, and word boundaries. The present findings provide the groundwork for a comprehensive bottom-up account of speech comprehension in the human brain. SIGNIFICANCE STATEMENT We know that, during natural speech comprehension, a broad network of perisylvian cortical regions is involved in sound and language processing. Here, we investigated the tuning to low-level sound features within these regions using neural responses to a short feature film. We also looked at whether the tuning organization along these brain regions showed any parallel to the hierarchy of language structures in continuous speech. 
Our results show that low-level speech features propagate throughout the perisylvian cortex and potentially contribute to the emergence of "coarse" speech representations in inferior frontal gyrus typically associated with high-level language processing. These findings add to the previous work on auditory processing and underline a distinctive role of inferior frontal gyrus in natural speech comprehension. Copyright © 2017 the authors 0270-6474/17/377906-15$15.00/0.
Terband, H.; Maassen, B.; Guenther, F.H.; Brumberg, J.
2014-01-01
Background/Purpose: Differentiating the symptom complex due to phonological-level disorders, speech delay and pediatric motor speech disorders is a controversial issue in the field of pediatric speech and language pathology. The present study investigated the developmental interaction between neurological deficits in auditory and motor processes using computational modeling with the DIVA model.
Method: In a series of computer simulations, we investigated the effect of a motor processing deficit alone (MPD), and the effect of a motor processing deficit in combination with an auditory processing deficit (MPD+APD) on the trajectory and endpoint of speech motor development in the DIVA model.
Results: Simulation results showed that a motor programming deficit predominantly leads to deterioration on the phonological level (phonemic mappings) when auditory self-monitoring is intact, and on the systemic level (systemic mapping) if auditory self-monitoring is impaired.
Conclusions: These findings suggest a close relation between quality of auditory self-monitoring and the involvement of phonological vs. motor processes in children with pediatric motor speech disorders. It is suggested that MPD+APD might be involved in typically apraxic speech output disorders and MPD in pediatric motor speech disorders that also have a phonological component. Possibilities to verify these hypotheses using empirical data collected from human subjects are discussed. PMID:24491630
Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela
2015-01-01
Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral modulation depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes. PMID:26233047
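The spectral ripple stimuli used in such tasks can be synthesized directly in the frequency domain: impose a sinusoidal ripple on the log-magnitude spectrum along a log-frequency axis, with a given density (ripples per octave) and peak-to-valley depth. The sketch below is a generic illustration of this stimulus class, not the study's actual generation procedure; all parameter values are assumptions.

```python
import numpy as np

fs = 22050                     # sampling rate (Hz); all parameter values illustrative
n = int(fs * 0.5)              # 0.5 s noise burst
rng = np.random.default_rng(2)

def ripple_noise(density_cpo=2.0, depth_db=10.0, f_lo=100.0, f_hi=8000.0):
    """Noise whose log-magnitude spectrum ripples sinusoidally along log2(frequency),
    with density_cpo ripples per octave and depth_db peak-to-valley depth."""
    freqs = np.fft.rfftfreq(n, 1 / fs)
    mag = np.zeros_like(freqs)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    octaves = np.log2(freqs[band] / f_lo)
    mag[band] = 10 ** ((depth_db / 2) * np.sin(2 * np.pi * density_cpo * octaves) / 20)
    phase = rng.uniform(0, 2 * np.pi, freqs.size)   # random phases -> noise-like waveform
    return np.fft.irfft(mag * np.exp(1j * phase), n)

flat = ripple_noise(depth_db=0.0)      # reference stimulus: no spectral modulation
rippled = ripple_noise(depth_db=10.0)  # target stimulus: 2 ripples/octave, 10 dB deep

# Sanity check: the ripple survives in the synthesized waveform's spectrum
freqs = np.fft.rfftfreq(n, 1 / fs)
band = (freqs >= 100) & (freqs <= 8000)
spec = np.abs(np.fft.rfft(rippled))
print(f"measured peak-to-valley: {20 * np.log10(spec[band].max() / spec[band].min()):.1f} dB")
```

In a depth-detection (SMD) trial the listener must tell `rippled` from `flat`; the smallest detectable `depth_db` is the threshold.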
Using speech sounds to test functional spectral resolution in listeners with cochlear implants
Winn, Matthew B.; Litovsky, Ruth Y.
2015-01-01
In this study, spectral properties of speech sounds were used to test functional spectral resolution in people who use cochlear implants (CIs). Specifically, perception of the /ba/-/da/ contrast was tested using two spectral cues: Formant transitions (a fine-resolution cue) and spectral tilt (a coarse-resolution cue). Higher weighting of the formant cues was used as an index of better spectral cue perception. Participants included 19 CI listeners and 10 listeners with normal hearing (NH), for whom spectral resolution was explicitly controlled using a noise vocoder with variable carrier filter widths to simulate electrical current spread. Perceptual weighting of the two cues was modeled with mixed-effects logistic regression, and was found to systematically vary with spectral resolution. The use of formant cues was greatest for NH listeners for unprocessed speech, and declined in the two vocoded conditions. Compared to NH listeners, CI listeners relied less on formant transitions, and more on spectral tilt. Cue-weighting results showed moderately good correspondence with word recognition scores. The current approach to testing functional spectral resolution uses auditory cues that are known to be important for speech categorization, and can thus potentially serve as the basis upon which CI processing strategies and innovations are tested. PMID:25786954
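The cue-weighting analysis can be illustrated with a toy fit: simulate categorical /ba/-/da/ responses driven by two cues, fit a logistic regression, and read the coefficients as perceptual weights. This is a deliberate simplification (a single simulated listener, plain rather than mixed-effects logistic regression), and the cue scaling and the 3:1 weight ratio are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated /ba/-/da/ labeling data over a two-cue continuum (arbitrary units):
# x1 = formant-transition cue, x2 = spectral-tilt cue, each scaled to [-1, 1]
n = 2000
X = rng.uniform(-1, 1, size=(n, 2))

# Hypothetical listener who relies on formants three times as much as on tilt
true_w = np.array([3.0, 1.0])
y = rng.uniform(size=n) < 1 / (1 + np.exp(-(X @ true_w)))   # True = "da" response

# Fit a logistic regression by gradient descent; the fitted coefficients
# serve as cue weights (larger |w| = heavier reliance on that cue)
w = np.zeros(2)
for _ in range(2000):
    w -= X.T @ (1 / (1 + np.exp(-(X @ w))) - y) / n

formant_weight, tilt_weight = w
print(f"formant weight {formant_weight:.2f} vs tilt weight {tilt_weight:.2f}")
```

On this logic, a CI listener relying more on spectral tilt than on formant transitions would show the opposite ordering of the two coefficients.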
The interaction of acoustic and linguistic grouping cues in auditory object formation
NASA Astrophysics Data System (ADS)
Shapley, Kathy; Carrell, Thomas
2005-09-01
One of the earliest explanations for good speech intelligibility in poor listening situations was context [Miller et al., J. Exp. Psychol. 41 (1951)]. Context presumably allows listeners to group and predict speech appropriately and is known as a top-down listening strategy. Amplitude comodulation is another mechanism that has been shown to improve sentence intelligibility. Amplitude comodulation provides acoustic grouping information without changing the linguistic content of the desired signal [Carrell and Opie, Percept. Psychophys. 52 (1992); Hu and Wang, Proceedings of ICASSP-02 (2002)] and is considered a bottom-up process. The present experiment investigated how amplitude comodulation and semantic information combined to improve speech intelligibility. Sentences with high- and low-predictability word sequences [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84 (1988)] were constructed in two different formats: time-varying sinusoidal sentences (TVS) and reduced-channel sentences (RC). The stimuli were chosen because they minimally represent the traditionally defined speech cues and therefore emphasize the importance of high-level context effects and low-level acoustic grouping cues. Results indicated that semantic information did not influence intelligibility levels of TVS and RC sentences. In addition, amplitude comodulation aided listeners' intelligibility scores in the TVS condition but hindered them in the RC condition.
Improving speech perception in noise for children with cochlear implants.
Gifford, René H; Olund, Amy P; Dejong, Melissa
2011-10-01
Current cochlear implant recipients are achieving increasingly higher levels of speech recognition; however, the presence of background noise continues to significantly degrade speech understanding for even the best performers. Newer generation Nucleus cochlear implant sound processors can be programmed with SmartSound strategies that have been shown to improve speech understanding in noise for adult cochlear implant recipients. The applicability of these strategies for use in children, however, is neither fully understood nor widely accepted. To assess speech perception for pediatric cochlear implant recipients in the presence of a realistic restaurant simulation generated by an eight-loudspeaker (R-SPACE™) array in order to determine whether Nucleus sound processor SmartSound strategies yield improved sentence recognition in noise for children who learn language through the implant. Single subject, repeated measures design. Twenty-two experimental subjects with cochlear implants (mean age 11.1 yr) and 25 control subjects with normal hearing (mean age 9.6 yr) participated in this prospective study. Speech reception thresholds (SRT) in semidiffuse restaurant noise originating from an eight-loudspeaker array were assessed with the experimental subjects' everyday program incorporating Adaptive Dynamic Range Optimization (ADRO) as well as with the addition of Autosensitivity control (ASC). Adaptive SRTs with the Hearing In Noise Test (HINT) sentences were obtained for all 22 experimental subjects, and performance (in percent correct) was assessed at a fixed +6 dB signal-to-noise ratio (SNR) for a six-subject subset. Statistical analysis using a repeated-measures analysis of variance (ANOVA) evaluated the effects of the SmartSound setting on the SRT in noise.
The primary findings mirrored those reported previously with adult cochlear implant recipients in that the addition of ASC to ADRO significantly improved speech recognition in noise for pediatric cochlear implant recipients. The mean degree of improvement in the SRT with the addition of ASC to ADRO was 3.5 dB for a mean SRT of 10.9 dB SNR. Thus, despite the fact that these children have acquired auditory/oral speech and language through the use of their cochlear implant(s) equipped with ADRO, the addition of ASC significantly improved their ability to recognize speech in high levels of diffuse background noise. The mean SRT for the control subjects with normal hearing was 0.0 dB SNR. Given that the mean SRT for the experimental group was 10.9 dB SNR, despite the improvements in performance observed with the addition of ASC, cochlear implants still do not completely overcome the speech perception deficit encountered in noisy environments accompanying the diagnosis of severe-to-profound hearing loss. SmartSound strategies currently available in latest generation Nucleus cochlear implant sound processors are able to significantly improve speech understanding in a realistic, semidiffuse noise for pediatric cochlear implant recipients. Despite the reluctance of pediatric audiologists to utilize SmartSound settings for regular use, the results of the current study support the addition of ASC to ADRO for everyday listening environments to improve speech perception in a child's typical everyday program. American Academy of Audiology.
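The adaptive SRT measurement used here can be sketched as a 1-down/1-up staircase: each correct sentence repetition lowers the SNR by one step, each error raises it, so the track converges on the 50%-correct point. The listener model below is a toy (the SRT and psychometric slope are assumed values chosen to echo the reported 10.9 dB mean, not data from the study).

```python
import numpy as np

rng = np.random.default_rng(5)

def repeats_correctly(snr_db, srt_db=10.9, slope=0.3):
    """Toy listener: probability of repeating a sentence correctly rises
    logistically with SNR; srt_db and slope are assumed values."""
    return rng.uniform() < 1 / (1 + np.exp(-slope * (snr_db - srt_db)))

# 1-down/1-up adaptive track, as in HINT-style SRT measurement:
# correct -> SNR down one step, incorrect -> SNR up one step,
# so the track oscillates around the 50%-correct point (the SRT)
snr, step, track = 20.0, 2.0, []
for _ in range(40):
    snr += -step if repeats_correctly(snr) else step
    track.append(snr)

srt_estimate = np.mean(track[10:])   # average after the initial descent
print(f"estimated SRT: {srt_estimate:.1f} dB SNR")
```

Running the same track with a lower assumed `srt_db` (e.g. the 0.0 dB of the normal-hearing controls) shifts the whole staircase downward by the same amount.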
Liu, Chang; Jin, Su-Hyun
2015-11-01
This study investigated whether native listeners processed speech differently from non-native listeners in a speech detection task. Detection thresholds of Mandarin Chinese and Korean vowels and non-speech sounds in noise, frequency selectivity, and the nativeness of Mandarin Chinese and Korean vowels were measured for Mandarin Chinese- and Korean-native listeners. The two groups of listeners exhibited similar non-speech sound detection and frequency selectivity; however, the Korean listeners had better detection thresholds of Korean vowels than Chinese listeners, while the Chinese listeners performed no better at Chinese vowel detection than the Korean listeners. Moreover, thresholds predicted from an auditory model highly correlated with behavioral thresholds of the two groups of listeners, suggesting that detection of speech sounds not only depended on listeners' frequency selectivity, but also might be affected by their native language experience. Listeners evaluated their native vowels with higher nativeness scores than non-native listeners. Native listeners may have advantages over non-native listeners when processing speech sounds in noise, even without the required phonetic processing; however, such native speech advantages might be offset by Chinese listeners' lower sensitivity to vowel sounds, a characteristic possibly resulting from their sparse vowel system and their greater cognitive and attentional demands for vowel processing.
Language/Culture Modulates Brain and Gaze Processes in Audiovisual Speech Perception.
Hisanaga, Satoko; Sekiyama, Kaoru; Igasaki, Tomohiko; Murayama, Nobuki
2016-10-13
Several behavioural studies have shown that the interplay between voice and face information in audiovisual speech perception is not universal. Native English speakers (ESs) are influenced by visual mouth movement to a greater degree than native Japanese speakers (JSs) when listening to speech. However, the biological basis of these group differences is unknown. Here, we demonstrate the time-varying processes of group differences in terms of event-related brain potentials (ERP) and eye gaze for audiovisual and audio-only speech perception. On a behavioural level, while congruent mouth movement shortened the ESs' response time for speech perception, the opposite effect was observed in JSs. Eye-tracking data revealed a gaze bias to the mouth for the ESs but not the JSs, especially before the audio onset. Additionally, the ERP P2 amplitude indicated that ESs processed multisensory speech more efficiently than auditory-only speech; however, the JSs exhibited the opposite pattern. Taken together, the ESs' early visual attention to the mouth was likely to promote phonetic anticipation, which was not the case for the JSs. These results clearly indicate the impact of language and/or culture on multisensory speech processing, suggesting that linguistic/cultural experiences lead to the development of unique neural systems for audiovisual speech perception.
Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis.
Patel, Aniruddh D
2011-01-01
Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met neural plasticity drives the networks in question to function with higher precision than needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.
Speech and Swallowing in Parkinson’s Disease
Tjaden, Kris
2009-01-01
Dysarthria and dysphagia occur frequently in Parkinson’s disease (PD). Reduced speech intelligibility is a significant functional limitation of dysarthria, and in the case of PD is likely related to articulatory and phonatory impairment. Prosodically-based treatments show the most promise for addressing these deficits as well as for maximizing speech intelligibility. Communication-oriented strategies also may help to enhance mutual understanding between a speaker and listener. Dysphagia in PD can result in serious health issues, including aspiration pneumonia, malnutrition, and dehydration. Early identification of swallowing abnormalities is critical so as to minimize the impact of dysphagia on health status and quality of life. Feeding modifications, compensatory strategies, and therapeutic swallowing techniques all have a role in the management of dysphagia in PD. PMID:19946386
Speech Recognition as a Transcription Aid: A Randomized Comparison With Standard Transcription
Mohr, David N.; Turner, David W.; Pond, Gregory R.; Kamath, Joseph S.; De Vos, Cathy B.; Carpenter, Paul C.
2003-01-01
Objective. Speech recognition promises to reduce information entry costs for clinical information systems. It is most likely to be accepted across an organization if physicians can dictate without concerning themselves with real-time recognition and editing; assistants can then edit and process the computer-generated document. Our objective was to evaluate the use of speech-recognition technology in a randomized controlled trial using our institutional infrastructure.
Design. Clinical note dictations from physicians in two specialty divisions were randomized to either a standard transcription process or a speech-recognition process. Secretaries and transcriptionists also were assigned randomly to each of these processes.
Measurements. The duration of each dictation was measured. The amount of time spent processing a dictation to yield a finished document also was measured. Secretarial and transcriptionist productivity, defined as hours of secretary work per minute of dictation processed, was determined for speech recognition and standard transcription.
Results. Secretaries in the endocrinology division were 87.3% (confidence interval, 83.3%, 92.3%) as productive with the speech-recognition technology as implemented in this study as they were using standard transcription. Psychiatry transcriptionists and secretaries were similarly less productive. Author, secretary, and type of clinical note were significant (p < 0.05) predictors of productivity.
Conclusion. When implemented in an organization with an existing document-processing infrastructure (which included training and interfaces of the speech-recognition editor with the existing document entry application), speech recognition did not improve the productivity of secretaries or transcriptionists. PMID:12509359
Engaged listeners: shared neural processing of powerful political speeches.
Schmälzle, Ralf; Häcker, Frank E K; Honey, Christopher J; Hasson, Uri
2015-08-01
Powerful speeches can captivate audiences, whereas weaker speeches fail to engage their listeners. What is happening in the brains of a captivated audience? Here, we assess audience-wide functional brain dynamics during listening to speeches of varying rhetorical quality. The speeches were given by German politicians and evaluated as rhetorically powerful or weak. Listening to each of the speeches induced similar neural response time courses, as measured by inter-subject correlation analysis, in widespread brain regions involved in spoken language processing. Crucially, alignment of the time course across listeners was stronger for rhetorically powerful speeches, especially for bilateral regions of the superior temporal gyri and medial prefrontal cortex. Thus, during powerful speeches, listeners as a group are more coupled to each other, suggesting that powerful speeches are more potent in taking control of the listeners' brain responses. Weaker speeches were processed more heterogeneously, although they still prompted substantially correlated responses. These patterns of coupled neural responses bear resemblance to metaphors of resonance, which are often invoked in discussions of speech impact, and contribute to the literature on auditory attention under natural circumstances. Overall, this approach opens up possibilities for research on the neural mechanisms mediating the reception of entertaining or persuasive messages.
Jørgensen, Søren; Dau, Torsten
2011-09-01
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a structure similar to that of the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America
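The SNR(env) idea can be sketched for a single modulation band: estimate the envelope power of the noisy mixture and of the noise alone, take their difference as the speech envelope power, and form the ratio. The sketch below is a strong simplification of the model (one band instead of a modulation filterbank, a frame-RMS envelope, no ideal-observer back end), and the stimuli and parameters are assumed toy values.

```python
import numpy as np

fs = 8000
t = np.arange(0, 2.0, 1 / fs)
rng = np.random.default_rng(4)

# Toy stimuli: "speech" = noise carrier modulated at the syllabic rate (4 Hz);
# masker = stationary unmodulated noise (all parameters are illustrative)
speech = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)

def env_power(x, band=(2.0, 8.0)):
    """Normalized envelope power of x in one modulation band:
    frame-RMS envelope, then band-limited envelope spectrum power over DC power."""
    frame = int(0.010 * fs)                  # 10 ms frames -> 100 Hz envelope rate
    m = len(x) // frame
    env = np.sqrt(np.mean(x[: m * frame].reshape(m, frame) ** 2, axis=1))
    spec = np.abs(np.fft.rfft(env - env.mean())) ** 2 / m ** 2
    freqs = np.fft.rfftfreq(m, frame / fs)
    sel = (freqs >= band[0]) & (freqs <= band[1])
    return 2 * spec[sel].sum() / env.mean() ** 2

# SNR(env) for one band: speech envelope power is estimated from the noisy
# mixture by subtracting the noise-alone envelope power
p_mix = env_power(speech + noise)
p_noise = env_power(noise)
snr_env = 10 * np.log10(max(p_mix - p_noise, 1e-6) / p_noise)
print(f"SNRenv in the 2-8 Hz band: {snr_env:.1f} dB")
```

The abstract's spectral-subtraction finding corresponds to the case where processing inflates the noise term: if `p_noise` grows to exceed the speech envelope power, the estimated ratio collapses even though the waveform SNR is unchanged.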
Speech-Language Dissociations, Distractibility, and Childhood Stuttering
Conture, Edward G.; Walden, Tedra A.; Lambert, Warren E.
2015-01-01
Purpose This study investigated the relation among speech-language dissociations, attentional distractibility, and childhood stuttering. Method Participants were 82 preschool-age children who stutter (CWS) and 120 who do not stutter (CWNS). Correlation-based statistics (Bates, Appelbaum, Salcedo, Saygin, & Pizzamiglio, 2003) identified dissociations across 5 norm-based speech-language subtests. The Behavioral Style Questionnaire Distractibility subscale measured attentional distractibility. Analyses addressed (a) between-groups differences in the number of children exhibiting speech-language dissociations; (b) between-groups distractibility differences; (c) the relation between distractibility and speech-language dissociations; and (d) whether interactions between distractibility and dissociations predicted the frequency of total, stuttered, and nonstuttered disfluencies. Results More preschool-age CWS exhibited speech-language dissociations compared with CWNS, and more boys exhibited dissociations compared with girls. In addition, male CWS were less distractible than female CWS and female CWNS. For CWS, but not CWNS, less distractibility (i.e., greater attention) was associated with more speech-language dissociations. Last, interactions between distractibility and dissociations did not predict speech disfluencies in CWS or CWNS. Conclusions The present findings suggest that for preschool-age CWS, attentional processes are associated with speech-language dissociations. Future investigations are warranted to better understand the directionality of effect of this association (e.g., inefficient attentional processes → speech-language dissociations vs. inefficient attentional processes ← speech-language dissociations). PMID:26126203
Mismatch Negativity with Visual-only and Audiovisual Speech
Ponton, Curtis W.; Bernstein, Lynne E.; Auer, Edward T.
2009-01-01
The functional organization of cortical speech processing is thought to be hierarchical, increasing in complexity and proceeding from primary sensory areas centrifugally. The current study used the mismatch negativity (MMN) obtained with electrophysiology (EEG) to investigate the early latency period of visual speech processing under both visual-only (VO) and audiovisual (AV) conditions. Current density reconstruction (CDR) methods were used to model the cortical MMN generator locations. MMNs were obtained with VO and AV speech stimuli at early latencies (approximately 82-87 ms peak in time waveforms relative to the acoustic onset) and in regions of the right lateral temporal and parietal cortices. Latencies were consistent with bottom-up processing of the visible stimuli. We suggest that a visual pathway extracts phonetic cues from visible speech, and that previously reported effects of AV speech in classical early auditory areas, given later reported latencies, could be attributable to modulatory feedback from visual phonetic processing. PMID:19404730
ERIC Educational Resources Information Center
Richard, Gail J.; Hoge, Debra Reichert
Designed for practicing speech-language pathologists, this book discusses different lesser-known syndrome disabilities, pertinent speech-language characteristics, and goals and strategies to begin intervention efforts at a preschool level. Chapters address: (1) Apert syndrome; (2) Beckwith-Wiedemann syndrome; (3) CHARGE syndrome; (4) Cri-du-Chat…
Different Timescales for the Neural Coding of Consonant and Vowel Sounds
Perez, Claudia A.; Engineer, Crystal T.; Jakkamsetti, Vikram; Carraway, Ryan S.; Perry, Matthew S.
2013-01-01
Psychophysical, clinical, and imaging evidence suggests that consonant and vowel sounds have distinct neural representations. This study tests the hypothesis that consonant and vowel sounds are represented on different timescales within the same population of neurons by comparing behavioral discrimination with neural discrimination based on activity recorded in rat inferior colliculus and primary auditory cortex. Performance on 9 vowel discrimination tasks was highly correlated with neural discrimination based on spike count and was not correlated when spike timing was preserved. In contrast, performance on 11 consonant discrimination tasks was highly correlated with neural discrimination when spike timing was preserved and not when spike timing was eliminated. These results suggest that in the early stages of auditory processing, spike count encodes vowel sounds and spike timing encodes consonant sounds. These distinct coding strategies likely contribute to the robust nature of speech sound representations and may help explain some aspects of developmental and acquired speech processing disorders. PMID:22426334
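The contrast between the two coding schemes can be made concrete with a toy comparison (illustrative only; the study analyzed recorded neural activity, not this simulation): a count-based distance ignores when spikes occur, while a binned-timing distance is sensitive to it.

```python
import numpy as np

def count_distance(a, b):
    # Rate code: compare total spike counts only
    return abs(len(a) - len(b))

def timing_distance(a, b, t_max=1.0, bin_ms=10):
    # Temporal code: compare spike trains binned at fine resolution,
    # so trains with equal counts but different timing still differ
    edges = np.arange(0.0, t_max + 1e-9, bin_ms / 1000.0)
    ha, _ = np.histogram(a, edges)
    hb, _ = np.histogram(b, edges)
    return int(np.abs(ha - hb).sum())
```

Two trains with the same number of spikes at different times are identical under the count metric but distinct under the timing metric, mirroring the vowel (count) versus consonant (timing) dissociation reported above.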
How visual timing and form information affect speech and non-speech processing.
Kim, Jeesun; Davis, Chris
2014-10-01
Auditory speech processing is facilitated when the talker's face/head movements are seen. This effect is typically explained in terms of visual speech providing form and/or timing information. We determined the effect of both types of information on a speech/non-speech task (non-speech stimuli were spectrally rotated speech). All stimuli were presented paired with the talker's static or moving face. Two types of moving face stimuli were used: full-face versions (both spoken form and timing information available) and modified face versions (only timing information provided by peri-oral motion available). The results showed that the peri-oral timing information facilitated response time for speech and non-speech stimuli compared to a static face. An additional facilitatory effect was found for full-face versions compared to the timing condition; this effect only occurred for speech stimuli. We propose the timing effect was due to cross-modal phase resetting; the form effect to cross-modal priming. Copyright © 2014 Elsevier Inc. All rights reserved.
Robust estimators for speech enhancement in real environments
NASA Astrophysics Data System (ADS)
Sandoval-Ibarra, Yuma; Diaz-Ramirez, Victor H.; Kober, Vitaly
2015-09-01
Common statistical estimators for speech enhancement rely on several assumptions about the stationarity of speech signals and noise. These assumptions are not always valid in real life due to the nonstationary characteristics of speech and noise processes. We propose new estimators that extend existing ones by incorporating rank-order statistics. The proposed estimators are better adapted to the nonstationary characteristics of speech signals and noise processes. Through computer simulations we show that the proposed estimators yield better performance in terms of objective metrics than known estimators when speech signals are contaminated with airport, babble, restaurant, and train-station noise.
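One way a rank-order estimator can work (a generic sketch, not the authors' method): estimate the noise power in each frequency bin as a low quantile of the frame-wise power, which is robust to frames where nonstationary speech dominates, then apply spectral subtraction with a floor.

```python
import numpy as np

def rank_order_noise_psd(frame_powers, q=0.1):
    # Noise PSD per frequency bin as a low quantile (a rank-order
    # statistic) of frame-wise power: speech-dominated frames land in
    # the upper tail and do not bias the estimate
    return np.quantile(frame_powers, q, axis=0)

def spectral_subtract(frame_powers, noise_psd, floor=0.01):
    # Subtract the noise estimate, flooring the result to avoid
    # negative power estimates
    return np.maximum(frame_powers - noise_psd, floor * frame_powers)
```

On synthetic frames where a minority of frames carry extra "speech" power, the quantile recovers the noise floor and the noise-only frames are attenuated to the floor.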
Jerger, Susan; Damian, Markus F; McAlpine, Rachel P; Abdi, Hervé
2017-03-01
Understanding spoken language is an audiovisual event that depends critically on the ability to discriminate and identify phonemes, yet we have little evidence about the role of early auditory experience and visual speech on the development of these fundamental perceptual skills. Objectives of this research were to determine 1) how visual speech influences phoneme discrimination and identification; 2) whether visual speech influences these two processes in a like manner, such that discrimination predicts identification; and 3) how the degree of hearing loss affects this relationship. Such evidence is crucial for developing effective intervention strategies to mitigate the effects of hearing loss on language development. Participants were 58 children with early-onset sensorineural hearing loss (CHL, 53% girls, M = 9;4 yrs) and 58 children with normal hearing (CNH, 53% girls, M = 9;4 yrs). Test items were consonant-vowel (CV) syllables and nonwords with intact visual speech coupled to non-intact auditory speech (excised onsets): for example, an intact consonant/rhyme in the visual track (Baa or Baz) coupled to a non-intact onset/rhyme in the auditory track (/-B/aa or /-B/az). The items started with an easy-to-speechread /B/ or difficult-to-speechread /G/ onset and were presented in the auditory (static face) vs. audiovisual (dynamic face) modes. We assessed discrimination for intact vs. non-intact different pairs (e.g., Baa:/-B/aa). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more same (as opposed to different) responses in the audiovisual than in the auditory mode. We assessed identification by repetition of nonwords with non-intact onsets (e.g., /-B/az). We predicted that visual speech would cause the non-intact onset to be perceived as intact and would therefore generate more Baz (as opposed to az) responses in the audiovisual than in the auditory mode. 
Performance in the audiovisual mode showed more same responses for the intact vs. non-intact different pairs (e.g., Baa:/-B/aa) and more intact onset responses for nonword repetition (Baz for /-B/az). Thus visual speech altered both discrimination and identification in the CHL, to a large extent for the /B/ onsets but only minimally for the /G/ onsets. The CHL identified the stimuli similarly to the CNH but did not discriminate the stimuli similarly. A bias-free measure of the children's discrimination skills (i.e., d' analysis) revealed that the CHL had greater difficulty discriminating intact from non-intact speech in both modes. As the degree of HL worsened, the ability to discriminate the intact vs. non-intact onsets in the auditory mode worsened. Discrimination ability in the CHL significantly predicted their identification of the onsets, even after variation due to the other variables was controlled. These results clearly established that visual speech can fill in non-intact auditory speech, and this effect, in turn, made the non-intact onsets more difficult to discriminate from intact speech and more likely to be perceived as intact. Such results 1) demonstrate the value of visual speech at multiple levels of linguistic processing and 2) support intervention programs that view visual speech as a powerful asset for developing spoken language in CHL. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
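The bias-free discrimination measure referred to above, d', separates sensitivity from response bias by converting hit and false-alarm rates to z-scores. A minimal implementation (using a standard log-linear correction, which the authors may or may not have applied):

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    # d' = z(hit rate) - z(false-alarm rate); the +0.5/+1 (log-linear)
    # correction keeps rates away from exactly 0 or 1, where z diverges
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)
```

A listener who is mostly correct gets a large d', one at chance gets d' near zero, regardless of any overall tendency to respond "same" or "different".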
Intensive treatment of speech disorders in Robin sequence: a case report.
Pinto, Maria Daniela Borro; Pegoraro-Krook, Maria Inês; Andrade, Laura Katarine Félix de; Correa, Ana Paula Carvalho; Rosa-Lugo, Linda Iris; Dutka, Jeniffer de Cássia Rillo
2017-10-23
To describe the speech of a patient with Pierre Robin Sequence (PRS) and severe speech disorders before and after participating in an Intensive Speech Therapy Program (ISTP). The ISTP consisted of two daily sessions of therapy over a 36-week period, resulting in a total of 360 therapy sessions. The sessions included the phases of establishment, generalization, and maintenance. A combination of strategies, such as modified contrast therapy and speech sound perception training, were used to elicit adequate place of articulation. The ISTP addressed correction of place of production of oral consonants and maximization of movement of the pharyngeal walls with a speech bulb reduction program. Therapy targets were addressed at the phonetic level with a gradual increase in the complexity of the productions hierarchically (e.g., syllables, words, phrases, conversation) while simultaneously addressing the velopharyngeal hypodynamism with speech bulb reductions. Re-evaluation after the ISTP revealed normal speech resonance and articulation with the speech bulb. Nasoendoscopic assessment indicated consistent velopharyngeal closure for all oral sounds with the speech bulb in place. Intensive speech therapy, combined with the use of the speech bulb, yielded positive outcomes in the rehabilitation of a clinical case with severe speech disorders associated with velopharyngeal dysfunction in Pierre Robin Sequence.
Language development at 18 months is related to multimodal communicative strategies at 12 months.
Igualada, Alfonso; Bosch, Laura; Prieto, Pilar
2015-05-01
The present study investigated the degree to which infants' use of simultaneous gesture-speech combinations during controlled social interactions predicts later language development. Nineteen infants participated in a declarative pointing task involving three different social conditions: two experimental conditions (a) available, when the adult was visually attending to the infant but did not attend to the object of reference jointly with the child, and (b) unavailable, when the adult was not visually attending to either the infant or the object; and (c) a baseline condition, when the adult jointly engaged with the infant's object of reference. At 12 months of age, measures related to infants' speech-only productions, pointing-only gestures, and simultaneous pointing-speech combinations were obtained in each of the three social conditions. Each child's lexical and grammatical output was assessed at 18 months of age through parental report. Results revealed a significant interaction between social condition and type of communicative production. Specifically, only simultaneous pointing-speech combinations increased in frequency during the available condition compared to baseline, while no differences were found for speech-only and pointing-only productions. Moreover, simultaneous pointing-speech combinations in the available condition at 12 months positively correlated with lexical and grammatical development at 18 months of age. The ability to selectively use this multimodal communicative strategy to engage the adult in joint attention by drawing his or her attention toward an unseen event or object reveals 12-month-olds' clear understanding of referential cues that are relevant for language development. This strategy for successfully initiating and maintaining joint attention is related to language development because it increases learning opportunities from social interactions. Copyright © 2015 Elsevier Inc. All rights reserved.
1985-10-01
[Fragmentary OCR text from a scanned document on the motor theory of speech perception. Recoverable content: a discussion of speech errors; a reference to Anderson, V. A. (1942), Training the speaking voice, New York: Oxford University Press; remarks contrasting accounts of speech perception with accounts of other perceptual processes (e.g., Berkeley, 1709; Festinger, Burnham, Ono); a claim concerning a process of learned equivalence; and the running head "Liberman & Mattingly: The Motor Theory of Speech Perception Revised".]
Stilp, Christian E.; Goupell, Matthew J.
2015-01-01
Short-time spectral changes in the speech signal are important for understanding noise-vocoded sentences. These information-bearing acoustic changes, measured using cochlea-scaled entropy in cochlear implant simulations [CSECI; Stilp et al. (2013). J. Acoust. Soc. Am. 133(2), EL136–EL141; Stilp (2014). J. Acoust. Soc. Am. 135(3), 1518–1529], may offer better understanding of speech perception by cochlear implant (CI) users. However, perceptual importance of CSECI for normal-hearing listeners was tested at only one spectral resolution and one temporal resolution, limiting generalizability of results to CI users. Here, experiments investigated the importance of these informational changes for understanding noise-vocoded sentences at different spectral resolutions (4–24 spectral channels; Experiment 1), temporal resolutions (4–64 Hz cutoff for low-pass filters that extracted amplitude envelopes; Experiment 2), or when both parameters varied (6–12 channels, 8–32 Hz; Experiment 3). Sentence intelligibility was reduced more by replacing high-CSECI intervals with noise than replacing low-CSECI intervals, but only when sentences had sufficient spectral and/or temporal resolution. High-CSECI intervals were more important for speech understanding as spectral resolution worsened and temporal resolution improved. Trade-offs between CSECI and intermediate spectral and temporal resolutions were minimal. These results suggest that signal processing strategies that emphasize information-bearing acoustic changes in speech may improve speech perception for CI users. PMID:25698018
Simultaneous F0-F1 modifications of Arabic for the improvement of natural-sounding
NASA Astrophysics Data System (ADS)
Ykhlef, F.; Bensebti, M.
2013-03-01
Pitch (F0) modification is one of the most important problems in the area of speech synthesis. Several techniques have been developed in the literature to achieve this goal. The main restrictions of these techniques lie in the modification range and in the quality, intelligibility and naturalness of the synthesised speech. The control of formants in a spoken language can significantly improve the naturalness of the synthesised speech. This improvement is mainly dependent on the control of the first formant (F1). Inspired by this observation, this article proposes a new approach that modifies both F0 and F1 of Arabic voiced sounds in order to improve the naturalness of the pitch-shifted speech. The developed strategy takes a parallel processing approach, in which the analysis segments are decomposed into sub-bands in the wavelet domain, modified in the desired sub-band by using a resampling technique, and reconstructed without affecting the remaining sub-bands. Pitch marking and voicing detection are performed in the frequency decomposition step based on the comparison of the multi-level approximation and detail signals. The performance of the proposed technique is evaluated by listening tests and compared to the pitch synchronous overlap and add (PSOLA) technique at the third approximation level. Experimental results have shown that manipulating F0 in conjunction with F1 in the wavelet domain yields more natural-sounding synthesised speech than the classical pitch modification technique. This improvement was most apparent for large pitch modifications.
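The resampling idea underlying such pitch modification can be shown in its simplest form (without the wavelet sub-band decomposition or the overlap-add machinery of the proposed method): resampling a segment rescales its entire spectrum, raising F0 at the cost of also shortening the segment, which is why practical systems combine resampling with time-scale compensation.

```python
import numpy as np
from scipy.signal import resample

def shift_pitch(segment, factor):
    # Resample a voiced segment: factor > 1 compresses the waveform,
    # scaling every spectral component (including F0) up by `factor`,
    # but the duration shrinks by the same factor
    n = int(round(len(segment) / factor))
    return resample(segment, n)
```

For a 100 Hz sinusoid, a factor of 2 halves the length and moves the spectral peak to 200 Hz at the original sampling rate.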
ERIC Educational Resources Information Center
Leech, Robert; Saygin, Ayse Pinar
2011-01-01
Using functional MRI, we investigated whether auditory processing of both speech and meaningful non-linguistic environmental sounds in superior and middle temporal cortex relies on a complex and spatially distributed neural system. We found that evidence for spatially distributed processing of speech and environmental sounds in a substantial…
Binaural hearing with electrical stimulation
Kan, Alan; Litovsky, Ruth Y.
2014-01-01
Bilateral cochlear implantation is becoming a standard of care in many clinics. While much benefit has been shown through bilateral implantation, patients who have bilateral cochlear implants (CIs) still do not perform as well as normal hearing listeners in sound localization and understanding speech in noisy environments. This difference in performance can arise from a number of different factors, including the areas of hardware and engineering, surgical precision and pathology of the auditory system in deaf persons. While surgical precision and individual pathology are factors that are beyond careful control, improvements can be made in the areas of clinical practice and the engineering of binaural speech processors. These improvements should be grounded in a good understanding of the sensitivities of bilateral CI patients to the acoustic binaural cues that are important to normal hearing listeners for sound localization and speech in noise understanding. To this end, we review the current state-of-the-art in the understanding of the sensitivities of bilateral CI patients to binaural cues in electric hearing, and highlight the important issues and challenges as they relate to clinical practice and the development of new binaural processing strategies. PMID:25193553
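The two acoustic binaural cues at issue, interaural time difference (ITD) and interaural level difference (ILD), can be estimated from a stereo signal with a simple broadband sketch (real analyses are typically done per frequency band; the sign convention here follows numpy's correlation definition):

```python
import numpy as np

def itd_ild(left, right, fs):
    # ITD: lag (seconds) of the peak of the cross-correlation; with
    # numpy's definition, a positive lag means the left channel is
    # delayed relative to the right. ILD: broadband level difference
    # in dB (left re: right).
    xcorr = np.correlate(left, right, mode='full')
    lag = np.argmax(xcorr) - (len(right) - 1)
    rms = lambda s: np.sqrt(np.mean(np.square(s)))
    return lag / fs, 20 * np.log10(rms(left) / rms(right))
```

A right channel that is a delayed, attenuated copy of the left yields a negative lag and a positive ILD, as a source on the listener's left would.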
Schaadt, Gesa; van der Meer, Elke; Pannekamp, Ann; Oberecker, Regine; Männel, Claudia
2018-01-17
During information processing, individuals benefit from bimodally presented input, as has been demonstrated for speech perception (i.e., printed letters and speech sounds) or the perception of emotional expressions (i.e., facial expression and voice tuning). While typically developing individuals show this bimodal benefit, school children with dyslexia do not. Currently, it is unknown whether the bimodal processing deficit in dyslexia also occurs for visual-auditory speech processing that is independent of reading and spelling acquisition (i.e., no letter-sound knowledge is required). Here, we tested school children with and without spelling problems on their bimodal perception of video-recorded mouth movements pronouncing syllables. We analyzed the event-related potential Mismatch Response (MMR) to visual-auditory speech information and compared this response to the MMR to monomodal speech information (i.e., auditory-only, visual-only). We found a reduced MMR with later onset to visual-auditory speech information in children with spelling problems compared to children without spelling problems. Moreover, when comparing bimodal and monomodal speech perception, we found that children without spelling problems showed significantly larger responses in the visual-auditory experiment compared to the visual-only response, whereas children with spelling problems did not. Our results suggest that children with dyslexia exhibit general difficulties in bimodal speech perception independently of letter-speech sound knowledge, as apparent in altered bimodal speech perception and lacking benefit from bimodal information. This general deficit in children with dyslexia may underlie the previously reported reduced bimodal benefit for letter-speech sound combinations and similar findings in emotion perception. Copyright © 2018 Elsevier Ltd. All rights reserved.
Broderick, Michael P; Anderson, Andrew J; Di Liberto, Giovanni M; Crosse, Michael J; Lalor, Edmund C
2018-03-05
People routinely hear and understand speech at rates of 120-200 words per minute [1, 2]. Thus, speech comprehension must involve rapid, online neural mechanisms that process words' meanings in an approximately time-locked fashion. However, electrophysiological evidence for such time-locked processing has been lacking for continuous speech. Although valuable insights into semantic processing have been provided by the "N400 component" of the event-related potential [3-6], this literature has been dominated by paradigms using incongruous words within specially constructed sentences, with less emphasis on natural, narrative speech comprehension. Building on the discovery that cortical activity "tracks" the dynamics of running speech [7-9] and psycholinguistic work demonstrating [10-12] and modeling [13-15] how context impacts on word processing, we describe a new approach for deriving an electrophysiological correlate of natural speech comprehension. We used a computational model [16] to quantify the meaning carried by words based on how semantically dissimilar they were to their preceding context and then regressed this measure against electroencephalographic (EEG) data recorded from subjects as they listened to narrative speech. This produced a prominent negativity at a time lag of 200-600 ms on centro-parietal EEG channels, characteristics common to the N400. Applying this approach to EEG datasets involving time-reversed speech, cocktail party attention, and audiovisual speech-in-noise demonstrated that this response was very sensitive to whether or not subjects understood the speech they heard. These findings demonstrate that, when successfully comprehending natural speech, the human brain responds to the contextual semantic content of each word in a relatively time-locked fashion. Copyright © 2018 Elsevier Ltd. All rights reserved.
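The regression approach can be caricatured with a lagged cross-correlation between a per-sample stimulus feature and a single EEG channel (the study used regularized regression over a multi-channel montage; this sketch only conveys the idea of recovering a time-locked response function such as the 200-600 ms negativity):

```python
import numpy as np

def response_function(feature, eeg, fs, max_lag_s=0.6):
    # Correlate a stimulus feature time series (zeros except at word
    # onsets, where it carries, e.g., semantic dissimilarity) with an
    # EEG channel at each positive lag
    f = feature - feature.mean()
    e = eeg - eeg.mean()
    n = len(f)
    lags = np.arange(int(max_lag_s * fs) + 1)
    r = np.array([np.dot(f[:n - L], e[L:]) / n for L in lags])
    return lags / fs, r
```

If the EEG responds to each word with a fixed delay, the correlation peaks at that delay.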
The Struggle with Hate Speech. Teaching Strategy.
ERIC Educational Resources Information Center
Bloom, Jennifer
1995-01-01
Discusses the issue of hate-motivated violence and special laws aimed at deterrence. Presents a secondary school lesson to help students define hate speech and understand constitutional issues related to the topic. Includes three student handouts, student learning objectives, instructional procedures, and a discussion guide. (CFR)
Literacy as Commodity: Redistributing the Goods.
ERIC Educational Resources Information Center
Elsasser, Nan; Irvine, Patricia
1992-01-01
A rationale is presented for educational change and the strategies to achieve it. The model of speech communities of Dell Hymes is used to show how language differences are connected to social and economic disparities. Efforts to create new speech communities to overcome inequalities are discussed. (SLD)
Schmid, Gabriele; Thielmann, Anke; Ziegler, Wolfram
2009-03-01
Patients with lesions of the left hemisphere often suffer from oral-facial apraxia, apraxia of speech, and aphasia. In these patients, visual features often play a critical role in speech and language therapy, when pictured lip shapes or the therapist's visible mouth movements are used to facilitate speech production and articulation. This demands audiovisual processing both in speech and language treatment and in the diagnosis of oral-facial apraxia. The purpose of this study was to investigate differences in audiovisual perception of speech as compared to non-speech oral gestures. Bimodal and unimodal speech and non-speech items were used and additionally discordant stimuli constructed, which were presented for imitation. This study examined a group of healthy volunteers and a group of patients with lesions of the left hemisphere. Patients made substantially more errors than controls, but the factors influencing imitation accuracy were more or less the same in both groups. Error analyses in both groups suggested different types of representations for speech as compared to the non-speech domain, with speech having a stronger weight on the auditory modality and non-speech processing on the visual modality. Additionally, this study was able to show that the McGurk effect is not limited to speech.
Distributed Fusion in Sensor Networks with Information Genealogy
2011-06-28
[Fragmentary OCR text from a scanned report. Recoverable content: application areas including image processing [2], acoustic and speech recognition [3], multitarget tracking [4], distributed fusion [5], and Bayesian inference [6-7]; a citation to a paper on adaptation for distant-talking speech recognition in Proc. Acoustics, Speech, and Signal Processing, 2004; a citation [4] to Y. Bar-Shalom and T. E. Fortmann; and a statement that classifiers used in speech recognition and other classification applications [8] have seen limited use in underwater mine classification, which this paper addresses.]
Electrocardiographic anxiety profiles improve speech anxiety.
Kim, Pyoung Won; Kim, Seung Ae; Jung, Keun-Hwa
2012-12-01
The present study set out to determine the effect of electrocardiographic (ECG) feedback on performance in speech anxiety. Forty-six high school students participated in a speech performance educational program. They were randomly divided into two groups, an experimental group with ECG feedback (N = 21) and a control group (N = 25). Feedback was given with video recording in the control group, whereas the experimental group additionally received ECG feedback. Speech performance was evaluated by the Korean Broadcasting System (KBS) speech ability test, which assesses 10 different speaking categories. ECG was recorded during rest and speech, together with a video recording of the speech performance. Changes in R-R intervals were used to reflect anxiety profiles. Three trials were performed over the 3-week program. Results showed that the subjects with ECG feedback revealed a significant improvement in speech performance and anxiety states compared to those in the control group. These findings suggest that visualization of the anxiety profile feedback with ECG can be a better cognitive therapeutic strategy in speech anxiety.
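R-R intervals of the kind used here as an anxiety index can be extracted from an ECG trace with a simple threshold-based R-peak detector (a toy sketch; clinical systems use more robust detectors such as Pan-Tompkins):

```python
import numpy as np

def rr_intervals(ecg, fs, thresh=0.6, refractory=0.25):
    # Detect R peaks as local maxima above a fraction of the global
    # maximum, enforcing a refractory period so one heartbeat cannot
    # trigger twice; return successive R-R intervals in seconds
    level = thresh * np.max(ecg)
    min_gap = int(refractory * fs)
    peaks, last = [], -min_gap
    for i in range(1, len(ecg) - 1):
        if (ecg[i] >= level and ecg[i] > ecg[i - 1]
                and ecg[i] >= ecg[i + 1] and i - last >= min_gap):
            peaks.append(i)
            last = i
    return np.diff(peaks) / fs
```

Shortened and less variable R-R intervals during speech relative to rest would indicate elevated arousal, which is the profile the feedback visualized.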
Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang
2018-04-04
Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to healthy listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed an altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its function of suppressing masking-speech signals.
Factors contributing to speech perception scores in long-term pediatric cochlear implant users.
Davidson, Lisa S; Geers, Ann E; Blamey, Peter J; Tobey, Emily A; Brenner, Christine A
2011-02-01
The objectives of this report are to (1) describe the speech perception abilities of long-term pediatric cochlear implant (CI) recipients by comparing scores obtained at elementary school (CI-E, 8 to 9 yrs) with scores obtained at high school (CI-HS, 15 to 18 yrs); (2) evaluate speech perception abilities in demanding listening conditions (i.e., noise and lower intensity levels) at adolescence; and (3) examine the relation of speech perception scores to speech and language development over this longitudinal timeframe. All 112 teenagers were part of a previous nationwide study of 8- and 9-yr-olds (N = 181) who received a CI between 2 and 5 yrs of age. The test battery included (1) the Lexical Neighborhood Test (LNT; hard and easy word lists); (2) the Bamford Kowal Bench sentence test; (3) the Children's Auditory-Visual Enhancement Test; (4) the Test of Auditory Comprehension of Language at CI-E; (5) the Peabody Picture Vocabulary Test at CI-HS; and (6) the McGarr sentences (consonants correct) at CI-E and CI-HS. CI-HS speech perception was measured in both optimal and demanding listening conditions (i.e., background noise and low-intensity level). Speech perception scores were compared based on age at test, lexical difficulty of stimuli, listening environment (optimal and demanding), input mode (visual and auditory-visual), and language age. All group mean scores significantly increased with age across the two test sessions. Scores of adolescents significantly decreased in demanding listening conditions. The effect of lexical difficulty on the LNT scores, as evidenced by the difference in performance between easy versus hard lists, increased with age and decreased for adolescents in challenging listening conditions. 
Calculated curves for percent correct speech perception scores (LNT and Bamford Kowal Bench) and consonants correct on the McGarr sentences plotted against age-equivalent language scores on the Test of Auditory Comprehension of Language and Peabody Picture Vocabulary Test achieved asymptote at similar ages, around 10 to 11 yrs. On average, children receiving CIs between 2 and 5 yrs of age exhibited significant improvement on tests of speech perception, lipreading, speech production, and language skills measured between primary grades and adolescence. Evidence suggests that improvement in speech perception scores with age reflects increased spoken language level up to a language age of about 10 yrs. Speech perception performance significantly decreased with softer stimulus intensity level and with introduction of background noise. Upgrades to newer speech processing strategies and greater use of frequency-modulated systems may be beneficial for ameliorating performance under these demanding listening conditions.
Strategies for distant speech recognition in reverberant environments
NASA Astrophysics Data System (ADS)
Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro
2015-12-01
Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
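The linear prediction-based dereverberation in this SE front-end can be illustrated with a minimal single-channel sketch, applied independently per STFT frequency bin. This is a simplified, unweighted relative of the approach described (the actual system is multichannel and iteratively re-weighted); the function and its parameters are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def delayed_linear_prediction_dereverb(stft_frames, order=10, delay=3):
    """Single-channel dereverberation sketch for one STFT frequency bin.

    stft_frames: complex STFT coefficients over time (shape [T]).
    Predicts the late reverberation of each frame from frames at least
    `delay` steps in the past, then subtracts that prediction.
    """
    T = len(stft_frames)
    # Regression matrix: row t holds the delayed past frames t-delay .. t-delay-order+1
    X = np.zeros((T, order), dtype=complex)
    for t in range(T):
        for k in range(order):
            idx = t - delay - k
            if idx >= 0:
                X[t, k] = stft_frames[idx]
    # Least-squares prediction filter (one pass, no variance weighting)
    g, *_ = np.linalg.lstsq(X, stft_frames, rcond=None)
    # Dereverberated signal = observation minus predicted late reverberation
    return stft_frames - X @ g
```

The prediction delay keeps the filter from cancelling the direct sound and early reflections; that delay is the key design choice distinguishing dereverberation from ordinary linear-prediction whitening.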
So, Wing-Chee; Yi-Feng, Alvan Low; Yap, De-Fu; Kheng, Eugene; Yap, Ju-Min Melvin
2013-01-01
Previous studies have shown that iconic gestures presented in an isolated manner prime visually presented semantically related words. Since gestures and speech are almost always produced together, this study examined whether iconic gestures accompanying speech would prime words, and compared the priming effect of iconic gestures with speech to that of iconic gestures presented alone. Adult participants (N = 180) were randomly assigned to one of three conditions in a lexical decision task: Gestures-Only (the primes were iconic gestures presented alone); Speech-Only (the primes were auditory tokens conveying the same meaning as the iconic gestures); Gestures-Accompanying-Speech (the primes were the simultaneous coupling of iconic gestures and their corresponding auditory tokens). Our findings revealed significant priming effects in all three conditions. However, the priming effect in the Gestures-Accompanying-Speech condition was comparable to that in the Speech-Only condition and was significantly weaker than that in the Gestures-Only condition, suggesting that the facilitatory effect of iconic gestures accompanying speech may be constrained by the level of language processing required in the lexical decision task, where linguistic processing of word forms is more dominant than semantic processing. Hence, the priming effect afforded by the co-speech iconic gestures was weakened. PMID:24155738
Potential interactions among linguistic, autonomic, and motor factors in speech.
Kleinow, Jennifer; Smith, Anne
2006-05-01
Though anecdotal reports link certain speech disorders to increases in autonomic arousal, few studies have described the relationship between arousal and speech processes. Additionally, it is unclear how increases in arousal may interact with other cognitive-linguistic processes to affect speech motor control. In this experiment we examine potential interactions between autonomic arousal, linguistic processing, and speech motor coordination in adults and children. Autonomic responses (heart rate, finger pulse volume, tonic skin conductance, and phasic skin conductance) were recorded simultaneously with upper and lower lip movements during speech. The lip aperture variability (LA variability index) across multiple repetitions of sentences that varied in length and syntactic complexity was calculated under low- and high-arousal conditions. High arousal conditions were elicited by performance of the Stroop color word task. Children had significantly higher lip aperture variability index values across all speaking tasks, indicating more variable speech motor coordination. Increases in syntactic complexity and utterance length were associated with increases in speech motor coordination variability in both speaker groups. There was a significant effect of Stroop task, which produced increases in autonomic arousal and increased speech motor variability in both adults and children. These results provide novel evidence that high arousal levels can influence speech motor control in both adults and children. (c) 2006 Wiley Periodicals, Inc.
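The lip aperture (LA) variability index used here is computed in the spirit of a spatiotemporal index over repeated productions of the same sentence. The sketch below assumes that formulation (amplitude and time normalization, followed by summed across-repetition standard deviations); the function name and parameters are illustrative:

```python
import numpy as np

def la_variability_index(trajectories, n_points=50):
    """Spatiotemporal-index-style variability measure (a sketch).

    trajectories: list of 1-D lip-aperture records for repetitions of the
    same sentence, possibly of different lengths. Each record is
    amplitude-normalized (z-scored) and linearly time-normalized onto a
    common relative time axis; the index is the sum of the
    across-repetition standard deviations at n_points time points.
    """
    normalized = []
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        z = (traj - traj.mean()) / traj.std()          # amplitude normalization
        t_old = np.linspace(0.0, 1.0, len(z))
        t_new = np.linspace(0.0, 1.0, n_points)
        normalized.append(np.interp(t_new, t_old, z))  # time normalization
    stacked = np.vstack(normalized)
    return float(np.sum(stacked.std(axis=0)))
```

Identical repetitions yield an index of zero; more variable speech motor coordination across repetitions yields a larger index.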
Role of Visual Speech in Phonological Processing by Children With Hearing Loss
Jerger, Susan; Tye-Murray, Nancy; Abdi, Hervé
2011-01-01
Purpose This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in place of articulation, or conflicting in voicing—for example, the picture “pizza” coupled with the distractors “peach,” “teacher,” or “beast,” respectively. Speed of picture naming was measured. Results The conflicting conditions slowed naming, and phonological processing by children with HL displayed the age-related shift in sensitivity to visual speech seen in children with NH, although with developmental delay. Younger children with HL exhibited a disproportionately large influence of visual speech and a negligible influence of auditory speech, whereas older children with HL showed a robust influence of auditory speech with no benefit to performance from adding visual speech. The congruent conditions did not speed naming in children with HL, nor did the addition of visual speech influence performance. Unexpectedly, the /ʌ/-vowel congruent distractors slowed naming in children with HL and decreased articulatory proficiency. Conclusions Results for the conflicting conditions are consistent with the hypothesis that speech representations in children with HL (a) are initially disproportionally structured in terms of visual speech and (b) become better specified with age in terms of auditorily encoded information. PMID:19339701
Shin, Young Hoon; Seo, Jiwon
2016-01-01
People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
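The paper's specific detection algorithm is not reproduced here, but a generic energy-threshold sketch conveys how speech segments might be selected automatically from a continuous radar feature stream. Everything below (function name, parameters, thresholds) is an illustrative assumption:

```python
import numpy as np

def detect_speech_segments(feature_signal, frame_len=20,
                           threshold_ratio=3.0, min_frames=2):
    """Toy speech-activity detector for a 1-D radar motion feature.

    Frames the signal, flags frames whose variance exceeds a multiple of
    the noise floor (median frame variance, assuming speech is sparse in
    time), and merges consecutive active frames into (start, end)
    sample-index segments.
    """
    n_frames = len(feature_signal) // frame_len
    frames = np.reshape(feature_signal[:n_frames * frame_len],
                        (n_frames, frame_len))
    energy = np.var(frames, axis=1)
    noise_floor = np.median(energy)          # baseline when speech is sparse
    active = energy > threshold_ratio * noise_floor
    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_len, i * frame_len))
            start = None
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

Recognition would then run only on the returned segments, mirroring the paper's strategy of processing speech only when activity is detected.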
Research in speech communication.
Flanagan, J
1995-10-24
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker.
Poor Speech Perception Is Not a Core Deficit of Childhood Apraxia of Speech: Preliminary Findings
ERIC Educational Resources Information Center
Zuk, Jennifer; Iuzzini-Seigel, Jenya; Cabbage, Kathryn; Green, Jordan R.; Hogan, Tiffany P.
2018-01-01
Purpose: Childhood apraxia of speech (CAS) is hypothesized to arise from deficits in speech motor planning and programming, but the influence of abnormal speech perception in CAS on these processes is debated. This study examined speech perception abilities among children with CAS with and without language impairment compared to those with…
Brumberg, Jonathan S; Krusienski, Dean J; Chakrabarti, Shreya; Gunduz, Aysegul; Brunner, Peter; Ritaccio, Anthony L; Schalk, Gerwin
2016-01-01
How the human brain plans, executes, and monitors continuous and fluent speech has remained largely elusive. For example, previous research has defined the cortical locations most important for different aspects of speech function, but has not yet yielded a definition of the temporal progression of involvement of those locations as speech progresses either overtly or covertly. In this paper, we uncovered the spatio-temporal evolution of neuronal population-level activity related to continuous overt speech, and identified those locations that shared activity characteristics across overt and covert speech. Specifically, we asked subjects to repeat continuous sentences aloud or silently while we recorded electrical signals directly from the surface of the brain (electrocorticography (ECoG)). We then determined the relationship between cortical activity and speech output across different areas of cortex and at sub-second timescales. The results highlight a spatio-temporal progression of cortical involvement in the continuous speech process that initiates utterances in frontal-motor areas and ends with the monitoring of auditory feedback in superior temporal gyrus. Direct comparison of cortical activity related to overt versus covert conditions revealed a common network of brain regions involved in speech that may implement orthographic and phonological processing. Our results provide one of the first characterizations of the spatiotemporal electrophysiological representations of the continuous speech process, and also highlight the common neural substrate of overt and covert speech. These results thereby contribute to a refined understanding of speech functions in the human brain.
Evidence of degraded representation of speech in noise, in the aging midbrain and cortex
Simon, Jonathan Z.; Anderson, Samira
2016-01-01
Humans have a remarkable ability to track and understand speech in unfavorable conditions, such as in background noise, but speech understanding in noise does deteriorate with age. Results from several studies have shown that in younger adults, low-frequency auditory cortical activity reliably synchronizes to the speech envelope, even when the background noise is considerably louder than the speech signal. However, cortical speech processing may be limited by age-related decreases in the precision of neural synchronization in the midbrain. To understand better the neural mechanisms contributing to impaired speech perception in older adults, we investigated how aging affects midbrain and cortical encoding of speech when presented in quiet and in the presence of a single-competing talker. Our results suggest that central auditory temporal processing deficits in older adults manifest in both the midbrain and in the cortex. Specifically, midbrain frequency following responses to a speech syllable are more degraded in noise in older adults than in younger adults. This suggests a failure of the midbrain auditory mechanisms needed to compensate for the presence of a competing talker. Similarly, in cortical responses, older adults show larger reductions than younger adults in their ability to encode the speech envelope when a competing talker is added. Interestingly, older adults showed an exaggerated cortical representation of speech in both quiet and noise conditions, suggesting a possible imbalance between inhibitory and excitatory processes, or diminished network connectivity that may impair their ability to encode speech efficiently. PMID:27535374
Using speech for mode selection in control of multifunctional myoelectric prostheses.
Fang, Peng; Wei, Zheng; Geng, Yanjuan; Yao, Fuan; Li, Guanglin
2013-01-01
Electromyogram (EMG) signals recorded from residual muscles of limbs are considered suitable control information for motorized prostheses. However, in cases of high-level amputation, the residual muscles are usually limited and may not provide enough EMG for flexible control of myoelectric prostheses with multiple degrees of freedom of movement. Here, we proposed a control strategy in which speech signals were used as additional information and combined with the EMG signals to realize more flexible control of multifunctional prostheses. Replacing the traditional "sequential mode-switching (joint-switching)", the speech signals were used to select a mode (joint) of the prosthetic arm, and then the EMG signals were applied to determine a motion class involved in the selected joint and to execute the motion. Preliminary results from three able-bodied subjects and one transhumeral amputee demonstrated that the proposed strategy could achieve a high mode-selection rate and enhance operation efficiency, suggesting the strategy may improve the control performance of commercial myoelectric prostheses.
Role of the Speech-Language Pathologist (SLP) in the Head and Neck Cancer Team.
Hansen, Kelly; Chenoweth, Marybeth; Thompson, Heather; Strouss, Alexandra
2018-01-01
While treatments for head and neck cancer are aimed at curing patients from disease, they can have significant short- and long-term negative impacts on speech and swallowing functions. Research demonstrates that early and frequent involvement of Speech-Language Pathologists (SLPs) is beneficial to these functions and overall quality of life for head and neck cancer patients. Strategies and tools to optimize communication and safe swallowing are presented in this chapter.
High-frame-rate full-vocal-tract 3D dynamic speech imaging.
Fu, Maojing; Barlaz, Marissa S; Holtrop, Joseph L; Perry, Jamie L; Kuehn, David P; Shosted, Ryan K; Liang, Zhi-Pei; Sutton, Bradley P
2017-04-01
To achieve high temporal frame rate, high spatial resolution and full-vocal-tract coverage for three-dimensional dynamic speech MRI by using low-rank modeling and sparse sampling. Three-dimensional dynamic speech MRI is enabled by integrating a novel data acquisition strategy and an image reconstruction method with the partial separability model: (a) a self-navigated sparse sampling strategy that accelerates data acquisition by collecting high-nominal-frame-rate cone navigators and imaging data within a single repetition time, and (b) a reconstruction method that recovers high-quality speech dynamics from sparse (k,t)-space data by enforcing joint low-rank and spatiotemporal total variation constraints. The proposed method has been evaluated through in vivo experiments. A nominal temporal frame rate of 166 frames per second (defined based on a repetition time of 5.99 ms) was achieved for an imaging volume covering the entire vocal tract with a spatial resolution of 2.2 × 2.2 × 5.0 mm³. Practical utility of the proposed method was demonstrated via both validation experiments and a phonetics investigation. Three-dimensional dynamic speech imaging is possible with full-vocal-tract coverage, high spatial resolution and high nominal frame rate to provide dynamic speech data useful for phonetic studies. Magn Reson Med 77:1619-1629, 2017. © 2016 International Society for Magnetic Resonance in Medicine.
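The partial separability model underlying the reconstruction represents the dynamic image series as a low-rank Casorati matrix (voxels by frames) factored into spatial and temporal components. A minimal sketch of that core idea, omitting the sparse (k,t)-space sampling and total-variation constraints of the actual method:

```python
import numpy as np

def partial_separability_approx(casorati, rank):
    """Rank-L partial-separability approximation of a dynamic image series.

    casorati: (n_voxels, n_frames) matrix whose columns are vectorized
    image frames. Returns spatial factors U (n_voxels, L) and temporal
    factors V (L, n_frames) such that U @ V is the best rank-L
    approximation in the least-squares sense (via truncated SVD).
    """
    Uf, s, Vt = np.linalg.svd(casorati, full_matrices=False)
    U = Uf[:, :rank] * s[:rank]   # spatial coefficient maps, scaled
    V = Vt[:rank, :]              # temporal basis functions
    return U, V
```

In the full method the temporal basis is estimated from densely sampled navigator data and the spatial coefficients from the sparsely sampled imaging data, which is what makes the aggressive undersampling recoverable.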
The effect of simultaneous text on the recall of noise-degraded speech.
Grossman, Irina; Rajan, Ramesh
2017-05-01
Written and spoken language utilize the same processing system, enabling text to modulate speech processing. We investigated how simultaneously presented text affected speech recall in babble noise using a retrospective recall task. Participants were presented with text-speech sentence pairs in multitalker babble noise and then prompted to recall what they heard or what they read. In Experiment 1, sentence pairs were either congruent or incongruent and were presented in silence or at 1 of 4 noise levels. Audio and Visual control groups were also tested with sentences presented in only 1 modality. Congruent text facilitated accurate recall of degraded speech; incongruent text had no effect. Text and speech were seldom confused for each other. An analysis of language background found that monolingual English speakers outperformed early multilinguals at recalling degraded speech; however, the effects of text on speech processing were analogous. Experiment 2 examined whether the benefit provided by matching text was maintained when the congruency of the text and speech became more ambiguous, through the addition of partially mismatching text-speech sentence pairs that differed only in their final keyword and through the use of low signal-to-noise ratios. The experiment focused on monolingual English speakers; the results showed that even though participants commonly confused text for speech during incongruent text-speech pairings, these confusions could not fully account for the benefit provided by matching text. Thus, we uniquely demonstrate that congruent text benefits the recall of noise-degraded speech. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Talking to children matters: Early language experience strengthens processing and builds vocabulary
Weisleder, Adriana; Fernald, Anne
2016-01-01
Infants differ substantially in their rates of language growth, and slower growth predicts later academic difficulties. This study explored how the amount of speech to infants in Spanish-speaking families low in socioeconomic status (SES) influenced the development of children's skill in real-time language processing and vocabulary learning. All-day recordings of parent-infant interactions at home revealed striking variability among families in how much speech caregivers addressed to their child. Infants who experienced more child-directed speech became more efficient in processing familiar words in real time and had larger expressive vocabularies by 24 months, although speech simply overheard by the child was unrelated to vocabulary outcomes. Mediation analyses showed that the effect of child-directed speech on expressive vocabulary was explained by infants’ language-processing efficiency, suggesting that richer language experience strengthens processing skills that facilitate language growth. PMID:24022649
“Down the Language Rabbit Hole with Alice”: A Case Study of a Deaf Girl with a Cochlear Implant
Andrews, Jean F.; Dionne, Vickie
2011-01-01
Alice, a deaf girl who received a cochlear implant after age three, was exposed to four weeks of storybook sessions conducted in American Sign Language (ASL) and speech (English). Two research questions were addressed: (1) how did she use her sign bimodal/bilingualism, code-switching, and code-mixing during reading activities, and (2) what sign bilingual code-switching and code-mixing strategies did she use while attending to stories delivered under two treatments: ASL only and speech only. Retelling scores were collected to determine the type and frequency of her code-switching/code-mixing strategies between both languages after Alice was read a story in ASL and in spoken English. Qualitative descriptive methods were utilized. Teacher, clinician and student transcripts of the reading and retelling sessions were recorded. Results showed Alice frequently used code-switching and code-mixing strategies while retelling the stories under both treatments. Her speech production increased in retellings of the stories under both the ASL storyreading and the spoken English-only reading. The ASL storyreading did not decrease Alice's retelling scores in spoken English. Professionals are encouraged to consider the benefits of early sign bimodal/bilingualism to enhance the overall speech, language and reading proficiency of deaf children with cochlear implants. PMID:22135677
Current Controversies in Diagnosis and Management of Cleft Palate and Velopharyngeal Insufficiency
Ysunza, Pablo Antonio; Repetto, Gabriela M.; Pamplona, Maria Carmen; Calderon, Juan F.; Shaheen, Kenneth; Chaiyasate, Konkgrit; Rontal, Matthew
2015-01-01
Background. One of the most controversial topics concerning cleft palate is the diagnosis and treatment of velopharyngeal insufficiency (VPI). Objective. This paper reviews current genetic aspects of cleft palate, imaging diagnosis of VPI, the planning of operations for restoring velopharyngeal function during speech, and strategies for speech pathology treatment of articulation disorders in patients with cleft palate. Materials and Methods. An updated review of the scientific literature concerning genetic aspects of cleft palate was carried out. Current strategies for assessing and treating articulation disorders associated with cleft palate were analyzed. Imaging procedures for assessing velopharyngeal closure during speech were reviewed, including a recent method for performing intraoperative videonasopharyngoscopy. Results. Conclusions from the analysis of genetic aspects of syndromic and nonsyndromic cleft palate and their use in its diagnosis and management are presented. Strategies for classifying and treating articulation disorders in patients with cleft palate are presented. Preliminary results are presented on the use of multiplanar videofluoroscopy as an outpatient procedure and of intraoperative endoscopy for planning operations aimed at correcting VPI. Conclusion. This paper presents current aspects of the diagnosis and management of patients with cleft palate and VPI, covering 3 main aspects: genetics and genomics, speech pathology and imaging diagnosis, and surgical management. PMID:26273595
Classroom acoustics and intervention strategies to enhance the learning environment
NASA Astrophysics Data System (ADS)
Savage, Christal
The classroom can be an acoustically difficult environment in which to learn effectively, due in part to poor acoustical properties. Noise and reverberation have a substantial influence on room acoustics and consequently on the intelligibility of speech. The American Speech-Language-Hearing Association (ASHA, 1995) developed minimal standards for noise and reverberation in a classroom for the purpose of providing an adequate listening environment. A lack of adherence to these standards may have undesirable consequences, which may lead to poor academic performance. The purpose of this capstone project is to develop a protocol to measure the acoustical properties of reverberation time and noise levels in elementary classrooms and to present educators with strategies to improve the learning environment. Noise level and reverberation will be measured and recorded in seven unoccupied third-grade classrooms in Lincoln Parish in North Louisiana. The recordings will occur at six specific distances in the classroom to simulate teacher and student positions. The recordings will be compared to the American Speech-Language-Hearing Association standards for noise and reverberation. If discrepancies are observed, the primary investigator will serve as an auditory consultant for the school and educators, recommending remediation and intervention strategies to improve these acoustical properties. The hypothesis of the study is that the classroom acoustical properties of noise and reverberation will exceed the American Speech-Language-Hearing Association standards; the auditory consultant will therefore provide strategies to improve those acoustical properties.
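Reverberation time in a measurement protocol like this is typically estimated from a recorded room impulse response via Schroeder backward integration. A minimal sketch under that assumption (a T20-based RT60 estimate; the fit range and names are illustrative, and the project's actual instrumentation is not specified):

```python
import numpy as np

def rt60_from_impulse_response(ir, fs):
    """Estimate reverberation time (RT60) via Schroeder backward integration.

    Integrates the squared impulse response from the end backwards to get
    the energy decay curve, fits a line to the -5 dB to -25 dB portion,
    and extrapolates the time required for a 60 dB decay.
    """
    energy = np.cumsum(ir[::-1] ** 2)[::-1]        # Schroeder decay curve
    edc_db = 10.0 * np.log10(energy / energy[0])
    mask = (edc_db <= -5.0) & (edc_db >= -25.0)    # fit region for T20
    t = np.arange(len(ir)) / fs
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)
    return -60.0 / slope                            # seconds per 60 dB decay
```

The resulting estimate can then be compared against the adopted classroom criterion (commonly cited reverberation-time limits for classrooms are on the order of 0.6 s or less).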
Selective spatial attention modulates bottom-up informational masking of speech
Carlile, Simon; Corkhill, Caitlin
2015-01-01
To hear out a conversation against other talkers, listeners must overcome energetic and informational masking. Although largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold, 40% of which was attributed to a release from informational masking. When the across-frequency temporal modulations in the masker talkers are decorrelated, the speech is unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to involve top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations, and show that this presumably low-level process can be modulated by selective spatial attention. PMID:25727100
Noise suppression methods for robust speech processing
NASA Astrophysics Data System (ADS)
Boll, S. F.; Ravindra, H.; Randall, G.; Armantrout, R.; Power, R.
1980-05-01
Robust speech processing in practical operating environments requires effective suppression of environmental and processor noise. This report describes the technical findings and accomplishments during this reporting period for the research program funded to develop real-time, compressed speech analysis-synthesis algorithms whose performance is invariant under signal contamination. Fulfillment of this requirement is necessary to ensure reliable, secure compressed speech transmission within realistic military command and control environments. Overall contributions resulting from this research program include an understanding of how environmental noise degrades narrowband coded speech, development of appropriate real-time noise suppression algorithms, and development of speech parameter identification methods that treat signal contamination as a fundamental element of the estimation process. This report describes the current research and results in the areas of noise suppression using dual-input adaptive noise cancellation and short-time Fourier transform algorithms, articulation rate change techniques, and an experiment which demonstrated that the spectral subtraction noise suppression algorithm can improve the intelligibility of 2400 bps, LPC-10 coded helicopter speech by 10.6 points.
Phonemic Characteristics of Apraxia of Speech Resulting from Subcortical Hemorrhage
ERIC Educational Resources Information Center
Peach, Richard K.; Tonkovich, John D.
2004-01-01
Reports describing subcortical apraxia of speech (AOS) have received little consideration in the development of recent speech processing models because the speech characteristics of patients with this diagnosis have not been described precisely. We describe a case of AOS with aphasia secondary to basal ganglia hemorrhage. Speech-language symptoms…
The Effectiveness of Clear Speech as a Masker
ERIC Educational Resources Information Center
Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.
2010-01-01
Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…
On the Nature of Speech Science.
ERIC Educational Resources Information Center
Peterson, Gordon E.
In this article the nature of the discipline of speech science is considered, and the various basic and applied areas of the discipline are discussed. The basic areas encompass the various processes of the physiology of speech production, the acoustical characteristics of speech, including the speech wave types and the information-bearing acoustic…
Ding, Nai; Pan, Xunyi; Luo, Cheng; Su, Naifei; Zhang, Wen; Zhang, Jianfeng
2018-01-31
How the brain groups sequential sensory events into chunks is a fundamental question in cognitive neuroscience. This study investigates whether top-down attention or specific tasks are required for the brain to apply lexical knowledge to group syllables into words. Neural responses tracking the syllabic and word rhythms of a rhythmic speech sequence were concurrently monitored using electroencephalography (EEG). The participants performed different tasks, attending to either the rhythmic speech sequence or a distractor, which was another speech stream or a nonlinguistic auditory/visual stimulus. Attention to speech, but not a lexical-meaning-related task, was required for reliable neural tracking of words, even when the distractor was a nonlinguistic stimulus presented cross-modally. Neural tracking of syllables, however, was reliably observed in all tested conditions. These results strongly suggest that neural encoding of individual auditory events (i.e., syllables) is automatic, while knowledge-based construction of temporal chunks (i.e., words) crucially relies on top-down attention. SIGNIFICANCE STATEMENT Why we cannot understand speech when not paying attention is an old question in psychology and cognitive neuroscience. Speech processing is a complex process that involves multiple stages, e.g., hearing and analyzing the speech sound, recognizing words, and combining words into phrases and sentences. The current study investigates which speech-processing stage is blocked when we do not listen carefully. We show that the brain can reliably encode syllables, basic units of speech sounds, even when we do not pay attention. Nevertheless, when distracted, the brain cannot group syllables into multisyllabic words, which are basic units for speech meaning. Therefore, the process of converting speech sound into meaning crucially relies on attention. Copyright © 2018 the authors.
NASA Technical Reports Server (NTRS)
Wolf, Jared J.
1977-01-01
The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication systems; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems, are also described.
Speech processing: from peripheral to hemispheric asymmetry of the auditory system.
Lazard, Diane S; Collette, Jean-Louis; Perrot, Xavier
2012-01-01
Language processing from the cochlea to auditory association cortices shows side-dependent specificities with an apparent left hemispheric dominance. The aim of this article was to offer nonspeech specialists a didactic review of two complementary theories about hemispheric asymmetry in speech processing. Starting from anatomico-physiological and clinical observations of auditory asymmetry and interhemispheric connections, this review then presents behavioral (dichotic listening paradigm) as well as functional (functional magnetic resonance imaging and positron emission tomography) experiments that assessed hemispheric specialization for speech processing. Even though speech at an early phonological level is regarded as being processed bilaterally, a left-hemispheric dominance exists for higher-level processing. This asymmetry may arise from a segregation of the speech signal, broken apart within nonprimary auditory areas in two distinct temporal integration windows--a fast one on the left and a slower one on the right--as modeled by the asymmetric sampling in time theory; or from a spectro-temporal trade-off, with a higher temporal resolution in the left hemisphere and a higher spectral resolution in the right hemisphere, as modeled by the spectral/temporal resolution trade-off theory. Both theories deal with the concept that lower-order tuning principles for the acoustic signal might drive higher-order organization for speech processing. However, the precise nature, mechanisms, and origin of speech processing asymmetry are still being debated. Finally, an example of hemispheric asymmetry alteration, which has direct clinical implications, is given through the case of auditory aging, which mixes peripheral disorder with modifications of central processing. Copyright © 2011 The American Laryngological, Rhinological, and Otological Society, Inc.
Testing and Beyond: Strategies and Tools for Evaluating and Assessing Infants and Toddlers
ERIC Educational Resources Information Center
Crais, Elizabeth R.
2011-01-01
Purpose: This article is a condensation of the recent American Speech-Language-Hearing Association (ASHA) document entitled "Roles and Responsibilities of Speech-Language Pathologists in Early Intervention: Guidelines" (ASHA, 2008). The article presents information on recommended and evidence-based practices related to the screening, evaluation,…
The Pardoning of Richard Nixon: A Failure in Motivational Strategy.
ERIC Educational Resources Information Center
Klumpp, James F.; Lukehart, Jeffrey K.
1978-01-01
Discusses the failure of Gerald Ford's speech pardoning Richard Nixon and the relationship between moral and legal themes. The speech failed because Ford did not effectively combine these themes into a single compatible perspective from which his pardon became the appropriate response to the situation. (JMF)
Business Speaks: A Study of the Themes in Speeches by America's Corporate Leaders.
ERIC Educational Resources Information Center
Myers, Robert J.; Kessler, Martha Stout
1980-01-01
Identifies the issues concerning the American business community as reflected in speeches by corporate leaders on government regulation, energy, capital investment, inflation, the public image of business, and international business. Strategies for dealing with these issues include increased social responsibility, influencing government policy,…
Cerebral Palsy and Communication--What Parents Can Do.
ERIC Educational Resources Information Center
Golbin, Arlene, Ed.
Intended for parents of cerebral palsied children, the manual discusses special communication problems that often accompany the condition, and describes various strategies for helping such children communicate. A chapter on positioning for speech diagrams 14 different positions to help facilitate better functioning in many areas, including speech.…
Vowel Space Characteristics of Speech Directed to Children With and Without Hearing Loss
Wieland, Elizabeth A.; Burnham, Evamarie B.; Kondaurova, Maria; Bergeson, Tonya R.
2015-01-01
Purpose This study examined vowel characteristics in adult-directed (AD) and infant-directed (ID) speech to children with hearing impairment who received cochlear implants or hearing aids compared with speech to children with normal hearing. Method Mothers' AD and ID speech to children with cochlear implants (Study 1, n = 20) or hearing aids (Study 2, n = 11) was compared with mothers' speech to controls matched on age and hearing experience. The first and second formants of vowels /i/, /ɑ/, and /u/ were measured, and vowel space area and dispersion were calculated. Results In both studies, vowel space was modified in ID compared with AD speech to children with and without hearing loss. Study 1 showed larger vowel space area and dispersion in ID compared with AD speech regardless of infant hearing status. The pattern of effects of ID and AD speech on vowel space characteristics in Study 2 was similar to that in Study 1, but depended partly on children's hearing status. Conclusion Given previously demonstrated associations between expanded vowel space in ID compared with AD speech and enhanced speech perception skills, this research supports a focus on vowel pronunciation in developing intervention strategies for improving speech-language skills in children with hearing impairment. PMID:25658071
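The vowel space measures reported here are standard geometric quantities over (F1, F2) points: the area of the polygon formed by the corner vowels and the mean distance of the vowels from their centroid. A minimal sketch follows; the formant values in the example are hypothetical, not the study's data.

```python
import math

def vowel_space_area(formants):
    """Area of the polygon whose vertices are the (F1, F2) points of the
    corner vowels, in listed order, via the shoelace formula."""
    pts = list(formants.values())
    s = 0.0
    for i in range(len(pts)):
        x1, y1 = pts[i]
        x2, y2 = pts[(i + 1) % len(pts)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def vowel_dispersion(formants):
    """Mean Euclidean distance of each vowel from the vowel-space centroid."""
    pts = list(formants.values())
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return sum(math.hypot(p[0] - cx, p[1] - cy) for p in pts) / len(pts)

# Hypothetical mean (F1, F2) values in Hz for the three corner vowels.
ad_speech = {"i": (300, 2300), "a": (750, 1200), "u": (320, 900)}
print(vowel_space_area(ad_speech), vowel_dispersion(ad_speech))
```

Comparing these two numbers between AD and ID recordings quantifies the vowel space expansion the study describes.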
Cohesive and coherent connected speech deficits in mild stroke.
Barker, Megan S; Young, Breanne; Robinson, Gail A
2017-05-01
Spoken language production theories and lesion studies highlight several important prelinguistic conceptual preparation processes involved in the production of cohesive and coherent connected speech. Cohesion and coherence broadly connect sentences with preceding ideas and the overall topic. Broader cognitive mechanisms may mediate these processes. This study aims to investigate (1) whether stroke patients without aphasia exhibit impairments in cohesion and coherence in connected speech, and (2) the role of attention and executive functions in the production of connected speech. Eighteen stroke patients (8 right hemisphere stroke [RHS]; 6 left [LHS]) and 21 healthy controls completed two self-generated narrative tasks to elicit connected speech. A multi-level analysis of within and between-sentence processing ability was conducted. Cohesion and coherence impairments were found in the stroke group, particularly RHS patients, relative to controls. In the whole stroke group, better performance on the Hayling Test of executive function, which taps verbal initiation/suppression, was related to fewer propositional repetitions and global coherence errors. Better performance on attention tasks was related to fewer propositional repetitions, and decreased global coherence errors. In the RHS group, aspects of cohesive and coherent speech were associated with better performance on attention tasks. Better Hayling Test scores were related to more cohesive and coherent speech in RHS patients, and more coherent speech in LHS patients. Thus, we documented connected speech deficits in a heterogeneous stroke group without prominent aphasia. Our results suggest that broader cognitive processes may play a role in producing connected speech at the early conceptual preparation stage. Copyright © 2017 Elsevier Inc. All rights reserved.
Results using the OPAL strategy in Mandarin speaking cochlear implant recipients.
Vandali, Andrew E; Dawson, Pam W; Arora, Komal
2017-01-01
To evaluate the effectiveness of an experimental pitch-coding strategy for improving recognition of Mandarin lexical tone in cochlear implant (CI) recipients. Adult CI recipients were tested on recognition of Mandarin tones in quiet and in speech-shaped noise at a signal-to-noise ratio of +10 dB; Mandarin sentence speech-reception threshold (SRT) in speech-shaped noise; and pitch discrimination of synthetic complex-harmonic tones in quiet. Two versions of the experimental strategy were examined: OPAL, with linear (1:1) mapping of fundamental frequency (F0) to the coded modulation rate; and OPAL+, with transposed mapping of high F0s to a lower coded rate. Outcomes were compared to results using the clinical ACE™ strategy. Five Mandarin-speaking users of Nucleus® cochlear implants participated. A small but significant benefit in recognition of lexical tones was observed using OPAL compared to ACE in noise, but not in quiet, and not for OPAL+ compared to ACE or OPAL in quiet or noise. Sentence SRTs were significantly better using OPAL+, and comparable using OPAL, relative to ACE. No differences in pitch discrimination thresholds were observed across strategies. OPAL can provide benefits to Mandarin lexical tone recognition in moderately noisy conditions and preserve perception of Mandarin sentences in challenging noise conditions.
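The two F0-to-rate mappings can be illustrated schematically: a linear strategy codes F0 directly as the modulation rate, while a transposed strategy shifts high F0s down so the coded rate stays within a range the electrical stimulation can convey. The cutoff frequency and octave-down transposition below are illustrative assumptions, not the published OPAL+ parameters.

```python
def coded_rate(f0_hz, mode="linear", transpose_above=300.0):
    """Map an estimated F0 (Hz) to a coded modulation rate (Hz).

    'linear' maps F0 1:1, as in a linear pitch-coding strategy.
    'transposed' halves F0s above `transpose_above` until they fall
    below it, so high pitches are coded at a lower rate. The cutoff
    and octave-down rule are hypothetical illustrations.
    """
    if mode == "transposed":
        # Octave-down transposition preserves pitch class while
        # lowering the coded modulation rate.
        while f0_hz > transpose_above:
            f0_hz /= 2.0
    return f0_hz
```

A 220 Hz voice is coded identically by both modes; a 500 Hz F0 is coded at 500 Hz by the linear mode but at 250 Hz by the transposed mode.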
Gifford, René H; Revit, Lawrence J
2010-01-01
Although cochlear implant patients are achieving increasingly higher levels of performance, speech perception in noise continues to be problematic. The newest generations of implant speech processors are equipped with preprocessing and/or external accessories that are purported to improve listening in noise. Most speech perception measures in the clinical setting, however, do not provide a close approximation to real-world listening environments. To assess speech perception for adult cochlear implant recipients in the presence of a realistic restaurant simulation generated by an eight-loudspeaker (R-SPACE) array in order to determine whether commercially available preprocessing strategies and/or external accessories yield improved sentence recognition in noise. Single-subject, repeated-measures design with two groups of participants: Advanced Bionics and Cochlear Corporation recipients. Thirty-four subjects, ranging in age from 18 to 90 yr (mean 54.5 yr), participated in this prospective study. Fourteen subjects were Advanced Bionics recipients, and 20 subjects were Cochlear Corporation recipients. Speech reception thresholds (SRTs) in semidiffuse restaurant noise originating from an eight-loudspeaker array were assessed with the subjects' preferred listening programs as well as with the addition of either Beam preprocessing (Cochlear Corporation) or the T-Mic accessory option (Advanced Bionics). In Experiment 1, adaptive SRTs with the Hearing in Noise Test sentences were obtained for all 34 subjects. For Cochlear Corporation recipients, SRTs were obtained with their preferred everyday listening program as well as with the addition of Focus preprocessing. For Advanced Bionics recipients, SRTs were obtained with the integrated behind-the-ear (BTE) mic as well as with the T-Mic. Statistical analysis using a repeated-measures analysis of variance (ANOVA) evaluated the effects of the preprocessing strategy or external accessory in reducing the SRT in noise. 
In addition, a standard t-test was run to evaluate effectiveness across manufacturer for improving the SRT in noise. In Experiment 2, 16 of the 20 Cochlear Corporation subjects were reassessed obtaining an SRT in noise using the manufacturer-suggested "Everyday," "Noise," and "Focus" preprocessing strategies. A repeated-measures ANOVA was employed to assess the effects of preprocessing. The primary findings were (i) both Noise and Focus preprocessing strategies (Cochlear Corporation) significantly improved the SRT in noise as compared to Everyday preprocessing, (ii) the T-Mic accessory option (Advanced Bionics) significantly improved the SRT as compared to the BTE mic, and (iii) Focus preprocessing and the T-Mic resulted in similar degrees of improvement that were not found to be significantly different from one another. Options available in current cochlear implant sound processors are able to significantly improve speech understanding in a realistic, semidiffuse noise with both Cochlear Corporation and Advanced Bionics systems. For Cochlear Corporation recipients, Focus preprocessing yields the best speech-recognition performance in a complex listening environment; however, it is recommended that Noise preprocessing be used as the new default for everyday listening environments to avoid the need for switching programs throughout the day. For Advanced Bionics recipients, the T-Mic offers significantly improved performance in noise and is recommended for everyday use in all listening environments. American Academy of Audiology.
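An adaptive SRT procedure of the kind used above is typically an up-down staircase on signal-to-noise ratio: lower the SNR after a correct response, raise it after an error, and estimate the threshold as the mean SNR at the reversal points. A minimal sketch follows; the step size and reversal count are illustrative, not the HINT parameters.

```python
def adaptive_srt(respond, start_snr=10.0, step=2.0, reversals=6):
    """Simple 1-down/1-up adaptive track converging on the SNR for
    50% sentence intelligibility.

    `respond(snr)` returns True when the listener repeats the
    sentence correctly at that SNR (a real test presents a recorded
    sentence; here it is any callable)."""
    snr = start_snr
    last_correct = None
    reversal_snrs = []
    while len(reversal_snrs) < reversals:
        correct = respond(snr)
        # A reversal is a switch between correct and incorrect runs.
        if last_correct is not None and correct != last_correct:
            reversal_snrs.append(snr)
        last_correct = correct
        snr += -step if correct else step
    # SRT estimate: mean SNR at the reversal points.
    return sum(reversal_snrs) / len(reversal_snrs)
```

With a deterministic simulated listener who is correct at and above some true threshold, the track oscillates around that threshold and the reversal mean lands near it.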
Cortical Integration of Audio-Visual Information
Vander Wyk, Brent C.; Ramsay, Gordon J.; Hudac, Caitlin M.; Jones, Warren; Lin, David; Klin, Ami; Lee, Su Mei; Pelphrey, Kevin A.
2013-01-01
We investigated the neural basis of audio-visual processing in speech and non-speech stimuli. Physically identical auditory stimuli (speech and sinusoidal tones) and visual stimuli (animated circles and ellipses) were used in this fMRI experiment. Relative to unimodal stimuli, each of the multimodal conjunctions showed increased activation in largely non-overlapping areas. The conjunction of Ellipse and Speech, which most resembles naturalistic audiovisual speech, showed higher activation in the right inferior frontal gyrus, fusiform gyri, left posterior superior temporal sulcus, and lateral occipital cortex. The conjunction of Circle and Tone, an arbitrary audio-visual pairing with no speech association, activated middle temporal gyri and lateral occipital cortex. The conjunction of Circle and Speech showed activation in lateral occipital cortex, and the conjunction of Ellipse and Tone did not show increased activation relative to unimodal stimuli. Further analysis revealed that middle temporal regions, although identified as multimodal only in the Circle-Tone condition, were more strongly active to Ellipse-Speech or Circle-Speech, but regions that were identified as multimodal for Ellipse-Speech were always strongest for Ellipse-Speech. Our results suggest that combinations of auditory and visual stimuli may together be processed by different cortical networks, depending on the extent to which speech or non-speech percepts are evoked. PMID:20709442
Interactive segmentation of tongue contours in ultrasound video sequences using quality maps
NASA Astrophysics Data System (ADS)
Ghrenassia, Sarah; Ménard, Lucie; Laporte, Catherine
2014-03-01
Ultrasound (US) imaging is an effective and non-invasive way of studying the tongue motions involved in normal and pathological speech, and the results of US studies are of interest for the development of new strategies in speech therapy. State-of-the-art tongue shape analysis based on US images depends on semi-automated tongue segmentation and tracking techniques. Recent work has mostly focused on improving the accuracy of the tracking techniques themselves. However, occasional errors remain inevitable, regardless of the technique used, and the tongue tracking process must thus be supervised by a speech scientist who corrects these errors manually or semi-automatically. This paper proposes an interactive framework to facilitate this process. In this framework, the user is guided towards potentially problematic portions of the US image sequence by a segmentation quality map that is based on the normalized energy of an active contour model and is produced automatically during tracking. When a problematic segmentation is identified, corrections to the segmented contour can be made on one image and propagated both forward and backward in the problematic subsequence, thereby improving the user experience. The interactive tools were tested in combination with two different tracking algorithms. Preliminary results illustrate the potential of the proposed framework, suggesting that it generally reduces user interaction time, with little change in segmentation repeatability.
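A quality map of this kind can be built from the active contour's own energy: a smooth contour lying on strong image gradients has low energy, so frames with high normalized energy are flagged for review. The sketch below is an illustrative formulation, not the paper's exact energy; the weights and the gradient-based external term are assumptions.

```python
import numpy as np

def contour_quality(contour, image, alpha=0.5, beta=0.5):
    """Normalized active-contour energy of a segmented contour
    (lower = higher confidence). `contour` is a sequence of
    (row, col) points; `image` is a 2-D array."""
    pts = np.asarray(contour, dtype=float)
    d1 = np.diff(pts, axis=0)      # first differences: elasticity term
    d2 = np.diff(pts, 2, axis=0)   # second differences: bending term
    internal = alpha * (d1 ** 2).sum() + beta * (d2 ** 2).sum()

    # External energy: negative image-gradient magnitude sampled at
    # the contour points (strong edges lower the energy).
    gy, gx = np.gradient(image.astype(float))
    idx = np.clip(pts.round().astype(int), 0, np.array(image.shape) - 1)
    external = -np.hypot(gx, gy)[idx[:, 0], idx[:, 1]].sum()

    # Normalize by contour length so frames are comparable.
    return (internal + external) / len(pts)
```

Plotting this score over all frames of a sequence yields the quality map; peaks point the user at subsequences likely to need manual correction.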
How does cognitive load influence speech perception? An encoding hypothesis.
Mitterer, Holger; Mattys, Sven L
2017-01-01
Two experiments investigated the conditions under which cognitive load exerts an effect on the acuity of speech perception. These experiments extend earlier research by using a different speech perception task (four-interval oddity task) and by implementing cognitive load through a task often thought to be modular, namely, face processing. In the cognitive-load conditions, participants were required to remember two faces presented before the speech stimuli. In Experiment 1, performance in the speech-perception task under cognitive load was not impaired in comparison to a no-load baseline condition. In Experiment 2, we modified the load condition minimally such that it required encoding of the two faces simultaneously with the speech stimuli. As a reference condition, we also used a visual search task that in earlier experiments had led to poorer speech perception. Both concurrent tasks led to decrements in the speech task. The results suggest that speech perception is affected even by loads thought to be processed modularly, and that, critically, encoding in working memory might be the locus of interference.
Van Ackeren, Markus Johannes; Barbero, Francesca M; Mattioni, Stefania; Bottini, Roberto
2018-01-01
The occipital cortex of early blind individuals (EB) activates during speech processing, challenging the notion of a hard-wired neurobiology of language. But, at what stage of speech processing do occipital regions participate in EB? Here we demonstrate that parieto-occipital regions in EB enhance their synchronization to acoustic fluctuations in human speech in the theta-range (corresponding to syllabic rate), irrespective of speech intelligibility. Crucially, enhanced synchronization to the intelligibility of speech was selectively observed in primary visual cortex in EB, suggesting that this region is at the interface between speech perception and comprehension. Moreover, EB showed overall enhanced functional connectivity between temporal and occipital cortices that are sensitive to speech intelligibility and altered directionality when compared to the sighted group. These findings suggest that the occipital cortex of the blind adopts an architecture that allows the tracking of speech material, and therefore does not fully abstract from the reorganized sensory inputs it receives. PMID:29338838
Maps and streams in the auditory cortex: nonhuman primates illuminate human speech processing
Rauschecker, Josef P; Scott, Sophie K
2010-01-01
Speech and language are considered uniquely human abilities: animals have communication systems, but they do not match human linguistic skills in terms of recursive structure and combinatorial power. Yet, in evolution, spoken language must have emerged from neural mechanisms at least partially available in animals. In this paper, we will demonstrate how our understanding of speech perception, one important facet of language, has profited from findings and theory in nonhuman primate studies. Chief among these are physiological and anatomical studies showing that primate auditory cortex, across species, shows patterns of hierarchical structure, topographic mapping and streams of functional processing. We will identify roles for different cortical areas in the perceptual processing of speech and review functional imaging work in humans that bears on our understanding of how the brain decodes and monitors speech. A new model connects structures in the temporal, frontal and parietal lobes linking speech perception and production. PMID:19471271
Processing load induced by informational masking is related to linguistic abilities.
Koelewijn, Thomas; Zekveld, Adriana A; Festen, Joost M; Rönnberg, Jerker; Kramer, Sophia E
2012-01-01
It is often assumed that the benefit of hearing aids is not primarily reflected in better speech performance, but in less effortful listening in the aided than in the unaided condition. Before being able to assess such a hearing aid benefit, the present study examined how processing load while listening to masked speech relates to inter-individual differences in cognitive abilities relevant for language processing. Pupil dilation was measured in thirty-two normal-hearing participants while listening to sentences masked by fluctuating noise or interfering speech at either 50% or 84% intelligibility. Additionally, working memory capacity, inhibition of irrelevant information, and written text reception were tested. Pupil responses were larger during interfering speech as compared to fluctuating noise. This effect was independent of intelligibility level. Regression analysis revealed that high working memory capacity, better inhibition, and better text reception were related to better speech reception thresholds. Apart from a positive relation to speech recognition, better inhibition and better text reception were also positively related to larger pupil dilation in the single-talker masker conditions. We conclude that better cognitive abilities not only relate to better speech perception, but also partly explain higher processing load in complex listening conditions.
Emotional Speech Perception Unfolding in Time: The Role of the Basal Ganglia
Paulmann, Silke; Ott, Derek V. M.; Kotz, Sonja A.
2011-01-01
The basal ganglia (BG) have repeatedly been linked to emotional speech processing in studies involving patients with neurodegenerative and structural changes of the BG. However, the majority of previous studies did not consider that (i) emotional speech processing entails multiple processing steps, and the possibility that (ii) the BG may engage in one rather than the other of these processing steps. In the present study we investigate three different stages of emotional speech processing (emotional salience detection, meaning-related processing, and identification) in the same patient group to verify whether lesions to the BG affect these stages in a qualitatively different manner. Specifically, we explore early implicit emotional speech processing (probe verification) in an ERP experiment followed by an explicit behavioral emotional recognition task. In both experiments, participants listened to emotional sentences expressing one of four emotions (anger, fear, disgust, happiness) or neutral sentences. In line with previous evidence patients and healthy controls show differentiation of emotional and neutral sentences in the P200 component (emotional salience detection) and a following negative-going brain wave (meaning-related processing). However, the behavioral recognition (identification stage) of emotional sentences was impaired in BG patients, but not in healthy controls. The current data provide further support that the BG are involved in late, explicit rather than early emotional speech processing stages. PMID:21437277
1990-07-01
changes either in the MFA or in Soviet foreign and defense policy. This situation began to change in May 1986, when Gorbachev gave an unusual speech to the...MFA in which he demanded better performance from Soviet diplomats. Although it was later reported that Gorbachev’s speech contained strong criticism...July 1988 with a sweeping critique of Soviet strategy and ,military policy since World War II. Subsequent speeches and articles in MFA-controlled
Application of advanced speech technology in manned penetration bombers
NASA Astrophysics Data System (ADS)
North, R.; Lea, W.
1982-03-01
This report documents research on the potential use of speech technology in a manned penetration bomber aircraft (B-52/G and H). The objectives of the project were to analyze the pilot/copilot crewstation tasks over a three-hour-and-forty-minute mission, determine which tasks would benefit most from conversion to speech recognition/generation, determine the technological feasibility of each of the identified tasks, and prioritize these tasks accordingly. Secondary objectives of the program were to articulate research strategies for the application of speech technologies in airborne environments, and to develop guidelines for briefing user commands on the potential of using speech technologies in the cockpit. The results of this study indicated that for the B-52 crewmember, speech recognition would be most beneficial for retrieving chart and procedural data contained in the flight manuals. The feasibility analysis indicated that the checklist and procedural retrieval tasks would be highly feasible for a speech recognition system.
Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei
2015-07-01
Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking individuals with high-functioning autism. Individuals with autism showed superior melodic contour identification but comparable contour discrimination relative to controls. In contrast, these individuals performed worse than controls on both discrimination and identification of speech intonation. These findings provide the first evidence for differential pitch processing in music and speech in tone language speakers with autism, suggesting that tone language experience may not compensate for speech intonation perception deficits in individuals with autism.
Brainstem transcription of speech is disrupted in children with autism spectrum disorders
Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina
2009-01-01
Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or about how background noise may impact this transcription process. Unlike cortical encoding of sounds, brainstem representation preserves stimulus features with a degree of fidelity that enables a direct link between acoustic components of the speech syllable (e.g., onsets) to specific aspects of neural encoding (e.g., waves V and A). We measured brainstem responses to the syllable /da/, in quiet and background noise, in children with and without ASD. Children with ASD exhibited deficits in both the neural synchrony (timing) and phase locking (frequency encoding) of speech sounds, despite normal click-evoked brainstem responses. They also exhibited reduced magnitude and fidelity of speech-evoked responses and inordinate degradation of responses by background noise in comparison to typically developing controls. Neural synchrony in noise was significantly related to measures of core and receptive language ability. These data support the idea that abnormalities in the brainstem processing of speech contribute to the language impairment in ASD. Because it is both passively-elicited and malleable, the speech-evoked brainstem response may serve as a clinical tool to assess auditory processing as well as the effects of auditory training in the ASD population. PMID:19635083
Speech perception in autism spectrum disorder: An activation likelihood estimation meta-analysis.
Tryfon, Ana; Foster, Nicholas E V; Sharda, Megha; Hyde, Krista L
2018-02-15
Autism spectrum disorder (ASD) is often characterized by atypical language profiles and auditory and speech processing. These can contribute to aberrant language and social communication skills in ASD. The study of the neural basis of speech perception in ASD can serve as a potential neurobiological marker of ASD early on, but mixed results across studies make it difficult to find a reliable neural characterization of speech processing in ASD. To this end, the present study examined the functional neural basis of speech perception in ASD versus typical development (TD) using an activation likelihood estimation (ALE) meta-analysis of 18 qualifying studies. The present study included separate analyses for TD and ASD, which allowed us to examine patterns of within-group brain activation as well as both common and distinct patterns of brain activation across the ASD and TD groups. Overall, ASD and TD showed mostly common brain activation of speech processing in bilateral superior temporal gyrus (STG) and left inferior frontal gyrus (IFG). However, the results revealed trends for some distinct activation in the TD group, showing additional activation in higher-order brain areas including left superior frontal gyrus (SFG), left medial frontal gyrus (MFG), and right IFG. These results provide a more reliable neural characterization of speech processing in ASD relative to previous single neuroimaging studies and motivate future work to investigate how these brain signatures relate to behavioral measures of speech processing in ASD.
Dietz, Mathias; Hohmann, Volker; Jürgens, Tim
2015-01-01
For normal-hearing listeners, speech intelligibility improves if speech and noise are spatially separated. While this spatial release from masking has already been quantified in normal-hearing listeners in many studies, it is less clear how spatial release from masking changes in cochlear implant listeners with and without access to low-frequency acoustic hearing. Spatial release from masking depends on differences in access to speech cues due to hearing status and hearing device. To investigate the influence of these factors on speech intelligibility, the present study measured speech reception thresholds in spatially separated speech and noise for 10 different listener types. A vocoder was used to simulate cochlear implant processing and low-frequency filtering was used to simulate residual low-frequency hearing. These forms of processing were combined to simulate cochlear implant listening, listening based on low-frequency residual hearing, and combinations thereof. Simulated cochlear implant users with additional low-frequency acoustic hearing showed better speech intelligibility in noise than simulated cochlear implant users without acoustic hearing and had access to more spatial speech cues (e.g., higher binaural squelch). Cochlear implant listener types showed higher spatial release from masking with bilateral access to low-frequency acoustic hearing than without. A binaural speech intelligibility model with normal binaural processing showed overall good agreement with measured speech reception thresholds, spatial release from masking, and spatial speech cues. This indicates that differences in speech cues available to listener types are sufficient to explain the changes of spatial release from masking across these simulated listener types. PMID:26721918
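The simulation approach described above, vocoding to mimic cochlear implant processing combined with low-pass filtering to mimic residual low-frequency acoustic hearing, can be sketched roughly as follows. This is a minimal illustration, not the study's actual implementation; the channel count, band edges, and filter settings are assumed values.

```python
# Hedged sketch of a noise-vocoder CI simulation plus simulated residual
# low-frequency hearing. Parameters (8 channels, 100-6000 Hz, 4th-order
# Butterworth filters, 500 Hz residual-hearing cutoff) are assumptions.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_channels=8, f_lo=100.0, f_hi=6000.0):
    """Replace each analysis band's fine structure with noise, keeping its envelope."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, signal)
        env = np.abs(hilbert(band))                       # band envelope
        carrier = sosfiltfilt(sos, rng.standard_normal(len(signal)))
        out += env * carrier                              # envelope-modulated noise
    return out

def simulate_bimodal(signal, fs, lp_cutoff=500.0):
    """CI simulation combined with simulated residual low-frequency hearing."""
    sos = butter(4, lp_cutoff, btype="lowpass", fs=fs, output="sos")
    return noise_vocode(signal, fs) + sosfiltfilt(sos, signal)
```

Listener types with and without acoustic hearing would then correspond to presenting `simulate_bimodal` versus `noise_vocode` output.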
Nuttall, Helen E.; Moore, David R.; Barry, Johanna G.; Krumbholz, Katrin
2015-01-01
The speech-evoked auditory brain stem response (speech ABR) is widely considered to provide an index of the quality of neural temporal encoding in the central auditory pathway. The aim of the present study was to evaluate the extent to which the speech ABR is shaped by spectral processing in the cochlea. High-pass noise masking was used to record speech ABRs from delimited octave-wide frequency bands between 0.5 and 8 kHz in normal-hearing young adults. The latency of the frequency-delimited responses decreased from the lowest to the highest frequency band by up to 3.6 ms. The observed frequency-latency function was compatible with model predictions based on wave V of the click ABR. The frequency-delimited speech ABR amplitude was largest in the 2- to 4-kHz frequency band and decreased toward both higher and lower frequency bands despite the predominance of low-frequency energy in the speech stimulus. We argue that the frequency dependence of speech ABR latency and amplitude results from the decrease in cochlear filter width with decreasing frequency. The results suggest that the amplitude and latency of the speech ABR may reflect interindividual differences in cochlear, as well as central, processing. The high-pass noise-masking technique provides a useful tool for differentiating between peripheral and central effects on the speech ABR. It can be used for further elucidating the neural basis of the perceptual speech deficits that have been associated with individual differences in speech ABR characteristics. PMID:25787954
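The high-pass noise-masking technique above derives a band-limited response by subtraction: the response recorded with the masker high-passed at a band's upper edge (masking cochlear regions above it), minus the response with the masker high-passed at the lower edge, isolates the contribution of the band in between. A minimal sketch, with synthetic arrays standing in for averaged evoked responses:

```python
# Hedged sketch of the derived-band subtraction; the arrays are invented
# stand-ins for averaged speech-ABR waveforms, not real recordings.
import numpy as np

def derived_band(resp_masker_at_upper_edge, resp_masker_at_lower_edge):
    """Subtract two high-pass-masked responses to delimit one frequency band.

    resp_masker_at_upper_edge: response with the masker cut off at the band's
        upper edge (reflects cochlear regions below that edge).
    resp_masker_at_lower_edge: response with the masker cut off at the lower
        edge. The difference isolates the band between the two edges.
    """
    return np.asarray(resp_masker_at_upper_edge) - np.asarray(resp_masker_at_lower_edge)

def peak_latency_ms(response, fs):
    """Latency of the largest positive peak, in milliseconds."""
    return float(np.argmax(response)) * 1000.0 / fs
```

Repeating this across octave-spaced cutoffs would yield the frequency-latency function discussed in the abstract.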
Prediction and constraint in audiovisual speech perception
Peelle, Jonathan E.; Sommers, Mitchell S.
2015-01-01
During face-to-face conversational speech, listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners by increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to auditory information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately, it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. PMID:25890390
The neural processing of foreign-accented speech and its relationship to listener bias
Yi, Han-Gyol; Smiljanic, Rajka; Chandrasekaran, Bharath
2014-01-01
Foreign-accented speech often presents a challenging listening condition. In addition to deviations from the target speech norms related to the inexperience of the nonnative speaker, listener characteristics may play a role in determining intelligibility levels. We have previously shown that an implicit visual bias for associating East Asian faces with foreignness predicts listeners' perceptual ability to process Korean-accented English audiovisual speech (Yi et al., 2013). Here, we examine the neural mechanism underlying the influence of listener bias toward foreign faces on speech perception. In a functional magnetic resonance imaging (fMRI) study, native English speakers listened to native- and Korean-accented English sentences, with or without faces. The participants' Asian-foreign association was measured using an implicit association test (IAT), conducted outside the scanner. We found that foreign-accented speech evoked greater activity in the bilateral primary auditory cortices and the inferior frontal gyri, potentially reflecting greater computational demand. Higher IAT scores, indicating greater bias, were associated with increased BOLD response to foreign-accented speech with faces in the primary auditory cortex, the early node for spectrotemporal analysis. We conclude the following: (1) foreign-accented speech perception places greater demand on the neural systems underlying speech perception; (2) the face of the talker can exaggerate the perceived foreignness of foreign-accented speech; and (3) implicit Asian-foreign association is associated with decreased neural efficiency in early spectrotemporal processing. PMID:25339883
Ben-David, Boaz M; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H H M
2016-02-01
Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5 discrete emotions (anger, fear, happiness, sadness, and neutral) presented in prosody and semantics. Listeners were asked to rate the sentence as a whole, integrating both speech channels, or to focus on one channel only (prosody or semantics). We observed supremacy of congruency, failure of selective attention, and prosodic dominance. Supremacy of congruency means that a sentence that presents the same emotion in both speech channels was rated highest; failure of selective attention means that listeners were unable to selectively attend to one channel when instructed; and prosodic dominance means that prosodic information plays a larger role than semantics in processing emotional speech. Emotional prosody and semantics are separate but not separable channels, and it is difficult to perceive one without the influence of the other. Our findings indicate that the Test for Rating of Emotions in Speech can reveal specific aspects in the processing of emotional speech and may in the future prove useful for understanding emotion-processing deficits in individuals with pathologies.
Getting the cocktail party started: masking effects in speech perception
Evans, S; McGettigan, C; Agnew, ZK; Rosen, S; Scott, SK
2016-01-01
Spoken conversations typically take place in noisy environments, and different kinds of masking sounds place differing demands on cognitive resources. Previous studies, examining the modulation of neural activity associated with the properties of competing sounds, have shown that additional speech streams engage the superior temporal gyrus. However, the absence of a condition in which target speech was heard without additional masking made it difficult to identify brain networks specific to masking and to ascertain the extent to which competing speech was processed equivalently to target speech. In this study, we scanned young healthy adults with continuous functional magnetic resonance imaging (fMRI) whilst they listened to stories masked by sounds that differed in their similarity to speech. We show that auditory attention and control networks are activated during attentive listening to masked speech in the absence of an overt behavioural task. We demonstrate that competing speech is processed predominantly in the left hemisphere within the same pathway as target speech, but is not treated equivalently within that stream, and that individuals who perform better on speech-in-noise tasks activate the left mid-posterior superior temporal gyrus more. Finally, we identify neural responses associated with the onset of sounds in the auditory environment; this activity was found within right-lateralised frontal regions, consistent with a phasic alerting response. Taken together, these results provide a comprehensive account of the neural processes involved in listening in noise. PMID:26696297
Zheng, Zane Z; Munhall, Kevin G; Johnsrude, Ingrid S
2010-08-01
The fluency and the reliability of speech production suggest a mechanism that links motor commands and sensory feedback. Here, we examined the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not and by examining the overlap with the network recruited during passive listening to speech sounds. We used real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word ("Ted") and either heard this clearly or heard voice-gated masking noise. We compared this to when they listened to yoked stimuli (identical recordings of "Ted" or noise) without speaking. Activity along the STS and superior temporal gyrus bilaterally was significantly greater if the auditory stimulus was (a) processed as the auditory concomitant of speaking and (b) did not match the predicted outcome (noise). The network exhibiting this Feedback Type x Production/Perception interaction includes a superior temporal gyrus/middle temporal gyrus region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts and that processes an error signal in speech-sensitive regions when this and the sensory data do not match.
Zheng, Zane Z.; Munhall, Kevin G; Johnsrude, Ingrid S
2009-01-01
The fluency and reliability of speech production suggests a mechanism that links motor commands and sensory feedback. Here, we examine the neural organization supporting such links by using fMRI to identify regions in which activity during speech production is modulated according to whether auditory feedback matches the predicted outcome or not, and examining the overlap with the network recruited during passive listening to speech sounds. We use real-time signal processing to compare brain activity when participants whispered a consonant-vowel-consonant word (‘Ted’) and either heard this clearly, or heard voice-gated masking noise. We compare this to when they listened to yoked stimuli (identical recordings of ‘Ted’ or noise) without speaking. Activity along the superior temporal sulcus (STS) and superior temporal gyrus (STG) bilaterally was significantly greater if the auditory stimulus was a) processed as the auditory concomitant of speaking and b) did not match the predicted outcome (noise). The network exhibiting this Feedback type by Production/Perception interaction includes an STG/MTG region that is activated more when listening to speech than to noise. This is consistent with speech production and speech perception being linked in a control system that predicts the sensory outcome of speech acts, and that processes an error signal in speech-sensitive regions when this and the sensory data do not match. PMID:19642886
Characterizing Speech Intelligibility in Noise After Wide Dynamic Range Compression.
Rhebergen, Koenraad S; Maalderink, Thijs H; Dreschler, Wouter A
The effects of nonlinear signal processing on speech intelligibility in noise are difficult to evaluate. Often, the effects are examined by comparing speech intelligibility scores with and without processing measured at fixed signal-to-noise ratios (SNRs), or by comparing adaptively measured speech reception thresholds corresponding to 50% intelligibility (SRT50) with and without processing. These outcome measures might not be optimal. Measuring at fixed SNRs can be affected by ceiling or floor effects, because the range of relevant SNRs is not known in advance. The SRT50 is less time-consuming and has a fixed performance level (i.e., 50% correct), but it could give a limited view, because we hypothesize that the effect of most nonlinear signal processing algorithms at the SRT50 cannot be generalized to other points of the psychometric function. In this article, we tested the value of estimating the entire psychometric function. We studied the effect of wide dynamic range compression (WDRC) on speech intelligibility in stationary and interrupted speech-shaped noise in normal-hearing subjects, using a fast method based on a local linear fitting approach and two adaptive procedures. The measured performance differences for conditions with and without WDRC for the psychometric functions in stationary noise and interrupted speech-shaped noise show that the effects of WDRC on speech intelligibility are SNR dependent. We conclude that favorable and unfavorable effects of WDRC on speech intelligibility can be missed if the results are presented in terms of SRT50 values only.
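The argument above, that a single SRT50 can miss SNR-dependent effects, rests on the shape of the full psychometric function. A hedged sketch of fitting such a function to intelligibility scores and reading off thresholds at arbitrary performance levels; the logistic form and the fitting routine are illustrative assumptions, not the authors' fast method:

```python
# Hedged sketch: fit a logistic psychometric function (proportion correct vs.
# SNR) and invert it to obtain the SRT at any target performance level.
import numpy as np
from scipy.optimize import curve_fit

def logistic(snr, srt50, slope):
    """Proportion correct as a function of SNR (dB)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr - srt50)))

def fit_psychometric(snrs, proportions):
    """Least-squares fit; returns (srt50, slope)."""
    popt, _ = curve_fit(logistic, snrs, proportions, p0=[0.0, 1.0])
    return popt

def srt_at(target, srt50, slope):
    """Invert the logistic: SNR giving `target` proportion correct."""
    return srt50 + np.log(target / (1.0 - target)) / slope
```

Comparing fitted curves with and without processing across the whole SNR range, rather than only at the 50% point, is what exposes SNR-dependent benefits or deficits.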
Research in speech communication.
Flanagan, J
1995-01-01
Advances in digital speech processing are now supporting application and deployment of a variety of speech technologies for human/machine communication. In fact, new businesses are rapidly forming about these technologies. But these capabilities are of little use unless society can afford them. Happily, explosive advances in microelectronics over the past two decades have assured affordable access to this sophistication as well as to the underlying computing technology. The research challenges in speech processing remain in the traditionally identified areas of recognition, synthesis, and coding. These three areas have typically been addressed individually, often with significant isolation among the efforts. But they are all facets of the same fundamental issue--how to represent and quantify the information in the speech signal. This implies deeper understanding of the physics of speech production, the constraints that the conventions of language impose, and the mechanism for information processing in the auditory system. In ongoing research, therefore, we seek more accurate models of speech generation, better computational formulations of language, and realistic perceptual guides for speech processing--along with ways to coalesce the fundamental issues of recognition, synthesis, and coding. Successful solution will yield the long-sought dictation machine, high-quality synthesis from text, and the ultimate in low bit-rate transmission of speech. It will also open the door to language-translating telephony, where the synthetic foreign translation can be in the voice of the originating talker. PMID:7479806
A Model for Speech Processing in Second Language Listening Activities
ERIC Educational Resources Information Center
Zoghbor, Wafa Shahada
2016-01-01
Teachers' understanding of the process of speech perception could inform practice in listening classrooms. Catford (1950) developed a model for speech perception taking into account the influence of the acoustic features of the linguistic forms used by the speaker, whereby the listener "identifies" and "interprets" these…
Positron Emission Tomography in Cochlear Implant and Auditory Brainstem Implant Recipients.
ERIC Educational Resources Information Center
Miyamoto, Richard T.; Wong, Donald
2001-01-01
Positron emission tomography imaging was used to evaluate the brain's response to auditory stimulation, including speech, in deaf adults (five with cochlear implants and one with an auditory brainstem implant). Functional speech processing was associated with activation in areas classically associated with speech processing. (Contains five…
Differences in Talker Recognition by Preschoolers and Adults
ERIC Educational Resources Information Center
Creel, Sarah C.; Jimenez, Sofia R.
2012-01-01
Talker variability in speech influences language processing from infancy through adulthood and is inextricably embedded in the very cues that identify speech sounds. Yet little is known about developmental changes in the processing of talker information. On one account, children have not yet learned to separate speech sound variability from…
Masked Speech Recognition and Reading Ability in School-Age Children: Is There a Relationship?
ERIC Educational Resources Information Center
Miller, Gabrielle; Lewis, Barbara; Benchek, Penelope; Buss, Emily; Calandruccio, Lauren
2018-01-01
Purpose: The relationship between reading (decoding) skills, phonological processing abilities, and masked speech recognition in typically developing children was explored. This experiment was designed to evaluate the relationship between phonological processing and decoding abilities and 2 aspects of masked speech recognition in typically…
Status Report on Speech Research, No. 27, July-September 1971.
ERIC Educational Resources Information Center
Haskins Labs., New Haven, CT.
This report contains fourteen papers on a wide range of current topics and experiments in speech research, ranging from the relationship between speech and reading to questions of memory and perception of speech sounds. The following papers are included: "How Is Language Conveyed by Speech?"; "Reading, the Linguistic Process, and Linguistic…
Speech and Language Skills of Parents of Children with Speech Sound Disorders
ERIC Educational Resources Information Center
Lewis, Barbara A.; Freebairn, Lisa A.; Hansen, Amy J.; Miscimarra, Lara; Iyengar, Sudha K.; Taylor, H. Gerry
2007-01-01
Purpose: This study compared parents with histories of speech sound disorders (SSD) to parents without known histories on measures of speech sound production, phonological processing, language, reading, and spelling. Familial aggregation for speech and language disorders was also examined. Method: The participants were 147 parents of children with…
Musical intervention enhances infants’ neural processing of temporal structure in music and speech
Zhao, T. Christina; Kuhl, Patricia K.
2016-01-01
Individuals with music training in early childhood show enhanced processing of musical sounds, an effect that generalizes to speech processing. However, the conclusions drawn from previous studies are limited due to the possible confounds of predisposition and other factors affecting musicians and nonmusicians. We used a randomized design to test the effects of a laboratory-controlled music intervention on young infants’ neural processing of music and speech. Nine-month-old infants were randomly assigned to music (intervention) or play (control) activities for 12 sessions. The intervention targeted temporal structure learning using triple meter in music (e.g., waltz), which is difficult for infants, and it incorporated key characteristics of typical infant music classes to maximize learning (e.g., multimodal, social, and repetitive experiences). Controls had similar multimodal, social, repetitive play, but without music. Upon completion, infants’ neural processing of temporal structure was tested in both music (tones in triple meter) and speech (foreign syllable structure). Infants’ neural processing was quantified by the mismatch response (MMR) measured with a traditional oddball paradigm using magnetoencephalography (MEG). The intervention group exhibited significantly larger MMRs in response to music temporal structure violations in both auditory and prefrontal cortical regions. Identical results were obtained for temporal structure changes in speech. The intervention thus enhanced temporal structure processing not only in music, but also in speech, at 9 mo of age. We argue that the intervention enhanced infants’ ability to extract temporal structure information and to predict future events in time, a skill affecting both music and speech processing. PMID:27114512
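The mismatch response quantified in the study above is, at its core, the difference between the averaged deviant and standard responses in the oddball paradigm. A minimal sketch (the epoch arrays are invented stand-ins for MEG sensor data, not the study's pipeline):

```python
# Hedged sketch of a mismatch-response (MMR) computation from an oddball
# paradigm: trial-average each condition, then take the difference waveform.
import numpy as np

def mismatch_response(standard_epochs, deviant_epochs):
    """Each input has shape (n_trials, n_samples); returns the
    deviant-minus-standard difference of the trial-averaged responses."""
    return deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
```

A larger difference waveform in the intervention group is what the abstract reports as an enhanced MMR.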
ERIC Educational Resources Information Center
Pattamadilok, Chotiga; Nelis, Aubéline; Kolinsky, Régine
2014-01-01
Studies on proficient readers showed that speech processing is affected by knowledge of the orthographic code. Yet, the automaticity of the orthographic influence depends on task demand. Here, we addressed this automaticity issue in normal and dyslexic adult readers by comparing the orthographic effects obtained in two speech processing tasks that…
ERIC Educational Resources Information Center
Yau, Shu Hui; Brock, Jon; McArthur, Genevieve
2016-01-01
It has been proposed that language impairments in children with Autism Spectrum Disorders (ASD) stem from atypical neural processing of speech and/or nonspeech sounds. However, the strength of this proposal is compromised by the unreliable outcomes of previous studies of speech and nonspeech processing in ASD. The aim of this study was to…
ERIC Educational Resources Information Center
Basirat, Anahita; Brunellière, Angèle; Hartsuiker, Robert
2018-01-01
Numerous studies suggest that audiovisual speech influences lexical processing. However, it is not clear which stages of lexical processing are modulated by audiovisual speech. In this study, we examined the time course of the access to word representations in long-term memory when they were presented in auditory-only and audiovisual modalities.…
McKinnon, David H; McLeod, Sharynne; Reilly, Sheena
2007-01-01
The aims of this study were threefold: to report teachers' estimates of the prevalence of speech disorders (specifically, stuttering, voice, and speech-sound disorders); to consider correspondence between the prevalence of speech disorders and gender, grade level, and socioeconomic status; and to describe the level of support provided to schoolchildren with speech disorders. Students with speech disorders were identified from 10,425 students in Australia using a 4-stage process: training in the data collection process, teacher identification, confirmation by a speech-language pathologist, and consultation with district special needs advisors. The prevalence of students with speech disorders was estimated; specifically, 0.33% of students were identified as stuttering, 0.12% as having a voice disorder, and 1.06% as having a speech-sound disorder. There was a higher prevalence of speech disorders in males than in females. As grade level increased, the prevalence of speech disorders decreased. There was no significant difference in the pattern of prevalence across the three speech disorders and four socioeconomic groups; however, students who were identified with a speech disorder were more likely to be in the higher socioeconomic groups. Finally, there was a difference between the perceived and actual level of support that was provided to these students. These prevalence figures are lower than those using initial identification by speech-language pathologists and similar to those using parent report.
Role of contextual cues on the perception of spectrally reduced interrupted speech.
Patro, Chhayakanta; Mendel, Lisa Lucks
2016-08-01
Understanding speech within an auditory scene is constantly challenged by interfering noise in suboptimal listening environments when noise hinders the continuity of the speech stream. In such instances, a typical auditory-cognitive system perceptually integrates available speech information and "fills in" missing information in the light of semantic context. However, individuals with cochlear implants (CIs) find it difficult and effortful to understand interrupted speech compared to their normal-hearing counterparts. This inefficiency in perceptual integration of speech could be attributed to further degradations in the spectral-temporal domain imposed by CIs, making it difficult to utilize the contextual evidence effectively. To address these issues, 20 normal-hearing adults listened to speech that was either spectrally reduced or both spectrally reduced and interrupted, in a manner similar to CI processing. The Revised Speech Perception in Noise test, which includes contextually rich and contextually poor sentences, was used to evaluate the influence of semantic context on speech perception. Results indicated that listeners benefited more from semantic context when they listened to spectrally reduced speech alone. For the spectrally reduced interrupted speech, contextual information was not as helpful under significant spectral reductions, but became beneficial as the spectral resolution improved. These results suggest that top-down processing facilitates speech perception up to a point, and fails to facilitate speech understanding when the speech signals are significantly degraded.
Equality marker in the language of Bali
NASA Astrophysics Data System (ADS)
Wajdi, Majid; Subiyanto, Paulus
2018-01-01
The language of Bali can be counted among the more elaborate languages of the world because, like Javanese, it has distinct speech levels, low and high. The low and high speech levels of Balinese are language codes that speakers can use to show and express social relationships among themselves. This paper focuses on describing, analyzing, and interpreting the use of the low code of Balinese in daily communication in the speech community of Pegayaman, Bali. Observational and documentation methods were applied to provide the data for the research, using recording and field-note techniques. Recordings of spoken language and excerpts from a Balinese novel were transcribed into written form to ease the process of analysis. Symmetric use of the low code expresses social equality between or among the participants involved in the communication; it also implies social intimacy between or among the speakers of Balinese. The regular, patterned use of the low code of Balinese is not merely a communication strategy but a kind of communication agreement, or contract, between the participants. By using the low code in their social and communicative activities, the participants share and express their social equality and intimacy.
Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias
2016-01-01
The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
Cognitive Spare Capacity and Speech Communication: A Narrative Overview
2014-01-01
Background noise can make speech communication tiring and cognitively taxing, especially for individuals with hearing impairment. It is now well established that better working memory capacity is associated with better ability to understand speech under adverse conditions as well as better ability to benefit from the advanced signal processing in modern hearing aids. Recent work has shown that although such processing cannot overcome hearing handicap, it can increase cognitive spare capacity, that is, the ability to engage in higher level processing of speech. This paper surveys recent work on cognitive spare capacity and suggests new avenues of investigation. PMID:24971355
Processing of speech and non-speech stimuli in children with specific language impairment
NASA Astrophysics Data System (ADS)
Basu, Madhavi L.; Surprenant, Aimee M.
2003-10-01
Specific Language Impairment (SLI) is a developmental language disorder in which children demonstrate varying degrees of difficulty in acquiring a spoken language. One possible underlying cause is that children with SLI have deficits in processing sounds that are of short duration or that are presented rapidly. Studies so far have compared their performance on speech and nonspeech sounds of unequal complexity. Hence, it is still unclear whether the deficit is specific to the perception of speech sounds or whether it affects auditory function more generally. The current study aims to answer this question by comparing the performance of children with SLI on speech and nonspeech sounds synthesized from sine-wave stimuli. The children will be tested using the classic categorical perception paradigm, which includes both the identification and discrimination of stimuli along a continuum. If there is a deficit in performance on both speech and nonspeech tasks, it will show that these children have a deficit in processing complex sounds. Poor performance on only the speech sounds will indicate that the deficit is more related to language. The findings will offer insights into the exact nature of the speech perception deficits in children with SLI. [Work supported by ASHF.]
The pupil response is sensitive to divided attention during speech processing.
Koelewijn, Thomas; Shinn-Cunningham, Barbara G; Zekveld, Adriana A; Kramer, Sophia E
2014-06-01
Dividing attention over two streams of speech strongly decreases performance compared to focusing on only one. How divided attention affects cognitive processing load, as indexed with pupillometry during speech recognition, has so far not been investigated. In 12 young adults, the pupil response was recorded while they focused on either one or both of two sentences that were presented dichotically and masked by fluctuating noise across a range of signal-to-noise ratios. In line with previous studies, performance decreased when processing two target sentences instead of one. Additionally, dividing attention to process two sentences caused larger pupil dilation and later peak pupil latency than processing only one. This suggests an effect of attention on cognitive processing load (pupil dilation) during speech processing in noise. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
Angelopoulou, Georgia; Kasselimis, Dimitrios; Makrydakis, George; Varkanitsa, Maria; Roussos, Petros; Goutsos, Dionysis; Evdokimidis, Ioannis; Potagas, Constantin
2018-06-01
Pauses may be studied as an aspect of the temporal organization of speech, as well as an index of internal cognitive processes, such as word access, selection and retrieval, monitoring, articulatory planning, and memory. Several studies have demonstrated specific distributional patterns of pauses in typical speech. However, evidence from patients with language impairment is sparse and restricted to small-scale studies. The aim of the present study is to investigate empty pause distribution and associations between pause variables and linguistic elements in aphasia. Eighteen patients with chronic aphasia following a left hemisphere stroke were recruited. The control group consisted of 19 healthy adults matched for age, gender, and years of formal schooling. Speech samples from both groups were transcribed, and silent pauses were annotated using ELAN. Our results indicate that in both groups, pause duration distribution follows a log-normal bimodal model with significantly different thresholds between the two populations, yet specific enough for each distribution to justify classification into two distinct groups of pauses for each population: short and long. Moreover, we found differences between the patient and control groups, most prominently with regard to long pause duration and rate. Long pause indices were also associated with fundamental linguistic elements, such as mean length of utterance. Overall, we argue that post-stroke aphasia may induce quantitative but not qualitative alterations of pause patterns during speech, and further suggest that long pauses may serve as an index of internal cognitive processes supporting sentence planning. Our findings are discussed within the context of pause pattern quantification strategies as potential markers of cognitive changes in aphasia, further stressing the importance of such measures as an integral part of language assessment in clinical populations. Copyright © 2018 Elsevier Ltd. All rights reserved.
Philippot, Pierre; Vrielynck, Nathalie; Muller, Valérie
2010-12-01
The present study examined the impact of different modes of processing anxious apprehension on subsequent anxiety and performance in a stressful speech task. Participants were informed that they would have to give a speech on a difficult topic while being videotaped and evaluated on their performance. They were then randomly assigned to one of three conditions. In a specific processing condition, they were encouraged to explore in detail all the specific aspects (thoughts, emotions, sensations) they experienced while anticipating giving the speech; in a general processing condition, they had to focus on the generic aspects that they would typically experience during anxious anticipation; and in a control, no-processing condition, participants were distracted. Results revealed that at the end of the speech, participants in the specific processing condition reported less anxiety than those in the two other conditions. They were also evaluated by judges to have performed better than those in the control condition, who in turn did better than those in the general processing condition. Copyright © 2010. Published by Elsevier Ltd.
Neural integration of iconic and unrelated coverbal gestures: a functional MRI study.
Green, Antonia; Straube, Benjamin; Weis, Susanne; Jansen, Andreas; Willmes, Klaus; Konrad, Kerstin; Kircher, Tilo
2009-10-01
Gestures are an important part of interpersonal communication, for example by illustrating physical properties of speech contents (e.g., "the ball is round"). The meaning of these so-called iconic gestures is strongly intertwined with speech. We investigated the neural correlates of the semantic integration for verbal and gestural information. Participants watched short videos of five speech and gesture conditions performed by an actor, including variation of language (familiar German vs. unfamiliar Russian), variation of gesture (iconic vs. unrelated), as well as isolated familiar language, while brain activation was measured using functional magnetic resonance imaging. For familiar speech with either of both gesture types contrasted to Russian speech-gesture pairs, activation increases were observed at the left temporo-occipital junction. Apart from this shared location, speech with iconic gestures exclusively engaged left occipital areas, whereas speech with unrelated gestures activated bilateral parietal and posterior temporal regions. Our results demonstrate that the processing of speech with speech-related versus speech-unrelated gestures occurs in two distinct but partly overlapping networks. The distinct processing streams (visual versus linguistic/spatial) are interpreted in terms of "auxiliary systems" allowing the integration of speech and gesture in the left temporo-occipital region.
The effect of hearing aid technologies on listening in an automobile.
Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A; Stanziola, Rachel W
2013-06-01
Communication while traveling in an automobile is often very difficult for hearing aid users. This is because the automobile/road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the listener/driver, conventional directional processing that places the directivity beam toward the listener's front may not be helpful and, in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listeners' speech recognition performance and preference for communication in a traveling automobile. A single-blinded, repeated-measures design was used. Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 yr participated in the study. The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 mph on a paved highway. The hearing aids were programmed with an omnidirectional microphone, a conventional adaptive directional microphone, and each of the three newer schemes.
CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound-treated booth to assess speech recognition performance and preference with each programmed condition. Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants' preferences for a given processing scheme were generally consistent with speech recognition results. The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. American Academy of Audiology.
Buchan, Julie N; Paré, Martin; Munhall, Kevin G
2008-11-25
During face-to-face conversation the face provides auditory and visual linguistic information, and also conveys information about the identity of the speaker. This study investigated behavioral strategies involved in gathering visual information while watching talking faces. The effects of varying talker identity and varying the intelligibility of speech (by adding acoustic noise) on gaze behavior were measured with an eyetracker. Varying the intelligibility of the speech by adding noise had a noticeable effect on the location and duration of fixations. When noise was present subjects adopted a vantage point that was more centralized on the face by reducing the frequency of the fixations on the eyes and mouth and lengthening the duration of their gaze fixations on the nose and mouth. Varying talker identity resulted in a more modest change in gaze behavior that was modulated by the intelligibility of the speech. Although subjects generally used similar strategies to extract visual information in both talker variability conditions, when noise was absent there were more fixations on the mouth when viewing a different talker every trial as opposed to the same talker every trial. These findings provide a useful baseline for studies examining gaze behavior during audiovisual speech perception and perception of dynamic faces.
V2S: Voice to Sign Language Translation System for Malaysian Deaf People
NASA Astrophysics Data System (ADS)
Mean Foong, Oi; Low, Tang Jung; La, Wai Wan
The process of learning and understanding sign language may be cumbersome to some; therefore, this paper proposes a solution to this problem by providing a voice (English language) to sign language translation system using speech and image processing techniques. Speech processing, which includes speech recognition, is the study of recognizing the words being spoken regardless of who the speaker is. This project uses template-based recognition as the main approach, in which the V2S system first needs to be trained with speech patterns based on some generic spectral parameter set. These spectral parameter sets are then stored as templates in a database. The system performs the recognition process by matching the parameter set of the input speech against the stored templates and finally displays the sign language in video format. Empirical results show that the system has an 80.3% recognition rate.
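Template-based recognition of the kind described is commonly implemented by comparing feature sequences against stored templates with dynamic time warping (DTW). The abstract does not name the matching algorithm, so the DTW sketch below is an illustrative assumption, not the V2S system's actual code; feature matrices stand in for the "spectral parameter sets".

```python
import numpy as np

def dtw_distance(template, query):
    """Dynamic time warping distance between two feature sequences
    (frames x coefficients): finds the lowest-cost monotonic alignment,
    tolerating differences in speaking rate."""
    n, m = len(template), len(query)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(template[i - 1] - query[j - 1])
            # extend the cheapest of the three allowed predecessor paths
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(query, templates):
    """Return the label of the stored template closest to the query."""
    return min(templates, key=lambda label: dtw_distance(templates[label], query))
```

In a real system the rows of each matrix would be spectral frames (e.g., cepstral coefficients) extracted from recorded words; here the matching logic is what matters.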
Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception
ERIC Educational Resources Information Center
Wilson, Amanda H.; Alsius, Agnès; Parè, Martin; Munhall, Kevin G.
2016-01-01
Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…
ERIC Educational Resources Information Center
Pillai, Patrick
2000-01-01
Children with chronic ear infections experience a lag time in understanding speech, which inhibits classroom participation and the ability to make friends, and ultimately reduces self-esteem. Difficulty in hearing affects speech and vocabulary development, reading and writing proficiency, and academic performance, and could lead to placement in…
Supporting Children with Speech and Language Difficulties. Supporting Children Series
ERIC Educational Resources Information Center
David Fulton Publishers, 2004
2004-01-01
Off-the-shelf support containing all the vital information practitioners need to know about Speech and Language Difficulties, this book includes: (1) Strategies for developing attention control; (2) Guidance on how to improve language and listening skills; and (3) Ideas for teaching phonological awareness. Following a foreword and an introduction,…
Once More, With Feeling: Reagan and "The Speech" in 1980.
ERIC Educational Resources Information Center
Henry, David
Ronald Reagan's rise from political neophyte to Republican candidate for governor of California in 1966 was characterized by a public relations strategy, which was bolstered by "The Speech," a 30-minute anti-big government, defense-of-freedom message. He presented this message appropriately to each audience to identify himself with…
ERIC Educational Resources Information Center
Perfitt, Ruth
2013-01-01
This article investigates the impact of transitions upon pupils aged 11-14 with speech, language and communication needs, including specific language impairment and autism. The aim is to identify stress factors, examine whether these affect any subgroups in particular and suggest practical strategies to support pupils through transitions. Stress…
The Mechanism of Speech Processing in Congenital Amusia: Evidence from Mandarin Speakers
Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren
2012-01-01
Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results. PMID:22347374
Relationship between individual differences in speech processing and cognitive functions.
Ou, Jinghua; Law, Sam-Po; Fung, Roxana
2015-12-01
A growing body of research has suggested that cognitive abilities may play a role in individual differences in speech processing. The present study took advantage of a widespread linguistic phenomenon of sound change to systematically assess the relationships between speech processing and various components of attention and working memory in the auditory and visual modalities among typically developed Cantonese-speaking individuals. The individual variations in speech processing are captured in an ongoing sound change-tone merging in Hong Kong Cantonese, in which typically developed native speakers are reported to lose the distinctions between some tonal contrasts in perception and/or production. Three groups of participants were recruited, with a first group of good perception and production, a second group of good perception but poor production, and a third group of good production but poor perception. Our findings revealed that modality-independent abilities of attentional switching/control and working memory might contribute to individual differences in patterns of speech perception and production as well as discrimination latencies among typically developed speakers. The findings not only have the potential to generalize to speech processing in other languages, but also broaden our understanding of the omnipresent phenomenon of language change in all languages.
The Effect of Lexical Content on Dichotic Speech Recognition in Older Adults.
Findlen, Ursula M; Roup, Christina M
2016-01-01
Age-related auditory processing deficits have been shown to negatively affect speech recognition for older adult listeners. In contrast, older adults gain benefit from their ability to make use of semantic and lexical content of the speech signal (i.e., top-down processing), particularly in complex listening situations. Assessment of auditory processing abilities among aging adults should take into consideration semantic and lexical content of the speech signal. The purpose of this study was to examine the effects of lexical and attentional factors on dichotic speech recognition performance characteristics for older adult listeners. A repeated measures design was used to examine differences in dichotic word recognition as a function of lexical and attentional factors. Thirty-five older adults (61-85 yr) with sensorineural hearing loss participated in this study. Dichotic speech recognition was evaluated using consonant-vowel-consonant (CVC) word and nonsense CVC syllable stimuli administered in the free recall, directed recall right, and directed recall left response conditions. Dichotic speech recognition performance for nonsense CVC syllables was significantly poorer than performance for CVC words. Dichotic recognition performance varied across response condition for both stimulus types, which is consistent with previous studies on dichotic speech recognition. Inspection of individual results revealed that five listeners demonstrated an auditory-based left ear deficit for one or both stimulus types. Lexical content of stimulus materials affects performance characteristics for dichotic speech recognition tasks in the older adult population. The use of nonsense CVC syllable material may provide a way to assess dichotic speech recognition performance while potentially lessening the effects of lexical content on performance (i.e., measuring bottom-up auditory function both with and without top-down processing). American Academy of Audiology.
Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis
ERIC Educational Resources Information Center
Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.
2009-01-01
Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Blind estimation of reverberation time
NASA Astrophysics Data System (ADS)
Ratnam, Rama; Jones, Douglas L.; Wheeler, Bruce C.; O'Brien, William D.; Lansing, Charissa R.; Feng, Albert S.
2003-11-01
The reverberation time (RT) is an important parameter for characterizing the quality of an auditory space. Sounds in reverberant environments are subject to coloration. This affects speech intelligibility and sound localization. Many state-of-the-art audio signal processing algorithms, for example in hearing aids and telephony, are expected to have the ability to characterize the listening environment and turn on an appropriate processing strategy accordingly. Thus, a method for characterization of room RT based on passively received microphone signals represents an important enabling technology. Current RT estimators, such as Schroeder's method, depend on a controlled sound source, and thus cannot produce an online, blind RT estimate. Here, a method for estimating RT without prior knowledge of sound sources or room geometry is presented. The diffusive tail of reverberation was modeled as an exponentially damped Gaussian white noise process. The time-constant of the decay, which provided a measure of the RT, was estimated using a maximum-likelihood procedure. The estimates were obtained continuously, and an order-statistics filter was used to extract the most likely RT from the accumulated estimates. The procedure was illustrated for connected speech. Results obtained for simulated and real room data are in good agreement with the real RT values.
Online estimation of room reverberation time
NASA Astrophysics Data System (ADS)
Ratnam, Rama; Jones, Douglas L.; Wheeler, Bruce C.; Feng, Albert S.
2003-04-01
The reverberation time (RT) is an important parameter for characterizing the quality of an auditory space. Sounds in reverberant environments are subject to coloration. This affects speech intelligibility and sound localization. State-of-the-art signal processing algorithms for hearing aids are expected to have the ability to evaluate the characteristics of the listening environment and turn on an appropriate processing strategy accordingly. Thus, a method for the characterization of room RT based on passively received microphone signals represents an important enabling technology. Current RT estimators, such as Schroeder's method or regression, depend on a controlled sound source, and thus cannot produce an online, blind RT estimate. Here, we describe a method for estimating RT without prior knowledge of sound sources or room geometry. The diffusive tail of reverberation was modeled as an exponentially damped Gaussian white noise process. The time constant of the decay, which provided a measure of the RT, was estimated using a maximum-likelihood procedure. The estimates were obtained continuously, and an order-statistics filter was used to extract the most likely RT from the accumulated estimates. The procedure was illustrated for connected speech. Results obtained for simulated and real room data are in good agreement with the real RT values.
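A minimal sketch of the approach described in these two abstracts, modeling the diffusive tail as exponentially damped Gaussian white noise, estimating the decay time constant by maximum likelihood, and applying an order-statistics filter over accumulated frame estimates, might look as follows. The frame length, search grid, and percentile are illustrative choices, not the authors' parameters.

```python
import numpy as np

def ml_decay_tau(frame, fs, taus):
    """Profile maximum-likelihood estimate of the decay time constant tau,
    under the model frame[n] = exp(-n / (tau * fs)) * w[n],
    with w white Gaussian noise of unknown variance."""
    n = np.arange(len(frame))
    best_tau, best_ll = taus[0], -np.inf
    for tau in taus:
        log_a = -1.0 / (tau * fs)
        g = np.exp(2 * n * log_a)         # variance envelope a^(2n)
        sigma2 = np.mean(frame ** 2 / g)  # ML noise variance given this tau
        # profile log-likelihood (constants dropped)
        ll = -0.5 * len(frame) * np.log(sigma2) - log_a * n.sum()
        if ll > best_ll:
            best_tau, best_ll = tau, ll
    return best_tau

def blind_rt60(signal, fs, frame_len, taus, percentile=10):
    """Slide frames over the signal, estimate tau per frame, and apply an
    order-statistics filter (a low percentile) to favor the free-decay
    segments, which yield the most likely RT."""
    hops = range(0, len(signal) - frame_len, frame_len // 2)
    est = [ml_decay_tau(signal[h:h + frame_len], fs, taus) for h in hops]
    tau = np.percentile(est, percentile)
    return 3.0 * np.log(10.0) * tau  # RT60: time for a 60 dB energy drop
```

The conversion in the last line follows from the model: energy decays as exp(-2t/tau), so a 60 dB drop takes t = 3 ln(10) tau.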
Doctors' voices in patients' narratives: coping with emotions in storytelling.
Lucius-Hoene, Gabriele; Thiele, Ulrike; Breuning, Martina; Haug, Stephanie
2012-09-01
To understand doctors' impact on patients' emotional coping, we draw on patients' stories about their encounters with doctors. These accounts reflect meaning-making processes and biographically contextualized experiences. We investigate how patients characterize their doctors by voicing them in their stories, thus assigning them functions in the coping process. A total of 394 narrated scenes containing reported speech of doctors were extracted from interviews with 26 patients with type 2 diabetes and 30 with chronic pain. The constructed speech acts were examined by means of positioning and narrative analysis and assigned to thematic categories by a bottom-up coding procedure. Patients use narratives as coping strategies when confronted with illness and with their encounters with doctors, constructing those encounters in a supportive and face-saving way. Corresponding to the variety of illness conditions, differing moral problems in dealing with doctors arise. Different evaluative stances towards the same events within interviews show that positionings are not fixed but vary according to context and purpose. Our narrative approach deepens the standardized, predominantly cognitive statements of questionnaires in research on doctor-patient relations with individualized emotional and biographical aspects of the patients' perspective. Doctors should be trained to become aware of their impact on patients' coping processes.
Speech processing and production in two-year-old children acquiring isiXhosa: A tale of two children
Rossouw, Kate; Fish, Laura; Jansen, Charne; Manley, Natalie; Powell, Michelle; Rosen, Loren
2016-01-01
We investigated the speech processing and production of 2-year-old children acquiring isiXhosa in South Africa. Two children (2 years, 5 months; 2 years, 8 months) are presented as single cases. Speech input processing, stored phonological knowledge and speech output are described, based on data from auditory discrimination, naming, and repetition tasks. Both children were approximating adult levels of accuracy in their speech output, although naming was constrained by vocabulary. Performance across tasks was variable: One child showed a relative strength with repetition, and experienced most difficulties with auditory discrimination. The other performed equally well in naming and repetition, and obtained 100% for her auditory task. There is limited data regarding typical development of isiXhosa, and the focus has mainly been on speech production. This exploratory study describes typical development of isiXhosa using a variety of tasks understood within a psycholinguistic framework. We describe some ways in which speech and language therapists can devise and carry out assessment with children in situations where few formal assessments exist, and also detail the challenges of such work. PMID:27245131
Perceptual learning for speech in noise after application of binary time-frequency masks
Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.
2013-01-01
Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56 and 0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners, with the greatest improvement observed for the same materials used in training. PMID:23464038
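The mask construction described above is commonly realized as an ideal binary mask: with separate access to the clean target and the noise, TF units whose local SNR exceeds the threshold criterion are kept and the rest discarded. A minimal sketch follows; the window, FFT size, hop, and the absence of synthesis-window normalization are simplifications, not the study's exact processing.

```python
import numpy as np

def _stft(x, nfft=512, hop=256):
    """Hann-windowed short-time FFT (analysis only, no COLA normalization)."""
    frames = [x[i:i + nfft] for i in range(0, len(x) - nfft + 1, hop)]
    return np.fft.rfft(np.array(frames) * np.hanning(nfft), axis=1)

def ideal_binary_mask(speech, noise, threshold_db=0.0, nfft=512, hop=256):
    """Keep TF units where local speech-to-noise ratio exceeds threshold_db.
    Requires prior access to the clean target, as in the study."""
    S, N = _stft(speech, nfft, hop), _stft(noise, nfft, hop)
    local_snr = 20 * np.log10(np.abs(S) / (np.abs(N) + 1e-12) + 1e-12)
    return (local_snr > threshold_db).astype(float)

def apply_mask(speech, noise, mask, nfft=512, hop=256):
    """Mask the mixture STFT and resynthesize by overlap-add (amplitude is
    only approximate because no synthesis normalization is applied)."""
    mix = speech + noise
    M = _stft(mix, nfft, hop) * mask
    out = np.zeros(len(mix))
    for k, frame in enumerate(np.fft.irfft(M, n=nfft, axis=1)):
        out[k * hop:k * hop + nfft] += frame
    return out
```

Raising `threshold_db` retains fewer TF units, which is how different threshold criteria yield different initial intelligibility levels.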
Early Infant Diet and ERP Correlates of Speech Stimuli Discrimination in 9-Month-Old Infants
USDA-ARS's Scientific Manuscript database
Processing and discrimination of speech stimuli were examined during the initial period of weaning in infants enrolled in a longitudinal study of infant diet and development (the Beginnings Study). Event-related potential measures (ERP; 128 sites) were used to compare the processing of speech stimul...
Brainstem Transcription of Speech Is Disrupted in Children with Autism Spectrum Disorders
ERIC Educational Resources Information Center
Russo, Nicole; Nicol, Trent; Trommer, Barbara; Zecker, Steve; Kraus, Nina
2009-01-01
Language impairment is a hallmark of autism spectrum disorders (ASD). The origin of the deficit is poorly understood although deficiencies in auditory processing have been detected in both perception and cortical encoding of speech sounds. Little is known about the processing and transcription of speech sounds at earlier (brainstem) levels or…
Role of Visual Speech in Phonological Processing by Children with Hearing Loss
ERIC Educational Resources Information Center
Jerger, Susan; Tye-Murray, Nancy; Abdi, Herve
2009-01-01
Purpose: This research assessed the influence of visual speech on phonological processing by children with hearing loss (HL). Method: Children with HL and children with normal hearing (NH) named pictures while attempting to ignore auditory or audiovisual speech distractors whose onsets relative to the pictures were either congruent, conflicting in…
Hemispheric Differences in Processing Dichotic Meaningful and Non-Meaningful Words
ERIC Educational Resources Information Center
Yasin, Ifat
2007-01-01
Classic dichotic-listening paradigms reveal a right-ear advantage (REA) for speech sounds as compared to non-speech sounds. This REA is assumed to be associated with a left-hemisphere dominance for meaningful speech processing. This study objectively probed the relationship between ear advantage and hemispheric dominance in a dichotic-listening…
Application of an auditory model to speech recognition.
Cohen, J R
1989-06-01
Some aspects of auditory processing are incorporated in a front end for the IBM speech-recognition system [F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE 64 (4), 532-556 (1976)]. This new process includes adaptation, loudness scaling, and mel warping. Tests show that the design is an improvement over previous algorithms.
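The mel warping used in front ends of this kind is conventionally the standard mel scale; the exact warping in the IBM front end is not specified here, so the textbook O'Shaughnessy formula is shown as an assumption.

```python
import math

def hz_to_mel(f_hz):
    """Standard mel-scale warping: 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of the standard mel warping."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction, 1000 Hz maps to approximately 1000 mel, and frequencies above that are compressed, mimicking cochlear frequency resolution.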
The Functional Neuroanatomy of Prelexical Processing in Speech Perception
ERIC Educational Resources Information Center
Scott, Sophie K.; Wise, Richard J. S.
2004-01-01
In this paper we attempt to relate the prelexical processing of speech, with particular emphasis on functional neuroimaging studies, to the study of auditory perceptual systems by disciplines in the speech and hearing sciences. The elaboration of the sound-to-meaning pathways in the human brain enables their integration into models of the human…
ERIC Educational Resources Information Center
Gauthier, Gilles
2001-01-01
Focuses on the indirection process presented in Searle's and Vanderveken's theory of speech acts: the performance of a primary speech act by means of the accomplishment of a secondary speech act. Discusses indirection mechanisms used in advertising and in political communication. (Author/VWL)
Audiovisual Matching in Speech and Nonspeech Sounds: A Neurodynamical Model
ERIC Educational Resources Information Center
Loh, Marco; Schmid, Gabriele; Deco, Gustavo; Ziegler, Wolfram
2010-01-01
Audiovisual speech perception provides an opportunity to investigate the mechanisms underlying multimodal processing. By using nonspeech stimuli, it is possible to investigate the degree to which audiovisual processing is specific to the speech domain. It has been shown in a match-to-sample design that matching across modalities is more difficult…
Cason, Nia; Astésano, Corine; Schön, Daniele
2015-02-01
Following findings that musical rhythmic priming enhances subsequent speech perception, we investigated whether rhythmic priming for spoken sentences can enhance phonological processing - the building blocks of speech - and whether audio-motor training enhances this effect. Participants heard a metrical prime followed by a sentence (with a matching/mismatching prosodic structure), for which they performed a phoneme detection task. Behavioural (RT) data was collected from two groups: one who received audio-motor training, and one who did not. We hypothesised that 1) phonological processing would be enhanced in matching conditions, and 2) audio-motor training with the musical rhythms would enhance this effect. Indeed, providing a matching rhythmic prime context resulted in faster phoneme detection, thus revealing a cross-domain effect of musical rhythm on phonological processing. In addition, our results indicate that rhythmic audio-motor training enhances this priming effect. These results have important implications for rhythm-based speech therapies, and suggest that metrical rhythm in music and speech may rely on shared temporal processing brain resources. Copyright © 2015 Elsevier B.V. All rights reserved.
Liu, David; Zucherman, Mark; Tulloss, William B
2006-03-01
The reporting of radiological images is undergoing dramatic changes due to the introduction of two new technologies: structured reporting and speech recognition. Each technology has its own unique advantages. The highly organized content of structured reporting facilitates data mining and billing, whereas speech recognition offers a natural succession from the traditional dictation-transcription process. This article clarifies the distinction between the process and outcome of structured reporting, describes fundamental requirements for any effective structured reporting system, and describes the potential development of a novel, easy-to-use, customizable structured reporting system that incorporates speech recognition. This system should have all the advantages derived from structured reporting, accommodate a wide variety of user needs, and incorporate speech recognition as a natural component and extension of the overall reporting process.
Elevated depressive symptoms enhance reflexive but not reflective auditory category learning.
Maddox, W Todd; Chandrasekaran, Bharath; Smayda, Kirsten; Yi, Han-Gyol; Koslov, Seth; Beevers, Christopher G
2014-09-01
In vision an extensive literature supports the existence of competitive dual-processing systems of category learning that are grounded in neuroscience and are partially-dissociable. The reflective system is prefrontally-mediated and uses working memory and executive attention to develop and test rules for classifying in an explicit fashion. The reflexive system is striatally-mediated and operates by implicitly associating perception with actions that lead to reinforcement. Although categorization is fundamental to auditory processing, little is known about the learning systems that mediate auditory categorization and even less is known about the effects of individual difference in the relative efficiency of the two learning systems. Previous studies have shown that individuals with elevated depressive symptoms show deficits in reflective processing. We exploit this finding to test critical predictions of the dual-learning systems model in audition. Specifically, we examine the extent to which the two systems are dissociable and competitive. We predicted that elevated depressive symptoms would lead to reflective-optimal learning deficits but reflexive-optimal learning advantages. Because natural speech category learning is reflexive in nature, we made the prediction that elevated depressive symptoms would lead to superior speech learning. In support of our predictions, individuals with elevated depressive symptoms showed a deficit in reflective-optimal auditory category learning, but an advantage in reflexive-optimal auditory category learning. In addition, individuals with elevated depressive symptoms showed an advantage in learning a non-native speech category structure. Computational modeling suggested that the elevated depressive symptom advantage was due to faster, more accurate, and more frequent use of reflexive category learning strategies in individuals with elevated depressive symptoms. 
The implications of this work for the dual-process approach to auditory learning and depression are discussed. Copyright © 2014 Elsevier Ltd. All rights reserved.
Zekveld, Adriana A; Rudner, Mary; Kramer, Sophia E; Lyzenga, Johannes; Rönnberg, Jerker
2014-01-01
We investigated changes in speech recognition and cognitive processing load due to the masking release attributable to decreasing similarity between target and masker speech. This was achieved by using masker voices with either the same (female) gender as the target speech or different gender (male) and/or by spatially separating the target and masker speech using HRTFs. We assessed the relation between the signal-to-noise ratio required for 50% sentence intelligibility, the pupil response and cognitive abilities. We hypothesized that the pupil response, a measure of cognitive processing load, would be larger for co-located maskers and for same-gender compared to different-gender maskers. We further expected that better cognitive abilities would be associated with better speech perception and larger pupil responses as the allocation of larger capacity may result in more intense mental processing. In line with previous studies, the performance benefit from different-gender compared to same-gender maskers was larger for co-located masker signals. The performance benefit of spatially-separated maskers was larger for same-gender maskers. The pupil response was larger for same-gender than for different-gender maskers, but was not reduced by spatial separation. We observed associations between better perception performance and better working memory, better information updating, and better executive abilities when applying no corrections for multiple comparisons. The pupil response was not associated with cognitive abilities. Thus, although both gender and location differences between target and masker facilitate speech perception, only gender differences lower cognitive processing load. Presenting a more dissimilar masker may facilitate target-masker separation at a later (cognitive) processing stage than increasing the spatial separation between the target and masker. The pupil response provides information about speech perception that complements intelligibility data.
Perceived gender in clear and conversational speech
NASA Astrophysics Data System (ADS)
Booz, Jaime A.
Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004) were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (f0) and f0 standard deviation were significantly correlated to perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity.
Although the increase was small, some changes associated with making adjustments to improve speech clarity have a larger impact on perceived femininity than others. Using a clear speech strategy alone may not be sufficient for a male speaker to be perceived as female, but could be used as one of many tools to help speakers achieve more "feminine" speech, in conjunction with more specific strategies targeting the acoustic parameters outlined in this study.
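One of the acoustic measures the study above relates to perceived gender, mean fundamental frequency, can be estimated with a simple frame-wise autocorrelation sketch. The voicing decision (normalized peak above 0.5) and all parameters are illustrative, not the study's analysis method.

```python
import numpy as np

def mean_f0(x, fs, fmin=60.0, fmax=400.0, frame=0.04, hop=0.01):
    """Mean fundamental frequency over voiced frames via autocorrelation.
    Frame/hop lengths and the voicing threshold are illustrative defaults."""
    n, h = int(frame * fs), int(hop * fs)
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), n - 1)
    f0s = []
    for i in range(0, len(x) - n, h):
        seg = x[i:i + n] - np.mean(x[i:i + n])
        ac = np.correlate(seg, seg, mode='full')[n - 1:]  # lags 0..n-1
        if ac[0] <= 0:
            continue                                      # silent frame
        ac = ac / ac[0]
        lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
        if ac[lag] > 0.5:                                 # crude voicing check
            f0s.append(fs / lag)
    return float(np.mean(f0s)) if f0s else float('nan')
```

The f0 standard deviation reported in the study could be obtained the same way by taking `np.std` over the per-frame estimates instead of the mean.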
Speech training alters consonant and vowel responses in multiple auditory cortex fields
Engineer, Crystal T.; Rahebi, Kimiya C.; Buell, Elizabeth P.; Fink, Melyssa K.; Kilgard, Michael P.
2015-01-01
Speech sounds evoke unique neural activity patterns in primary auditory cortex (A1). Extensive speech sound discrimination training alters A1 responses. While the neighboring auditory cortical fields each contain information about speech sound identity, each field processes speech sounds differently. We hypothesized that while all fields would exhibit training-induced plasticity following speech training, there would be unique differences in how each field changes. In this study, rats were trained to discriminate speech sounds by consonant or vowel in quiet and in varying levels of background speech-shaped noise. Local field potential and multiunit responses were recorded from four auditory cortex fields in rats that had received 10 weeks of speech discrimination training. Our results reveal that training alters speech evoked responses in each of the auditory fields tested. The neural response to consonants was significantly stronger in anterior auditory field (AAF) and A1 following speech training. The neural response to vowels following speech training was significantly weaker in ventral auditory field (VAF) and posterior auditory field (PAF). This differential plasticity of consonant and vowel sound responses may result from the greater paired pulse depression, expanded low frequency tuning, reduced frequency selectivity, and lower tone thresholds, which occurred across the four auditory fields. These findings suggest that alterations in the distributed processing of behaviorally relevant sounds may contribute to robust speech discrimination. PMID:25827927
Prediction and constraint in audiovisual speech perception.
Peelle, Jonathan E; Sommers, Mitchell S
2015-07-01
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. 
Thus, audiovisual speech perception is accomplished through multiple stages of integration, supported by distinct neuroanatomical mechanisms. Copyright © 2015 Elsevier Ltd. All rights reserved.
Nittrouer, Susan; Lowenstein, Joanna H
2007-02-01
It has been reported that children and adults weight differently the various acoustic properties of the speech signal that support phonetic decisions. This finding is generally attributed to the fact that the amount of weight assigned to various acoustic properties by adults varies across languages, and that children have not yet discovered the mature weighting strategies of their own native languages. But an alternative explanation exists: Perhaps children's auditory sensitivities for some acoustic properties of speech are poorer than those of adults, and children cannot categorize stimuli based on properties to which they are not keenly sensitive. The purpose of the current study was to test that hypothesis. Edited-natural, synthetic-formant, and sine wave stimuli were all used, and all were modeled after words with voiced and voiceless final stops. Adults and children (5 and 7 years of age) listened to pairs of stimuli in 5 conditions: 2 involving a temporal property (1 with speech and 1 with nonspeech stimuli) and 3 involving a spectral property (1 with speech and 2 with nonspeech stimuli). An AX discrimination task was used in which a standard stimulus (A) was compared with all other stimuli (X) equal numbers of times (method of constant stimuli). Adults and children had similar difference thresholds (i.e., 50% point on the discrimination function) for 2 of the 3 sets of nonspeech stimuli (1 temporal and 1 spectral), but children's thresholds were greater for both sets of speech stimuli. Results are interpreted as evidence that children's auditory sensitivities are adequate to support weighting strategies similar to those of adults, and so observed differences between children and adults in speech perception cannot be explained by differences in auditory perception. Furthermore, it is concluded that listeners bring expectations to the listening task about the nature of the signals they are hearing based on their experiences with those signals.
Lohvansuu, Kaisa; Hämäläinen, Jarmo A; Tanskanen, Annika; Ervast, Leena; Heikkinen, Elisa; Lyytinen, Heikki; Leppänen, Paavo H T
2014-12-01
Specific reading disability, dyslexia, is a prevalent and heritable disorder impairing reading acquisition characterized by a phonological deficit. However, how impaired phonological processing leads to dyslexia or reading disability remains unclear. Using ERPs we studied speech sound processing of 30 dyslexic children with familial risk for dyslexia, 51 typically reading children with familial risk for dyslexia, and 58 typically reading control children. We found enhanced brain responses to shortening of a phonemic length in pseudo-words (/at:a/ vs. /ata/) in dyslexic children with familial risk as compared to other groups. The enhanced brain responses were associated with better performance in a behavioral phonemic length discrimination task, as well as with better reading and writing accuracy. Source analyses revealed that the brain responses of the sub-group of dyslexic children with the largest responses originated from a more posterior area of the right temporal cortex as compared to the responses of the other participants. This is the first electrophysiological evidence for a possible compensatory speech perception mechanism in dyslexia. The best readers within the dyslexic group have probably developed alternative, compensatory strategies that substitute for a possible earlier deficit in phonological processing, and might therefore be able to perform better in phonemic length discrimination and in reading and writing accuracy tasks. However, we speculate that compensatory mechanisms for reading fluency are not as easily built, and dyslexic children remain slow readers into adult life. Copyright © 2014 Elsevier B.V. All rights reserved.
Speech processing in children with functional articulation disorders.
Gósy, Mária; Horváth, Viktória
2015-03-01
This study explored auditory speech processing and comprehension abilities in 5-8-year-old monolingual Hungarian children with functional articulation disorders (FADs) and their typically developing peers. Our main hypothesis was that children with FAD would show co-existing auditory speech processing disorders, with different levels of these skills depending on the nature of the receptive processes. The tasks included (i) sentence and non-word repetitions, (ii) non-word discrimination and (iii) sentence and story comprehension. Results suggest that the auditory speech processing of children with FAD is underdeveloped compared with that of typically developing children, and largely varies across task types. In addition, there are differences between children with FAD and controls in all age groups from 5 to 8 years. Our results have several clinical implications.
Kempe, Vera; Bublitz, Dennis; Brooks, Patricia J
2015-05-01
Is the observed link between musical ability and non-native speech-sound processing due to enhanced sensitivity to acoustic features underlying both musical and linguistic processing? To address this question, native English speakers (N = 118) discriminated Norwegian tonal contrasts and Norwegian vowels. Short tones differing in temporal, pitch, and spectral characteristics were used to measure sensitivity to the various acoustic features implicated in musical and speech processing. Musical ability was measured using Gordon's Advanced Measures of Musical Audiation. Results showed that sensitivity to specific acoustic features played a role in non-native speech-sound processing: Controlling for non-verbal intelligence, prior foreign language-learning experience, and sex, sensitivity to pitch and spectral information partially mediated the link between musical ability and discrimination of non-native vowels and lexical tones. The findings suggest that while sensitivity to certain acoustic features partially mediates the relationship between musical ability and non-native speech-sound processing, complex tests of musical ability also tap into other shared mechanisms. © 2014 The British Psychological Society.
NASA Astrophysics Data System (ADS)
Lynch, John T.
1987-02-01
The present technique for coping with fading and burst noise on HF channels used in digital voice communications transmits digital voice only during high S/N time intervals, and speeds up the speech when necessary to avoid conversation-hindering delays. On the basis of informal listening tests, four test conditions were selected in order to characterize those conditions of speech interruption which would render it comprehensible or incomprehensible. One of the test conditions, 2 s on and 0.5 s off, yielded test scores comparable to the reference continuous-speech case and is a reasonable match to the temporal variations of a disturbed ionosphere.
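The interruption pattern reported above (2 s on, 0.5 s off) amounts to periodic gating of the signal, which can be sketched as follows; the speed-up stage that avoids conversational delays is omitted.

```python
import numpy as np

def interrupt_speech(x, fs, on_s=2.0, off_s=0.5):
    """Gate a signal with a periodic on/off pattern (default 2 s on,
    0.5 s off), simulating transmission only during favourable
    channel intervals."""
    period = int((on_s + off_s) * fs)
    on = int(on_s * fs)
    gate = np.zeros(len(x))
    for start in range(0, len(x), period):
        gate[start:start + on] = 1.0   # pass samples in the "on" interval
    return x * gate
```

Varying `on_s` and `off_s` reproduces the other test conditions that separate comprehensible from incomprehensible interruption patterns.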
Simulation of talking faces in the human brain improves auditory speech recognition
von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.
2008-01-01
Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648
Communicating by Language: The Speech Process.
ERIC Educational Resources Information Center
House, Arthur S., Ed.
This document reports on a conference focused on speech problems. The main objective of these discussions was to facilitate a deeper understanding of human communication through interaction of conference participants with colleagues in other disciplines. Topics discussed included speech production, feedback, speech perception, and development of…
Speech Perception With Combined Electric-Acoustic Stimulation: A Simulation and Model Comparison.
Rader, Tobias; Adel, Youssef; Fastl, Hugo; Baumann, Uwe
2015-01-01
The aim of this study is to simulate speech perception with combined electric-acoustic stimulation (EAS), verify the advantage of combined stimulation in normal-hearing (NH) subjects, and then compare it with cochlear implant (CI) and EAS user results from the authors' previous study. Furthermore, an automatic speech recognition (ASR) system was built to examine the impact of low-frequency information and is proposed as an applied model to study different hypotheses of the combined-stimulation advantage. Signal-detection-theory (SDT) models were applied to assess predictions of subject performance without the need to assume any synergistic effects. Speech perception was tested using a closed-set matrix test (Oldenburg sentence test), and its speech material was processed to simulate CI and EAS hearing. A total of 43 NH subjects and a customized ASR system were tested. CI hearing was simulated by an aurally adequate signal spectrum analysis and representation, the part-tone-time-pattern, which was vocoded at 12 center frequencies according to the MED-EL DUET speech processor. Residual acoustic hearing was simulated by low-pass (LP)-filtered speech with cutoff frequencies 200 and 500 Hz for NH subjects and in the range from 100 to 500 Hz for the ASR system. Speech reception thresholds were determined in amplitude-modulated noise and in pseudocontinuous noise. Finally, previously proposed SDT models were applied to predict NH subject performance with EAS simulations. NH subjects tested with EAS simulations demonstrated the combined-stimulation advantage. Increasing the LP cutoff frequency from 200 to 500 Hz significantly improved speech reception thresholds in both noise conditions. In continuous noise, CI and EAS users showed generally better performance than NH subjects tested with simulations. In modulated noise, performance was comparable except for EAS at a cutoff frequency of 500 Hz, where NH subject performance was superior.
The ASR system showed similar behavior to NH subjects despite a positive signal-to-noise ratio shift for both noise conditions, while demonstrating the synergistic effect for cutoff frequencies ≥300 Hz. One SDT model largely predicted the combined-stimulation results in continuous noise, while falling short of predicting performance observed in modulated noise. The presented simulation was able to demonstrate the combined-stimulation advantage for NH subjects as observed in EAS users. Only NH subjects tested with EAS simulations were able to take advantage of the gap listening effect, while CI and EAS user performance was consistently degraded in modulated noise compared with performance in continuous noise. The application of ASR systems seems feasible to assess the impact of different signal processing strategies on speech perception with CI and EAS simulations. In continuous noise, SDT models were largely able to predict the performance gain without assuming any synergistic effects, but model amendments are required to explain the gap listening effect in modulated noise.
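The CI and EAS simulations described above can be approximated in a few lines. The sketch below is a deliberately crude stand-in: a brick-wall FFT low-pass for the residual-hearing portion and a toy noise vocoder for the electric portion. The band edges, envelope extraction, and function names are assumptions for illustration, not the part-tone-time-pattern analysis or the MED-EL DUET filterbank.

```python
import numpy as np

def lowpass_fft(x, fs, cutoff):
    """Crude brick-wall low-pass via the FFT (illustration only)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1.0 / fs)
    X[f > cutoff] = 0.0
    return np.fft.irfft(X, len(x))

def noise_vocode(x, fs, edges):
    """Toy noise vocoder: the envelope of each analysis band modulates
    band-limited noise."""
    rng = np.random.default_rng(0)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = lowpass_fft(x, fs, hi) - lowpass_fft(x, fs, lo)
        envelope = np.abs(band)  # crude envelope estimate
        noise = rng.standard_normal(len(x))
        carrier = lowpass_fft(noise, fs, hi) - lowpass_fft(noise, fs, lo)
        out += envelope * carrier
    return out

def eas_simulation(x, fs, lp_cutoff=500, edges=(500, 1000, 2000, 4000)):
    """Residual acoustic hearing (low-pass speech) plus a vocoded
    rendering of the higher-frequency bands."""
    return lowpass_fft(x, fs, lp_cutoff) + noise_vocode(x, fs, edges)

fs = 8000
x = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
y = eas_simulation(x, fs)
assert y.shape == x.shape and np.all(np.isfinite(y))
```

Raising `lp_cutoff` from 200 to 500 Hz in such a simulation passes more voicing and fundamental-frequency information to the "acoustic" ear, which is the manipulation the study found to improve speech reception thresholds.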
Speech Databases of Typical Children and Children with SLI
Grill, Pavel; Tučková, Jana
2016-01-01
The extent of research on children’s speech in general and on disordered speech specifically is very limited. In this article, we describe the process of creating databases of children’s speech and the possibilities for using such databases, which have been created by the LANNA research group in the Faculty of Electrical Engineering at Czech Technical University in Prague. These databases have been principally compiled for medical research but also for use in other areas, such as linguistics. Two databases were recorded: one for healthy children’s speech (recorded in kindergarten and in the first level of elementary school) and the other for pathological speech of children with a Specific Language Impairment (recorded at the surgeries of speech and language therapists and at the hospital). Both databases were sub-divided according to specific demands of medical research. Their use can extend beyond medicine, notably to linguistic research and pedagogical applications as well as to studies of speech-signal processing. PMID:26963508
Chen, Yung-Yue
2018-05-08
Mobile devices are used daily for speech communication, yet their speech quality is degraded by the environmental noise surrounding their users, and an effective background noise reduction solution is not easily developed for this speech enhancement problem. For these reasons, a methodology is systematically proposed to eliminate the effects of background noise on the speech communication of mobile devices. This methodology integrates a dual microphone array with a background noise elimination algorithm comprising a whitening process, a speech modelling method and an H₂ estimator. The adoption of the dual microphone array allows a low-cost design for the speech enhancement of mobile devices. Practical tests show that the proposed method is robust to random background noise and yields clean speech after the denoising process.
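As a rough illustration of how a second microphone can drive noise reduction, the sketch below substitutes simple frame-wise spectral subtraction for the paper's whitening process, speech model and H₂ estimator; the function name, frame size, and subtraction rule are all assumptions, not the published algorithm.

```python
import numpy as np

def dual_mic_spectral_subtraction(primary, reference, frame=256):
    """Frame-wise spectral subtraction using a second microphone as the
    noise reference: the reference magnitude spectrum is subtracted from
    the primary magnitude spectrum (half-wave rectified), and the
    primary phase is reused for resynthesis."""
    out = np.zeros_like(primary)
    for start in range(0, len(primary) - frame + 1, frame):
        P = np.fft.rfft(primary[start:start + frame])
        R = np.fft.rfft(reference[start:start + frame])
        mag = np.maximum(np.abs(P) - np.abs(R), 0.0)
        out[start:start + frame] = np.fft.irfft(mag * np.exp(1j * np.angle(P)), frame)
    return out

# With identical primary and reference signals (noise only), the
# subtraction removes everything.
n = np.random.default_rng(1).standard_normal(1024)
assert np.allclose(dual_mic_spectral_subtraction(n, n), 0.0)
```

In practice the two microphones are spaced so that speech dominates the primary channel while the reference captures mostly noise; a model-based estimator such as the paper's H₂ filter handles the imperfect correlation between the two channels far better than this plain subtraction.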
Finke, Mareike; Sandmann, Pascale; Bönitz, Hanna; Kral, Andrej; Büchner, Andreas
2016-01-01
Single-sided deaf subjects with a cochlear implant (CI) provide the unique opportunity to compare central auditory processing of the electrical input (CI ear) and the acoustic input (normal-hearing, NH, ear) within the same individual. In these individuals, sensory processing differs between their two ears, while cognitive abilities are the same irrespective of the sensory input. To better understand perceptual-cognitive factors modulating speech intelligibility with a CI, this electroencephalography study examined the central-auditory processing of words, the cognitive abilities, and the speech intelligibility in 10 postlingually single-sided deaf CI users. We found lower hit rates and prolonged response times for word classification during an oddball task for the CI ear when compared with the NH ear. Also, event-related potentials reflecting sensory (N1) and higher-order processing (N2/N4) were prolonged for word classification (targets versus nontargets) with the CI ear compared with the NH ear. Our results suggest that speech processing via the CI ear and the NH ear differs both at sensory (N1) and cognitive (N2/N4) processing stages, thereby affecting the behavioral performance for speech discrimination. These results provide objective evidence for cognition to be a key factor for speech perception under adverse listening conditions, such as the degraded speech signal provided by the CI.
Is there a hearing aid for the thinking person?
Hafter, Ervin R
2010-10-01
The history of auditory prosthesis has generally concentrated on bottom-up processing, that is, on audibility. However, a growing interest in top-down processing has focused on correlations between success with a hearing aid and higher-order processing such as the patient's intelligence, problem solving and language skills, and the perceived effort of day-to-day listening. We examine two cases of cognitive effects in hearing that illustrate less-often-studied issues: (1) individual subjects in a study use different listening strategies, a fact that, if not known to the experimenter, can lead to errors in interpretation; (2) a measure of shared attention can point to otherwise unknown functional effects of an algorithm used in hearing aids. In the two examples: (1) patients with cochlear implants served in a study of the binaural precedence effect, that is, echo suppression; (2) individuals identifying speech-in-noise benefited from noise reduction (NR) when the criterion was improved performance in simultaneous tests of verbal memory or visual reaction times. Studies of hearing impairment, either in the laboratory or in a fitting session, should include study of the complex stimuli that make up the natural environment, conditions where the thinking auditory brain adopts strategies for dealing with large amounts of input data. In addition to well-known factors in communication such as familiarity, syntax, and semantics, the work here shows that strategic listening can affect even how we deal with seemingly simpler requirements: localizing sounds in a reverberant auditory scene and listening for speech in noise while busy with other cognitive tasks.
DETECTION AND IDENTIFICATION OF SPEECH SOUNDS USING CORTICAL ACTIVITY PATTERNS
Centanni, T.M.; Sloan, A.M.; Reed, A.C.; Engineer, C.T.; Rennaker, R.; Kilgard, M.P.
2014-01-01
We have developed a classifier capable of locating and identifying speech sounds using activity from rat auditory cortex with an accuracy equivalent to behavioral performance without the need to specify the onset time of the speech sounds. This classifier can identify speech sounds from a large speech set within 40 ms of stimulus presentation. To compare the temporal limits of the classifier to behavior, we developed a novel task that requires rats to identify individual consonant sounds from a stream of distracter consonants. The classifier successfully predicted the ability of rats to accurately identify speech sounds for syllable presentation rates up to 10 syllables per second (up to 17.9 ± 1.5 bits/sec), which is comparable to human performance. Our results demonstrate that the spatiotemporal patterns generated in primary auditory cortex can be used to quickly and accurately identify consonant sounds from a continuous speech stream without prior knowledge of the stimulus onset times. Improved understanding of the neural mechanisms that support robust speech processing in difficult listening conditions could improve the identification and treatment of a variety of speech processing disorders. PMID:24286757
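Classification without knowing the stimulus onset time, as described above, can be sketched as a template correlator slid along the recording: wherever the best-matching class template exceeds a threshold, a (time, label) detection is reported. This toy version (all names and the scoring rule are assumptions) is far simpler than the study's classifier but captures the onset-free idea.

```python
import numpy as np

def sliding_window_classify(activity, templates, threshold):
    """Slide each class template along a continuous activity trace and
    report (time, label) wherever the best normalized correlation
    exceeds `threshold` -- no stimulus onset times required."""
    hits = []
    length = len(next(iter(templates.values())))
    for t in range(len(activity) - length + 1):
        win = activity[t:t + length]
        scores = {
            label: float(np.dot(win, tpl)
                         / (np.linalg.norm(win) * np.linalg.norm(tpl) + 1e-12))
            for label, tpl in templates.items()
        }
        label, score = max(scores.items(), key=lambda kv: kv[1])
        if score > threshold:
            hits.append((t, label))
    return hits

# A 'da' template embedded at sample 5 of an otherwise silent trace is
# found at exactly that offset.
templates = {'da': np.array([1.0, 2.0, 3.0]), 'ba': np.array([3.0, 2.0, 1.0])}
activity = np.zeros(12)
activity[5:8] = templates['da']
assert sliding_window_classify(activity, templates, 0.99) == [(5, 'da')]
```

A real neural decoder would use spatiotemporal activity across many recording sites and a trained classifier rather than fixed templates, but the sliding evaluation over time is the same mechanism that lets identification proceed from a continuous stream.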
Miao, Melissa; Power, Emma; O'Halloran, Robyn
2015-01-01
Although clinical practice guidelines can facilitate evidence-based practice and improve the health outcomes of stroke patients, they continue to be underutilised. There is limited research into the reasons for this, especially in speech pathology. This study provides the first in-depth, qualitative examination of the barriers and facilitators that speech pathologists perceive and experience when implementing guidelines. A maximum variation sample of eight speech pathologists participated in a semi-structured interview concerning the implementation of the National Stroke Foundation's Clinical Guidelines for Stroke Management 2010. Interviews were transcribed, thematically analysed and member checked before overall themes were identified. Three main themes and ten subthemes were identified. The first main theme, making implementation explicit, reflected the necessity of accessing and understanding guideline recommendations, and focussing specifically on implementation in context. In the second theme, demand versus ability to change, the size of changes required was compared with available resources and collaboration. The final theme, Speech pathologist motivation to implement guidelines, demonstrated the influence of individual perception of the guidelines and personal commitment to improved practice. Factors affecting implementation are complex, and are not exclusively barriers or facilitators. Some potential implementation strategies are suggested. Further research is recommended. In most Western nations, stroke remains the single greatest cause of disability, including communication and swallowing disabilities. Although adherence to stroke clinical practice guidelines improves stroke patient outcomes, guidelines continue to be underutilised, and the reasons for this are not well understood. 
This is the first in-depth qualitative study identifying the complex barriers and facilitators to guideline implementation as experienced by speech pathologists in stroke care. Suggested implementation strategies include local monitoring of guideline implementation (e.g. team meetings, audits), increasing collaboration on implementation projects (e.g. managerial involvement, networking), and seeking speech pathologist input into guideline development.
Stekelenburg, Jeroen J; Keetels, Mirjam; Vroomen, Jean
2018-05-01
Numerous studies have demonstrated that the vision of lip movements can alter the perception of auditory speech syllables (McGurk effect). While there is ample evidence for integration of text and auditory speech, there are only a few studies on the orthographic equivalent of the McGurk effect. Here, we examined whether written text, like visual speech, can induce an illusory change in the perception of speech sounds on both the behavioural and neural levels. In a sound categorization task, we found that both text and visual speech changed the identity of speech sounds from an /aba/-/ada/ continuum, but the size of this audiovisual effect was considerably smaller for text than visual speech. To examine at which level in the information processing hierarchy these multisensory interactions occur, we recorded electroencephalography in an audiovisual mismatch negativity (MMN, a component of the event-related potential reflecting preattentive auditory change detection) paradigm in which deviant text or visual speech was used to induce an illusory change in a sequence of ambiguous sounds halfway between /aba/ and /ada/. We found that only deviant visual speech induced an MMN, but not deviant text, which induced a late P3-like positive potential. These results demonstrate that text has much weaker effects on sound processing than visual speech does, possibly because text has different biological roots than visual speech. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Centanni, Tracy M.; Chen, Fuyi; Booker, Anne M.; Engineer, Crystal T.; Sloan, Andrew M.; Rennaker, Robert L.; LoTurco, Joseph J.; Kilgard, Michael P.
2014-01-01
In utero RNAi of the dyslexia-associated gene Kiaa0319 in rats (KIA-) degrades cortical responses to speech sounds and increases trial-by-trial variability in onset latency. We tested the hypothesis that KIA- rats would be impaired at speech sound discrimination. KIA- rats needed twice as much training in quiet conditions to perform at control levels and remained impaired at several speech tasks. Focused training using truncated speech sounds was able to normalize speech discrimination in quiet and background noise conditions. Training also normalized trial-by-trial neural variability and temporal phase locking. Cortical activity from speech trained KIA- rats was sufficient to accurately discriminate between similar consonant sounds. These results provide the first direct evidence that assumed reduced expression of the dyslexia-associated gene KIAA0319 can cause phoneme processing impairments similar to those seen in dyslexia and that intensive behavioral therapy can eliminate these impairments. PMID:24871331
Discharge experiences of speech-language pathologists working in Cyprus and Greece.
Kambanaros, Maria
2010-08-01
Post-termination relationships are complex because the client may need additional services and it may be difficult to determine when the speech-language pathologist-client relationship is truly terminated. In my contribution to this scientific forum, discharge experiences from speech-language pathologists working in Cyprus and Greece will be explored in search of commonalities and differences in the way in which pathologists end therapy from different cultural perspectives. Within this context the personal impact on speech-language pathologists of the discharge process will be highlighted. Inherent in this process is how speech-language pathologists learn to hold their feelings, anxieties and reactions when communicating discharge to clients. Overall speech-language pathologists working in Cyprus and Greece experience similar emotional responses to positive and negative therapy endings as speech-language pathologists working in Australia. The major difference is that Cypriot and Greek therapists face serious limitations in moving their clients on after therapy has ended.
Rate and onset cues can improve cochlear implant synthetic vowel recognition in noise
Mc Laughlin, Myles; Reilly, Richard B.; Zeng, Fan-Gang
2013-01-01
Understanding speech-in-noise is difficult for most cochlear implant (CI) users. Speech-in-noise segregation cues are well understood for acoustic hearing but not for electric hearing. This study investigated the effects of stimulation rate and onset delay on synthetic vowel-in-noise recognition in CI subjects. In experiment I, synthetic vowels were presented at 50, 145, or 795 pulse/s and noise at the same three rates, yielding nine combinations. Recognition improved significantly if the noise had a lower rate than the vowel, suggesting that listeners can use temporal gaps in the noise to detect a synthetic vowel. This hypothesis is supported by accurate prediction of synthetic vowel recognition using a temporal integration window model. Using lower rates a similar trend was observed in normal hearing subjects. Experiment II found that for CI subjects, a vowel onset delay improved performance if the noise had a lower or higher rate than the synthetic vowel. These results show that differing rates or onset times can improve synthetic vowel-in-noise recognition, indicating a need to develop speech processing strategies that encode or emphasize these cues. PMID:23464025
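The gap-listening account above can be made concrete with a little arithmetic: a periodic noise train at rate r leaves inter-pulse intervals of 1/r seconds, and only the portion of each interval exceeding the listener's temporal integration window is usable for detecting the vowel. A toy calculation under that assumption (the ~8 ms window value is illustrative, not taken from the study):

```python
def gap_fraction(noise_rate, window):
    """Fraction of each noise inter-pulse interval that remains usable
    after subtracting a temporal integration window (toy model)."""
    ipi = 1.0 / noise_rate           # inter-pulse interval of the noise (s)
    usable = max(ipi - window, 0.0)  # gap left over after the window
    return usable / ipi

# With an ~8 ms integration window, a 50 pulse/s noise leaves most of
# each 20 ms interval as a usable gap; a 795 pulse/s noise leaves none.
assert abs(gap_fraction(50, 0.008) - 0.6) < 1e-9
assert gap_fraction(795, 0.008) == 0.0
```

This matches the direction of the reported result: recognition improved when the noise rate was lower than the vowel rate, because low-rate noise leaves temporal gaps through which the vowel can be glimpsed.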
Multivariate predictors of music perception and appraisal by adult cochlear implant users.
Gfeller, Kate; Oleson, Jacob; Knutson, John F; Breheny, Patrick; Driscoll, Virginia; Olszewski, Carol
2008-02-01
The research examined whether performance by adult cochlear implant recipients on a variety of recognition and appraisal tests derived from real-world music could be predicted from technological, demographic, and life experience variables, as well as speech recognition scores. A representative sample of 209 adults implanted between 1985 and 2006 participated. Using multiple linear regression models and generalized linear mixed models, sets of optimal predictor variables were selected that effectively predicted performance on a test battery that assessed different aspects of music listening. These analyses established the importance of distinguishing between the accuracy of music perception and the appraisal of musical stimuli when using music listening as an index of implant success. Importantly, neither device type nor processing strategy predicted music perception or music appraisal. Speech recognition performance was not a strong predictor of music perception, and primarily predicted music perception when the test stimuli included lyrics. Additionally, limitations in the utility of speech perception in predicting musical perception and appraisal underscore the utility of music perception as an alternative outcome measure for evaluating implant outcomes. Music listening background, residual hearing (i.e., hearing aid use), cognitive factors, and some demographic factors predicted several indices of perceptual accuracy or appraisal of music.
The development of the Nucleus Freedom Cochlear implant system.
Patrick, James F; Busby, Peter A; Gibson, Peter J
2006-12-01
Cochlear Limited (Cochlear) released the fourth-generation cochlear implant system, Nucleus Freedom, in 2005. Freedom is based on 25 years of experience in cochlear implant research and development and incorporates advances in medicine, implantable materials, electronic technology, and sound coding. This article presents the development of Cochlear's implant systems, with an overview of the first 3 generations, and details of the Freedom system: the CI24RE receiver-stimulator, the Contour Advance electrode, the modular Freedom processor, the available speech coding strategies, the input processing options of Smart Sound to improve the signal before coding as electrical signals, and the programming software. Preliminary results from multicenter studies with the Freedom system are reported, demonstrating better levels of performance compared with the previous systems. The final section presents the most recent implant reliability data, with the early findings at 18 months showing improved reliability of the Freedom implant compared with the earlier Nucleus 3 System. Also reported are some of the findings of Cochlear's collaborative research programs to improve recipient outcomes. Included are studies showing the benefits from bilateral implants, electroacoustic stimulation using an ipsilateral and/or contralateral hearing aid, advanced speech coding, and streamlined speech processor programming.
ERIC Educational Resources Information Center
Marks, William J.; Jones, W. Paul; Loe, Scott A.
2013-01-01
This study investigated the use of compressed speech as a modality for assessment of the simultaneous processing function for participants with visual impairment. A 24-item compressed speech test was created using a sound editing program to randomly remove sound elements from aural stimuli, holding pitch constant, with the objective to emulate the…
ERIC Educational Resources Information Center
Golumbic, Elana M. Zion; Poeppel, David; Schroeder, Charles E.
2012-01-01
The human capacity for processing speech is remarkable, especially given that information in speech unfolds over multiple time scales concurrently. Similarly notable is our ability to filter out extraneous sounds and focus our attention on one conversation, epitomized by the "Cocktail Party" effect. Yet, the neural mechanisms underlying on-line…
ERIC Educational Resources Information Center
Shenker, Rosalee C.
2006-01-01
Background: There will always be a place for stuttering treatments designed to eliminate or reduce stuttered speech. When those treatments are required, direct speech measures of treatment process and outcome are needed in clinical practice. Aims: Based on the contents of published clinical trials of such treatments, three "core" measures of…
ERIC Educational Resources Information Center
Brown-Schmidt, Sarah; Konopka, Agnieszka E.
2008-01-01
During unscripted speech, speakers coordinate the formulation of pre-linguistic messages with the linguistic processes that implement those messages into speech. We examine the process of constructing a contextually appropriate message and interfacing that message with utterance planning in English ("the small butterfly") and Spanish ("la mariposa…
Teaching Turkish as a Foreign Language: Extrapolating from Experimental Psychology
ERIC Educational Resources Information Center
Erdener, Dogu
2017-01-01
Speech perception is beyond the auditory domain and a multimodal process, specifically, an auditory-visual one--we process lip and face movements during speech. In this paper, the findings in cross-language studies of auditory-visual speech perception in the past two decades are interpreted to the applied domain of second language (L2)…
ERIC Educational Resources Information Center
Fukuda, Makiko
2014-01-01
The present study revealed the dynamic process of speech development in a domestic immersion program by seven adult beginning learners of Japanese. The speech data were analyzed with fluency, accuracy, and complexity measurements at group, interindividual, and intraindividual levels. The results revealed the complex nature of language development…
Lessons Learned in Part-of-Speech Tagging of Conversational Speech
2010-10-01
A Dynamically Focusing Cochlear Implant Strategy Can Improve Vowel Identification in Noise.
Arenberg, Julie G; Parkinson, Wendy S; Litvak, Leonid; Chen, Chen; Kreft, Heather A; Oxenham, Andrew J
2018-03-09
The standard, monopolar (MP) electrode configuration used in commercially available cochlear implants (CI) creates a broad electrical field, which can lead to unwanted channel interactions. Use of more focused configurations, such as tripolar and phased array, has led to mixed results for improving speech understanding. The purpose of the present study was to assess the efficacy of a physiologically inspired configuration called dynamic focusing, using focused tripolar stimulation at low levels and less focused stimulation at high levels. Dynamic focusing may better mimic cochlear excitation patterns in normal acoustic hearing, while reducing the current levels necessary to achieve sufficient loudness at high levels. Twenty postlingually deafened adult CI users participated in the study. Speech perception was assessed in quiet and in a four-talker babble background noise. Speech stimuli were closed-set spondees in noise, and medial vowels at 50 and 60 dB SPL in quiet and in noise. The signal to noise ratio was adjusted individually such that performance was between 40 and 60% correct with the MP strategy. Subjects were fitted with three experimental strategies matched for pulse duration, pulse rate, filter settings, and loudness on a channel-by-channel basis. The strategies included 14 channels programmed in MP, fixed partial tripolar (σ = 0.8), and dynamic partial tripolar (σ at 0.8 at threshold and 0.5 at the most comfortable level). Fifteen minutes of listening experience was provided with each strategy before testing. Sound quality ratings were also obtained. Speech perception performance for vowel identification in quiet at 50 and 60 dB SPL and for spondees in noise was similar for the three tested strategies. However, performance on vowel identification in noise was significantly better for listeners using the dynamic focusing strategy. Sound quality ratings were similar for the three strategies. 
Some subjects obtained more benefit than others, with some individual differences explained by the relation between loudness growth and the rate of change from focused to broader stimulation. These initial results suggest that further exploration of dynamic focusing is warranted. Specifically, optimizing such strategies on an individual basis may lead to improvements in speech perception for more adult listeners and improve how CIs are tailored. Some listeners may also need a longer period of time to acclimate to a new program.
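The dynamic-focusing idea, σ = 0.8 at threshold falling to 0.5 at the most comfortable level, implies a per-channel mapping from stimulation level to focusing coefficient. The sketch below interpolates linearly between the two endpoints; the abstract does not state the interpolation rule, so linearity (and the arbitrary level units) are assumptions.

```python
def dynamic_sigma(level, threshold, mcl, sigma_low=0.8, sigma_high=0.5):
    """Focusing coefficient for a dynamic partial-tripolar channel:
    most focused (sigma = 0.8) at threshold, broader (sigma = 0.5) at
    the most comfortable level (mcl), linearly interpolated in between.
    Levels are in arbitrary clinical units."""
    if level <= threshold:
        return sigma_low
    if level >= mcl:
        return sigma_high
    frac = (level - threshold) / (mcl - threshold)
    return sigma_low + frac * (sigma_high - sigma_low)

assert dynamic_sigma(100, 100, 200) == 0.8   # at threshold: most focused
assert dynamic_sigma(200, 100, 200) == 0.5   # at MCL: least focused
assert abs(dynamic_sigma(150, 100, 200) - 0.65) < 1e-9
```

The abstract's note that individual benefit tracked the relation between loudness growth and the focused-to-broad transition suggests that, in practice, this mapping would be fit per subject rather than fixed.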
New developments in the management of speech and language disorders.
Harding, Celia; Gourlay, Sara
2008-05-01
Speech and language disorders, which include swallowing difficulties, are usually managed by speech and language therapists. Such a diverse, complex and challenging clinical group of symptoms requires practitioners with detailed knowledge and understanding of research within those areas, as well as the ability to implement appropriate therapy strategies within many environments. These environments range from neonatal units, acute paediatric wards and health centres through to nurseries, schools and children's homes. This paper summarises the key issues that are fundamental to our understanding of this client group.
Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina
2014-11-15
Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. 
The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on an inter-hemispheric mechanism which exploits both a right-hemispheric sensitivity to pitch information and a left-hemispheric dominance in speech processing.
Gifford, René H.; Revit, Lawrence J.
2014-01-01
Background Although cochlear implant patients are achieving increasingly higher levels of performance, speech perception in noise continues to be problematic. The newest generations of implant speech processors are equipped with preprocessing and/or external accessories that are purported to improve listening in noise. Most speech perception measures in the clinical setting, however, do not provide a close approximation to real-world listening environments. Purpose To assess speech perception for adult cochlear implant recipients in the presence of a realistic restaurant simulation generated by an eight-loudspeaker (R-SPACE™) array in order to determine whether commercially available preprocessing strategies and/or external accessories yield improved sentence recognition in noise. Research Design Single-subject, repeated-measures design with two groups of participants: Advanced Bionics and Cochlear Corporation recipients. Study Sample Thirty-four subjects, ranging in age from 18 to 90 yr (mean 54.5 yr), participated in this prospective study. Fourteen subjects were Advanced Bionics recipients, and 20 subjects were Cochlear Corporation recipients. Intervention Speech reception thresholds (SRTs) in semidiffuse restaurant noise originating from an eight-loudspeaker array were assessed with the subjects’ preferred listening programs as well as with the addition of either Beam™ preprocessing (Cochlear Corporation) or the T-Mic® accessory option (Advanced Bionics). Data Collection and Analysis In Experiment 1, adaptive SRTs with the Hearing in Noise Test sentences were obtained for all 34 subjects. For Cochlear Corporation recipients, SRTs were obtained with their preferred everyday listening program as well as with the addition of Focus preprocessing. For Advanced Bionics recipients, SRTs were obtained with the integrated behind-the-ear (BTE) mic as well as with the T-Mic. 
Statistical analysis using a repeated-measures analysis of variance (ANOVA) evaluated the effects of the preprocessing strategy or external accessory in reducing the SRT in noise. In addition, a standard t-test was run to evaluate effectiveness across manufacturers for improving the SRT in noise. In Experiment 2, 16 of the 20 Cochlear Corporation subjects were reassessed, obtaining an SRT in noise using the manufacturer-suggested “Everyday,” “Noise,” and “Focus” preprocessing strategies. A repeated-measures ANOVA was employed to assess the effects of preprocessing. Results The primary findings were (i) both Noise and Focus preprocessing strategies (Cochlear Corporation) significantly improved the SRT in noise as compared to Everyday preprocessing, (ii) the T-Mic accessory option (Advanced Bionics) significantly improved the SRT as compared to the BTE mic, and (iii) Focus preprocessing and the T-Mic resulted in similar degrees of improvement that were not found to be significantly different from one another. Conclusion Options available in current cochlear implant sound processors are able to significantly improve speech understanding in a realistic, semidiffuse noise with both Cochlear Corporation and Advanced Bionics systems. For Cochlear Corporation recipients, Focus preprocessing yields the best speech-recognition performance in a complex listening environment; however, it is recommended that Noise preprocessing be used as the new default for everyday listening environments to avoid the need for switching programs throughout the day. For Advanced Bionics recipients, the T-Mic offers significantly improved performance in noise and is recommended for everyday use in all listening environments. PMID:20807480
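The repeated-measures ANOVA used here partitions variability by subject and by condition before forming the F ratio. A minimal numpy sketch with hypothetical SRT values (not the study's data; 4 subjects, 3 preprocessing conditions):

```python
import numpy as np

# Hypothetical SRTs (dB), rows = 4 subjects, cols = 3 preprocessing
# programs ("Everyday", "Noise", "Focus"); lower SRT is better.
srt = np.array([
    [8.2, 6.1, 5.4],
    [9.0, 7.2, 6.3],
    [7.5, 6.0, 5.1],
    [8.8, 6.9, 6.0],
])
n, k = srt.shape
grand = srt.mean()

# Partition sums of squares for a one-way repeated-measures ANOVA:
# total = subjects + conditions + error.
ss_total = ((srt - grand) ** 2).sum()
ss_subj = k * ((srt.mean(axis=1) - grand) ** 2).sum()
ss_cond = n * ((srt.mean(axis=0) - grand) ** 2).sum()
ss_err = ss_total - ss_subj - ss_cond

df_cond, df_err = k - 1, (n - 1) * (k - 1)
F = (ss_cond / df_cond) / (ss_err / df_err)
print(f"F({df_cond},{df_err}) = {F:.2f}")
```

Because between-subject variability is removed before computing the error term, consistent within-subject condition effects yield a large F even with few subjects.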
Finch, Emma; Cameron, Ashley; Fleming, Jennifer; Lethlean, Jennifer; Hudson, Kyla; McPhail, Steven
2017-07-01
Aphasia is a common consequence of stroke. Despite receiving specialised training in communication, speech-language pathology students may lack confidence when communicating with People with Aphasia (PWA). This paper reports data from secondary outcome measures from a randomised controlled trial. The aim of the current study was to examine the effects of communication partner training on the communication skills of speech-language pathology students during conversations with PWA. Thirty-eight speech-language pathology students were randomly allocated to trained and untrained groups. The first group received a lecture about communication strategies for communicating with PWA, then participated in a conversation with PWA (Trained group), while the second group of students participated in a conversation with the PWA without receiving the lecture (Untrained group). The conversations between the groups were analysed according to the Measure of skill in Supported Conversation (MSC) scales, Measure of Participation in Conversation (MPC) scales, types of strategies used in conversation, and the occurrence and repair of conversation breakdowns. The trained group received significantly higher MSC Revealing Competence scores, used significantly more props, and introduced significantly more new ideas into the conversation than the untrained group. The trained group also used more gesture and writing to facilitate the conversation; however, the difference was not significant. There was no significant difference between the groups according to MSC Acknowledging Competence scores, MPC Interaction or Transaction scores, or in the number of interruptions, minor or major conversation breakdowns, or in the success of strategies initiated to repair the conversation breakdowns. Speech-language pathology students may benefit from participation in communication partner training programs. Copyright © 2017 Elsevier Inc. All rights reserved.
Fava, Eswen; Hull, Rachel; Bortfeld, Heather
2014-01-01
Initially, infants are capable of discriminating phonetic contrasts across the world’s languages. Starting between seven and ten months of age, they gradually lose this ability through a process of perceptual narrowing. Although traditionally investigated with isolated speech sounds, such narrowing occurs in a variety of perceptual domains (e.g., faces, visual speech). Thus far, tracking the developmental trajectory of this tuning process has been focused primarily on auditory speech alone, and generally using isolated sounds. But infants learn from speech produced by people talking to them, meaning they learn from a complex audiovisual signal. Here, we use near-infrared spectroscopy to measure blood concentration changes in the bilateral temporal cortices of infants in three different age groups: 3-to-6 months, 7-to-10 months, and 11-to-14 months. Critically, all three groups of infants were tested with continuous audiovisual speech in both their native and another, unfamiliar language. We found that at each age range, infants showed different patterns of cortical activity in response to the native and non-native stimuli. Infants in the youngest group showed bilateral cortical activity that was greater overall in response to non-native relative to native speech; the oldest group showed left lateralized activity in response to native relative to non-native speech. These results highlight perceptual tuning as a dynamic process that happens across modalities and at different levels of stimulus complexity. PMID:25116572
Chen, Xizhuo; Zhao, Yanxin; Zhong, Suyu; Cui, Zaixu; Li, Jiaqi; Gong, Gaolang; Dong, Qi; Nan, Yun
2018-05-01
The arcuate fasciculus (AF) is a neural fiber tract that is critical to speech and music development. Although the predominant role of the left AF in speech development is relatively clear, how the AF engages in music development is not understood. Congenital amusia is a special neurodevelopmental condition, which not only affects musical pitch but also speech tone processing. Using diffusion tensor tractography, we aimed at understanding the role of the AF in music and speech processing by examining the neural connectivity characteristics of the bilateral AF among thirty Mandarin amusics. Compared to age- and intelligence quotient (IQ)-matched controls, amusics demonstrated increased connectivity as reflected by the increased fractional anisotropy in the right posterior AF but decreased connectivity as reflected by the decreased volume in the right anterior AF. Moreover, greater fractional anisotropy in the left direct AF was correlated with worse performance in speech tone perception among amusics. This study is the first to examine the neural connectivity of the AF in the neurodevelopmental condition of amusia as a result of disrupted music pitch and speech tone processing. We found abnormal white matter structural connectivity in the right AF for the amusic individuals. Moreover, we demonstrated that the white matter microstructural properties of the left direct AF are modulated by lexical tone deficits among the amusic individuals. These data support the notion of distinctive pitch processing systems between music and speech.
The functional neuroanatomy of language
NASA Astrophysics Data System (ADS)
Hickok, Gregory
2009-09-01
There has been substantial progress over the last several years in understanding aspects of the functional neuroanatomy of language. Some of these advances are summarized in this review. It will be argued that recognizing speech sounds is carried out in the superior temporal lobe bilaterally, that the superior temporal sulcus bilaterally is involved in phonological-level aspects of this process, that the frontal/motor system is not central to speech recognition although it may modulate auditory perception of speech, that conceptual access mechanisms are likely located in the lateral posterior temporal lobe (middle and inferior temporal gyri), that speech production involves sensory-related systems in the posterior superior temporal lobe in the left hemisphere, that the interface between perceptual and motor systems is supported by a sensory-motor circuit for vocal tract actions (not dedicated to speech) that is very similar to sensory-motor circuits found in primate parietal lobe, and that verbal short-term memory can be understood as an emergent property of this sensory-motor circuit. These observations are considered within the context of a dual stream model of speech processing in which one pathway supports speech comprehension and the other supports sensory-motor integration. Additional topics of discussion include the functional organization of the planum temporale for spatial hearing and speech-related sensory-motor processes, the anatomical and functional basis of a form of acquired language disorder, conduction aphasia, the neural basis of vocabulary development, and sentence-level/grammatical processing.
Children's Acoustic and Linguistic Adaptations to Peers with Hearing Impairment
ERIC Educational Resources Information Center
Granlund, Sonia; Hazan, Valerie; Mahon, Merle
2018-01-01
Purpose: This study aims to examine the clear speaking strategies used by older children when interacting with a peer with hearing loss, focusing on both acoustic and linguistic adaptations in speech. Method: The Grid task, a problem-solving task developed to elicit spontaneous interactive speech, was used to obtain a range of global acoustic and…
Enhancing Listener Strategies Using a Payoff Matrix in Speech-on-speech Masking Experiments
2015-09-03
five of the eighteen listeners would have achieved the 1200 point goal, had it been in place for experiment 1. Seven out of the ten listeners in...listeners would revert to reporting more masker words at negative TMRs in the absence of specific instructions and incentives. The listeners could have
A Controlled Clinical Trial for Stuttering in Persons Aged 9 to 14 Years.
ERIC Educational Resources Information Center
Craig, Ashley; And Others
1996-01-01
This paper presents results of a controlled trial of 3 child stuttering treatment strategies in 97 subjects. All 3 treatments (electromyography feedback, intensive smooth speech, and home-based smooth speech) were very successful in the long term for 70% of the group, with electromyography and home-based treatment appearing to be especially…
ERIC Educational Resources Information Center
Cox, Amy Swartz; Clark, Denise M.; Skoning, Stacey N.; Wegner, Theresa M.; Muwana, Florence C.
2015-01-01
This study examined the effects of using sensory, augmentative, and alternative communication (AAC), and supportive communication strategies on the rate and type of communication used by three students with severe speech and motor impairments (SSMI). Using a multiple baseline across behaviour design with sensory and AAC intervention phases,…
ERIC Educational Resources Information Center
Higgins, Maureen B.; And Others
1996-01-01
A study of four children with deafness who had cochlear implants investigated the use of negative intraoral air pressure in articulation, from both the physiological and phonological perspectives. The study showed that the children used speech-production strategies that were different from hearing children and that deviant speech behaviors should…
Degraded neural and behavioral processing of speech sounds in a rat model of Rett syndrome
Engineer, Crystal T.; Rahebi, Kimiya C.; Borland, Michael S.; Buell, Elizabeth P.; Centanni, Tracy M.; Fink, Melyssa K.; Im, Kwok W.; Wilson, Linda G.; Kilgard, Michael P.
2015-01-01
Individuals with Rett syndrome have greatly impaired speech and language abilities. Auditory brainstem responses to sounds are normal, but cortical responses are highly abnormal. In this study, we used the novel rat Mecp2 knockout model of Rett syndrome to document the neural and behavioral processing of speech sounds. We hypothesized that both speech discrimination ability and the neural response to speech sounds would be impaired in Mecp2 rats. We expected that extensive speech training would improve speech discrimination ability and the cortical response to speech sounds. Our results reveal that speech responses across all four auditory cortex fields of Mecp2 rats were hyperexcitable, responded more slowly, and were less able to follow rapidly presented sounds. While Mecp2 rats could accurately perform consonant and vowel discrimination tasks in quiet, they were significantly impaired at speech sound discrimination in background noise. Extensive speech training improved discrimination ability. Training shifted cortical responses in both Mecp2 and control rats to favor the onset of speech sounds. While training increased the response to low frequency sounds in control rats, the opposite occurred in Mecp2 rats. Although neural coding and plasticity are abnormal in the rat model of Rett syndrome, extensive therapy appears to be effective. These findings may help to explain some aspects of communication deficits in Rett syndrome and suggest that extensive rehabilitation therapy might prove beneficial. PMID:26321676
The effect of hearing aid technologies on listening in an automobile
Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A.; Stanziola, Rachel W.
2014-01-01
Background Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile/road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the driver/listener, conventional directional processing that places the directivity beam toward the listener’s front may not be helpful, and in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). Purpose The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener’s speech recognition performance and preference for communication in a traveling automobile. Research design A single-blinded, repeated-measures design was used. Study Sample Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 years participated in the study. Data Collection and Analysis The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 miles/hour on a paved highway.
The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound treated booth to assess speech recognition performance and preference with each programmed condition. Results Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants’ preferences for a given processing scheme were generally consistent with speech recognition results. Conclusions The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. PMID:23886425
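The advantage of a backward-facing beam when the talker sits behind the listener can be illustrated with idealized first-order directivity patterns. A toy free-field sketch (an ideal steerable cardioid, not the actual hearing-aid processing in the study):

```python
import numpy as np

# First-order directivity: D(theta) = a + (1 - a) * cos(theta - steer).
# a = 0.5 gives a cardioid; steer is the beam's look direction (radians).
def directivity_db(theta, steer=0.0, a=0.5):
    d = a + (1 - a) * np.cos(theta - steer)
    # Clip the ideal null so the dB value stays finite (-120 dB floor).
    return 20 * np.log10(np.maximum(np.abs(d), 1e-6))

rear = np.pi  # talker located directly behind the listener
front_beam = directivity_db(rear, steer=0.0)    # conventional forward beam
back_beam = directivity_db(rear, steer=np.pi)   # backward-facing beam

print(f"sensitivity toward the rear: forward beam {front_beam:.1f} dB, "
      f"backward beam {back_beam:.1f} dB")
```

The ideal forward cardioid has a null at 180 degrees, so a rear talker is strongly attenuated, while the backward-steered beam passes that talker at full sensitivity; this is the geometric intuition behind the Back-DIR result above.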
Civier, Oren; Tasko, Stephen M.; Guenther, Frank H.
2010-01-01
This paper investigates the hypothesis that stuttering may result in part from impaired readout of feedforward control of speech, which forces persons who stutter (PWS) to produce speech with a motor strategy that is weighted too much toward auditory feedback control. Over-reliance on feedback control leads to production errors which, if they grow large enough, can cause the motor system to “reset” and repeat the current syllable. This hypothesis is investigated using computer simulations of a “neurally impaired” version of the DIVA model, a neural network model of speech acquisition and production. The model’s outputs are compared to published acoustic data from PWS’ fluent speech, and to combined acoustic and articulatory movement data collected from the dysfluent speech of one PWS. The simulations mimic the errors observed in the PWS subject’s speech, as well as the repairs of these errors. Additional simulations were able to account for enhancements of fluency gained by slowed/prolonged speech and masking noise. Together these results support the hypothesis that many dysfluencies in stuttering are due to a bias away from feedforward control and toward feedback control. PMID:20831971
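The bias hypothesized above can be caricatured with a toy control loop (this is an illustration of delayed-feedback control generally, not the DIVA model itself): a controller tracks a target with a feedforward command plus a correction driven by delayed feedback, and weighting the delayed term heavily makes the output overshoot and oscillate, loosely analogous to errors growing large enough to trigger a "reset".

```python
import numpy as np

# Toy tracking loop: feedforward command plus a correction based on
# *delayed* feedback of the output error. All parameters are arbitrary.
def simulate(feedback_gain, delay=5, steps=80):
    target = 1.0
    out = np.zeros(steps)
    for n in range(1, steps):
        fb_err = target - out[n - delay] if n >= delay else 0.0
        command = target + feedback_gain * fb_err  # feedforward + feedback
        out[n] = out[n - 1] + 0.3 * (command - out[n - 1])  # sluggish plant
    return np.max(np.abs(target - out[-20:]))  # residual error at the end

print("residual error, low feedback gain :", round(simulate(0.2), 3))
print("residual error, high feedback gain:", round(simulate(4.0), 3))
```

With a small feedback gain the output settles on the target; with a large gain acting through the delay, the loop becomes unstable and the error oscillates and grows rather than dying out.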
NASA Astrophysics Data System (ADS)
Hargus Ferguson, Sarah; Kewley-Port, Diane
2002-05-01
Several studies have shown that when a talker is instructed to speak as though talking to a hearing-impaired person, the resulting ``clear'' speech is significantly more intelligible than typical conversational speech. Recent work in this lab suggests that talkers vary in how much their intelligibility improves when they are instructed to speak clearly. The few studies examining acoustic characteristics of clear and conversational speech suggest that these differing clear speech effects result from different acoustic strategies on the part of individual talkers. However, only two studies to date have directly examined differences among talkers producing clear versus conversational speech, and neither included acoustic analysis. In this project, clear and conversational speech was recorded from 41 male and female talkers aged 18-45 years. A listening experiment demonstrated that for normal-hearing listeners in noise, vowel intelligibility varied widely among the 41 talkers for both speaking styles, as did the magnitude of the speaking style effect. Acoustic analyses using stimuli from a subgroup of talkers shown to have a range of speaking style effects will be used to assess specific acoustic correlates of vowel intelligibility in clear and conversational speech. [Work supported by NIHDCD-02229.]
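One common acoustic correlate in such analyses is the F1-F2 vowel space area, which is often reported to expand in clear speech. A sketch with hypothetical corner-vowel formant values (invented for illustration, not data from this study), using the shoelace formula for polygon area:

```python
import numpy as np

# Hypothetical corner-vowel formants (F2, F1 in Hz) for one talker in two
# speaking styles. Vertex order: /i/, /ae/, /a/, /u/ (a convex polygon).
conversational = np.array([(2200, 300), (1900, 650), (1200, 700), (900, 350)])
clear          = np.array([(2400, 270), (2000, 750), (1150, 800), (800, 320)])

def vowel_space_area(pts):
    """Shoelace formula for polygon area (Hz^2); vertices given in order."""
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

print(f"conversational: {vowel_space_area(conversational):,.0f} Hz^2")
print(f"clear:          {vowel_space_area(clear):,.0f} Hz^2")
```

A larger area for the clear tokens would indicate more peripheral, better-separated vowel targets, one candidate explanation for a talker's clear-speech intelligibility benefit.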
Flanagan, Sheila; Zorilă, Tudor-Cătălin; Stylianou, Yannis; Moore, Brian C J
2018-01-01
Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.
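Dynamic range compression of the kind SSDRC applies raises low-level speech segments relative to peaks before noise is added. A toy sketch of that one ingredient (this is NOT the published SSDRC algorithm, just a generic instantaneous compressor with RMS renormalization):

```python
import numpy as np

# Toy dynamic range compressor: raise low-level samples relative to peaks
# (exponent < 1 on the envelope), then restore the original RMS level.
def compress(signal, exponent=0.5, eps=1e-8):
    env = np.abs(signal) + eps
    compressed = np.sign(signal) * env ** exponent
    return compressed * (np.sqrt(np.mean(signal ** 2)) /
                         np.sqrt(np.mean(compressed ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(0, 0.1, 16000) * np.hanning(16000)  # fake "speech" burst
y = compress(x)

# Compression reduces the peak-to-RMS ratio (crest factor).
crest = lambda s: np.max(np.abs(s)) / np.sqrt(np.mean(s ** 2))
print(f"crest factor: {crest(x):.2f} -> {crest(y):.2f}")
```

Reducing the crest factor means more of the signal sits above a fixed noise floor after mixing, which is one intuition for why such processing can raise intelligibility when noise is added after processing.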
Normal Aspects of Speech, Hearing, and Language.
ERIC Educational Resources Information Center
Minifie, Fred. D., Ed.; And Others
This book is written as a guide to the understanding of the processes involved in human speech communication. Ten authorities contributed material to provide an introduction to the physiological aspects of speech production and reception, the acoustical aspects of speech production and transmission, the psychophysics of sound reception, the nature…
Hemispheric Differences in the Effects of Context on Vowel Perception
ERIC Educational Resources Information Center
Sjerps, Matthias J.; Mitterer, Holger; McQueen, James M.
2012-01-01
Listeners perceive speech sounds relative to context. Contextual influences might differ over hemispheres if different types of auditory processing are lateralized. Hemispheric differences in contextual influences on vowel perception were investigated by presenting speech targets and both speech and non-speech contexts to listeners' right or left…
Inferring Speaker Affect in Spoken Natural Language Communication
ERIC Educational Resources Information Center
Pon-Barry, Heather Roberta
2013-01-01
The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards…
Perceptual Learning of Noise Vocoded Words: Effects of Feedback and Lexicality
ERIC Educational Resources Information Center
Hervais-Adelman, Alexis; Davis, Matthew H.; Johnsrude, Ingrid S.; Carlyon, Robert P.
2008-01-01
Speech comprehension is resistant to acoustic distortion in the input, reflecting listeners' ability to adjust perceptual processes to match the speech input. This adjustment is reflected in improved comprehension of distorted speech with experience. For noise vocoding, a manipulation that removes spectral detail from speech, listeners' word…
PETER, BEATE; BUTTON, LE; STOEL-GAMMON, CAROL; CHAPMAN, KATHY; RASKIND, WENDY H.
2013-01-01
The purpose of this study was to evaluate a global deficit in sequential processing as a candidate endophenotype in a family with familial childhood apraxia of speech (CAS). Of 10 adults and 13 children in a three-generational family with speech sound disorder (SSD) consistent with CAS, 3 adults and 6 children had past or present SSD diagnoses. Two preschoolers with unremediated CAS showed a high number of sequencing errors during single-word production. Performance on tasks with high sequential processing loads differentiated between the affected and unaffected family members, whereas there were no group differences in tasks with low processing loads. Adults with a history of SSD produced more sequencing errors during nonword and multisyllabic real word imitation, compared to those without such a history. Results are consistent with a global deficit in sequential processing that influences speech development as well as cognitive and linguistic processing. PMID:23339324
The Bilingual Language Interaction Network for Comprehension of Speech*
Marian, Viorica
2013-01-01
During speech comprehension, bilinguals co-activate both of their languages, resulting in cross-linguistic interaction at various levels of processing. This interaction has important consequences for both the structure of the language system and the mechanisms by which the system processes spoken language. Using computational modeling, we can examine how cross-linguistic interaction affects language processing in a controlled, simulated environment. Here we present a connectionist model of bilingual language processing, the Bilingual Language Interaction Network for Comprehension of Speech (BLINCS), wherein interconnected levels of processing are created using dynamic, self-organizing maps. BLINCS can account for a variety of psycholinguistic phenomena, including cross-linguistic interaction at and across multiple levels of processing, cognate facilitation effects, and audio-visual integration during speech comprehension. The model also provides a way to separate two languages without requiring a global language-identification system. We conclude that BLINCS serves as a promising new model of bilingual spoken language comprehension. PMID:24363602
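The self-organizing maps from which BLINCS builds its levels of processing can be sketched in a few lines. A minimal SOM on toy 2-D "feature" vectors (a generic Kohonen-style map, not the BLINCS implementation; all sizes and rates are arbitrary):

```python
import numpy as np

# 8x8 map of 2-D weight vectors, trained on random points in the unit square.
rng = np.random.default_rng(1)
grid = rng.random((8, 8, 2))
coords = np.stack(np.meshgrid(np.arange(8), np.arange(8), indexing="ij"),
                  axis=-1)

def train_step(grid, x, lr=0.3, sigma=1.5):
    # Best-matching unit (BMU): the node whose weights are closest to x.
    dist = np.linalg.norm(grid - x, axis=-1)
    bmu = np.array(np.unravel_index(np.argmin(dist), dist.shape))
    # Gaussian neighborhood: nodes near the BMU also move toward x,
    # which is what makes the map topology-preserving.
    g = np.exp(-np.sum((coords - bmu) ** 2, axis=-1) / (2 * sigma ** 2))
    grid += lr * g[..., None] * (x - grid)  # in-place weight update

data = rng.random((500, 2))
for x in data:
    train_step(grid, x)

# Mean quantization error: distance from each input to its BMU.
err = np.mean([np.linalg.norm(grid - x, axis=-1).min() for x in data])
print(f"mean quantization error: {err:.3f}")
```

Because neighboring nodes are dragged along with each winner, similar inputs end up represented by nearby nodes, the property that lets interconnected maps model graded cross-linguistic co-activation.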
Kjellberg, A
2004-01-01
The paper presents a theoretical analysis of possible effects of reverberation time on the cognitive load in speech communication. Speech comprehension requires not only phonological processing of the spoken words. Simultaneously, this information must be further processed and stored. All this processing takes place in the working memory, which has a limited processing capacity. The more resources that are allocated to word identification, the fewer resources are therefore left for the further processing and storing of the information. Reverberation conditions that allow the identification of almost all words may therefore still interfere with speech comprehension and memory storing. These problems are likely to be especially serious in situations where speech has to be followed continuously for a long time. An unfavourable reverberation time (RT) then could contribute to the development of cognitive fatigue, which means that working memory resources are gradually reduced. RT may also affect the cognitive load in two other ways: RT may change the distracting effects of a sound and a person's mood. Both effects could influence the cognitive load of a listener. It is argued that we need studies of RT effects in realistic long-lasting listening situations to better understand the effect of RT on speech communication. Furthermore, the effects of RT on distraction and mood need to be better understood.
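The reverberation time discussed above is classically estimated from room geometry and surface absorption with Sabine's formula, RT60 ≈ 0.161·V/A (V in m³, A in m² sabins). A sketch with a hypothetical classroom (all dimensions and absorption coefficients invented for illustration):

```python
# Sabine's approximation for reverberation time.
def rt60_sabine(volume_m3, surfaces):
    """surfaces: list of (area_m2, absorption_coefficient) pairs."""
    total_absorption = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / total_absorption

# Hypothetical 7.5 x 8 x 3 m classroom (volume 180 m^3).
classroom = [
    (60, 0.30),  # acoustic ceiling tiles
    (60, 0.05),  # linoleum floor
    (93, 0.03),  # painted walls
]
print(f"RT60 = {rt60_sabine(180, classroom):.2f} s")
```

Raising the ceiling absorption coefficient in this sketch directly shortens the predicted RT, which is the usual intervention when a room's RT is judged unfavourable for continuous listening.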
Temporal and speech processing skills in normal hearing individuals exposed to occupational noise.
Kumar, U Ajith; Ameenudin, Syed; Sangamanatha, A V
2012-01-01
Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. The primary goal of this research was to explore temporal processing and speech perception skills in individuals who are exposed to occupational noise of more than 80 dBA and have not yet incurred clinically significant threshold shifts. The contribution of temporal processing skills to speech perception in adverse listening situations was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age ranges of 30-40 (n = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups, including the train drivers, had hearing sensitivity within 25 dB HL at the octave frequencies between 250 Hz and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in the presence of multi-talker babble at -5 dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent-sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poorer in the noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues, which may add to difficulties in hearing in adverse listening conditions.
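Fixing a speech-in-babble condition at a target SNR, such as the -5 dB condition above, amounts to scaling the masker so the power ratio is exact. A minimal sketch (synthetic Gaussian signals stand in for speech and babble):

```python
import numpy as np

# Scale the masker so that 10*log10(P_speech / P_masker) equals snr_db.
def mix_at_snr(speech, babble, snr_db):
    p_speech = np.mean(speech ** 2)
    p_babble = np.mean(babble ** 2)
    gain = np.sqrt(p_speech / (p_babble * 10 ** (snr_db / 10)))
    return speech + gain * babble

rng = np.random.default_rng(0)
speech = rng.normal(0, 1.0, 16000)   # placeholder for a speech token
babble = rng.normal(0, 0.3, 16000)   # placeholder for multi-talker babble
mixed = mix_at_snr(speech, babble, snr_db=-5)
```

At -5 dB SNR the babble carries more power than the speech, which is what makes the condition sensitive to suprathreshold temporal-processing deficits rather than audibility alone.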
Current Policies and New Directions for Speech-Language Pathology Assistants.
Paul-Brown, Diane; Goldberg, Lynette R
2001-01-01
This article provides an overview of current American Speech-Language-Hearing Association (ASHA) policies for the appropriate use and supervision of speech-language pathology assistants with an emphasis on the need to preserve the role of fully qualified speech-language pathologists in the service delivery system. Seven challenging issues surrounding the appropriate use of speech-language pathology assistants are considered. These include registering assistants and approving training programs; membership in ASHA; discrepancies between state requirements and ASHA policies; preparation for serving diverse multicultural, bilingual, and international populations; supervision considerations; funding and reimbursement for assistants; and perspectives on career-ladder/bachelor-level personnel. The formation of a National Leadership Council is proposed to develop a coordinated strategic plan for addressing these controversial and potentially divisive issues related to speech-language pathology assistants. This council would implement strategies for future development in the areas of professional education pertaining to assistant-level supervision, instruction of assistants, communication networks, policy development, research, and the dissemination/promotion of information regarding assistants.
A comparative analysis of whispered and normally phonated speech using an LPC-10 vocoder
NASA Astrophysics Data System (ADS)
Wilson, J. B.; Mosko, J. D.
1985-12-01
This study focused on determining the performance of an LPC-10 vocoder in processing adult male and female whispered and normally phonated connected speech. The LPC-10 vocoder's analysis of whispered speech compared quite favorably with similar studies which used sound spectrographic processing techniques. Shifting from phonated speech to whispered speech caused a substantial increase in the phonemic formant frequencies and formant bandwidths for both male and female speakers. The data from this study showed no evidence that the LPC-10 vocoder's ability to process voices with pitch extremes and quality extremes was limited in any significant manner. A comparison of the unprocessed natural vowel waveforms and qualities with the synthesized vowel waveforms and qualities revealed almost imperceptible differences. An LPC-10 vocoder's ability to process linguistic and dialectal suprasegmental features such as intonation, rate, and stress at low bit rates should be a critical issue of concern for future research.
The role of the supplementary motor area for speech and language processing.
Hertrich, Ingo; Dietrich, Susanne; Ackermann, Hermann
2016-09-01
Apart from its function in speech motor control, the supplementary motor area (SMA) has largely been neglected in models of speech and language processing in the brain. The aim of this review paper is to summarize more recent work, suggesting that the SMA has various superordinate control functions during speech communication and language reception, which is particularly relevant in case of increased task demands. The SMA is subdivided into a posterior region serving predominantly motor-related functions (SMA proper) whereas the anterior part (pre-SMA) is involved in higher-order cognitive control mechanisms. In analogy to motor triggering functions of the SMA proper, the pre-SMA seems to manage procedural aspects of cognitive processing. These latter functions, among others, comprise attentional switching, ambiguity resolution, context integration, and coordination between procedural and declarative memory structures. Regarding language processing, this refers, for example, to the use of inner speech mechanisms during language encoding, but also to lexical disambiguation, syntax and prosody integration, and context-tracking. Copyright © 2016 Elsevier Ltd. All rights reserved.
Subcortical processing of speech regularities underlies reading and music aptitude in children.
Strait, Dana L; Hornickel, Jane; Kraus, Nina
2011-10-17
Neural sensitivity to acoustic regularities supports fundamental human behaviors such as hearing in noise and reading. Although the failure to encode acoustic regularities in ongoing speech has been associated with language and literacy deficits, how auditory expertise, such as that associated with musical skill, relates to the brainstem processing of speech regularities is unknown. An association between musical skill and neural sensitivity to acoustic regularities would not be surprising given the importance of repetition and regularity in music. Here, we aimed to define relationships between the subcortical processing of speech regularities, music aptitude, and reading ability in children with and without reading impairment. We hypothesized that, in combination with auditory cognitive abilities, neural sensitivity to regularities in ongoing speech provides a common biological mechanism underlying the development of music and reading abilities. We assessed auditory working memory and attention, music aptitude, reading ability, and neural sensitivity to acoustic regularities in 42 school-aged children with a wide range of reading ability. Neural sensitivity to acoustic regularities was assessed by recording brainstem responses to the same speech sound presented in predictable and variable speech streams. Through correlation analyses and structural equation modeling, we reveal that music aptitude and literacy both relate to the extent of subcortical adaptation to regularities in ongoing speech, as well as to auditory working memory and attention. Relationships between music and speech processing are specifically driven by performance on a musical rhythm task, underscoring the importance of rhythmic regularity for both language and music. These data indicate common brain mechanisms underlying reading and music abilities that relate to how the nervous system responds to regularities in auditory input. The definition of common biological underpinnings for music and reading supports the usefulness of music for promoting child literacy, with the potential to improve reading remediation.
Visual activity predicts auditory recovery from deafness after adult cochlear implantation.
Strelnikov, Kuzma; Rouger, Julien; Demonet, Jean-François; Lagleyre, Sebastien; Fraysse, Bernard; Deguine, Olivier; Barone, Pascal
2013-12-01
Modern cochlear implantation technologies allow deaf patients to understand auditory speech; however, the implants deliver only a coarse auditory input, and patients must rely on long-term adaptive processes to achieve coherent percepts. In adults with post-lingual deafness, the greatest progress in speech recovery occurs during the first year after cochlear implantation, but there is wide variability in both the level of cochlear implant outcomes and the temporal evolution of recovery. It has been proposed that when profoundly deaf subjects receive a cochlear implant, visual cross-modal reorganization of the brain is deleterious for auditory speech recovery. We tested this hypothesis in post-lingually deaf adults by analysing whether brain activity shortly after implantation correlated with the level of auditory recovery 6 months later. Based on brain activity induced by a speech-processing task, we found strong positive correlations in areas outside the auditory cortex. The highest positive correlations were found in the occipital cortex involved in visual processing, as well as in the posterior-temporal cortex known for audio-visual integration. The other area that positively correlated with auditory speech recovery was localized in the left inferior frontal area known for speech processing. Our results demonstrate that the functional level of the visual modality is related to the proficiency of auditory recovery. Based on the positive correlation of visual activity with auditory speech recovery, we suggest that the visual modality may facilitate the perception of a word's auditory counterpart in communicative situations. The link demonstrated between visual activity and auditory speech perception indicates that visuoauditory synergy is crucial for cross-modal plasticity and for fostering speech-comprehension recovery in adult cochlear-implanted deaf patients.
ERIC Educational Resources Information Center
Preston, Jonathan L.; Felsenfeld, Susan; Frost, Stephen J.; Mencl, W. Einar; Fulbright, Robert K.; Grigorenko, Elena L.; Landi, Nicole; Seki, Ayumi; Pugh, Kenneth R.
2012-01-01
Purpose: To examine neural response to spoken and printed language in children with speech sound errors (SSE). Method: Functional magnetic resonance imaging was used to compare processing of auditorily and visually presented words and pseudowords in 17 children with SSE, ages 8;6[years;months] through 10;10, with 17 matched controls. Results: When…
Assessing Children's Home Language Environments Using Automatic Speech Recognition Technology
ERIC Educational Resources Information Center
Greenwood, Charles R.; Thiemann-Bourque, Kathy; Walker, Dale; Buzhardt, Jay; Gilkerson, Jill
2011-01-01
The purpose of this research was to replicate and extend some of the findings of Hart and Risley using automatic speech processing instead of human transcription of language samples. The long-term goal of this work is to make the current approach to speech processing accessible to researchers and clinicians working on a daily basis with families and…
ERIC Educational Resources Information Center
Vandewalle, Ellen; Boets, Bart; Ghesquiere, Pol; Zink, Inge
2012-01-01
This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of children aged 6 years 3 months to 6 years 8 months attending grade 1: (1) children with specific language impairment (SLI) and literacy delay…
Multilingual Phoneme Models for Rapid Speech Processing System Development
2006-09-01
processes are used to develop an Arabic speech recognition system starting from monolingual English models, International Phonetic Association (IPA…) clusters. It was found that multilingual bootstrapping methods outperform monolingual English bootstrapping methods on the Arabic evaluation data initially…
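The cross-lingual bootstrapping idea, seeding target-language phone models from a source language via shared IPA symbols, can be sketched as follows. The phone inventories, model names, and nearest-phone mappings below are illustrative placeholders of our own, not the report's actual clusters.

```python
# Hypothetical English acoustic-model inventory, keyed by IPA symbol.
ENGLISH_MODELS = {"b": "en_b", "t": "en_t", "k": "en_k", "s": "en_s", "d": "en_d"}

# Arabic phones with no English counterpart, mapped by hand to the
# phonetically nearest English phone (illustrative choices only).
NEAREST_ENGLISH = {"tˤ": "t", "dˤ": "d", "sˤ": "s", "q": "k"}

def seed_arabic_models(arabic_phones):
    """Seed each target-language phone with a source-language model:
    directly when the IPA symbol is shared, via a nearest-phone map
    otherwise, and from scratch (None) when no mapping exists."""
    seeds = {}
    for phone in arabic_phones:
        if phone in ENGLISH_MODELS:
            seeds[phone] = ENGLISH_MODELS[phone]
        elif phone in NEAREST_ENGLISH:
            seeds[phone] = ENGLISH_MODELS[NEAREST_ENGLISH[phone]]
        else:
            seeds[phone] = None  # flat start: no cross-lingual seed
    return seeds
```

The seeded models would then be re-estimated on target-language data; the report's finding is that such multilingual seeding beats starting every phone from the monolingual English inventory alone.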
Zirn, Stefan; Arndt, Susan; Aschendorff, Antje; Laszig, Roland; Wesarg, Thomas
2016-09-22
The ability to detect a target signal masked by noise is improved in normal-hearing listeners when interaural phase differences (IPDs) between the ear signals exist either in the masker or in the signal. To improve binaural hearing in bilaterally implanted cochlear implant (BiCI) users, a coding strategy providing the best possible access to IPD is highly desirable. In this study, we compared two coding strategies in BiCI users provided with CI systems from MED-EL (Innsbruck, Austria). The CI systems were bilaterally programmed either with the fine structure processing strategy FS4 or with the constant rate strategy high definition continuous interleaved sampling (HDCIS). Familiarization periods between 6 and 12 weeks were considered. The effect of IPD was measured in two types of experiments: (a) IPD detection thresholds with tonal signals addressing mainly one apical interaural electrode pair and (b) with speech in noise in terms of binaural speech intelligibility level differences (BILD) addressing multiple electrodes bilaterally. The results in (a) showed improved IPD detection thresholds with FS4 compared with HDCIS in four out of the seven BiCI users. In contrast, 12 BiCI users in (b) showed similar BILD with FS4 (0.6 ± 1.9 dB) and HDCIS (0.5 ± 2.0 dB). However, no correlation between results in (a) and (b) both obtained with FS4 was found. In conclusion, the degree of IPD sensitivity determined on an apical interaural electrode pair was not an indicator for BILD based on bilateral multielectrode stimulation. © The Author(s) 2016.
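The tonal IPD stimuli of experiment (a) amount to presenting the same tone to both ears with a controlled phase offset between channels. A minimal sketch, assuming a simple two-channel array representation (function name and defaults are our assumptions, not the study's stimulus code):

```python
import numpy as np

def binaural_tone(freq_hz, ipd_rad, fs=44100, dur_s=0.5):
    """Two-channel pure tone in which the right channel's phase leads
    the left channel's by `ipd_rad` radians (the interaural phase
    difference). Returns an array of shape (2, n_samples)."""
    t = np.arange(int(fs * dur_s)) / fs
    left = np.sin(2 * np.pi * freq_hz * t)
    right = np.sin(2 * np.pi * freq_hz * t + ipd_rad)
    return np.stack([left, right])
```

An IPD of 0 gives the diotic reference condition; an IPD of π gives the classic antiphasic (Sπ) condition used to measure binaural unmasking effects such as the BILD reported above.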
When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.
Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola
2017-11-01
The acoustic properties of speech sounds vary widely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, which facilitates the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on the perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli, wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.
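In an oddball analysis like the one described, the MMN is conventionally quantified as the deviant-minus-standard difference wave averaged over a post-stimulus latency window. A minimal sketch under that convention (the array layout and window defaults are our assumptions, not the study's pipeline):

```python
import numpy as np

def mmn_amplitude(standard_epochs, deviant_epochs, fs, window_s=(0.10, 0.25)):
    """Mean amplitude of the deviant-minus-standard difference wave in a
    post-stimulus latency window (in seconds). Epoch arrays have shape
    (n_trials, n_samples), time-locked to stimulus onset."""
    difference = deviant_epochs.mean(axis=0) - standard_epochs.mean(axis=0)
    i0 = int(window_s[0] * fs)
    i1 = int(window_s[1] * fs)
    return difference[i0:i1].mean()
```

A more negative value indicates a larger MMN; in the study above, sizeable amplitudes for speaker- and sex-change deviants are what signals early sensitivity to non-linguistic cues.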
Transfer of Training between Music and Speech: Common Processing, Attention, and Memory
Besson, Mireille; Chobert, Julie; Marie, Céline
2011-01-01
After a brief historical perspective of the relationship between language and music, we review our work on transfer of training from music to speech that aimed at testing the general hypothesis that musicians should be more sensitive than non-musicians to speech sounds. In light of recent results in the literature, we argue that when long-term experience in one domain influences acoustic processing in the other domain, results can be interpreted as common acoustic processing. But when long-term experience in one domain influences the building-up of abstract and specific percepts in another domain, results are taken as evidence for transfer of training effects. Moreover, we also discuss the influence of attention and working memory on transfer effects and we highlight the usefulness of the event-related potentials method to disentangle the different processes that unfold in the course of music and speech perception. Finally, we give an overview of an on-going longitudinal project with children aimed at testing transfer effects from music to different levels and aspects of speech processing. PMID:21738519