speech recognition threshold: Topics by Science.gov

Sample records for speech recognition threshold

Temporal Sensitivity Measured Shortly After Cochlear Implantation Predicts 6-Month Speech Recognition Outcome.

PubMed

Erb, Julia; Ludwig, Alexandra Annemarie; Kunke, Dunja; Fuchs, Michael; Obleser, Jonas

2018-04-24

Psychoacoustic tests assessed shortly after cochlear implantation are useful predictors of the rehabilitative speech outcome. While largely independent, both spectral and temporal resolution tests are important to provide an accurate prediction of speech recognition. However, rapid tests of temporal sensitivity are currently lacking. Here, we propose a simple amplitude modulation rate discrimination (AMRD) paradigm that is validated by predicting future speech recognition in adult cochlear implant (CI) patients. In 34 newly implanted patients, we used an adaptive AMRD paradigm, where broadband noise was modulated at the speech-relevant rate of ~4 Hz. In a longitudinal study, speech recognition in quiet was assessed using the closed-set Freiburger number test shortly after cochlear implantation (t0) as well as the open-set Freiburger monosyllabic word test 6 months later (t6). Both AMRD thresholds at t0 (r = -0.51) and speech recognition scores at t0 (r = 0.56) predicted speech recognition scores at t6. However, AMRD and speech recognition at t0 were uncorrelated, suggesting that those measures capture partially distinct perceptual abilities. A multiple regression model predicting 6-month speech recognition outcome with deafness duration and speech recognition at t0 improved from adjusted R = 0.30 to adjusted R = 0.44 when AMRD threshold was added as a predictor. These findings identify AMRD thresholds as a reliable, nonredundant predictor above and beyond established speech tests for CI outcome. This AMRD test could potentially be developed into a rapid clinical temporal-resolution test to be integrated into the postoperative test battery to improve the reliability of speech outcome prognosis.
Perceptual learning for speech in noise after application of binary time-frequency masks

PubMed Central

Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.

2013-01-01

Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56–0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners with the greatest improvement observed for the same materials used in training. PMID:23464038
Speech Recognition Thresholds for Multilingual Populations.

ERIC Educational Resources Information Center

Ramkissoon, Ishara

2001-01-01

This article traces the development of speech audiometry in the United States and reports on the current status, focusing on the needs of a multilingual population in terms of measuring speech recognition threshold (SRT). It also discusses sociolinguistic considerations, alternative SRT stimuli for second language learners, and research on using…
Differences in Speech Recognition Between Children with Attention Deficits and Typically Developed Children Disappear When Exposed to 65 dB of Auditory Noise

PubMed Central

Söderlund, Göran B. W.; Jobs, Elisabeth Nilsson

2016-01-01

The most common neuropsychiatric condition in the in children is attention deficit hyperactivity disorder (ADHD), affecting ∼6–9% of the population. ADHD is distinguished by inattention and hyperactive, impulsive behaviors as well as poor performance in various cognitive tasks often leading to failures at school. Sensory and perceptual dysfunctions have also been noticed. Prior research has mainly focused on limitations in executive functioning where differences are often explained by deficits in pre-frontal cortex activation. Less notice has been given to sensory perception and subcortical functioning in ADHD. Recent research has shown that children with ADHD diagnosis have a deviant auditory brain stem response compared to healthy controls. The aim of the present study was to investigate if the speech recognition threshold differs between attentive and children with ADHD symptoms in two environmental sound conditions, with and without external noise. Previous research has namely shown that children with attention deficits can benefit from white noise exposure during cognitive tasks and here we investigate if noise benefit is present during an auditory perceptual task. For this purpose we used a modified Hagerman’s speech recognition test where children with and without attention deficits performed a binaural speech recognition task to assess the speech recognition threshold in no noise and noise conditions (65 dB). Results showed that the inattentive group displayed a higher speech recognition threshold than typically developed children and that the difference in speech recognition threshold disappeared when exposed to noise at supra threshold level. From this we conclude that inattention can partly be explained by sensory perceptual limitations that can possibly be ameliorated through noise exposure. PMID:26858679
Evaluation of Mandarin Chinese Speech Recognition in Adults with Cochlear Implants Using the Spectral Ripple Discrimination Test

PubMed Central

Dai, Chuanfu; Zhao, Zeqi; Zhang, Duo; Lei, Guanxiong

2018-01-01

Background The aim of this study was to explore the value of the spectral ripple discrimination test in speech recognition evaluation among a deaf (post-lingual) Mandarin-speaking population in China following cochlear implantation. Material/Methods The study included 23 Mandarin-speaking adult subjects with normal hearing (normal-hearing group) and 17 deaf adults who were former Mandarin-speakers, with cochlear implants (cochlear implantation group). The normal-hearing subjects were divided into men (n=10) and women (n=13). The spectral ripple discrimination thresholds between the groups were compared. The correlation between spectral ripple discrimination thresholds and Mandarin speech recognition rates in the cochlear implantation group were studied. Results Spectral ripple discrimination thresholds did not correlate with age (r=−0.19; p=0.22), and there was no significant difference in spectral ripple discrimination thresholds between the male and female groups (p=0.654). Spectral ripple discrimination thresholds of deaf adults with cochlear implants were significantly correlated with monosyllabic recognition rates (r=0.84; p=0.000). Conclusions In a Mandarin Chinese speaking population, spectral ripple discrimination thresholds of normal-hearing individuals were unaffected by both gender and age. Spectral ripple discrimination thresholds were correlated with Mandarin monosyllabic recognition rates of Mandarin-speaking in post-lingual deaf adults with cochlear implants. The spectral ripple discrimination test is a promising method for speech recognition evaluation in adults following cochlear implantation in China. PMID:29806954
Evaluation of Mandarin Chinese Speech Recognition in Adults with Cochlear Implants Using the Spectral Ripple Discrimination Test.

PubMed

Dai, Chuanfu; Zhao, Zeqi; Shen, Weidong; Zhang, Duo; Lei, Guanxiong; Qiao, Yuehua; Yang, Shiming

2018-05-28

BACKGROUND The aim of this study was to explore the value of the spectral ripple discrimination test in speech recognition evaluation among a deaf (post-lingual) Mandarin-speaking population in China following cochlear implantation. MATERIAL AND METHODS The study included 23 Mandarin-speaking adult subjects with normal hearing (normal-hearing group) and 17 deaf adults who were former Mandarin-speakers, with cochlear implants (cochlear implantation group). The normal-hearing subjects were divided into men (n=10) and women (n=13). The spectral ripple discrimination thresholds between the groups were compared. The correlation between spectral ripple discrimination thresholds and Mandarin speech recognition rates in the cochlear implantation group were studied. RESULTS Spectral ripple discrimination thresholds did not correlate with age (r=-0.19; p=0.22), and there was no significant difference in spectral ripple discrimination thresholds between the male and female groups (p=0.654). Spectral ripple discrimination thresholds of deaf adults with cochlear implants were significantly correlated with monosyllabic recognition rates (r=0.84; p=0.000). CONCLUSIONS In a Mandarin Chinese speaking population, spectral ripple discrimination thresholds of normal-hearing individuals were unaffected by both gender and age. Spectral ripple discrimination thresholds were correlated with Mandarin monosyllabic recognition rates of Mandarin-speaking in post-lingual deaf adults with cochlear implants. The spectral ripple discrimination test is a promising method for speech recognition evaluation in adults following cochlear implantation in China.
Dynamic relation between working memory capacity and speech recognition in noise during the first 6 months of hearing aid use.

PubMed

Ng, Elaine H N; Classon, Elisabet; Larsby, Birgitta; Arlinger, Stig; Lunner, Thomas; Rudner, Mary; Rönnberg, Jerker

2014-11-23

The present study aimed to investigate the changing relationship between aided speech recognition and cognitive function during the first 6 months of hearing aid use. Twenty-seven first-time hearing aid users with symmetrical mild to moderate sensorineural hearing loss were recruited. Aided speech recognition thresholds in noise were obtained in the hearing aid fitting session as well as at 3 and 6 months postfitting. Cognitive abilities were assessed using a reading span test, which is a measure of working memory capacity, and a cognitive test battery. Results showed a significant correlation between reading span and speech reception threshold during the hearing aid fitting session. This relation was significantly weakened over the first 6 months of hearing aid use. Multiple regression analysis showed that reading span was the main predictor of speech recognition thresholds in noise when hearing aids were first fitted, but that the pure-tone average hearing threshold was the main predictor 6 months later. One way of explaining the results is that working memory capacity plays a more important role in speech recognition in noise initially rather than after 6 months of use. We propose that new hearing aid users engage working memory capacity to recognize unfamiliar processed speech signals because the phonological form of these signals cannot be automatically matched to phonological representations in long-term memory. As familiarization proceeds, the mismatch effect is alleviated, and the engagement of working memory capacity is reduced. © The Author(s) 2014.
Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise

PubMed Central

Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

2016-01-01

Vocabulary size has been suggested as a useful measure of “verbal abilities” that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18–35 years) and 22 older (60–78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults’ poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access; with an average vocabulary size similar to that of younger adults, they were still slower in lexical access. PMID:27458400
Age-Related Differences in Lexical Access Relate to Speech Recognition in Noise.

PubMed

Carroll, Rebecca; Warzybok, Anna; Kollmeier, Birger; Ruigendijk, Esther

2016-01-01

Vocabulary size has been suggested as a useful measure of "verbal abilities" that correlates with speech recognition scores. Knowing more words is linked to better speech recognition. How vocabulary knowledge translates to general speech recognition mechanisms, how these mechanisms relate to offline speech recognition scores, and how they may be modulated by acoustical distortion or age, is less clear. Age-related differences in linguistic measures may predict age-related differences in speech recognition in noise performance. We hypothesized that speech recognition performance can be predicted by the efficiency of lexical access, which refers to the speed with which a given word can be searched and accessed relative to the size of the mental lexicon. We tested speech recognition in a clinical German sentence-in-noise test at two signal-to-noise ratios (SNRs), in 22 younger (18-35 years) and 22 older (60-78 years) listeners with normal hearing. We also assessed receptive vocabulary, lexical access time, verbal working memory, and hearing thresholds as measures of individual differences. Age group, SNR level, vocabulary size, and lexical access time were significant predictors of individual speech recognition scores, but working memory and hearing threshold were not. Interestingly, longer accessing times were correlated with better speech recognition scores. Hierarchical regression models for each subset of age group and SNR showed very similar patterns: the combination of vocabulary size and lexical access time contributed most to speech recognition performance; only for the younger group at the better SNR (yielding about 85% correct speech recognition) did vocabulary size alone predict performance. Our data suggest that successful speech recognition in noise is mainly modulated by the efficiency of lexical access. This suggests that older adults' poorer performance in the speech recognition task may have arisen from reduced efficiency in lexical access; with an average vocabulary size similar to that of younger adults, they were still slower in lexical access.
Accuracy of cochlear implant recipients in speech reception in the presence of background music.

PubMed

Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia

2012-12-01

This study examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of 3 contrasting types of background music, and compared performance based upon listener groups: CI recipients using conventional long-electrode devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing adults. We tested 154 long-electrode CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 normal-hearing adults on closed-set recognition of spondees presented in 3 contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Signal-to-noise ratio thresholds for speech in music were examined in relation to measures of speech recognition in background noise and multitalker babble, pitch perception, and music experience. The signal-to-noise ratio thresholds for speech in music varied as a function of category of background music, group membership (long-electrode, Hybrid, normal-hearing), and age. The thresholds for speech in background music were significantly correlated with measures of pitch perception and thresholds for speech in background noise; auditory status was an important predictor. Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music.
Recognition of speech in noise after application of time-frequency masks: Dependence on frequency and threshold parameters

PubMed Central

Sinex, Donal G.

2013-01-01

Binary time-frequency (TF) masks can be applied to separate speech from noise. Previous studies have shown that with appropriate parameters, ideal TF masks can extract highly intelligible speech even at very low speech-to-noise ratios (SNRs). Two psychophysical experiments provided additional information about the dependence of intelligibility on the frequency resolution and threshold criteria that define the ideal TF mask. Listeners identified AzBio Sentences in noise, before and after application of TF masks. Masks generated with 8 or 16 frequency bands per octave supported nearly-perfect identification. Word recognition accuracy was slightly lower and more variable with 4 bands per octave. When TF masks were generated with a local threshold criterion of 0 dB SNR, the mean speech reception threshold was −9.5 dB SNR, compared to −5.7 dB for unprocessed sentences in noise. Speech reception thresholds decreased by about 1 dB per dB of additional decrease in the local threshold criterion. Information reported here about the dependence of speech intelligibility on frequency and level parameters has relevance for the development of non-ideal TF masks for clinical applications such as speech processing for hearing aids. PMID:23556604
Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

NASA Technical Reports Server (NTRS)

Olorenshaw, Lex; Trawick, David

1991-01-01

The purpose was to develop a speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of scoring normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In continuous speech, the system was tested to be able to provide above 80 pct. correct acceptance of words, while correctly rejecting over 80 pct. of incorrectly pronounced words.
The NTID speech recognition test: NSRT(®).

PubMed

Bochner, Joseph H; Garrison, Wayne M; Doherty, Karen A

2015-07-01

The purpose of this study was to collect and analyse data necessary for expansion of the NSRT item pool and to evaluate the NSRT adaptive testing software. Participants were administered pure-tone and speech recognition tests including W-22 and QuickSIN, as well as a set of 323 new NSRT items and NSRT adaptive tests in quiet and background noise. Performance on the adaptive tests was compared to pure-tone thresholds and performance on other speech recognition measures. The 323 new items were subjected to Rasch scaling analysis. Seventy adults with mild to moderately severe hearing loss participated in this study. Their mean age was 62.4 years (sd = 20.8). The 323 new NSRT items fit very well with the original item bank, enabling the item pool to be more than doubled in size. Data indicate high reliability coefficients for the NSRT and moderate correlations with pure-tone thresholds (PTA and HFPTA) and other speech recognition measures (W-22, QuickSIN, and SRT). The adaptive NSRT is an efficient and effective measure of speech recognition, providing valid and reliable information concerning respondents' speech perception abilities.
Accuracy of Cochlear Implant Recipients on Speech Reception in Background Music

PubMed Central

Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Kliethermes, Stephanie; Driscoll, Virginia

2012-01-01

Objectives This study (a) examined speech recognition abilities of cochlear implant (CI) recipients in the spectrally complex listening condition of three contrasting types of background music, and (b) compared performance based upon listener groups: CI recipients using conventional long-electrode (LE) devices, Hybrid CI recipients (acoustic plus electric stimulation), and normal-hearing (NH) adults. Methods We tested 154 LE CI recipients using varied devices and strategies, 21 Hybrid CI recipients, and 49 NH adults on closed-set recognition of spondees presented in three contrasting forms of background music (piano solo, large symphony orchestra, vocal solo with small combo accompaniment) in an adaptive test. Outcomes Signal-to-noise thresholds for speech in music (SRTM) were examined in relation to measures of speech recognition in background noise and multi-talker babble, pitch perception, and music experience. Results SRTM thresholds varied as a function of category of background music, group membership (LE, Hybrid, NH), and age. Thresholds for speech in background music were significantly correlated with measures of pitch perception and speech in background noise thresholds; auditory status was an important predictor. Conclusions Evidence suggests that speech reception thresholds in background music change as a function of listener age (with more advanced age being detrimental), structural characteristics of different types of music, and hearing status (residual hearing). These findings have implications for everyday listening conditions such as communicating in social or commercial situations in which there is background music. PMID:23342550
Lexical-Access Ability and Cognitive Predictors of Speech Recognition in Noise in Adult Cochlear Implant Users

PubMed Central

Smits, Cas; Merkus, Paul; Festen, Joost M.; Goverts, S. Theo

2017-01-01

Not all of the variance in speech-recognition performance of cochlear implant (CI) users can be explained by biographic and auditory factors. In normal-hearing listeners, linguistic and cognitive factors determine most of speech-in-noise performance. The current study explored specifically the influence of visually measured lexical-access ability compared with other cognitive factors on speech recognition of 24 postlingually deafened CI users. Speech-recognition performance was measured with monosyllables in quiet (consonant-vowel-consonant [CVC]), sentences-in-noise (SIN), and digit-triplets in noise (DIN). In addition to a composite variable of lexical-access ability (LA), measured with a lexical-decision test (LDT) and word-naming task, vocabulary size, working-memory capacity (Reading Span test [RSpan]), and a visual analogue of the SIN test (text reception threshold test) were measured. The DIN test was used to correct for auditory factors in SIN thresholds by taking the difference between SIN and DIN: SRTdiff. Correlation analyses revealed that duration of hearing loss (dHL) was related to SIN thresholds. Better working-memory capacity was related to SIN and SRTdiff scores. LDT reaction time was positively correlated with SRTdiff scores. No significant relationships were found for CVC or DIN scores with the predictor variables. Regression analyses showed that together with dHL, RSpan explained 55% of the variance in SIN thresholds. When controlling for auditory performance, LA, LDT, and RSpan separately explained, together with dHL, respectively 37%, 36%, and 46% of the variance in SRTdiff outcome. The results suggest that poor verbal working-memory capacity and to a lesser extent poor lexical-access ability limit speech-recognition ability in listeners with a CI. PMID:29205095
Cochlear implant characteristics and speech perception skills of adolescents with long-term device use.

PubMed

Davidson, Lisa S; Geers, Ann E; Brenner, Christine

2010-10-01

Updated cochlear implant technology and optimized fitting can have a substantial impact on speech perception. The effects of upgrades in processor technology and aided thresholds on word recognition at soft input levels and sentence recognition in noise were examined. We hypothesized that updated speech processors and lower aided thresholds would allow improved recognition of soft speech without compromising performance in noise. 109 teenagers who had used a Nucleus 22-cochlear implant since preschool were tested with their current speech processor(s) (101 unilateral and 8 bilateral): 13 used the Spectra, 22 the ESPrit 22, 61 the ESPrit 3G, and 13 the Freedom. The Lexical Neighborhood Test (LNT) was administered at 70 and 50 dB SPL and the Bamford Kowal Bench sentences were administered in quiet and in noise. Aided thresholds were obtained for frequency-modulated tones from 250 to 4,000 Hz. Results were analyzed using repeated measures analysis of variance. Aided thresholds for the Freedom/3G group were significantly lower (better) than the Spectra/Sprint group. LNT scores at 50 dB were significantly higher for the Freedom/3G group. No significant differences between the 2 groups were found for the LNT at 70 or sentences in quiet or noise. Adolescents using updated processors that allowed for aided detection thresholds of 30 dB HL or better performed the best at soft levels. The BKB in noise results suggest that greater access to soft speech does not compromise listening in noise.
Self-Assessed Hearing Handicap in Older Adults With Poorer-Than-Predicted Speech Recognition in Noise.

PubMed

Eckert, Mark A; Matthews, Lois J; Dubno, Judy R

2017-01-01

Even older adults with relatively mild hearing loss report hearing handicap, suggesting that hearing handicap is not completely explained by reduced speech audibility. We examined the extent to which self-assessed ratings of hearing handicap using the Hearing Handicap Inventory for the Elderly (HHIE; Ventry & Weinstein, 1982) were significantly associated with measures of speech recognition in noise that controlled for differences in speech audibility. One hundred sixty-two middle-aged and older adults had HHIE total scores that were significantly associated with audibility-adjusted measures of speech recognition for low-context but not high-context sentences. These findings were driven by HHIE items involving negative feelings related to communication difficulties that also captured variance in subjective ratings of effort and frustration that predicted speech recognition. The average pure-tone threshold accounted for some of the variance in the association between the HHIE and audibility-adjusted speech recognition, suggesting an effect of central and peripheral auditory system decline related to elevated thresholds. The accumulation of difficult listening experiences appears to produce a self-assessment of hearing handicap resulting from (a) reduced audibility of stimuli, (b) declines in the central and peripheral auditory system function, and (c) additional individual variation in central nervous system function.
Self-Assessed Hearing Handicap in Older Adults With Poorer-Than-Predicted Speech Recognition in Noise

PubMed Central

Matthews, Lois J.; Dubno, Judy R.

2017-01-01

Purpose Even older adults with relatively mild hearing loss report hearing handicap, suggesting that hearing handicap is not completely explained by reduced speech audibility. Method We examined the extent to which self-assessed ratings of hearing handicap using the Hearing Handicap Inventory for the Elderly (HHIE; Ventry & Weinstein, 1982) were significantly associated with measures of speech recognition in noise that controlled for differences in speech audibility. Results One hundred sixty-two middle-aged and older adults had HHIE total scores that were significantly associated with audibility-adjusted measures of speech recognition for low-context but not high-context sentences. These findings were driven by HHIE items involving negative feelings related to communication difficulties that also captured variance in subjective ratings of effort and frustration that predicted speech recognition. The average pure-tone threshold accounted for some of the variance in the association between the HHIE and audibility-adjusted speech recognition, suggesting an effect of central and peripheral auditory system decline related to elevated thresholds. Conclusion The accumulation of difficult listening experiences appears to produce a self-assessment of hearing handicap resulting from (a) reduced audibility of stimuli, (b) declines in the central and peripheral auditory system function, and (c) additional individual variation in central nervous system function. PMID:28060993
Monopolar Detection Thresholds Predict Spatial Selectivity of Neural Excitation in Cochlear Implants: Implications for Speech Recognition

PubMed Central

2016-01-01

The objectives of the study were to (1) investigate the potential of using monopolar psychophysical detection thresholds for estimating spatial selectivity of neural excitation with cochlear implants and to (2) examine the effect of site removal on speech recognition based on the threshold measure. Detection thresholds were measured in Cochlear Nucleus® device users using monopolar stimulation for pulse trains that were of (a) low rate and long duration, (b) high rate and short duration, and (c) high rate and long duration. Spatial selectivity of neural excitation was estimated by a forward-masking paradigm, where the probe threshold elevation in the presence of a forward masker was measured as a function of masker-probe separation. The strength of the correlation between the monopolar thresholds and the slopes of the masking patterns systematically reduced as neural response of the threshold stimulus involved interpulse interactions (refractoriness and sub-threshold adaptation), and spike-rate adaptation. Detection threshold for the low-rate stimulus most strongly correlated with the spread of forward masking patterns and the correlation reduced for long and high rate pulse trains. The low-rate thresholds were then measured for all electrodes across the array for each subject. Subsequently, speech recognition was tested with experimental maps that deactivated five stimulation sites with the highest thresholds and five randomly chosen ones. Performance with deactivating the high-threshold sites was better than performance with the subjects’ clinical map used every day with all electrodes active, in both quiet and background noise. Performance with random deactivation was on average poorer than that with the clinical map but the difference was not significant. These results suggested that the monopolar low-rate thresholds are related to the spatial neural excitation patterns in cochlear implant users and can be used to select sites for more optimal speech recognition performance. PMID:27798658
Speech intelligibility index predictions for young and old listeners in automobile noise: Can the index be improved by incorporating factors other than absolute threshold?

NASA Astrophysics Data System (ADS)

Saweikis, Meghan; Surprenant, Aimée M.; Davies, Patricia; Gallant, Don

2003-10-01

While young and old subjects with comparable audiograms tend to perform comparably on speech recognition tasks in quiet environments, the older subjects have more difficulty than the younger subjects with recognition tasks in degraded listening conditions. This suggests that factors other than an absolute threshold may account for some of the difficulty older listeners have on recognition tasks in noisy environments. Many metrics, including the Speech Intelligibility Index (SII), used to measure speech intelligibility, only consider an absolute threshold when accounting for age related hearing loss. Therefore these metrics tend to overestimate the performance for elderly listeners in noisy environments [Tobias et al., J. Acoust. Soc. Am. 83, 859-895 (1988)]. The present studies examine the predictive capabilities of the SII in an environment with automobile noise present. This is of interest because people's evaluation of the automobile interior sound is closely linked to their ability to carry on conversations with their fellow passengers. The four studies examine whether, for subjects with age related hearing loss, the accuracy of the SII can be improved by incorporating factors other than an absolute threshold into the model. [Work supported by Ford Motor Company.

Recognition and localization of speech by adult cochlear implant recipients wearing a digital hearing aid in the nonimplanted ear (bimodal hearing).

PubMed

Potts, Lisa G; Skinner, Margaret W; Litovsky, Ruth A; Strube, Michael J; Kuk, Francis

2009-06-01

The use of bilateral amplification is now common clinical practice for hearing aid users but not for cochlear implant recipients. In the past, most cochlear implant recipients were implanted in one ear and wore only a monaural cochlear implant processor. There has been recent interest in benefits arising from bilateral stimulation that may be present for cochlear implant recipients. One option for bilateral stimulation is the use of a cochlear implant in one ear and a hearing aid in the opposite nonimplanted ear (bimodal hearing). This study evaluated the effect of wearing a cochlear implant in one ear and a digital hearing aid in the opposite ear on speech recognition and localization. A repeated-measures correlational study was completed. Nineteen adult Cochlear Nucleus 24 implant recipients participated in the study. The participants were fit with a Widex Senso Vita 38 hearing aid to achieve maximum audibility and comfort within their dynamic range. Soundfield thresholds, loudness growth, speech recognition, localization, and subjective questionnaires were obtained six-eight weeks after the hearing aid fitting. Testing was completed in three conditions: hearing aid only, cochlear implant only, and cochlear implant and hearing aid (bimodal). All tests were repeated four weeks after the first test session. Repeated-measures analysis of variance was used to analyze the data. Significant effects were further examined using pairwise comparison of means or in the case of continuous moderators, regression analyses. The speech-recognition and localization tasks were unique, in that a speech stimulus presented from a variety of roaming azimuths (140 degree loudspeaker array) was used. Performance in the bimodal condition was significantly better for speech recognition and localization compared to the cochlear implant-only and hearing aid-only conditions. Performance was also different between these conditions when the location (i.e., side of the loudspeaker array that presented the word) was analyzed. In the bimodal condition, the speech-recognition and localization tasks were equal regardless of which side of the loudspeaker array presented the word, while performance was significantly poorer for the monaural conditions (hearing aid only and cochlear implant only) when the words were presented on the side with no stimulation. Binaural loudness summation of 1-3 dB was seen in soundfield thresholds and loudness growth in the bimodal condition. Measures of the audibility of sound with the hearing aid, including unaided thresholds, soundfield thresholds, and the Speech Intelligibility Index, were significant moderators of speech recognition and localization. Based on the questionnaire responses, participants showed a strong preference for bimodal stimulation. These findings suggest that a well-fit digital hearing aid worn in conjunction with a cochlear implant is beneficial to speech recognition and localization. The dynamic test procedures used in this study illustrate the importance of bilateral hearing for locating, identifying, and switching attention between multiple speakers. It is recommended that unilateral cochlear implant recipients, with measurable unaided hearing thresholds, be fit with a hearing aid.
New Measures of Masked Text Recognition in Relation to Speech-in-Noise Perception and Their Associations with Age and Cognitive Abilities

ERIC Educational Resources Information Center

Besser, Jana; Zekveld, Adriana A.; Kramer, Sophia E.; Ronnberg, Jerker; Festen, Joost M.

2012-01-01

Purpose: In this research, the authors aimed to increase the analogy between Text Reception Threshold (TRT; Zekveld, George, Kramer, Goverts, & Houtgast, 2007) and Speech Reception Threshold (SRT; Plomp & Mimpen, 1979) and to examine the TRT's value in estimating cognitive abilities that are important for speech comprehension in noise. Method: The…
Exploring the Link Between Cognitive Abilities and Speech Recognition in the Elderly Under Different Listening Conditions

PubMed Central

Nuesse, Theresa; Steenken, Rike; Neher, Tobias; Holube, Inga

2018-01-01

Elderly listeners are known to differ considerably in their ability to understand speech in noise. Several studies have addressed the underlying factors that contribute to these differences. These factors include audibility, and age-related changes in supra-threshold auditory processing abilities, and it has been suggested that differences in cognitive abilities may also be important. The objective of this study was to investigate associations between performance in cognitive tasks and speech recognition under different listening conditions in older adults with either age appropriate hearing or hearing-impairment. To that end, speech recognition threshold (SRT) measurements were performed under several masking conditions that varied along the perceptual dimensions of dip listening, spatial separation, and informational masking. In addition, a neuropsychological test battery was administered, which included measures of verbal working and short-term memory, executive functioning, selective and divided attention, and lexical and semantic abilities. Age-matched groups of older adults with either age-appropriate hearing (ENH, n = 20) or aided hearing impairment (EHI, n = 21) participated. In repeated linear regression analyses, composite scores of cognitive test outcomes (evaluated using PCA) were included to predict SRTs. These associations were different for the two groups. When hearing thresholds were controlled for, composed cognitive factors were significantly associated with the SRTs for the ENH listeners. Whereas better lexical and semantic abilities were associated with lower (better) SRTs in this group, there was a negative association between attentional abilities and speech recognition in the presence of spatially separated speech-like maskers. For the EHI group, the pure-tone thresholds (averaged across 0.5, 1, 2, and 4 kHz) were significantly associated with the SRTs, despite the fact that all signals were amplified and therefore in principle audible. PMID:29867654
The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise.

PubMed

Shen, Jing; Souza, Pamela E

2017-09-18

This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for speech recognition in noise. Last, we explored the individual factors that predict the amount of dynamic-pitch benefit for speech recognition in noise. Younger listeners with normal hearing and older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in nonspeech noise. The younger listeners benefited more from dynamic pitch for speech recognition in temporally modulated noise than unmodulated noise. Older listeners were able to benefit from the dynamic-pitch cues but received less benefit from noise modulation than the younger listeners. For those older listeners with hearing loss, the amount of hearing loss strongly predicted the dynamic-pitch benefit for speech recognition in noise. Dynamic-pitch cues aid speech recognition in noise, particularly when noise has temporal modulation. Hearing loss negatively affects the dynamic-pitch benefit to older listeners with significant hearing loss.
The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise

PubMed Central

Souza, Pamela E.

2017-01-01

Purpose This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for speech recognition in noise. Last, we explored the individual factors that predict the amount of dynamic-pitch benefit for speech recognition in noise. Method Younger listeners with normal hearing and older listeners with varying levels of hearing sensitivity participated in the study, in which speech reception thresholds were measured with sentences in nonspeech noise. Results The younger listeners benefited more from dynamic pitch for speech recognition in temporally modulated noise than unmodulated noise. Older listeners were able to benefit from the dynamic-pitch cues but received less benefit from noise modulation than the younger listeners. For those older listeners with hearing loss, the amount of hearing loss strongly predicted the dynamic-pitch benefit for speech recognition in noise. Conclusions Dynamic-pitch cues aid speech recognition in noise, particularly when noise has temporal modulation. Hearing loss negatively affects the dynamic-pitch benefit to older listeners with significant hearing loss. PMID:28800370
Recognition and Localization of Speech by Adult Cochlear Implant Recipients Wearing a Digital Hearing Aid in the Nonimplanted Ear (Bimodal Hearing)

PubMed Central

Potts, Lisa G.; Skinner, Margaret W.; Litovsky, Ruth A.; Strube, Michael J; Kuk, Francis

2010-01-01

Background The use of bilateral amplification is now common clinical practice for hearing aid users but not for cochlear implant recipients. In the past, most cochlear implant recipients were implanted in one ear and wore only a monaural cochlear implant processor. There has been recent interest in benefits arising from bilateral stimulation that may be present for cochlear implant recipients. One option for bilateral stimulation is the use of a cochlear implant in one ear and a hearing aid in the opposite nonimplanted ear (bimodal hearing). Purpose This study evaluated the effect of wearing a cochlear implant in one ear and a digital hearing aid in the opposite ear on speech recognition and localization. Research Design A repeated-measures correlational study was completed. Study Sample Nineteen adult Cochlear Nucleus 24 implant recipients participated in the study. Intervention The participants were fit with a Widex Senso Vita 38 hearing aid to achieve maximum audibility and comfort within their dynamic range. Data Collection and Analysis Soundfield thresholds, loudness growth, speech recognition, localization, and subjective questionnaires were obtained six–eight weeks after the hearing aid fitting. Testing was completed in three conditions: hearing aid only, cochlear implant only, and cochlear implant and hearing aid (bimodal). All tests were repeated four weeks after the first test session. Repeated-measures analysis of variance was used to analyze the data. Significant effects were further examined using pairwise comparison of means or in the case of continuous moderators, regression analyses. The speech-recognition and localization tasks were unique, in that a speech stimulus presented from a variety of roaming azimuths (140 degree loudspeaker array) was used. Results Performance in the bimodal condition was significantly better for speech recognition and localization compared to the cochlear implant–only and hearing aid–only conditions. Performance was also different between these conditions when the location (i.e., side of the loudspeaker array that presented the word) was analyzed. In the bimodal condition, the speech-recognition and localization tasks were equal regardless of which side of the loudspeaker array presented the word, while performance was significantly poorer for the monaural conditions (hearing aid only and cochlear implant only) when the words were presented on the side with no stimulation. Binaural loudness summation of 1–3 dB was seen in soundfield thresholds and loudness growth in the bimodal condition. Measures of the audibility of sound with the hearing aid, including unaided thresholds, soundfield thresholds, and the Speech Intelligibility Index, were significant moderators of speech recognition and localization. Based on the questionnaire responses, participants showed a strong preference for bimodal stimulation. Conclusions These findings suggest that a well-fit digital hearing aid worn in conjunction with a cochlear implant is beneficial to speech recognition and localization. The dynamic test procedures used in this study illustrate the importance of bilateral hearing for locating, identifying, and switching attention between multiple speakers. It is recommended that unilateral cochlear implant recipients, with measurable unaided hearing thresholds, be fit with a hearing aid. PMID:19594084
How linguistic closure and verbal working memory relate to speech recognition in noise--a review.

PubMed

Besser, Jana; Koelewijn, Thomas; Zekveld, Adriana A; Kramer, Sophia E; Festen, Joost M

2013-06-01

The ability to recognize masked speech, commonly measured with a speech reception threshold (SRT) test, is associated with cognitive processing abilities. Two cognitive factors frequently assessed in speech recognition research are the capacity of working memory (WM), measured by means of a reading span (Rspan) or listening span (Lspan) test, and the ability to read masked text (linguistic closure), measured by the text reception threshold (TRT). The current article provides a review of recent hearing research that examined the relationship of TRT and WM span to SRTs in various maskers. Furthermore, modality differences in WM capacity assessed with the Rspan compared to the Lspan test were examined and related to speech recognition abilities in an experimental study with young adults with normal hearing (NH). Span scores were strongly associated with each other, but were higher in the auditory modality. The results of the reviewed studies suggest that TRT and WM span are related to each other, but differ in their relationships with SRT performance. In NH adults of middle age or older, both TRT and Rspan were associated with SRTs in speech maskers, whereas TRT better predicted speech recognition in fluctuating nonspeech maskers. The associations with SRTs in steady-state noise were inconclusive for both measures. WM span was positively related to benefit from contextual information in speech recognition, but better TRTs related to less interference from unrelated cues. Data for individuals with impaired hearing are limited, but larger WM span seems to give a general advantage in various listening situations.
How Linguistic Closure and Verbal Working Memory Relate to Speech Recognition in Noise—A Review

PubMed Central

Koelewijn, Thomas; Zekveld, Adriana A.; Kramer, Sophia E.; Festen, Joost M.

2013-01-01

The ability to recognize masked speech, commonly measured with a speech reception threshold (SRT) test, is associated with cognitive processing abilities. Two cognitive factors frequently assessed in speech recognition research are the capacity of working memory (WM), measured by means of a reading span (Rspan) or listening span (Lspan) test, and the ability to read masked text (linguistic closure), measured by the text reception threshold (TRT). The current article provides a review of recent hearing research that examined the relationship of TRT and WM span to SRTs in various maskers. Furthermore, modality differences in WM capacity assessed with the Rspan compared to the Lspan test were examined and related to speech recognition abilities in an experimental study with young adults with normal hearing (NH). Span scores were strongly associated with each other, but were higher in the auditory modality. The results of the reviewed studies suggest that TRT and WM span are related to each other, but differ in their relationships with SRT performance. In NH adults of middle age or older, both TRT and Rspan were associated with SRTs in speech maskers, whereas TRT better predicted speech recognition in fluctuating nonspeech maskers. The associations with SRTs in steady-state noise were inconclusive for both measures. WM span was positively related to benefit from contextual information in speech recognition, but better TRTs related to less interference from unrelated cues. Data for individuals with impaired hearing are limited, but larger WM span seems to give a general advantage in various listening situations. PMID:23945955
Sentence Recognition Prediction for Hearing-impaired Listeners in Stationary and Fluctuation Noise With FADE

PubMed Central

Schädler, Marc René; Warzybok, Anna; Meyer, Bernd T.; Brand, Thomas

2016-01-01

To characterize the individual patient’s hearing impairment as obtained with the matrix sentence recognition test, a simulation Framework for Auditory Discrimination Experiments (FADE) is extended here using the Attenuation and Distortion (A+D) approach by Plomp as a blueprint for setting the individual processing parameters. FADE has been shown to predict the outcome of both speech recognition tests and psychoacoustic experiments based on simulations using an automatic speech recognition system requiring only few assumptions. It builds on the closed-set matrix sentence recognition test which is advantageous for testing individual speech recognition in a way comparable across languages. Individual predictions of speech recognition thresholds in stationary and in fluctuating noise were derived using the audiogram and an estimate of the internal level uncertainty for modeling the individual Plomp curves fitted to the data with the Attenuation (A-) and Distortion (D-) parameters of the Plomp approach. The “typical” audiogram shapes from Bisgaard et al with or without a “typical” level uncertainty and the individual data were used for individual predictions. As a result, the individualization of the level uncertainty was found to be more important than the exact shape of the individual audiogram to accurately model the outcome of the German Matrix test in stationary or fluctuating noise for listeners with hearing impairment. The prediction accuracy of the individualized approach also outperforms the (modified) Speech Intelligibility Index approach which is based on the individual threshold data only. PMID:27604782
Relationship between Consonant Recognition in Noise and Hearing Threshold

ERIC Educational Resources Information Center

Yoon, Yang-soo; Allen, Jont B.; Gooler, David M.

2012-01-01

Purpose: Although poorer understanding of speech in noise by listeners who are hearing-impaired (HI) is known not to be directly related to audiometric hearing threshold, "HT" (f), grouping HI listeners with "HT" (f) is widely practiced. In this article, the relationship between consonant recognition and "HT" (f) is…
Prediction of consonant recognition in quiet for listeners with normal and impaired hearing using an auditory model.

PubMed

Jürgens, Tim; Ewert, Stephan D; Kollmeier, Birger; Brand, Thomas

2014-03-01

Consonant recognition was assessed in normal-hearing (NH) and hearing-impaired (HI) listeners in quiet as a function of speech level using a nonsense logatome test. Average recognition scores were analyzed and compared to recognition scores of a speech recognition model. In contrast to commonly used spectral speech recognition models operating on long-term spectra, a "microscopic" model operating in the time domain was used. Variations of the model (accounting for hearing impairment) and different model parameters (reflecting cochlear compression) were tested. Using these model variations this study examined whether speech recognition performance in quiet is affected by changes in cochlear compression, namely, a linearization, which is often observed in HI listeners. Consonant recognition scores for HI listeners were poorer than for NH listeners. The model accurately predicted the speech reception thresholds of the NH and most HI listeners. A partial linearization of the cochlear compression in the auditory model, while keeping audibility constant, produced higher recognition scores and improved the prediction accuracy. However, including listener-specific information about the exact form of the cochlear compression did not improve the prediction further.
Clinical implications of word recognition differences in earphone and aided conditions

PubMed Central

McRackan, Theodore R.; Ahlstrom, Jayne B.; Clinkscales, William B.; Meyer, Ted A.; Dubno, Judy R

2017-01-01

Objective To compare word recognition scores for adults with hearing loss measured using earphones and in the sound field without and with hearing aids (HA) Study design Independent review of pre-surgical audiological data from an active middle ear implant (MEI) FDA clinical trial Setting Multicenter prospective FDA clinical trial Patients Ninety-four adult HA users Interventions/Main outcomes measured Pre-operative earphone, unaided and aided pure tone thresholds, word recognition scores, and speech intelligibility index. Results We performed an independent review of pre-surgical audiological data from a MEI FDA trial and compared unaided and aided word recognition scores with participants’ HAs fit according to the NAL-R algorithm. For 52 participants (55.3%), differences in scores between earphone and aided conditions were >10%; for 33 participants (35.1%), earphone scores were higher by 10% or more than aided scores. These participants had significantly higher pure tone thresholds at 250 Hz, 500 Hz, and 1000 Hz), higher pure tone averages, higher speech recognition thresholds, (and higher earphone speech levels (p=0.002). No significant correlation was observed between word recognition scores measured with earphones and with hearing aids (r=.14; p=0.16), whereas a moderately high positive correlation was observed between unaided and aided word recognition (r=0.68; p<0.001). Conclusion Results of the these analyses do not support the common clinical practice of using word recognition scores measured with earphones to predict aided word recognition or hearing aid benefit. Rather, these results provide evidence supporting the measurement of aided word recognition in patients who are considering hearing aids. PMID:27631832
Sentence Recognition Prediction for Hearing-impaired Listeners in Stationary and Fluctuation Noise With FADE: Empowering the Attenuation and Distortion Concept by Plomp With a Quantitative Processing Model.

PubMed

Kollmeier, Birger; Schädler, Marc René; Warzybok, Anna; Meyer, Bernd T; Brand, Thomas

2016-09-07

To characterize the individual patient's hearing impairment as obtained with the matrix sentence recognition test, a simulation Framework for Auditory Discrimination Experiments (FADE) is extended here using the Attenuation and Distortion (A+D) approach by Plomp as a blueprint for setting the individual processing parameters. FADE has been shown to predict the outcome of both speech recognition tests and psychoacoustic experiments based on simulations using an automatic speech recognition system requiring only few assumptions. It builds on the closed-set matrix sentence recognition test which is advantageous for testing individual speech recognition in a way comparable across languages. Individual predictions of speech recognition thresholds in stationary and in fluctuating noise were derived using the audiogram and an estimate of the internal level uncertainty for modeling the individual Plomp curves fitted to the data with the Attenuation (A-) and Distortion (D-) parameters of the Plomp approach. The "typical" audiogram shapes from Bisgaard et al with or without a "typical" level uncertainty and the individual data were used for individual predictions. As a result, the individualization of the level uncertainty was found to be more important than the exact shape of the individual audiogram to accurately model the outcome of the German Matrix test in stationary or fluctuating noise for listeners with hearing impairment. The prediction accuracy of the individualized approach also outperforms the (modified) Speech Intelligibility Index approach which is based on the individual threshold data only. © The Author(s) 2016.
Cochlear implantation with hearing preservation yields significant benefit for speech recognition in complex listening environments.

PubMed

Gifford, René H; Dorman, Michael F; Skarzynski, Henryk; Lorens, Artur; Polak, Marek; Driscoll, Colin L W; Roland, Peter; Buchman, Craig A

2013-01-01

The aim of this study was to assess the benefit of having preserved acoustic hearing in the implanted ear for speech recognition in complex listening environments. The present study included a within-subjects, repeated-measures design including 21 English-speaking and 17 Polish-speaking cochlear implant (CI) recipients with preserved acoustic hearing in the implanted ear. The patients were implanted with electrodes that varied in insertion depth from 10 to 31 mm. Mean preoperative low-frequency thresholds (average of 125, 250, and 500 Hz) in the implanted ear were 39.3 and 23.4 dB HL for the English- and Polish-speaking participants, respectively. In one condition, speech perception was assessed in an eight-loudspeaker environment in which the speech signals were presented from one loudspeaker and restaurant noise was presented from all loudspeakers. In another condition, the signals were presented in a simulation of a reverberant environment with a reverberation time of 0.6 sec. The response measures included speech reception thresholds (SRTs) and percent correct sentence understanding for two test conditions: CI plus low-frequency hearing in the contralateral ear (bimodal condition) and CI plus low-frequency hearing in both ears (best-aided condition). A subset of six English-speaking listeners were also assessed on measures of interaural time difference thresholds for a 250-Hz signal. Small, but significant, improvements in performance (1.7-2.1 dB and 6-10 percentage points) were found for the best-aided condition versus the bimodal condition. Postoperative thresholds in the implanted ear were correlated with the degree of electric and acoustic stimulation (EAS) benefit for speech recognition in diffuse noise. There was no reliable relationship among measures of audiometric threshold in the implanted ear nor elevation in threshold after surgery and improvement in speech understanding in reverberation. There was a significant correlation between interaural time difference threshold at 250 Hz and EAS-related benefit for the adaptive speech reception threshold. The findings of this study suggest that (1) preserved low-frequency hearing improves speech understanding for CI recipients, (2) testing in complex listening environments, in which binaural timing cues differ for signal and noise, may best demonstrate the value of having two ears with low-frequency acoustic hearing, and (3) preservation of binaural timing cues, although poorer than observed for individuals with normal hearing, is possible after unilateral cochlear implantation with hearing preservation and is associated with EAS benefit. The results of this study demonstrate significant communicative benefit for hearing preservation in the implanted ear and provide support for the expansion of CI criteria to include individuals with low-frequency thresholds in even the normal to near-normal range.
Integrating the acoustics of running speech into the pure tone audiogram: a step from audibility to intelligibility and disability.

PubMed

Corthals, Paul

2008-01-01

The aim of the present study is to construct a simple method for visualizing and quantifying the audibility of speech on the audiogram and to predict speech intelligibility. The proposed method involves a series of indices on the audiogram form reflecting the sound pressure level distribution of running speech. The indices that coincide with a patient's pure tone thresholds reflect speech audibility and give evidence of residual functional hearing capacity. Two validation studies were conducted among sensorineurally hearing-impaired participants (n = 56 and n = 37, respectively) to investigate the relation with speech recognition ability and hearing disability. The potential of the new audibility indices as predictors for speech reception thresholds is comparable to the predictive potential of the ANSI 1968 articulation index and the ANSI 1997 speech intelligibility index. The sum of indices or a weighted combination can explain considerable proportions of variance in speech reception results for sentences in quiet free field conditions. The proportions of variance that can be explained in questionnaire results on hearing disability are less, presumably because the threshold indices almost exclusively reflect message audibility and much less the psychosocial consequences of hearing deficits. The outcomes underpin the validity of the new audibility indexing system, even though the proposed method may be better suited for predicting relative performance across a set of conditions than for predicting absolute speech recognition performance. (c) 2007 S. Karger AG, Basel
Longitudinal changes in speech recognition in older persons.

PubMed

Dubno, Judy R; Lee, Fu-Shing; Matthews, Lois J; Ahlstrom, Jayne B; Horwitz, Amy R; Mills, John H

2008-01-01

Recognition of isolated monosyllabic words in quiet and recognition of key words in low- and high-context sentences in babble were measured in a large sample of older persons enrolled in a longitudinal study of age-related hearing loss. Repeated measures were obtained yearly or every 2 to 3 years. To control for concurrent changes in pure-tone thresholds and speech levels, speech-recognition scores were adjusted using an importance-weighted speech-audibility metric (AI). Linear-regression slope estimated the rate of change in adjusted speech-recognition scores. Recognition of words in quiet declined significantly faster with age than predicted by declines in speech audibility. As subjects aged, observed scores deviated increasingly from AI-predicted scores, but this effect did not accelerate with age. Rate of decline in word recognition was significantly faster for females than males and for females with high serum progesterone levels, whereas noise history had no effect. Rate of decline did not accelerate with age but increased with degree of hearing loss, suggesting that with more severe injury to the auditory system, impairments to auditory function other than reduced audibility resulted in faster declines in word recognition as subjects aged. Recognition of key words in low- and high-context sentences in babble did not decline significantly with age.
The effect of instantaneous input dynamic range setting on the speech perception of children with the nucleus 24 implant.

PubMed

Davidson, Lisa S; Skinner, Margaret W; Holstad, Beth A; Fears, Beverly T; Richter, Marie K; Matusofsky, Margaret; Brenner, Christine; Holden, Timothy; Birath, Amy; Kettel, Jerrica L; Scollie, Susan

2009-06-01

The purpose of this study was to examine the effects of a wider instantaneous input dynamic range (IIDR) setting on speech perception and comfort in quiet and noise for children wearing the Nucleus 24 implant system and the Freedom speech processor. In addition, children's ability to understand soft and conversational level speech in relation to aided sound-field thresholds was examined. Thirty children (age, 7 to 17 years) with the Nucleus 24 cochlear implant system and the Freedom speech processor with two different IIDR settings (30 versus 40 dB) were tested on the Consonant Nucleus Consonant (CNC) word test at 50 and 60 dB SPL, the Bamford-Kowal-Bench Speech in Noise Test, and a loudness rating task for four-talker speech noise. Aided thresholds for frequency-modulated tones, narrowband noise, and recorded Ling sounds were obtained with the two IIDRs and examined in relation to CNC scores at 50 dB SPL. Speech Intelligibility Indices were calculated using the long-term average speech spectrum of the CNC words at 50 dB SPL measured at each test site and aided thresholds. Group mean CNC scores at 50 dB SPL with the 40 IIDR were significantly higher (p < 0.001) than with the 30 IIDR. Group mean CNC scores at 60 dB SPL, loudness ratings, and the signal to noise ratios-50 for Bamford-Kowal-Bench Speech in Noise Test were not significantly different for the two IIDRs. Significantly improved aided thresholds at 250 to 6000 Hz as well as higher Speech Intelligibility Indices afforded improved audibility for speech presented at soft levels (50 dB SPL). These results indicate that an increased IIDR provides improved word recognition for soft levels of speech without compromising comfort of higher levels of speech sounds or sentence recognition in noise.
Do Older Listeners With Hearing Loss Benefit From Dynamic Pitch for Speech Recognition in Noise?

PubMed

Shen, Jing; Souza, Pamela E

2017-10-12

Dynamic pitch, the variation in the fundamental frequency of speech, aids older listeners' speech perception in noise. It is unclear, however, whether some older listeners with hearing loss benefit from strengthened dynamic pitch cues for recognizing speech in certain noise scenarios and how this relative benefit may be associated with individual factors. We first examined older individuals' relative benefit between natural and strong dynamic pitches for better speech recognition in noise. Further, we reported the individual factors of the 2 groups of listeners who benefit differently from natural and strong dynamic pitches. Speech reception thresholds of 13 older listeners with mild-moderate hearing loss were measured using target speech with 3 levels of dynamic pitch strength. Individuals' ability to benefit from dynamic pitch was defined as the speech reception threshold difference between speeches with and without dynamic pitch cues. The relative benefit of natural versus strong dynamic pitch varied across individuals. However, this relative benefit remained consistent for the same individuals across those background noises with temporal modulation. Those listeners who benefited more from strong dynamic pitch reported better subjective speech perception abilities. Strong dynamic pitch may be more beneficial than natural dynamic pitch for some older listeners to recognize speech better in noise, particularly when the noise has temporal modulation.
[Characteristics, advantages, and limits of matrix tests].

PubMed

Brand, T; Wagener, K C

2017-03-01

Deterioration of communication abilities due to hearing problems is particularly relevant in listening situations with noise. Therefore, speech intelligibility tests in noise are required for audiological diagnostics and evaluation of hearing rehabilitation. This study analyzed the characteristics of matrix tests assessing the 50 % speech recognition threshold in noise. What are their advantages and limitations? Matrix tests are based on a matrix of 50 words (10 five-word sentences with same grammatical structure). In the standard setting, 20 sentences are presented using an adaptive procedure estimating the individual 50 % speech recognition threshold in noise. At present, matrix tests in 17 different languages are available. A high international comparability of matrix tests exists. The German language matrix test (OLSA, male speaker) has a reference 50 % speech recognition threshold of -7.1 (± 1.1) dB SNR. Before using a matrix test for the first time, the test person has to become familiar with the basic speech material using two training lists. Hereafter, matrix tests produce constant results even if repeated many times. Matrix tests are suitable for users of hearing aids and cochlear implants, particularly for assessment of benefit during the fitting process. Matrix tests can be performed in closed form and consequently with non-native listeners, even if the experimenter does not speak the test person's native language. Short versions of matrix tests are available for listeners with a shorter memory span, e.g., children.
A new time-adaptive discrete bionic wavelet transform for enhancing speech from adverse noise environment

NASA Astrophysics Data System (ADS)

Palaniswamy, Sumithra; Duraisamy, Prakash; Alam, Mohammad Showkat; Yuan, Xiaohui

2012-04-01

Automatic speech processing systems are widely used in everyday life such as mobile communication, speech and speaker recognition, and for assisting the hearing impaired. In speech communication systems, the quality and intelligibility of speech is of utmost importance for ease and accuracy of information exchange. To obtain an intelligible speech signal and one that is more pleasant to listen, noise reduction is essential. In this paper a new Time Adaptive Discrete Bionic Wavelet Thresholding (TADBWT) scheme is proposed. The proposed technique uses Daubechies mother wavelet to achieve better enhancement of speech from additive non- stationary noises which occur in real life such as street noise and factory noise. Due to the integration of human auditory system model into the wavelet transform, bionic wavelet transform (BWT) has great potential for speech enhancement which may lead to a new path in speech processing. In the proposed technique, at first, discrete BWT is applied to noisy speech to derive TADBWT coefficients. Then the adaptive nature of the BWT is captured by introducing a time varying linear factor which updates the coefficients at each scale over time. This approach has shown better performance than the existing algorithms at lower input SNR due to modified soft level dependent thresholding on time adaptive coefficients. The objective and subjective test results confirmed the competency of the TADBWT technique. The effectiveness of the proposed technique is also evaluated for speaker recognition task under noisy environment. The recognition results show that the TADWT technique yields better performance when compared to alternate methods specifically at lower input SNR.

The Binaural Masking-Level Difference of Mandarin Tone Detection and the Binaural Intelligibility-Level Difference of Mandarin Tone Recognition in the Presence of Speech-Spectrum Noise

PubMed Central

Ho, Cheng-Yu; Li, Pei-Chun; Chiang, Yuan-Chuan; Young, Shuenn-Tsong; Chu, Woei-Chyn

2015-01-01

Binaural hearing involves using information relating to the differences between the signals that arrive at the two ears, and it can make it easier to detect and recognize signals in a noisy environment. This phenomenon of binaural hearing is quantified in laboratory studies as the binaural masking-level difference (BMLD). Mandarin is one of the most commonly used languages, but there are no publication values of BMLD or BILD based on Mandarin tones. Therefore, this study investigated the BMLD and BILD of Mandarin tones. The BMLDs of Mandarin tone detection were measured based on the detection threshold differences for the four tones of the voiced vowels /i/ (i.e., /i1/, /i2/, /i3/, and /i4/) and /u/ (i.e., /u1/, /u2/, /u3/, and /u4/) in the presence of speech-spectrum noise when presented interaurally in phase (S0N0) and interaurally in antiphase (SπN0). The BILDs of Mandarin tone recognition in speech-spectrum noise were determined as the differences in the target-to-masker ratio (TMR) required for 50% correct tone recognitions between the S0N0 and SπN0 conditions. The detection thresholds for the four tones of /i/ and /u/ differed significantly (p<0.001) between the S0N0 and SπN0 conditions. The average detection thresholds of Mandarin tones were all lower in the SπN0 condition than in the S0N0 condition, and the BMLDs ranged from 7.3 to 11.5 dB. The TMR for 50% correct Mandarin tone recognitions differed significantly (p<0.001) between the S0N0 and SπN0 conditions, at –13.4 and –18.0 dB, respectively, with a mean BILD of 4.6 dB. The study showed that the thresholds of Mandarin tone detection and recognition in the presence of speech-spectrum noise are improved when phase inversion is applied to the target speech. The average BILDs of Mandarin tones are smaller than the average BMLDs of Mandarin tones. PMID:25835987
Effects of programming threshold and maplaw settings on acoustic thresholds and speech discrimination with the MED-EL COMBI 40+ cochlear implant.

PubMed

Boyd, Paul J

2006-12-01

The principal task in the programming of a cochlear implant (CI) speech processor is the setting of the electrical dynamic range (output) for each electrode, to ensure that a comfortable loudness percept is obtained for a range of input levels. This typically involves separate psychophysical measurement of electrical threshold ([theta] e) and upper tolerance levels using short current bursts generated by the fitting software. Anecdotal clinical experience and some experimental studies suggest that the measurement of [theta]e is relatively unimportant and that the setting of upper tolerance limits is more critical for processor programming. The present study aims to test this hypothesis and examines in detail how acoustic thresholds and speech recognition are affected by setting of the lower limit of the output ("Programming threshold" or "PT") to understand better the influence of this parameter and how it interacts with certain other programming parameters. Test programs (maps) were generated with PT set to artificially high and low values and tested on users of the MED-EL COMBI 40+ CI system. Acoustic thresholds and speech recognition scores (sentence tests) were measured for each of the test maps. Acoustic thresholds were also measured using maps with a range of output compression functions ("maplaws"). In addition, subjective reports were recorded regarding the presence of "background threshold stimulation" which is occasionally reported by CI users if PT is set to relatively high values when using the CIS strategy. Manipulation of PT was found to have very little effect. Setting PT to minimum produced a mean 5 dB (S.D. = 6.25) increase in acoustic thresholds, relative to thresholds with PT set normally, and had no statistically significant effect on speech recognition scores on a sentence test. On the other hand, maplaw setting was found to have a significant effect on acoustic thresholds (raised as maplaw is made more linear), which provides some theoretical explanation as to why PT has little effect when using the default maplaw of c = 500. Subjective reports of background threshold stimulation showed that most users could perceive a relatively loud auditory percept, in the absence of microphone input, when PT was set to double the behaviorally measured electrical thresholds ([theta]e), but that this produced little intrusion when microphone input was present. The results of these investigations have direct clinical relevance, showing that setting of PT is indeed relatively unimportant in terms of speech discrimination, but that it is worth ensuring that PT is not set excessively high, as this can produce distracting background stimulation. Indeed, it may even be set to minimum values without deleterious effect.
How much does language proficiency by non-native listeners influence speech audiometric tests in noise?

PubMed

Warzybok, Anna; Brand, Thomas; Wagener, Kirsten C; Kollmeier, Birger

2015-01-01

The current study investigates the extent to which the linguistic complexity of three commonly employed speech recognition tests and second language proficiency influence speech recognition thresholds (SRTs) in noise in non-native listeners. SRTs were measured for non-natives and natives using three German speech recognition tests: the digit triplet test (DTT), the Oldenburg sentence test (OLSA), and the Göttingen sentence test (GÖSA). Sixty-four non-native and eight native listeners participated. Non-natives can show native-like SRTs in noise only for the linguistically easy speech material (DTT). Furthermore, the limitation of phonemic-acoustical cues in digit triplets affects speech recognition to the same extent in non-natives and natives. For more complex and less familiar speech materials, non-natives, ranging from basic to advanced proficiency in German, require on average 3-dB better signal-to-noise ratio for the OLSA and 6-dB for the GÖSA to obtain 50% speech recognition compared to native listeners. In clinical audiology, SRT measurements with a closed-set speech test (i.e. DTT for screening or OLSA test for clinical purposes) should be used with non-native listeners rather than open-set speech tests (such as the GÖSA or HINT), especially if a closed-set version in the patient's own native language is available.
A simulation framework for auditory discrimination experiments: Revealing the importance of across-frequency processing in speech perception.

PubMed

Schädler, Marc René; Warzybok, Anna; Ewert, Stephan D; Kollmeier, Birger

2016-05-01

A framework for simulating auditory discrimination experiments, based on an approach from Schädler, Warzybok, Hochmuth, and Kollmeier [(2015). Int. J. Audiol. 54, 100-107] which was originally designed to predict speech recognition thresholds, is extended to also predict psychoacoustic thresholds. The proposed framework is used to assess the suitability of different auditory-inspired feature sets for a range of auditory discrimination experiments that included psychoacoustic as well as speech recognition experiments in noise. The considered experiments were 2 kHz tone-in-broadband-noise simultaneous masking depending on the tone length, spectral masking with simultaneously presented tone signals and narrow-band noise maskers, and German Matrix sentence test reception threshold in stationary and modulated noise. The employed feature sets included spectro-temporal Gabor filter bank features, Mel-frequency cepstral coefficients, logarithmically scaled Mel-spectrograms, and the internal representation of the Perception Model from Dau, Kollmeier, and Kohlrausch [(1997). J. Acoust. Soc. Am. 102(5), 2892-2905]. The proposed framework was successfully employed to simulate all experiments with a common parameter set and obtain objective thresholds with less assumptions compared to traditional modeling approaches. Depending on the feature set, the simulated reference-free thresholds were found to agree with-and hence to predict-empirical data from the literature. Across-frequency processing was found to be crucial to accurately model the lower speech reception threshold in modulated noise conditions than in stationary noise conditions.
Cochlear implantation with hearing preservation yields significant benefit for speech recognition in complex listening environments

PubMed Central

Gifford, René H.; Dorman, Michael F.; Skarzynski, Henryk; Lorens, Artur; Polak, Marek; Driscoll, Colin L. W.; Roland, Peter; Buchman, Craig A.

2012-01-01

Objective The aim of this study was to assess the benefit of having preserved acoustic hearing in the implanted ear for speech recognition in complex listening environments. Design The current study included a within subjects, repeated-measures design including 21 English speaking and 17 Polish speaking cochlear implant recipients with preserved acoustic hearing in the implanted ear. The patients were implanted with electrodes that varied in insertion depth from 10 to 31 mm. Mean preoperative low-frequency thresholds (average of 125, 250 and 500 Hz) in the implanted ear were 39.3 and 23.4 dB HL for the English- and Polish-speaking participants, respectively. In one condition, speech perception was assessed in an 8-loudspeaker environment in which the speech signals were presented from one loudspeaker and restaurant noise was presented from all loudspeakers. In another condition, the signals were presented in a simulation of a reverberant environment with a reverberation time of 0.6 sec. The response measures included speech reception thresholds (SRTs) and percent correct sentence understanding for two test conditions: cochlear implant (CI) plus low-frequency hearing in the contralateral ear (bimodal condition) and CI plus low-frequency hearing in both ears (best aided condition). A subset of 6 English-speaking listeners were also assessed on measures of interaural time difference (ITD) thresholds for a 250-Hz signal. Results Small, but significant, improvements in performance (1.7 – 2.1 dB and 6 – 10 percentage points) were found for the best-aided condition vs. the bimodal condition. Postoperative thresholds in the implanted ear were correlated with the degree of EAS benefit for speech recognition in diffuse noise. There was no reliable relationship among measures of audiometric threshold in the implanted ear nor elevation in threshold following surgery and improvement in speech understanding in reverberation. There was a significant correlation between ITD threshold at 250 Hz and EAS-related benefit for the adaptive SRT. Conclusions Our results suggest that (i) preserved low-frequency hearing improves speech understanding for CI recipients (ii) testing in complex listening environments, in which binaural timing cues differ for signal and noise, may best demonstrate the value of having two ears with low-frequency acoustic hearing and (iii) preservation of binaural timing cues, albeit poorer than observed for individuals with normal hearing, is possible following unilateral cochlear implantation with hearing preservation and is associated with EAS benefit. Our results demonstrate significant communicative benefit for hearing preservation in the implanted ear and provide support for the expansion of cochlear implant criteria to include individuals with low-frequency thresholds in even the normal to near-normal range. PMID:23446225
Speech Perception with Music Maskers by Cochlear Implant Users and Normal-Hearing Listeners

ERIC Educational Resources Information Center

Eskridge, Elizabeth N.; Galvin, John J., III; Aronoff, Justin M.; Li, Tianhao; Fu, Qian-Jie

2012-01-01

Purpose: The goal of this study was to investigate how the spectral and temporal properties in background music may interfere with cochlear implant (CI) and normal-hearing listeners' (NH) speech understanding. Method: Speech-recognition thresholds (SRTs) were adaptively measured in 11 CI and 9 NH subjects. CI subjects were tested while using their…
Working memory capacity may influence perceived effort during aided speech recognition in noise.

PubMed

Rudner, Mary; Lunner, Thomas; Behrens, Thomas; Thorén, Elisabet Sundewall; Rönnberg, Jerker

2012-09-01

Recently there has been interest in using subjective ratings as a measure of perceived effort during speech recognition in noise. Perceived effort may be an indicator of cognitive load. Thus, subjective effort ratings during speech recognition in noise may covary both with signal-to-noise ratio (SNR) and individual cognitive capacity. The present study investigated the relation between subjective ratings of the effort involved in listening to speech in noise, speech recognition performance, and individual working memory (WM) capacity in hearing impaired hearing aid users. In two experiments, participants with hearing loss rated perceived effort during aided speech perception in noise. Noise type and SNR were manipulated in both experiments, and in the second experiment hearing aid compression release settings were also manipulated. Speech recognition performance was measured along with WM capacity. There were 46 participants in all with bilateral mild to moderate sloping hearing loss. In Experiment 1 there were 16 native Danish speakers (eight women and eight men) with a mean age of 63.5 yr (SD = 12.1) and average pure tone (PT) threshold of 47. 6 dB (SD = 9.8). In Experiment 2 there were 30 native Swedish speakers (19 women and 11 men) with a mean age of 70 yr (SD = 7.8) and average PT threshold of 45.8 dB (SD = 6.6). A visual analog scale (VAS) was used for effort rating in both experiments. In Experiment 1, effort was rated at individually adapted SNRs while in Experiment 2 it was rated at fixed SNRs. Speech recognition in noise performance was measured using adaptive procedures in both experiments with Dantale II sentences in Experiment 1 and Hagerman sentences in Experiment 2. WM capacity was measured using a letter-monitoring task in Experiment 1 and the reading span task in Experiment 2. In both experiments, there was a strong and significant relation between rated effort and SNR that was independent of individual WM capacity, whereas the relation between rated effort and noise type seemed to be influenced by individual WM capacity. Experiment 2 showed that hearing aid compression setting influenced rated effort. Subjective ratings of the effort involved in speech recognition in noise reflect SNRs, and individual cognitive capacity seems to influence relative rating of noise type. American Academy of Audiology.
[The endpoint detection of cough signal in continuous speech].

PubMed

Yang, Guoqing; Mo, Hongqiang; Li, Wen; Lian, Lianfang; Zheng, Zeguang

2010-06-01

The endpoint detection of cough signal in continuous speech has been researched in order to improve the efficiency and veracity of manual recognition or computer-based automatic recognition. First, using the short time zero crossing ratio(ZCR) for identifying the suspicious coughs and getting the threshold of short time energy based on acoustic characteristics of cough. Then, the short time energy is combined with short time ZCR in order to implement the endpoint detection of cough in continuous speech. To evaluate the effect of the method, first, the virtual number of coughs in each recording was identified by two experienced doctors using the graphical user interface (GUI). Second, the recordings were analyzed by automatic endpoint detection program under Matlab7.0. Finally, the comparison between these two results showed: The error rate of undetected cough is 2.18%, and 98.13% of noise, silence and speech were removed. The way of setting short time energy threshold is robust. The endpoint detection program can remove most speech and noise, thus maintaining a lower rate of error.
Continuous multiword recognition performance of young and elderly listeners in ambient noise

NASA Astrophysics Data System (ADS)

Sato, Hiroshi

2005-09-01

Hearing threshold shift due to aging is known as a dominant factor to degrade speech recognition performance in noisy conditions. On the other hand, cognitive factors of aging-relating speech recognition performance in various speech-to-noise conditions are not well established. In this study, two kinds of speech test were performed to examine how working memory load relates to speech recognition performance. One is word recognition test with high-familiarity, four-syllable Japanese words (single-word test). In this test, each word was presented to listeners; the listeners were asked to write the word down on paper with enough time to answer. In the other test, five continuous word were presented to listeners and listeners were asked to write the word down after just five words were presented (multiword test). Both tests were done in various speech-to-noise ratios under 50-dBA Hoth spectrum noise with more than 50 young and elderly subjects. The results of two experiments suggest that (1) Hearing level is related to scores of both tests. (2) Scores of single-word test are well correlated with those of multiword test. (3) Scores of multiword test are not improved as speech-to-noise ratio improves in the condition where scores of single-word test reach their ceiling.
Speech as a pilot input medium

NASA Technical Reports Server (NTRS)

Plummer, R. P.; Coler, C. R.

1977-01-01

The speech recognition system under development is a trainable pattern classifier based on a maximum-likelihood technique. An adjustable uncertainty threshold allows the rejection of borderline cases for which the probability of misclassification is high. The syntax of the command language spoken may be used as an aid to recognition, and the system adapts to changes in pronunciation if feedback from the user is available. Words must be separated by .25 second gaps. The system runs in real time on a mini-computer (PDP 11/10) and was tested on 120,000 speech samples from 10- and 100-word vocabularies. The results of these tests were 99.9% correct recognition for a vocabulary consisting of the ten digits, and 99.6% recognition for a 100-word vocabulary of flight commands, with a 5% rejection rate in each case. With no rejection, the recognition accuracies for the same vocabularies were 99.5% and 98.6% respectively.
Clinical Validation of a Sound Processor Upgrade in Direct Acoustic Cochlear Implant Subjects

PubMed Central

Kludt, Eugen; D’hondt, Christiane; Lenarz, Thomas; Maier, Hannes

2017-01-01

Objective: The objectives of the investigation were to evaluate the effect of a sound processor upgrade on the speech reception threshold in noise and to collect long-term safety and efficacy data after 2½ to 5 years of device use of direct acoustic cochlear implant (DACI) recipients. Study Design: The study was designed as a mono-centric, prospective clinical trial. Setting: Tertiary referral center. Patients: Fifteen patients implanted with a direct acoustic cochlear implant. Intervention: Upgrade with a newer generation of sound processor. Main Outcome Measures: Speech recognition test in quiet and in noise, pure tone thresholds, subject-reported outcome measures. Results: The speech recognition in quiet and in noise is superior after the sound processor upgrade and stable after long-term use of the direct acoustic cochlear implant. The bone conduction thresholds did not decrease significantly after long-term high level stimulation. Conclusions: The new sound processor for the DACI system provides significant benefits for DACI users for speech recognition in both quiet and noise. Especially the noise program with the use of directional microphones (Zoom) allows DACI patients to have much less difficulty when having conversations in noisy environments. Furthermore, the study confirms that the benefits of the sound processor upgrade are available to the DACI recipients even after several years of experience with a legacy sound processor. Finally, our study demonstrates that the DACI system is a safe and effective long-term therapy. PMID:28406848
Round Window Application of an Active Middle Ear Implant: A Comparison With Hearing Aid Usage in Japan.

PubMed

Iwasaki, Satoshi; Usami, Shin-Ichi; Takahashi, Haruo; Kanda, Yukihiko; Tono, Tetsuya; Doi, Katsumi; Kumakawa, Kozo; Gyo, Kiyofumi; Naito, Yasushi; Kanzaki, Sho; Yamanaka, Noboru; Kaga, Kimitaka

2017-07-01

To report on the safety and efficacy of an investigational active middle ear implant (AMEI) in Japan, and to compare results to preoperative results with a hearing aid. Prospective study conducted in Japan in which 23 Japanese-speaking adults suffering from conductive or mixed hearing loss received a VIBRANT SOUNDBRIDGE with implantation at the round window. Postoperative thresholds, speech perception results (word recognition scores, speech reception thresholds, signal-to-noise ratio [SNR]), and quality of life questionnaires at 20 weeks were compared with preoperative results with all patients receiving the same, best available hearing aid (HA). Statistically significant improvements in postoperative AMEI-aided thresholds (1, 2, 4, and 8 kHz) and on the speech reception thresholds and word recognition scores tests, compared with preoperative HA-aided results, were observed. On the SNR, the subjects' mean values showed statistically significant improvement, with -5.7 dB SNR for the AMEI-aided mean and -2.1 dB SNR for the preoperative HA-assisted mean. The APHAB quality of life questionnaire also showed statistically significant improvement with the AMEI. Results with the AMEI applied to the round window exceeded those of the best available hearing aid in speech perception as well as quality of life questionnaires. There were minimal adverse events or changes to patients' residual hearing.
Evaluation of auditory functions for Royal Canadian Mounted Police officers.

PubMed

Vaillancourt, Véronique; Laroche, Chantal; Giguère, Christian; Beaulieu, Marc-André; Legault, Jean-Pierre

2011-06-01

Auditory fitness for duty (AFFD) testing is an important element in an assessment of workers' ability to perform job tasks safely and effectively. Functional hearing is particularly critical to job performance in law enforcement. Most often, assessment is based on pure-tone detection thresholds; however, its validity can be questioned and challenged in court. In an attempt to move beyond the pure-tone audiogram, some organizations like the Royal Canadian Mounted Police (RCMP) are incorporating additional testing to supplement audiometric data in their AFFD protocols, such as measurements of speech recognition in quiet and/or in noise, and sound localization. This article reports on the assessment of RCMP officers wearing hearing aids in speech recognition and sound localization tasks. The purpose was to quantify individual performance in different domains of hearing identified as necessary components of fitness for duty, and to document the type of hearing aids prescribed in the field and their benefit for functional hearing. The data are to help RCMP in making more informed decisions regarding AFFD in officers wearing hearing aids. The proposed new AFFD protocol included unaided and aided measures of speech recognition in quiet and in noise using the Hearing in Noise Test (HINT) and sound localization in the left/right (L/R) and front/back (F/B) horizontal planes. Sixty-four officers were identified and selected by the RCMP to take part in this study on the basis of hearing thresholds exceeding current audiometrically based criteria. This article reports the results of 57 officers wearing hearing aids. Based on individual results, 49% of officers were reclassified from nonoperational status to operational with limitations on fine hearing duties, given their unaided and/or aided performance. Group data revealed that hearing aids (1) improved speech recognition thresholds on the HINT, the effects being most prominent in Quiet and in conditions of spatial separation between target and noise (Noise Right and Noise Left) and least considerable in Noise Front; (2) neither significantly improved nor impeded L/R localization; and (3) substantially increased F/B errors in localization in a number of cases. Additional analyses also pointed to the poor ability of threshold data to predict functional abilities for speech in noise (r² = 0.26 to 0.33) and sound localization (r² = 0.03 to 0.28). Only speech in quiet (r² = 0.68 to 0.85) is predicted adequately from threshold data. Combined with previous findings, results indicate that the use of hearing aids can considerably affect F/B localization abilities in a number of individuals. Moreover, speech understanding in noise and sound localization abilities were poorly predicted from pure-tone thresholds, demonstrating the need to specifically test these abilities, both unaided and aided, when assessing AFFD. Finally, further work is needed to develop empirically based hearing criteria for the RCMP and identify best practices in hearing aid fittings for optimal functional hearing abilities. American Academy of Audiology.
Results using the OPAL strategy in Mandarin speaking cochlear implant recipients.

PubMed

Vandali, Andrew E; Dawson, Pam W; Arora, Komal

2017-01-01

To evaluate the effectiveness of an experimental pitch-coding strategy for improving recognition of Mandarin lexical tone in cochlear implant (CI) recipients. Adult CI recipients were tested on recognition of Mandarin tones in quiet and speech-shaped noise at a signal-to-noise ratio of +10 dB; Mandarin sentence speech-reception threshold (SRT) in speech-shaped noise; and pitch discrimination of synthetic complex-harmonic tones in quiet. Two versions of the experimental strategy were examined: (OPAL) linear (1:1) mapping of fundamental frequency (F0) to the coded modulation rate; and (OPAL+) transposed mapping of high F0s to a lower coded rate. Outcomes were compared to results using the clinical ACE™ strategy. Five Mandarin speaking users of Nucleus® cochlear implants. A small but significant benefit in recognition of lexical tones was observed using OPAL compared to ACE in noise, but not in quiet, and not for OPAL+ compared to ACE or OPAL in quiet or noise. Sentence SRTs were significantly better using OPAL+ and comparable using OPAL to those using ACE. No differences in pitch discrimination thresholds were observed across strategies. OPAL can provide benefits to Mandarin lexical tone recognition in moderately noisy conditions and preserve perception of Mandarin sentences in challenging noise conditions.
Across-site patterns of modulation detection: Relation to speech recognitiona)

PubMed Central

Garadat, Soha N.; Zwolan, Teresa A.; Pfingst, Bryan E.

2012-01-01

The aim of this study was to identify across-site patterns of modulation detection thresholds (MDTs) in subjects with cochlear implants and to determine if removal of sites with the poorest MDTs from speech processor programs would result in improved speech recognition. Five hundred millisecond trains of symmetric-biphasic pulses were modulated sinusoidally at 10 Hz and presented at a rate of 900 pps using monopolar stimulation. Subjects were asked to discriminate a modulated pulse train from an unmodulated pulse train for all electrodes in quiet and in the presence of an interleaved unmodulated masker presented on the adjacent site. Across-site patterns of masked MDTs were then used to construct two 10-channel MAPs such that one MAP consisted of sites with the best masked MDTs and the other MAP consisted of sites with the worst masked MDTs. Subjects’ speech recognition skills were compared when they used these two different MAPs. Results showed that MDTs were variable across sites and were elevated in the presence of a masker by various amounts across sites. Better speech recognition was observed when the processor MAP consisted of sites with best masked MDTs, suggesting that temporal modulation sensitivity has important contributions to speech recognition with a cochlear implant. PMID:22559376
Speech recognition in one- and two-talker maskers in school-age children and adults: Development of perceptual masking and glimpsing

PubMed Central

Buss, Emily; Leibold, Lori J.; Porter, Heather L.; Grose, John H.

2017-01-01

Children perform more poorly than adults on a wide range of masked speech perception paradigms, but this effect is particularly pronounced when the masker itself is also composed of speech. The present study evaluated two factors that might contribute to this effect: the ability to perceptually isolate the target from masker speech, and the ability to recognize target speech based on sparse cues (glimpsing). Speech reception thresholds (SRTs) were estimated for closed-set, disyllabic word recognition in children (5–16 years) and adults in a one- or two-talker masker. Speech maskers were 60 dB sound pressure level (SPL), and they were either presented alone or in combination with a 50-dB-SPL speech-shaped noise masker. There was an age effect overall, but performance was adult-like at a younger age for the one-talker than the two-talker masker. Noise tended to elevate SRTs, particularly for older children and adults, and when summed with the one-talker masker. Removing time-frequency epochs associated with a poor target-to-masker ratio markedly improved SRTs, with larger effects for younger listeners; the age effect was not eliminated, however. Results were interpreted as indicating that development of speech-in-speech recognition is likely impacted by development of both perceptual masking and the ability recognize speech based on sparse cues. PMID:28464682
Real-time interactive speech technology at Threshold Technology, Incorporated

NASA Technical Reports Server (NTRS)

Herscher, Marvin B.

1977-01-01

Basic real-time isolated-word recognition techniques are reviewed. Industrial applications of voice technology are described in chronological order of their development. Future research efforts are also discussed.
Unilateral Hearing Loss: Understanding Speech Recognition and Localization Variability-Implications for Cochlear Implant Candidacy.

PubMed

Firszt, Jill B; Reeder, Ruth M; Holden, Laura K

At a minimum, unilateral hearing loss (UHL) impairs sound localization ability and understanding speech in noisy environments, particularly if the loss is severe to profound. Accompanying the numerous negative consequences of UHL is considerable unexplained individual variability in the magnitude of its effects. Identification of covariables that affect outcome and contribute to variability in UHLs could augment counseling, treatment options, and rehabilitation. Cochlear implantation as a treatment for UHL is on the rise yet little is known about factors that could impact performance or whether there is a group at risk for poor cochlear implant outcomes when hearing is near-normal in one ear. The overall goal of our research is to investigate the range and source of variability in speech recognition in noise and localization among individuals with severe to profound UHL and thereby help determine factors relevant to decisions regarding cochlear implantation in this population. The present study evaluated adults with severe to profound UHL and adults with bilateral normal hearing. Measures included adaptive sentence understanding in diffuse restaurant noise, localization, roving-source speech recognition (words from 1 of 15 speakers in a 140° arc), and an adaptive speech-reception threshold psychoacoustic task with varied noise types and noise-source locations. There were three age-sex-matched groups: UHL (severe to profound hearing loss in one ear and normal hearing in the contralateral ear), normal hearing listening bilaterally, and normal hearing listening unilaterally. Although the normal-hearing-bilateral group scored significantly better and had less performance variability than UHLs on all measures, some UHL participants scored within the range of the normal-hearing-bilateral group on all measures. The normal-hearing participants listening unilaterally had better monosyllabic word understanding than UHLs for words presented on the blocked/deaf side but not the open/hearing side. In contrast, UHLs localized better than the normal-hearing unilateral listeners for stimuli on the open/hearing side but not the blocked/deaf side. This suggests that UHLs had learned strategies for improved localization on the side of the intact ear. The UHL and unilateral normal-hearing participant groups were not significantly different for speech in noise measures. UHL participants with childhood rather than recent hearing loss onset localized significantly better; however, these two groups did not differ for speech recognition in noise. Age at onset in UHL adults appears to affect localization ability differently than understanding speech in noise. Hearing thresholds were significantly correlated with speech recognition for UHL participants but not the other two groups. Auditory abilities of UHLs varied widely and could be explained only in part by hearing threshold levels. Age at onset and length of hearing loss influenced performance on some, but not all measures. Results support the need for a revised and diverse set of clinical measures, including sound localization, understanding speech in varied environments, and careful consideration of functional abilities as individuals with severe to profound UHL are being considered potential cochlear implant candidates.
Optimization of programming parameters in children with the advanced bionics cochlear implant.

PubMed

Baudhuin, Jacquelyn; Cadieux, Jamie; Firszt, Jill B; Reeder, Ruth M; Maxson, Jerrica L

2012-05-01

Cochlear implants provide access to soft intensity sounds and therefore improved audibility for children with severe-to-profound hearing loss. Speech processor programming parameters, such as threshold (or T-level), input dynamic range (IDR), and microphone sensitivity, contribute to the recipient's program and influence audibility. When soundfield thresholds obtained through the speech processor are elevated, programming parameters can be modified to improve soft sound detection. Adult recipients show improved detection for low-level sounds when T-levels are set at raised levels and show better speech understanding in quiet when wider IDRs are used. Little is known about the effects of parameter settings on detection and speech recognition in children using today's cochlear implant technology. The overall study aim was to assess optimal T-level, IDR, and sensitivity settings in pediatric recipients of the Advanced Bionics cochlear implant. Two experiments were conducted. Experiment 1 examined the effects of two T-level settings on soundfield thresholds and detection of the Ling 6 sounds. One program set T-levels at 10% of most comfortable levels (M-levels) and another at 10 current units (CUs) below the level judged as "soft." Experiment 2 examined the effects of IDR and sensitivity settings on speech recognition in quiet and noise. Participants were 11 children 7-17 yr of age (mean 11.3) implanted with the Advanced Bionics High Resolution 90K or CII cochlear implant system who had speech recognition scores of 20% or greater on a monosyllabic word test. Two T-level programs were compared for detection of the Ling sounds and frequency modulated (FM) tones. Differing IDR/sensitivity programs (50/0, 50/10, 70/0, 70/10) were compared using Ling and FM tone detection thresholds, CNC (consonant-vowel nucleus-consonant) words at 50 dB SPL, and Hearing in Noise Test for Children (HINT-C) sentences at 65 dB SPL in the presence of four-talker babble (+8 signal-to-noise ratio). Outcomes were analyzed using a paired t-test and a mixed-model repeated measures analysis of variance (ANOVA). T-levels set 10 CUs below "soft" resulted in significantly lower detection thresholds for all six Ling sounds and FM tones at 250, 1000, 3000, 4000, and 6000 Hz. When comparing programs differing by IDR and sensitivity, a 50 dB IDR with a 0 sensitivity setting showed significantly poorer thresholds for low frequency FM tones and voiced Ling sounds. Analysis of group mean scores for CNC words in quiet or HINT-C sentences in noise indicated no significant differences across IDR/sensitivity settings. Individual data, however, showed significant differences between IDR/sensitivity programs in noise; the optimal program differed across participants. In pediatric recipients of the Advanced Bionics cochlear implant device, manually setting T-levels with ascending loudness judgments should be considered when possible or when low-level sounds are inaudible. Study findings confirm the need to determine program settings on an individual basis as well as the importance of speech recognition verification measures in both quiet and noise. Clinical guidelines are suggested for selection of programming parameters in both young and older children. American Academy of Audiology.
Effects of Noise Level and Cognitive Function on Speech Perception in Normal Elderly and Elderly with Amnestic Mild Cognitive Impairment.

PubMed

Lee, Soo Jung; Park, Kyung Won; Kim, Lee-Suk; Kim, HyangHee

2016-06-01

Along with auditory function, cognitive function contributes to speech perception in the presence of background noise. Older adults with cognitive impairment might, therefore, have more difficulty perceiving speech-in-noise than their peers who have normal cognitive function. We compared the effects of noise level and cognitive function on speech perception in patients with amnestic mild cognitive impairment (aMCI), cognitively normal older adults, and cognitively normal younger adults. We studied 14 patients with aMCI and 14 age-, education-, and hearing threshold-matched cognitively intact older adults as experimental groups, and 14 younger adults as a control group. We assessed speech perception with monosyllabic word and sentence recognition tests at four noise levels: quiet condition and signal-to-noise ratio +5 dB, 0 dB, and -5 dB. We also evaluated the aMCI group with a neuropsychological assessment. Controlling for hearing thresholds, we found that the aMCI group scored significantly lower than both the older adults and the younger adults only when the noise level was high (signal-to-noise ratio -5 dB). At signal-to-noise ratio -5 dB, both older groups had significantly lower scores than the younger adults on the sentence recognition test. The aMCI group's sentence recognition performance was related to their executive function scores. Our findings suggest that patients with aMCI have more problems communicating in noisy situations in daily life than do their cognitively healthy peers and that older listeners with more difficulties understanding speech in noise should be considered for testing of neuropsychological function as well as hearing.

A preliminary analysis of human factors affecting the recognition accuracy of a discrete word recognizer for C3 systems

NASA Astrophysics Data System (ADS)

Yellen, H. W.

1983-03-01

Literature pertaining to Voice Recognition abounds with information relevant to the assessment of transitory speech recognition devices. In the past, engineering requirements have dictated the path this technology followed. But, other factors do exist that influence recognition accuracy. This thesis explores the impact of Human Factors on the successful recognition of speech, principally addressing the differences or variability among users. A Threshold Technology T-600 was used for a 100 utterance vocubalary to test 44 subjects. A statistical analysis was conducted on 5 generic categories of Human Factors: Occupational, Operational, Psychological, Physiological and Personal. How the equipment is trained and the experience level of the speaker were found to be key characteristics influencing recognition accuracy. To a lesser extent computer experience, time or week, accent, vital capacity and rate of air flow, speaker cooperativeness and anxiety were found to affect overall error rates.
Deterioration of Speech Recognition Ability Over a Period of 5 Years in Adults Ages 18 to 70 Years: Results of the Dutch Online Speech-in-Noise Test.

PubMed

Stam, Mariska; Smits, Cas; Twisk, Jos W R; Lemke, Ulrike; Festen, Joost M; Kramer, Sophia E

2015-01-01

The first aim of the present study was to determine the change in speech recognition in noise over a period of 5 years in participants ages 18 to 70 years at baseline. The second aim was to investigate whether age, gender, educational level, the level of initial speech recognition in noise, and reported chronic conditions were associated with a change in speech recognition in noise. The baseline and 5-year follow-up data of 427 participants with and without hearing impairment participating in the National Longitudinal Study on Hearing (NL-SH) were analyzed. The ability to recognize speech in noise was measured twice with the online National Hearing Test, a digit-triplet speech-in-noise test. Speech-reception-threshold in noise (SRTn) scores were calculated, corresponding to 50% speech intelligibility. Unaided SRTn scores obtained with the same transducer (headphones or loudspeakers) at both test moments were included. Changes in SRTn were calculated as a raw shift (T1 - T0) and an adjusted shift for regression towards the mean. Paired t tests and multivariable linear regression analyses were applied. The mean increase (i.e., deterioration) in SRTn was 0.38-dB signal-to-noise ratio (SNR) over 5 years (p < 0.001). Results of the multivariable regression analyses showed that the age group of 50 to 59 years had a significantly larger deterioration in SRTn compared with the age group of 18 to 39 years (raw shift: beta: 0.64-dB SNR; 95% confidence interval: 0.07-1.22; p = 0.028, adjusted for initial speech recognition level - adjusted shift: beta: 0.82-dB SNR; 95% confidence interval: 0.27-1.34; p = 0.004). Gender, educational level, and the number of chronic conditions were not associated with a change in SRTn over time. No significant differences in increase of SRTn were found between the initial levels of speech recognition (i.e., good, insufficient, or poor) when taking into account the phenomenon regression towards the mean. The study results indicate that hearing deterioration of speech recognition in noise over 5 years can also be detected in adults ages 18 to 70 years. This rather small numeric change might represent a relevant impact on an individual's ability to understand speech in everyday life.
Influence of compact disk recording protocols on reliability and comparability of speech audiometry outcomes: acoustic analysis.

PubMed

Di Berardino, F; Tognola, G; Paglialonga, A; Alpini, D; Grandori, F; Cesarani, A

2010-08-01

To assess whether different compact disk recording protocols, used to prepare speech test material, affect the reliability and comparability of speech audiometry testing. We conducted acoustic analysis of compact disks used in clinical practice, to determine whether speech material had been recorded using similar procedures. To assess the impact of different recording procedures on speech test outcomes, normal hearing subjects were tested using differently prepared compact disks, and their psychometric curves compared. Acoustic analysis revealed that speech material had been recorded using different protocols. The major difference was the gain between the levels at which the speech material and the calibration signal had been recorded. Although correct calibration of the audiometer was performed for each compact disk before testing, speech recognition thresholds and maximum intelligibility thresholds differed significantly between compact disks (p < 0.05), and were influenced by the gain between the recording level of the speech material and the calibration signal. To ensure the reliability and comparability of speech test outcomes obtained using different compact disks, it is recommended to check for possible differences in the recording gains used to prepare the compact disks, and then to compensate for any differences before testing.
Unilateral Hearing Loss: Understanding Speech Recognition and Localization Variability - Implications for Cochlear Implant Candidacy

PubMed Central

Firszt, Jill B.; Reeder, Ruth M.; Holden, Laura K.

2016-01-01

Objectives At a minimum, unilateral hearing loss (UHL) impairs sound localization ability and understanding speech in noisy environments, particularly if the loss is severe to profound. Accompanying the numerous negative consequences of UHL is considerable unexplained individual variability in the magnitude of its effects. Identification of co-variables that affect outcome and contribute to variability in UHLs could augment counseling, treatment options, and rehabilitation. Cochlear implantation as a treatment for UHL is on the rise yet little is known about factors that could impact performance or whether there is a group at risk for poor cochlear implant outcomes when hearing is near-normal in one ear. The overall goal of our research is to investigate the range and source of variability in speech recognition in noise and localization among individuals with severe to profound UHL and thereby help determine factors relevant to decisions regarding cochlear implantation in this population. Design The present study evaluated adults with severe to profound UHL and adults with bilateral normal hearing. Measures included adaptive sentence understanding in diffuse restaurant noise, localization, roving-source speech recognition (words from 1 of 15 speakers in a 140° arc) and an adaptive speech-reception threshold psychoacoustic task with varied noise types and noise-source locations. There were three age-gender-matched groups: UHL (severe to profound hearing loss in one ear and normal hearing in the contralateral ear), normal hearing listening bilaterally, and normal hearing listening unilaterally. Results Although the normal-hearing-bilateral group scored significantly better and had less performance variability than UHLs on all measures, some UHL participants scored within the range of the normal-hearing-bilateral group on all measures. The normal-hearing participants listening unilaterally had better monosyllabic word understanding than UHLs for words presented on the blocked/deaf side but not the open/hearing side. In contrast, UHLs localized better than the normal hearing unilateral listeners for stimuli on the open/hearing side but not the blocked/deaf side. This suggests that UHLs had learned strategies for improved localization on the side of the intact ear. The UHL and unilateral normal hearing participant groups were not significantly different for speech-in-noise measures. UHL participants with childhood rather than recent hearing loss onset localized significantly better; however, these two groups did not differ for speech recognition in noise. Age at onset in UHL adults appears to affect localization ability differently than understanding speech in noise. Hearing thresholds were significantly correlated with speech recognition for UHL participants but not the other two groups. Conclusions Auditory abilities of UHLs varied widely and could be explained only in part by hearing threshold levels. Age at onset and length of hearing loss influenced performance on some, but not all measures. Results support the need for a revised and diverse set of clinical measures, including sound localization, understanding speech in varied environments and careful consideration of functional abilities as individuals with severe to profound UHL are being considered potential cochlear implant candidates. PMID:28067750
Clinical experience with the words-in-noise test on 3430 veterans: comparisons with pure-tone thresholds and word recognition in quiet.

PubMed

Wilson, Richard H

2011-01-01

Since the 1940s, measures of pure-tone sensitivity and speech recognition in quiet have been vital components of the audiologic evaluation. Although early investigators urged that speech recognition in noise also should be a component of the audiologic evaluation, only recently has this suggestion started to become a reality. This report focuses on the Words-in-Noise (WIN) Test, which evaluates word recognition in multitalker babble at seven signal-to-noise ratios and uses the 50% correct point (in dB SNR) calculated with the Spearman-Kärber equation as the primary metric. The WIN was developed and validated in a series of 12 laboratory studies. The current study examined the effectiveness of the WIN materials for measuring the word-recognition performance of patients in a typical clinical setting. To examine the relations among three audiometric measures including pure-tone thresholds, word-recognition performances in quiet, and word-recognition performances in multitalker babble for veterans seeking remediation for their hearing loss. Retrospective, descriptive. The participants were 3430 veterans who for the most part were evaluated consecutively in the Audiology Clinic at the VA Medical Center, Mountain Home, Tennessee. The mean age was 62.3 yr (SD = 12.8 yr). The data were collected in the course of a 60 min routine audiologic evaluation. A history, otoscopy, and aural-acoustic immittance measures also were included in the clinic protocol but were not evaluated in this report. Overall, the 1000-8000 Hz thresholds were significantly lower (better) in the right ear (RE) than in the left ear (LE). There was a direct relation between age and the pure-tone thresholds, with greater change across age in the high frequencies than in the low frequencies. Notched audiograms at 4000 Hz were observed in at least one ear in 41% of the participants with more unilateral than bilateral notches. Normal pure-tone thresholds (≤20 dB HL) were obtained from 6% of the participants. Maximum performance on the Northwestern University Auditory Test No. 6 (NU-6) in quiet was ≥90% correct by 50% of the participants, with an additional 20% performing at ≥80% correct; the RE performed 1-3% better than the LE. Of the 3291 who completed the WIN on both ears, only 7% exhibited normal performance (50% correct point of ≤6 dB SNR). Overall, WIN performance was significantly better in the RE (mean = 13.3 dB SNR) than in the LE (mean = 13.8 dB SNR). Recognition performance on both the NU-6 and the WIN decreased as a function of both pure-tone hearing loss and age. There was a stronger relation between the high-frequency pure-tone average (1000, 2000, and 4000 Hz) and the WIN than between the pure-tone average (500, 1000, and 2000 Hz) and the WIN. The results on the WIN from both the previous laboratory studies and the current clinical study indicate that the WIN is an appropriate clinic instrument to assess word-recognition performance in background noise. Recognition performance on a speech-in-quiet task does not predict performance on a speech-in-noise task, as the two tasks reflect different domains of auditory function. Experience with the WIN indicates that word-in-noise tasks should be considered the "stress test" for auditory function. American Academy of Audiology.
Auditory Speech Perception Tests in Relation to the Coding Strategy in Cochlear Implant.

PubMed

Bazon, Aline Cristine; Mantello, Erika Barioni; Gonçales, Alina Sanches; Isaac, Myriam de Lima; Hyppolito, Miguel Angelo; Reis, Ana Cláudia Mirândola Barbosa

2016-07-01

The objective of the evaluation of auditory perception of cochlear implant users is to determine how the acoustic signal is processed, leading to the recognition and understanding of sound. To investigate the differences in the process of auditory speech perception in individuals with postlingual hearing loss wearing a cochlear implant, using two different speech coding strategies, and to analyze speech perception and handicap perception in relation to the strategy used. This study is prospective cross-sectional cohort study of a descriptive character. We selected ten cochlear implant users that were characterized by hearing threshold by the application of speech perception tests and of the Hearing Handicap Inventory for Adults. There was no significant difference when comparing the variables subject age, age at acquisition of hearing loss, etiology, time of hearing deprivation, time of cochlear implant use and mean hearing threshold with the cochlear implant with the shift in speech coding strategy. There was no relationship between lack of handicap perception and improvement in speech perception in both speech coding strategies used. There was no significant difference between the strategies evaluated and no relation was observed between them and the variables studied.
Cochlear Dead Regions in Typical Hearing Aid Candidates: Prevalence and Implications for Use of High-Frequency Speech Cues

PubMed Central

Cox, Robyn M; Alexander, Genevieve C; Johnson, Jani; Rivera, Izel

2011-01-01

We investigated the prevalence of cochlear dead regions in listeners with hearing losses similar to those of many hearing aid wearers, and explored the impact of these dead regions on speech perception. Prevalence of dead regions was assessed using the Threshold Equalizing Noise test (TEN(HL)). Speech recognition was measured using high-frequency emphasis (HFE) Quick Speech In Noise (QSIN) test stimuli and low-pass filtered HFE QSIN stimuli. About one third of subjects tested positive for a dead region at one or more frequencies. Also, groups without and with dead regions both benefited from additional high-frequency speech cues. PMID:21522068
Multisensory speech perception in autism spectrum disorder: From phoneme to whole-word perception.

PubMed

Stevenson, Ryan A; Baum, Sarah H; Segers, Magali; Ferber, Susanne; Barense, Morgan D; Wallace, Mark T

2017-07-01

Speech perception in noisy environments is boosted when a listener can see the speaker's mouth and integrate the auditory and visual speech information. Autistic children have a diminished capacity to integrate sensory information across modalities, which contributes to core symptoms of autism, such as impairments in social communication. We investigated the abilities of autistic and typically-developing (TD) children to integrate auditory and visual speech stimuli in various signal-to-noise ratios (SNR). Measurements of both whole-word and phoneme recognition were recorded. At the level of whole-word recognition, autistic children exhibited reduced performance in both the auditory and audiovisual modalities. Importantly, autistic children showed reduced behavioral benefit from multisensory integration with whole-word recognition, specifically at low SNRs. At the level of phoneme recognition, autistic children exhibited reduced performance relative to their TD peers in auditory, visual, and audiovisual modalities. However, and in contrast to their performance at the level of whole-word recognition, both autistic and TD children showed benefits from multisensory integration for phoneme recognition. In accordance with the principle of inverse effectiveness, both groups exhibited greater benefit at low SNRs relative to high SNRs. Thus, while autistic children showed typical multisensory benefits during phoneme recognition, these benefits did not translate to typical multisensory benefit of whole-word recognition in noisy environments. We hypothesize that sensory impairments in autistic children raise the SNR threshold needed to extract meaningful information from a given sensory input, resulting in subsequent failure to exhibit behavioral benefits from additional sensory information at the level of whole-word recognition. Autism Res 2017. © 2017 International Society for Autism Research, Wiley Periodicals, Inc. Autism Res 2017, 10: 1280-1290. © 2017 International Society for Autism Research, Wiley Periodicals, Inc. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
Phase effects in masking by harmonic complexes: speech recognition.

PubMed

Deroche, Mickael L D; Culling, John F; Chatterjee, Monita

2013-12-01

Harmonic complexes that generate highly modulated temporal envelopes on the basilar membrane (BM) mask a tone less effectively than complexes that generate relatively flat temporal envelopes, because the non-linear active gain of the BM selectively amplifies a low-level tone in the dips of a modulated masker envelope. The present study examines a similar effect in speech recognition. Speech reception thresholds (SRTs) were measured for a voice masked by harmonic complexes with partials in sine phase (SP) or in random phase (RP). The masker's fundamental frequency (F0) was 50, 100 or 200 Hz. SRTs were considerably lower for SP than for RP maskers at 50-Hz F0, but the two converged at 100-Hz F0, while at 200-Hz F0, SRTs were a little higher for SP than RP maskers. The results were similar whether the target voice was male or female and whether the masker's spectral profile was flat or speech-shaped. Although listening in the masker dips has been shown to play a large role for artificial stimuli such as Schroeder-phase complexes at high levels, it contributes weakly to speech recognition in the presence of harmonic maskers with different crest factors at more moderate sound levels (65 dB SPL). Copyright © 2013 Elsevier B.V. All rights reserved.
The effect of presentation level and stimulation rate on speech perception and modulation detection for cochlear implant users.

PubMed

Brochier, Tim; McDermott, Hugh J; McKay, Colette M

2017-06-01

In order to improve speech understanding for cochlear implant users, it is important to maximize the transmission of temporal information. The combined effects of stimulation rate and presentation level on temporal information transfer and speech understanding remain unclear. The present study systematically varied presentation level (60, 50, and 40 dBA) and stimulation rate [500 and 2400 pulses per second per electrode (pps)] in order to observe how the effect of rate on speech understanding changes for different presentation levels. Speech recognition in quiet and noise, and acoustic amplitude modulation detection thresholds (AMDTs) were measured with acoustic stimuli presented to speech processors via direct audio input (DAI). With the 500 pps processor, results showed significantly better performance for consonant-vowel nucleus-consonant words in quiet, and a reduced effect of noise on sentence recognition. However, no rate or level effect was found for AMDTs, perhaps partly because of amplitude compression in the sound processor. AMDTs were found to be strongly correlated with the effect of noise on sentence perception at low levels. These results indicate that AMDTs, at least when measured with the CP910 Freedom speech processor via DAI, explain between-subject variance of speech understanding, but do not explain within-subject variance for different rates and levels.
Evaluation of the effects of nonlinear frequency compression on speech recognition and sound quality for adults with mild to moderate hearing loss.

PubMed

Picou, Erin M; Marcrum, Steven C; Ricketts, Todd A

2015-03-01

While potentially improving audibility for listeners with considerable high frequency hearing loss, the effects of implementing nonlinear frequency compression (NFC) for listeners with moderate high frequency hearing loss are unclear. The purpose of this study was to investigate the effects of activating NFC for listeners who are not traditionally considered candidates for this technology. Participants wore study hearing aids with NFC activated for a 3-4 week trial period. After the trial period, they were tested with NFC and with conventional processing on measures of consonant discrimination threshold in quiet, consonant recognition in quiet, sentence recognition in noise, and acceptableness of sound quality of speech and music. Seventeen adult listeners with symmetrical, mild to moderate sensorineural hearing loss participated. Better ear, high frequency pure-tone averages (4, 6, and 8 kHz) were 60 dB HL or better. Activating NFC resulted in lower (better) thresholds for discrimination of /s/, whose spectral center was 9 kHz. There were no other significant effects of NFC compared to conventional processing. These data suggest that the benefits, and detriments, of activating NFC may be limited for this population.
Psychometrically equivalent bisyllabic words for speech recognition threshold testing in Vietnamese.

PubMed

Harris, Richard W; McPherson, David L; Hanson, Claire M; Eggett, Dennis L

2017-08-01

This study identified, digitally recorded, edited and evaluated 89 bisyllabic Vietnamese words with the goal of identifying homogeneous words that could be used to measure the speech recognition threshold (SRT) in native talkers of Vietnamese. Native male and female talker productions of 89 Vietnamese bisyllabic words were recorded, edited and then presented at intensities ranging from -10 to 20 dBHL. Logistic regression was used to identify the best words for measuring the SRT. Forty-eight words were selected and digitally edited to have 50% intelligibility at a level equal to the mean pure-tone average (PTA) for normally hearing participants (5.2 dBHL). Twenty normally hearing native Vietnamese participants listened to and repeated bisyllabic Vietnamese words at intensities ranging from -10 to 20 dBHL. A total of 48 male and female talker recordings of bisyllabic words with steep psychometric functions (>9.0%/dB) were chosen for the final bisyllabic SRT list. Only words homogeneous with respect to threshold audibility with steep psychometric function slopes were chosen for the final list. Digital recordings of bisyllabic Vietnamese words are now available for use in measuring the SRT for patients whose native language is Vietnamese.
Round window vibroplasty: long-term results.

PubMed

Böheim, Klaus; Mlynski, Robert; Lenarz, Thomas; Schlögel, Max; Hagen, Rudolf

2012-10-01

The round window (RW) approach in the use of the Vibrant Soundbridge(®) (VSB) is a safe and effective treatment of conductive and mixed hearing losses for a period of more than 3 years of device use. To investigate the long-term safety and efficacy as well as user satisfaction of patients with conductive and mixed hearing losses implanted with the VSB using RW vibroplasty. Twelve patients with conductive and mixed hearing losses were evaluated after 40 months of daily VSB use. Safety was assessed by evaluating reports of postoperative medical and surgical complications as well as by changes in bone conduction hearing thresholds. Efficacy outcome measures included aided and unaided hearing thresholds, speech recognition in quiet and in noise and subjective benefit questionnaires. The safety results revealed no significant medical complications. One subject experienced sudden hearing loss after 18-24 months of device use, but still continues to wear the device to her satisfaction. With regard to efficacy, there were no significant changes from short- to long-term results in aided word understanding, functional gain or speech recognition threshold, suggesting that the outcomes are stable over time. Subjective questionnaires revealed either the same or better results compared with the short-term data.
Spoken Language Processing in the Clarissa Procedure Browser

NASA Technical Reports Server (NTRS)

Rayner, M.; Hockey, B. A.; Renders, J.-M.; Chatzichrisafis, N.; Farrell, K.

2005-01-01

Clarissa, an experimental voice enabled procedure browser that has recently been deployed on the International Space Station, is as far as we know the first spoken dialog system in space. We describe the objectives of the Clarissa project and the system's architecture. In particular, we focus on three key problems: grammar-based speech recognition using the Regulus toolkit; methods for open mic speech recognition; and robust side-effect free dialogue management for handling undos, corrections and confirmations. We first describe the grammar-based recogniser we have build using Regulus, and report experiments where we compare it against a class N-gram recogniser trained off the same 3297 utterance dataset. We obtained a 15% relative improvement in WER and a 37% improvement in semantic error rate. The grammar-based recogniser moreover outperforms the class N-gram version for utterances of all lengths from 1 to 9 words inclusive. The central problem in building an open-mic speech recognition system is being able to distinguish between commands directed at the system, and other material (cross-talk), which should be rejected. Most spoken dialogue systems make the accept/reject decision by applying a threshold to the recognition confidence score. NASA shows how a simple and general method, based on standard approaches to document classification using Support Vector Machines, can give substantially better performance, and report experiments showing a relative reduction in the task-level error rate by about 25% compared to the baseline confidence threshold method. Finally, we describe a general side-effect free dialogue management architecture that we have implemented in Clarissa, which extends the "update semantics'' framework by including task as well as dialogue information in the information state. We show that this enables elegant treatments of several dialogue management problems, including corrections, confirmations, querying of the environment, and regression testing.
Auditory Outcomes with Hearing Rehabilitation in Children with Unilateral Hearing Loss: A Systematic Review.

PubMed

Appachi, Swathi; Specht, Jessica L; Raol, Nikhila; Lieu, Judith E C; Cohen, Michael S; Dedhia, Kavita; Anne, Samantha

2017-10-01

Objective Options for management of unilateral hearing loss (UHL) in children include conventional hearing aids, bone-conduction hearing devices, contralateral routing of signal (CROS) aids, and frequency-modulating (FM) systems. The objective of this study was to systematically review the current literature to characterize auditory outcomes of hearing rehabilitation options in UHL. Data Sources PubMed, EMBASE, Medline, CINAHL, and Cochrane Library were searched from inception to January 2016. Manual searches of bibliographies were also performed. Review Methods Studies analyzing auditory outcomes of hearing amplification in children with UHL were included. Outcome measures included functional and objective auditory results. Two independent reviewers evaluated each abstract and article. Results Of the 249 articles identified, 12 met inclusion criteria. Seven articles solely focused on outcomes with bone-conduction hearing devices. Outcomes favored improved pure-tone averages, speech recognition thresholds, and sound localization in implanted patients. Five studies focused on FM systems, conventional hearing aids, or CROS hearing aids. Limited data are available but suggest a trend toward improvement in speech perception with hearing aids. FM systems were shown to have the most benefit for speech recognition in noise. Studies evaluating CROS hearing aids demonstrated variable outcomes. Conclusions Data evaluating functional and objective auditory measures following hearing amplification in children with UHL are limited. Most studies do suggest improvement in speech perception, speech recognition in noise, and sound localization with a hearing rehabilitation device.
Assessment of central auditory processing in a group of workers exposed to solvents.

PubMed

Fuente, Adrian; McPherson, Bradley; Muñoz, Verónica; Pablo Espina, Juan

2006-12-01

Despite having normal hearing thresholds and speech recognition thresholds, results for central auditory tests were abnormal in a group of workers exposed to solvents. Workers exposed to solvents may have difficulties in everyday listening situations that are not related to a decrement in hearing thresholds. A central auditory processing disorder may underlie these difficulties. To study central auditory processing abilities in a group of workers occupationally exposed to a mix of organic solvents. Ten workers exposed to a mix of organic solvents and 10 matched non-exposed workers were studied. The test battery comprised pure-tone audiometry, tympanometry, acoustic reflex measurement, acoustic reflex decay, dichotic digit, pitch pattern sequence, masking level difference, filtered speech, random gap detection and hearing-in-noise tests. All the workers presented normal hearing thresholds and no signs of middle ear abnormalities. Workers exposed to solvents had lower results in comparison with the control group and previously reported normative data, in the majority of the tests.
Development and validation of a smartphone-based digits-in-noise hearing test in South African English.

PubMed

Potgieter, Jenni-Marí; Swanepoel, De Wet; Myburgh, Hermanus Carel; Hopper, Thomas Christopher; Smits, Cas

2015-07-01

The objective of this study was to develop and validate a smartphone-based digits-in-noise hearing test for South African English. Single digits (0-9) were recorded and spoken by a first language English female speaker. Level corrections were applied to create a set of homogeneous digits with steep speech recognition functions. A smartphone application was created to utilize 120 digit-triplets in noise as test material. An adaptive test procedure determined the speech reception threshold (SRT). Experiments were performed to determine headphones effects on the SRT and to establish normative data. Participants consisted of 40 normal-hearing subjects with thresholds ≤15 dB across the frequency spectrum (250-8000 Hz) and 186 subjects with normal-hearing in both ears, or normal-hearing in the better ear. The results show steep speech recognition functions with a slope of 20%/dB for digit-triplets presented in noise using the smartphone application. The results of five headphone types indicate that the smartphone-based hearing test is reliable and can be conducted using standard Android smartphone headphones or clinical headphones. A digits-in-noise hearing test was developed and validated for South Africa. The mean SRT and speech recognition functions correspond to previous developed telephone-based digits-in-noise tests.
Monitoring the Hearing Handicap and the Recognition Threshold of Sentences of a Patient with Unilateral Auditory Neuropathy Spectrum Disorder with Use of a Hearing Aid.

PubMed

Lima, Aline Patrícia; Mantello, Erika Barioni; Anastasio, Adriana Ribeiro Tavares

2016-04-01

Introduction Treatment for auditory neuropathy spectrum disorder (ANSD) is not yet well established, including the use of hearing aids (HAs). Not all patients diagnosed with ASND have access to HAs, and in some cases HAs are even contraindicated. Objective To monitor the hearing handicap and the recognition threshold of sentences in silence and in noise in a patient with ASND using an HA. Resumed Report A 47-year-old woman reported moderate sensorineural hearing loss in the right ear and high-frequency loss of 4 kHz in the left ear, with bilateral otoacoustic emissions. Auditory brainstem response suggested changes in the functioning of the auditory pathway (up to the inferior colliculus) on the right. An HA was indicated on the right. The patient was tested within a 3-month period before the HA fitting with respect to recognition threshold of sentences in quiet and in noise and for handicap determination. After HA use, she showed a 2.1-dB improvement in the recognition threshold of sentences in silence, a 6.0-dB improvement for recognition threshold of sentences in noise, and a rapid improvement of the signal-to-noise ratio from +3.66 to -2.4 dB when compared with the same tests before the fitting of the HA. Conclusion There was a reduction of the auditory handicap, although speech perception continued to be severely limited. There was a significant improvement of the recognition threshold of sentences in silence and in noise and of the signal-to-noise ratio after 3 months of HA use.
Monitoring the Hearing Handicap and the Recognition Threshold of Sentences of a Patient with Unilateral Auditory Neuropathy Spectrum Disorder with Use of a Hearing Aid

PubMed Central

Lima, Aline Patrícia; Mantello, Erika Barioni; Anastasio, Adriana Ribeiro Tavares

2015-01-01

Introduction Treatment for auditory neuropathy spectrum disorder (ANSD) is not yet well established, including the use of hearing aids (HAs). Not all patients diagnosed with ASND have access to HAs, and in some cases HAs are even contraindicated. Objective To monitor the hearing handicap and the recognition threshold of sentences in silence and in noise in a patient with ASND using an HA. Resumed Report A 47-year-old woman reported moderate sensorineural hearing loss in the right ear and high-frequency loss of 4 kHz in the left ear, with bilateral otoacoustic emissions. Auditory brainstem response suggested changes in the functioning of the auditory pathway (up to the inferior colliculus) on the right. An HA was indicated on the right. The patient was tested within a 3-month period before the HA fitting with respect to recognition threshold of sentences in quiet and in noise and for handicap determination. After HA use, she showed a 2.1-dB improvement in the recognition threshold of sentences in silence, a 6.0-dB improvement for recognition threshold of sentences in noise, and a rapid improvement of the signal-to-noise ratio from +3.66 to −2.4 dB when compared with the same tests before the fitting of the HA. Conclusion There was a reduction of the auditory handicap, although speech perception continued to be severely limited. There was a significant improvement of the recognition threshold of sentences in silence and in noise and of the signal-to-noise ratio after 3 months of HA use. PMID:27096026
Converted and upgraded maps programmed in the newer speech processor for the first generation of multichannel cochlear implant.

PubMed

Magalhães, Ana Tereza de Matos; Goffi-Gomez, M Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens

2013-09-01

To identify the technological contributions of the newer version of speech processor to the first generation of multichannel cochlear implant and the satisfaction of users of the new technology. Among the new features available, we focused on the effect of the frequency allocation table, the T-SPL and C-SPL, and the preprocessing gain adjustments (adaptive dynamic range optimization). Prospective exploratory study. Cochlear implant center at hospital. Cochlear implant users of the Spectra processor with speech recognition in closed set. Seventeen patients were selected between the ages of 15 and 82 and deployed for more than 8 years. The technology update of the speech processor for the Nucleus 22. To determine Freedom's contribution, thresholds and speech perception tests were performed with the last map used with the Spectra and the maps created for Freedom. To identify the effect of the frequency allocation table, both upgraded and converted maps were programmed. One map was programmed with 25 dB T-SPL and 65 dB C-SPL and the other map with adaptive dynamic range optimization. To assess satisfaction, SADL and APHAB were used. All speech perception tests and all sound field thresholds were statistically better with the new speech processor; 64.7% of patients preferred maintaining the same frequency table that was suggested for the older processor. The sound field threshold was statistically significant at 500, 1,000, 1,500, and 2,000 Hz with 25 dB T-SPL/65 dB C-SPL. Regarding patient's satisfaction, there was a statistically significant improvement, only in the subscale of speech in noise abilities and phone use. The new technology improved the performance of patients with the first generation of multichannel cochlear implant.

Using Temporal Modulation Sensitivity to Select Stimulation Sites for Processor MAPs in Cochlear Implant Listeners

PubMed Central

Garadat, Soha N.; Zwolan, Teresa A.; Pfingst, Bryan E.

2013-01-01

Previous studies in our laboratory showed that temporal acuity as assessed by modulation detection thresholds (MDTs) varied across activation sites and that this site-to-site variability was subject specific. Using two 10-channel MAPs, the previous experiments showed that processor MAPs that had better across-site mean (ASM) MDTs yielded better speech recognition than MAPs with poorer ASM MDTs tested in the same subject. The current study extends our earlier work on developing more optimal fitting strategies to test the feasibility of using a site-selection approach in the clinical domain. This study examined the hypothesis that revising the clinical speech processor MAP for cochlear implant (CI) recipients by turning off selected sites that have poorer temporal acuity and reallocating frequencies to the remaining electrodes would lead to improved speech recognition. Twelve CI recipients participated in the experiments. We found that site selection procedure based on MDTs in the presence of a masker resulted in improved performance on consonant recognition and recognition of sentences in noise. In contrast, vowel recognition was poorer with the experimental MAP than with the clinical MAP, possibly due to reduced spectral resolution when sites were removed from the experimental MAP. Overall, these results suggest a promising path for improving recipient outcomes using personalized processor-fitting strategies based on a psychophysical measure of temporal acuity. PMID:23881208
Improving Speech Perception in Noise with Current Focusing in Cochlear Implant Users

PubMed Central

Srinivasan, Arthi G.; Padilla, Monica; Shannon, Robert V.; Landsberger, David M.

2013-01-01

Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation, and therefore, reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits in noise task and a mean improvement of 3 dB in a sentences in noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical. PMID:23467170
Masking release due to linguistic and phonetic dissimilarity between the target and masker speech

PubMed Central

Calandruccio, Lauren; Brouwer, Susanne; Van Engen, Kristin J.; Dhar, Sumitrajit; Bradlow, Ann R.

2013-01-01

Purpose To investigate masking release for speech maskers for linguistically and phonetically close (English and Dutch) and distant (English and Mandarin) language pairs. Method Twenty monolingual speakers of English with normal-audiometric thresholds participated. Data are reported for an English sentence recognition task in English, Dutch and Mandarin competing speech maskers (Experiment I) and noise maskers (Experiment II) that were matched either to the long-term-average-speech spectra or to the temporal modulations of the speech maskers from Experiment I. Results Results indicated that listener performance increased as the target-to-masker linguistic distance increased (English-in-English < English-in-Dutch < English-in-Mandarin). Conclusions Spectral differences between maskers can account for some, but not all, of the variation in performance between maskers; however, temporal differences did not seem to play a significant role. PMID:23800811
Tinnitus and other auditory problems - occupational noise exposure below risk limits may cause inner ear dysfunction.

PubMed

Lindblad, Ann-Cathrine; Rosenhall, Ulf; Olofsson, Åke; Hagerman, Björn

2014-01-01

The aim of the investigation was to study if dysfunctions associated to the cochlea or its regulatory system can be found, and possibly explain hearing problems in subjects with normal or near-normal audiograms. The design was a prospective study of subjects recruited from the general population. The included subjects were persons with auditory problems who had normal, or near-normal, pure tone hearing thresholds, who could be included in one of three subgroups: teachers, Education; people working with music, Music; and people with moderate or negligible noise exposure, Other. A fourth group included people with poorer pure tone hearing thresholds and a history of severe occupational noise, Industry. Ntotal = 193. The following hearing tests were used: - pure tone audiometry with Békésy technique, - transient evoked otoacoustic emissions and distortion product otoacoustic emissions, without and with contralateral noise; - psychoacoustical modulation transfer function, - forward masking, - speech recognition in noise, - tinnitus matching. A questionnaire about occupations, noise exposure, stress/anxiety, muscular problems, medication, and heredity, was addressed to the participants. Forward masking results were significantly worse for Education and Industry than for the other groups, possibly associated to the inner hair cell area. Forward masking results were significantly correlated to louder matched tinnitus. For many subjects speech recognition in noise, left ear, did not increase in a normal way when the listening level was increased. Subjects hypersensitive to loud sound had significantly better speech recognition in noise at the lower test level than subjects not hypersensitive. Self-reported stress/anxiety was similar for all groups. In conclusion, hearing dysfunctions were found in subjects with tinnitus and other auditory problems, combined with normal or near-normal pure tone thresholds. The teachers, mostly regarded as a group exposed to noise below risk levels, had dysfunctions almost identical to those of the more exposed Industry group.
Tinnitus and Other Auditory Problems – Occupational Noise Exposure below Risk Limits May Cause Inner Ear Dysfunction

PubMed Central

Lindblad, Ann-Cathrine; Rosenhall, Ulf; Olofsson, Åke; Hagerman, Björn

2014-01-01

The aim of the investigation was to study if dysfunctions associated to the cochlea or its regulatory system can be found, and possibly explain hearing problems in subjects with normal or near-normal audiograms. The design was a prospective study of subjects recruited from the general population. The included subjects were persons with auditory problems who had normal, or near-normal, pure tone hearing thresholds, who could be included in one of three subgroups: teachers, Education; people working with music, Music; and people with moderate or negligible noise exposure, Other. A fourth group included people with poorer pure tone hearing thresholds and a history of severe occupational noise, Industry. Ntotal = 193. The following hearing tests were used: − pure tone audiometry with Békésy technique, − transient evoked otoacoustic emissions and distortion product otoacoustic emissions, without and with contralateral noise; − psychoacoustical modulation transfer function, − forward masking, − speech recognition in noise, − tinnitus matching. A questionnaire about occupations, noise exposure, stress/anxiety, muscular problems, medication, and heredity, was addressed to the participants. Forward masking results were significantly worse for Education and Industry than for the other groups, possibly associated to the inner hair cell area. Forward masking results were significantly correlated to louder matched tinnitus. For many subjects speech recognition in noise, left ear, did not increase in a normal way when the listening level was increased. Subjects hypersensitive to loud sound had significantly better speech recognition in noise at the lower test level than subjects not hypersensitive. Self-reported stress/anxiety was similar for all groups. In conclusion, hearing dysfunctions were found in subjects with tinnitus and other auditory problems, combined with normal or near-normal pure tone thresholds. The teachers, mostly regarded as a group exposed to noise below risk levels, had dysfunctions almost identical to those of the more exposed Industry group. PMID:24827149
Temporal and speech processing skills in normal hearing individuals exposed to occupational noise.

PubMed

Kumar, U Ajith; Ameenudin, Syed; Sangamanatha, A V

2012-01-01

Prolonged exposure to high levels of occupational noise can cause damage to hair cells in the cochlea and result in permanent noise-induced cochlear hearing loss. Consequences of cochlear hearing loss on speech perception and psychophysical abilities have been well documented. Primary goal of this research was to explore temporal processing and speech perception Skills in individuals who are exposed to occupational noise of more than 80 dBA and not yet incurred clinically significant threshold shifts. Contribution of temporal processing skills to speech perception in adverse listening situation was also evaluated. A total of 118 participants took part in this research. Participants comprised three groups of train drivers in the age range of 30-40 (n= 13), 41 50 ( = 13), 41-50 (n = 9), and 51-60 (n = 6) years and their non-noise-exposed counterparts (n = 30 in each age group). Participants of all the groups including the train drivers had hearing sensitivity within 25 dB HL in the octave frequencies between 250 and 8 kHz. Temporal processing was evaluated using gap detection, modulation detection, and duration pattern tests. Speech recognition was tested in presence multi-talker babble at -5dB SNR. Differences between experimental and control groups were analyzed using ANOVA and independent sample t-tests. Results showed a trend of reduced temporal processing skills in individuals with noise exposure. These deficits were observed despite normal peripheral hearing sensitivity. Speech recognition scores in the presence of noise were also significantly poor in noise-exposed group. Furthermore, poor temporal processing skills partially accounted for the speech recognition difficulties exhibited by the noise-exposed individuals. These results suggest that noise can cause significant distortions in the processing of suprathreshold temporal cues which may add to difficulties in hearing in adverse listening conditions.
Word recognition materials for native speakers of Taiwan Mandarin.

PubMed

Nissen, Shawn L; Harris, Richard W; Dukes, Alycia

2008-06-01

To select, digitally record, evaluate, and psychometrically equate word recognition materials that can be used to measure the speech perception abilities of native speakers of Taiwan Mandarin in quiet. Frequently used bisyllabic words produced by male and female talkers of Taiwan Mandarin were digitally recorded and subsequently evaluated using 20 native listeners with normal hearing at 10 intensity levels (-5 to 40 dB HL) in increments of 5 dB. Using logistic regression, 200 words with the steepest psychometric slopes were divided into 4 lists and 8 half-lists that were relatively equivalent in psychometric function slope. To increase auditory homogeneity of the lists, the intensity of words in each list was digitally adjusted so that the threshold of each list was equal to the midpoint between the mean thresholds of the male and female half-lists. Digital recordings of the word recognition lists and the associated clinical instructions are available on CD upon request.
Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms.

PubMed

Schädler, Marc R; Warzybok, Anna; Kollmeier, Birger

2018-01-01

The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than -20 dB could not be predicted.
Objective Prediction of Hearing Aid Benefit Across Listener Groups Using Machine Learning: Speech Recognition Performance With Binaural Noise-Reduction Algorithms

PubMed Central

Schädler, Marc R.; Warzybok, Anna; Kollmeier, Birger

2018-01-01

The simulation framework for auditory discrimination experiments (FADE) was adopted and validated to predict the individual speech-in-noise recognition performance of listeners with normal and impaired hearing with and without a given hearing-aid algorithm. FADE uses a simple automatic speech recognizer (ASR) to estimate the lowest achievable speech reception thresholds (SRTs) from simulated speech recognition experiments in an objective way, independent from any empirical reference data. Empirical data from the literature were used to evaluate the model in terms of predicted SRTs and benefits in SRT with the German matrix sentence recognition test when using eight single- and multichannel binaural noise-reduction algorithms. To allow individual predictions of SRTs in binaural conditions, the model was extended with a simple better ear approach and individualized by taking audiograms into account. In a realistic binaural cafeteria condition, FADE explained about 90% of the variance of the empirical SRTs for a group of normal-hearing listeners and predicted the corresponding benefits with a root-mean-square prediction error of 0.6 dB. This highlights the potential of the approach for the objective assessment of benefits in SRT without prior knowledge about the empirical data. The predictions for the group of listeners with impaired hearing explained 75% of the empirical variance, while the individual predictions explained less than 25%. Possibly, additional individual factors should be considered for more accurate predictions with impaired hearing. A competing talker condition clearly showed one limitation of current ASR technology, as the empirical performance with SRTs lower than −20 dB could not be predicted. PMID:29692200
Differential Recognition of Pitch Patterns in Discrete and Gliding Stimuli in Congenital Amusia: Evidence from Mandarin Speakers

ERIC Educational Resources Information Center

Liu, Fang; Xu, Yi; Patel, Aniruddh D.; Francart, Tom; Jiang, Cunmei

2012-01-01

This study examined whether "melodic contour deafness" (insensitivity to the direction of pitch movement) in congenital amusia is associated with specific types of pitch patterns (discrete versus gliding pitches) or stimulus types (speech syllables versus complex tones). Thresholds for identification of pitch direction were obtained using discrete…
Improving speech perception in noise with current focusing in cochlear implant users.

PubMed

Srinivasan, Arthi G; Padilla, Monica; Shannon, Robert V; Landsberger, David M

2013-05-01

Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation, and therefore, reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits in noise task and a mean improvement of 3 dB in a sentences in noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical. Copyright © 2013 Elsevier B.V. All rights reserved.
Within-subjects comparison of the HiRes and Fidelity120 speech processing strategies: speech perception and its relation to place-pitch sensitivity.

PubMed

Donaldson, Gail S; Dawson, Patricia K; Borden, Lamar Z

2011-01-01

Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many cochlear implant (CI) users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 wks during the main study; a subset of five subjects used Fidelity120 for three additional months after the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency, vowel F2 frequency, and consonant place of articulation; overall transmitted information for vowels and consonants; and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle, and basal regions of the implanted array using a psychophysical pitch-ranking task. With one exception, there was no effect of strategy (HiRes versus Fidelity120) on the speech measures tested, either during the main study (N = 10) or after extended use of Fidelity120 (N = 5). The exception was a small but significant advantage for HiRes over Fidelity120 for consonant perception during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 wks or longer experience with Fidelity120. Another three subjects exhibited initial decrements in spectral cue perception with Fidelity120 at the 8-wk time point; however, evidence from one subject suggested that such decrements may resolve with additional experience. Place-pitch thresholds were inversely related to improvements in vowel F2 frequency perception with Fidelity120 relative to HiRes. However, no relationship was observed between place-pitch thresholds and the other spectral measures (vowel F1 frequency or consonant place of articulation). Findings suggest that Fidelity120 supports small improvements in the perception of spectral speech cues in some Advanced Bionics CI users; however, many users show no clear benefit. Benefits are more likely to occur for vowel spectral cues (related to F1 and F2 frequency) than for consonant spectral cues (related to place of articulation). There was an inconsistent relationship between place-pitch sensitivity and improvements in spectral cue perception with Fidelity120 relative to HiRes. This may partly reflect the small number of sites at which place-pitch thresholds were measured. Contrary to some previous reports, there was no clear evidence that Fidelity120 supports improved sentence recognition in noise.
Comparing models of the combined-stimulation advantage for speech recognition.

PubMed

Micheyl, Christophe; Oxenham, Andrew J

2012-05-01

The "combined-stimulation advantage" refers to an improvement in speech recognition when cochlear-implant or vocoded stimulation is supplemented by low-frequency acoustic information. Previous studies have been interpreted as evidence for "super-additive" or "synergistic" effects in the combination of low-frequency and electric or vocoded speech information by human listeners. However, this conclusion was based on predictions of performance obtained using a suboptimal high-threshold model of information combination. The present study shows that a different model, based on Gaussian signal detection theory, can predict surprisingly large combined-stimulation advantages, even when performance with either information source alone is close to chance, without involving any synergistic interaction. A reanalysis of published data using this model reveals that previous results, which have been interpreted as evidence for super-additive effects in perception of combined speech stimuli, are actually consistent with a more parsimonious explanation, according to which the combined-stimulation advantage reflects an optimal combination of two independent sources of information. The present results do not rule out the possible existence of synergistic effects in combined stimulation; however, they emphasize the possibility that the combined-stimulation advantages observed in some studies can be explained simply by non-interactive combination of two information sources.
High levels of sound pressure: acoustic reflex thresholds and auditory complaints of workers with noise exposure.

PubMed

Duarte, Alexandre Scalli Mathias; Ng, Ronny Tah Yen; de Carvalho, Guilherme Machado; Guimarães, Alexandre Caixeta; Pinheiro, Laiza Araujo Mohana; Costa, Everardo Andrade da; Gusmão, Reinaldo Jordão

2015-01-01

The clinical evaluation of subjects with occupational noise exposure has been difficult due to the discrepancy between auditory complaints and auditory test results. This study aimed to evaluate the contralateral acoustic reflex thresholds of workers exposed to high levels of noise, and to compare these results to the subjects' auditory complaints. This clinical retrospective study evaluated 364 workers between 1998 and 2005; their contralateral acoustic reflexes were compared to auditory complaints, age, and noise exposure time by chi-squared, Fisher's, and Spearman's tests. The workers' age ranged from 18 to 50 years (mean=39.6), and noise exposure time from one to 38 years (mean=17.3). We found that 15.1% (55) of the workers had bilateral hearing loss, 38.5% (140) had bilateral tinnitus, 52.8% (192) had abnormal sensitivity to loud sounds, and 47.2% (172) had speech recognition impairment. The variables hearing loss, speech recognition impairment, tinnitus, age group, and noise exposure time did not show relationship with acoustic reflex thresholds; however, all complaints demonstrated a statistically significant relationship with Metz recruitment at 3000 and 4000Hz bilaterally. There was no significance relationship between auditory complaints and acoustic reflexes. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Classification Influence of Features on Given Emotions and Its Application in Feature Selection

NASA Astrophysics Data System (ADS)

Xing, Yin; Chen, Chuang; Liu, Li-Long

2018-04-01

In order to solve the problem that there is a large amount of redundant data in high-dimensional speech emotion features, we analyze deeply the extracted speech emotion features and select better features. Firstly, a given emotion is classified by each feature. Secondly, the recognition rate is ranked in descending order. Then, the optimal threshold of features is determined by rate criterion. Finally, the better features are obtained. When applied in Berlin and Chinese emotional data set, the experimental results show that the feature selection method outperforms the other traditional methods.
[The Freiburg speech intelligibility test : A pillar of speech audiometry in German-speaking countries].

PubMed

Hoth, S

2016-08-01

The Freiburg speech intelligibility test according to DIN 45621 was introduced around 60 years ago. For decades, and still today, the Freiburg test has been a standard whose relevance extends far beyond pure audiometry. It is used primarily to determine the speech perception threshold (based on two-digit numbers) and the ability to discriminate speech at suprathreshold presentation levels (based on monosyllabic nouns). Moreover, it is a measure of the degree of disability, the requirement for and success of technical hearing aids (auxiliaries directives), and the compensation for disability and handicap (Königstein recommendation). In differential audiological diagnostics, the Freiburg test contributes to the distinction between low- and high-frequency hearing loss, as well as to identification of conductive, sensory, neural, and central disorders. Currently, the phonemic and perceptual balance of the monosyllabic test lists is subject to critical discussions. Obvious deficiencies exist for testing speech recognition in noise. In this respect, alternatives such as sentence or rhyme tests in closed-answer inventories are discussed.
Automatic speech recognition technology development at ITT Defense Communications Division

NASA Technical Reports Server (NTRS)

White, George M.

1977-01-01

An assessment of the applications of automatic speech recognition to defense communication systems is presented. Future research efforts include investigations into the following areas: (1) dynamic programming; (2) recognition of speech degraded by noise; (3) speaker independent recognition; (4) large vocabulary recognition; (5) word spotting and continuous speech recognition; and (6) isolated word recognition.
[Clinical diagnosis of Treacher Collins syndrome and the efficacy of using BAHA].

PubMed

Wang, Y B; Chen, X W; Wang, P; Fan, X M; Fan, Y; Liu, Q; Gao, Z Q

2017-04-20

Objective: To evaluate the efficacy of soft or implanted BAHA in the patients of Treacher Collins syndrome(TCS). Method: Six patients of TCS were studied. The Teber scoring system was used to evaluate the deformity degree. The air and bone auditory thresholds were assessed by auditory brain stem response(ABR). The infant-toddler meaningful auditory integration scale(IT-MAIS) was used to assess the auditory development at three time levels: baseline,3 months and 6 months. The hearing threshold and speech recognition score were measured under unaided and aided conditions. Result: The average score of deformity degree was 14.0±0.6. The TCOF1 gene was tested in two patients. The bone conduction hearing thresholds of patients was（18.0±4.5）dBnHL and the air conduction hearing thresholds was （70.5±7.0）dBnHL. The IT-MAIS total, detection and perception scores were improved significantly after wearing softband BAHA and approached the normal level in the 2 patients under 2 years old. The hearing thresholds of 6 patients in unaided and softband BAHA conditions were（65.8±3.8）dBHL and （30.0±3.2）dBHL ( P <0.01) respectively, and 1 implanted BAHA was 15 dBHL. The speech recognition scores of 3 patients in unaided and softband BAHA conditions were（31.7±3.5）% and（86.0±1.7）%( P <0.05) respectively, and 1 implanted BAHA was 96%. Conclusion: Whenever the patient was diagnosed as TCS by the clinical manifestations and genetic testing, BAHA system could help to rehabilitate the hearing to a normal condition. Copyright© by the Editorial Department of Journal of Clinical Otorhinolaryngology Head and Neck Surgery.
Auditory and cognitive factors underlying individual differences in aided speech-understanding among older adults

PubMed Central

Humes, Larry E.; Kidd, Gary R.; Lentz, Jennifer J.

2013-01-01

This study was designed to address individual differences in aided speech understanding among a relatively large group of older adults. The group of older adults consisted of 98 adults (50 female and 48 male) ranging in age from 60 to 86 (mean = 69.2). Hearing loss was typical for this age group and about 90% had not worn hearing aids. All subjects completed a battery of tests, including cognitive (6 measures), psychophysical (17 measures), and speech-understanding (9 measures), as well as the Speech, Spatial, and Qualities of Hearing (SSQ) self-report scale. Most of the speech-understanding measures made use of competing speech and the non-speech psychophysical measures were designed to tap phenomena thought to be relevant for the perception of speech in competing speech (e.g., stream segregation, modulation-detection interference). All measures of speech understanding were administered with spectral shaping applied to the speech stimuli to fully restore audibility through at least 4000 Hz. The measures used were demonstrated to be reliable in older adults and, when compared to a reference group of 28 young normal-hearing adults, age-group differences were observed on many of the measures. Principal-components factor analysis was applied successfully to reduce the number of independent and dependent (speech understanding) measures for a multiple-regression analysis. Doing so yielded one global cognitive-processing factor and five non-speech psychoacoustic factors (hearing loss, dichotic signal detection, multi-burst masking, stream segregation, and modulation detection) as potential predictors. To this set of six potential predictor variables were added subject age, Environmental Sound Identification (ESI), and performance on the text-recognition-threshold (TRT) task (a visual analog of interrupted speech recognition). These variables were used to successfully predict one global aided speech-understanding factor, accounting for about 60% of the variance. PMID:24098273
Fifty years of progress in speech and speaker recognition

NASA Astrophysics Data System (ADS)

Furui, Sadaoki

2004-10-01

Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-base statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to Cepstral features (Cepstrum + DCepstrum + DDCepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from gdistanceh-based to likelihood-based methods, (5) from maximum likelihood to discriminative approach, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing robustness of recognition, including many other additional important techniques not noted above.

Evaluation of speech reception threshold in noise in young Cochlear™ Nucleus® system 6 implant recipients using two different digital remote microphone technologies and a speech enhancement sound processing algorithm.

PubMed

Razza, Sergio; Zaccone, Monica; Meli, Aannalisa; Cristofari, Eliana

2017-12-01

Children affected by hearing loss can experience difficulties in challenging and noisy environments even when deafness is corrected by Cochlear implant (CI) devices. These patients have a selective attention deficit in multiple listening conditions. At present, the most effective ways to improve the performance of speech recognition in noise consists of providing CI processors with noise reduction algorithms and of providing patients with bilateral CIs. The aim of this study was to compare speech performances in noise, across increasing noise levels, in CI recipients using two kinds of wireless remote-microphone radio systems that use digital radio frequency transmission: the Roger Inspiro accessory and the Cochlear Wireless Mini Microphone accessory. Eleven Nucleus Cochlear CP910 CI young user subjects were studied. The signal/noise ratio, at a speech reception threshold (SRT) value of 50%, was measured in different conditions for each patient: with CI only, with the Roger or with the MiniMic accessory. The effect of the application of the SNR-noise reduction algorithm in each of these conditions was also assessed. The tests were performed with the subject positioned in front of the main speaker, at a distance of 2.5 m. Another two speakers were positioned at 3.50 m. The main speaker at 65 dB issued disyllabic words. Babble noise signal was delivered through the other speakers, with variable intensity. The use of both wireless remote microphones improved the SRT results. Both systems improved gain of speech performances. The gain was higher with the Mini Mic system (SRT = -4.76) than the Roger system (SRT = -3.01). The addition of the NR algorithm did not statistically further improve the results. There is significant improvement in speech recognition results with both wireless digital remote microphone accessories, in particular with the Mini Mic system when used with the CP910 processor. The use of a remote microphone accessory surpasses the benefit of application of NR algorithm. Copyright © 2017. Published by Elsevier B.V.
Asynchronous glimpsing of speech: Spread of masking and task set-size

PubMed Central

Ozmeral, Erol J.; Buss, Emily; Hall, Joseph W.

2012-01-01

Howard-Jones and Rosen [(1993). J. Acoust. Soc. Am. 93, 2915–2922] investigated the ability to integrate glimpses of speech that are separated in time and frequency using a “checkerboard” masker, with asynchronous amplitude modulation (AM) across frequency. Asynchronous glimpsing was demonstrated only for spectrally wide frequency bands. It is possible that the reduced evidence of spectro-temporal integration with narrower bands was due to spread of masking at the periphery. The present study tested this hypothesis with a dichotic condition, in which the even- and odd-numbered bands of the target speech and asynchronous AM masker were presented to opposite ears, minimizing the deleterious effects of masking spread. For closed-set consonant recognition, thresholds were 5.1–8.5 dB better for dichotic than for monotic asynchronous AM conditions. Results were similar for closed-set word recognition, but for open-set word recognition the benefit of dichotic presentation was more modest and level dependent, consistent with the effects of spread of masking being level dependent. There was greater evidence of asynchronous glimpsing in the open-set than closed-set tasks. Presenting stimuli dichotically supported asynchronous glimpsing with narrower frequency bands than previously shown, though the magnitude of glimpsing was reduced for narrower bandwidths even in some dichotic conditions. PMID:22894234
Evaluating a smartphone digits-in-noise test as part of the audiometric test battery.

PubMed

Potgieter, Jenni-Mari; Swanepoel, De Wet; Smits, Cas

2018-05-21

Speech-in-noise tests have become a valuable part of the audiometric test battery providing an indication of a listener's ability to function in background noise. A simple digits-in-noise (DIN) test could be valuable to support diagnostic hearing assessments, hearing aid fittings and counselling for both paediatric and adult populations. Objective: The objective of this study was to evaluate the South African English smartphone DIN test's performance as part of the audiometric test battery. Design: This descriptive study evaluated 109 adult subjects (43 male and 66 female subjects) with and without sensorineural hearing loss by comparing pure-tone air conduction thresholds, speech recognition monaural performance scores (SRS dB) and the DIN speech reception threshold (SRT). An additional nine adult hearing aid users (four male and five female subjects) were included in a subset to determine aided and unaided DIN SRTs. Results: The DIN SRT is strongly associated with the best ear 4 frequency pure-tone average (4FPTA) (rs = 0.81) and maximum SRS dB (r = 0.72). The DIN test had high sensitivity and specificity to identify abnormal pure-tone (0.88 and 0.88, respectively) and SRS dB (0.76 and 0.88, respectively) results. There was a mean signal-to-noise ratio (SNR) improvement in the aided condition that demonstrated an overall benefit of 0.84 SNR dB. Conclusion: The DIN SRT was significantly correlated with the best ear 4FPTA and maximum SRS dB. The DIN SRT provides a useful measure of speech recognition in noise that can evaluate hearing aid fittings, manage counselling and hearing expectations.
Loud Music Exposure and Cochlear Synaptopathy in Young Adults: Isolated Auditory Brainstem Response Effects but No Perceptual Consequences.

PubMed

Grose, John H; Buss, Emily; Hall, Joseph W

2017-01-01

The purpose of this study was to test the hypothesis that listeners with frequent exposure to loud music exhibit deficits in suprathreshold auditory performance consistent with cochlear synaptopathy. Young adults with normal audiograms were recruited who either did ( n = 31) or did not ( n = 30) have a history of frequent attendance at loud music venues where the typical sound levels could be expected to result in temporary threshold shifts. A test battery was administered that comprised three sets of procedures: (a) electrophysiological tests including distortion product otoacoustic emissions, auditory brainstem responses, envelope following responses, and the acoustic change complex evoked by an interaural phase inversion; (b) psychoacoustic tests including temporal modulation detection, spectral modulation detection, and sensitivity to interaural phase; and (c) speech tests including filtered phoneme recognition and speech-in-noise recognition. The results demonstrated that a history of loud music exposure can lead to a profile of peripheral auditory function that is consistent with an interpretation of cochlear synaptopathy in humans, namely, modestly abnormal auditory brainstem response Wave I/Wave V ratios in the presence of normal distortion product otoacoustic emissions and normal audiometric thresholds. However, there were no other electrophysiological, psychophysical, or speech perception effects. The absence of any behavioral effects in suprathreshold sound processing indicated that, even if cochlear synaptopathy is a valid pathophysiological condition in humans, its perceptual sequelae are either too diffuse or too inconsequential to permit a simple differential diagnosis of hidden hearing loss.
Should visual speech cues (speechreading) be considered when fitting hearing aids?

NASA Astrophysics Data System (ADS)

Grant, Ken

2002-05-01

When talker and listener are face-to-face, visual speech cues become an important part of the communication environment, and yet, these cues are seldom considered when designing hearing aids. Models of auditory-visual speech recognition highlight the importance of complementary versus redundant speech information for predicting auditory-visual recognition performance. Thus, for hearing aids to work optimally when visual speech cues are present, it is important to know whether the cues provided by amplification and the cues provided by speechreading complement each other. In this talk, data will be reviewed that show nonmonotonicity between auditory-alone speech recognition and auditory-visual speech recognition, suggesting that efforts designed solely to improve auditory-alone recognition may not always result in improved auditory-visual recognition. Data will also be presented showing that one of the most important speech cues for enhancing auditory-visual speech recognition performance, voicing, is often the cue that benefits least from amplification.
Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

NASA Astrophysics Data System (ADS)

Kayasith, Prakasith; Theeramunkong, Thanaruk

It is a tedious and subjective task to measure severity of a dysarthria by manually evaluating his/her speech using available standard assessment methods based on human perception. This paper presents an automated approach to assess speech quality of a dysarthric speaker with cerebral palsy. With the consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signal for a certain word and distinguished speech signal for different words. As an application, it can be used to assess speech quality and forecast speech recognition rate of speech made by an individual dysarthric speaker before actual exhaustive implementation of an automatic speech recognition system for the speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations had been done by comparing its predicted recognition rates with ones predicted by the standard methods called the articulatory and intelligibility tests based on the two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting recognition rate of dysarthric speech. All experiments had been done on speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
Effect of signal to noise ratio on the speech perception ability of older adults

PubMed Central

Shojaei, Elahe; Ashayeri, Hassan; Jafari, Zahra; Zarrin Dast, Mohammad Reza; Kamali, Koorosh

2016-01-01

Background: Speech perception ability depends on auditory and extra-auditory elements. The signal- to-noise ratio (SNR) is an extra-auditory element that has an effect on the ability to normally follow speech and maintain a conversation. Speech in noise perception difficulty is a common complaint of the elderly. In this study, the importance of SNR magnitude as an extra-auditory effect on speech perception in noise was examined in the elderly. Methods: The speech perception in noise test (SPIN) was conducted on 25 elderly participants who had bilateral low–mid frequency normal hearing thresholds at three SNRs in the presence of ipsilateral white noise. These participants were selected by available sampling method. Cognitive screening was done using the Persian Mini Mental State Examination (MMSE) test. Results: Independent T- test, ANNOVA and Pearson Correlation Index were used for statistical analysis. There was a significant difference in word discrimination scores at silence and at three SNRs in both ears (p≤0.047). Moreover, there was a significant difference in word discrimination scores for paired SNRs (0 and +5, 0 and +10, and +5 and +10 (p≤0.04)). No significant correlation was found between age and word recognition scores at silence and at three SNRs in both ears (p≥0.386). Conclusion: Our results revealed that decreasing the signal level and increasing the competing noise considerably reduced the speech perception ability in normal hearing at low–mid thresholds in the elderly. These results support the critical role of SNRs for speech perception ability in the elderly. Furthermore, our results revealed that normal hearing elderly participants required compensatory strategies to maintain normal speech perception in challenging acoustic situations. PMID:27390712
Development of equally intelligible Telugu sentence-lists to test speech recognition in noise.

PubMed

Tanniru, Kishore; Narne, Vijaya Kumar; Jain, Chandni; Konadath, Sreeraj; Singh, Niraj Kumar; Sreenivas, K J Ramadevi; K, Anusha

2017-09-01

To develop sentence lists in the Telugu language for the assessment of speech recognition threshold (SRT) in the presence of background noise through identification of the mean signal-to-noise ratio required to attain a 50% sentence recognition score (SRTn). This study was conducted in three phases. The first phase involved the selection and recording of Telugu sentences. In the second phase, 20 lists, each consisting of 10 sentences with equal intelligibility, were formulated using a numerical optimisation procedure. In the third phase, the SRTn of the developed lists was estimated using adaptive procedures on individuals with normal hearing. A total of 68 native Telugu speakers with normal hearing participated in the study. Of these, 18 (including the speakers) performed on various subjective measures in first phase, 20 performed on sentence/word recognition in noise for second phase and 30 participated in the list equivalency procedures in third phase. In all, 15 lists of comparable difficulty were formulated as test material. The mean SRTn across these lists corresponded to -2.74 (SD = 0.21). The developed sentence lists provided a valid and reliable tool to measure SRTn in Telugu native speakers.
SAM: speech-aware applications in medicine to support structured data entry.

PubMed Central

Wormek, A. K.; Ingenerf, J.; Orthner, H. F.

1997-01-01

In the last two years, improvement in speech recognition technology has directed the medical community's interest to porting and using such innovations in clinical systems. The acceptance of speech recognition systems in clinical domains increases with recognition speed, large medical vocabulary, high accuracy, continuous speech recognition, and speaker independence. Although some commercial speech engines approach these requirements, the greatest benefit can be achieved in adapting a speech recognizer to a specific medical application. The goals of our work are first, to develop a speech-aware core component which is able to establish connections to speech recognition engines of different vendors. This is realized in SAM. Second, with applications based on SAM we want to support the physician in his/her routine clinical care activities. Within the STAMP project (STAndardized Multimedia report generator in Pathology), we extend SAM by combining a structured data entry approach with speech recognition technology. Another speech-aware application in the field of Diabetes care is connected to a terminology server. The server delivers a controlled vocabulary which can be used for speech recognition. PMID:9357730
Assessing the role of spectral and intensity cues in spectral ripple detection and discrimination in cochlear-implant users.

PubMed

Anderson, Elizabeth S; Oxenham, Andrew J; Nelson, Peggy B; Nelson, David A

2012-12-01

Measures of spectral ripple resolution have become widely used psychophysical tools for assessing spectral resolution in cochlear-implant (CI) listeners. The objective of this study was to compare spectral ripple discrimination and detection in the same group of CI listeners. Ripple detection thresholds were measured over a range of ripple frequencies and were compared to spectral ripple discrimination thresholds previously obtained from the same CI listeners. The data showed that performance on the two measures was correlated, but that individual subjects' thresholds (at a constant spectral modulation depth) for the two tasks were not equivalent. In addition, spectral ripple detection was often found to be possible at higher rates than expected based on the available spectral cues, making it likely that temporal-envelope cues played a role at higher ripple rates. Finally, spectral ripple detection thresholds were compared to previously obtained speech-perception measures. Results confirmed earlier reports of a robust relationship between detection of widely spaced ripples and measures of speech recognition. In contrast, intensity difference limens for broadband noise did not correlate with spectral ripple detection measures, suggesting a dissociation between the ability to detect small changes in intensity across frequency and across time.
Spectrotemporal modulation sensitivity for hearing-impaired listeners: dependence on carrier center frequency and the relationship to speech intelligibility.

PubMed

Mehraei, Golbarg; Gallun, Frederick J; Leek, Marjorie R; Bernstein, Joshua G W

2014-07-01

Poor speech understanding in noise by hearing-impaired (HI) listeners is only partly explained by elevated audiometric thresholds. Suprathreshold-processing impairments such as reduced temporal or spectral resolution or temporal fine-structure (TFS) processing ability might also contribute. Although speech contains dynamic combinations of temporal and spectral modulation and TFS content, these capabilities are often treated separately. Modulation-depth detection thresholds for spectrotemporal modulation (STM) applied to octave-band noise were measured for normal-hearing and HI listeners as a function of temporal modulation rate (4-32 Hz), spectral ripple density [0.5-4 cycles/octave (c/o)] and carrier center frequency (500-4000 Hz). STM sensitivity was worse than normal for HI listeners only for a low-frequency carrier (1000 Hz) at low temporal modulation rates (4-12 Hz) and a spectral ripple density of 2 c/o, and for a high-frequency carrier (4000 Hz) at a high spectral ripple density (4 c/o). STM sensitivity for the 4-Hz, 4-c/o condition for a 4000-Hz carrier and for the 4-Hz, 2-c/o condition for a 1000-Hz carrier were correlated with speech-recognition performance in noise after partialling out the audiogram-based speech-intelligibility index. Poor speech-reception and STM-detection performance for HI listeners may be related to a combination of reduced frequency selectivity and a TFS-processing deficit limiting the ability to track spectral-peak movements.
Within-subjects comparison of the HiRes and Fidelity120 speech processing strategies: Speech perception and its relation to place-pitch sensitivity

PubMed Central

Donaldson, Gail S.; Dawson, Patricia K.; Borden, Lamar Z.

2010-01-01

Objectives Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many CI users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral-cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. Design A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 weeks during the main study; a subset of five subjects used Fidelity120 for 3 additional months following the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency (Vow F1), vowel F2 frequency (Vow F2) and consonant place of articulation (Con PLC); overall transmitted information for vowels (Vow STIM) and consonants (Con STIM); and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle and basal regions of the implanted array using a psychophysical pitch-ranking task. Results With one exception, there was no effect of strategy (HiRes vs. Fidelity120) on the speech measures tested, either during the main study (n=10) or after extended use of Fidelity120 (n=5). The exception was a small but significant advantage for HiRes over Fidelity120 for the Con STIM measure during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 weeks or longer experience with Fidelity120. Another 3 subjects exhibited initial decrements in spectral cue perception with Fidelity120 at the 8 week time point; however, evidence from one subject suggested that such decrements may resolve with additional experience. Place-pitch thresholds were inversely related to improvements in Vow F2 perception with Fidelity120 relative to HiRes. However, no relationship was observed between place-pitch thresholds and the other spectral measures (Vow F1 or Con PLC). Conclusions Findings suggest that Fidelity120 supports small improvements in the perception of spectral speech cues in some Advanced Bionics CI users; however, many users show no clear benefit. Benefits are more likely to occur for vowel spectral cues (related to F1 and F2 frequency) than for consonant spectral cues (related to place of articulation). There was an inconsistent relationship between place-pitch sensitivity and improvements in spectral cue perception with Fidelity120 relative to HiRes. This may partly reflect the small number of sites at which place-pitch thresholds were measured. Contrary to some previous reports, there was no clear evidence that Fidelity120 supports improved sentence recognition in noise. PMID:21084987
Development and validation of the University of Washington Clinical Assessment of Music Perception test.

PubMed

Kang, Robert; Nimmons, Grace Liu; Drennan, Ward; Longnion, Jeff; Ruffin, Chad; Nie, Kaibao; Won, Jong Ho; Worman, Tina; Yueh, Bevan; Rubinstein, Jay

2009-08-01

Assessment of cochlear implant outcomes centers around speech discrimination. Despite dramatic improvements in speech perception, music perception remains a challenge for most cochlear implant users. No standardized test exists to quantify music perception in a clinically practical manner. This study presents the University of Washington Clinical Assessment of Music Perception (CAMP) test as a reliable and valid music perception test for English-speaking, adult cochlear implant users. Forty-two cochlear implant subjects were recruited from the University of Washington Medical Center cochlear implant program and referred by two implant manufacturers. Ten normal-hearing volunteers were drawn from the University of Washington Medical Center and associated campuses. A computer-driven, self-administered test was developed to examine three specific aspects of music perception: pitch direction discrimination, melody recognition, and timbre recognition. The pitch subtest used an adaptive procedure to determine just-noticeable differences for complex tone pitch direction discrimination within the range of 1 to 12 semitones. The melody and timbre subtests assessed recognition of 12 commonly known melodies played with complex tones in an isochronous manner and eight musical instruments playing an identical five-note sequence, respectively. Testing was repeated for cochlear implant subjects to evaluate test-retest reliability. Normal-hearing volunteers were also tested to demonstrate differences in performance in the two populations. For cochlear implant subjects, pitch direction discrimination just-noticeable differences ranged from 1 to 8.0 semitones (Mean = 3.0, SD = 2.3). Melody and timbre recognition ranged from 0 to 94.4% correct (mean = 25.1, SD = 22.2) and 20.8 to 87.5% (mean = 45.3, SD = 16.2), respectively. Each subtest significantly correlated at least moderately with both Consonant-Nucleus-Consonant (CNC) word recognition scores and spondee recognition thresholds in steady state noise and two-talker babble. Intraclass coefficients demonstrating test-retest correlations for pitch, melody, and timbre were 0.85, 0.92, and 0.69, respectively. Normal-hearing volunteers had a mean pitch direction discrimination threshold of 1.0 semitone, the smallest interval tested, and mean melody and timbre recognition scores of 87.5 and 94.2%, respectively. The CAMP test discriminates a wide range of music perceptual ability in cochlear implant users. Moderate correlations were seen between music test results and both Consonant-Nucleus-Consonant word recognition scores and spondee recognition thresholds in background noise. Test-retest reliability was moderate to strong. The CAMP test provides a reliable and valid metric for a clinically practical, standardized evaluation of music perception in adult cochlear implant users.
Perception and performance in flight simulators: The contribution of vestibular, visual, and auditory information

NASA Technical Reports Server (NTRS)

1979-01-01

The pilot's perception and performance in flight simulators is examined. The areas investigated include: vestibular stimulation, flight management and man cockpit information interfacing, and visual perception in flight simulation. The effects of higher levels of rotary acceleration on response time to constant acceleration, tracking performance, and thresholds for angular acceleration are examined. Areas of flight management examined are cockpit display of traffic information, work load, synthetic speech call outs during the landing phase of flight, perceptual factors in the use of a microwave landing system, automatic speech recognition, automation of aircraft operation, and total simulation of flight training.
Unvoiced Speech Recognition Using Tissue-Conductive Acoustic Sensor

NASA Astrophysics Data System (ADS)

Heracleous, Panikos; Kaino, Tomomi; Saruwatari, Hiroshi; Shikano, Kiyohiro

2006-12-01

We present the use of stethoscope and silicon NAM (nonaudible murmur) microphones in automatic speech recognition. NAM microphones are special acoustic sensors, which are attached behind the talker's ear and can capture not only normal (audible) speech, but also very quietly uttered speech (nonaudible murmur). As a result, NAM microphones can be applied in automatic speech recognition systems when privacy is desired in human-machine communication. Moreover, NAM microphones show robustness against noise and they might be used in special systems (speech recognition, speech transform, etc.) for sound-impaired people. Using adaptation techniques and a small amount of training data, we achieved for a 20 k dictation task a[InlineEquation not available: see fulltext.] word accuracy for nonaudible murmur recognition in a clean environment. In this paper, we also investigate nonaudible murmur recognition in noisy environments and the effect of the Lombard reflex on nonaudible murmur recognition. We also propose three methods to integrate audible speech and nonaudible murmur recognition using a stethoscope NAM microphone with very promising results.
Speech recognition technology: an outlook for human-to-machine interaction.

PubMed

Erdel, T; Crooks, S

2000-01-01

Speech recognition, as an enabling technology in healthcare-systems computing, is a topic that has been discussed for quite some time, but is just now coming to fruition. Traditionally, speech-recognition software has been constrained by hardware, but improved processors and increased memory capacities are starting to remove some of these limitations. With these barriers removed, companies that create software for the healthcare setting have the opportunity to write more successful applications. Among the criticisms of speech-recognition applications are the high rates of error and steep training curves. However, even in the face of such negative perceptions, there remains significant opportunities for speech recognition to allow healthcare providers and, more specifically, physicians, to work more efficiently and ultimately spend more time with their patients and less time completing necessary documentation. This article will identify opportunities for inclusion of speech-recognition technology in the healthcare setting and examine major categories of speech-recognition software--continuous speech recognition, command and control, and text-to-speech. We will discuss the advantages and disadvantages of each area, the limitations of the software today, and how future trends might affect them.
Relationship between listeners' nonnative speech recognition and categorization abilities

PubMed Central

Atagi, Eriko; Bent, Tessa

2015-01-01

Enhancement of the perceptual encoding of talker characteristics (indexical information) in speech can facilitate listeners' recognition of linguistic content. The present study explored this indexical-linguistic relationship in nonnative speech processing by examining listeners' performance on two tasks: nonnative accent categorization and nonnative speech-in-noise recognition. Results indicated substantial variability across listeners in their performance on both the accent categorization and nonnative speech recognition tasks. Moreover, listeners' accent categorization performance correlated with their nonnative speech-in-noise recognition performance. These results suggest that having more robust indexical representations for nonnative accents may allow listeners to more accurately recognize the linguistic content of nonnative speech. PMID:25618098
Speech perception at positive signal-to-noise ratios using adaptive adjustment of time compression.

PubMed

Schlueter, Anne; Brand, Thomas; Lemke, Ulrike; Nitzschner, Stefan; Kollmeier, Birger; Holube, Inga

2015-11-01

Positive signal-to-noise ratios (SNRs) characterize listening situations most relevant for hearing-impaired listeners in daily life and should therefore be considered when evaluating hearing aid algorithms. For this, a speech-in-noise test was developed and evaluated, in which the background noise is presented at fixed positive SNRs and the speech rate (i.e., the time compression of the speech material) is adaptively adjusted. In total, 29 younger and 12 older normal-hearing, as well as 24 older hearing-impaired listeners took part in repeated measurements. Younger normal-hearing and older hearing-impaired listeners conducted one of two adaptive methods which differed in adaptive procedure and step size. Analysis of the measurements with regard to list length and estimation strategy for thresholds resulted in a practical method measuring the time compression for 50% recognition. This method uses time-compression adjustment and step sizes according to Versfeld and Dreschler [(2002). J. Acoust. Soc. Am. 111, 401-408], with sentence scoring, lists of 30 sentences, and a maximum likelihood method for threshold estimation. Evaluation of the procedure showed that older participants obtained higher test-retest reliability compared to younger participants. Depending on the group of listeners, one or two lists are required for training prior to data collection.
Right-Ear Advantage for Speech-in-Noise Recognition in Patients with Nonlateralized Tinnitus and Normal Hearing Sensitivity.

PubMed

Tai, Yihsin; Husain, Fatima T

2018-04-01

Despite having normal hearing sensitivity, patients with chronic tinnitus may experience more difficulty recognizing speech in adverse listening conditions as compared to controls. However, the association between the characteristics of tinnitus (severity and loudness) and speech recognition remains unclear. In this study, the Quick Speech-in-Noise test (QuickSIN) was conducted monaurally on 14 patients with bilateral tinnitus and 14 age- and hearing-matched adults to determine the relation between tinnitus characteristics and speech understanding. Further, Tinnitus Handicap Inventory (THI), tinnitus loudness magnitude estimation, and loudness matching were obtained to better characterize the perceptual and psychological aspects of tinnitus. The patients reported low THI scores, with most participants in the slight handicap category. Significant between-group differences in speech-in-noise performance were only found at the 5-dB signal-to-noise ratio (SNR) condition. The tinnitus group performed significantly worse in the left ear than in the right ear, even though bilateral tinnitus percept and symmetrical thresholds were reported in all patients. This between-ear difference is likely influenced by a right-ear advantage for speech sounds, as factors related to testing order and fatigue were ruled out. Additionally, significant correlations found between SNR loss in the left ear and tinnitus loudness matching suggest that perceptual factors related to tinnitus had an effect on speech-in-noise performance, pointing to a possible interaction between peripheral and cognitive factors in chronic tinnitus. Further studies, that take into account both hearing and cognitive abilities of patients, are needed to better parse out the effect of tinnitus in the absence of hearing impairment.
Method and apparatus for obtaining complete speech signals for speech recognition applications

NASA Technical Reports Server (NTRS)

Abrash, Victor (Inventor); Cesari, Federico (Inventor); Franco, Horacio (Inventor); George, Christopher (Inventor); Zheng, Jing (Inventor)

2009-01-01

The present invention relates to a method and apparatus for obtaining complete speech signals for speech recognition applications. In one embodiment, the method continuously records an audio stream comprising a sequence of frames to a circular buffer. When a user command to commence or terminate speech recognition is received, the method obtains a number of frames of the audio stream occurring before or after the user command in order to identify an augmented audio signal for speech recognition processing. In further embodiments, the method analyzes the augmented audio signal in order to locate starting and ending speech endpoints that bound at least a portion of speech to be processed for recognition. At least one of the speech endpoints is located using a Hidden Markov Model.

The efficacy of unilateral bone-anchored hearing devices in Chinese Mandarin-speaking patients with bilateral aural atresia.

PubMed

Fan, Yue; Zhang, Ying; Wang, Pu; Wang, Zhen; Zhu, Xiaoli; Yang, Hua; Chen, Xiaowei

2014-04-01

The bone-anchored hearing device (BAHD) was not introduced in China until 2010. To our knowledge, this is the first study to assess the efficacy of Chinese Mandarin-speaking patients with bilateral aural atresia. To evaluate the speech recognition of Chinese Mandarin-speaking patients with BAHDs as well as patients' satisfaction using 2 questionnaires. A retrospective case review of 16 patients with bilateral aural atresia conducted at a tertiary referral center. A BAHD was implanted during auricle reconstruction surgery or after the auricle was rebuilt. A surgical method to combine the BAHD implantation with the second stage of ear reconstruction was introduced. Speech audiometry test and mean pure-tone threshold results were compared among patients with unaided hearing and those with BAHDs. Scores from the BAHD user questionnaire and Glasgow Children's Benefit Inventory (GCBI) were used to measure patients' satisfaction and subjective health benefit. The mean (SD) speech discrimination scores measured in a sound field with a presentation level of 45 dB HL (hearing level) were 6.7% (7.4%) unaided and 86.5% (4.4%) with a BAHD. Scores with a presentation level of 65 dB HL were 56.5% (7.4%) unaided and 90.1% (3.4%) with a BAHD. The speech reception threshold was 60.6 (7.5) dB HL unaided and 24.7 (5.0) dB HL with a BAHD. The mean (SD) pure-tone threshold of the patients was 61.6 (7.8) dB HL unaided and 23.8 (5.9) dB HL with a BAHD. The BAHD application questionnaire demonstrated excellent patient satisfaction. The mean (SD) benefit score of GCBI was 45.6 (14.4). For aural atresia, the BAHD has been one of the most reliable methods of auditory rehabilitation. It can improve the patient's word recognition performance and quality of life. The technique of BAHD implantation combined with auricular reconstruction in a 2-stages-in-1 surgery and the modified incision of patients with reconstructed auricle proved to be safe and effective.
Audibility-based predictions of speech recognition for children and adults with normal hearing.

PubMed

McCreery, Ryan W; Stelmachowicz, Patricia G

2011-12-01

This study investigated the relationship between audibility and predictions of speech recognition for children and adults with normal hearing. The Speech Intelligibility Index (SII) is used to quantify the audibility of speech signals and can be applied to transfer functions to predict speech recognition scores. Although the SII is used clinically with children, relatively few studies have evaluated SII predictions of children's speech recognition directly. Children have required more audibility than adults to reach maximum levels of speech understanding in previous studies. Furthermore, children may require greater bandwidth than adults for optimal speech understanding, which could influence frequency-importance functions used to calculate the SII. Speech recognition was measured for 116 children and 19 adults with normal hearing. Stimulus bandwidth and background noise level were varied systematically in order to evaluate speech recognition as predicted by the SII and derive frequency-importance functions for children and adults. Results suggested that children required greater audibility to reach the same level of speech understanding as adults. However, differences in performance between adults and children did not vary across frequency bands. © 2011 Acoustical Society of America
The Suitability of Cloud-Based Speech Recognition Engines for Language Learning

ERIC Educational Resources Information Center

Daniels, Paul; Iwago, Koji

2017-01-01

As online automatic speech recognition (ASR) engines become more accurate and more widely implemented with call software, it becomes important to evaluate the effectiveness and the accuracy of these recognition engines using authentic speech samples. This study investigates two of the most prominent cloud-based speech recognition engines--Apple's…
Individual differences in language and working memory affect children's speech recognition in noise.

PubMed

McCreery, Ryan W; Spratford, Meredith; Kirby, Benjamin; Brennan, Marc

2017-05-01

We examined how cognitive and linguistic skills affect speech recognition in noise for children with normal hearing. Children with better working memory and language abilities were expected to have better speech recognition in noise than peers with poorer skills in these domains. As part of a prospective, cross-sectional study, children with normal hearing completed speech recognition in noise for three types of stimuli: (1) monosyllabic words, (2) syntactically correct but semantically anomalous sentences and (3) semantically and syntactically anomalous word sequences. Measures of vocabulary, syntax and working memory were used to predict individual differences in speech recognition in noise. Ninety-six children with normal hearing, who were between 5 and 12 years of age. Higher working memory was associated with better speech recognition in noise for all three stimulus types. Higher vocabulary abilities were associated with better recognition in noise for sentences and word sequences, but not for words. Working memory and language both influence children's speech recognition in noise, but the relationships vary across types of stimuli. These findings suggest that clinical assessment of speech recognition is likely to reflect underlying cognitive and linguistic abilities, in addition to a child's auditory skills, consistent with the Ease of Language Understanding model.
Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar

PubMed Central

Shin, Young Hoon; Seo, Jiwon

2016-01-01

People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker’s vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing. PMID:27801867
Towards Contactless Silent Speech Recognition Based on Detection of Active and Visible Articulators Using IR-UWB Radar.

PubMed

Shin, Young Hoon; Seo, Jiwon

2016-10-29

People with hearing or speaking disabilities are deprived of the benefits of conventional speech recognition technology because it is based on acoustic signals. Recent research has focused on silent speech recognition systems that are based on the motions of a speaker's vocal tract and articulators. Because most silent speech recognition systems use contact sensors that are very inconvenient to users or optical systems that are susceptible to environmental interference, a contactless and robust solution is hence required. Toward this objective, this paper presents a series of signal processing algorithms for a contactless silent speech recognition system using an impulse radio ultra-wide band (IR-UWB) radar. The IR-UWB radar is used to remotely and wirelessly detect motions of the lips and jaw. In order to extract the necessary features of lip and jaw motions from the received radar signals, we propose a feature extraction algorithm. The proposed algorithm noticeably improved speech recognition performance compared to the existing algorithm during our word recognition test with five speakers. We also propose a speech activity detection algorithm to automatically select speech segments from continuous input signals. Thus, speech recognition processing is performed only when speech segments are detected. Our testbed consists of commercial off-the-shelf radar products, and the proposed algorithms are readily applicable without designing specialized radar hardware for silent speech processing.
Effects and modeling of phonetic and acoustic confusions in accented speech.

PubMed

Fung, Pascale; Liu, Yi

2005-11-01

Accented speech recognition is more challenging than standard speech recognition due to the effects of phonetic and acoustic confusions. Phonetic confusion in accented speech occurs when an expected phone is pronounced as a different one, which leads to erroneous recognition. Acoustic confusion occurs when the pronounced phone is found to lie acoustically between two baseform models and can be equally recognized as either one. We propose that it is necessary to analyze and model these confusions separately in order to improve accented speech recognition without degrading standard speech recognition. Since low phonetic confusion units in accented speech do not give rise to automatic speech recognition errors, we focus on analyzing and reducing phonetic and acoustic confusability under high phonetic confusion conditions. We propose using likelihood ratio test to measure phonetic confusion, and asymmetric acoustic distance to measure acoustic confusion. Only accent-specific phonetic units with low acoustic confusion are used in an augmented pronunciation dictionary, while phonetic units with high acoustic confusion are reconstructed using decision tree merging. Experimental results show that our approach is effective and superior to methods modeling phonetic confusion or acoustic confusion alone in accented speech, with a significant 5.7% absolute WER reduction, without degrading standard speech recognition.
Results with cochlear implantation in adults with speech recognition scores exceeding current criteria.

PubMed

Amoodi, Hosam A; Mick, Paul T; Shipp, David B; Friesen, Lendra M; Nedzelski, Julian M; Chen, Joseph M; Lin, Vincent Y W

2012-01-01

The primary purpose of this study was to evaluate a group of postlingually deafened adults, whose aided speech recognition exceeded commonly accepted candidacy criteria for implantation. The study aimed to define performance and qualitative outcomes of cochlear implants in these individuals compared with their optimally fitted hearing aid(s). Retrospective case series. Tertiary referral center. All postlingually deafened subjects (N = 27), who were unsuccessful hearing aid users implanted between 2000 and 2010 with a preimplantation Hearing in Noise Test (HINT) score of 60% or more were included. We compared patients' preoperative performance (HINT score) with hearing aids to postoperative performance with the cochlear implant after 12 months of device use. In addition, the Hearing Handicap Inventory questionnaire was used to quantify the hearing-related handicap change perceived after the implantation. The study group demonstrated significant postoperative improvement on all outcome measures; most notably, the mean HINT score improved from 68.4% (standard deviation, 8.3) to 91.9% (standard deviation, 9.7). Additionally, there was a significant improvement in hearing-related handicap perceived by all patients. The envelope of implantation candidacy criteria continues to expand as shown by this study's cohort. Patient satisfaction and speech recognition results are very encouraging in support of treating those who currently perform at a level above the conventional candidacy threshold but struggle with optimally fitted hearing aids.
Auditory and Non-Auditory Contributions for Unaided Speech Recognition in Noise as a Function of Hearing Aid Use

PubMed Central

Gieseler, Anja; Tahden, Maike A. S.; Thiel, Christiane M.; Wagener, Kirsten C.; Meis, Markus; Colonius, Hans

2017-01-01

Differences in understanding speech in noise among hearing-impaired individuals cannot be explained entirely by hearing thresholds alone, suggesting the contribution of other factors beyond standard auditory ones as derived from the audiogram. This paper reports two analyses addressing individual differences in the explanation of unaided speech-in-noise performance among n = 438 elderly hearing-impaired listeners (mean = 71.1 ± 5.8 years). The main analysis was designed to identify clinically relevant auditory and non-auditory measures for speech-in-noise prediction using auditory (audiogram, categorical loudness scaling) and cognitive tests (verbal-intelligence test, screening test of dementia), as well as questionnaires assessing various self-reported measures (health status, socio-economic status, and subjective hearing problems). Using stepwise linear regression analysis, 62% of the variance in unaided speech-in-noise performance was explained, with measures Pure-tone average (PTA), Age, and Verbal intelligence emerging as the three most important predictors. In the complementary analysis, those individuals with the same hearing loss profile were separated into hearing aid users (HAU) and non-users (NU), and were then compared regarding potential differences in the test measures and in explaining unaided speech-in-noise recognition. The groupwise comparisons revealed significant differences in auditory measures and self-reported subjective hearing problems, while no differences in the cognitive domain were found. Furthermore, groupwise regression analyses revealed that Verbal intelligence had a predictive value in both groups, whereas Age and PTA only emerged significant in the group of hearing aid NU. PMID:28270784
Auditory and Non-Auditory Contributions for Unaided Speech Recognition in Noise as a Function of Hearing Aid Use.

PubMed

Gieseler, Anja; Tahden, Maike A S; Thiel, Christiane M; Wagener, Kirsten C; Meis, Markus; Colonius, Hans

2017-01-01

Differences in understanding speech in noise among hearing-impaired individuals cannot be explained entirely by hearing thresholds alone, suggesting the contribution of other factors beyond standard auditory ones as derived from the audiogram. This paper reports two analyses addressing individual differences in the explanation of unaided speech-in-noise performance among n = 438 elderly hearing-impaired listeners ( mean = 71.1 ± 5.8 years). The main analysis was designed to identify clinically relevant auditory and non-auditory measures for speech-in-noise prediction using auditory (audiogram, categorical loudness scaling) and cognitive tests (verbal-intelligence test, screening test of dementia), as well as questionnaires assessing various self-reported measures (health status, socio-economic status, and subjective hearing problems). Using stepwise linear regression analysis, 62% of the variance in unaided speech-in-noise performance was explained, with measures Pure-tone average (PTA), Age , and Verbal intelligence emerging as the three most important predictors. In the complementary analysis, those individuals with the same hearing loss profile were separated into hearing aid users (HAU) and non-users (NU), and were then compared regarding potential differences in the test measures and in explaining unaided speech-in-noise recognition. The groupwise comparisons revealed significant differences in auditory measures and self-reported subjective hearing problems, while no differences in the cognitive domain were found. Furthermore, groupwise regression analyses revealed that Verbal intelligence had a predictive value in both groups, whereas Age and PTA only emerged significant in the group of hearing aid NU.
Relationship between consonant recognition in noise and hearing threshold.

PubMed

Yoon, Yang-soo; Allen, Jont B; Gooler, David M

2012-04-01

Although poorer understanding of speech in noise by listeners who are hearing-impaired (HI) is known not to be directly related to audiometric hearing threshold, HT (f), grouping HI listeners with HT (f) is widely practiced. In this article, the relationship between consonant recognition and HT (f) is considered over a range of signal-to-noise ratios (SNRs). Confusion matrices (CMs) from 25 HI ears were generated in response to 16 consonant-vowel syllables presented at 6 different SNRs. Individual differences scaling (INDSCAL) was applied to both feature-based matrices and CMs in order to evaluate the relationship between HT (f) and consonant recognition among HI listeners. The results showed no predictive relationship between the percent error scores (Pe) and HT (f) across SNRs. The multiple regression models showed that the HT (f) accounted for 39% of the total variance of the slopes of the Pe. Feature-based INDSCAL analysis showed consistent grouping of listeners across SNRs, but not in terms of HT (f). Systematic relationship between measures was also not defined by CM-based INDSCAL analysis across SNRs. HT (f) did not account for the majority of the variance (39%) in consonant recognition in noise when the complete body of the CM was considered.
Processing load induced by informational masking is related to linguistic abilities.

PubMed

Koelewijn, Thomas; Zekveld, Adriana A; Festen, Joost M; Rönnberg, Jerker; Kramer, Sophia E

2012-01-01

It is often assumed that the benefit of hearing aids is not primarily reflected in better speech performance, but that it is reflected in less effortful listening in the aided than in the unaided condition. Before being able to assess such a hearing aid benefit the present study examined how processing load while listening to masked speech relates to inter-individual differences in cognitive abilities relevant for language processing. Pupil dilation was measured in thirty-two normal hearing participants while listening to sentences masked by fluctuating noise or interfering speech at either 50% and 84% intelligibility. Additionally, working memory capacity, inhibition of irrelevant information, and written text reception was tested. Pupil responses were larger during interfering speech as compared to fluctuating noise. This effect was independent of intelligibility level. Regression analysis revealed that high working memory capacity, better inhibition, and better text reception were related to better speech reception thresholds. Apart from a positive relation to speech recognition, better inhibition and better text reception are also positively related to larger pupil dilation in the single-talker masker conditions. We conclude that better cognitive abilities not only relate to better speech perception, but also partly explain higher processing load in complex listening conditions.
Spectrotemporal modulation sensitivity for hearing-impaired listeners: Dependence on carrier center frequency and the relationship to speech intelligibility

PubMed Central

Mehraei, Golbarg; Gallun, Frederick J.; Leek, Marjorie R.; Bernstein, Joshua G. W.

2014-01-01

Poor speech understanding in noise by hearing-impaired (HI) listeners is only partly explained by elevated audiometric thresholds. Suprathreshold-processing impairments such as reduced temporal or spectral resolution or temporal fine-structure (TFS) processing ability might also contribute. Although speech contains dynamic combinations of temporal and spectral modulation and TFS content, these capabilities are often treated separately. Modulation-depth detection thresholds for spectrotemporal modulation (STM) applied to octave-band noise were measured for normal-hearing and HI listeners as a function of temporal modulation rate (4–32 Hz), spectral ripple density [0.5–4 cycles/octave (c/o)] and carrier center frequency (500–4000 Hz). STM sensitivity was worse than normal for HI listeners only for a low-frequency carrier (1000 Hz) at low temporal modulation rates (4–12 Hz) and a spectral ripple density of 2 c/o, and for a high-frequency carrier (4000 Hz) at a high spectral ripple density (4 c/o). STM sensitivity for the 4-Hz, 4-c/o condition for a 4000-Hz carrier and for the 4-Hz, 2-c/o condition for a 1000-Hz carrier were correlated with speech-recognition performance in noise after partialling out the audiogram-based speech-intelligibility index. Poor speech-reception and STM-detection performance for HI listeners may be related to a combination of reduced frequency selectivity and a TFS-processing deficit limiting the ability to track spectral-peak movements. PMID:24993215
Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review

ERIC Educational Resources Information Center

Young, Victoria; Mihailidis, Alex

2010-01-01

Despite their growing presence in home computer applications and various telephony services, commercial automatic speech recognition technologies are still not easily employed by everyone; especially individuals with speech disorders. In addition, relatively little research has been conducted on automatic speech recognition performance with older…
Noise-induced hearing loss in young adults: the role of personal listening devices and other sources of leisure noise.

PubMed

Mostafapour, S P; Lahargoue, K; Gates, G A

1998-12-01

No consensus exists regarding the magnitude of the risk of noise-induced hearing loss (NIHL) associated with leisure noise, in particular, personal listening devices in young adults. Examine the magnitude of hearing loss associated with personal listening devices and other sources of leisure noise in causing NIHL in young adults. Prospective auditory testing of college student volunteers with retrospective history exposure to home stereos, personal listening devices, firearms, and other sources of recreational noise. Subjects underwent audiologic examination consisting of estimation of pure-tone thresholds, speech reception thresholds, and word recognition at 45 dB HL. Fifty subjects aged 18 to 30 years were tested. All hearing thresholds of all subjects (save one-a unilateral 30 dB HL threshold at 6 kHz) were normal, (i.e., 25 dB HL or better). A 10 dB threshold elevation (notch) in either ear at 3 to 6 kHz as compared with neighboring frequencies was noted in 11 (22%) subjects and an unequivocal notch (15 dB or greater) in either ear was noted in 14 (28%) of subjects. The presence or absence of any notch (small or large) did not correlate with any single or cumulative source of noise exposure. No difference in pure-tone threshold, speech reception threshold, or speech discrimination was found among subjects when segregated by noise exposure level. The majority of young users of personal listening devices are at low risk for substantive NIHL. Interpretation of the significance of these findings in relation to noise exposure must be made with caution. NIHL is an additive process and even subtle deficits may contribute to unequivocal hearing loss with continued exposure. The low prevalence of measurable deficits in this study group may not exclude more substantive deficits in other populations with greater exposures. Continued education of young people about the risk to hearing from recreational noise exposure is warranted.
[Creating language model of the forensic medicine domain for developing a autopsy recording system by automatic speech recognition].

PubMed

Niijima, H; Ito, N; Ogino, S; Takatori, T; Iwase, H; Kobayashi, M

2000-11-01

For the purpose of practical use of speech recognition technology for recording of forensic autopsy, a language model of the speech recording system, specialized for the forensic autopsy, was developed. The language model for the forensic autopsy by applying 3-gram model was created, and an acoustic model for Japanese speech recognition by Hidden Markov Model in addition to the above were utilized to customize the speech recognition engine for forensic autopsy. A forensic vocabulary set of over 10,000 words was compiled and some 300,000 sentence patterns were made to create the forensic language model, then properly mixing with a general language model to attain high exactitude. When tried by dictating autopsy findings, this speech recognition system was proved to be about 95% of recognition rate that seems to have reached to the practical usability in view of speech recognition software, though there remains rooms for improving its hardware and application-layer software.
Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition.

PubMed

Díaz, Begoña; Blank, Helen; von Kriegstein, Katharina

2018-05-14

The cerebral cortex modulates early sensory processing via feed-back connections to sensory pathway nuclei. The functions of this top-down modulation for human behavior are poorly understood. Here, we show that top-down modulation of the visual sensory thalamus (the lateral geniculate body, LGN) is involved in visual-speech recognition. In two independent functional magnetic resonance imaging (fMRI) studies, LGN response increased when participants processed fast-varying features of articulatory movements required for visual-speech recognition, as compared to temporally more stable features required for face identification with the same stimulus material. The LGN response during the visual-speech task correlated positively with the visual-speech recognition scores across participants. In addition, the task-dependent modulation was present for speech movements and did not occur for control conditions involving non-speech biological movements. In face-to-face communication, visual speech recognition is used to enhance or even enable understanding what is said. Speech recognition is commonly explained in frameworks focusing on cerebral cortex areas. Our findings suggest that task-dependent modulation at subcortical sensory stages has an important role for communication: Together with similar findings in the auditory modality the findings imply that task-dependent modulation of the sensory thalami is a general mechanism to optimize speech recognition. Copyright © 2018. Published by Elsevier Inc.
Bridging automatic speech recognition and psycholinguistics: Extending Shortlist to an end-to-end model of human speech recognition (L)

NASA Astrophysics Data System (ADS)

Scharenborg, Odette; ten Bosch, Louis; Boves, Lou; Norris, Dennis

2003-12-01

This letter evaluates potential benefits of combining human speech recognition (HSR) and automatic speech recognition by building a joint model of an automatic phone recognizer (APR) and a computational model of HSR, viz., Shortlist [Norris, Cognition 52, 189-234 (1994)]. Experiments based on ``real-life'' speech highlight critical limitations posed by some of the simplifying assumptions made in models of human speech recognition. These limitations could be overcome by avoiding hard phone decisions at the output side of the APR, and by using a match between the input and the internal lexicon that flexibly copes with deviations from canonical phonemic representations.
Assessing the role of spectral and intensity cues in spectral ripple detection and discrimination in cochlear-implant users

PubMed Central

Anderson, Elizabeth S.; Oxenham, Andrew J.; Nelson, Peggy B.; Nelson, David A.

2012-01-01

Measures of spectral ripple resolution have become widely used psychophysical tools for assessing spectral resolution in cochlear-implant (CI) listeners. The objective of this study was to compare spectral ripple discrimination and detection in the same group of CI listeners. Ripple detection thresholds were measured over a range of ripple frequencies and were compared to spectral ripple discrimination thresholds previously obtained from the same CI listeners. The data showed that performance on the two measures was correlated, but that individual subjects’ thresholds (at a constant spectral modulation depth) for the two tasks were not equivalent. In addition, spectral ripple detection was often found to be possible at higher rates than expected based on the available spectral cues, making it likely that temporal-envelope cues played a role at higher ripple rates. Finally, spectral ripple detection thresholds were compared to previously obtained speech-perception measures. Results confirmed earlier reports of a robust relationship between detection of widely spaced ripples and measures of speech recognition. In contrast, intensity difference limens for broadband noise did not correlate with spectral ripple detection measures, suggesting a dissociation between the ability to detect small changes in intensity across frequency and across time. PMID:23231122
Increase in Speech Recognition due to Linguistic Mismatch Between Target and Masker Speech: Monolingual and Simultaneous Bilingual Performance

PubMed Central

Calandruccio, Lauren; Zhou, Haibo

2014-01-01

Purpose To examine whether improved speech recognition during linguistically mismatched target–masker experiments is due to linguistic unfamiliarity of the masker speech or linguistic dissimilarity between the target and masker speech. Method Monolingual English speakers (n = 20) and English–Greek simultaneous bilinguals (n = 20) listened to English sentences in the presence of competing English and Greek speech. Data were analyzed using mixed-effects regression models to determine differences in English recogition performance between the 2 groups and 2 masker conditions. Results Results indicated that English sentence recognition for monolinguals and simultaneous English–Greek bilinguals improved when the masker speech changed from competing English to competing Greek speech. Conclusion The improvement in speech recognition that has been observed for linguistically mismatched target–masker experiments cannot be simply explained by the masker language being linguistically unknown or unfamiliar to the listeners. Listeners can improve their speech recognition in linguistically mismatched target–masker experiments even when the listener is able to obtain meaningful linguistic information from the masker speech. PMID:24167230

The Effect of Lexical Content on Dichotic Speech Recognition in Older Adults.

PubMed

Findlen, Ursula M; Roup, Christina M

2016-01-01

Age-related auditory processing deficits have been shown to negatively affect speech recognition for older adult listeners. In contrast, older adults gain benefit from their ability to make use of semantic and lexical content of the speech signal (i.e., top-down processing), particularly in complex listening situations. Assessment of auditory processing abilities among aging adults should take into consideration semantic and lexical content of the speech signal. The purpose of this study was to examine the effects of lexical and attentional factors on dichotic speech recognition performance characteristics for older adult listeners. A repeated measures design was used to examine differences in dichotic word recognition as a function of lexical and attentional factors. Thirty-five older adults (61-85 yr) with sensorineural hearing loss participated in this study. Dichotic speech recognition was evaluated using consonant-vowel-consonant (CVC) word and nonsense CVC syllable stimuli administered in the free recall, directed recall right, and directed recall left response conditions. Dichotic speech recognition performance for nonsense CVC syllables was significantly poorer than performance for CVC words. Dichotic recognition performance varied across response condition for both stimulus types, which is consistent with previous studies on dichotic speech recognition. Inspection of individual results revealed that five listeners demonstrated an auditory-based left ear deficit for one or both stimulus types. Lexical content of stimulus materials affects performance characteristics for dichotic speech recognition tasks in the older adult population. The use of nonsense CVC syllable material may provide a way to assess dichotic speech recognition performance while potentially lessening the effects of lexical content on performance (i.e., measuring bottom-up auditory function both with and without top-down processing). American Academy of Audiology.
Microscopic prediction of speech intelligibility in spatially distributed speech-shaped noise for normal-hearing listeners.

PubMed

Geravanchizadeh, Masoud; Fallah, Ali

2015-12-01

A binaural and psychoacoustically motivated intelligibility model, based on a well-known monaural microscopic model is proposed. This model simulates a phoneme recognition task in the presence of spatially distributed speech-shaped noise in anechoic scenarios. In the proposed model, binaural advantage effects are considered by generating a feature vector for a dynamic-time-warping speech recognizer. This vector consists of three subvectors incorporating two monaural subvectors to model the better-ear hearing, and a binaural subvector to simulate the binaural unmasking effect. The binaural unit of the model is based on equalization-cancellation theory. This model operates blindly, which means separate recordings of speech and noise are not required for the predictions. Speech intelligibility tests were conducted with 12 normal hearing listeners by collecting speech reception thresholds (SRTs) in the presence of single and multiple sources of speech-shaped noise. The comparison of the model predictions with the measured binaural SRTs, and with the predictions of a macroscopic binaural model called extended equalization-cancellation, shows that this approach predicts the intelligibility in anechoic scenarios with good precision. The square of the correlation coefficient (r(2)) and the mean-absolute error between the model predictions and the measurements are 0.98 and 0.62 dB, respectively.
[Test set for the evaluation of hearing and speech development after cochlear implantation in children].

PubMed

Lamprecht-Dinnesen, A; Sick, U; Sandrieser, P; Illg, A; Lesinski-Schiedat, A; Döring, W H; Müller-Deile, J; Kiefer, J; Matthias, K; Wüst, A; Konradi, E; Riebandt, M; Matulat, P; Von Der Haar-Heise, S; Swart, J; Elixmann, K; Neumann, K; Hildmann, A; Coninx, F; Meyer, V; Gross, M; Kruse, E; Lenarz, T

2002-10-01

Since autumn 1998 the multicenter interdisciplinary study group "Test Materials for CI Children" has been compiling a uniform examination tool for evaluation of speech and hearing development after cochlear implantation in childhood. After studying the relevant literature, suitable materials were checked for practical applicability, modified and provided with criteria for execution and break-off. For data acquisition, observation forms for preparation of a PC-version were developed. The evaluation set contains forms for master data with supplements relating to postoperative processes. The hearing tests check supra-threshold hearing with loudness scaling for children, speech comprehension in silence (Mainz and Göttingen Test for Speech Comprehension in Childhood) and phonemic differentiation (Oldenburg Rhyme Test for Children), the central auditory processes of detection, discrimination, identification and recognition (modification of the "Frankfurt Functional Hearing Test for Children") and audiovisual speech perception (Open Paragraph Tracking, Kiel Speech Track Program). The materials for speech and language development comprise phonetics-phonology, lexicon and semantics (LOGO Pronunciation Test), syntax and morphology (analysis of spontaneous speech), language comprehension (Reynell Scales), communication and pragmatics (observation forms). The MAIS and MUSS modified questionnaires are integrated. The evaluation set serves quality assurance and permits factor analysis as well as controls for regularity through the multicenter comparison of long-term developmental trends after cochlear implantation.
Improving on hidden Markov models: An articulatorily constrained, maximum likelihood approach to speech recognition and speech coding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, J.

The goal of the proposed research is to test a statistical model of speech recognition that incorporates the knowledge that speech is produced by relatively slow motions of the tongue, lips, and other speech articulators. This model is called Maximum Likelihood Continuity Mapping (Malcom). Many speech researchers believe that by using constraints imposed by articulator motions, we can improve or replace the current hidden Markov model based speech recognition algorithms. Unfortunately, previous efforts to incorporate information about articulation into speech recognition algorithms have suffered because (1) slight inaccuracies in our knowledge or the formulation of our knowledge about articulation maymore » decrease recognition performance, (2) small changes in the assumptions underlying models of speech production can lead to large changes in the speech derived from the models, and (3) collecting measurements of human articulator positions in sufficient quantity for training a speech recognition algorithm is still impractical. The most interesting (and in fact, unique) quality of Malcom is that, even though Malcom makes use of a mapping between acoustics and articulation, Malcom can be trained to recognize speech using only acoustic data. By learning the mapping between acoustics and articulation using only acoustic data, Malcom avoids the difficulties involved in collecting articulator position measurements and does not require an articulatory synthesizer model to estimate the mapping between vocal tract shapes and speech acoustics. Preliminary experiments that demonstrate that Malcom can learn the mapping between acoustics and articulation are discussed. Potential applications of Malcom aside from speech recognition are also discussed. Finally, specific deliverables resulting from the proposed research are described.« less
An articulatorily constrained, maximum entropy approach to speech recognition and speech coding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, J.

Hidden Markov models (HMM`s) are among the most popular tools for performing computer speech recognition. One of the primary reasons that HMM`s typically outperform other speech recognition techniques is that the parameters used for recognition are determined by the data, not by preconceived notions of what the parameters should be. This makes HMM`s better able to deal with intra- and inter-speaker variability despite the limited knowledge of how speech signals vary and despite the often limited ability to correctly formulate rules describing variability and invariance in speech. In fact, it is often the case that when HMM parameter values aremore » constrained using the limited knowledge of speech, recognition performance decreases. However, the structure of an HMM has little in common with the mechanisms underlying speech production. Here, the author argues that by using probabilistic models that more accurately embody the process of speech production, he can create models that have all the advantages of HMM`s, but that should more accurately capture the statistical properties of real speech samples--presumably leading to more accurate speech recognition. The model he will discuss uses the fact that speech articulators move smoothly and continuously. Before discussing how to use articulatory constraints, he will give a brief description of HMM`s. This will allow him to highlight the similarities and differences between HMM`s and the proposed technique.« less
Is Listening in Noise Worth It? The Neurobiology of Speech Recognition in Challenging Listening Conditions.

PubMed

Eckert, Mark A; Teubner-Rhodes, Susan; Vaden, Kenneth I

2016-01-01

This review examines findings from functional neuroimaging studies of speech recognition in noise to provide a neural systems level explanation for the effort and fatigue that can be experienced during speech recognition in challenging listening conditions. Neuroimaging studies of speech recognition consistently demonstrate that challenging listening conditions engage neural systems that are used to monitor and optimize performance across a wide range of tasks. These systems appear to improve speech recognition in younger and older adults, but sustained engagement of these systems also appears to produce an experience of effort and fatigue that may affect the value of communication. When considered in the broader context of the neuroimaging and decision making literature, the speech recognition findings from functional imaging studies indicate that the expected value, or expected level of speech recognition given the difficulty of listening conditions, should be considered when measuring effort and fatigue. The authors propose that the behavioral economics or neuroeconomics of listening can provide a conceptual and experimental framework for understanding effort and fatigue that may have clinical significance.
Is Listening in Noise Worth It? The Neurobiology of Speech Recognition in Challenging Listening Conditions

PubMed Central

Eckert, Mark A.; Teubner-Rhodes, Susan; Vaden, Kenneth I.

2016-01-01

This review examines findings from functional neuroimaging studies of speech recognition in noise to provide a neural systems level explanation for the effort and fatigue that can be experienced during speech recognition in challenging listening conditions. Neuroimaging studies of speech recognition consistently demonstrate that challenging listening conditions engage neural systems that are used to monitor and optimize performance across a wide range of tasks. These systems appear to improve speech recognition in younger and older adults, but sustained engagement of these systems also appears to produce an experience of effort and fatigue that may affect the value of communication. When considered in the broader context of the neuroimaging and decision making literature, the speech recognition findings from functional imaging studies indicate that the expected value, or expected level of speech recognition given the difficulty of listening conditions, should be considered when measuring effort and fatigue. We propose that the behavioral economics and/or neuroeconomics of listening can provide a conceptual and experimental framework for understanding effort and fatigue that may have clinical significance. PMID:27355759
Speech Recognition and Parent Ratings From Auditory Development Questionnaires in Children Who Are Hard of Hearing.

PubMed

McCreery, Ryan W; Walker, Elizabeth A; Spratford, Meredith; Oleson, Jacob; Bentler, Ruth; Holte, Lenore; Roush, Patricia

2015-01-01

Progress has been made in recent years in the provision of amplification and early intervention for children who are hard of hearing. However, children who use hearing aids (HAs) may have inconsistent access to their auditory environment due to limitations in speech audibility through their HAs or limited HA use. The effects of variability in children's auditory experience on parent-reported auditory skills questionnaires and on speech recognition in quiet and in noise were examined for a large group of children who were followed as part of the Outcomes of Children with Hearing Loss study. Parent ratings on auditory development questionnaires and children's speech recognition were assessed for 306 children who are hard of hearing. Children ranged in age from 12 months to 9 years. Three questionnaires involving parent ratings of auditory skill development and behavior were used, including the LittlEARS Auditory Questionnaire, Parents Evaluation of Oral/Aural Performance in Children rating scale, and an adaptation of the Speech, Spatial, and Qualities of Hearing scale. Speech recognition in quiet was assessed using the Open- and Closed-Set Test, Early Speech Perception test, Lexical Neighborhood Test, and Phonetically Balanced Kindergarten word lists. Speech recognition in noise was assessed using the Computer-Assisted Speech Perception Assessment. Children who are hard of hearing were compared with peers with normal hearing matched for age, maternal educational level, and nonverbal intelligence. The effects of aided audibility, HA use, and language ability on parent responses to auditory development questionnaires and on children's speech recognition were also examined. Children who are hard of hearing had poorer performance than peers with normal hearing on parent ratings of auditory skills and had poorer speech recognition. Significant individual variability among children who are hard of hearing was observed. Children with greater aided audibility through their HAs, more hours of HA use, and better language abilities generally had higher parent ratings of auditory skills and better speech-recognition abilities in quiet and in noise than peers with less audibility, more limited HA use, or poorer language abilities. In addition to the auditory and language factors that were predictive for speech recognition in quiet, phonological working memory was also a positive predictor for word recognition abilities in noise. Children who are hard of hearing continue to experience delays in auditory skill development and speech-recognition abilities compared with peers with normal hearing. However, significant improvements in these domains have occurred in comparison to similar data reported before the adoption of universal newborn hearing screening and early intervention programs for children who are hard of hearing. Increasing the audibility of speech has a direct positive effect on auditory skill development and speech-recognition abilities and also may enhance these skills by improving language abilities in children who are hard of hearing. Greater number of hours of HA use also had a significant positive impact on parent ratings of auditory skills and children's speech recognition.
Using Voice Recognition Equipment to Run the Warfare Environmental Simulator (WES),

DTIC Science & Technology

1981-03-01

simulations and models are often used. War games are a type of simulation frequently used by the military to evaluate C3 effectiveness. Through the use of a...to 162 words or short phrases (Appendix B). B. EQUIPMENT USED 1. Hardware Description [13] For the experiment a Threshold Model T600 discrete... Model T600 terminal used in this experiment con- sists of an analog speech preprocessor, microcomputer, CRT/keyboard unit, magnetic tape cartridge unit
Automatic concept extraction from spoken medical reports.

PubMed

Happe, André; Pouliquen, Bruno; Burgun, Anita; Cuggia, Marc; Le Beux, Pierre

2003-07-01

The objective of this project is to investigate methods whereby a combination of speech recognition and automated indexing methods substitute for current transcription and indexing practices. We based our study on existing speech recognition software programs and on NOMINDEX, a tool that extracts MeSH concepts from medical text in natural language and that is mainly based on a French medical lexicon and on the UMLS. For each document, the process consists of three steps: (1) dictation and digital audio recording, (2) speech recognition, (3) automatic indexing. The evaluation consisted of a comparison between the set of concepts extracted by NOMINDEX after the speech recognition phase and the set of keywords manually extracted from the initial document. The method was evaluated on a set of 28 patient discharge summaries extracted from the MENELAS corpus in French, corresponding to in-patients admitted for coronarography. The overall precision was 73% and the overall recall was 90%. Indexing errors were mainly due to word sense ambiguity and abbreviations. A specific issue was the fact that the standard French translation of MeSH terms lacks diacritics. A preliminary evaluation of speech recognition tools showed that the rate of accurate recognition was higher than 98%. Only 3% of the indexing errors were generated by inadequate speech recognition. We discuss several areas to focus on to improve this prototype. However, the very low rate of indexing errors due to speech recognition errors highlights the potential benefits of combining speech recognition techniques and automatic indexing.
Speech Recognition as a Transcription Aid: A Randomized Comparison With Standard Transcription

PubMed Central

Mohr, David N.; Turner, David W.; Pond, Gregory R.; Kamath, Joseph S.; De Vos, Cathy B.; Carpenter, Paul C.

2003-01-01

Objective. Speech recognition promises to reduce information entry costs for clinical information systems. It is most likely to be accepted across an organization if physicians can dictate without concerning themselves with real-time recognition and editing; assistants can then edit and process the computer-generated document. Our objective was to evaluate the use of speech-recognition technology in a randomized controlled trial using our institutional infrastructure. Design. Clinical note dictations from physicians in two specialty divisions were randomized to either a standard transcription process or a speech-recognition process. Secretaries and transcriptionists also were assigned randomly to each of these processes. Measurements. The duration of each dictation was measured. The amount of time spent processing a dictation to yield a finished document also was measured. Secretarial and transcriptionist productivity, defined as hours of secretary work per minute of dictation processed, was determined for speech recognition and standard transcription. Results. Secretaries in the endocrinology division were 87.3% (confidence interval, 83.3%, 92.3%) as productive with the speech-recognition technology as implemented in this study as they were using standard transcription. Psychiatry transcriptionists and secretaries were similarly less productive. Author, secretary, and type of clinical note were significant (p < 0.05) predictors of productivity. Conclusion. When implemented in an organization with an existing document-processing infrastructure (which included training and interfaces of the speech-recognition editor with the existing document entry application), speech recognition did not improve the productivity of secretaries or transcriptionists. PMID:12509359
Speech emotion recognition methods: A literature review

NASA Astrophysics Data System (ADS)

Basharirad, Babak; Moradhaseli, Mohammadreza

2017-10-01

Recently, attention of the emotional speech signals research has been boosted in human machine interfaces due to availability of high computation capability. There are many systems proposed in the literature to identify the emotional state through speech. Selection of suitable feature sets, design of a proper classifications methods and prepare an appropriate dataset are the main key issues of speech emotion recognition systems. This paper critically analyzed the current available approaches of speech emotion recognition methods based on the three evaluating parameters (feature set, classification of features, accurately usage). In addition, this paper also evaluates the performance and limitations of available methods. Furthermore, it highlights the current promising direction for improvement of speech emotion recognition systems.
Development and preliminary evaluation of a pediatric Spanish-English speech perception task.

PubMed

Calandruccio, Lauren; Gomez, Bianca; Buss, Emily; Leibold, Lori J

2014-06-01

The purpose of this study was to develop a task to evaluate children's English and Spanish speech perception abilities in either noise or competing speech maskers. Eight bilingual Spanish-English and 8 age-matched monolingual English children (ages 4.9-16.4 years) were tested. A forced-choice, picture-pointing paradigm was selected for adaptively estimating masked speech reception thresholds. Speech stimuli were spoken by simultaneous bilingual Spanish-English talkers. The target stimuli were 30 disyllabic English and Spanish words, familiar to 5-year-olds and easily illustrated. Competing stimuli included either 2-talker English or 2-talker Spanish speech (corresponding to target language) and spectrally matched noise. For both groups of children, regardless of test language, performance was significantly worse for the 2-talker than for the noise masker condition. No difference in performance was found between bilingual and monolingual children. Bilingual children performed significantly better in English than in Spanish in competing speech. For all listening conditions, performance improved with increasing age. Results indicated that the stimuli and task were appropriate for speech recognition testing in both languages, providing a more conventional measure of speech-in-noise perception as well as a measure of complex listening. Further research is needed to determine performance for Spanish-dominant listeners and to evaluate the feasibility of implementation into routine clinical use.
Development and preliminary evaluation of a pediatric Spanish/English speech perception task

PubMed Central

Calandruccio, Lauren; Gomez, Bianca; Buss, Emily; Leibold, Lori J.

2014-01-01

Purpose To develop a task to evaluate children’s English and Spanish speech perception abilities in either noise or competing speech maskers. Methods Eight bilingual Spanish/English and eight age matched monolingual English children (ages 4.9 –16.4 years) were tested. A forced-choice, picture-pointing paradigm was selected for adaptively estimating masked speech reception thresholds. Speech stimuli were spoken by simultaneous bilingual Spanish/English talkers. The target stimuli were thirty disyllabic English and Spanish words, familiar to five-year-olds, and easily illustrated. Competing stimuli included either two-talker English or two-talker Spanish speech (corresponding to target language) and spectrally matched noise. Results For both groups of children, regardless of test language, performance was significantly worse for the two-talker than the noise masker. No difference in performance was found between bilingual and monolingual children. Bilingual children performed significantly better in English than in Spanish in competing speech. For all listening conditions, performance improved with increasing age. Conclusions Results indicate that the stimuli and task are appropriate for speech recognition testing in both languages, providing a more conventional measure of speech-in-noise perception as well as a measure of complex listening. Further research is needed to determine performance for Spanish-dominant listeners and to evaluate the feasibility of implementation into routine clinical use. PMID:24686915
Relation between speech-in-noise threshold, hearing loss and cognition from 40-69 years of age.

PubMed

Moore, David R; Edmondson-Jones, Mark; Dawes, Piers; Fortnum, Heather; McCormack, Abby; Pierzycki, Robert H; Munro, Kevin J

2014-01-01

Healthy hearing depends on sensitive ears and adequate brain processing. Essential aspects of both hearing and cognition decline with advancing age, but it is largely unknown how one influences the other. The current standard measure of hearing, the pure-tone audiogram is not very cognitively demanding and does not predict well the most important yet challenging use of hearing, listening to speech in noisy environments. We analysed data from UK Biobank that asked 40-69 year olds about their hearing, and assessed their ability on tests of speech-in-noise hearing and cognition. About half a million volunteers were recruited through NHS registers. Respondents completed 'whole-body' testing in purpose-designed, community-based test centres across the UK. Objective hearing (spoken digit recognition in noise) and cognitive (reasoning, memory, processing speed) data were analysed using logistic and multiple regression methods. Speech hearing in noise declined exponentially with age for both sexes from about 50 years, differing from previous audiogram data that showed a more linear decline from <40 years for men, and consistently less hearing loss for women. The decline in speech-in-noise hearing was especially dramatic among those with lower cognitive scores. Decreasing cognitive ability and increasing age were both independently associated with decreasing ability to hear speech-in-noise (0.70 and 0.89 dB, respectively) among the population studied. Men subjectively reported up to 60% higher rates of difficulty hearing than women. Workplace noise history associated with difficulty in both subjective hearing and objective speech hearing in noise. Leisure noise history was associated with subjective, but not with objective difficulty hearing. Older people have declining cognitive processing ability associated with reduced ability to hear speech in noise, measured by recognition of recorded spoken digits. Subjective reports of hearing difficulty generally show a higher prevalence than objective measures, suggesting that current objective methods could be extended further.
Relation between Speech-in-Noise Threshold, Hearing Loss and Cognition from 40–69 Years of Age

PubMed Central

Moore, David R.; Edmondson-Jones, Mark; Dawes, Piers; Fortnum, Heather; McCormack, Abby; Pierzycki, Robert H.; Munro, Kevin J.

2014-01-01

Background Healthy hearing depends on sensitive ears and adequate brain processing. Essential aspects of both hearing and cognition decline with advancing age, but it is largely unknown how one influences the other. The current standard measure of hearing, the pure-tone audiogram is not very cognitively demanding and does not predict well the most important yet challenging use of hearing, listening to speech in noisy environments. We analysed data from UK Biobank that asked 40–69 year olds about their hearing, and assessed their ability on tests of speech-in-noise hearing and cognition. Methods and Findings About half a million volunteers were recruited through NHS registers. Respondents completed ‘whole-body’ testing in purpose-designed, community-based test centres across the UK. Objective hearing (spoken digit recognition in noise) and cognitive (reasoning, memory, processing speed) data were analysed using logistic and multiple regression methods. Speech hearing in noise declined exponentially with age for both sexes from about 50 years, differing from previous audiogram data that showed a more linear decline from <40 years for men, and consistently less hearing loss for women. The decline in speech-in-noise hearing was especially dramatic among those with lower cognitive scores. Decreasing cognitive ability and increasing age were both independently associated with decreasing ability to hear speech-in-noise (0.70 and 0.89 dB, respectively) among the population studied. Men subjectively reported up to 60% higher rates of difficulty hearing than women. Workplace noise history associated with difficulty in both subjective hearing and objective speech hearing in noise. Leisure noise history was associated with subjective, but not with objective difficulty hearing. Conclusions Older people have declining cognitive processing ability associated with reduced ability to hear speech in noise, measured by recognition of recorded spoken digits. Subjective reports of hearing difficulty generally show a higher prevalence than objective measures, suggesting that current objective methods could be extended further. PMID:25229622
Emotionally conditioning the target-speech voice enhances recognition of the target speech under "cocktail-party" listening conditions.

PubMed

Lu, Lingxi; Bao, Xiaohan; Chen, Jing; Qu, Tianshu; Wu, Xihong; Li, Liang

2018-05-01

Under a noisy "cocktail-party" listening condition with multiple people talking, listeners can use various perceptual/cognitive unmasking cues to improve recognition of the target speech against informational speech-on-speech masking. One potential unmasking cue is the emotion expressed in a speech voice, by means of certain acoustical features. However, it was unclear whether emotionally conditioning a target-speech voice that has none of the typical acoustical features of emotions (i.e., an emotionally neutral voice) can be used by listeners for enhancing target-speech recognition under speech-on-speech masking conditions. In this study we examined the recognition of target speech against a two-talker speech masker both before and after the emotionally neutral target voice was paired with a loud female screaming sound that has a marked negative emotional valence. The results showed that recognition of the target speech (especially the first keyword in a target sentence) was significantly improved by emotionally conditioning the target speaker's voice. Moreover, the emotional unmasking effect was independent of the unmasking effect of the perceived spatial separation between the target speech and the masker. Also, (skin conductance) electrodermal responses became stronger after emotional learning when the target speech and masker were perceptually co-located, suggesting an increase of listening efforts when the target speech was informationally masked. These results indicate that emotionally conditioning the target speaker's voice does not change the acoustical parameters of the target-speech stimuli, but the emotionally conditioned vocal features can be used as cues for unmasking target speech.
Effect of gap detection threshold on consistency of speech in children with speech sound disorder.

PubMed

Sayyahi, Fateme; Soleymani, Zahra; Akbari, Mohammad; Bijankhan, Mahmood; Dolatshahi, Behrooz

2017-02-01

The present study examined the relationship between gap detection threshold and speech error consistency in children with speech sound disorder. The participants were children five to six years of age who were categorized into three groups of typical speech, consistent speech disorder (CSD) and inconsistent speech disorder (ISD).The phonetic gap detection threshold test was used for this study, which is a valid test comprised six syllables with inter-stimulus intervals between 20-300ms. The participants were asked to listen to the recorded stimuli three times and indicate whether they heard one or two sounds. There was no significant difference between the typical and CSD groups (p=0.55), but there were significant differences in performance between the ISD and CSD groups and the ISD and typical groups (p=0.00). The ISD group discriminated between speech sounds at a higher threshold. Children with inconsistent speech errors could not distinguish speech sounds during time-limited phonetic discrimination. It is suggested that inconsistency in speech is a representation of inconsistency in auditory perception, which causes by high gap detection threshold. Copyright © 2016 Elsevier Ltd. All rights reserved.
Ongoing slow oscillatory phase modulates speech intelligibility in cooperation with motor cortical activity.

PubMed

Onojima, Takayuki; Kitajo, Keiichi; Mizuhara, Hiroaki

2017-01-01

Neural oscillation is attracting attention as an underlying mechanism for speech recognition. Speech intelligibility is enhanced by the synchronization of speech rhythms and slow neural oscillation, which is typically observed as human scalp electroencephalography (EEG). In addition to the effect of neural oscillation, it has been proposed that speech recognition is enhanced by the identification of a speaker's motor signals, which are used for speech production. To verify the relationship between the effect of neural oscillation and motor cortical activity, we measured scalp EEG, and simultaneous EEG and functional magnetic resonance imaging (fMRI) during a speech recognition task in which participants were required to recognize spoken words embedded in noise sound. We proposed an index to quantitatively evaluate the EEG phase effect on behavioral performance. The results showed that the delta and theta EEG phase before speech inputs modulated the participant's response time when conducting speech recognition tasks. The simultaneous EEG-fMRI experiment showed that slow EEG activity was correlated with motor cortical activity. These results suggested that the effect of the slow oscillatory phase was associated with the activity of the motor cortex during speech recognition.
Address entry while driving: speech recognition versus a touch-screen keyboard.

PubMed

Tsimhoni, Omer; Smith, Daniel; Green, Paul

2004-01-01

A driving simulator experiment was conducted to determine the effects of entering addresses into a navigation system during driving. Participants drove on roads of varying visual demand while entering addresses. Three address entry methods were explored: word-based speech recognition, character-based speech recognition, and typing on a touch-screen keyboard. For each method, vehicle control and task measures, glance timing, and subjective ratings were examined. During driving, word-based speech recognition yielded the shortest total task time (15.3 s), followed by character-based speech recognition (41.0 s) and touch-screen keyboard (86.0 s). The standard deviation of lateral position when performing keyboard entry (0.21 m) was 60% higher than that for all other address entry methods (0.13 m). Degradation of vehicle control associated with address entry using a touch screen suggests that the use of speech recognition is favorable. Speech recognition systems with visual feedback, however, even with excellent accuracy, are not without performance consequences. Applications of this research include the design of in-vehicle navigation systems as well as other systems requiring significant driver input, such as E-mail, the Internet, and text messaging.

Speech Processing and Recognition (SPaRe)

DTIC Science & Technology

2011-01-01

results in the areas of automatic speech recognition (ASR), speech processing, machine translation (MT), natural language processing ( NLP ), and...Processing ( NLP ), Information Retrieval (IR) 16. SECURITY CLASSIFICATION OF: UNCLASSIFED 17. LIMITATION OF ABSTRACT 18. NUMBER OF PAGES 19a. NAME...Figure 9, the IOC was only expected to provide document submission and search; automatic speech recognition (ASR) for English, Spanish, Arabic , and
Using Automatic Speech Recognition to Dictate Mathematical Expressions: The Development of the "TalkMaths" Application at Kingston University

ERIC Educational Resources Information Center

Wigmore, Angela; Hunter, Gordon; Pflugel, Eckhard; Denholm-Price, James; Binelli, Vincent

2009-01-01

Speech technology--especially automatic speech recognition--has now advanced to a level where it can be of great benefit both to able-bodied people and those with various disabilities. In this paper we describe an application "TalkMaths" which, using the output from a commonly-used conventional automatic speech recognition system,…
Performing speech recognition research with hypercard

NASA Technical Reports Server (NTRS)

Shepherd, Chip

1993-01-01

The purpose of this paper is to describe a HyperCard-based system for performing speech recognition research and to instruct Human Factors professionals on how to use the system to obtain detailed data about the user interface of a prototype speech recognition application.
Speech recognition and parent-ratings from auditory development questionnaires in children who are hard of hearing

PubMed Central

McCreery, Ryan W.; Walker, Elizabeth A.; Spratford, Meredith; Oleson, Jacob; Bentler, Ruth; Holte, Lenore; Roush, Patricia

2015-01-01

Objectives Progress has been made in recent years in the provision of amplification and early intervention for children who are hard of hearing. However, children who use hearing aids (HA) may have inconsistent access to their auditory environment due to limitations in speech audibility through their HAs or limited HA use. The effects of variability in children’s auditory experience on parent-report auditory skills questionnaires and on speech recognition in quiet and in noise were examined for a large group of children who were followed as part of the Outcomes of Children with Hearing Loss study. Design Parent ratings on auditory development questionnaires and children’s speech recognition were assessed for 306 children who are hard of hearing. Children ranged in age from 12 months to 9 years of age. Three questionnaires involving parent ratings of auditory skill development and behavior were used, including the LittlEARS Auditory Questionnaire, Parents Evaluation of Oral/Aural Performance in Children Rating Scale, and an adaptation of the Speech, Spatial and Qualities of Hearing scale. Speech recognition in quiet was assessed using the Open and Closed set task, Early Speech Perception Test, Lexical Neighborhood Test, and Phonetically-balanced Kindergarten word lists. Speech recognition in noise was assessed using the Computer-Assisted Speech Perception Assessment. Children who are hard of hearing were compared to peers with normal hearing matched for age, maternal educational level and nonverbal intelligence. The effects of aided audibility, HA use and language ability on parent responses to auditory development questionnaires and on children’s speech recognition were also examined. Results Children who are hard of hearing had poorer performance than peers with normal hearing on parent ratings of auditory skills and had poorer speech recognition. Significant individual variability among children who are hard of hearing was observed. Children with greater aided audibility through their HAs, more hours of HA use and better language abilities generally had higher parent ratings of auditory skills and better speech recognition abilities in quiet and in noise than peers with less audibility, more limited HA use or poorer language abilities. In addition to the auditory and language factors that were predictive for speech recognition in quiet, phonological working memory was also a positive predictor for word recognition abilities in noise. Conclusions Children who are hard of hearing continue to experience delays in auditory skill development and speech recognition abilities compared to peers with normal hearing. However, significant improvements in these domains have occurred in comparison to similar data reported prior to the adoption of universal newborn hearing screening and early intervention programs for children who are hard of hearing. Increasing the audibility of speech has a direct positive effect on auditory skill development and speech recognition abilities, and may also enhance these skills by improving language abilities in children who are hard of hearing. Greater number of hours of HA use also had a significant positive impact on parent ratings of auditory skills and children’s speech recognition. PMID:26731160
The effects of reverberant self- and overlap-masking on speech recognition in cochlear implant listeners.

PubMed

Desmond, Jill M; Collins, Leslie M; Throckmorton, Chandra S

2014-06-01

Many cochlear implant (CI) listeners experience decreased speech recognition in reverberant environments [Kokkinakis et al., J. Acoust. Soc. Am. 129(5), 3221-3232 (2011)], which may be caused by a combination of self- and overlap-masking [Bolt and MacDonald, J. Acoust. Soc. Am. 21(6), 577-580 (1949)]. Determining the extent to which these effects decrease speech recognition for CI listeners may influence reverberation mitigation algorithms. This study compared speech recognition with ideal self-masking mitigation, with ideal overlap-masking mitigation, and with no mitigation. Under these conditions, mitigating either self- or overlap-masking resulted in significant improvements in speech recognition for both normal hearing subjects utilizing an acoustic model and for CI listeners using their own devices.
Inferior frontal sensitivity to common speech sounds is amplified by increasing word intelligibility.

PubMed

Vaden, Kenneth I; Kuchinsky, Stefanie E; Keren, Noam I; Harris, Kelly C; Ahlstrom, Jayne B; Dubno, Judy R; Eckert, Mark A

2011-11-01

The left inferior frontal gyrus (LIFG) exhibits increased responsiveness when people listen to words composed of speech sounds that frequently co-occur in the English language (Vaden, Piquado, & Hickok, 2011), termed high phonotactic frequency (Vitevitch & Luce, 1998). The current experiment aimed to further characterize the relation of phonotactic frequency to LIFG activity by manipulating word intelligibility in participants of varying age. Thirty six native English speakers, 19-79 years old (mean=50.5, sd=21.0) indicated with a button press whether they recognized 120 binaurally presented consonant-vowel-consonant words during a sparse sampling fMRI experiment (TR=8 s). Word intelligibility was manipulated by low-pass filtering (cutoff frequencies of 400 Hz, 1000 Hz, 1600 Hz, and 3150 Hz). Group analyses revealed a significant positive correlation between phonotactic frequency and LIFG activity, which was unaffected by age and hearing thresholds. A region of interest analysis revealed that the relation between phonotactic frequency and LIFG activity was significantly strengthened for the most intelligible words (low-pass cutoff at 3150 Hz). These results suggest that the responsiveness of the left inferior frontal cortex to phonotactic frequency reflects the downstream impact of word recognition rather than support of word recognition, at least when there are no speech production demands. Published by Elsevier Ltd.
Children with a cochlear implant: characteristics and determinants of speech recognition, speech-recognition growth rate, and speech production.

PubMed

Wie, Ona Bø; Falkenberg, Eva-Signe; Tvete, Ole; Tomblin, Bruce

2007-05-01

The objectives of the study were to describe the characteristics of the first 79 prelingually deaf cochlear implant users in Norway and to investigate to what degree the variation in speech recognition, speech- recognition growth rate, and speech production could be explained by the characteristics of the child, the cochlear implant, the family, and the educational setting. Data gathered longitudinally were analysed using descriptive statistics, multiple regression, and growth-curve analysis. The results show that more than 50% of the variation could be explained by these characteristics. Daily user-time, non-verbal intelligence, mode of communication, length of CI experience, and educational placement had the highest effect on the outcome. The results also indicate that children educated in a bilingual approach to education have better speech perception and faster speech perception growth rate with increased focus on spoken language.
Auditory Performance and Electrical Stimulation Measures in Cochlear Implant Recipients With Auditory Neuropathy Compared With Severe to Profound Sensorineural Hearing Loss.

PubMed

Attias, Joseph; Greenstein, Tally; Peled, Miriam; Ulanovski, David; Wohlgelernter, Jay; Raveh, Eyal

The aim of the study was to compare auditory and speech outcomes and electrical parameters on average 8 years after cochlear implantation between children with isolated auditory neuropathy (AN) and children with sensorineural hearing loss (SNHL). The study was conducted at a tertiary, university-affiliated pediatric medical center. The cohort included 16 patients with isolated AN with current age of 5 to 12.2 years who had been using a cochlear implant for at least 3.4 years and 16 control patients with SNHL matched for duration of deafness, age at implantation, type of implant, and unilateral/bilateral implant placement. All participants had had extensive auditory rehabilitation before and after implantation, including the use of conventional hearing aids. Most patients received Cochlear Nucleus devices, and the remainder either Med-El or Advanced Bionics devices. Unaided pure-tone audiograms were evaluated before and after implantation. Implantation outcomes were assessed by auditory and speech recognition tests in quiet and in noise. Data were also collected on the educational setting at 1 year after implantation and at school age. The electrical stimulation measures were evaluated only in the Cochlear Nucleus implant recipients in the two groups. Similar mapping and electrical measurement techniques were used in the two groups. Electrical thresholds, comfortable level, dynamic range, and objective neural response telemetry threshold were measured across the 22-electrode array in each patient. Main outcome measures were between-group differences in the following parameters: (1) Auditory and speech tests. (2) Residual hearing. (3) Electrical stimulation parameters. (4) Correlations of residual hearing at low frequencies with electrical thresholds at the basal, middle, and apical electrodes. The children with isolated AN performed equally well to the children with SNHL on auditory and speech recognition tests in both quiet and noise. More children in the AN group than the SNHL group were attending mainstream educational settings at school age, but the difference was not statistically significant. Significant between-group differences were noted in electrical measurements: the AN group was characterized by a lower current charge to reach subjective electrical thresholds, lower comfortable level and dynamic range, and lower telemetric neural response threshold. Based on pure-tone audiograms, the children with AN also had more residual hearing before and after implantation. Highly positive coefficients were found on correlation analysis between T levels across the basal and midcochlear electrodes and low-frequency acoustic thresholds. Prelingual children with isolated AN who fail to show expected oral and auditory progress after extensive rehabilitation with conventional hearing aids should be considered for cochlear implantation. Children with isolated AN had similar pattern as children with SNHL on auditory performance tests after cochlear implantation. The lower current charge required to evoke subjective and objective electrical thresholds in children with AN compared with children with SNHL may be attributed to the contribution to electrophonic hearing from the remaining neurons and hair cells. In addition, it is also possible that mechanical stimulation of the basilar membrane, as in acoustic stimulation, is added to the electrical stimulation of the cochlear implant.
How does susceptibility to proactive interference relate to speech recognition in aided and unaided conditions?

PubMed

Ellis, Rachel J; Rönnberg, Jerker

2015-01-01

Proactive interference (PI) is the capacity to resist interference to the acquisition of new memories from information stored in the long-term memory. Previous research has shown that PI correlates significantly with the speech-in-noise recognition scores of younger adults with normal hearing. In this study, we report the results of an experiment designed to investigate the extent to which tests of visual PI relate to the speech-in-noise recognition scores of older adults with hearing loss, in aided and unaided conditions. The results suggest that measures of PI correlate significantly with speech-in-noise recognition only in the unaided condition. Furthermore the relation between PI and speech-in-noise recognition differs to that observed in younger listeners without hearing loss. The findings suggest that the relation between PI tests and the speech-in-noise recognition scores of older adults with hearing loss relates to capability of the test to index cognitive flexibility.
How does susceptibility to proactive interference relate to speech recognition in aided and unaided conditions?

PubMed Central

Ellis, Rachel J.; Rönnberg, Jerker

2015-01-01

Proactive interference (PI) is the capacity to resist interference to the acquisition of new memories from information stored in the long-term memory. Previous research has shown that PI correlates significantly with the speech-in-noise recognition scores of younger adults with normal hearing. In this study, we report the results of an experiment designed to investigate the extent to which tests of visual PI relate to the speech-in-noise recognition scores of older adults with hearing loss, in aided and unaided conditions. The results suggest that measures of PI correlate significantly with speech-in-noise recognition only in the unaided condition. Furthermore the relation between PI and speech-in-noise recognition differs to that observed in younger listeners without hearing loss. The findings suggest that the relation between PI tests and the speech-in-noise recognition scores of older adults with hearing loss relates to capability of the test to index cognitive flexibility. PMID:26283981
Comparison of bimodal and bilateral cochlear implant users on speech recognition with competing talker, music perception, affective prosody discrimination, and talker identification.

PubMed

Cullington, Helen E; Zeng, Fan-Gang

2011-02-01

Despite excellent performance in speech recognition in quiet, most cochlear implant users have great difficulty with speech recognition in noise, music perception, identifying tone of voice, and discriminating different talkers. This may be partly due to the pitch coding in cochlear implant speech processing. Most current speech processing strategies use only the envelope information; the temporal fine structure is discarded. One way to improve electric pitch perception is to use residual acoustic hearing via a hearing aid on the nonimplanted ear (bimodal hearing). This study aimed to test the hypothesis that bimodal users would perform better than bilateral cochlear implant users on tasks requiring good pitch perception. Four pitch-related tasks were used. 1. Hearing in Noise Test (HINT) sentences spoken by a male talker with a competing female, male, or child talker. 2. Montreal Battery of Evaluation of Amusia. This is a music test with six subtests examining pitch, rhythm and timing perception, and musical memory. 3. Aprosodia Battery. This has five subtests evaluating aspects of affective prosody and recognition of sarcasm. 4. Talker identification using vowels spoken by 10 different talkers (three men, three women, two boys, and two girls). Bilateral cochlear implant users were chosen as the comparison group. Thirteen bimodal and 13 bilateral adult cochlear implant users were recruited; all had good speech perception in quiet. There were no significant differences between the mean scores of the bimodal and bilateral groups on any of the tests, although the bimodal group did perform better than the bilateral group on almost all tests. Performance on the different pitch-related tasks was not correlated, meaning that if a subject performed one task well they would not necessarily perform well on another. The correlation between the bimodal users' hearing threshold levels in the aided ear and their performance on these tasks was weak. Although the bimodal cochlear implant group performed better than the bilateral group on most parts of the four pitch-related tests, the differences were not statistically significant. The lack of correlation between test results shows that the tasks used are not simply providing a measure of pitch ability. Even if the bimodal users have better pitch perception, the real-world tasks used are reflecting more diverse skills than pitch. This research adds to the existing speech perception, language, and localization studies that show no significant difference between bimodal and bilateral cochlear implant users.
Visual face-movement sensitive cortex is relevant for auditory-only speech recognition.

PubMed

Riedel, Philipp; Ragert, Patrick; Schelinski, Stefanie; Kiebel, Stefan J; von Kriegstein, Katharina

2015-07-01

It is commonly assumed that the recruitment of visual areas during audition is not relevant for performing auditory tasks ('auditory-only view'). According to an alternative view, however, the recruitment of visual cortices is thought to optimize auditory-only task performance ('auditory-visual view'). This alternative view is based on functional magnetic resonance imaging (fMRI) studies. These studies have shown, for example, that even if there is only auditory input available, face-movement sensitive areas within the posterior superior temporal sulcus (pSTS) are involved in understanding what is said (auditory-only speech recognition). This is particularly the case when speakers are known audio-visually, that is, after brief voice-face learning. Here we tested whether the left pSTS involvement is causally related to performance in auditory-only speech recognition when speakers are known by face. To test this hypothesis, we applied cathodal transcranial direct current stimulation (tDCS) to the pSTS during (i) visual-only speech recognition of a speaker known only visually to participants and (ii) auditory-only speech recognition of speakers they learned by voice and face. We defined the cathode as active electrode to down-regulate cortical excitability by hyperpolarization of neurons. tDCS to the pSTS interfered with visual-only speech recognition performance compared to a control group without pSTS stimulation (tDCS to BA6/44 or sham). Critically, compared to controls, pSTS stimulation additionally decreased auditory-only speech recognition performance selectively for voice-face learned speakers. These results are important in two ways. First, they provide direct evidence that the pSTS is causally involved in visual-only speech recognition; this confirms a long-standing prediction of current face-processing models. Secondly, they show that visual face-sensitive pSTS is causally involved in optimizing auditory-only speech recognition. These results are in line with the 'auditory-visual view' of auditory speech perception, which assumes that auditory speech recognition is optimized by using predictions from previously encoded speaker-specific audio-visual internal models. Copyright © 2015 Elsevier Ltd. All rights reserved.
Automatic Speech Recognition from Neural Signals: A Focused Review.

PubMed

Herff, Christian; Schultz, Tanja

2016-01-01

Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or incapabilities to produce speech (i.e., patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefor better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system.
Distributed Fusion in Sensor Networks with Information Genealogy

DTIC Science & Technology

2011-06-28

image processing [2], acoustic and speech recognition [3], multitarget tracking [4], distributed fusion [5], and Bayesian inference [6-7]. For...Adaptation for Distant-Talking Speech Recognition." in Proc Acoustics. Speech , and Signal Processing, 2004 |4| Y Bar-Shalom and T 1-. Fortmann...used in speech recognition and other classification applications [8]. But their use in underwater mine classification is limited. In this paper, we
Mispronunciation Detection for Language Learning and Speech Recognition Adaptation

ERIC Educational Resources Information Center

Ge, Zhenhao

2013-01-01

The areas of "mispronunciation detection" (or "accent detection" more specifically) within the speech recognition community are receiving increased attention now. Two application areas, namely language learning and speech recognition adaptation, are largely driving this research interest and are the focal points of this work.…
From Birdsong to Human Speech Recognition: Bayesian Inference on a Hierarchy of Nonlinear Dynamical Systems

PubMed Central

Yildiz, Izzet B.; von Kriegstein, Katharina; Kiebel, Stefan J.

2013-01-01

Our knowledge about the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is that there are scarce experimental findings at a neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, i.e., the songbird, which faces the same challenge as humans: to learn and decode complex auditory input, in an online fashion. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech. We show that the resulting Bayesian model with a hierarchy of nonlinear dynamical systems can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents—an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and derive predictions for future experiments. PMID:24068902
From birdsong to human speech recognition: bayesian inference on a hierarchy of nonlinear dynamical systems.

PubMed

Yildiz, Izzet B; von Kriegstein, Katharina; Kiebel, Stefan J

2013-01-01

Our knowledge about the computational mechanisms underlying human learning and recognition of sound sequences, especially speech, is still very limited. One difficulty in deciphering the exact means by which humans recognize speech is that there are scarce experimental findings at a neuronal, microscopic level. Here, we show that our neuronal-computational understanding of speech learning and recognition may be vastly improved by looking at an animal model, i.e., the songbird, which faces the same challenge as humans: to learn and decode complex auditory input, in an online fashion. Motivated by striking similarities between the human and songbird neural recognition systems at the macroscopic level, we assumed that the human brain uses the same computational principles at a microscopic level and translated a birdsong model into a novel human sound learning and recognition model with an emphasis on speech. We show that the resulting Bayesian model with a hierarchy of nonlinear dynamical systems can learn speech samples such as words rapidly and recognize them robustly, even in adverse conditions. In addition, we show that recognition can be performed even when words are spoken by different speakers and with different accents-an everyday situation in which current state-of-the-art speech recognition models often fail. The model can also be used to qualitatively explain behavioral data on human speech learning and derive predictions for future experiments.
Objective Assessment of Listening Effort: Coregistration of Pupillometry and EEG.

PubMed

Miles, Kelly; McMahon, Catherine; Boisvert, Isabelle; Ibrahim, Ronny; de Lissa, Peter; Graham, Petra; Lyxell, Björn

2017-01-01

Listening to speech in noise is effortful, particularly for people with hearing impairment. While it is known that effort is related to a complex interplay between bottom-up and top-down processes, the cognitive and neurophysiological mechanisms contributing to effortful listening remain unknown. Therefore, a reliable physiological measure to assess effort remains elusive. This study aimed to determine whether pupil dilation and alpha power change, two physiological measures suggested to index listening effort, assess similar processes. Listening effort was manipulated by parametrically varying spectral resolution (16- and 6-channel noise vocoding) and speech reception thresholds (SRT; 50% and 80%) while 19 young, normal-hearing adults performed a speech recognition task in noise. Results of off-line sentence scoring showed discrepancies between the target SRTs and the true performance obtained during the speech recognition task. For example, in the SRT80% condition, participants scored an average of 64.7%. Participants' true performance levels were therefore used for subsequent statistical modelling. Results showed that both measures appeared to be sensitive to changes in spectral resolution (channel vocoding), while pupil dilation only was also significantly related to their true performance levels (%) and task accuracy (i.e., whether the response was correctly or partially recalled). The two measures were not correlated, suggesting they each may reflect different cognitive processes involved in listening effort. This combination of findings contributes to a growing body of research aiming to develop an objective measure of listening effort.
Auditory rehabilitation after stroke: treatment of auditory processing disorders in stroke patients with personal frequency-modulated (FM) systems.

PubMed

Koohi, Nehzat; Vickers, Deborah; Chandrashekar, Hoskote; Tsang, Benjamin; Werring, David; Bamiou, Doris-Eva

2017-03-01

Auditory disability due to impaired auditory processing (AP) despite normal pure-tone thresholds is common after stroke, and it leads to isolation, reduced quality of life and physical decline. There are currently no proven remedial interventions for AP deficits in stroke patients. This is the first study to investigate the benefits of personal frequency-modulated (FM) systems in stroke patients with disordered AP. Fifty stroke patients had baseline audiological assessments, AP tests and completed the (modified) Amsterdam Inventory for Auditory Disability and Hearing Handicap Inventory for Elderly questionnaires. Nine out of these 50 patients were diagnosed with disordered AP based on severe deficits in understanding speech in background noise but with normal pure-tone thresholds. These nine patients underwent spatial speech-in-noise testing in a sound-attenuating chamber (the "crescent of sound") with and without FM systems. The signal-to-noise ratio (SNR) for 50% correct speech recognition performance was measured with speech presented from 0° azimuth and competing babble from ±90° azimuth. Spatial release from masking (SRM) was defined as the difference between SNRs measured with co-located speech and babble and SNRs measured with spatially separated speech and babble. The SRM significantly improved when babble was spatially separated from target speech, while the patients had the FM systems in their ears compared to without the FM systems. Personal FM systems may substantially improve speech-in-noise deficits in stroke patients who are not eligible for conventional hearing aids. FMs are feasible in stroke patients and show promise to address impaired AP after stroke. Implications for Rehabilitation This is the first study to investigate the benefits of personal frequency-modulated (FM) systems in stroke patients with disordered AP. All cases significantly improved speech perception in noise with the FM systems, when noise was spatially separated from the speech signal by 90° compared with unaided listening. Personal FM systems are feasible in stroke patients, and may be of benefit in just under 20% of this population, who are not eligible for conventional hearing aids.
Statistical assessment of speech system performance

NASA Technical Reports Server (NTRS)

Moshier, Stephen L.

1977-01-01

Methods for the normalization of performance tests results of speech recognition systems are presented. Technological accomplishments in speech recognition systems, as well as planned research activities are described.

Outcomes of Late Implantation in Usher Syndrome Patients.

PubMed

Hoshino, Ana Cristina H; Echegoyen, Agustina; Goffi-Gomez, Maria Valéria Schmidt; Tsuji, Robinson Koji; Bento, Ricardo Ferreira

2017-04-01

Introduction Usher syndrome (US) is an autosomal recessive disorder characterized by hearing loss and progressive visual impairment. Some deaf Usher syndrome patients learn to communicate using sign language. During adolescence, as they start losing vision, they are usually referred to cochlear implantation as a salvage for their new condition. Is a late implantation beneficial to these children? Objective The objective of this study is to describe the outcomes of US patients who received cochlear implants at a later age. Methods This is a retrospective study of ten patients diagnosed with US1. We collected pure-tone thresholds and speech perception tests from pre and one-year post implant. Results Average age at implantation was 18.9 years (5-49). Aided average thresholds were 103 dB HL and 35 dB HL pre and one-year post implant, respectively. Speech perception was only possible to be measured in four patients preoperatively, who scored 13.3; 26.67; 46% vowels and 56% 4-choice. All patients except one had some kind of communication. Two were bilingual. After one year of using the device, seven patients were able to perform the speech tests (from four-choice to close set sentences) and three patients abandoned the use of the implant. Conclusion We observed that detection of sounds can be achieved with late implantation, but speech recognition is only possible in patients with previous hearing stimulation, since it depends on the development of hearing skills and the maturation of the auditory pathways.
Building Searchable Collections of Enterprise Speech Data.

ERIC Educational Resources Information Center

Cooper, James W.; Viswanathan, Mahesh; Byron, Donna; Chan, Margaret

The study has applied speech recognition and text-mining technologies to a set of recorded outbound marketing calls and analyzed the results. Since speaker-independent speech recognition technology results in a significantly lower recognition rate than that found when the recognizer is trained for a particular speaker, a number of post-processing…
Masked Speech Recognition and Reading Ability in School-Age Children: Is There a Relationship?

ERIC Educational Resources Information Center

Miller, Gabrielle; Lewis, Barbara; Benchek, Penelope; Buss, Emily; Calandruccio, Lauren

2018-01-01

Purpose: The relationship between reading (decoding) skills, phonological processing abilities, and masked speech recognition in typically developing children was explored. This experiment was designed to evaluate the relationship between phonological processing and decoding abilities and 2 aspects of masked speech recognition in typically…
Six characteristics of effective structured reporting and the inevitable integration with speech recognition.

PubMed

Liu, David; Zucherman, Mark; Tulloss, William B

2006-03-01

The reporting of radiological images is undergoing dramatic changes due to the introduction of two new technologies: structured reporting and speech recognition. Each technology has its own unique advantages. The highly organized content of structured reporting facilitates data mining and billing, whereas speech recognition offers a natural succession from the traditional dictation-transcription process. This article clarifies the distinction between the process and outcome of structured reporting, describes fundamental requirements for any effective structured reporting system, and describes the potential development of a novel, easy-to-use, customizable structured reporting system that incorporates speech recognition. This system should have all the advantages derived from structured reporting, accommodate a wide variety of user needs, and incorporate speech recognition as a natural component and extension of the overall reporting process.
Interdependence of linguistic and indexical speech perception skills in school-age children with early cochlear implantation.

PubMed

Geers, Ann E; Davidson, Lisa S; Uchanski, Rosalie M; Nicholas, Johanna G

2013-09-01

This study documented the ability of experienced pediatric cochlear implant (CI) users to perceive linguistic properties (what is said) and indexical attributes (emotional intent and talker identity) of speech, and examined the extent to which linguistic (LSP) and indexical (ISP) perception skills are related. Preimplant-aided hearing, age at implantation, speech processor technology, CI-aided thresholds, sequential bilateral cochlear implantation, and academic integration with hearing age-mates were examined for their possible relationships to both LSP and ISP skills. Sixty 9- to 12-year olds, first implanted at an early age (12 to 38 months), participated in a comprehensive test battery that included the following LSP skills: (1) recognition of monosyllabic words at loud and soft levels, (2) repetition of phonemes and suprasegmental features from nonwords, and (3) recognition of key words from sentences presented within a noise background, and the following ISP skills: (1) discrimination of across-gender and within-gender (female) talkers and (2) identification and discrimination of emotional content from spoken sentences. A group of 30 age-matched children without hearing loss completed the nonword repetition, and talker- and emotion-perception tasks for comparison. Word-recognition scores decreased with signal level from a mean of 77% correct at 70 dB SPL to 52% at 50 dB SPL. On average, CI users recognized 50% of key words presented in sentences that were 9.8 dB above background noise. Phonetic properties were repeated from nonword stimuli at about the same level of accuracy as suprasegmental attributes (70 and 75%, respectively). The majority of CI users identified emotional content and differentiated talkers significantly above chance levels. Scores on LSP and ISP measures were combined into separate principal component scores and these components were highly correlated (r = 0.76). Both LSP and ISP component scores were higher for children who received a CI at the youngest ages, upgraded to more recent CI technology and had lower CI-aided thresholds. Higher scores, for both LSP and ISP components, were also associated with higher language levels and mainstreaming at younger ages. Higher ISP scores were associated with better social skills. Results strongly support a link between indexical and linguistic properties in perceptual analysis of speech. These two channels of information appear to be processed together in parallel by the auditory system and are inseparable in perception. Better speech performance, for both linguistic and indexical perception, is associated with younger age at implantation and use of more recent speech processor technology. Children with better speech perception demonstrated better spoken language, earlier academic mainstreaming, and placement in more typically sized classrooms (i.e., >20 students). Well-developed social skills were more highly associated with the ability to discriminate the nuances of talker identity and emotion than with the ability to recognize words and sentences through listening. The extent to which early cochlear implantation enabled these early-implanted children to make use of both linguistic and indexical properties of speech influenced not only their development of spoken language, but also their ability to function successfully in a hearing world.
Interdependence of Linguistic and Indexical Speech Perception Skills in School-Aged Children with Early Cochlear Implantation

PubMed Central

Geers, Ann; Davidson, Lisa; Uchanski, Rosalie; Nicholas, Johanna

2013-01-01

Objectives This study documented the ability of experienced pediatric cochlear implant (CI) users to perceive linguistic properties (what is said) and indexical attributes (emotional intent and talker identity) of speech, and examined the extent to which linguistic (LSP) and indexical (ISP) perception skills are related. Pre-implant aided hearing, age at implantation, speech processor technology, CI-aided thresholds, sequential bilateral cochlear implantation, and academic integration with hearing age-mates were examined for their possible relationships to both LSP and ISP skills. Design Sixty 9–12 year olds, first implanted at an early age (12–38 months), participated in a comprehensive test battery that included the following LSP skills: 1) recognition of monosyllabic words at loud and soft levels, 2) repetition of phonemes and suprasegmental features from non-words, and 3) recognition of keywords from sentences presented within a noise background, and the following ISP skills: 1) discrimination of male from female and female from female talkers and 2) identification and discrimination of emotional content from spoken sentences. A group of 30 age-matched children without hearing loss completed the non-word repetition, and talker- and emotion-perception tasks for comparison. Results Word recognition scores decreased with signal level from a mean of 77% correct at 70 dB SPL to 52% at 50 dB SPL. On average, CI users recognized 50% of keywords presented in sentences that were 9.8 dB above background noise. Phonetic properties were repeated from non-word stimuli at about the same level of accuracy as suprasegmental attributes (70% and 75%, respectively). The majority of CI users identified emotional content and differentiated talkers significantly above chance levels. Scores on LSP and ISP measures were combined into separate principal component scores and these components were highly correlated (r = .76). Both LSP and ISP component scores were higher for children who received a CI at the youngest ages, upgraded to more recent CI technology and had lower CI-aided thresholds. Higher scores, for both LSP and ISP components, were also associated with higher language levels and mainstreaming at younger ages. Higher ISP scores were associated with better social skills. Conclusions Results strongly support a link between indexical and linguistic properties in perceptual analysis of speech. These two channels of information appear to be processed together in parallel by the auditory system and are inseparable in perception. Better speech performance, for both linguistic and indexical perception, is associated with younger age at implantation and use of more recent speech processor technology. Children with better speech perception demonstrated better spoken language, earlier academic mainstreaming, and placement in more typically-sized classrooms (i.e., >20 students). Well-developed social skills were more highly associated with the ability to discriminate the nuances of talker identity and emotion than with the ability to recognize words and sentences through listening. The extent to which early cochlear implantation enabled these early-implanted children to make use of both linguistic and indexical properties of speech influenced not only their development of spoken language, but also their ability to function successfully in a hearing world. PMID:23652814
Military applications of automatic speech recognition and future requirements

NASA Technical Reports Server (NTRS)

Beek, Bruno; Cupples, Edward J.

1977-01-01

An updated summary of the state-of-the-art of automatic speech recognition and its relevance to military applications is provided. A number of potential systems for military applications are under development. These include: (1) digital narrowband communication systems; (2) automatic speech verification; (3) on-line cartographic processing unit; (4) word recognition for militarized tactical data system; and (5) voice recognition and synthesis for aircraft cockpit.
Speech Recognition as a Support Service for Deaf and Hard of Hearing Students: Adaptation and Evaluation. Final Report to Spencer Foundation.

ERIC Educational Resources Information Center

Stinson, Michael; Elliot, Lisa; McKee, Barbara; Coyne, Gina

This report discusses a project that adapted new automatic speech recognition (ASR) technology to provide real-time speech-to-text transcription as a support service for students who are deaf and hard of hearing (D/HH). In this system, as the teacher speaks, a hearing intermediary, or captionist, dictates into the speech recognition system in a…
Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearinga)

PubMed Central

Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

2015-01-01

Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes. PMID:26233047
Comparing auditory filter bandwidths, spectral ripple modulation detection, spectral ripple discrimination, and speech recognition: Normal and impaired hearing.

PubMed

Davies-Venn, Evelyn; Nelson, Peggy; Souza, Pamela

2015-07-01

Some listeners with hearing loss show poor speech recognition scores in spite of using amplification that optimizes audibility. Beyond audibility, studies have suggested that suprathreshold abilities such as spectral and temporal processing may explain differences in amplified speech recognition scores. A variety of different methods has been used to measure spectral processing. However, the relationship between spectral processing and speech recognition is still inconclusive. This study evaluated the relationship between spectral processing and speech recognition in listeners with normal hearing and with hearing loss. Narrowband spectral resolution was assessed using auditory filter bandwidths estimated from simultaneous notched-noise masking. Broadband spectral processing was measured using the spectral ripple discrimination (SRD) task and the spectral ripple depth detection (SMD) task. Three different measures were used to assess unamplified and amplified speech recognition in quiet and noise. Stepwise multiple linear regression revealed that SMD at 2.0 cycles per octave (cpo) significantly predicted speech scores for amplified and unamplified speech in quiet and noise. Commonality analyses revealed that SMD at 2.0 cpo combined with SRD and equivalent rectangular bandwidth measures to explain most of the variance captured by the regression model. Results suggest that SMD and SRD may be promising clinical tools for diagnostic evaluation and predicting amplification outcomes.
Microscopic prediction of speech recognition for listeners with normal hearing in noise using an auditory model.

PubMed

Jürgens, Tim; Brand, Thomas

2009-11-01

This study compares the phoneme recognition performance in speech-shaped noise of a microscopic model for speech recognition with the performance of normal-hearing listeners. "Microscopic" is defined in terms of this model twofold. First, the speech recognition rate is predicted on a phoneme-by-phoneme basis. Second, microscopic modeling means that the signal waveforms to be recognized are processed by mimicking elementary parts of human's auditory processing. The model is based on an approach by Holube and Kollmeier [J. Acoust. Soc. Am. 100, 1703-1716 (1996)] and consists of a psychoacoustically and physiologically motivated preprocessing and a simple dynamic-time-warp speech recognizer. The model is evaluated while presenting nonsense speech in a closed-set paradigm. Averaged phoneme recognition rates, specific phoneme recognition rates, and phoneme confusions are analyzed. The influence of different perceptual distance measures and of the model's a-priori knowledge is investigated. The results show that human performance can be predicted by this model using an optimal detector, i.e., identical speech waveforms for both training of the recognizer and testing. The best model performance is yielded by distance measures which focus mainly on small perceptual distances and neglect outliers.
The influence of speech rate and accent on access and use of semantic information.

PubMed

Sajin, Stanislav M; Connine, Cynthia M

2017-04-01

Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.
Use of intonation contours for speech recognition in noise by cochlear implant recipients.

PubMed

Meister, Hartmut; Landwehr, Markus; Pyschny, Verena; Grugel, Linda; Walger, Martin

2011-05-01

The corruption of intonation contours has detrimental effects on sentence-based speech recognition in normal-hearing listeners Binns and Culling [(2007). J. Acoust. Soc. Am. 122, 1765-1776]. This paper examines whether this finding also applies to cochlear implant (CI) recipients. The subjects' F0-discrimination and speech perception in the presence of noise were measured, using sentences with regular and inverted F0-contours. The results revealed that speech recognition for regular contours was significantly better than for inverted contours. This difference was related to the subjects' F0-discrimination providing further evidence that the perception of intonation patterns is important for the CI-mediated speech recognition in noise.
Can unaided non-linguistic measures predict cochlear implant candidacy?

PubMed Central

Shim, Hyun Joon; Won, Jong Ho; Moon, Il Joon; Anderson, Elizabeth S.; Drennan, Ward R.; McIntosh, Nancy E.; Weaver, Edward M.; Rubinstein, Jay T.

2014-01-01

Objective To determine if unaided, non-linguistic psychoacoustic measures can be effective in evaluating cochlear implant (CI) candidacy. Study Design Prospective split-cohort study including predictor development subgroup and independent predictor validation subgroup. Setting Tertiary referral center. Subjects Fifteen subjects (28 ears) with hearing loss were recruited from patients visiting the University of Washington Medical Center for CI evaluation. Methods Spectral-ripple discrimination (using a 13-dB modulation depth) and temporal modulation detection using 10- and 100-Hz modulation frequencies were assessed with stimuli presented through insert earphones. Correlations between performance for psychoacoustic tasks and speech perception tasks were assessed. Receiver operating characteristic (ROC) curve analysis was performed to estimate the optimal psychoacoustic score for CI candidacy evaluation in the development subgroup and then tested in an independent sample. Results Strong correlations were observed between spectral-ripple thresholds and both aided sentence recognition and unaided word recognition. Weaker relationships were found between temporal modulation detection and speech tests. ROC curve analysis demonstrated that the unaided spectral ripple discrimination shows a good sensitivity, specificity, positive predictive value, and negative predictive value compared to the current gold standard, aided sentence recognition. Conclusions Results demonstrated that the unaided spectral-ripple discrimination test could be a promising tool for evaluating CI candidacy. PMID:24901669
Modernising speech audiometry: using a smartphone application to test word recognition.

PubMed

van Zyl, Marianne; Swanepoel, De Wet; Myburgh, Hermanus C

2018-04-20

This study aimed to develop and assess a method to measure word recognition abilities using a smartphone application (App) connected to an audiometer. Word lists were recorded in South African English and Afrikaans. Analyses were conducted to determine the effect of hardware used for presentation (computer, compact-disc player, or smartphone) on the frequency content of recordings. An Android App was developed to enable presentation of recorded materials via a smartphone connected to the auxiliary input of the audiometer. Experiments were performed to test feasibility and validity of the developed App and recordings. Participants were 100 young adults (18-30 years) with pure tone thresholds ≤15 dB across the frequency spectrum (250-8000 Hz). Hardware used for presentation had no significant effect on the frequency content of recordings. Listening experiments indicated good inter-list reliability for recordings in both languages, with no significant differences between scores on different lists at each of the tested intensities. Performance-intensity functions had slopes of 4.05%/dB for English and 4.75%/dB for Afrikaans lists at the 50% point. The developed smartphone App constitutes a feasible and valid method for measuring word recognition scores, and can support standardisation and accessibility of recorded speech audiometry.
Does quality of life depend on speech recognition performance for adult cochlear implant users?

PubMed

Capretta, Natalie R; Moberly, Aaron C

2016-03-01

Current postoperative clinical outcome measures for adults receiving cochlear implants (CIs) consist of testing speech recognition, primarily under quiet conditions. However, it is strongly suspected that results on these measures may not adequately reflect patients' quality of life (QOL) using their implants. This study aimed to evaluate whether QOL for CI users depends on speech recognition performance. Twenty-three postlingually deafened adults with CIs were assessed. Participants were tested for speech recognition (Central Institute for the Deaf word and AzBio sentence recognition in quiet) and completed three QOL measures-the Nijmegen Cochlear Implant Questionnaire; either the Hearing Handicap Inventory for Adults or the Hearing Handicap Inventory for the Elderly; and the Speech, Spatial and Qualities of Hearing Scale questionnaires-to assess a variety of QOL factors. Correlations were sought between speech recognition and QOL scores. Demographics, audiologic history, language, and cognitive skills were also examined as potential predictors of QOL. Only a few QOL scores significantly correlated with postoperative sentence or word recognition in quiet, and correlations were primarily isolated to speech-related subscales on QOL measures. Poorer pre- and postoperative unaided hearing predicted better QOL. Socioeconomic status, duration of deafness, age at implantation, duration of CI use, reading ability, vocabulary size, and cognitive status did not consistently predict QOL scores. For adult, postlingually deafened CI users, clinical speech recognition measures in quiet do not correlate broadly with QOL. Results suggest the need for additional outcome measures of the benefits and limitations of cochlear implantation. 4. Laryngoscope, 126:699-706, 2016. © 2015 The American Laryngological, Rhinological and Otological Society, Inc.
Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.

PubMed

Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina

2014-12-01

In auditory-only conditions, for example when we listen to someone on the phone, it is essential to fast and accurately recognize what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developed controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.
Speech processing using maximum likelihood continuity mapping

DOEpatents

Hogden, John E.

2000-01-01

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Speech processing using maximum likelihood continuity mapping

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hogden, J.E.

Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator position is described. The method for learning the mapping between static speech sounds and pseudo-articulator position uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Perceptual consequences of normal and abnormal peripheral compression: Potential links between psychoacoustics and speech perception

NASA Astrophysics Data System (ADS)

Oxenham, Andrew J.; Rosengard, Peninah S.; Braida, Louis D.

2004-05-01

Cochlear damage can lead to a reduction in the overall amount of peripheral auditory compression, presumably due to outer hair cell (OHC) loss or dysfunction. The perceptual consequences of functional OHC loss include loudness recruitment and reduced dynamic range, poorer frequency selectivity, and poorer effective temporal resolution. These in turn may lead to a reduced ability to make use of spectral and temporal fluctuations in background noise when listening to a target sound, such as speech. We tested the effect of OHC function on speech reception in hearing-impaired listeners by comparing psychoacoustic measures of cochlear compression and sentence recognition in a variety of noise backgrounds. In line with earlier studies, we found weak (nonsignificant) correlations between the psychoacoustic tasks and speech reception thresholds in quiet or in steady-state noise. However, when spectral and temporal fluctuations were introduced in the masker, speech reception improved to an extent that was well predicted by the psychoacoustic measures. Thus, our initial results suggest a strong relationship between measures of cochlear compression and the ability of listeners to take advantage of spectral and temporal masker fluctuations in recognizing speech. [Work supported by NIH Grants Nos. R01DC03909, T32DC00038, and R01DC00117.

Evidence-based occupational hearing screening II: validation of a screening methodology using measures of functional hearing ability.

PubMed

Soli, Sigfrid D; Amano-Kusumoto, Akiko; Clavier, Odile; Wilbur, Jed; Casto, Kristen; Freed, Daniel; Laroche, Chantal; Vaillancourt, Véronique; Giguère, Christian; Dreschler, Wouter A; Rhebergen, Koenraad S

2018-05-01

Validate use of the Extended Speech Intelligibility Index (ESII) for prediction of speech intelligibility in non-stationary real-world noise environments. Define a means of using these predictions for objective occupational hearing screening for hearing-critical public safety and law enforcement jobs. Analyses of predicted and measured speech intelligibility in recordings of real-world noise environments were performed in two studies using speech recognition thresholds (SRTs) and intelligibility measures. ESII analyses of the recordings were used to predict intelligibility. Noise recordings were made in prison environments and at US Army facilities for training ground and airborne forces. Speech materials included full bandwidth sentences and bandpass filtered sentences that simulated radio transmissions. A total of 22 adults with normal hearing (NH) and 15 with mild-moderate hearing impairment (HI) participated in the two studies. Average intelligibility predictions for individual NH and HI subjects were accurate in both studies (r 2 ≥ 0.94). Pooled predictions were slightly less accurate (0.78 ≤ r 2 ≤ 0.92). An individual's SRT and audiogram can accurately predict the likelihood of effective speech communication in noise environments with known ESII characteristics, where essential hearing-critical tasks are performed. These predictions provide an objective means of occupational hearing screening.
Supporting Dictation Speech Recognition Error Correction: The Impact of External Information

ERIC Educational Resources Information Center

Shi, Yongmei; Zhou, Lina

2011-01-01

Although speech recognition technology has made remarkable progress, its wide adoption is still restricted by notable effort made and frustration experienced by users while correcting speech recognition errors. One of the promising ways to improve error correction is by providing user support. Although support mechanisms have been proposed for…
Application of an auditory model to speech recognition.

PubMed

Cohen, J R

1989-06-01

Some aspects of auditory processing are incorporated in a front end for the IBM speech-recognition system [F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE 64 (4), 532-556 (1976)]. This new process includes adaptation, loudness scaling, and mel warping. Tests show that the design is an improvement over previous algorithms.
Stages of change in adults who have failed an online hearing screening.

PubMed

Laplante-Lévesque, Ariane; Brännström, K Jonas; Ingo, Elisabeth; Andersson, Gerhard; Lunner, Thomas

2015-01-01

Hearing screening has been proposed to promote help-seeking and rehabilitation in adults with hearing impairment. However, some longitudinal studies point to low help-seeking and subsequent rehabilitation after a failed hearing screening (positive screening result). Some barriers to help-seeking and rehabilitation could be intrinsic to the profiles and needs of people who have failed a hearing screening. Theories of health behavior change could help to understand this population. One of these theories is the transtheoretical (stages-of-change) model of health behavior change, which describes profiles and needs of people facing behavior changes such as seeking help and taking up rehabilitation. According to this model, people go through distinct stages toward health behavior change: precontemplation, contemplation, action, and finally, maintenance. The present study describes the psychometric properties (construct validity) of the stages of change in adults who have failed an online hearing screening. Stages of change were measured with the University of Rhode Island Change Assessment (URICA). Principal component analysis is presented, along with cluster analysis. Internal consistency was investigated. Finally, relationships between URICA scores and speech-in-noise recognition threshold, self-reported hearing disability, and self-reported duration of hearing disability are presented. In total, 224 adults who had failed a Swedish online hearing screening test (measure of speech-in-noise recognition) completed further questionnaires online, including the URICA. A principal component analysis identified the stages of precontemplation, contemplation, and action, plus an additional stage, termed preparation (between contemplation and action). According to the URICA, half (50%) of the participants were in the preparation stage of change. The contemplation stage was represented by 38% of participants, while 9% were in the precontemplation stage. Finally, the action stage was represented by approximately 3% of the participants. Cluster analysis identified four stages-of-change clusters: they were named decision making (44% of sample), participation (28% of sample), indecision (16% of sample), and reluctance (12% of sample). The construct validity of the model was good. Participants who reported a more advanced stage of change had significantly greater self-reported hearing disability. However, participants who reported a more advanced stage of change did not have a significantly worse speech-in-noise recognition threshold or reported a significantly longer duration of hearing impairment. The additional stage this study uncovered, and which other studies have also uncovered, preparation, highlights the need for adequate guidance for adults who are yet to seek help for their hearing. The fact that very few people were in the action stage (approximately 3% of the sample) signals that screening alone is unlikely to be enough to improve help-seeking and rehabilitation rates. As expected, people in the later stages of change reported significantly greater hearing disability. The lack of significant relationships between stages-of-change measures and speech-in-noise recognition threshold and self-reported duration of hearing disability highlights the complex interplay between impairment, disability, and behaviors in adults who have failed an online hearing screening and who are yet to seek help.
Specific acoustic models for spontaneous and dictated style in indonesian speech recognition

NASA Astrophysics Data System (ADS)

Vista, C. B.; Satriawan, C. H.; Lestari, D. P.; Widyantoro, D. H.

2018-03-01

The performance of an automatic speech recognition system is affected by differences in speech style between the data the model is originally trained upon and incoming speech to be recognized. In this paper, the usage of GMM-HMM acoustic models for specific speech styles is investigated. We develop two systems for the experiments; the first employs a speech style classifier to predict the speech style of incoming speech, either spontaneous or dictated, then decodes this speech using an acoustic model specifically trained for that speech style. The second system uses both acoustic models to recognise incoming speech and decides upon a final result by calculating a confidence score of decoding. Results show that training specific acoustic models for spontaneous and dictated speech styles confers a slight recognition advantage as compared to a baseline model trained on a mixture of spontaneous and dictated training data. In addition, the speech style classifier approach of the first system produced slightly more accurate results than the confidence scoring employed in the second system.
The Development of the Text Reception Threshold Test: A Visual Analogue of the Speech Reception Threshold Test

ERIC Educational Resources Information Center

Zekveld, Adriana A.; George, Erwin L. J.; Kramer, Sophia E.; Goverts, S. Theo; Houtgast, Tammo

2007-01-01

Purpose: In this study, the authors aimed to develop a visual analogue of the widely used Speech Reception Threshold (SRT; R. Plomp & A. M. Mimpen, 1979b) test. The Text Reception Threshold (TRT) test, in which visually presented sentences are masked by a bar pattern, enables the quantification of modality-aspecific variance in speech-in-noise…
Increase in Speech Recognition Due to Linguistic Mismatch between Target and Masker Speech: Monolingual and Simultaneous Bilingual Performance

ERIC Educational Resources Information Center

Calandruccio, Lauren; Zhou, Haibo

2014-01-01

Purpose: To examine whether improved speech recognition during linguistically mismatched target-masker experiments is due to linguistic unfamiliarity of the masker speech or linguistic dissimilarity between the target and masker speech. Method: Monolingual English speakers (n = 20) and English-Greek simultaneous bilinguals (n = 20) listened to…
The influence of audibility on speech recognition with nonlinear frequency compression for children and adults with hearing loss

PubMed Central

McCreery, Ryan W.; Alexander, Joshua; Brennan, Marc A.; Hoover, Brenda; Kopun, Judy; Stelmachowicz, Patricia G.

2014-01-01

Objective The primary goal of nonlinear frequency compression (NFC) and other frequency lowering strategies is to increase the audibility of high-frequency sounds that are not otherwise audible with conventional hearing-aid processing due to the degree of hearing loss, limited hearing aid bandwidth or a combination of both factors. The aim of the current study was to compare estimates of speech audibility processed by NFC to improvements in speech recognition for a group of children and adults with high-frequency hearing loss. Design Monosyllabic word recognition was measured in noise for twenty-four adults and twelve children with mild to severe sensorineural hearing loss. Stimuli were amplified based on each listener’s audiogram with conventional processing (CP) with amplitude compression or with NFC and presented under headphones using a software-based hearing aid simulator. A modification of the speech intelligibility index (SII) was used to estimate audibility of information in frequency-lowered bands. The mean improvement in SII was compared to the mean improvement in speech recognition. Results All but two listeners experienced improvements in speech recognition with NFC compared to CP, consistent with the small increase in audibility that was estimated using the modification of the SII. Children and adults had similar improvements in speech recognition with NFC. Conclusion Word recognition with NFC was higher than CP for children and adults with mild to severe hearing loss. The average improvement in speech recognition with NFC (7%) was consistent with the modified SII, which indicated that listeners experienced an increase in audibility with NFC compared to CP. Further studies are necessary to determine if changes in audibility with NFC are related to speech recognition with NFC for listeners with greater degrees of hearing loss, with a greater variety of compression settings, and using auditory training. PMID:24535558
Neuroanatomical and resting state EEG power correlates of central hearing loss in older adults.

PubMed

Giroud, Nathalie; Hirsiger, Sarah; Muri, Raphaela; Kegel, Andrea; Dillier, Norbert; Meyer, Martin

2018-01-01

To gain more insight into central hearing loss, we investigated the relationship between cortical thickness and surface area, speech-relevant resting state EEG power, and above-threshold auditory measures in older adults and younger controls. Twenty-three older adults and 13 younger controls were tested with an adaptive auditory test battery to measure not only traditional pure-tone thresholds, but also above individual thresholds of temporal and spectral processing. The participants' speech recognition in noise (SiN) was evaluated, and a T1-weighted MRI image obtained for each participant. We then determined the cortical thickness (CT) and mean cortical surface area (CSA) of auditory and higher speech-relevant regions of interest (ROIs) with FreeSurfer. Further, we obtained resting state EEG from all participants as well as data on the intrinsic theta and gamma power lateralization, the latter in accordance with predictions of the Asymmetric Sampling in Time hypothesis regarding speech processing (Poeppel, Speech Commun 41:245-255, 2003). Methodological steps involved the calculation of age-related differences in behavior, anatomy and EEG power lateralization, followed by multiple regressions with anatomical ROIs as predictors for auditory performance. We then determined anatomical regressors for theta and gamma lateralization, and further constructed all regressions to investigate age as a moderator variable. Behavioral results indicated that older adults performed worse in temporal and spectral auditory tasks, and in SiN, despite having normal peripheral hearing as signaled by the audiogram. These behavioral age-related distinctions were accompanied by lower CT in all ROIs, while CSA was not different between the two age groups. Age modulated the regressions specifically in right auditory areas, where a thicker cortex was associated with better auditory performance in older adults. Moreover, a thicker right supratemporal sulcus predicted more rightward theta lateralization, indicating the functional relevance of the right auditory areas in older adults. The question how age-related cortical thinning and intrinsic EEG architecture relates to central hearing loss has so far not been addressed. Here, we provide the first neuroanatomical and neurofunctional evidence that cortical thinning and lateralization of speech-relevant frequency band power relates to the extent of age-related central hearing loss in older adults. The results are discussed within the current frameworks of speech processing and aging.
Mandarin-Speaking Children's Speech Recognition: Developmental Changes in the Influences of Semantic Context and F0 Contours.

PubMed

Zhou, Hong; Li, Yu; Liang, Meng; Guan, Connie Qun; Zhang, Linjun; Shu, Hua; Zhang, Yang

2017-01-01

The goal of this developmental speech perception study was to assess whether and how age group modulated the influences of high-level semantic context and low-level fundamental frequency ( F 0 ) contours on the recognition of Mandarin speech by elementary and middle-school-aged children in quiet and interference backgrounds. The results revealed different patterns for semantic and F 0 information. One the one hand, age group modulated significantly the use of F 0 contours, indicating that elementary school children relied more on natural F 0 contours than middle school children during Mandarin speech recognition. On the other hand, there was no significant modulation effect of age group on semantic context, indicating that children of both age groups used semantic context to assist speech recognition to a similar extent. Furthermore, the significant modulation effect of age group on the interaction between F 0 contours and semantic context revealed that younger children could not make better use of semantic context in recognizing speech with flat F 0 contours compared with natural F 0 contours, while older children could benefit from semantic context even when natural F 0 contours were altered, thus confirming the important role of F 0 contours in Mandarin speech recognition by elementary school children. The developmental changes in the effects of high-level semantic and low-level F 0 information on speech recognition might reflect the differences in auditory and cognitive resources associated with processing of the two types of information in speech perception.
Talker variability in audio-visual speech perception

PubMed Central

Heald, Shannon L. M.; Nusbaum, Howard C.

2014-01-01

A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker’s face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker’s face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker’s face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred. PMID:25076919
Talker variability in audio-visual speech perception.

PubMed

Heald, Shannon L M; Nusbaum, Howard C

2014-01-01

A change in talker is a change in the context for the phonetic interpretation of acoustic patterns of speech. Different talkers have different mappings between acoustic patterns and phonetic categories and listeners need to adapt to these differences. Despite this complexity, listeners are adept at comprehending speech in multiple-talker contexts, albeit at a slight but measurable performance cost (e.g., slower recognition). So far, this talker variability cost has been demonstrated only in audio-only speech. Other research in single-talker contexts have shown, however, that when listeners are able to see a talker's face, speech recognition is improved under adverse listening (e.g., noise or distortion) conditions that can increase uncertainty in the mapping between acoustic patterns and phonetic categories. Does seeing a talker's face reduce the cost of word recognition in multiple-talker contexts? We used a speeded word-monitoring task in which listeners make quick judgments about target word recognition in single- and multiple-talker contexts. Results show faster recognition performance in single-talker conditions compared to multiple-talker conditions for both audio-only and audio-visual speech. However, recognition time in a multiple-talker context was slower in the audio-visual condition compared to audio-only condition. These results suggest that seeing a talker's face during speech perception may slow recognition by increasing the importance of talker identification, signaling to the listener a change in talker has occurred.
Implications of Minimizing Trauma During Conventional Cochlear Implantation

PubMed Central

Carlson, Matthew L.; Driscoll, Colin L. W.; Gifford, René H.; Service, Geoffrey J.; Tombers, Nicole M.; Hughes-Borst, Becky J.; Neff, Brian A.; Beatty, Charles W.

2014-01-01

Objective To describe the relationship between implantation-associated trauma and postoperative speech perception scores among adult and pediatric patients undergoing cochlear implantation using conventional length electrodes and minimally traumatic surgical techniques. Study Design Retrospective chart review (2002–2010). Setting Tertiary academic referral center. Patients All subjects with significant preoperative low-frequency hearing (≤70 dB HL at 250 Hz) who underwent cochlear implantation with a newer generation implant electrode (Nucleus Contour Advance, Advanced Bionics HR90K [1J and Helix], and Med El Sonata standard H array) were reviewed. Intervention(s) Preimplant and postimplant audiometric thresholds and speech recognition scores were recorded using the electronic medical record. Main Outcome Measure(s) Postimplantation pure tone threshold shifts were used as a surrogate measure for extent of intracochlear injury and correlated with postoperative speech perception scores. Results Between 2002 and 2010, 703 cochlear implant (CI) operations were performed. Data from 126 implants were included in the analysis. The mean preoperative low-frequency pure-tone average was 55.4 dB HL. Hearing preservation was observed in 55% of patients. Patients with hearing preservation were found to have significantly higher postoperative speech perception performance in the cochlear implantation-only condition than those who lost all residual hearing. Conclusion Conservation of acoustic hearing after conventional length cochlear implantation is unpredictable but remains a realistic goal. The combination of improved technology and refined surgical technique may allow for conservation of some residual hearing in more than 50% of patients. Germane to the conventional length CI recipient with substantial hearing loss, minimizing trauma allows for improved speech perception in the electric condition. These findings support the use of minimally traumatic techniques in all CI recipients, even those destined for electric-only stimulation. PMID:21659922
Methods and apparatus for non-acoustic speech characterization and recognition

DOEpatents

Holzrichter, John F.

1999-01-01

By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.
Methods and apparatus for non-acoustic speech characterization and recognition

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holzrichter, J.F.

By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.
Voice technology and BBN

NASA Technical Reports Server (NTRS)

Wolf, Jared J.

1977-01-01

The following research was discussed: (1) speech signal processing; (2) automatic speech recognition; (3) continuous speech understanding; (4) speaker recognition; (5) speech compression; (6) subjective and objective evaluation of speech communication system; (7) measurement of the intelligibility and quality of speech when degraded by noise or other masking stimuli; (8) speech synthesis; (9) instructional aids for second-language learning and for training of the deaf; and (10) investigation of speech correlates of psychological stress. Experimental psychology, control systems, and human factors engineering, which are often relevant to the proper design and operation of speech systems are described.
Effects of Age and Working Memory Capacity on Speech Recognition Performance in Noise Among Listeners With Normal Hearing.

PubMed

Gordon-Salant, Sandra; Cole, Stacey Samuels

2016-01-01

This study aimed to determine if younger and older listeners with normal hearing who differ on working memory span perform differently on speech recognition tests in noise. Older adults typically exhibit poorer speech recognition scores in noise than younger adults, which is attributed primarily to poorer hearing sensitivity and more limited working memory capacity in older than younger adults. Previous studies typically tested older listeners with poorer hearing sensitivity and shorter working memory spans than younger listeners, making it difficult to discern the importance of working memory capacity on speech recognition. This investigation controlled for hearing sensitivity and compared speech recognition performance in noise by younger and older listeners who were subdivided into high and low working memory groups. Performance patterns were compared for different speech materials to assess whether or not the effect of working memory capacity varies with the demands of the specific speech test. The authors hypothesized that (1) normal-hearing listeners with low working memory span would exhibit poorer speech recognition performance in noise than those with high working memory span; (2) older listeners with normal hearing would show poorer speech recognition scores than younger listeners with normal hearing, when the two age groups were matched for working memory span; and (3) an interaction between age and working memory would be observed for speech materials that provide contextual cues. Twenty-eight older (61 to 75 years) and 25 younger (18 to 25 years) normal-hearing listeners were assigned to groups based on age and working memory status. Northwestern University Auditory Test No. 6 words and Institute of Electrical and Electronics Engineers sentences were presented in noise using an adaptive procedure to measure the signal-to-noise ratio corresponding to 50% correct performance. Cognitive ability was evaluated with two tests of working memory (Listening Span Test and Reading Span Test) and two tests of processing speed (Paced Auditory Serial Addition Test and The Letter Digit Substitution Test). Significant effects of age and working memory capacity were observed on the speech recognition measures in noise, but these effects were mediated somewhat by the speech signal. Specifically, main effects of age and working memory were revealed for both words and sentences, but the interaction between the two was significant for sentences only. For these materials, effects of age were observed for listeners in the low working memory groups only. Although all cognitive measures were significantly correlated with speech recognition in noise, working memory span was the most important variable accounting for speech recognition performance. The results indicate that older adults with high working memory capacity are able to capitalize on contextual cues and perform as well as young listeners with high working memory capacity for sentence recognition. The data also suggest that listeners with normal hearing and low working memory capacity are less able to adapt to distortion of speech signals caused by background noise, which requires the allocation of more processing resources to earlier processing stages. These results indicate that both younger and older adults with low working memory capacity and normal hearing are at a disadvantage for recognizing speech in noise.
Speech Recognition and Cognitive Skills in Bimodal Cochlear Implant Users

ERIC Educational Resources Information Center

Hua, Håkan; Johansson, Björn; Magnusson, Lennart; Lyxell, Björn; Ellis, Rachel J.

2017-01-01

Purpose: To examine the relation between speech recognition and cognitive skills in bimodal cochlear implant (CI) and hearing aid users. Method: Seventeen bimodal CI users (28-74 years) were recruited to the study. Speech recognition tests were carried out in quiet and in noise. The cognitive tests employed included the Reading Span Test and the…
Leveraging Automatic Speech Recognition Errors to Detect Challenging Speech Segments in TED Talks

ERIC Educational Resources Information Center

Mirzaei, Maryam Sadat; Meshgi, Kourosh; Kawahara, Tatsuya

2016-01-01

This study investigates the use of Automatic Speech Recognition (ASR) systems to epitomize second language (L2) listeners' problems in perception of TED talks. ASR-generated transcripts of videos often involve recognition errors, which may indicate difficult segments for L2 listeners. This paper aims to discover the root-causes of the ASR errors…
Visual Speech Primes Open-Set Recognition of Spoken Words

ERIC Educational Resources Information Center

Buchwald, Adam B.; Winters, Stephen J.; Pisoni, David B.

2009-01-01

Visual speech perception has become a topic of considerable interest to speech researchers. Previous research has demonstrated that perceivers neurally encode and use speech information from the visual modality, and this information has been found to facilitate spoken word recognition in tasks such as lexical decision (Kim, Davis, & Krins,…

Significance of parametric spectral ratio methods in detection and recognition of whispered speech

NASA Astrophysics Data System (ADS)

Mathur, Arpit; Reddy, Shankar M.; Hegde, Rajesh M.

2012-12-01

In this article the significance of a new parametric spectral ratio method that can be used to detect whispered speech segments within normally phonated speech is described. Adaptation methods based on the maximum likelihood linear regression (MLLR) are then used to realize a mismatched train-test style speech recognition system. This proposed parametric spectral ratio method computes a ratio spectrum of the linear prediction (LP) and the minimum variance distortion-less response (MVDR) methods. The smoothed ratio spectrum is then used to detect whispered segments of speech within neutral speech segments effectively. The proposed LP-MVDR ratio method exhibits robustness at different SNRs as indicated by the whisper diarization experiments conducted on the CHAINS and the cell phone whispered speech corpus. The proposed method also performs reasonably better than the conventional methods for whisper detection. In order to integrate the proposed whisper detection method into a conventional speech recognition engine with minimal changes, adaptation methods based on the MLLR are used herein. The hidden Markov models corresponding to neutral mode speech are adapted to the whispered mode speech data in the whispered regions as detected by the proposed ratio method. The performance of this method is first evaluated on whispered speech data from the CHAINS corpus. The second set of experiments are conducted on the cell phone corpus of whispered speech. This corpus is collected using a set up that is used commercially for handling public transactions. The proposed whisper speech recognition system exhibits reasonably better performance when compared to several conventional methods. The results shown indicate the possibility of a whispered speech recognition system for cell phone based transactions.
Current trends in small vocabulary speech recognition for equipment control

NASA Astrophysics Data System (ADS)

Doukas, Nikolaos; Bardis, Nikolaos G.

2017-09-01

Speech recognition systems allow human - machine communication to acquire an intuitive nature that approaches the simplicity of inter - human communication. Small vocabulary speech recognition is a subset of the overall speech recognition problem, where only a small number of words need to be recognized. Speaker independent small vocabulary recognition can find significant applications in field equipment used by military personnel. Such equipment may typically be controlled by a small number of commands that need to be given quickly and accurately, under conditions where delicate manual operations are difficult to achieve. This type of application could hence significantly benefit by the use of robust voice operated control components, as they would facilitate the interaction with their users and render it much more reliable in times of crisis. This paper presents current challenges involved in attaining efficient and robust small vocabulary speech recognition. These challenges concern feature selection, classification techniques, speaker diversity and noise effects. A state machine approach is presented that facilitates the voice guidance of different equipment in a variety of situations.
Preschoolers Benefit From Visually Salient Speech Cues

PubMed Central

Holt, Rachael Frush

2015-01-01

Purpose This study explored visual speech influence in preschoolers using 3 developmentally appropriate tasks that vary in perceptual difficulty and task demands. They also examined developmental differences in the ability to use visually salient speech cues and visual phonological knowledge. Method Twelve adults and 27 typically developing 3- and 4-year-old children completed 3 audiovisual (AV) speech integration tasks: matching, discrimination, and recognition. The authors compared AV benefit for visually salient and less visually salient speech discrimination contrasts and assessed the visual saliency of consonant confusions in auditory-only and AV word recognition. Results Four-year-olds and adults demonstrated visual influence on all measures. Three-year-olds demonstrated visual influence on speech discrimination and recognition measures. All groups demonstrated greater AV benefit for the visually salient discrimination contrasts. AV recognition benefit in 4-year-olds and adults depended on the visual saliency of speech sounds. Conclusions Preschoolers can demonstrate AV speech integration. Their AV benefit results from efficient use of visually salient speech cues. Four-year-olds, but not 3-year-olds, used visual phonological knowledge to take advantage of visually salient speech cues, suggesting possible developmental differences in the mechanisms of AV benefit. PMID:25322336
[Cochlear implantation in patients with Waardenburg syndrome type II].

PubMed

Wan, Liangcai; Guo, Menghe; Chen, Shuaijun; Liu, Shuangriu; Chen, Hao; Gong, Jian

2010-05-01

To describe the multi-channel cochlear implantation in patients with Waardenburg syndrome including surgeries, pre and postoperative hearing assessments as well as outcomes of speech recognition. Multi-channel cochlear implantation surgeries have been performed in 12 cases with Waardenburg syndrome type II in our department from 2000 to 2008. All the patients received multi-channel cochlear implantation through transmastoid facial recess approach. The postoperative outcomes of 12 cases were compared with 12 cases with no inner ear malformation as a control group. The electrodes were totally inserted into the cochlear successfully, there was no facial paralysis and cerebrospinal fluid leakage occurred after operation. The hearing threshold in this series were similar to that of the normal cochlear implantation. After more than half a year of speech rehabilitation, the abilities of speech discrimination and spoken language of all the patients were improved compared with that of preoperation. Multi-channel cochlear implantation could be performed in the cases with Waardenburg syndrome, preoperative hearing and images assessments should be done.
Speech Perception in Noise by Children With Cochlear Implants

PubMed Central

Caldwell, Amanda; Nittrouer, Susan

2013-01-01

Purpose Common wisdom suggests that listening in noise poses disproportionately greater difficulty for listeners with cochlear implants (CIs) than for peers with normal hearing (NH). The purpose of this study was to examine phonological, language, and cognitive skills that might help explain speech-in-noise abilities for children with CIs. Method Three groups of kindergartners (NH, hearing aid wearers, and CI users) were tested on speech recognition in quiet and noise and on tasks thought to underlie the abilities that fit into the domains of phonological awareness, general language, and cognitive skills. These last measures were used as predictor variables in regression analyses with speech-in-noise scores as dependent variables. Results Compared to children with NH, children with CIs did not perform as well on speech recognition in noise or on most other measures, including recognition in quiet. Two surprising results were that (a) noise effects were consistent across groups and (b) scores on other measures did not explain any group differences in speech recognition. Conclusions Limitations of implant processing take their primary toll on recognition in quiet and account for poor speech recognition and language/phonological deficits in children with CIs. Implications are that teachers/clinicians need to teach language/phonology directly and maximize signal-to-noise levels in the classroom. PMID:22744138
Simulation of talking faces in the human brain improves auditory speech recognition

PubMed Central

von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A.; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J.

2008-01-01

Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face. PMID:18436648
Adaptive spatial filtering improves speech reception in noise while preserving binaural cues.

PubMed

Bissmeyer, Susan R S; Goldsworthy, Raymond L

2017-09-01

Hearing loss greatly reduces an individual's ability to comprehend speech in the presence of background noise. Over the past decades, numerous signal-processing algorithms have been developed to improve speech reception in these situations for cochlear implant and hearing aid users. One challenge is to reduce background noise while not introducing interaural distortion that would degrade binaural hearing. The present study evaluates a noise reduction algorithm, referred to as binaural Fennec, that was designed to improve speech reception in background noise while preserving binaural cues. Speech reception thresholds were measured for normal-hearing listeners in a simulated environment with target speech generated in front of the listener and background noise originating 90° to the right of the listener. Lateralization thresholds were also measured in the presence of background noise. These measures were conducted in anechoic and reverberant environments. Results indicate that the algorithm improved speech reception thresholds, even in highly reverberant environments. Results indicate that the algorithm also improved lateralization thresholds for the anechoic environment while not affecting lateralization thresholds for the reverberant environments. These results provide clear evidence that this algorithm can improve speech reception in background noise while preserving binaural cues used to lateralize sound.
Measures for assessing architectural speech security (privacy) of closed offices and meeting rooms.

PubMed

Gover, Bradford N; Bradley, John S

2004-12-01

Objective measures were investigated as predictors of the speech security of closed offices and rooms. A new signal-to-noise type measure is shown to be a superior indicator for security than existing measures such as the Articulation Index, the Speech Intelligibility Index, the ratio of the loudness of speech to that of noise, and the A-weighted level difference of speech and noise. This new measure is a weighted sum of clipped one-third-octave-band signal-to-noise ratios; various weightings and clipping levels are explored. Listening tests had 19 subjects rate the audibility and intelligibility of 500 English sentences, filtered to simulate transmission through various wall constructions, and presented along with background noise. The results of the tests indicate that the new measure is highly correlated with sentence intelligibility scores and also with three security thresholds: the threshold of intelligibility (below which speech is unintelligible), the threshold of cadence (below which the cadence of speech is inaudible), and the threshold of audibility (below which speech is inaudible). The ratio of the loudness of speech to that of noise, and simple A-weighted level differences are both shown to be well correlated with these latter two thresholds (cadence and audibility), but not well correlated with intelligibility.
Bone-anchored Hearing Aids: correlation between pure-tone thresholds and outcome in three user groups.

PubMed

Pfiffner, Flurin; Kompis, Martin; Stieger, Christof

2009-10-01

To investigate correlations between preoperative hearing thresholds and postoperative aided thresholds and speech understanding of users of Bone-anchored Hearing Aids (BAHA). Such correlations may be useful to estimate the postoperative outcome with BAHA from preoperative data. Retrospective case review. Tertiary referral center. : Ninety-two adult unilaterally implanted BAHA users in 3 groups: (A) 24 subjects with a unilateral conductive hearing loss, (B) 38 subjects with a bilateral conductive hearing loss, and (C) 30 subjects with single-sided deafness. Preoperative air-conduction and bone-conduction thresholds and 3-month postoperative aided and unaided sound-field thresholds as well as speech understanding using German 2-digit numbers and monosyllabic words were measured and analyzed. Correlation between preoperative air-conduction and bone-conduction thresholds of the better and of the poorer ear and postoperative aided thresholds as well as correlations between gain in sound-field threshold and gain in speech understanding. Aided postoperative sound-field thresholds correlate best with BC threshold of the better ear (correlation coefficients, r2 = 0.237 to 0.419, p = 0.0006 to 0.0064, depending on the group of subjects). Improvements in sound-field threshold correspond to improvements in speech understanding. When estimating expected postoperative aided sound-field thresholds of BAHA users from preoperative hearing thresholds, the BC threshold of the better ear should be used. For the patient groups considered, speech understanding in quiet can be estimated from the improvement in sound-field thresholds.
Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes.

PubMed

Meyer, Bernd T; Brand, Thomas; Kollmeier, Birger

2011-01-01

The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

DOEpatents

Holzrichter, J.F.; Ng, L.C.

1998-03-17

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

DOEpatents

Holzrichter, John F.; Ng, Lawrence C.

1998-01-01

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.
Characteristics of speaking style and implications for speech recognition.

PubMed

Shinozaki, Takahiro; Ostendorf, Mari; Atlas, Les

2009-09-01

Differences in speaking style are associated with more or less spectral variability, as well as different modulation characteristics. The greater variation in some styles (e.g., spontaneous speech and infant-directed speech) poses challenges for recognition but possibly also opportunities for learning more robust models, as evidenced by prior work and motivated by child language acquisition studies. In order to investigate this possibility, this work proposes a new method for characterizing speaking style (the modulation spectrum), examines spontaneous, read, adult-directed, and infant-directed styles in this space, and conducts pilot experiments in style detection and sampling for improved speech recognizer training. Speaking style classification is improved by using the modulation spectrum in combination with standard pitch and energy variation. Speech recognition experiments on a small vocabulary conversational speech recognition task show that sampling methods for training with a small amount of data benefit from the new features.
How should a speech recognizer work?

PubMed

Scharenborg, Odette; Norris, Dennis; Bosch, Louis; McQueen, James M

2005-11-12

Although researchers studying human speech recognition (HSR) and automatic speech recognition (ASR) share a common interest in how information processing systems (human or machine) recognize spoken language, there is little communication between the two disciplines. We suggest that this lack of communication follows largely from the fact that research in these related fields has focused on the mechanics of how speech can be recognized. In Marr's (1982) terms, emphasis has been on the algorithmic and implementational levels rather than on the computational level. In this article, we provide a computational-level analysis of the task of speech recognition, which reveals the close parallels between research concerned with HSR and ASR. We illustrate this relation by presenting a new computational model of human spoken-word recognition, built using techniques from the field of ASR that, in contrast to current existing models of HSR, recognizes words from real speech input. 2005 Lawrence Erlbaum Associates, Inc.
Evaluation of Speech Recognition of Cochlear Implant Recipients Using Adaptive, Digital Remote Microphone Technology and a Speech Enhancement Sound Processing Algorithm.

PubMed

Wolfe, Jace; Morais, Mila; Schafer, Erin; Agrawal, Smita; Koch, Dawn

2015-05-01

Cochlear implant recipients often experience difficulty with understanding speech in the presence of noise. Cochlear implant manufacturers have developed sound processing algorithms designed to improve speech recognition in noise, and research has shown these technologies to be effective. Remote microphone technology utilizing adaptive, digital wireless radio transmission has also been shown to provide significant improvement in speech recognition in noise. There are no studies examining the potential improvement in speech recognition in noise when these two technologies are used simultaneously. The goal of this study was to evaluate the potential benefits and limitations associated with the simultaneous use of a sound processing algorithm designed to improve performance in noise (Advanced Bionics ClearVoice) and a remote microphone system that incorporates adaptive, digital wireless radio transmission (Phonak Roger). A two-by-two way repeated measures design was used to examine performance differences obtained without these technologies compared to the use of each technology separately as well as the simultaneous use of both technologies. Eleven Advanced Bionics (AB) cochlear implant recipients, ages 11 to 68 yr. AzBio sentence recognition was measured in quiet and in the presence of classroom noise ranging in level from 50 to 80 dBA in 5-dB steps. Performance was evaluated in four conditions: (1) No ClearVoice and no Roger, (2) ClearVoice enabled without the use of Roger, (3) ClearVoice disabled with Roger enabled, and (4) simultaneous use of ClearVoice and Roger. Speech recognition in quiet was better than speech recognition in noise for all conditions. Use of ClearVoice and Roger each provided significant improvement in speech recognition in noise. The best performance in noise was obtained with the simultaneous use of ClearVoice and Roger. ClearVoice and Roger technology each improves speech recognition in noise, particularly when used at the same time. Because ClearVoice does not degrade performance in quiet settings, clinicians should consider recommending ClearVoice for routine, full-time use for AB implant recipients. Roger should be used in all instances in which remote microphone technology may assist the user in understanding speech in the presence of noise. American Academy of Audiology.
Speech recognition: how good is good enough?

PubMed

Krohn, Richard

2002-03-01

Since its infancy in the early 1990s, the technology of speech recognition has undergone a rapid evolution. Not only has the reliability of the programming improved dramatically, the return on investment has become increasingly compelling. The author describes some of the latest health care applications of speech-recognition technology, and how the next advances will be made in this area.
Speech recognition systems on the Cell Broadband Engine

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Y; Jones, H; Vaidya, S

In this paper we describe our design, implementation, and first results of a prototype connected-phoneme-based speech recognition system on the Cell Broadband Engine{trademark} (Cell/B.E.). Automatic speech recognition decodes speech samples into plain text (other representations are possible) and must process samples at real-time rates. Fortunately, the computational tasks involved in this pipeline are highly data-parallel and can receive significant hardware acceleration from vector-streaming architectures such as the Cell/B.E. Identifying and exploiting these parallelism opportunities is challenging, but also critical to improving system performance. We observed, from our initial performance timings, that a single Cell/B.E. processor can recognize speech from thousandsmore » of simultaneous voice channels in real time--a channel density that is orders-of-magnitude greater than the capacity of existing software speech recognizers based on CPUs (central processing units). This result emphasizes the potential for Cell/B.E.-based speech recognition and will likely lead to the future development of production speech systems using Cell/B.E. clusters.« less
Recognizing Whispered Speech Produced by an Individual with Surgically Reconstructed Larynx Using Articulatory Movement Data

PubMed Central

Cao, Beiming; Kim, Myungjong; Mau, Ted; Wang, Jun

2017-01-01

Individuals with larynx (vocal folds) impaired have problems in controlling their glottal vibration, producing whispered speech with extreme hoarseness. Standard automatic speech recognition using only acoustic cues is typically ineffective for whispered speech because the corresponding spectral characteristics are distorted. Articulatory cues such as the tongue and lip motion may help in recognizing whispered speech since articulatory motion patterns are generally not affected. In this paper, we investigated whispered speech recognition for patients with reconstructed larynx using articulatory movement data. A data set with both acoustic and articulatory motion data was collected from a patient with surgically reconstructed larynx using an electromagnetic articulograph. Two speech recognition systems, Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network-HMM (DNN-HMM), were used in the experiments. Experimental results showed adding either tongue or lip motion data to acoustic features such as mel-frequency cepstral coefficient (MFCC) significantly reduced the phone error rates on both speech recognition systems. Adding both tongue and lip data achieved the best performance. PMID:29423453
The Swedish Hayling task, and its relation to working memory, verbal ability, and speech-recognition-in-noise.

PubMed

Stenbäck, Victoria; Hällgren, Mathias; Lyxell, Björn; Larsby, Birgitta

2015-06-01

Cognitive functions and speech-recognition-in-noise were evaluated with a cognitive test battery, assessing response inhibition using the Hayling task, working memory capacity (WMC) and verbal information processing, and an auditory test of speech recognition. The cognitive tests were performed in silence whereas the speech recognition task was presented in noise. Thirty young normally-hearing individuals participated in the study. The aim of the study was to investigate one executive function, response inhibition, and whether it is related to individual working memory capacity (WMC), and how speech-recognition-in-noise relates to WMC and inhibitory control. The results showed a significant difference between initiation and response inhibition, suggesting that the Hayling task taps cognitive activity responsible for executive control. Our findings also suggest that high verbal ability was associated with better performance in the Hayling task. We also present findings suggesting that individuals who perform well on tasks involving response inhibition, and WMC, also perform well on a speech-in-noise task. Our findings indicate that capacity to resist semantic interference can be used to predict performance on speech-in-noise tasks. © 2015 Scandinavian Psychological Associations and John Wiley & Sons Ltd.
Contribution of auditory working memory to speech understanding in mandarin-speaking cochlear implant users.

PubMed

Tao, Duoduo; Deng, Rui; Jiang, Ye; Galvin, John J; Fu, Qian-Jie; Chen, Bing

2014-01-01

To investigate how auditory working memory relates to speech perception performance by Mandarin-speaking cochlear implant (CI) users. Auditory working memory and speech perception was measured in Mandarin-speaking CI and normal-hearing (NH) participants. Working memory capacity was measured using forward digit span and backward digit span; working memory efficiency was measured using articulation rate. Speech perception was assessed with: (a) word-in-sentence recognition in quiet, (b) word-in-sentence recognition in speech-shaped steady noise at +5 dB signal-to-noise ratio, (c) Chinese disyllable recognition in quiet, (d) Chinese lexical tone recognition in quiet. Self-reported school rank was also collected regarding performance in schoolwork. There was large inter-subject variability in auditory working memory and speech performance for CI participants. Working memory and speech performance were significantly poorer for CI than for NH participants. All three working memory measures were strongly correlated with each other for both CI and NH participants. Partial correlation analyses were performed on the CI data while controlling for demographic variables. Working memory efficiency was significantly correlated only with sentence recognition in quiet when working memory capacity was partialled out. Working memory capacity was correlated with disyllable recognition and school rank when efficiency was partialled out. There was no correlation between working memory and lexical tone recognition in the present CI participants. Mandarin-speaking CI users experience significant deficits in auditory working memory and speech performance compared with NH listeners. The present data suggest that auditory working memory may contribute to CI users' difficulties in speech understanding. The present pattern of results with Mandarin-speaking CI users is consistent with previous auditory working memory studies with English-speaking CI users, suggesting that the lexical importance of voice pitch cues (albeit poorly coded by the CI) did not influence the relationship between working memory and speech perception.

Implementation of the Intelligent Voice System for Kazakh

NASA Astrophysics Data System (ADS)

Yessenbayev, Zh; Saparkhojayev, N.; Tibeyev, T.

2014-04-01

Modern speech technologies are highly advanced and widely used in day-to-day applications. However, this is mostly concerned with the languages of well-developed countries such as English, German, Japan, Russian, etc. As for Kazakh, the situation is less prominent and research in this field is only starting to evolve. In this research and application-oriented project, we introduce an intelligent voice system for the fast deployment of call-centers and information desks supporting Kazakh speech. The demand on such a system is obvious if the country's large size and small population is considered. The landline and cell phones become the only means of communication for the distant villages and suburbs. The system features Kazakh speech recognition and synthesis modules as well as a web-GUI for efficient dialog management. For speech recognition we use CMU Sphinx engine and for speech synthesis- MaryTTS. The web-GUI is implemented in Java enabling operators to quickly create and manage the dialogs in user-friendly graphical environment. The call routines are handled by Asterisk PBX and JBoss Application Server. The system supports such technologies and protocols as VoIP, VoiceXML, FastAGI, Java SpeechAPI and J2EE. For the speech recognition experiments we compiled and used the first Kazakh speech corpus with the utterances from 169 native speakers. The performance of the speech recognizer is 4.1% WER on isolated word recognition and 6.9% WER on clean continuous speech recognition tasks. The speech synthesis experiments include the training of male and female voices.
Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holzrichter, J.F.; Ng, L.C.

The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used formore » purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.« less
Female voice communications in high level aircraft cockpit noises--part II: vocoder and automatic speech recognition systems.

PubMed

Nixon, C; Anderson, T; Morris, L; McCavitt, A; McKinley, R; Yeager, D; McDaniel, M

1998-11-01

The intelligibility of female and male speech is equivalent under most ordinary living conditions. However, due to small differences between their acoustic speech signals, called speech spectra, one can be more or less intelligible than the other in certain situations such as high levels of noise. Anecdotal information, supported by some empirical observations, suggests that some of the high intensity noise spectra of military aircraft cockpits may degrade the intelligibility of female speech more than that of male speech. In an applied research study, the intelligibility of female and male speech was measured in several high level aircraft cockpit noise conditions experienced in military aviation. In Part I, (Nixon CW, et al. Aviat Space Environ Med 1998; 69:675-83) female speech intelligibility measured in the spectra and levels of aircraft cockpit noises and with noise-canceling microphones was lower than that of the male speech in all conditions. However, the differences were small and only those at some of the highest noise levels were significant. Although speech intelligibility of both genders was acceptable during normal cruise noises, improvements are required in most of the highest levels of noise created during maximum aircraft operating conditions. These results are discussed in a Part I technical report. This Part II report examines the intelligibility in the same aircraft cockpit noises of vocoded female and male speech and the accuracy with which female and male speech in some of the cockpit noises were understood by automatic speech recognition systems. The intelligibility of vocoded female speech was generally the same as that of vocoded male speech. No significant differences were measured between the recognition accuracy of male and female speech by the automatic speech recognition systems. The intelligibility of female and male speech was equivalent for these conditions.
Functional connectivity between face-movement and speech-intelligibility areas during auditory-only speech perception.

PubMed

Schall, Sonja; von Kriegstein, Katharina

2014-01-01

It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers' voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker's face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas.
Recognition of Time-Compressed and Natural Speech with Selective Temporal Enhancements by Young and Elderly Listeners

ERIC Educational Resources Information Center

Gordon-Salant, Sandra; Fitzgibbons, Peter J.; Friedman, Sarah A.

2007-01-01

Purpose: The goal of this experiment was to determine whether selective slowing of speech segments improves recognition performance by young and elderly listeners. The hypotheses were (a) the benefits of time expansion occur for rapid speech but not for natural-rate speech, (b) selective time expansion of consonants produces greater score…
Hierarchical singleton-type recurrent neural fuzzy networks for noisy speech recognition.

PubMed

Juang, Chia-Feng; Chiou, Chyi-Tian; Lai, Chun-Lung

2007-05-01

This paper proposes noisy speech recognition using hierarchical singleton-type recurrent neural fuzzy networks (HSRNFNs). The proposed HSRNFN is a hierarchical connection of two singleton-type recurrent neural fuzzy networks (SRNFNs), where one is used for noise filtering and the other for recognition. The SRNFN is constructed by recurrent fuzzy if-then rules with fuzzy singletons in the consequences, and their recurrent properties make them suitable for processing speech patterns with temporal characteristics. In n words recognition, n SRNFNs are created for modeling n words, where each SRNFN receives the current frame feature and predicts the next one of its modeling word. The prediction error of each SRNFN is used as recognition criterion. In filtering, one SRNFN is created, and each SRNFN recognizer is connected to the same SRNFN filter, which filters noisy speech patterns in the feature domain before feeding them to the SRNFN recognizer. Experiments with Mandarin word recognition under different types of noise are performed. Other recognizers, including multilayer perceptron (MLP), time-delay neural networks (TDNNs), and hidden Markov models (HMMs), are also tested and compared. These experiments and comparisons demonstrate good results with HSRNFN for noisy speech recognition tasks.
Automatic speech recognition (ASR) based approach for speech therapy of aphasic patients: A review

NASA Astrophysics Data System (ADS)

Jamal, Norezmi; Shanta, Shahnoor; Mahmud, Farhanahani; Sha'abani, MNAH

2017-09-01

This paper reviews the state-of-the-art an automatic speech recognition (ASR) based approach for speech therapy of aphasic patients. Aphasia is a condition in which the affected person suffers from speech and language disorder resulting from a stroke or brain injury. Since there is a growing body of evidence indicating the possibility of improving the symptoms at an early stage, ASR based solutions are increasingly being researched for speech and language therapy. ASR is a technology that transfers human speech into transcript text by matching with the system's library. This is particularly useful in speech rehabilitation therapies as they provide accurate, real-time evaluation for speech input from an individual with speech disorder. ASR based approaches for speech therapy recognize the speech input from the aphasic patient and provide real-time feedback response to their mistakes. However, the accuracy of ASR is dependent on many factors such as, phoneme recognition, speech continuity, speaker and environmental differences as well as our depth of knowledge on human language understanding. Hence, the review examines recent development of ASR technologies and its performance for individuals with speech and language disorders.
Robot Command Interface Using an Audio-Visual Speech Recognition System

NASA Astrophysics Data System (ADS)

Ceballos, Alexánder; Gómez, Juan; Prieto, Flavio; Redarce, Tanneguy

In recent years audio-visual speech recognition has emerged as an active field of research thanks to advances in pattern recognition, signal processing and machine vision. Its ultimate goal is to allow human-computer communication using voice, taking into account the visual information contained in the audio-visual speech signal. This document presents a command's automatic recognition system using audio-visual information. The system is expected to control the laparoscopic robot da Vinci. The audio signal is treated using the Mel Frequency Cepstral Coefficients parametrization method. Besides, features based on the points that define the mouth's outer contour according to the MPEG-4 standard are used in order to extract the visual speech information.
Sentence intelligibility during segmental interruption and masking by speech-modulated noise: Effects of age and hearing loss

PubMed Central

Fogerty, Daniel; Ahlstrom, Jayne B.; Bologna, William J.; Dubno, Judy R.

2015-01-01

This study investigated how single-talker modulated noise impacts consonant and vowel cues to sentence intelligibility. Younger normal-hearing, older normal-hearing, and older hearing-impaired listeners completed speech recognition tests. All listeners received spectrally shaped speech matched to their individual audiometric thresholds to ensure sufficient audibility with the exception of a second younger listener group who received spectral shaping that matched the mean audiogram of the hearing-impaired listeners. Results demonstrated minimal declines in intelligibility for older listeners with normal hearing and more evident declines for older hearing-impaired listeners, possibly related to impaired temporal processing. A correlational analysis suggests a common underlying ability to process information during vowels that is predictive of speech-in-modulated noise abilities. Whereas, the ability to use consonant cues appears specific to the particular characteristics of the noise and interruption. Performance declines for older listeners were mostly confined to consonant conditions. Spectral shaping accounted for the primary contributions of audibility. However, comparison with the young spectral controls who received identical spectral shaping suggests that this procedure may reduce wideband temporal modulation cues due to frequency-specific amplification that affected high-frequency consonants more than low-frequency vowels. These spectral changes may impact speech intelligibility in certain modulation masking conditions. PMID:26093436
Recognition of Emotions in Mexican Spanish Speech: An Approach Based on Acoustic Modelling of Emotion-Specific Vowels

PubMed Central

Caballero-Morales, Santiago-Omar

2013-01-01

An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists in the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). Then, estimation of the emotional state from a spoken sentence is performed by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, accuracy of 87–100% was achieved for the recognition of emotional state of Mexican Spanish speech. PMID:23935410
Enhancing speech recognition using improved particle swarm optimization based hidden Markov model.

PubMed

Selvaraj, Lokesh; Ganesan, Balakrishnan

2014-01-01

Enhancing speech recognition is the primary intention of this work. In this paper a novel speech recognition method based on vector quantization and improved particle swarm optimization (IPSO) is suggested. The suggested methodology contains four stages, namely, (i) denoising, (ii) feature mining (iii), vector quantization, and (iv) IPSO based hidden Markov model (HMM) technique (IP-HMM). At first, the speech signals are denoised using median filter. Next, characteristics such as peak, pitch spectrum, Mel frequency Cepstral coefficients (MFCC), mean, standard deviation, and minimum and maximum of the signal are extorted from the denoised signal. Following that, to accomplish the training process, the extracted characteristics are given to genetic algorithm based codebook generation in vector quantization. The initial populations are created by selecting random code vectors from the training set for the codebooks for the genetic algorithm process and IP-HMM helps in doing the recognition. At this point the creativeness will be done in terms of one of the genetic operation crossovers. The proposed speech recognition technique offers 97.14% accuracy.
Speaker recognition with temporal cues in acoustic and electric hearing

NASA Astrophysics Data System (ADS)

Vongphoe, Michael; Zeng, Fan-Gang

2005-08-01

Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a disassociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
A speech-controlled environmental control system for people with severe dysarthria.

PubMed

Hawley, Mark S; Enderby, Pam; Green, Phil; Cunningham, Stuart; Brownsell, Simon; Carmichael, James; Parker, Mark; Hatzis, Athanassios; O'Neill, Peter; Palmer, Rebecca

2007-06-01

Automatic speech recognition (ASR) can provide a rapid means of controlling electronic assistive technology. Off-the-shelf ASR systems function poorly for users with severe dysarthria because of the increased variability of their articulations. We have developed a limited vocabulary speaker dependent speech recognition application which has greater tolerance to variability of speech, coupled with a computerised training package which assists dysarthric speakers to improve the consistency of their vocalisations and provides more data for recogniser training. These applications, and their implementation as the interface for a speech-controlled environmental control system (ECS), are described. The results of field trials to evaluate the training program and the speech-controlled ECS are presented. The user-training phase increased the recognition rate from 88.5% to 95.4% (p<0.001). Recognition rates were good for people with even the most severe dysarthria in everyday usage in the home (mean word recognition rate 86.9%). Speech-controlled ECS were less accurate (mean task completion accuracy 78.6% versus 94.8%) but were faster to use than switch-scanning systems, even taking into account the need to repeat unsuccessful operations (mean task completion time 7.7s versus 16.9s, p<0.001). It is concluded that a speech-controlled ECS is a viable alternative to switch-scanning systems for some people with severe dysarthria and would lead, in many cases, to more efficient control of the home.
The influence of age, hearing, and working memory on the speech comprehension benefit derived from an automatic speech recognition system.

PubMed

Zekveld, Adriana A; Kramer, Sophia E; Kessens, Judith M; Vlaming, Marcel S M G; Houtgast, Tammo

2009-04-01

The aim of the current study was to examine whether partly incorrect subtitles that are automatically generated by an Automatic Speech Recognition (ASR) system, improve speech comprehension by listeners with hearing impairment. In an earlier study (Zekveld et al. 2008), we showed that speech comprehension in noise by young listeners with normal hearing improves when presenting partly incorrect, automatically generated subtitles. The current study focused on the effects of age, hearing loss, visual working memory capacity, and linguistic skills on the benefit obtained from automatically generated subtitles during listening to speech in noise. In order to investigate the effects of age and hearing loss, three groups of participants were included: 22 young persons with normal hearing (YNH, mean age = 21 years), 22 middle-aged adults with normal hearing (MA-NH, mean age = 55 years) and 30 middle-aged adults with hearing impairment (MA-HI, mean age = 57 years). The benefit from automatic subtitling was measured by Speech Reception Threshold (SRT) tests (Plomp & Mimpen, 1979). Both unimodal auditory and bimodal audiovisual SRT tests were performed. In the audiovisual tests, the subtitles were presented simultaneously with the speech, whereas in the auditory test, only speech was presented. The difference between the auditory and audiovisual SRT was defined as the audiovisual benefit. Participants additionally rated the listening effort. We examined the influences of ASR accuracy level and text delay on the audiovisual benefit and the listening effort using a repeated measures General Linear Model analysis. In a correlation analysis, we evaluated the relationships between age, auditory SRT, visual working memory capacity and the audiovisual benefit and listening effort. The automatically generated subtitles improved speech comprehension in noise for all ASR accuracies and delays covered by the current study. Higher ASR accuracy levels resulted in more benefit obtained from the subtitles. Speech comprehension improved even for relatively low ASR accuracy levels; for example, participants obtained about 2 dB SNR audiovisual benefit for ASR accuracies around 74%. Delaying the presentation of the text reduced the benefit and increased the listening effort. Participants with relatively low unimodal speech comprehension obtained greater benefit from the subtitles than participants with better unimodal speech comprehension. We observed an age-related decline in the working-memory capacity of the listeners with normal hearing. A higher age and a lower working memory capacity were associated with increased effort required to use the subtitles to improve speech comprehension. Participants were able to use partly incorrect and delayed subtitles to increase their comprehension of speech in noise, regardless of age and hearing loss. This supports the further development and evaluation of an assistive listening system that displays automatically recognized speech to aid speech comprehension by listeners with hearing impairment.
Improving speech-in-noise recognition for children with hearing loss: Potential effects of language abilities, binaural summation, and head shadow

PubMed Central

Nittrouer, Susan; Caldwell-Tarr, Amanda; Tarr, Eric; Lowenstein, Joanna H.; Rice, Caitlin; Moberly, Aaron C.

2014-01-01

Objective: This study examined speech recognition in noise for children with hearing loss, compared it to recognition for children with normal hearing, and examined mechanisms that might explain variance in children’s abilities to recognize speech in noise. Design: Word recognition was measured in two levels of noise, both when the speech and noise were co-located in front and when the noise came separately from one side. Four mechanisms were examined as factors possibly explaining variance: vocabulary knowledge, sensitivity to phonological structure, binaural summation, and head shadow. Study sample: Participants were 113 eight-year-old children. Forty-eight had normal hearing (NH) and 65 had hearing loss: 18 with hearing aids (HAs), 19 with one cochlear implant (CI), and 28 with two CIs. Results: Phonological sensitivity explained a significant amount of between-groups variance in speech-in-noise recognition. Little evidence of binaural summation was found. Head shadow was similar in magnitude for children with NH and with CIs, regardless of whether they wore one or two CIs. Children with HAs showed reduced head shadow effects. Conclusion: These outcomes suggest that in order to improve speech-in-noise recognition for children with hearing loss, intervention needs to be comprehensive, focusing on both language abilities and auditory mechanisms. PMID:23834373
Auditory training of speech recognition with interrupted and continuous noise maskers by children with hearing impairment

PubMed Central

Sullivan, Jessica R.; Thibodeau, Linda M.; Assmann, Peter F.

2013-01-01

Previous studies have indicated that individuals with normal hearing (NH) experience a perceptual advantage for speech recognition in interrupted noise compared to continuous noise. In contrast, adults with hearing impairment (HI) and younger children with NH receive a minimal benefit. The objective of this investigation was to assess whether auditory training in interrupted noise would improve speech recognition in noise for children with HI and perhaps enhance their utilization of glimpsing skills. A partially-repeated measures design was used to evaluate the effectiveness of seven 1-h sessions of auditory training in interrupted and continuous noise. Speech recognition scores in interrupted and continuous noise were obtained from pre-, post-, and 3 months post-training from 24 children with moderate-to-severe hearing loss. Children who participated in auditory training in interrupted noise demonstrated a significantly greater improvement in speech recognition compared to those who trained in continuous noise. Those who trained in interrupted noise demonstrated similar improvements in both noise conditions while those who trained in continuous noise only showed modest improvements in the interrupted noise condition. This study presents direct evidence that auditory training in interrupted noise can be beneficial in improving speech recognition in noise for children with HI. PMID:23297921
Automatic Speech Recognition Predicts Speech Intelligibility and Comprehension for Listeners with Simulated Age-Related Hearing Loss

ERIC Educational Resources Information Center

Fontan, Lionel; Ferrané, Isabelle; Farinas, Jérôme; Pinquier, Julien; Tardieu, Julien; Magnen, Cynthia; Gaillard, Pascal; Aumont, Xavier; Füllgrabe, Christian

2017-01-01

Purpose: The purpose of this article is to assess speech processing for listeners with simulated age-related hearing loss (ARHL) and to investigate whether the observed performance can be replicated using an automatic speech recognition (ASR) system. The long-term goal of this research is to develop a system that will assist…
The Relationship between Binaural Benefit and Difference in Unilateral Speech Recognition Performance for Bilateral Cochlear Implant Users

PubMed Central

Yoon, Yang-soo; Li, Yongxin; Kang, Hou-Yong; Fu, Qian-Jie

2011-01-01

Objective The full benefit of bilateral cochlear implants may depend on the unilateral performance with each device, the speech materials, processing ability of the user, and/or the listening environment. In this study, bilateral and unilateral speech performances were evaluated in terms of recognition of phonemes and sentences presented in quiet or in noise. Design Speech recognition was measured for unilateral left, unilateral right, and bilateral listening conditions; speech and noise were presented at 0° azimuth. The “binaural benefit” was defined as the difference between bilateral performance and unilateral performance with the better ear. Study Sample 9 adults with bilateral cochlear implants participated. Results On average, results showed a greater binaural benefit in noise than in quiet for all speech tests. More importantly, the binaural benefit was greater when unilateral performance was similar across ears. As the difference in unilateral performance between ears increased, the binaural advantage decreased; this functional relationship was observed across the different speech materials and noise levels even though there was substantial intra- and inter-subject variability. Conclusions The results indicate that subjects who show symmetry in speech recognition performance between implanted ears in general show a large binaural benefit. PMID:21696329
Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals.

PubMed

Polur, Prasad D; Miller, Gerald E

2006-10-01

Computer speech recognition of individuals with dysarthria, such as cerebral palsy patients requires a robust technique that can handle conditions of very high variability and limited training data. In this study, application of a 10 state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure for a dysarthric speech (isolated word) recognition system, intended to act as an assistive tool, was investigated. A small size vocabulary spoken by three cerebral palsy subjects was chosen. The effect of such a structure on the recognition rate of the system was investigated by comparing it with an ergodic hidden Markov model as a control tool. This was done in order to determine if this modified technique contributed to enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz. Mel frequency cepstral coefficients were extracted from them using 15 ms frames and served as training input to the hybrid model setup. The subsequent results demonstrated that the hybrid model structure was quite robust in its ability to handle the large variability and non-conformity of dysarthric speech. The level of variability in input dysarthric speech patterns sometimes limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor impaired individuals holds sufficient promise.
Effects of intelligibility on working memory demand for speech perception.

PubMed

Francis, Alexander L; Nusbaum, Howard C

2009-08-01

Understanding low-intelligibility speech is effortful. In three experiments, we examined the effects of intelligibility on working memory (WM) demands imposed by perception of synthetic speech. In all three experiments, a primary speeded word recognition task was paired with a secondary WM-load task designed to vary the availability of WM capacity during speech perception. Speech intelligibility was varied either by training listeners to use available acoustic cues in a more diagnostic manner (as in Experiment 1) or by providing listeners with more informative acoustic cues (i.e., better speech quality, as in Experiments 2 and 3). In the first experiment, training significantly improved intelligibility and recognition speed; increasing WM load significantly slowed recognition. A significant interaction between training and load indicated that the benefit of training on recognition speed was observed only under low memory load. In subsequent experiments, listeners received no training; intelligibility was manipulated by changing synthesizers. Improving intelligibility without training improved recognition accuracy, and increasing memory load still decreased it, but more intelligible speech did not produce more efficient use of available WM capacity. This suggests that perceptual learning modifies the way available capacity is used, perhaps by increasing the use of more phonetically informative features and/or by decreasing use of less informative ones.

Speech recognition: Acoustic phonetic and lexical knowledge representation

NASA Astrophysics Data System (ADS)

Zue, V. W.

1983-02-01

The purpose of this program is to develop a speech data base facility under which the acoustic characteristics of speech sounds in various contexts can be studied conveniently; investigate the phonological properties of a large lexicon of, say 10,000 words, and determine to what extent the phontactic constraints can be utilized in speech recognition; study the acoustic cues that are used to mark work boundaries; develop a test bed in the form of a large-vocabulary, IWR system to study the interactions of acoustic, phonetic and lexical knowledge; and develop a limited continuous speech recognition system with the goal of recognizing any English word from its spelling in order to assess the interactions of higher-level knowledge sources.
Ageing without hearing loss or cognitive impairment causes a decrease in speech intelligibility only in informational maskers.

PubMed

Rajan, R; Cainer, K E

2008-06-23

In most everyday settings, speech is heard in the presence of competing sounds and understanding speech requires skills in auditory streaming and segregation, followed by identification and recognition, of the attended signals. Ageing leads to difficulties in understanding speech in noisy backgrounds. In addition to age-related changes in hearing-related factors, cognitive factors also play a role but it is unclear to what extent these are generalized or modality-specific cognitive factors. We examined how ageing in normal-hearing decade age cohorts from 20 to 69 years affected discrimination of open-set speech in background noise. We used two types of sentences of similar structural and linguistic characteristics but different masking levels (i.e. differences in signal-to-noise ratios required for detection of sentences in a standard masker) so as to vary sentence demand, and two background maskers (one causing purely energetic masking effects and the other causing energetic and informational masking) to vary load conditions. There was a decline in performance (measured as speech reception thresholds for perception of sentences in noise) in the oldest cohort for both types of sentences, but only in the presence of the more demanding informational masker. We interpret these results to indicate a modality-specific decline in cognitive processing, likely a decrease in the ability to use acoustic and phonetic cues efficiently to segregate speech from background noise, in subjects aged >60.
Assessment of directionality performances: comparison between Freedom and CP810 sound processors.

PubMed

Razza, Sergio; Albanese, Greta; Ermoli, Lucilla; Zaccone, Monica; Cristofari, Eliana

2013-10-01

To compare speech recognition in noise for the Nucleus Freedom and CP810 sound processors using different directional settings among those available in the SmartSound portfolio. Single-subject, repeated measures study. Tertiary care referral center. Thirty-one monoaurally and binaurally implanted subjects (24 children and 7 adults) were enrolled. They were all experienced Nucleus Freedom sound processor users and achieved a 100% open set word recognition score in quiet listening conditions. Each patient was fitted with the Freedom and the CP810 processor. The program setting incorporated Adaptive Dynamic Range Optimization (ADRO) and adopted the directional algorithm BEAM (both devices) and ZOOM (only on CP810). Speech reception threshold (SRT) was assessed in a free-field layout, with disyllabic word list and interfering multilevel babble noise in the 3 different pre-processing configurations. On average, CP810 improved significantly patients' SRTs as compared to Freedom SP after 1 hour of use. Instead, no significant difference was observed in patients' SRT between the BEAM and the ZOOM algorithm fitted in the CP810 processor. The results suggest that hardware developments achieved in the design of CP810 allow an immediate and relevant directional advantage as compared to the previous-generation Freedom device.
Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity

NASA Astrophysics Data System (ADS)

Moses, David A.; Mesgarani, Nima; Leonard, Matthew K.; Chang, Edward F.

2016-10-01

Objective. The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography. Approach. The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder. Main results. The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system. Significance. These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
Functional Connectivity between Face-Movement and Speech-Intelligibility Areas during Auditory-Only Speech Perception

PubMed Central

Schall, Sonja; von Kriegstein, Katharina

2014-01-01

It has been proposed that internal simulation of the talking face of visually-known speakers facilitates auditory speech recognition. One prediction of this view is that brain areas involved in auditory-only speech comprehension interact with visual face-movement sensitive areas, even under auditory-only listening conditions. Here, we test this hypothesis using connectivity analyses of functional magnetic resonance imaging (fMRI) data. Participants (17 normal participants, 17 developmental prosopagnosics) first learned six speakers via brief voice-face or voice-occupation training (<2 min/speaker). This was followed by an auditory-only speech recognition task and a control task (voice recognition) involving the learned speakers’ voices in the MRI scanner. As hypothesized, we found that, during speech recognition, familiarity with the speaker’s face increased the functional connectivity between the face-movement sensitive posterior superior temporal sulcus (STS) and an anterior STS region that supports auditory speech intelligibility. There was no difference between normal participants and prosopagnosics. This was expected because previous findings have shown that both groups use the face-movement sensitive STS to optimize auditory-only speech comprehension. Overall, the present findings indicate that learned visual information is integrated into the analysis of auditory-only speech and that this integration results from the interaction of task-relevant face-movement and auditory speech-sensitive areas. PMID:24466026
A preliminary comparison of speech recognition functionality in dental practice management systems.

PubMed

Irwin, Jeannie Y; Schleyer, Titus

2008-11-06

In this study, we examined speech recognition functionality in four leading dental practice management systems. Twenty dental students used voice to chart a simulated patient with 18 findings in each system. Results show it can take over a minute to chart one finding and that users frequently have to repeat commands. Limited functionality, poor usability and a high error rate appear to retard adoption of speech recognition in dentistry.
Two Stage Data Augmentation for Low Resourced Speech Recognition (Author’s Manuscript)

DTIC Science & Technology

2016-09-12

speech recognition, deep neural networks, data augmentation 1. Introduction When training data is limited—whether it be audio or text—the obvious...Schwartz, and S. Tsakalidis, “Enhancing low resource keyword spotting with au- tomatically retrieved web documents,” in Interspeech, 2015, pp. 839–843. [2...and F. Seide, “Feature learning in deep neural networks - a study on speech recognition tasks,” in International Conference on Learning Representations
Getting What You Want: Accurate Document Filtering in a Terabyte World

DTIC Science & Technology

2002-11-01

models are used widely in speech recognition and have shown promise for ad-hoc information retrieval (Ponte and Croft, 1998; Lafferty and Zhai, 2001...tasks is focused on developing techniques similar to those used in speech recognition. However the differing requirements of speech recognition and...Conference on Research and Development in Information Retrieval. ACM. 6. T.Ault, and Y. Yang. (2001.) kNN at TREC-9: A failure analysis. In
V2S: Voice to Sign Language Translation System for Malaysian Deaf People

NASA Astrophysics Data System (ADS)

Mean Foong, Oi; Low, Tang Jung; La, Wai Wan

The process of learning and understand the sign language may be cumbersome to some, and therefore, this paper proposes a solution to this problem by providing a voice (English Language) to sign language translation system using Speech and Image processing technique. Speech processing which includes Speech Recognition is the study of recognizing the words being spoken, regardless of whom the speaker is. This project uses template-based recognition as the main approach in which the V2S system first needs to be trained with speech pattern based on some generic spectral parameter set. These spectral parameter set will then be stored as template in a database. The system will perform the recognition process through matching the parameter set of the input speech with the stored templates to finally display the sign language in video format. Empirical results show that the system has 80.3% recognition rate.
Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

PubMed

Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

2014-11-15

Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on an inter-hemispheric mechanism which exploits both a right-hemispheric sensitivity to pitch information and a left-hemispheric dominance in speech processing. Copyright © 2014 Elsevier Inc. All rights reserved.
Linguistic contributions to speech-on-speech masking for native and non-native listeners: language familiarity and semantic content.

PubMed

Brouwer, Susanne; Van Engen, Kristin J; Calandruccio, Lauren; Bradlow, Ann R

2012-02-01

This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener's knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. © 2012 Acoustical Society of America
Linguistic contributions to speech-on-speech masking for native and non-native listeners: Language familiarity and semantic content

PubMed Central

Brouwer, Susanne; Van Engen, Kristin J.; Calandruccio, Lauren; Bradlow, Ann R.

2012-01-01

This study examined whether speech-on-speech masking is sensitive to variation in the degree of similarity between the target and the masker speech. Three experiments investigated whether speech-in-speech recognition varies across different background speech languages (English vs Dutch) for both English and Dutch targets, as well as across variation in the semantic content of the background speech (meaningful vs semantically anomalous sentences), and across variation in listener status vis-à-vis the target and masker languages (native, non-native, or unfamiliar). The results showed that the more similar the target speech is to the masker speech (e.g., same vs different language, same vs different levels of semantic content), the greater the interference on speech recognition accuracy. Moreover, the listener’s knowledge of the target and the background language modulate the size of the release from masking. These factors had an especially strong effect on masking effectiveness in highly unfavorable listening conditions. Overall this research provided evidence that that the degree of target-masker similarity plays a significant role in speech-in-speech recognition. The results also give insight into how listeners assign their resources differently depending on whether they are listening to their first or second language. PMID:22352516
VALIDATION OF A CLINICAL ASSESSMENT OF SPECTRAL RIPPLE RESOLUTION FOR COCHLEAR-IMPLANT USERS

PubMed Central

Drennan, Ward. R.; Anderson, Elizabeth S.; Won, Jong Ho; Rubinstein, Jay T.

2013-01-01

Objectives Non-speech psychophysical tests of spectral resolution, such as the spectral-ripple discrimination task, have been shown to correlate with speech recognition performance in cochlear implant (CI) users (Henry et al., 2005; Won et al. 2007, 2011; Drennan et al. 2008; Anderson et al. 2011). However, these tests are best suited for use in the research laboratory setting and are impractical for clinical use. A test of spectral resolution that is quicker and could more easily be implemented in the clinical setting has been developed. The objectives of this study were 1) To determine if this new clinical ripple test would yield individual results equivalent to the longer, adaptive version of the ripple discrimination test; 2) To evaluate test-retest reliability for the clinical ripple measure; and 3) To examine the relationship between clinical ripple performance and monosyllabic word recognition in quiet for a group of CI listeners. Design Twenty-eight CI recipients participated in the study. Each subject was tested on both the adaptive and the clinical versions of spectral ripple discrimination, as well as CNC word recognition in quiet. The adaptive version of spectral ripple employed a 2-up, 1-down procedure for determining spectral ripple discrimination threshold. The clinical ripple test used a method of constant stimuli, with trials for each of 12 fixed ripple densities occurring six times in random order. Results from the clinical ripple test (proportion correct) were then compared to ripple discrimination thresholds (in ripples per octave) from the adaptive test. Results The clinical ripple test showed strong concurrent validity, evidenced by a good correlation between clinical ripple and adaptive ripple results (r=0.79), as well as a correlation with word recognition (r = 0.7). Excellent test-retest reliability was also demonstrated with a high test-retest correlation (r = 0.9). Conclusions The clinical ripple test is a reliable non-linguistic measure of spectral resolution, optimized for use with cochlear implant users in a clinical setting. The test might be useful as a diagnostic tool or as a possible surrogate outcome measure for evaluating treatment effects in hearing. PMID:24552679
Speech Recognition in Adults With Cochlear Implants: The Effects of Working Memory, Phonological Sensitivity, and Aging.

PubMed

Moberly, Aaron C; Harris, Michael S; Boyce, Lauren; Nittrouer, Susan

2017-04-14

Models of speech recognition suggest that "top-down" linguistic and cognitive functions, such as use of phonotactic constraints and working memory, facilitate recognition under conditions of degradation, such as in noise. The question addressed in this study was what happens to these functions when a listener who has experienced years of hearing loss obtains a cochlear implant. Thirty adults with cochlear implants and 30 age-matched controls with age-normal hearing underwent testing of verbal working memory using digit span and serial recall of words. Phonological capacities were assessed using a lexical decision task and nonword repetition. Recognition of words in sentences in speech-shaped noise was measured. Implant users had only slightly poorer working memory accuracy than did controls and only on serial recall of words; however, phonological sensitivity was highly impaired. Working memory did not facilitate speech recognition in noise for either group. Phonological sensitivity predicted sentence recognition for implant users but not for listeners with normal hearing. Clinical speech recognition outcomes for adult implant users relate to the ability of these users to process phonological information. Results suggest that phonological capacities may serve as potential clinical targets through rehabilitative training. Such novel interventions may be particularly helpful for older adult implant users.
Speech Recognition in Adults With Cochlear Implants: The Effects of Working Memory, Phonological Sensitivity, and Aging

PubMed Central

Harris, Michael S.; Boyce, Lauren; Nittrouer, Susan

2017-01-01

Purpose Models of speech recognition suggest that “top-down” linguistic and cognitive functions, such as use of phonotactic constraints and working memory, facilitate recognition under conditions of degradation, such as in noise. The question addressed in this study was what happens to these functions when a listener who has experienced years of hearing loss obtains a cochlear implant. Method Thirty adults with cochlear implants and 30 age-matched controls with age-normal hearing underwent testing of verbal working memory using digit span and serial recall of words. Phonological capacities were assessed using a lexical decision task and nonword repetition. Recognition of words in sentences in speech-shaped noise was measured. Results Implant users had only slightly poorer working memory accuracy than did controls and only on serial recall of words; however, phonological sensitivity was highly impaired. Working memory did not facilitate speech recognition in noise for either group. Phonological sensitivity predicted sentence recognition for implant users but not for listeners with normal hearing. Conclusion Clinical speech recognition outcomes for adult implant users relate to the ability of these users to process phonological information. Results suggest that phonological capacities may serve as potential clinical targets through rehabilitative training. Such novel interventions may be particularly helpful for older adult implant users. PMID:28384805
Effects of Semantic Context and Fundamental Frequency Contours on Mandarin Speech Recognition by Second Language Learners.

PubMed

Zhang, Linjun; Li, Yu; Wu, Han; Li, Xin; Shu, Hua; Zhang, Yang; Li, Ping

2016-01-01

Speech recognition by second language (L2) learners in optimal and suboptimal conditions has been examined extensively with English as the target language in most previous studies. This study extended existing experimental protocols (Wang et al., 2013) to investigate Mandarin speech recognition by Japanese learners of Mandarin at two different levels (elementary vs. intermediate) of proficiency. The overall results showed that in addition to L2 proficiency, semantic context, F0 contours, and listening condition all affected the recognition performance on the Mandarin sentences. However, the effects of semantic context and F0 contours on L2 speech recognition diverged to some extent. Specifically, there was significant modulation effect of listening condition on semantic context, indicating that L2 learners made use of semantic context less efficiently in the interfering background than in quiet. In contrast, no significant modulation effect of listening condition on F0 contours was found. Furthermore, there was significant interaction between semantic context and F0 contours, indicating that semantic context becomes more important for L2 speech recognition when F0 information is degraded. None of these effects were found to be modulated by L2 proficiency. The discrepancy in the effects of semantic context and F0 contours on L2 speech recognition in the interfering background might be related to differences in processing capacities required by the two types of information in adverse listening conditions.
Auditory detection of non-speech and speech stimuli in noise: Effects of listeners' native language background.

PubMed

Liu, Chang; Jin, Su-Hyun

2015-11-01

This study investigated whether native listeners processed speech differently from non-native listeners in a speech detection task. Detection thresholds of Mandarin Chinese and Korean vowels and non-speech sounds in noise, frequency selectivity, and the nativeness of Mandarin Chinese and Korean vowels were measured for Mandarin Chinese- and Korean-native listeners. The two groups of listeners exhibited similar non-speech sound detection and frequency selectivity; however, the Korean listeners had better detection thresholds of Korean vowels than Chinese listeners, while the Chinese listeners performed no better at Chinese vowel detection than the Korean listeners. Moreover, thresholds predicted from an auditory model highly correlated with behavioral thresholds of the two groups of listeners, suggesting that detection of speech sounds not only depended on listeners' frequency selectivity, but also might be affected by their native language experience. Listeners evaluated their native vowels with higher nativeness scores than non-native listeners. Native listeners may have advantages over non-native listeners when processing speech sounds in noise, even without the required phonetic processing; however, such native speech advantages might be offset by Chinese listeners' lower sensitivity to vowel sounds, a characteristic possibly resulting from their sparse vowel system and their greater cognitive and attentional demands for vowel processing.
Random Deep Belief Networks for Recognizing Emotions from Speech Signals.

PubMed

Wen, Guihua; Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

2017-01-01

Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN) can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition.
Random Deep Belief Networks for Recognizing Emotions from Speech Signals

PubMed Central

Li, Huihui; Huang, Jubing; Li, Danyang; Xun, Eryang

2017-01-01

Now the human emotions can be recognized from speech signals using machine learning methods; however, they are challenged by the lower recognition accuracies in real applications due to lack of the rich representation ability. Deep belief networks (DBN) can automatically discover the multiple levels of representations in speech signals. To make full of its advantages, this paper presents an ensemble of random deep belief networks (RDBN) method for speech emotion recognition. It firstly extracts the low level features of the input speech signal and then applies them to construct lots of random subspaces. Each random subspace is then provided for DBN to yield the higher level features as the input of the classifier to output an emotion label. All outputted emotion labels are then fused through the majority voting to decide the final emotion label for the input speech signal. The conducted experimental results on benchmark speech emotion databases show that RDBN has better accuracy than the compared methods for speech emotion recognition. PMID:28356908
Speech recognition for embedded automatic positioner for laparoscope

NASA Astrophysics Data System (ADS)

Chen, Xiaodong; Yin, Qingyun; Wang, Yi; Yu, Daoyin

2014-07-01

In this paper a novel speech recognition methodology based on Hidden Markov Model (HMM) is proposed for embedded Automatic Positioner for Laparoscope (APL), which includes a fixed point ARM processor as the core. The APL system is designed to assist the doctor in laparoscopic surgery, by implementing the specific doctor's vocal control to the laparoscope. Real-time respond to the voice commands asks for more efficient speech recognition algorithm for the APL. In order to reduce computation cost without significant loss in recognition accuracy, both arithmetic and algorithmic optimizations are applied in the method presented. First, depending on arithmetic optimizations most, a fixed point frontend for speech feature analysis is built according to the ARM processor's character. Then the fast likelihood computation algorithm is used to reduce computational complexity of the HMM-based recognition algorithm. The experimental results show that, the method shortens the recognition time within 0.5s, while the accuracy higher than 99%, demonstrating its ability to achieve real-time vocal control to the APL.

Investigation of potential cognitive tests for use with older adults in audiology clinics.

PubMed

Vaughan, Nancy; Storzbach, Daniel; Furukawa, Izumi

2008-01-01

Cognitive declines in working memory and processing speed are hallmarks of aging. Deficits in speech understanding also are seen in aging individuals. A clinical test to determine whether the cognitive aging changes contribute to aging speech understanding difficulties would be helpful for determining rehabilitation strategies in audiology clinics. To identify a clinical neurocognitive test or battery of tests that could be used in audiology clinics to help explain deficits in speech recognition in some older listeners. A correlational study examining the association between certain cognitive test scores and speech recognition performance. Speeded (time-compressed) speech was used to increase the cognitive processing load. Two hundred twenty-five adults aged 50 through 75 years were participants in this study. Both batteries of tests were administered to all participants in two separate sessions. A selected battery of neurocognitive tests and a time-compressed speech recognition test battery using various rates of speech were administered. Principal component analysis was used to extract the important component factors from each set of tests, and regression models were constructed to examine the association between tests and to identify the neurocognitive test most strongly associated with speech recognition performance. A sequencing working memory test (Letter-Number Sequencing [LNS]) was most strongly associated with rapid speech understanding. The association between the LNS test results and the compressed sentence recognition scores (CSRS) was strong even when age and hearing loss were controlled. The LNS is a sequencing test that provides information about temporal processing at the cognitive level and may prove useful in diagnosis of speech understanding problems, and in the development of aural rehabilitation and training strategies.
Recognition of voice commands using adaptation of foreign language speech recognizer via selection of phonetic transcriptions

NASA Astrophysics Data System (ADS)

Maskeliunas, Rytis; Rudzionis, Vytautas

2011-06-01

In recent years various commercial speech recognizers have become available. These recognizers provide the possibility to develop applications incorporating various speech recognition techniques easily and quickly. All of these commercial recognizers are typically targeted to widely spoken languages having large market potential; however, it may be possible to adapt available commercial recognizers for use in environments where less widely spoken languages are used. Since most commercial recognition engines are closed systems the single avenue for the adaptation is to try set ways for the selection of proper phonetic transcription methods between the two languages. This paper deals with the methods to find the phonetic transcriptions for Lithuanian voice commands to be recognized using English speech engines. The experimental evaluation showed that it is possible to find phonetic transcriptions that will enable the recognition of Lithuanian voice commands with recognition accuracy of over 90%.
Relating dynamic brain states to dynamic machine states: Human and machine solutions to the speech recognition problem

PubMed Central

Liu, Xunying; Zhang, Chao; Woodland, Phil; Fonteneau, Elisabeth

2017-01-01

There is widespread interest in the relationship between the neurobiological systems supporting human cognition and emerging computational systems capable of emulating these capacities. Human speech comprehension, poorly understood as a neurobiological process, is an important case in point. Automatic Speech Recognition (ASR) systems with near-human levels of performance are now available, which provide a computationally explicit solution for the recognition of words in continuous speech. This research aims to bridge the gap between speech recognition processes in humans and machines, using novel multivariate techniques to compare incremental ‘machine states’, generated as the ASR analysis progresses over time, to the incremental ‘brain states’, measured using combined electro- and magneto-encephalography (EMEG), generated as the same inputs are heard by human listeners. This direct comparison of dynamic human and machine internal states, as they respond to the same incrementally delivered sensory input, revealed a significant correspondence between neural response patterns in human superior temporal cortex and the structural properties of ASR-derived phonetic models. Spatially coherent patches in human temporal cortex responded selectively to individual phonetic features defined on the basis of machine-extracted regularities in the speech to lexicon mapping process. These results demonstrate the feasibility of relating human and ASR solutions to the problem of speech recognition, and suggest the potential for further studies relating complex neural computations in human speech comprehension to the rapidly evolving ASR systems that address the same problem domain. PMID:28945744
Speech Recognition for Medical Dictation: Overview in Quebec and Systematic Review.

PubMed

Poder, Thomas G; Fisette, Jean-François; Déry, Véronique

2018-04-03

Speech recognition is increasingly used in medical reporting. The aim of this article is to identify in the literature the strengths and weaknesses of this technology, as well as barriers to and facilitators of its implementation. A systematic review of systematic reviews was performed using PubMed, Scopus, the Cochrane Library and the Center for Reviews and Dissemination through August 2017. The gray literature has also been consulted. The quality of systematic reviews has been assessed with the AMSTAR checklist. The main inclusion criterion was use of speech recognition for medical reporting (front-end or back-end). A survey has also been conducted in Quebec, Canada, to identify the dissemination of this technology in this province, as well as the factors leading to the success or failure of its implementation. Five systematic reviews were identified. These reviews indicated a high level of heterogeneity across studies. The quality of the studies reported was generally poor. Speech recognition is not as accurate as human transcription, but it can dramatically reduce turnaround times for reporting. In front-end use, medical doctors need to spend more time on dictation and correction than required with human transcription. With speech recognition, major errors occur up to three times more frequently. In back-end use, a potential increase in productivity of transcriptionists was noted. In conclusion, speech recognition offers several advantages for medical reporting. However, these advantages are countered by an increased burden on medical doctors and by risks of additional errors in medical reports. It is also hard to identify for which medical specialties and which clinical activities the use of speech recognition will be the most beneficial.
Strategies for distant speech recognitionin reverberant environments

NASA Astrophysics Data System (ADS)

Delcroix, Marc; Yoshioka, Takuya; Ogawa, Atsunori; Kubo, Yotaro; Fujimoto, Masakiyo; Ito, Nobutaka; Kinoshita, Keisuke; Espi, Miquel; Araki, Shoko; Hori, Takaaki; Nakatani, Tomohiro

2015-12-01

Reverberation and noise are known to severely affect the automatic speech recognition (ASR) performance of speech recorded by distant microphones. Therefore, we must deal with reverberation if we are to realize high-performance hands-free speech recognition. In this paper, we review a recognition system that we developed at our laboratory to deal with reverberant speech. The system consists of a speech enhancement (SE) front-end that employs long-term linear prediction-based dereverberation followed by noise reduction. We combine our SE front-end with an ASR back-end that uses neural networks for acoustic and language modeling. The proposed system achieved top scores on the ASR task of the REVERB challenge. This paper describes the different technologies used in our system and presents detailed experimental results that justify our implementation choices and may provide hints for designing distant ASR systems.
Optimal pattern synthesis for speech recognition based on principal component analysis

NASA Astrophysics Data System (ADS)

Korsun, O. N.; Poliyev, A. V.

2018-02-01

The algorithm for building an optimal pattern for the purpose of automatic speech recognition, which increases the probability of correct recognition, is developed and presented in this work. The optimal pattern forming is based on the decomposition of an initial pattern to principal components, which enables to reduce the dimension of multi-parameter optimization problem. At the next step the training samples are introduced and the optimal estimates for principal components decomposition coefficients are obtained by a numeric parameter optimization algorithm. Finally, we consider the experiment results that show the improvement in speech recognition introduced by the proposed optimization algorithm.
The effect of guessing on the speech reception thresholds of children.

PubMed

Moodley, A

1990-01-01

Speech audiometry is an essential part of the assessment of hearing impaired children and it is now widely used throughout the United Kingdom. Although instructions are universally agreed upon as an important aspect in the administration of any form of audiometric testing, there has been little, if any, research towards evaluating the influence which instructions that are given to a listener have on the Speech Reception Threshold obtained. This study attempts to evaluate what effect guessing has on the Speech Reception Threshold of children. A sample of 30 secondary school pupils between 16 and 18 years of age with normal hearing was used in the study. It is argued that the type of instruction normally used for Speech Reception Threshold in audiometric testing may not provide a sufficient amount of control for guessing and the implications of this, using data obtained in the study, are examined.
Age and measurement time-of-day effects on speech recognition in noise.

PubMed

Veneman, Carrie E; Gordon-Salant, Sandra; Matthews, Lois J; Dubno, Judy R

2013-01-01

The purpose of this study was to determine the effect of measurement time of day on speech recognition in noise and the extent to which time-of-day effects differ with age. Older adults tend to have more difficulty understanding speech in noise than younger adults, even when hearing is normal. Two possible contributors to this age difference in speech recognition may be measurement time of day and inhibition. Most younger adults are "evening-type," showing peak circadian arousal in the evening, whereas most older adults are "morning-type," with circadian arousal peaking in the morning. Tasks that require inhibition of irrelevant information have been shown to be affected by measurement time of day, with maximum performance attained at one's peak time of day. The authors hypothesized that a change in inhibition will be associated with measurement time of day and therefore affect speech recognition in noise, with better performance in the morning for older adults and in the evening for younger adults. Fifteen younger evening-type adults (20-28 years) and 15 older morning-type adults with normal hearing (66-78 years) listened to the Hearing in Noise Test (HINT) and the Quick Speech in Noise (QuickSIN) test in the morning and evening (peak and off-peak times). Time of day preference was assessed using the Morningness-Eveningness Questionnaire. Sentences and noise were presented binaurally through insert earphones. During morning and evening sessions, participants solved word-association problems within the visual-distraction task (VDT), which was used as an estimate of inhibition. After each session, participants rated perceived mental demand of the tasks using a revised version of the NASA Task Load Index. Younger adults performed significantly better on the speech-in-noise tasks and rated themselves as requiring significantly less mental demand when tested at their peak (evening) than off-peak (morning) time of day. In contrast, time-of-day effects were not observed for the older adults on the speech recognition or rating tasks. Although older adults required significantly more advantageous signal-to-noise ratios than younger adults for equivalent speech-recognition performance, a significantly larger younger versus older age difference in speech recognition was observed in the evening than in the morning. Older adults performed significantly poorer than younger adults on the VDT, but performance was not affected by measurement time of day. VDT performance for misleading distracter items was significantly correlated with HINT and QuickSIN test performance at the peak measurement time of day. Although all participants had normal hearing, speech recognition in noise was significantly poorer for older than younger adults, with larger age-related differences in the evening (an off-peak time for older adults) than in the morning. The significant effect of measurement time of day suggests that this factor may impact the clinical assessment of speech recognition in noise for all individuals. It appears that inhibition, as estimated by a visual distraction task for misleading visual items, is a cognitive mechanism that is related to speech-recognition performance in noise, at least at a listener's peak time of day.
Amplification of transcutaneous and percutaneous bone-conduction devices with a test-band in an induced model of conductive hearing loss.

PubMed

Park, Marn Joon; Lee, Jae Ryung; Yang, Chan Joo; Yoo, Myung Hoon; Jin, In Suk; Choi, Chi Ho; Park, Hong Ju

2016-11-01

Transcutaneous devices have a disadvantage, the dampening effect by soft tissue between the bone and devices. We investigated hearing outcomes with percutaneous and transcutaneous devices using test-bands in an induced unilateral conductive hearing loss. Comparison of hearing outcomes of two devices in the same individuals. The right ear was plugged in 30 subjects and a test-band with devices (Cochlear™ Baha® BP110 Power and Sophono® Alpha-2 MPO™) was applied on the right mastoid tip with the left ear masked. Sound-field thresholds, speech recognition thresholds (SRTs), and word recognition scores (WRSs) were compared. Aided thresholds of Sophono were significantly better than those of Baha at most frequencies. Sophono WRSs (86 ± 12%) at 40 dB SPL and SRTs (14 ± 5 dB HL) were significantly better than those (73 ± 24% and 23 ± 8 dB HL) of Baha. However, Sophono WRSs (98 ± 3%) at 60 dB SPL did not differ from Baha WRSs (95 ± 12%). Amplifications of the current transcutaneous device were not inferior to those of percutaneous devices with a test-band in subjects with normal bone-conduction thresholds. Since the percutaneous devices can increase the gain when fixed to the skull by eliminating the dampening effect, both devices are expected to provide sufficient hearing amplification.
Multitasking During Degraded Speech Recognition in School-Age Children

PubMed Central

Ward, Kristina M.; Brehm, Laurel

2017-01-01

Multitasking requires individuals to allocate their cognitive resources across different tasks. The purpose of the current study was to assess school-age children’s multitasking abilities during degraded speech recognition. Children (8 to 12 years old) completed a dual-task paradigm including a sentence recognition (primary) task containing speech that was either unprocessed or noise-band vocoded with 8, 6, or 4 spectral channels and a visual monitoring (secondary) task. Children’s accuracy and reaction time on the visual monitoring task was quantified during the dual-task paradigm in each condition of the primary task and compared with single-task performance. Children experienced dual-task costs in the 6- and 4-channel conditions of the primary speech recognition task with decreased accuracy on the visual monitoring task relative to baseline performance. In all conditions, children’s dual-task performance on the visual monitoring task was strongly predicted by their single-task (baseline) performance on the task. Results suggest that children’s proficiency with the secondary task contributes to the magnitude of dual-task costs while multitasking during degraded speech recognition. PMID:28105890
Multitasking During Degraded Speech Recognition in School-Age Children.

PubMed

Grieco-Calub, Tina M; Ward, Kristina M; Brehm, Laurel

2017-01-01

Multitasking requires individuals to allocate their cognitive resources across different tasks. The purpose of the current study was to assess school-age children's multitasking abilities during degraded speech recognition. Children (8 to 12 years old) completed a dual-task paradigm including a sentence recognition (primary) task containing speech that was either unprocessed or noise-band vocoded with 8, 6, or 4 spectral channels and a visual monitoring (secondary) task. Children's accuracy and reaction time on the visual monitoring task was quantified during the dual-task paradigm in each condition of the primary task and compared with single-task performance. Children experienced dual-task costs in the 6- and 4-channel conditions of the primary speech recognition task with decreased accuracy on the visual monitoring task relative to baseline performance. In all conditions, children's dual-task performance on the visual monitoring task was strongly predicted by their single-task (baseline) performance on the task. Results suggest that children's proficiency with the secondary task contributes to the magnitude of dual-task costs while multitasking during degraded speech recognition.
Processing Electromyographic Signals to Recognize Words

NASA Technical Reports Server (NTRS)

Jorgensen, C. C.; Lee, D. D.

2009-01-01

A recently invented speech-recognition method applies to words that are articulated by means of the tongue and throat muscles but are otherwise not voiced or, at most, are spoken sotto voce. This method could satisfy a need for speech recognition under circumstances in which normal audible speech is difficult, poses a hazard, is disturbing to listeners, or compromises privacy. The method could also be used to augment traditional speech recognition by providing an additional source of information about articulator activity. The method can be characterized as intermediate between (1) conventional speech recognition through processing of voice sounds and (2) a method, not yet developed, of processing electroencephalographic signals to extract unspoken words directly from thoughts. This method involves computational processing of digitized electromyographic (EMG) signals from muscle innervation acquired by surface electrodes under a subject's chin near the tongue and on the side of the subject s throat near the larynx. After preprocessing, digitization, and feature extraction, EMG signals are processed by a neural-network pattern classifier, implemented in software, that performs the bulk of the recognition task as described.
Relations Between Self-reported Executive Functioning and Speech Perception Skills in Adult Cochlear Implant Users.

PubMed

Moberly, Aaron C; Patel, Tirth R; Castellanos, Irina

2018-02-01

As a result of their hearing loss, adults with cochlear implants (CIs) would self-report poorer executive functioning (EF) skills than normal-hearing (NH) peers, and these EF skills would be associated with performance on speech recognition tasks. EF refers to a group of high order neurocognitive skills responsible for behavioral and emotional regulation during goal-directed activity, and EF has been found to be poorer in children with CIs than their NH age-matched peers. Moreover, there is increasing evidence that neurocognitive skills, including some EF skills, contribute to the ability to recognize speech through a CI. Thirty postlingually deafened adults with CIs and 42 age-matched NH adults were enrolled. Participants and their spouses or significant others (informants) completed well-validated self-reports or informant-reports of EF, the Behavior Rating Inventory of Executive Function - Adult (BRIEF-A). CI users' speech recognition skills were assessed in quiet using several measures of sentence recognition. NH peers were tested for recognition of noise-vocoded versions of the same speech stimuli. CI users self-reported difficulty on EF tasks of shifting and task monitoring. In CI users, measures of speech recognition correlated with several self-reported EF skills. The present findings provide further evidence that neurocognitive factors, including specific EF skills, may decline in association with hearing loss, and that some of these EF skills contribute to speech processing under degraded listening conditions.
Spontaneous Speech Collection for the CSR Corpus

DTIC Science & Technology

1992-01-01

Menlo Park, California 94025 1. ABSTRACT As part of a pilot data collection for DARPA’s Continuous Speech Recognition ( CSR ) speech corpus, SRI...International experi- mented with the collection of spontaneous speeoh material. The bulk of the CSR pilot data was read versions of news articles from...variable. 2. INTRODUCTION The CSR (Continuous Speech Recognition) Corpus collec- tion can be considered the successor to the Resource Man- agemen t
Eyes and ears: Using eye tracking and pupillometry to understand challenges to speech recognition.

PubMed

Van Engen, Kristin J; McLaughlin, Drew J

2018-05-04

Although human speech recognition is often experienced as relatively effortless, a number of common challenges can render the task more difficult. Such challenges may originate in talkers (e.g., unfamiliar accents, varying speech styles), the environment (e.g. noise), or in listeners themselves (e.g., hearing loss, aging, different native language backgrounds). Each of these challenges can reduce the intelligibility of spoken language, but even when intelligibility remains high, they can place greater processing demands on listeners. Noisy conditions, for example, can lead to poorer recall for speech, even when it has been correctly understood. Speech intelligibility measures, memory tasks, and subjective reports of listener difficulty all provide critical information about the effects of such challenges on speech recognition. Eye tracking and pupillometry complement these methods by providing objective physiological measures of online cognitive processing during listening. Eye tracking records the moment-to-moment direction of listeners' visual attention, which is closely time-locked to unfolding speech signals, and pupillometry measures the moment-to-moment size of listeners' pupils, which dilate in response to increased cognitive load. In this paper, we review the uses of these two methods for studying challenges to speech recognition. Copyright © 2018. Published by Elsevier B.V.
Robust recognition of loud and Lombard speech in the fighter cockpit environment

NASA Astrophysics Data System (ADS)

Stanton, Bill J., Jr.

1988-08-01

There are a number of challenges associated with incorporating speech recognition technology into the fighter cockpit. One of the major problems is the wide range of variability in the pilot's voice. That can result from changing levels of stress and workload. Increasing the training set to include abnormal speech is not an attractive option because of the innumerable conditions that would have to be represented and the inordinate amount of time to collect such a training set. A more promising approach is to study subsets of abnormal speech that have been produced under controlled cockpit conditions with the purpose of characterizing reliable shifts that occur relative to normal speech. Such was the initiative of this research. Analyses were conducted for 18 features on 17671 phoneme tokens across eight speakers for normal, loud, and Lombard speech. It was discovered that there was a consistent migration of energy in the sonorants. This discovery of reliable energy shifts led to the development of a method to reduce or eliminate these shifts in the Euclidean distances between LPC log magnitude spectra. This combination significantly improved recognition performance of loud and Lombard speech. Discrepancies in recognition error rates between normal and abnormal speech were reduced by approximately 50 percent for all eight speakers combined.
Robust relationship between reading span and speech recognition in noise

PubMed Central

Souza, Pamela; Arehart, Kathryn

2015-01-01

Objective Working memory refers to a cognitive system that manages information processing and temporary storage. Recent work has demonstrated that individual differences in working memory capacity measured using a reading span task are related to ability to recognize speech in noise. In this project, we investigated whether the specific implementation of the reading span task influenced the strength of the relationship between working memory capacity and speech recognition. Design The relationship between speech recognition and working memory capacity was examined for two different working memory tests that varied in approach, using a within-subject design. Data consisted of audiometric results along with the two different working memory tests; one speech-in-noise test; and a reading comprehension test. Study sample The test group included 94 older adults with varying hearing loss and 30 younger adults with normal hearing. Results Listeners with poorer working memory capacity had more difficulty understanding speech in noise after accounting for age and degree of hearing loss. That relationship did not differ significantly between the two different implementations of reading span. Conclusions Our findings suggest that different implementations of a verbal reading span task do not affect the strength of the relationship between working memory capacity and speech recognition. PMID:25975360
Robust relationship between reading span and speech recognition in noise.

PubMed

Souza, Pamela; Arehart, Kathryn

2015-01-01

Working memory refers to a cognitive system that manages information processing and temporary storage. Recent work has demonstrated that individual differences in working memory capacity measured using a reading span task are related to ability to recognize speech in noise. In this project, we investigated whether the specific implementation of the reading span task influenced the strength of the relationship between working memory capacity and speech recognition. The relationship between speech recognition and working memory capacity was examined for two different working memory tests that varied in approach, using a within-subject design. Data consisted of audiometric results along with the two different working memory tests; one speech-in-noise test; and a reading comprehension test. The test group included 94 older adults with varying hearing loss and 30 younger adults with normal hearing. Listeners with poorer working memory capacity had more difficulty understanding speech in noise after accounting for age and degree of hearing loss. That relationship did not differ significantly between the two different implementations of reading span. Our findings suggest that different implementations of a verbal reading span task do not affect the strength of the relationship between working memory capacity and speech recognition.
Automatic lip reading by using multimodal visual features

NASA Astrophysics Data System (ADS)

Takahashi, Shohei; Ohya, Jun

2013-12-01

Since long time ago, speech recognition has been researched, though it does not work well in noisy places such as in the car or in the train. In addition, people with hearing-impaired or difficulties in hearing cannot receive benefits from speech recognition. To recognize the speech automatically, visual information is also important. People understand speeches from not only audio information, but also visual information such as temporal changes in the lip shape. A vision based speech recognition method could work well in noisy places, and could be useful also for people with hearing disabilities. In this paper, we propose an automatic lip-reading method for recognizing the speech by using multimodal visual information without using any audio information such as speech recognition. First, the ASM (Active Shape Model) is used to track and detect the face and lip in a video sequence. Second, the shape, optical flow and spatial frequencies of the lip features are extracted from the lip detected by ASM. Next, the extracted multimodal features are ordered chronologically so that Support Vector Machine is performed in order to learn and classify the spoken words. Experiments for classifying several words show promising results of this proposed method.
Influence of signal processing strategy in auditory abilities.

PubMed

Melo, Tatiana Mendes de; Bevilacqua, Maria Cecília; Costa, Orozimbo Alves; Moret, Adriane Lima Mortari

2013-01-01

The signal processing strategy is a parameter that may influence the auditory performance of cochlear implant and is important to optimize this parameter to provide better speech perception, especially in difficult listening situations. To evaluate the individual's auditory performance using two different signal processing strategy. Prospective study with 11 prelingually deafened children with open-set speech recognition. A within-subjects design was used to compare performance with standard HiRes and HiRes 120 in three different moments. During test sessions, subject's performance was evaluated by warble-tone sound-field thresholds, speech perception evaluation, in quiet and in noise. In the silence, children S1, S4, S5, S7 showed better performance with the HiRes 120 strategy and children S2, S9, S11 showed better performance with the HiRes strategy. In the noise was also observed that some children performed better using the HiRes 120 strategy and other with HiRes. Not all children presented the same pattern of response to the different strategies used in this study, which reinforces the need to look at optimizing cochlear implant clinical programming.

[Perception of emotional intonation of noisy speech signal with different acoustic parameters by adults of different age and gender].

PubMed

Dmitrieva, E S; Gel'man, V Ia

2011-01-01

The listener-distinctive features of recognition of different emotional intonations (positive, negative and neutral) of male and female speakers in the presence or absence of background noise were studied in 49 adults aged 20-79 years. In all the listeners noise produced the most pronounced decrease in recognition accuracy for positive emotional intonation ("joy") as compared to other intonations, whereas it did not influence the recognition accuracy of "anger" in 65-79-year-old listeners. The higher emotion recognition rates of a noisy signal were observed for speech emotional intonations expressed by female speakers. Acoustic characteristics of noisy and clear speech signals underlying perception of speech emotional prosody were found for adult listeners of different age and gender.
Evaluating deep learning architectures for Speech Emotion Recognition.

PubMed

Fayek, Haytham M; Lech, Margaret; Cavedon, Lawrence

2017-08-01

Speech Emotion Recognition (SER) can be regarded as a static or dynamic classification problem, which makes SER an excellent test bed for investigating and comparing various deep learning architectures. We describe a frame-based formulation to SER that relies on minimal speech processing and end-to-end deep learning to model intra-utterance dynamics. We use the proposed SER system to empirically explore feed-forward and recurrent neural network architectures and their variants. Experiments conducted illuminate the advantages and limitations of these architectures in paralinguistic speech recognition and emotion recognition in particular. As a result of our exploration, we report state-of-the-art results on the IEMOCAP database for speaker-independent SER and present quantitative and qualitative assessments of the models' performances. Copyright © 2017 Elsevier Ltd. All rights reserved.
Spatial Release from Masking in Children: Effects of Simulated Unilateral Hearing Loss

PubMed Central

Corbin, Nicole E.; Buss, Emily; Leibold, Lori J.

2016-01-01

Objectives The purpose of this study was twofold: 1) to determine the effect of an acute simulated unilateral hearing loss on children’s spatial release from masking in two-talker speech and speech-shaped noise, and 2) to develop a procedure to be used in future studies that will assess spatial release from masking in children who have permanent unilateral hearing loss. There were three main predictions. First, spatial release from masking was expected to be larger in two-talker speech than speech-shaped noise. Second, simulated unilateral hearing loss was expected to worsen performance in all listening conditions, but particularly in the spatially separated two-talker speech masker. Third, spatial release from masking was expected to be smaller for children than for adults in the two-talker masker. Design Participants were 12 children (8.7 to 10.9 yrs) and 11 adults (18.5 to 30.4 yrs) with normal bilateral hearing. Thresholds for 50%-correct recognition of Bamford-Kowal-Bench sentences were measured adaptively in continuous two-talker speech or speech-shaped noise. Target sentences were always presented from a loudspeaker at 0° azimuth. The masker stimulus was either co-located with the target or spatially separated to +90° or −90° azimuth. Spatial release from masking was quantified as the difference between thresholds obtained when the target and masker were co-located and thresholds obtained when the masker was presented from +90° or − 90°. Testing was completed both with and without a moderate simulated unilateral hearing loss, created with a foam earplug and supra-aural earmuff. A repeated-measures design was used to compare performance between children and adults, and performance in the no-plug and simulated-unilateral-hearing-loss conditions. Results All listeners benefited from spatial separation of target and masker stimuli on the azimuth plane in the no-plug listening conditions; this benefit was larger in two-talker speech than in speech-shaped noise. In the simulated-unilateral-hearing-loss conditions, a positive spatial release from masking was observed only when the masker was presented ipsilateral to the simulated unilateral hearing loss. In the speech-shaped noise masker, spatial release from masking in the no-plug condition was similar to that obtained when the masker was presented ipsilateral to the simulated unilateral hearing loss. In contrast, in the two-talker speech masker, spatial release from masking in the no-plug condition was much larger than that obtained when the masker was presented ipsilateral to the simulated unilateral hearing loss. When either masker was presented contralateral to the simulated unilateral hearing loss, spatial release from masking was negative. This pattern of results was observed for both children and adults, although children performed more poorly overall. Conclusions Children and adults with normal bilateral hearing experience greater spatial release from masking for a two-talker speech than a speech-shaped noise masker. Testing in a two-talker speech masker revealed listening difficulties in the presence of disrupted binaural input that were not observed in a speech-shaped noise masker. This procedure offers promise for the assessment of spatial release from masking in children with permanent unilateral hearing loss. PMID:27787392
Improving speech perception in noise for children with cochlear implants.

PubMed

Gifford, René H; Olund, Amy P; Dejong, Melissa

2011-10-01

Current cochlear implant recipients are achieving increasingly higher levels of speech recognition; however, the presence of background noise continues to significantly degrade speech understanding for even the best performers. Newer generation Nucleus cochlear implant sound processors can be programmed with SmartSound strategies that have been shown to improve speech understanding in noise for adult cochlear implant recipients. The applicability of these strategies for use in children, however, is not fully understood nor widely accepted. To assess speech perception for pediatric cochlear implant recipients in the presence of a realistic restaurant simulation generated by an eight-loudspeaker (R-SPACE™) array in order to determine whether Nucleus sound processor SmartSound strategies yield improved sentence recognition in noise for children who learn language through the implant. Single subject, repeated measures design. Twenty-two experimental subjects with cochlear implants (mean age 11.1 yr) and 25 control subjects with normal hearing (mean age 9.6 yr) participated in this prospective study. Speech reception thresholds (SRT) in semidiffuse restaurant noise originating from an eight-loudspeaker array were assessed with the experimental subjects' everyday program incorporating Adaptive Dynamic Range Optimization (ADRO) as well as with the addition of Autosensitivity control (ASC). Adaptive SRTs with the Hearing In Noise Test (HINT) sentences were obtained for all 22 experimental subjects, and performance-in percent correct-was assessed in a fixed +6 dB SNR (signal-to-noise ratio) for a six-subject subset. Statistical analysis using a repeated-measures analysis of variance (ANOVA) evaluated the effects of the SmartSound setting on the SRT in noise. The primary findings mirrored those reported previously with adult cochlear implant recipients in that the addition of ASC to ADRO significantly improved speech recognition in noise for pediatric cochlear implant recipients. The mean degree of improvement in the SRT with the addition of ASC to ADRO was 3.5 dB for a mean SRT of 10.9 dB SNR. Thus, despite the fact that these children have acquired auditory/oral speech and language through the use of their cochlear implant(s) equipped with ADRO, the addition of ASC significantly improved their ability to recognize speech in high levels of diffuse background noise. The mean SRT for the control subjects with normal hearing was 0.0 dB SNR. Given that the mean SRT for the experimental group was 10.9 dB SNR, despite the improvements in performance observed with the addition of ASC, cochlear implants still do not completely overcome the speech perception deficit encountered in noisy environments accompanying the diagnosis of severe-to-profound hearing loss. SmartSound strategies currently available in latest generation Nucleus cochlear implant sound processors are able to significantly improve speech understanding in a realistic, semidiffuse noise for pediatric cochlear implant recipients. Despite the reluctance of pediatric audiologists to utilize SmartSound settings for regular use, the results of the current study support the addition of ASC to ADRO for everyday listening environments to improve speech perception in a child's typical everyday program. American Academy of Audiology.
Developing and Evaluating an Oral Skills Training Website Supported by Automatic Speech Recognition Technology

ERIC Educational Resources Information Center

Chen, Howard Hao-Jan

2011-01-01

Oral communication ability has become increasingly important to many EFL students. Several commercial software programs based on automatic speech recognition (ASR) technologies are available but their prices are not affordable for many students. This paper will demonstrate how the Microsoft Speech Application Software Development Kit (SASDK), a…
Vocal Tract Representation in the Recognition of Cerebral Palsied Speech

ERIC Educational Resources Information Center

Rudzicz, Frank; Hirst, Graeme; van Lieshout, Pascal

2012-01-01

Purpose: In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine. Method: Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011) in which motions of various points in the vocal tract are measured during speech.…
Micro-Based Speech Recognition: Instructional Innovation for Handicapped Learners.

ERIC Educational Resources Information Center

Horn, Carin E.; Scott, Brian L.

A new voice based learning system (VBLS), which allows the handicapped user to interact with a microcomputer by voice commands, is described. Speech or voice recognition is the computerized process of identifying a spoken word or phrase, including those resulting from speech impediments. This new technology is helpful to the severely physically…
Speech Recognition Software for Language Learning: Toward an Evaluation of Validity and Student Perceptions

ERIC Educational Resources Information Center

Cordier, Deborah

2009-01-01

A renewed focus on foreign language (FL) learning and speech for communication has resulted in computer-assisted language learning (CALL) software developed with Automatic Speech Recognition (ASR). ASR features for FL pronunciation (Lafford, 2004) are functional components of CALL designs used for FL teaching and learning. The ASR features…
Thresholds of information leakage for speech security outside meeting rooms.

PubMed

Robinson, Matthew; Hopkins, Carl; Worrall, Ken; Jackson, Tim

2014-09-01

This paper describes an approach to provide speech security outside meeting rooms where a covert listener might attempt to extract confidential information. Decision-based experiments are used to establish a relationship between an objective measurement of the Speech Transmission Index (STI) and a subjective assessment relating to the threshold of information leakage. This threshold is defined for a specific percentage of English words that are identifiable with a maximum safe vocal effort (e.g., "normal" speech) used by the meeting participants. The results demonstrate that it is possible to quantify an offset that links STI with a specific threshold of information leakage which describes the percentage of words identified. The offsets for male talkers are shown to be approximately 10 dB larger than for female talkers. Hence for speech security it is possible to determine offsets for the threshold of information leakage using male talkers as the "worst case scenario." To define a suitable threshold of information leakage, the results show that a robust definition can be based upon 1%, 2%, or 5% of words identified. For these percentages, results are presented for offset values corresponding to different STI values in a range from 0.1 to 0.3.
Noise Robust Speech Recognition Applied to Voice-Driven Wheelchair

NASA Astrophysics Data System (ADS)

Sasou, Akira; Kojima, Hiroaki

2009-12-01

Conventional voice-driven wheelchairs usually employ headset microphones that are capable of achieving sufficient recognition accuracy, even in the presence of surrounding noise. However, such interfaces require users to wear sensors such as a headset microphone, which can be an impediment, especially for the hand disabled. Conversely, it is also well known that the speech recognition accuracy drastically degrades when the microphone is placed far from the user. In this paper, we develop a noise robust speech recognition system for a voice-driven wheelchair. This system can achieve almost the same recognition accuracy as the headset microphone without wearing sensors. We verified the effectiveness of our system in experiments in different environments, and confirmed that our system can achieve almost the same recognition accuracy as the headset microphone without wearing sensors.
The Effect of Remote Masking on the Reception of Speech by Young School-Age Children.

PubMed

Youngdahl, Carla L; Healy, Eric W; Yoho, Sarah E; Apoux, Frédéric; Holt, Rachael Frush

2018-02-15

Psychoacoustic data indicate that infants and children are less likely than adults to focus on a spectral region containing an anticipated signal and are more susceptible to remote masking of a signal. These detection tasks suggest that infants and children, unlike adults, do not listen selectively. However, less is known about children's ability to listen selectively during speech recognition. Accordingly, the current study examines remote masking during speech recognition in children and adults. Adults and 7- and 5-year-old children performed sentence recognition in the presence of various spectrally remote maskers. Intelligibility was determined for each remote-masker condition, and performance was compared across age groups. It was found that speech recognition for 5-year-olds was reduced in the presence of spectrally remote noise, whereas the maskers had no effect on the 7-year-olds or adults. Maskers of different bandwidth and remoteness had similar effects. In accord with psychoacoustic data, young children do not appear to focus on a spectral region of interest and ignore other regions during speech recognition. This tendency may help account for their typically poorer speech perception in noise. This study also appears to capture an important developmental stage, during which a substantial refinement in spectral listening occurs.
Studies in automatic speech recognition and its application in aerospace

NASA Astrophysics Data System (ADS)

Taylor, Michael Robinson

Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.
Speech Recognition in Noise by Children with and without Dyslexia: How is it Related to Reading?

PubMed

Nittrouer, Susan; Krieg, Letitia M; Lowenstein, Joanna H

2018-06-01

Developmental dyslexia is commonly viewed as a phonological deficit that makes it difficult to decode written language. But children with dyslexia typically exhibit other problems, as well, including poor speech recognition in noise. The purpose of this study was to examine whether the speech-in-noise problems of children with dyslexia are related to their reading problems, and if so, if a common underlying factor might explain both. The specific hypothesis examined was that a spectral processing disorder results in these children receiving smeared signals, which could explain both the diminished sensitivity to phonological structure - leading to reading problems - and the speech recognition in noise difficulties. The alternative hypothesis tested in this study was that children with dyslexia simply have broadly based language deficits. Ninety-seven children between the ages of 7 years; 10 months and 12 years; 9 months participated: 46 with dyslexia and 51 without dyslexia. Children were tested on two dependent measures: word reading and recognition in noise with two types of sentence materials: as unprocessed (UP) signals, and as spectrally smeared (SM) signals. Data were collected for four predictor variables: phonological awareness, vocabulary, grammatical knowledge, and digit span. Children with dyslexia showed deficits on both dependent and all predictor variables. Their scores for speech recognition in noise were poorer than those of children without dyslexia for both the UP and SM signals, but by equivalent amounts across signal conditions indicating that they were not disproportionately hindered by spectral distortion. Correlation analyses on scores from children with dyslexia showed that reading ability and speech-in-noise recognition were only mildly correlated, and each skill was related to different underlying abilities. No substantial evidence was found to support the suggestion that the reading and speech recognition in noise problems of children with dyslexia arise from a single factor that could be defined as a spectral processing disorder. The reading and speech recognition in noise deficits of these children appeared to be largely independent. Copyright © 2018 Elsevier Ltd. All rights reserved.
Schizophrenia alters intra-network functional connectivity in the caudate for detecting speech under informational speech masking conditions.

PubMed

Zheng, Yingjun; Wu, Chao; Li, Juanhua; Li, Ruikeng; Peng, Hongjun; She, Shenglin; Ning, Yuping; Li, Liang

2018-04-04

Speech recognition under noisy "cocktail-party" environments involves multiple perceptual/cognitive processes, including target detection, selective attention, irrelevant signal inhibition, sensory/working memory, and speech production. Compared to health listeners, people with schizophrenia are more vulnerable to masking stimuli and perform worse in speech recognition under speech-on-speech masking conditions. Although the schizophrenia-related speech-recognition impairment under "cocktail-party" conditions is associated with deficits of various perceptual/cognitive processes, it is crucial to know whether the brain substrates critically underlying speech detection against informational speech masking are impaired in people with schizophrenia. Using functional magnetic resonance imaging (fMRI), this study investigated differences between people with schizophrenia (n = 19, mean age = 33 ± 10 years) and their matched healthy controls (n = 15, mean age = 30 ± 9 years) in intra-network functional connectivity (FC) specifically associated with target-speech detection under speech-on-speech-masking conditions. The target-speech detection performance under the speech-on-speech-masking condition in participants with schizophrenia was significantly worse than that in matched healthy participants (healthy controls). Moreover, in healthy controls, but not participants with schizophrenia, the strength of intra-network FC within the bilateral caudate was positively correlated with the speech-detection performance under the speech-masking conditions. Compared to controls, patients showed altered spatial activity pattern and decreased intra-network FC in the caudate. In people with schizophrenia, the declined speech-detection performance under speech-on-speech masking conditions is associated with reduced intra-caudate functional connectivity, which normally contributes to detecting target speech against speech masking via its functions of suppressing masking-speech signals.
Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

NASA Astrophysics Data System (ADS)

Caballero Morales, Santiago Omar; Cox, Stephen J.

2009-12-01

Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
Equivalence and test-retest reproducibility of conventional and extended-high-frequency audiometric thresholds obtained using pure-tone and narrow-band-noise stimuli.

PubMed

John, Andrew B; Kreisman, Brian M

2017-09-01

Extended high-frequency (EHF) audiometry is useful for evaluating ototoxic exposures and may relate to speech recognition, localisation and hearing aid benefit. There is a need to determine whether common clinical practice for EHF audiometry using tone and noise stimuli is reliable. We evaluated equivalence and compared test-retest (TRT) reproducibility for audiometric thresholds obtained using pure tones and narrowband noise (NBN) from 0.25 to 16 kHz. Thresholds and test-retest reproducibility for stimuli in the conventional (0.25-6 kHz) and EHF (8-16 kHz) frequency ranges were compared in a repeated-measures design. A total of 70 ears of adults with normal hearing. Thresholds obtained using NBN were significantly lower than thresholds obtained using pure tones from 0.5 to 16 kHz, but not 0.25 kHz. Good TRT reproducibility (within 2 dB) was observed for both stimuli at all frequencies. Responses at the lower limit of the presentation range for NBN centred at 14 and 16 kHz suggest unreliability for NBN as a threshold stimulus at these frequencies. Thresholds in the conventional and EHF ranges showed good test-retest reproducibility, but differed between stimulus types. Care should be taken when comparing pure-tone thresholds with NBN thresholds especially at these frequencies.
Noise reduction algorithm with the soft thresholding based on the Shannon entropy and bone-conduction speech cross- correlation bands.

PubMed

Na, Sung Dae; Wei, Qun; Seong, Ki Woong; Cho, Jin Ho; Kim, Myoung Nam

2018-01-01

The conventional methods of speech enhancement, noise reduction, and voice activity detection are based on the suppression of noise or non-speech components of the target air-conduction signals. However, air-conduced speech is hard to differentiate from babble or white noise signals. To overcome this problem, the proposed algorithm uses the bone-conduction speech signals and soft thresholding based on the Shannon entropy principle and cross-correlation of air- and bone-conduction signals. A new algorithm for speech detection and noise reduction is proposed, which makes use of the Shannon entropy principle and cross-correlation with the bone-conduction speech signals to threshold the wavelet packet coefficients of the noisy speech. The proposed method can be get efficient result by objective quality measure that are PESQ, RMSE, Correlation, SNR. Each threshold is generated by the entropy and cross-correlation approaches in the decomposed bands using the wavelet packet decomposition. As a result, the noise is reduced by the proposed method using the MATLAB simulation. To verify the method feasibility, we compared the air- and bone-conduction speech signals and their spectra by the proposed method. As a result, high performance of the proposed method is confirmed, which makes it quite instrumental to future applications in communication devices, noisy environment, construction, and military operations.
Application of advanced speech technology in manned penetration bombers

NASA Astrophysics Data System (ADS)

North, R.; Lea, W.

1982-03-01

This report documents research on the potential use of speech technology in a manned penetration bomber aircraft (B-52/G and H). The objectives of the project were to analyze the pilot/copilot crewstation tasks over a three-hour-and forty-minute mission and determine the tasks that would benefit the most from conversion to speech recognition/generation, determine the technological feasibility of each of the identified tasks, and prioritize these tasks based on these criteria. Secondary objectives of the program were to enunciate research strategies in the application of speech technologies in airborne environments, and develop guidelines for briefing user commands on the potential of using speech technologies in the cockpit. The results of this study indicated that for the B-52 crewmember, speech recognition would be most beneficial for retrieving chart and procedural data that is contained in the flight manuals. Technological feasibility of these tasks indicated that the checklist and procedural retrieval tasks would be highly feasible for a speech recognition system.
On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition Systems in Adverse Conditions

NASA Astrophysics Data System (ADS)

Selouani, Sid-Ahmed; O'Shaughnessy, Douglas

2003-12-01

Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axis issued from the KLT. The enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique, when included in the front-end of an HTK-based CSR system, outperforms that of the conventional recognition process in severe interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to[InlineEquation not available: see fulltext.] dB. We also showed the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.
The Effect of Dynamic Pitch on Speech Recognition in Temporally Modulated Noise

ERIC Educational Resources Information Center

Shen, Jung; Souza, Pamela E.

2017-01-01

Purpose: This study investigated the effect of dynamic pitch in target speech on older and younger listeners' speech recognition in temporally modulated noise. First, we examined whether the benefit from dynamic-pitch cues depends on the temporal modulation of noise. Second, we tested whether older listeners can benefit from dynamic-pitch cues for…

Introduction and Overview of the Vicens-Reddy Speech Recognition System.

ERIC Educational Resources Information Center

Kameny, Iris; Ritea, H.

The Vicens-Reddy System is unique in the sense that it approaches the problem of speech recognition as a whole, rather than treating particular aspects of the problems as in previous attempts. For example, where earlier systems treated only segmentation of speech into phoneme groups, or detected phonemes in a given context, the Vicens-Reddy System…
Accommodation and Compliance Series: Employees with Arthritis

MedlinePlus

... handed keyboard, an articulating keyboard tray, speech recognition software, a trackball, and office equipment for a workstation ... space heater, additional window insulation, and speech recognition software. An insurance clerk with arthritis from systemic lupus ...
[Research on Barrier-free Home Environment System Based on Speech Recognition].

PubMed

Zhu, Husheng; Yu, Hongliu; Shi, Ping; Fang, Youfang; Jian, Zhuo

2015-10-01

The number of people with physical disabilities is increasing year by year, and the trend of population aging is more and more serious. In order to improve the quality of the life, a control system of accessible home environment for the patients with serious disabilities was developed to control the home electrical devices with the voice of the patients. The control system includes a central control platform, a speech recognition module, a terminal operation module, etc. The system combines the speech recognition control technology and wireless information transmission technology with the embedded mobile computing technology, and interconnects the lamp, electronic locks, alarms, TV and other electrical devices in the home environment as a whole system through a wireless network node. The experimental results showed that speech recognition success rate was more than 84% in the home environment.
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

PubMed

Schafer, Phillip B; Jin, Dezhe Z

2014-03-01

Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
Speech-recognition interfaces for music information retrieval

NASA Astrophysics Data System (ADS)

Goto, Masataka

2005-09-01

This paper describes two hands-free music information retrieval (MIR) systems that enable a user to retrieve and play back a musical piece by saying its title or the artist's name. Although various interfaces for MIR have been proposed, speech-recognition interfaces suitable for retrieving musical pieces have not been studied. Our MIR-based jukebox systems employ two different speech-recognition interfaces for MIR, speech completion and speech spotter, which exploit intentionally controlled nonverbal speech information in original ways. The first is a music retrieval system with the speech-completion interface that is suitable for music stores and car-driving situations. When a user only remembers part of the name of a musical piece or an artist and utters only a remembered fragment, the system helps the user recall and enter the name by completing the fragment. The second is a background-music playback system with the speech-spotter interface that can enrich human-human conversation. When a user is talking to another person, the system allows the user to enter voice commands for music playback control by spotting a special voice-command utterance in face-to-face or telephone conversations. Experimental results from use of these systems have demonstrated the effectiveness of the speech-completion and speech-spotter interfaces. (Video clips: http://staff.aist.go.jp/m.goto/MIR/speech-if.html)
Speaker-Machine Interaction in Automatic Speech Recognition. Technical Report.

ERIC Educational Resources Information Center

Makhoul, John I.

The feasibility and limitations of speaker adaptation in improving the performance of a "fixed" (speaker-independent) automatic speech recognition system were examined. A fixed vocabulary of 55 syllables is used in the recognition system which contains 11 stops and fricatives and five tense vowels. The results of an experiment on speaker…
Performance-intensity functions of Mandarin word recognition tests in noise: test dialect and listener language effects.

PubMed

Liu, Danzheng; Shi, Lu-Feng

2013-06-01

This study established the performance-intensity function for Beijing and Taiwan Mandarin bisyllabic word recognition tests in noise in native speakers of Wu Chinese. Effects of the test dialect and listeners' first language on psychometric variables (i.e., slope and 50%-correct threshold) were analyzed. Thirty-two normal-hearing Wu-speaking adults who used Mandarin since early childhood were compared to 16 native Mandarin-speaking adults. Both Beijing and Taiwan bisyllabic word recognition tests were presented at 8 signal-to-noise ratios (SNRs) in 4-dB steps (-12 dB to +16 dB). At each SNR, a half list (25 words) was presented in speech-spectrum noise to listeners' right ear. The order of the test, SNR, and half list was randomized across listeners. Listeners responded orally and in writing. Overall, the Wu-speaking listeners performed comparably to the Mandarin-speaking listeners on both tests. Compared to the Taiwan test, the Beijing test yielded a significantly lower threshold for both the Mandarin- and Wu-speaking listeners, as well as a significantly steeper slope for the Wu-speaking listeners. Both Mandarin tests can be used to evaluate Wu-speaking listeners. Of the 2, the Taiwan Mandarin test results in more comparable functions across listener groups. Differences in the performance-intensity function between listener groups and between tests indicate a first language and dialectal effect, respectively.
A physiologically-inspired model reproducing the speech intelligibility benefit in cochlear implant listeners with residual acoustic hearing.

PubMed

Zamaninezhad, Ladan; Hohmann, Volker; Büchner, Andreas; Schädler, Marc René; Jürgens, Tim

2017-02-01

This study introduces a speech intelligibility model for cochlear implant users with ipsilateral preserved acoustic hearing that aims at simulating the observed speech-in-noise intelligibility benefit when receiving simultaneous electric and acoustic stimulation (EA-benefit). The model simulates the auditory nerve spiking in response to electric and/or acoustic stimulation. The temporally and spatially integrated spiking patterns were used as the final internal representation of noisy speech. Speech reception thresholds (SRTs) in stationary noise were predicted for a sentence test using an automatic speech recognition framework. The model was employed to systematically investigate the effect of three physiologically relevant model factors on simulated SRTs: (1) the spatial spread of the electric field which co-varies with the number of electrically stimulated auditory nerves, (2) the "internal" noise simulating the deprivation of auditory system, and (3) the upper bound frequency limit of acoustic hearing. The model results show that the simulated SRTs increase monotonically with increasing spatial spread for fixed internal noise, and also increase with increasing the internal noise strength for a fixed spatial spread. The predicted EA-benefit does not follow such a systematic trend and depends on the specific combination of the model parameters. Beyond 300 Hz, the upper bound limit for preserved acoustic hearing is less influential on speech intelligibility of EA-listeners in stationary noise. The proposed model-predicted EA-benefits are within the range of EA-benefits shown by 18 out of 21 actual cochlear implant listeners with preserved acoustic hearing. Copyright © 2016 Elsevier B.V. All rights reserved.
Recognition of Speech from the Television with Use of a Wireless Technology Designed for Cochlear Implants.

PubMed

Duke, Mila Morais; Wolfe, Jace; Schafer, Erin

2016-05-01

Cochlear implant (CI) recipients often experience difficulty understanding speech in noise and speech that originates from a distance. Many CI recipients also experience difficulty understanding speech originating from a television. Use of hearing assistance technology (HAT) may improve speech recognition in noise and for signals that originate from more than a few feet from the listener; however, there are no published studies evaluating the potential benefits of a wireless HAT designed to deliver audio signals from a television directly to a CI sound processor. The objective of this study was to compare speech recognition in quiet and in noise of CI recipients with the use of their CI alone and with the use of their CI and a wireless HAT (Cochlear Wireless TV Streamer). A two-way repeated measures design was used to evaluate performance differences obtained in quiet and in competing noise (65 dBA) with the CI sound processor alone and with the sound processor coupled to the Cochlear Wireless TV Streamer. Sixteen users of Cochlear Nucleus 24 Freedom, CI512, and CI422 implants were included in the study. Participants were evaluated in four conditions including use of the sound processor alone and use of the sound processor with the wireless streamer in quiet and in the presence of competing noise at 65 dBA. Speech recognition was evaluated in each condition with two full lists of Computer-Assisted Speech Perception Testing and Training Sentence-Level Test sentences presented from a light-emitting diode television. Speech recognition in noise was significantly better with use of the wireless streamer compared to participants' performance with their CI sound processor alone. There was also a nonsignificant trend toward better performance in quiet with use of the TV Streamer. Performance was significantly poorer when evaluated in noise compared to performance in quiet when the TV Streamer was not used. Use of the Cochlear Wireless TV Streamer designed to stream audio from a television directly to a CI sound processor provides better speech recognition in quiet and in noise when compared to performance obtained with use of the CI sound processor alone. American Academy of Audiology.
Institute for the Study of Human Capabilities Summary Descriptions of Research for the Period September 1988 through June 1989

DTIC Science & Technology

1989-06-01

12 1.7 Application of the Modified Speech Transmission Index to Monaural and Binaural Speech Recognition in Normal and Impaired...describe all of the data from both groups. 1.7 Application of the Modified Speech Transmission Index to Monaural and Binaural Speech Recognition in...were obtained for materials presented to each ear separately (monaurally) and to both ears ( binaurally ). Results from the normal listeners are accurately
Emotion recognition from speech: tools and challenges

NASA Astrophysics Data System (ADS)

Al-Talabani, Abdulbasit; Sellahewa, Harin; Jassim, Sabah A.

2015-05-01

Human emotion recognition from speech is studied frequently for its importance in many applications, e.g. human-computer interaction. There is a wide diversity and non-agreement about the basic emotion or emotion-related states on one hand and about where the emotion related information lies in the speech signal on the other side. These diversities motivate our investigations into extracting Meta-features using the PCA approach, or using a non-adaptive random projection RP, which significantly reduce the large dimensional speech feature vectors that may contain a wide range of emotion related information. Subsets of Meta-features are fused to increase the performance of the recognition model that adopts the score-based LDC classifier. We shall demonstrate that our scheme outperform the state of the art results when tested on non-prompted databases or acted databases (i.e. when subjects act specific emotions while uttering a sentence). However, the huge gap between accuracy rates achieved on the different types of datasets of speech raises questions about the way emotions modulate the speech. In particular we shall argue that emotion recognition from speech should not be dealt with as a classification problem. We shall demonstrate the presence of a spectrum of different emotions in the same speech portion especially in the non-prompted data sets, which tends to be more "natural" than the acted datasets where the subjects attempt to suppress all but one emotion.
Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

PubMed Central

Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai

2017-01-01

Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
Hearing Handicap and Speech Recognition Correlate With Self-Reported Listening Effort and Fatigue.

PubMed

Alhanbali, Sara; Dawes, Piers; Lloyd, Simon; Munro, Kevin J

To investigate the correlations between hearing handicap, speech recognition, listening effort, and fatigue. Eighty-four adults with hearing loss (65 to 85 years) completed three self-report questionnaires: the Fatigue Assessment Scale, the Effort Assessment Scale, and the Hearing Handicap Inventory for Elderly. Audiometric assessment included pure-tone audiometry and speech recognition in noise. There was a significant positive correlation between handicap and fatigue (r = 0.39, p < 0.05) and handicap and effort (r = 0.73, p < 0.05). There were significant (but lower) correlations between speech recognition and fatigue (r = 0.22, p < 0.05) or effort (r = 0.32, p< 0.05). There was no significant correlation between hearing level and fatigue or effort. Hearing handicap and speech recognition both correlate with self-reported listening effort and fatigue, which is consistent with a model of listening effort and fatigue where perceived difficulty is related to sustained effort and fatigue for unrewarding tasks over which the listener has low control. A clinical implication is that encouraging clients to recognize and focus on the pleasure and positive experiences of listening may result in greater satisfaction and benefit from hearing aid use.
Development of coffee maker service robot using speech and face recognition systems using POMDP

NASA Astrophysics Data System (ADS)

Budiharto, Widodo; Meiliana; Santoso Gunawan, Alexander Agung

2016-07-01

There are many development of intelligent service robot in order to interact with user naturally. This purpose can be done by embedding speech and face recognition ability on specific tasks to the robot. In this research, we would like to propose Intelligent Coffee Maker Robot which the speech recognition is based on Indonesian language and powered by statistical dialogue systems. This kind of robot can be used in the office, supermarket or restaurant. In our scenario, robot will recognize user's face and then accept commands from the user to do an action, specifically in making a coffee. Based on our previous work, the accuracy for speech recognition is about 86% and face recognition is about 93% in laboratory experiments. The main problem in here is to know the intention of user about how sweetness of the coffee. The intelligent coffee maker robot should conclude the user intention through conversation under unreliable automatic speech in noisy environment. In this paper, this spoken dialog problem is treated as a partially observable Markov decision process (POMDP). We describe how this formulation establish a promising framework by empirical results. The dialog simulations are presented which demonstrate significant quantitative outcome.
Financial and workflow analysis of radiology reporting processes in the planning phase of implementation of a speech recognition system

NASA Astrophysics Data System (ADS)

Whang, Tom; Ratib, Osman M.; Umamoto, Kathleen; Grant, Edward G.; McCoy, Michael J.

2002-05-01

The goal of this study is to determine the financial value and workflow improvements achievable by replacing traditional transcription services with a speech recognition system in a large, university hospital setting. Workflow metrics were measured at two hospitals, one of which exclusively uses a transcription service (UCLA Medical Center), and the other which exclusively uses speech recognition (West Los Angeles VA Hospital). Workflow metrics include time spent per report (the sum of time spent interpreting, dictating, reviewing, and editing), transcription turnaround, and total report turnaround. Compared to traditional transcription, speech recognition resulted in radiologists spending 13-32% more time per report, but it also resulted in reduction of report turnaround time by 22-62% and reduction of marginal cost per report by 94%. The model developed here helps justify the introduction of a speech recognition system by showing that the benefits of reduced operating costs and decreased turnaround time outweigh the cost of increased time spent per report. Whether the ultimate goal is to achieve a financial objective or to improve operational efficiency, it is important to conduct a thorough analysis of workflow before implementation.
Effects of cooperating and conflicting cues on speech intonation recognition by cochlear implant users and normal hearing listeners.

PubMed

Peng, Shu-Chen; Lu, Nelson; Chatterjee, Monita

2009-01-01

Cochlear implant (CI) recipients have only limited access to fundamental frequency (F0) information, and thus exhibit deficits in speech intonation recognition. For speech intonation, F0 serves as the primary cue, and other potential acoustic cues (e.g. intensity properties) may also contribute. This study examined the effects of cooperating or conflicting acoustic cues on speech intonation recognition by adult CI and normal hearing (NH) listeners with full-spectrum and spectrally degraded speech stimuli. Identification of speech intonation that signifies question and statement contrasts was measured in 13 CI recipients and 4 NH listeners, using resynthesized bi-syllabic words, where F0 and intensity properties were systematically manipulated. The stimulus set was comprised of tokens whose acoustic cues (i.e. F0 contour and intensity patterns) were either cooperating or conflicting. Subjects identified if each stimulus is a 'statement' or a 'question' in a single-interval, 2-alternative forced-choice (2AFC) paradigm. Logistic models were fitted to the data, and estimated coefficients were compared under cooperating and conflicting conditions, between the subject groups (CI vs. NH), and under full-spectrum and spectrally degraded conditions for NH listeners. The results indicated that CI listeners' intonation recognition was enhanced by cooperating F0 contour and intensity cues, but was adversely affected by these cues being conflicting. On the other hand, with full-spectrum stimuli, NH listeners' intonation recognition was not affected by cues being cooperating or conflicting. The effects of cues being cooperating or conflicting were comparable between the CI group and NH listeners with spectrally degraded stimuli. These findings suggest the importance of taking multiple acoustic sources for speech recognition into consideration in aural rehabilitation for CI recipients. Copyright (C) 2009 S. Karger AG, Basel.
Effects of cooperating and conflicting cues on speech intonation recognition by cochlear implant users and normal hearing listeners

PubMed Central

Peng, Shu-Chen; Lu, Nelson; Chatterjee, Monita

2009-01-01

Cochlear implant (CI) recipients have only limited access to fundamental frequency (F0) information, and thus exhibit deficits in speech intonation recognition. For speech intonation, F0 serves as the primary cue, and other potential acoustic cues (e.g., intensity properties) may also contribute. This study examined the effects of acoustic cues being cooperating or conflicting on speech intonation recognition by adult cochlear implant (CI), and normal-hearing (NH) listeners with full-spectrum and spectrally degraded speech stimuli. Identification of speech intonation that signifies question and statement contrasts was measured in 13 CI recipients and 4 NH listeners, using resynthesized bi-syllabic words, where F0 and intensity properties were systematically manipulated. The stimulus set was comprised of tokens whose acoustic cues, i.e., F0 contour and intensity patterns, were either cooperating or conflicting. Subjects identified if each stimulus is a “statement” or a “question” in a single-interval, two-alternative forced-choice (2AFC) paradigm. Logistic models were fitted to the data, and estimated coefficients were compared under cooperating and conflicting conditions, between the subject groups (CI vs. NH), and under full-spectrum and spectrally degraded conditions for NH listeners. The results indicated that CI listeners’ intonation recognition was enhanced by F0 contour and intensity cues being cooperating, but was adversely affected by these cues being conflicting. On the other hand, with full-spectrum stimuli, NH listeners’ intonation recognition was not affected by cues being cooperating or conflicting. The effects of cues being cooperating or conflicting were comparable between the CI group and NH listeners with spectrally-degraded stimuli. These findings suggest the importance of taking multiple acoustic sources for speech recognition into consideration in aural rehabilitation for CI recipients. PMID:19372651
Improving Mobile Phone Speech Recognition by Personalized Amplification: Application in People with Normal Hearing and Mild-to-Moderate Hearing Loss.

PubMed

Kam, Anna Chi Shan; Sung, John Ka Keung; Lee, Tan; Wong, Terence Ka Cheong; van Hasselt, Andrew

In this study, the authors evaluated the effect of personalized amplification on mobile phone speech recognition in people with and without hearing loss. This prospective study used double-blind, within-subjects, repeated measures, controlled trials to evaluate the effectiveness of applying personalized amplification based on the hearing level captured on the mobile device. The personalized amplification settings were created using modified one-third gain targets. The participants in this study included 100 adults of age between 20 and 78 years (60 with age-adjusted normal hearing and 40 with hearing loss). The performance of the participants with personalized amplification and standard settings was compared using both subjective and speech-perception measures. Speech recognition was measured in quiet and in noise using Cantonese disyllabic words. Subjective ratings on the quality, clarity, and comfortableness of the mobile signals were measured with an 11-point visual analog scale. Subjective preferences of the settings were also obtained by a paired-comparison procedure. The personalized amplification application provided better speech recognition via the mobile phone both in quiet and in noise for people with hearing impairment (improved 8 to 10%) and people with normal hearing (improved 1 to 4%). The improvement in speech recognition was significantly better for people with hearing impairment. When the average device output level was matched, more participants preferred to have the individualized gain than not to have it. The personalized amplification application has the potential to improve speech recognition for people with mild-to-moderate hearing loss, as well as people with normal hearing, in particular when listening in noisy environments.
Current steering and current focusing in cochlear implants: comparison of monopolar, tripolar, and virtual channel electrode configurations.

PubMed

Berenstein, Carlo K; Mens, Lucas H M; Mulder, Jef J S; Vanpoucke, Filiep J

2008-04-01

To compare the effects of Monopole (Mono), Tripole (Tri), and "Virtual channel" (Vchan) electrode configurations on spectral resolution and speech perception in a crossover design. Nine experienced adults who received an Advanced Bionics CII/90K cochlear implant participated in a crossover design using three experimental strategies for 2 wk each. Three strategies were compared: (1) Mono; (2) Tri with current partly returning to adjacent electrodes and partly (25 or 75%) to the extracochlear reference; and (3) a monopolar "Vchan" strategy creating seven intermediate channels between two contacts. Each strategy was a variant of the standard "HiRes" processing strategy using 14 channels and 1105 pulses/sec/ channel, and a pulse duration of 32 microsec/phase. Spectral resolution was measured using broadband noise with a sinusoidally rippled spectral envelope with peaks evenly spaced on a logarithmic frequency scale. Speech perception was measured for monosyllables in quiet and in steady-state and fluctuating noises. Subjective comments on music experience and preferences in everyday use were assessed through questionnaires. Thresholds and most comfortable levels with Mono and Vchan were both significantly lower than levels with Tri. Spectral resolution was significantly higher with Tri than with Mono; spectral resolution with Vchan did not differ significantly from the other configurations. Moderate but significant correlations between word recognition and spectral resolution were found in speech in quiet and fluctuating noise. For speech in quiet, word recognition was best with Mono and worst with Vchan; Tri did not significantly differ from the other configurations. Pooled across the noise conditions, word recognition was best with Tri and worst with Vchan (Mono did not significantly differ from the other configurations). These differences were small and insufficient to result in a clear increase in performance across subjects if the result from the best configuration per subject was compared with the result from Mono. Across all subjects, music appreciation and satisfaction in everyday use did not clearly differ between configurations. (1) Although spectral resolution was improved with the tripolar configuration, differences in speech performance were too small in this limited group of subjects to justify clinical introduction. (2) Overall spectral resolution remained extremely poor compared with normal hearing; it remains to be seen whether further manipulations of the electrical field will be more effective.
Study of environmental sound source identification based on hidden Markov model for robust speech recognition

NASA Astrophysics Data System (ADS)

Nishiura, Takanobu; Nakamura, Satoshi

2003-10-01

Humans communicate with each other through speech by focusing on the target speech among environmental sounds in real acoustic environments. We can easily identify the target sound from other environmental sounds. For hands-free speech recognition, the identification of the target speech from environmental sounds is imperative. This mechanism may also be important for a self-moving robot to sense the acoustic environments and communicate with humans. Therefore, this paper first proposes hidden Markov model (HMM)-based environmental sound source identification. Environmental sounds are modeled by three states of HMMs and evaluated using 92 kinds of environmental sounds. The identification accuracy was 95.4%. This paper also proposes a new HMM composition method that composes speech HMMs and an HMM of categorized environmental sounds for robust environmental sound-added speech recognition. As a result of the evaluation experiments, we confirmed that the proposed HMM composition outperforms the conventional HMM composition with speech HMMs and a noise (environmental sound) HMM trained using noise periods prior to the target speech in a captured signal. [Work supported by Ministry of Public Management, Home Affairs, Posts and Telecommunications of Japan.

Comparison of the South African Spondaic and CID W-1 wordlists for measuring speech recognition threshold

PubMed Central

Soer, Maggi; Pottas, Lidia

2015-01-01

Background The home language of most audiologists in South Africa is either English or Afrikaans, whereas most South Africans speak an African language as their home language. The use of an English wordlist, the South African Spondaic (SAS) wordlist, which is familiar to the English Second Language (ESL) population, was developed by the author for testing the speech recognition threshold (SRT) of ESL speakers. Objectives The aim of this study was to compare the pure-tone average (PTA)/SRT correlation results of ESL participants when using the SAS wordlist (list A) and the CID W-1 spondaic wordlist (list B – less familiar; list C – more familiar CID W-1 words). Method A mixed-group correlational, quantitative design was adopted. PTA and SRT measurements were compared for lists A, B and C for 101 (197 ears) ESL participants with normal hearing or a minimal hearing loss (<26 dBHL; mean age 33.3). Results The Pearson correlation analysis revealed a strong PTA/SRT correlation when using list A (right 0.65; left 0.58) and list C (right 0.63; left 0.56). The use of list B revealed weak correlations (right 0.30; left 0.32). Paired sample t-tests indicated a statistically significantly stronger PTA/SRT correlation when list A was used, rather than list B or list C, at a 95% level of confidence. Conclusions The use of the SAS wordlist yielded a stronger PTA/SRT correlation than the use of the CID W-1 wordlist, when performing SRT testing on South African ESL speakers with normal hearing, or minimal hearing loss (<26 dBHL). PMID:26304218
Speech therapy and voice recognition instrument

NASA Technical Reports Server (NTRS)

Cohen, J.; Babcock, M. L.

1972-01-01

Characteristics of electronic circuit for examining variations in vocal excitation for diagnostic purposes and in speech recognition for determiniog voice patterns and pitch changes are described. Operation of the circuit is discussed and circuit diagram is provided.
Speech recognition by bilateral cochlear implant users in a cocktail-party setting

PubMed Central

Loizou, Philipos C.; Hu, Yi; Litovsky, Ruth; Yu, Gongqiang; Peters, Robert; Lake, Jennifer; Roland, Peter

2009-01-01

Unlike prior studies with bilateral cochlear implant users which considered only one interferer, the present study considered realistic listening situations wherein multiple interferers were present and in some cases originating from both hemifields. Speech reception thresholds were measured in bilateral users unilaterally and bilaterally in four different spatial configurations, with one and three interferers consisting of modulated noise or competing talkers. The data were analyzed in terms of binaural benefits including monaural advantage (better-ear listening) and binaural interaction. The total advantage (overall spatial release) received was 2–5 dB and was maintained with multiple interferers present. This advantage was dominated by the monaural advantage, which ranged from 1 to 6 dB and was largest when the interferers were mostly energetic. No binaural-interaction benefit was found in the present study with either type of interferer (speech or noise). While the total and monaural advantage obtained for noise interferers was comparable to that attained by normal-hearing listeners, it was considerably lower for speech interferers. This suggests that bilateral users are less capable of taking advantage of binaural cues, in particular, under conditions of informational masking. Furthermore, the use of noise interferers does not adequately reflect the difficulties experienced by bilateral users in real-life situations. PMID:19173424
The relative impact of generic head-related transfer functions on auditory speech thresholds: implications for the design of three-dimensional audio displays.

PubMed

Arrabito, G R; McFadden, S M; Crabtree, R B

2001-07-01

Auditory speech thresholds were measured in this study. Subjects were required to discriminate a female voice recording of three-digit numbers in the presence of diotic speech babble. The voice stimulus was spatialized at 11 static azimuth positions on the horizontal plane using three different head-related transfer functions (HRTFs) measured on individuals who did not participate in this study. The diotic presentation of the voice stimulus served as the control condition. The results showed that two of the HRTFS performed similarly and had significantly lower auditory speech thresholds than the third HRTF. All three HRTFs yielded significantly lower auditory speech thresholds compared with the diotic presentation of the voice stimulus, with the largest difference at 60 degrees azimuth. The practical implications of these results suggest that lower headphone levels of the communication system in military aircraft can be achieved without sacrificing intelligibility, thereby lessening the risk of hearing loss.
Multilevel Analysis in Analyzing Speech Data

ERIC Educational Resources Information Center

Guddattu, Vasudeva; Krishna, Y.

2011-01-01

The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
A Development of a System Enables Character Input and PC Operation via Voice for a Physically Disabled Person with a Speech Impediment

NASA Astrophysics Data System (ADS)

Tanioka, Toshimasa; Egashira, Hiroyuki; Takata, Mayumi; Okazaki, Yasuhisa; Watanabe, Kenzi; Kondo, Hiroki

We have designed and implemented a PC operation support system for a physically disabled person with a speech impediment via voice. Voice operation is an effective method for a physically disabled person with involuntary movement of the limbs and the head. We have applied a commercial speech recognition engine to develop our system for practical purposes. Adoption of a commercial engine reduces development cost and will contribute to make our system useful to another speech impediment people. We have customized commercial speech recognition engine so that it can recognize the utterance of a person with a speech impediment. We have restricted the words that the recognition engine recognizes and separated a target words from similar words in pronunciation to avoid misrecognition. Huge number of words registered in commercial speech recognition engines cause frequent misrecognition for speech impediments' utterance, because their utterance is not clear and unstable. We have solved this problem by narrowing the choice of input down in a small number and also by registering their ambiguous pronunciations in addition to the original ones. To realize all character inputs and all PC operation with a small number of words, we have designed multiple input modes with categorized dictionaries and have introduced two-step input in each mode except numeral input to enable correct operation with small number of words. The system we have developed is in practical level. The first author of this paper is physically disabled with a speech impediment. He has been able not only character input into PC but also to operate Windows system smoothly by using this system. He uses this system in his daily life. This paper is written by him with this system. At present, the speech recognition is customized to him. It is, however, possible to customize for other users by changing words and registering new pronunciation according to each user's utterance.
Use of Adaptive Digital Signal Processing to Improve Speech Communication for Normally Hearing aand Hearing-Impaired Subjects.

ERIC Educational Resources Information Center

Harris, Richard W.; And Others

1988-01-01

A two-microphone adaptive digital noise cancellation technique improved word-recognition ability for 20 normal and 12 hearing-impaired adults by reducing multitalker speech babble and speech spectrum noise 18-22 dB. Word recognition improvements averaged 37-50 percent for normal and 27-40 percent for hearing-impaired subjects. Improvement was best…
Multilingual Phoneme Models for Rapid Speech Processing System Development

DTIC Science & Technology

2006-09-01

processes are used to develop an Arabic speech recognition system starting from monolingual English models, In- ternational Phonetic Association (IPA...clusters. It was found that multilingual bootstrapping methods out- perform monolingual English bootstrapping methods on the Arabic evaluation data initially...International Phonetic Alphabet . . . . . . . . . 7 2.3.2 Multilingual vs. Monolingual Speech Recognition 7 2.3.3 Data-Driven Approaches
An Exploration of the Potential of Automatic Speech Recognition to Assist and Enable Receptive Communication in Higher Education

ERIC Educational Resources Information Center

Wald, Mike

2006-01-01

The potential use of Automatic Speech Recognition to assist receptive communication is explored. The opportunities and challenges that this technology presents students and staff to provide captioning of speech online or in classrooms for deaf or hard of hearing students and assist blind, visually impaired or dyslexic learners to read and search…
A novel probabilistic framework for event-based speech recognition

NASA Astrophysics Data System (ADS)

Juneja, Amit; Espy-Wilson, Carol

2003-10-01

One of the reasons for unsatisfactory performance of the state-of-the-art automatic speech recognition (ASR) systems is the inferior acoustic modeling of low-level acoustic-phonetic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal, but such a system for continuous speech recognition (CSR) is not known to exist. A probabilistic and statistical framework for CSR based on the idea of the representation of speech sounds by bundles of binary valued articulatory phonetic features is proposed. Multiple probabilistic sequences of linguistically motivated landmarks are obtained using binary classifiers of manner phonetic features-syllabic, sonorant and continuant-and the knowledge-based acoustic parameters (APs) that are acoustic correlates of those features. The landmarks are then used for the extraction of knowledge-based APs for source and place phonetic features and their binary classification. Probabilistic landmark sequences are constrained using manner class language models for isolated or connected word recognition. The proposed method could overcome the disadvantages encountered by the early acoustic-phonetic knowledge-based systems that led the ASR community to switch to systems highly dependent on statistical pattern analysis methods and probabilistic language or grammar models.
Effect of a Bluetooth-implemented hearing aid on speech recognition performance: subjective and objective measurement.

PubMed

Kim, Min-Beom; Chung, Won-Ho; Choi, Jeesun; Hong, Sung Hwa; Cho, Yang-Sun; Park, Gyuseok; Lee, Sangmin

2014-06-01

The object was to evaluate speech perception improvement through Bluetooth-implemented hearing aids in hearing-impaired adults. Thirty subjects with bilateral symmetric moderate sensorineural hearing loss participated in this study. A Bluetooth-implemented hearing aid was fitted unilaterally in all study subjects. Objective speech recognition score and subjective satisfaction were measured with a Bluetooth-implemented hearing aid to replace the acoustic connection from either a cellular phone or a loudspeaker system. In each system, participants were assigned to 4 conditions: wireless speech signal transmission into hearing aid (wireless mode) in quiet or noisy environment and conventional speech signal transmission using external microphone of hearing aid (conventional mode) in quiet or noisy environment. Also, participants completed questionnaires to investigate subjective satisfaction. Both cellular phone and loudspeaker system situation, participants showed improvements in sentence and word recognition scores with wireless mode compared to conventional mode in both quiet and noise conditions (P < .001). Participants also reported subjective improvements, including better sound quality, less noise interference, and better accuracy naturalness, when using the wireless mode (P < .001). Bluetooth-implemented hearing aids helped to improve subjective and objective speech recognition performances in quiet and noisy environments during the use of electronic audio devices.
Speech recognition in individuals with sensorineural hearing loss.

PubMed

de Andrade, Adriana Neves; Iorio, Maria Cecilia Martinelli; Gil, Daniela

2016-01-01

Hearing loss can negatively influence the communication performance of individuals, who should be evaluated with suitable material and in situations of listening close to those found in everyday life. To analyze and compare the performance of patients with mild-to-moderate sensorineural hearing loss in speech recognition tests carried out in silence and with noise, according to the variables ear (right and left) and type of stimulus presentation. The study included 19 right-handed individuals with mild-to-moderate symmetrical bilateral sensorineural hearing loss, submitted to the speech recognition test with words in different modalities and speech test with white noise and pictures. There was no significant difference between right and left ears in any of the tests. The mean number of correct responses in the speech recognition test with pictures, live voice, and recorded monosyllables was 97.1%, 85.9%, and 76.1%, respectively, whereas after the introduction of noise, the performance decreased to 72.6% accuracy. The best performances in the Speech Recognition Percentage Index were obtained using monosyllabic stimuli, represented by pictures presented in silence, with no significant differences between the right and left ears. After the introduction of competitive noise, there was a decrease in individuals' performance. Copyright © 2015 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.
Relationships between Structural and Acoustic Properties of Maternal Talk and Children's Early Word Recognition

ERIC Educational Resources Information Center

Suttora, Chiara; Salerni, Nicoletta; Zanchi, Paola; Zampini, Laura; Spinelli, Maria; Fasolo, Mirco

2017-01-01

This study aimed to investigate specific associations between structural and acoustic characteristics of infant-directed (ID) speech and word recognition. Thirty Italian-acquiring children and their mothers were tested when the children were 1;3. Children's word recognition was measured with the looking-while-listening task. Maternal ID speech was…
Speech Recognition in Adults with Cochlear Implants: The Effects of Working Memory, Phonological Sensitivity, and Aging

ERIC Educational Resources Information Center

Moberly, Aaron C.; Harris, Michael S.; Boyce, Lauren; Nittrouer, Susan

2017-01-01

Purpose: Models of speech recognition suggest that "top-down" linguistic and cognitive functions, such as use of phonotactic constraints and working memory, facilitate recognition under conditions of degradation, such as in noise. The question addressed in this study was what happens to these functions when a listener who has experienced…
The Effect of Asymmetrical Signal Degradation on Binaural Speech Recognition in Children and Adults.

ERIC Educational Resources Information Center

Rothpletz, Ann M.; Tharpe, Anne Marie; Grantham, D. Wesley

2004-01-01

To determine the effect of asymmetrical signal degradation on binaural speech recognition, 28 children and 14 adults were administered a sentence recognition task amidst multitalker babble. There were 3 listening conditions: (a) monaural, with mild degradation in 1 ear; (b) binaural, with mild degradation in both ears (symmetric degradation); and…
Learning Models and Real-Time Speech Recognition.

ERIC Educational Resources Information Center

Danforth, Douglas G.; And Others

This report describes the construction and testing of two "psychological" learning models for the purpose of computer recognition of human speech over the telephone. One of the two models was found to be superior in all tests. A regression analysis yielded a 92.3% recognition rate for 14 subjects ranging in age from 6 to 13 years. Tests…
Use of Authentic-Speech Technique for Teaching Sound Recognition to EFL Students

ERIC Educational Resources Information Center

Sersen, William J.

2011-01-01

The main objective of this research was to test an authentic-speech technique for improving the sound-recognition skills of EFL (English as a foreign language) students at Roi-Et Rajabhat University. The secondary objective was to determine the correlation, if any, between students' self-evaluation of sound-recognition progress and the actual…
What happens to the motor theory of perception when the motor system is damaged?

PubMed

Stasenko, Alena; Garcea, Frank E; Mahon, Bradford Z

2013-09-01

Motor theories of perception posit that motor information is necessary for successful recognition of actions. Perhaps the most well known of this class of proposals is the motor theory of speech perception, which argues that speech recognition is fundamentally a process of identifying the articulatory gestures (i.e. motor representations) that were used to produce the speech signal. Here we review neuropsychological evidence from patients with damage to the motor system, in the context of motor theories of perception applied to both manual actions and speech. Motor theories of perception predict that patients with motor impairments will have impairments for action recognition. Contrary to that prediction, the available neuropsychological evidence indicates that recognition can be spared despite profound impairments to production. These data falsify strong forms of the motor theory of perception, and frame new questions about the dynamical interactions that govern how information is exchanged between input and output systems.
What happens to the motor theory of perception when the motor system is damaged?

PubMed Central

Stasenko, Alena; Garcea, Frank E.; Mahon, Bradford Z.

2016-01-01

Motor theories of perception posit that motor information is necessary for successful recognition of actions. Perhaps the most well known of this class of proposals is the motor theory of speech perception, which argues that speech recognition is fundamentally a process of identifying the articulatory gestures (i.e. motor representations) that were used to produce the speech signal. Here we review neuropsychological evidence from patients with damage to the motor system, in the context of motor theories of perception applied to both manual actions and speech. Motor theories of perception predict that patients with motor impairments will have impairments for action recognition. Contrary to that prediction, the available neuropsychological evidence indicates that recognition can be spared despite profound impairments to production. These data falsify strong forms of the motor theory of perception, and frame new questions about the dynamical interactions that govern how information is exchanged between input and output systems. PMID:26823687
Assessment of Spectral and Temporal Resolution in Cochlear Implant Users Using Psychoacoustic Discrimination and Speech Cue Categorization.

PubMed

Winn, Matthew B; Won, Jong Ho; Moon, Il Joon

This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). The authors hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. The authors further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Nineteen cochlear implant listeners and 10 listeners with normal hearing participated in a suite of tasks that included spectral ripple discrimination, temporal modulation detection, and syllable categorization, which was split into a spectral cue-based task (targeting the /ba/-/da/ contrast) and a timing cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for cochlear implant listeners. Cochlear implant users were generally less successful at utilizing both spectral and temporal cues for categorization compared with listeners with normal hearing. For the cochlear implant listener group, spectral ripple discrimination was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. Temporal modulation detection using 100- and 10-Hz-modulated noise was not correlated either with the cochlear implant subjects' categorization of voice onset time or with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart nonlinguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (voice onset time) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.

Assessment of spectral and temporal resolution in cochlear implant users using psychoacoustic discrimination and speech cue categorization

PubMed Central

Winn, Matthew B.; Won, Jong Ho; Moon, Il Joon

2016-01-01

Objectives This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). We hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. We further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Design Nineteen CI listeners and 10 listeners with normal hearing (NH) participated in a suite of tasks that included spectral ripple discrimination (SRD), temporal modulation detection (TMD), and syllable categorization, which was split into a spectral-cue-based task (targeting the /ba/-/da/ contrast) and a timing-cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated in order to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression in order to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for CI listeners. Results CI users were generally less successful at utilizing both spectral and temporal cues for categorization compared to listeners with normal hearing. For the CI listener group, SRD was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. TMD using 100 Hz and 10 Hz modulated noise was not correlated with the CI subjects’ categorization of VOT, nor with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. Conclusions When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart non-linguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (VOT) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language. PMID:27438871
Speech variability effects on recognition accuracy associated with concurrent task performance by pilots

NASA Technical Reports Server (NTRS)

Simpson, C. A.

1985-01-01

In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated word, speaker-dependent speech recognition system, the induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed, and recognition errors were recorded by type for an isolated word speaker-dependent system and by an offline technique for a connected word speaker-dependent system. While errors increased with task loading for the isolated word system, there was no such effect for task loading in the case of the connected word system.
Speech recognition in advanced rotorcraft - Using speech controls to reduce manual control overload

NASA Technical Reports Server (NTRS)

Vidulich, Michael A.; Bortolussi, Michael R.

1988-01-01

An experiment has been conducted to ascertain the usefulness of helicopter pilot speech controls and their effect on time-sharing performance, under the impetus of multiple-resource theories of attention which predict that time-sharing should be more efficient with mixed manual and speech controls than with all-manual ones. The test simulation involved an advanced, single-pilot scout/attack helicopter. Performance and subjective workload levels obtained supported the claimed utility of speech recognition-based controls; specifically, time-sharing performance was improved while preparing a data-burst transmission of information during helicopter hover.
Incorporating Speech Recognition into a Natural User Interface

NASA Technical Reports Server (NTRS)

Chapa, Nicholas

2017-01-01

The Augmented/ Virtual Reality (AVR) Lab has been working to study the applicability of recent virtual and augmented reality hardware and software to KSC operations. This includes the Oculus Rift, HTC Vive, Microsoft HoloLens, and Unity game engine. My project in this lab is to integrate voice recognition and voice commands into an easy to modify system that can be added to an existing portion of a Natural User Interface (NUI). A NUI is an intuitive and simple to use interface incorporating visual, touch, and speech recognition. The inclusion of speech recognition capability will allow users to perform actions or make inquiries using only their voice. The simplicity of needing only to speak to control an on-screen object or enact some digital action means that any user can quickly become accustomed to using this system. Multiple programs were tested for use in a speech command and recognition system. Sphinx4 translates speech to text using a Hidden Markov Model (HMM) based Language Model, an Acoustic Model, and a word Dictionary running on Java. PocketSphinx had similar functionality to Sphinx4 but instead ran on C. However, neither of these programs were ideal as building a Java or C wrapper slowed performance. The most ideal speech recognition system tested was the Unity Engine Grammar Recognizer. A Context Free Grammar (CFG) structure is written in an XML file to specify the structure of phrases and words that will be recognized by Unity Grammar Recognizer. Using Speech Recognition Grammar Specification (SRGS) 1.0 makes modifying the recognized combinations of words and phrases very simple and quick to do. With SRGS 1.0, semantic information can also be added to the XML file, which allows for even more control over how spoken words and phrases are interpreted by Unity. Additionally, using a CFG with SRGS 1.0 produces a Finite State Machine (FSM) functionality limiting the potential for incorrectly heard words or phrases. The purpose of my project was to investigate options for a Speech Recognition System. To that end I attempted to integrate Sphinx4 into a user interface. Sphinx4 had great accuracy and is the only free program able to perform offline speech dictation. However it had a limited dictionary of words that could be recognized, single syllable words were almost impossible for it to hear, and since it ran on Java it could not be integrated into the Unity based NUI. PocketSphinx ran much faster than Sphinx4 which would've made it ideal as a plugin to the Unity NUI, unfortunately creating a C# wrapper for the C code made the program unusable with Unity due to the wrapper slowing code execution and class files becoming unreachable. Unity Grammar Recognizer is the ideal speech recognition interface, it is flexible in recognizing multiple variations of the same command. It is also the most accurate program in recognizing speech due to using an XML grammar to specify speech structure instead of relying solely on a Dictionary and Language model. The Unity Grammar Recognizer will be used with the NUI for these reasons as well as being written in C# which further simplifies the incorporation.
Temporal Effects on Monaural Amplitude-Modulation Sensitivity in Ipsilateral, Contralateral and Bilateral Noise.

PubMed

Marrufo-Pérez, Miriam I; Eustaquio-Martín, Almudena; López-Bascuas, Luis E; Lopez-Poveda, Enrique A

2018-04-01

The amplitude modulations (AMs) in speech signals are useful cues for speech recognition. Several adaptation mechanisms may make the detection of AM in noisy backgrounds easier when the AM carrier is presented later rather than earlier in the noise. The aim of the present study was to characterize temporal adaptation to noise in AM detection. AM detection thresholds were measured for monaural (50 ms, 1.5 kHz) pure-tone carriers presented at the onset ('early' condition) and 300 ms after the onset ('late' condition) of ipsilateral, contralateral, and bilateral (diotic) broadband noise, as well as in quiet. Thresholds were 2-4 dB better in the late than in the early condition for the three noise lateralities. The temporal effect held for carriers at equal sensation levels, confirming that it was not due to overshoot on carrier audibility. The temporal effect was larger for broadband than for low-band contralateral noises. Many aspects in the results were consistent with the noise activating the medial olivocochlear reflex (MOCR) and enhancing AM depth in the peripheral auditory response. Other aspects, however, indicate that central masking and adaptation unrelated to the MOCR also affect both carrier-tone and AM detection and are involved in the temporal effects.
A novel speech processing algorithm based on harmonicity cues in cochlear implant

NASA Astrophysics Data System (ADS)

Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng

2017-08-01

This paper proposed a novel speech processing algorithm in cochlear implant, which used harmonicity cues to enhance tonal information in Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank. The frequency ranges for the four bands were: 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each pass band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full wave rectification and low-pass filtering. The TEPCs were modulated by a sinusoidal carrier, the frequency of which was fundamental frequency (F0) and its harmonics most close to the center frequency of each band. Signals from each band were combined together to obtain an output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions were tested for the extensively used continuous interleaved sampling (CIS) strategy and the novel F0-harmonic algorithm. Results found that the F0-harmonic algorithm performed consistently better than CIS strategy in Mandarin tone, word, and sentence recognition. In addition, sentence recognition rate was higher than word recognition rate, as a result of contextual information in the sentence. Moreover, tone 3 and 4 performed better than tone 1 and tone 2, due to the easily identified features of the former. In conclusion, the F0-harmonic algorithm could enhance tonal information in cochlear implant speech processing due to the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further study will focus on the test of the F0-harmonic algorithm in noisy listening conditions.
Result on speech perception after conversion from Spectra® to Freedom®.

PubMed

Magalhães, Ana Tereza de Matos; Goffi-Gomez, Maria Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens

2012-04-01

New technology in the Freedom® speech processor for cochlear implants was developed to improve how incoming acoustic sound is processed; this applies not only for new users, but also for previous generations of cochlear implants. To identify the contribution of this technology-- the Nucleus 22®--on speech perception tests in silence and in noise, and on audiometric thresholds. A cross-sectional cohort study was undertaken. Seventeen patients were selected. The last map based on the Spectra® was revised and optimized before starting the tests. Troubleshooting was used to identify malfunction. To identify the contribution of the Freedom® technology for the Nucleus22®, auditory thresholds and speech perception tests were performed in free field in sound-proof booths. Recorded monosyllables and sentences in silence and in noise (SNR = 0dB) were presented at 60 dBSPL. The nonparametric Wilcoxon test for paired data was used to compare groups. Freedom® applied for the Nucleus22® showed a statistically significant difference in all speech perception tests and audiometric thresholds. The Freedom® technology improved the performance of speech perception and audiometric thresholds of patients with Nucleus 22®.
Blind speech separation system for humanoid robot with FastICA for audio filtering and separation

NASA Astrophysics Data System (ADS)

Budiharto, Widodo; Santoso Gunawan, Alexander Agung

2016-07-01

Nowadays, there are many developments in building intelligent humanoid robot, mainly in order to handle voice and image. In this research, we propose blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate the multi speech sources and also to filter irrelevant noises. After speech separation step, the results will be integrated with our previous speech and face recognition system which is based on Bioloid GP robot and Raspberry Pi 2 as controller. The experimental results show the accuracy of our blind speech separation system is about 88% in command and query recognition cases.
New Ideas for Speech Recognition and Related Technologies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Holzrichter, J F

The ideas relating to the use of organ motion sensors for the purposes of speech recognition were first described by.the author in spring 1994. During the past year, a series of productive collaborations between the author, Tom McEwan and Larry Ng ensued and have lead to demonstrations, new sensor ideas, and algorithmic descriptions of a large number of speech recognition concepts. This document summarizes the basic concepts of recognizing speech once organ motions have been obtained. Micro power radars and their uses for the measurement of body organ motions, such as those of the heart and lungs, have been demonstratedmore » by Tom McEwan over the past two years. McEwan and I conducted a series of experiments, using these instruments, on vocal organ motions beginning in late spring, during which we observed motions of vocal folds (i.e., cords), tongue, jaw, and related organs that are very useful for speech recognition and other purposes. These will be reviewed in a separate paper. Since late summer 1994, Lawrence Ng and I have worked to make many of the initial recognition ideas more rigorous and to investigate the applications of these new ideas to new speech recognition algorithms, to speech coding, and to speech synthesis. I introduce some of those ideas in section IV of this document, and we describe them more completely in the document following this one, UCRL-UR-120311. For the design and operation of micro-power radars and their application to body organ motions, the reader may contact Tom McEwan directly. The capability for using EM sensors (i.e., radar units) to measure body organ motions and positions has been available for decades. Impediments to their use appear to have been size, excessive power, lack of resolution, and lack of understanding of the value of organ motion measurements, especially as applied to speech related technologies. However, with the invention of very low power, portable systems as demonstrated by McEwan at LLNL researchers have begun to think differently about practical applications of such radars. In particular, his demonstrations of heart and lung motions have opened up many new areas of application for human and animal measurements.« less
Human phoneme recognition depending on speech-intrinsic variability.

PubMed

Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger

2010-11-01

The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
A voice-input voice-output communication aid for people with severe speech impairment.

PubMed

Hawley, Mark S; Cunningham, Stuart P; Green, Phil D; Enderby, Pam; Palmer, Rebecca; Sehgal, Siddharth; O'Neill, Peter

2013-01-01

A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment-the voice-input voice-output communication aid (VIVOCA)-is described. The VIVOCA recognizes the disordered speech of the user and builds messages, which are converted into synthetic speech. System development was carried out employing user-centered design and development methods, which identified and refined key requirements for the device. A novel methodology for building small vocabulary, speaker-dependent automatic speech recognizers with reduced amounts of training data, was applied. Experiments showed that this method is successful in generating good recognition performance (mean accuracy 96%) on highly disordered speech, even when recognition perplexity is increased. The selected message-building technique traded off various factors including speed of message construction and range of available message outputs. The VIVOCA was evaluated in a field trial by individuals with moderate to severe dysarthria and confirmed that they can make use of the device to produce intelligible speech output from disordered speech input. The trial highlighted some issues which limit the performance and usability of the device when applied in real usage situations, with mean recognition accuracy of 67% in these circumstances. These limitations will be addressed in future work.
The effect of sensorineural hearing loss and tinnitus on speech recognition over air and bone conduction military communications headsets.

PubMed

Manning, Candice; Mermagen, Timothy; Scharine, Angelique

2017-06-01

Military personnel are at risk for hearing loss due to noise exposure during deployment (USACHPPM, 2008). Despite mandated use of hearing protection, hearing loss and tinnitus are prevalent due to reluctance to use hearing protection. Bone conduction headsets can offer good speech intelligibility for normal hearing (NH) listeners while allowing the ears to remain open in quiet environments and the use of hearing protection when needed. Those who suffer from tinnitus, the experience of perceiving a sound not produced by an external source, often show degraded speech recognition; however, it is unclear whether this is a result of decreased hearing sensitivity or increased distractibility (Moon et al., 2015). It has been suggested that the vibratory stimulation of a bone conduction headset might ameliorate the effects of tinnitus on speech perception; however, there is currently no research to support or refute this claim (Hoare et al., 2014). Speech recognition of words presented over air conduction and bone conduction headsets was measured for three groups of listeners: NH, sensorineural hearing impaired, and/or tinnitus sufferers. Three levels of speech-to-noise (SNR = 0, -6, -12 dB) were created by embedding speech items in pink noise. Better speech recognition performance was observed with the bone conduction headset regardless of hearing profile, and speech intelligibility was a function of SNR. Discussion will include study limitations and the implications of these findings for those serving in the military. Published by Elsevier B.V.
Speaker normalization for chinese vowel recognition in cochlear implants.

PubMed

Luo, Xin; Fu, Qian-Jie

2005-07-01

Because of the limited spectra-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.
Speech Perception, Word Recognition and the Structure of the Lexicon. Research on Speech Perception Progress Report No. 10.

ERIC Educational Resources Information Center

Pisoni, David B.; And Others

The results of three projects concerned with auditory word recognition and the structure of the lexicon are reported in this paper. The first project described was designed to test experimentally several specific predictions derived from MACS, a simulation model of the Cohort Theory of word recognition. The second project description provides the…
The Development of the Orthographic Consistency Effect in Speech Recognition: From Sublexical to Lexical Involvement

ERIC Educational Resources Information Center

Ventura, Paulo; Morais, Jose; Kolinsky, Regine

2007-01-01

The influence of orthography on children's on-line auditory word recognition was studied from the end of Grade 2 to the end of Grade 4, by examining the orthographic consistency effect [Ziegler, J. C., & Ferrand, L. (1998). Orthography shapes the perception of speech: The consistency effect in auditory recognition. "Psychonomic Bulletin & Review",…
Do What I Say! Voice Recognition Makes Major Advances.

ERIC Educational Resources Information Center

Ruley, C. Dorsey

1994-01-01

Explains voice recognition technology applications in the workplace, schools, and libraries. Highlights include a voice-controlled work station using the DragonDictate system that can be used with dyslexic students, converting text to speech, and converting speech to text. (LRW)
Emotional recognition from the speech signal for a virtual education agent

NASA Astrophysics Data System (ADS)

Tickle, A.; Raghu, S.; Elshaw, M.

2013-06-01

This paper explores the extraction of features from the speech wave to perform intelligent emotion recognition. A feature extract tool (openSmile) was used to obtain a baseline set of 998 acoustic features from a set of emotional speech recordings from a microphone. The initial features were reduced to the most important ones so recognition of emotions using a supervised neural network could be performed. Given that the future use of virtual education agents lies with making the agents more interactive, developing agents with the capability to recognise and adapt to the emotional state of humans is an important step.
Robotics control using isolated word recognition of voice input

NASA Technical Reports Server (NTRS)

Weiner, J. M.

1977-01-01

A speech input/output system is presented that can be used to communicate with a task oriented system. Human speech commands and synthesized voice output extend conventional information exchange capabilities between man and machine by utilizing audio input and output channels. The speech input facility is comprised of a hardware feature extractor and a microprocessor implemented isolated word or phrase recognition system. The recognizer offers a medium sized (100 commands), syntactically constrained vocabulary, and exhibits close to real time performance. The major portion of the recognition processing required is accomplished through software, minimizing the complexity of the hardware feature extractor.
Measuring the effects of spectral smearing and enhancement on speech recognition in noise for adults and children

PubMed Central

Nittrouer, Susan; Tarr, Eric; Wucinich, Taylor; Moberly, Aaron C.; Lowenstein, Joanna H.

2015-01-01

Broadened auditory filters associated with sensorineural hearing loss have clearly been shown to diminish speech recognition in noise for adults, but far less is known about potential effects for children. This study examined speech recognition in noise for adults and children using simulated auditory filters of different widths. Specifically, 5 groups (20 listeners each) of adults or children (5 and 7 yrs), were asked to recognize sentences in speech-shaped noise. Seven-year-olds listened at 0 dB signal-to-noise ratio (SNR) only; 5-yr-olds listened at +3 or 0 dB SNR; and adults listened at 0 or −3 dB SNR. Sentence materials were processed both to smear the speech spectrum (i.e., simulate broadened filters), and to enhance the spectrum (i.e., simulate narrowed filters). Results showed: (1) Spectral smearing diminished recognition for listeners of all ages; (2) spectral enhancement did not improve recognition, and in fact diminished it somewhat; and (3) interactions were observed between smearing and SNR, but only for adults. That interaction made age effects difficult to gauge. Nonetheless, it was concluded that efforts to diagnose the extent of broadening of auditory filters and to develop techniques to correct this condition could benefit patients with hearing loss, especially children. PMID:25920851
Histogram equalization with Bayesian estimation for noise robust speech recognition.

PubMed

Suh, Youngjoo; Kim, Hoirin

2018-02-01

The histogram equalization approach is an efficient feature normalization technique for noise robust automatic speech recognition. However, it suffers from performance degradation when some fundamental conditions are not satisfied in the test environment. To remedy these limitations of the original histogram equalization methods, class-based histogram equalization approach has been proposed. Although this approach showed substantial performance improvement under noise environments, it still suffers from performance degradation due to the overfitting problem when test data are insufficient. To address this issue, the proposed histogram equalization technique employs the Bayesian estimation method in the test cumulative distribution function estimation. It was reported in a previous study conducted on the Aurora-4 task that the proposed approach provided substantial performance gains in speech recognition systems based on the acoustic modeling of the Gaussian mixture model-hidden Markov model. In this work, the proposed approach was examined in speech recognition systems with deep neural network-hidden Markov model (DNN-HMM), the current mainstream speech recognition approach where it also showed meaningful performance improvement over the conventional maximum likelihood estimation-based method. The fusion of the proposed features with the mel-frequency cepstral coefficients provided additional performance gains in DNN-HMM systems, which otherwise suffer from performance degradation in the clean test condition.

A longitudinal study of the bilateral benefit in children with bilateral cochlear implants.

PubMed

Asp, Filip; Mäki-Torkko, Elina; Karltorp, Eva; Harder, Henrik; Hergils, Leif; Eskilsson, Gunnar; Stenfelt, Stefan

2015-02-01

To study the development of the bilateral benefit in children using bilateral cochlear implants by measurements of speech recognition and sound localization. Bilateral and unilateral speech recognition in quiet, in multi-source noise, and horizontal sound localization was measured at three occasions during a two-year period, without controlling for age or implant experience. Longitudinal and cross-sectional analyses were performed. Results were compared to cross-sectional data from children with normal hearing. Seventy-eight children aged 5.1-11.9 years, with a mean bilateral cochlear implant experience of 3.3 years and a mean age of 7.8 years, at inclusion in the study. Thirty children with normal hearing aged 4.8-9.0 years provided normative data. For children with cochlear implants, bilateral and unilateral speech recognition in quiet was comparable whereas a bilateral benefit for speech recognition in noise and sound localization was found at all three test occasions. Absolute performance was lower than in children with normal hearing. Early bilateral implantation facilitated sound localization. A bilateral benefit for speech recognition in noise and sound localization continues to exist over time for children with bilateral cochlear implants, but no relative improvement is found after three years of bilateral cochlear implant experience.
Phonological mismatch makes aided speech recognition in noise cognitively taxing.

PubMed

Rudner, Mary; Foo, Catharina; Rönnberg, Jerker; Lunner, Thomas

2007-12-01

The working memory framework for Ease of Language Understanding predicts that speech processing becomes more effortful, thus requiring more explicit cognitive resources, when there is mismatch between speech input and phonological representations in long-term memory. To test this prediction, we changed the compression release settings in the hearing instruments of experienced users and allowed them to train for 9 weeks with the new settings. After training, aided speech recognition in noise was tested with both the trained settings and orthogonal settings. We postulated that training would lead to acclimatization to the trained setting, which in turn would involve establishment of new phonological representations in long-term memory. Further, we postulated that after training, testing with orthogonal settings would give rise to phonological mismatch, associated with more explicit cognitive processing. Thirty-two participants (mean=70.3 years, SD=7.7) with bilateral sensorineural hearing loss (pure-tone average=46.0 dB HL, SD=6.5), bilaterally fitted for more than 1 year with digital, two-channel, nonlinear signal processing hearing instruments and chosen from the patient population at the Linköping University Hospital were randomly assigned to 9 weeks training with new, fast (40 ms) or slow (640 ms), compression release settings in both channels. Aided speech recognition in noise performance was tested according to a design with three within-group factors: test occasion (T1, T2), test setting (fast, slow), and type of noise (unmodulated, modulated) and one between-group factor: experience setting (fast, slow) for two types of speech materials-the highly constrained Hagerman sentences and the less-predictable Hearing in Noise Test (HINT). Complex cognitive capacity was measured using the reading span and letter monitoring tests. PREDICTION: We predicted that speech recognition in noise at T2 with mismatched experience and test settings would be associated with more explicit cognitive processing and thus stronger correlations with complex cognitive measures, as well as poorer performance if complex cognitive capacity was exceeded. Under mismatch conditions, stronger correlations were found between performance on speech recognition with the Hagerman sentences and reading span, along with poorer speech recognition for participants with low reading span scores. No consistent mismatch effect was found with HINT. The mismatch prediction generated by the working memory framework for Ease of Language Understanding is supported for speech recognition in noise with the highly constrained Hagerman sentences but not the less-predictable HINT.
Effectiveness of the Directional Microphone in the Baha® Divino™

PubMed Central

Oeding, Kristi; Valente, Michael; Kerckhoff, Jessica

2010-01-01

Background Patients with unilateral sensorineural hearing loss (USNHL) experience great difficulty listening to speech in noisy environments. A directional microphone (DM) could potentially improve speech recognition in this difficult listening environment. It is well known that DMs in behind-the-ear (BTE) and custom hearing aids can provide a greater signal-to-noise ratio (SNR) in comparison to an omnidirectional microphone (OM) to improve speech recognition in noise for persons with hearing impairment. Studies examining the DM in bone anchored auditory osseointegrated implants (Baha), however, have been mixed, with little to no benefit reported for the DM compared to an OM. Purpose The primary purpose of this study was to determine if there are statistically significant differences in the mean reception threshold for sentences (RTS in dB) in noise between the OM and DM in the Baha® Divino™. The RTS of these two microphone modes was measured utilizing two loudspeaker arrays (speech from 0° and noise from 180° or a diffuse eight-loudspeaker array) and with the better ear open or closed with an earmold impression and noise attenuating earmuff. Subjective benefit was assessed using the Abbreviated Profile of Hearing Aid Benefit (APHAB) to compare unaided and aided (Divino OM and DM combined) problem scores. Research Design A repeated measures design was utilized, with each subject counterbalanced to each of the eight treatment levels for three independent variables: (1) microphone (OM and DM), (2) loudspeaker array (180° and diffuse), and (3) better ear (open and closed). Study Sample Sixteen subjects with USNHL currently utilizing the Baha were recruited from Washington University’s Center for Advanced Medicine and the surrounding area. Data Collection and Analysis Subjects were tested at the initial visit if they entered the study wearing the Divino or after at least four weeks of acclimatization to a loaner Divino. The RTS was determined utilizing Hearing in Noise Test (HINT) sentences in the R-Space™ system, and subjective benefit was determined utilizing the APHAB. A three-way repeated measures analysis of variance (ANOVA) and a paired samples t-test were utilized to analyze results of the HINT and APHAB, respectively. Results Results revealed statistically significant differences within microphone (p < 0.001; directional advantage of 3.2 dB), loudspeaker array (p = 0.046; 180° advantage of 1.1 dB), and better ear conditions (p < 0.001; open ear advantage of 4.9 dB). Results from the APHAB revealed statistically and clinically significant benefit for the Divino relative to unaided on the subscales of Ease of Communication (EC) (p = 0.037), Background Noise (BN) (p < 0.001), and Reverberation (RV) (p = 0.005). Conclusions The Divino’s DM provides a statistically significant improvement in speech recognition in noise compared to the OM for subjects with USNHL. Therefore, it is recommended that audiologists consider selecting a Baha with a DM to provide improved speech recognition performance in noisy listening environments. PMID:21034701
Effectiveness of the directional microphone in the Baha® Divino™.

PubMed

Oeding, Kristi; Valente, Michael; Kerckhoff, Jessica

2010-09-01

Patients with unilateral sensorineural hearing loss (USNHL) experience great difficulty listening to speech in noisy environments. A directional microphone (DM) could potentially improve speech recognition in this difficult listening environment. It is well known that DMs in behind-the-ear (BTE) and custom hearing aids can provide a greater signal-to-noise ratio (SNR) in comparison to an omnidirectional microphone (OM) to improve speech recognition in noise for persons with hearing impairment. Studies examining the DM in bone anchored auditory osseointegrated implants (Baha), however, have been mixed, with little to no benefit reported for the DM compared to an OM. The primary purpose of this study was to determine if there are statistically significant differences in the mean reception threshold for sentences (RTS in dB) in noise between the OM and DM in the Baha® Divino™. The RTS of these two microphone modes was measured utilizing two loudspeaker arrays (speech from 0° and noise from 180° or a diffuse eight-loudspeaker array) and with the better ear open or closed with an earmold impression and noise attenuating earmuff. Subjective benefit was assessed using the Abbreviated Profile of Hearing Aid Benefit (APHAB) to compare unaided and aided (Divino OM and DM combined) problem scores. A repeated measures design was utilized, with each subject counterbalanced to each of the eight treatment levels for three independent variables: (1) microphone (OM and DM), (2) loudspeaker array (180° and diffuse), and (3) better ear (open and closed). Sixteen subjects with USNHL currently utilizing the Baha were recruited from Washington University's Center for Advanced Medicine and the surrounding area. Subjects were tested at the initial visit if they entered the study wearing the Divino or after at least four weeks of acclimatization to a loaner Divino. The RTS was determined utilizing Hearing in Noise Test (HINT) sentences in the R-Space™ system, and subjective benefit was determined utilizing the APHAB. A three-way repeated measures analysis of variance (ANOVA) and a paired samples t-test were utilized to analyze results of the HINT and APHAB, respectively. Results revealed statistically significant differences within microphone (p < 0.001; directional advantage of 3.2 dB), loudspeaker array (p = 0.046; 180° advantage of 1.1 dB), and better ear conditions (p < 0.001; open ear advantage of 4.9 dB). Results from the APHAB revealed statistically and clinically significant benefit for the Divino relative to unaided on the subscales of Ease of Communication (EC) (p = 0.037), Background Noise (BN) (p < 0.001), and Reverberation (RV) (p = 0.005). The Divino's DM provides a statistically significant improvement in speech recognition in noise compared to the OM for subjects with USNHL. Therefore, it is recommended that audiologists consider selecting a Baha with a DM to provide improved speech recognition performance in noisy listening environments. American Academy of Audiology.
Improving language models for radiology speech recognition.

PubMed

Paulett, John M; Langlotz, Curtis P

2009-02-01

Speech recognition systems have become increasingly popular as a means to produce radiology reports, for reasons both of efficiency and of cost. However, the suboptimal recognition accuracy of these systems can affect the productivity of the radiologists creating the text reports. We analyzed a database of over two million de-identified radiology reports to determine the strongest determinants of word frequency. Our results showed that body site and imaging modality had a similar influence on the frequency of words and of three-word phrases as did the identity of the speaker. These findings suggest that the accuracy of speech recognition systems could be significantly enhanced by further tailoring their language models to body site and imaging modality, which are readily available at the time of report creation.
Intelligibility of emotional speech in younger and older adults.

PubMed

Dupuis, Kate; Pichora-Fuller, M Kathleen

2014-01-01

Little is known about the influence of vocal emotions on speech understanding. Word recognition accuracy for stimuli spoken to portray seven emotions (anger, disgust, fear, sadness, neutral, happiness, and pleasant surprise) was tested in younger and older listeners. Emotions were presented in either mixed (heterogeneous emotions mixed in a list) or blocked (homogeneous emotion blocked in a list) conditions. Three main hypotheses were tested. First, vocal emotion affects word recognition accuracy; specifically, portrayals of fear enhance word recognition accuracy because listeners orient to threatening information and/or distinctive acoustical cues such as high pitch mean and variation. Second, older listeners recognize words less accurately than younger listeners, but the effects of different emotions on intelligibility are similar across age groups. Third, blocking emotions in list results in better word recognition accuracy, especially for older listeners, and reduces the effect of emotion on intelligibility because as listeners develop expectations about vocal emotion, the allocation of processing resources can shift from emotional to lexical processing. Emotion was the within-subjects variable: all participants heard speech stimuli consisting of a carrier phrase followed by a target word spoken by either a younger or an older talker, with an equal number of stimuli portraying each of seven vocal emotions. The speech was presented in multi-talker babble at signal to noise ratios adjusted for each talker and each listener age group. Listener age (younger, older), condition (mixed, blocked), and talker (younger, older) were the main between-subjects variables. Fifty-six students (Mage= 18.3 years) were recruited from an undergraduate psychology course; 56 older adults (Mage= 72.3 years) were recruited from a volunteer pool. All participants had clinically normal pure-tone audiometric thresholds at frequencies ≤3000 Hz. There were significant main effects of emotion, listener age group, and condition on the accuracy of word recognition in noise. Stimuli spoken in a fearful voice were the most intelligible, while those spoken in a sad voice were the least intelligible. Overall, word recognition accuracy was poorer for older than younger adults, but there was no main effect of talker, and the pattern of the effects of different emotions on intelligibility did not differ significantly across age groups. Acoustical analyses helped elucidate the effect of emotion and some intertalker differences. Finally, all participants performed better when emotions were blocked. For both groups, performance improved over repeated presentations of each emotion in both blocked and mixed conditions. These results are the first to demonstrate a relationship between vocal emotion and word recognition accuracy in noise for younger and older listeners. In particular, the enhancement of intelligibility by emotion is greatest for words spoken to portray fear and presented heterogeneously with other emotions. Fear may have a specialized role in orienting attention to words heard in noise. This finding may be an auditory counterpart to the enhanced detection of threat information in visual displays. The effect of vocal emotion on word recognition accuracy is preserved in older listeners with good audiograms and both age groups benefit from blocking and the repetition of emotions.
Connected word recognition using a cascaded neuro-computational model

NASA Astrophysics Data System (ADS)

Hoya, Tetsuya; van Leeuwen, Cees

2016-10-01

We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.
Measuring listening effort: driving simulator vs. simple dual-task paradigm

PubMed Central

Wu, Yu-Hsiang; Aksan, Nazan; Rizzo, Matthew; Stangl, Elizabeth; Zhang, Xuyang; Bentler, Ruth

2014-01-01

Objectives The dual-task paradigm has been widely used to measure listening effort. The primary objectives of the study were to (1) investigate the effect of hearing aid amplification and a hearing aid directional technology on listening effort measured by a complicated, more real world dual-task paradigm, and (2) compare the results obtained with this paradigm to a simpler laboratory-style dual-task paradigm. Design The listening effort of adults with hearing impairment was measured using two dual-task paradigms, wherein participants performed a speech recognition task simultaneously with either a driving task in a simulator or a visual reaction-time task in a sound-treated booth. The speech materials and road noises for the speech recognition task were recorded in a van traveling on the highway in three hearing aid conditions: unaided, aided with omni directional processing (OMNI), and aided with directional processing (DIR). The change in the driving task or the visual reaction-time task performance across the conditions quantified the change in listening effort. Results Compared to the driving-only condition, driving performance declined significantly with the addition of the speech recognition task. Although the speech recognition score was higher in the OMNI and DIR conditions than in the unaided condition, driving performance was similar across these three conditions, suggesting that listening effort was not affected by amplification and directional processing. Results from the simple dual-task paradigm showed a similar trend: hearing aid technologies improved speech recognition performance, but did not affect performance in the visual reaction-time task (i.e., reduce listening effort). The correlation between listening effort measured using the driving paradigm and the visual reaction-time task paradigm was significant. The finding showing that our older (56 to 85 years old) participants’ better speech recognition performance did not result in reduced listening effort was not consistent with literature that evaluated younger (approximately 20 years old), normal hearing adults. Because of this, a follow-up study was conducted. In the follow-up study, the visual reaction-time dual-task experiment using the same speech materials and road noises was repeated on younger adults with normal hearing. Contrary to findings with older participants, the results indicated that the directional technology significantly improved performance in both speech recognition and visual reaction-time tasks. Conclusions Adding a speech listening task to driving undermined driving performance. Hearing aid technologies significantly improved speech recognition while driving, but did not significantly reduce listening effort. Listening effort measured by dual-task experiments using a simulated real-world driving task and a conventional laboratory-style task was generally consistent. For a given listening environment, the benefit of hearing aid technologies on listening effort measured from younger adults with normal hearing may not be fully translated to older listeners with hearing impairment. PMID:25083599
Speech recognition: Acoustic-phonetic knowledge acquisition and representation

NASA Astrophysics Data System (ADS)

Zue, Victor W.

1988-09-01

The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation, of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmented duration for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. We achieved comparable system performance to that to the readers.
Science 101: How Does Speech-Recognition Software Work?

ERIC Educational Resources Information Center

Robertson, Bill

2016-01-01

This column provides background science information for elementary teachers. Many innovations with computer software begin with analysis of how humans do a task. This article takes a look at how humans recognize spoken words and explains the origins of speech-recognition software.
Watch what you say, your computer might be listening: A review of automated speech recognition

NASA Technical Reports Server (NTRS)

Degennaro, Stephen V.

1991-01-01

Spoken language is the most convenient and natural means by which people interact with each other and is, therefore, a promising candidate for human-machine interactions. Speech also offers an additional channel for hands-busy applications, complementing the use of motor output channels for control. Current speech recognition systems vary considerably across a number of important characteristics, including vocabulary size, speaking mode, training requirements for new speakers, robustness to acoustic environments, and accuracy. Algorithmically, these systems range from rule-based techniques through more probabilistic or self-learning approaches such as hidden Markov modeling and neural networks. This tutorial begins with a brief summary of the relevant features of current speech recognition systems and the strengths and weaknesses of the various algorithmic approaches.
Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children With Normal Hearing: A Replication and Extension of ).

PubMed

Roman, Adrienne S; Pisoni, David B; Kronenberger, William G; Faulkner, Kathleen F

Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings reported by ) who investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention (AA) and response set, talker discrimination, and verbal and nonverbal short-term working memory. Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (Peabody Picture Vocabulary test-4th Edition and Expressive Vocabulary test-2nd Edition) and measures of AA (NEPSY AA and response set and a talker discrimination task) and short-term memory (visual digit and symbol spans). Consistent with the findings reported in the original ) study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the Peabody Picture Vocabulary test-4th Edition using language quotients to control for age effects. However, children who scored higher on the Expressive Vocabulary test-2nd Edition recognized lexically easy words better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of AA and short-term memory capacity were significantly correlated with a child's ability to perceive noise-vocoded isolated words and sentences. First, we successfully replicated the major findings from the ) study. Because familiarity, phonological distinctiveness and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children's ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally degraded speech reflects early peripheral auditory processes, as well as additional contributions of executive function, specifically, selective attention and short-term memory processes in spoken word recognition. The present findings suggest that AA and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, because they are routinely required to encode, process, and understand spectrally degraded acoustic signals.
Some Neurocognitive Correlates of Noise-Vocoded Speech Perception in Children with Normal Hearing: A Replication and Extension of Eisenberg et al., 2002

PubMed Central

Roman, Adrienne S.; Pisoni, David B.; Kronenberger, William G.; Faulkner, Kathleen F.

2016-01-01

Objectives Noise-vocoded speech is a valuable research tool for testing experimental hypotheses about the effects of spectral-degradation on speech recognition in adults with normal hearing (NH). However, very little research has utilized noise-vocoded speech with children with NH. Earlier studies with children with NH focused primarily on the amount of spectral information needed for speech recognition without assessing the contribution of neurocognitive processes to speech perception and spoken word recognition. In this study, we first replicated the seminal findings reported by Eisenberg et al. (2002) who investigated effects of lexical density and word frequency on noise-vocoded speech perception in a small group of children with NH. We then extended the research to investigate relations between noise-vocoded speech recognition abilities and five neurocognitive measures: auditory attention and response set, talker discrimination and verbal and nonverbal short-term working memory. Design Thirty-one children with NH between 5 and 13 years of age were assessed on their ability to perceive lexically controlled words in isolation and in sentences that were noise-vocoded to four spectral channels. Children were also administered vocabulary assessments (PPVT-4 and EVT-2) and measures of auditory attention (NEPSY Auditory Attention (AA) and Response Set (RS) and a talker discrimination task (TD)) and short-term memory (visual digit and symbol spans). Results Consistent with the findings reported in the original Eisenberg et al. (2002) study, we found that children perceived noise-vocoded lexically easy words better than lexically hard words. Words in sentences were also recognized better than the same words presented in isolation. No significant correlations were observed between noise-vocoded speech recognition scores and the PPVT-4 using language quotients to control for age effects. However, children who scored higher on the EVT-2 recognized lexically easy words better than lexically hard words in sentences. Older children perceived noise-vocoded speech better than younger children. Finally, we found that measures of auditory attention and short-term memory capacity were significantly correlated with a child’s ability to perceive noise-vocoded isolated words and sentences. Conclusions First, we successfully replicated the major findings from the Eisenberg et al. (2002) study. Because familiarity, phonological distinctiveness and lexical competition affect word recognition, these findings provide additional support for the proposal that several foundational elementary neurocognitive processes underlie the perception of spectrally-degraded speech. Second, we found strong and significant correlations between performance on neurocognitive measures and children’s ability to recognize words and sentences noise-vocoded to four spectral channels. These findings extend earlier research suggesting that perception of spectrally-degraded speech reflects early peripheral auditory processes as well as additional contributions of executive function, specifically, selective attention and short-term memory processes in spoken word recognition. The present findings suggest that auditory attention and short-term memory support robust spoken word recognition in children with NH even under compromised and challenging listening conditions. These results are relevant to research carried out with listeners who have hearing loss, since they are routinely required to encode, process and understand spectrally-degraded acoustic signals. PMID:28045787
Perception of Sung Speech in Bimodal Cochlear Implant Users.

PubMed

Crew, Joseph D; Galvin, John J; Fu, Qian-Jie

2016-11-11

Combined use of a hearing aid (HA) and cochlear implant (CI) has been shown to improve CI users' speech and music performance. However, different hearing devices, test stimuli, and listening tasks may interact and obscure bimodal benefits. In this study, speech and music perception were measured in bimodal listeners for CI-only, HA-only, and CI + HA conditions, using the Sung Speech Corpus, a database of monosyllabic words produced at different fundamental frequencies. Sentence recognition was measured using sung speech in which pitch was held constant or varied across words, as well as for spoken speech. Melodic contour identification (MCI) was measured using sung speech in which the words were held constant or varied across notes. Results showed that sentence recognition was poorer with sung speech relative to spoken, with little difference between sung speech with a constant or variable pitch; mean performance was better with CI-only relative to HA-only, and best with CI + HA. MCI performance was better with constant words versus variable words; mean performance was better with HA-only than with CI-only and was best with CI + HA. Relative to CI-only, a strong bimodal benefit was observed for speech and music perception. Relative to the better ear, bimodal benefits remained strong for sentence recognition but were marginal for MCI. While variations in pitch and timbre may negatively affect CI users' speech and music perception, bimodal listening may partially compensate for these deficits. © The Author(s) 2016.
Sperry Univac speech communications technology

NASA Technical Reports Server (NTRS)

Medress, Mark F.

1977-01-01

Technology and systems for effective verbal communication with computers were developed. A continuous speech recognition system for verbal input, a word spotting system to locate key words in conversational speech, prosodic tools to aid speech analysis, and a prerecorded voice response system for speech output are described.
[Research on Control System of an Exoskeleton Upper-limb Rehabilitation Robot].

PubMed

Wang, Lulu; Hu, Xin; Hu, Jie; Fang, Youfang; He, Rongrong; Yu, Hongliu

2016-12-01

In order to help the patients with upper-limb disfunction go on rehabilitation training,this paper proposed an upper-limb exoskeleton rehabilitation robot with four degrees of freedom(DOF),and realized two control schemes,i.e.,voice control and electromyography control.The hardware and software design of the voice control system was completed based on RSC-4128 chips,which realized the speech recognition technology of a specific person.Besides,this study adapted self-made surface eletromyogram(sEMG)signal extraction electrodes to collect sEMG signals and realized pattern recognition by conducting sEMG signals processing,extracting time domain features and fixed threshold algorithm.In addition,the pulse-width modulation(PWM)algorithm was used to realize the speed adjustment of the system.Voice control and electromyography control experiments were then carried out,and the results showed that the mean recognition rate of the voice control and electromyography control reached 93.1%and 90.9%,respectively.The results proved the feasibility of the control system.This study is expected to lay a theoretical foundation for the further improvement of the control system of the upper-limb rehabilitation robot.
Assistive Technology and Adults with Learning Disabilities: A Blueprint for Exploration and Advancement.

ERIC Educational Resources Information Center

Raskind, Marshall

1993-01-01

This article describes assistive technologies for persons with learning disabilities, including word processing, spell checking, proofreading programs, outlining/"brainstorming" programs, abbreviation expanders, speech recognition, speech synthesis/screen review, optical character recognition systems, personal data managers, free-form databases,…
Speech Recognition for A Digital Video Library.

ERIC Educational Resources Information Center

Witbrock, Michael J.; Hauptmann, Alexander G.

1998-01-01

Production of the meta-data supporting the Informedia Digital Video Library interface is automated using techniques derived from artificial intelligence research. Speech recognition and natural-language processing, information retrieval, and image analysis are applied to produce an interface that helps users locate information and navigate more…
Role of the middle ear muscle apparatus in mechanisms of speech signal discrimination

NASA Technical Reports Server (NTRS)

Moroz, B. S.; Bazarov, V. G.; Sachenko, S. V.

1980-01-01

A method of impedance reflexometry was used to examine 101 students with hearing impairment in order to clarify the interrelation between speech discrimination and the state of the middle ear muscles. Ability to discriminate speech signals depends to some extent on the functional state of intraaural muscles. Speech discrimination was greatly impaired in the absence of stapedial muscle acoustic reflex, in the presence of low thresholds of stimulation and in very small values of reflex amplitude increase. Discrimination was not impeded in positive AR, high values of relative thresholds and normal increase of reflex amplitude in response to speech signals with augmenting intensity.
Listeners Experience Linguistic Masking Release in Noise-Vocoded Speech-in-Speech Recognition

ERIC Educational Resources Information Center

Viswanathan, Navin; Kokkinakis, Kostas; Williams, Brittany T.

2018-01-01

Purpose: The purpose of this study was to evaluate whether listeners with normal hearing perceiving noise-vocoded speech-in-speech demonstrate better intelligibility of target speech when the background speech was mismatched in language (linguistic release from masking [LRM]) and/or location (spatial release from masking [SRM]) relative to the…

Detection of target phonemes in spontaneous and read speech.

PubMed

Mehta, G; Cutler, A

1988-01-01

Although spontaneous speech occurs more frequently in most listeners' experience than read speech, laboratory studies of human speech recognition typically use carefully controlled materials read from a script. The phonological and prosodic characteristics of spontaneous and read speech differ considerably, however, which suggests that laboratory results may not generalise to the recognition of spontaneous speech. In the present study listeners were presented with both spontaneous and read speech materials, and their response time to detect word-initial target phonemes was measured. Responses were, overall, equally fast in each speech mode. However, analysis of effects previously reported in phoneme detection studies revealed significant differences between speech modes. In read speech but not in spontaneous speech, later targets were detected more rapidly than targets preceded by short words. In contrast, in spontaneous speech but not in read speech, targets were detected more rapidly in accented than in unaccented words and in strong than in weak syllables. An explanation for this pattern is offered in terms of characteristic prosodic differences between spontaneous and read speech. The results support claims from previous work that listeners pay great attention to prosodic information in the process of recognising speech.
Is talking to an automated teller machine natural and fun?

PubMed

Chan, F Y; Khalid, H M

Usability and affective issues of using automatic speech recognition technology to interact with an automated teller machine (ATM) are investigated in two experiments. The first uncovered dialogue patterns of ATM users for the purpose of designing the user interface for a simulated speech ATM system. Applying the Wizard-of-Oz methodology, multiple mapping and word spotting techniques, the speech driven ATM accommodates bilingual users of Bahasa Melayu and English. The second experiment evaluates the usability of a hybrid speech ATM, comparing it with a simulated manual ATM. The aim is to investigate how natural and fun can talking to a speech ATM be for these first-time users. Subjects performed the withdrawal and balance enquiry tasks. The ANOVA was performed on the usability and affective data. The results showed significant differences between systems in the ability to complete the tasks as well as in transaction errors. Performance was measured on the time taken by subjects to complete the task and the number of speech recognition errors that occurred. On the basis of user emotions, it can be said that the hybrid speech system enabled pleasurable interaction. Despite the limitations of speech recognition technology, users are set to talk to the ATM when it becomes available for public use.
LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 JOHNS HOPKINS SUMMER WORKSHOP.

PubMed

Hasegawa-Johnson, Mark; Baker, James; Borys, Sarah; Chen, Ken; Coogan, Emily; Greenberg, Steven; Juneja, Amit; Kirchhoff, Katrin; Livescu, Karen; Mohan, Srividya; Muller, Jennifer; Sonmez, Kemal; Wang, Tianyu

2005-01-01

Three research prototype speech recognition systems are described, all of which use recently developed methods from artificial intelligence (specifically support vector machines, dynamic Bayesian networks, and maximum entropy classification) in order to implement, in the form of an automatic speech recognizer, current theories of human speech perception and phonology (specifically landmark-based speech perception, nonlinear phonology, and articulatory phonology). All three systems begin with a high-dimensional multiframe acoustic-to-distinctive feature transformation, implemented using support vector machines trained to detect and classify acoustic phonetic landmarks. Distinctive feature probabilities estimated by the support vector machines are then integrated using one of three pronunciation models: a dynamic programming algorithm that assumes canonical pronunciation of each word, a dynamic Bayesian network implementation of articulatory phonology, or a discriminative pronunciation model trained using the methods of maximum entropy classification. Log probability scores computed by these models are then combined, using log-linear combination, with other word scores available in the lattice output of a first-pass recognizer, and the resulting combination score is used to compute a second-pass speech recognition output.
An Analysis of Individual Differences in Recognizing Monosyllabic Words Under the Speech Intelligibility Index Framework

PubMed Central

Shen, Yi; Kern, Allison B.

2018-01-01

Individual differences in the recognition of monosyllabic words, either in isolation (NU6 test) or in sentence context (SPIN test), were investigated under the theoretical framework of the speech intelligibility index (SII). An adaptive psychophysical procedure, namely the quick-band-importance-function procedure, was developed to enable the fitting of the SII model to individual listeners. Using this procedure, the band importance function (i.e., the relative weights of speech information across the spectrum) and the link function relating the SII to recognition scores can be simultaneously estimated while requiring only 200 to 300 trials of testing. Octave-frequency band importance functions and link functions were estimated separately for NU6 and SPIN materials from 30 normal-hearing listeners who were naïve to speech recognition experiments. For each type of speech material, considerable individual differences in the spectral weights were observed in some but not all frequency regions. At frequencies where the greatest intersubject variability was found, the spectral weights were correlated between the two speech materials, suggesting that the variability in spectral weights reflected listener-originated factors. PMID:29532711
Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals

PubMed Central

Muthusamy, Hariharan; Polat, Kemal; Yaacob, Sazali

2015-01-01

In the recent years, many research works have been published using speech related features for speech emotion recognition, however, recent studies show that there is a strong correlation between emotional states and glottal features. In this work, Mel-frequency cepstralcoefficients (MFCCs), linear predictive cepstral coefficients (LPCCs), perceptual linear predictive (PLP) features, gammatone filter outputs, timbral texture features, stationary wavelet transform based timbral texture features and relative wavelet packet energy and entropy features were extracted from the emotional speech (ES) signals and its glottal waveforms(GW). Particle swarm optimization based clustering (PSOC) and wrapper based particle swarm optimization (WPSO) were proposed to enhance the discerning ability of the features and to select the discriminating features respectively. Three different emotional speech databases were utilized to gauge the proposed method. Extreme learning machine (ELM) was employed to classify the different types of emotions. Different experiments were conducted and the results show that the proposed method significantly improves the speech emotion recognition performance compared to previous works published in the literature. PMID:25799141
Effect of high-frequency spectral components in computer recognition of dysarthric speech based on a Mel-cepstral stochastic model.

PubMed

Polur, Prasad D; Miller, Gerald E

2005-01-01

Computer speech recognition of individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, a hidden Markov model (HMM) was constructed and conditions investigated that would provide improved performance for a dysarthric speech (isolated word) recognition system intended to act as an assistive/control tool. In particular, we investigated the effect of high-frequency spectral components on the recognition rate of the system to determine if they contributed useful additional information to the system. A small-size vocabulary spoken by three cerebral palsy subjects was chosen. Mel-frequency cepstral coefficients extracted with the use of 15 ms frames served as training input to an ergodic HMM setup. Subsequent results demonstrated that no significant useful information was available to the system for enhancing its ability to discriminate dysarthric speech above 5.5 kHz in the current set of dysarthric data. The level of variability in input dysarthric speech patterns limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor-impaired individuals such as cerebral palsy subjects holds sufficient promise.
Application of Business Process Management to drive the deployment of a speech recognition system in a healthcare organization.

PubMed

González Sánchez, María José; Framiñán Torres, José Manuel; Parra Calderón, Carlos Luis; Del Río Ortega, Juan Antonio; Vigil Martín, Eduardo; Nieto Cervera, Jaime

2008-01-01

We present a methodology based on Business Process Management to guide the development of a speech recognition system in a hospital in Spain. The methodology eases the deployment of the system by 1) involving the clinical staff in the process, 2) providing the IT professionals with a description of the process and its requirements, 3) assessing advantages and disadvantages of the speech recognition system, as well as its impact in the organisation, and 4) help reorganising the healthcare process before implementing the new technology in order to identify how it can better contribute to the overall objective of the organisation.
Speech perception for adult cochlear implant recipients in a realistic background noise: effectiveness of preprocessing strategies and external options for improving speech recognition in noise.

PubMed

Gifford, René H; Revit, Lawrence J

2010-01-01

Although cochlear implant patients are achieving increasingly higher levels of performance, speech perception in noise continues to be problematic. The newest generations of implant speech processors are equipped with preprocessing and/or external accessories that are purported to improve listening in noise. Most speech perception measures in the clinical setting, however, do not provide a close approximation to real-world listening environments. To assess speech perception for adult cochlear implant recipients in the presence of a realistic restaurant simulation generated by an eight-loudspeaker (R-SPACE) array in order to determine whether commercially available preprocessing strategies and/or external accessories yield improved sentence recognition in noise. Single-subject, repeated-measures design with two groups of participants: Advanced Bionics and Cochlear Corporation recipients. Thirty-four subjects, ranging in age from 18 to 90 yr (mean 54.5 yr), participated in this prospective study. Fourteen subjects were Advanced Bionics recipients, and 20 subjects were Cochlear Corporation recipients. Speech reception thresholds (SRTs) in semidiffuse restaurant noise originating from an eight-loudspeaker array were assessed with the subjects' preferred listening programs as well as with the addition of either Beam preprocessing (Cochlear Corporation) or the T-Mic accessory option (Advanced Bionics). In Experiment 1, adaptive SRTs with the Hearing in Noise Test sentences were obtained for all 34 subjects. For Cochlear Corporation recipients, SRTs were obtained with their preferred everyday listening program as well as with the addition of Focus preprocessing. For Advanced Bionics recipients, SRTs were obtained with the integrated behind-the-ear (BTE) mic as well as with the T-Mic. Statistical analysis using a repeated-measures analysis of variance (ANOVA) evaluated the effects of the preprocessing strategy or external accessory in reducing the SRT in noise. In addition, a standard t-test was run to evaluate effectiveness across manufacturer for improving the SRT in noise. In Experiment 2, 16 of the 20 Cochlear Corporation subjects were reassessed obtaining an SRT in noise using the manufacturer-suggested "Everyday," "Noise," and "Focus" preprocessing strategies. A repeated-measures ANOVA was employed to assess the effects of preprocessing. The primary findings were (i) both Noise and Focus preprocessing strategies (Cochlear Corporation) significantly improved the SRT in noise as compared to Everyday preprocessing, (ii) the T-Mic accessory option (Advanced Bionics) significantly improved the SRT as compared to the BTE mic, and (iii) Focus preprocessing and the T-Mic resulted in similar degrees of improvement that were not found to be significantly different from one another. Options available in current cochlear implant sound processors are able to significantly improve speech understanding in a realistic, semidiffuse noise with both Cochlear Corporation and Advanced Bionics systems. For Cochlear Corporation recipients, Focus preprocessing yields the best speech-recognition performance in a complex listening environment; however, it is recommended that Noise preprocessing be used as the new default for everyday listening environments to avoid the need for switching programs throughout the day. For Advanced Bionics recipients, the T-Mic offers significantly improved performance in noise and is recommended for everyday use in all listening environments. American Academy of Audiology.
Psychometric Functions for Shortened Administrations of a Speech Recognition Approach Using Tri-Word Presentations and Phonemic Scoring

ERIC Educational Resources Information Center

Gelfand, Stanley A.; Gelfand, Jessica T.

2012-01-01

Method: Complete psychometric functions for phoneme and word recognition scores at 8 signal-to-noise ratios from -15 dB to 20 dB were generated for the first 10, 20, and 25, as well as all 50, three-word presentations of the Tri-Word or Computer Assisted Speech Recognition Assessment (CASRA) Test (Gelfand, 1998) based on the results of 12…
Adaptive method of recognition of signals for one and two-frequency signal system in the telephony on the background of speech

NASA Astrophysics Data System (ADS)

Kuznetsov, Michael V.

2006-05-01

For reliable teamwork of various systems of automatic telecommunication including transferring systems of optical communication networks it is necessary authentic recognition of signals for one- or two-frequency service signal system. The analysis of time parameters of an accepted signal allows increasing reliability of detection and recognition of the service signal system on a background of speech.
Age-related Effects on Word Recognition: Reliance on Cognitive Control Systems with Structural Declines in Speech-responsive Cortex

PubMed Central

Walczak, Adam; Ahlstrom, Jayne; Denslow, Stewart; Horwitz, Amy; Dubno, Judy R.

2008-01-01

Speech recognition can be difficult and effortful for older adults, even for those with normal hearing. Declining frontal lobe cognitive control has been hypothesized to cause age-related speech recognition problems. This study examined age-related changes in frontal lobe function for 15 clinically normal hearing adults (21–75 years) when they performed a word recognition task that was made challenging by decreasing word intelligibility. Although there were no age-related changes in word recognition, there were age-related changes in the degree of activity within left middle frontal gyrus (MFG) and anterior cingulate (ACC) regions during word recognition. Older adults engaged left MFG and ACC regions when words were most intelligible compared to younger adults who engaged these regions when words were least intelligible. Declining gray matter volume within temporal lobe regions responsive to word intelligibility significantly predicted left MFG activity, even after controlling for total gray matter volume, suggesting that declining structural integrity of brain regions responsive to speech leads to the recruitment of frontal regions when words are easily understood. Electronic supplementary material The online version of this article (doi:10.1007/s10162-008-0113-3) contains supplementary material, which is available to authorized users. PMID:18274825
An audiovisual emotion recognition system

NASA Astrophysics Data System (ADS)

Han, Yi; Wang, Guoyin; Yang, Yong; He, Kun

2007-12-01

Human emotions could be expressed by many bio-symbols. Speech and facial expression are two of them. They are both regarded as emotional information which is playing an important role in human-computer interaction. Based on our previous studies on emotion recognition, an audiovisual emotion recognition system is developed and represented in this paper. The system is designed for real-time practice, and is guaranteed by some integrated modules. These modules include speech enhancement for eliminating noises, rapid face detection for locating face from background image, example based shape learning for facial feature alignment, and optical flow based tracking algorithm for facial feature tracking. It is known that irrelevant features and high dimensionality of the data can hurt the performance of classifier. Rough set-based feature selection is a good method for dimension reduction. So 13 speech features out of 37 ones and 10 facial features out of 33 ones are selected to represent emotional information, and 52 audiovisual features are selected due to the synchronization when speech and video fused together. The experiment results have demonstrated that this system performs well in real-time practice and has high recognition rate. Our results also show that the work in multimodules fused recognition will become the trend of emotion recognition in the future.
The Relationship Between Spectral Modulation Detection and Speech Recognition: Adult Versus Pediatric Cochlear Implant Recipients

PubMed Central

Noble, Jack H.; Camarata, Stephen M.; Sunderhaus, Linsey W.; Dwyer, Robert T.; Dawant, Benoit M.; Dietrich, Mary S.; Labadie, Robert F.

2018-01-01

Adult cochlear implant (CI) recipients demonstrate a reliable relationship between spectral modulation detection and speech understanding. Prior studies documenting this relationship have focused on postlingually deafened adult CI recipients—leaving an open question regarding the relationship between spectral resolution and speech understanding for adults and children with prelingual onset of deafness. Here, we report CI performance on the measures of speech recognition and spectral modulation detection for 578 CI recipients including 477 postlingual adults, 65 prelingual adults, and 36 prelingual pediatric CI users. The results demonstrated a significant correlation between spectral modulation detection and various measures of speech understanding for 542 adult CI recipients. For 36 pediatric CI recipients, however, there was no significant correlation between spectral modulation detection and speech understanding in quiet or in noise nor was spectral modulation detection significantly correlated with listener age or age at implantation. These findings suggest that pediatric CI recipients might not depend upon spectral resolution for speech understanding in the same manner as adult CI recipients. It is possible that pediatric CI users are making use of different cues, such as those contained within the temporal envelope, to achieve high levels of speech understanding. Further investigation is warranted to investigate the relationship between spectral and temporal resolution and speech recognition to describe the underlying mechanisms driving peripheral auditory processing in pediatric CI users. PMID:29716437
Speech Recognition Using Multiple Features and Multiple Recognizers

DTIC Science & Technology

1991-12-03

6 2.1 Introduction ............................................... 6 2.2 Human Speech Communication Process...119 How to Setup ASRT.......................................... 119 How to Use Interactive Menus .................................. 120...recognize a word from an acoustic signal. The human ear and brain perform this type of recognition with incredible speed and precision. Even though
[Improvement in Phoneme Discrimination in Noise in Normal Hearing Adults].

PubMed

Schumann, A; Garea Garcia, L; Hoppe, U

2017-02-01

Objective: The study's aim was to examine the possibility to train phoneme-discrimination in noise with normal hearing adults, and its effectivity on speech recognition in noise. A specific computerised training program was used, consisting of special nonsense-syllables with background noise, to train participants' discrimination ability. Material and Methods: 46 normal hearing subjects took part in this study, 28 as training group participants, 18 as control group participants. Only the training group subjects were asked to train over a period of 3 weeks, twice a week for an hour with a computer-based training program. Speech recognition in noise were measured pre- to posttraining for the training group subjects with the Freiburger Einsilber Test. The control group subjects obtained test and restest measures within a 2-3 week break. For the training group follow-up speech recognition was measured 2-3 months after the end of the training. Results: The majority of training group subjects improved their phoneme discrimination significantly. Besides, their speech recognition in noise improved significantly during the training compared to the control group, and remained stable for a period of time. Conclusions: Phonem-Discrimination in noise can be trained by normal hearing adults. The improvements have got a positiv effect on speech recognition in noise, also for a longer period of time. © Georg Thieme Verlag KG Stuttgart · New York.
The development and validation of the Closed-set Mandarin Sentence (CMS) test.

PubMed

Tao, Duo-Duo; Fu, Qian-Jie; Galvin, John J; Yu, Ya-Feng

2017-09-01

Matrix-styled sentence tests offer a closed-set paradigm that may be useful when evaluating speech intelligibility. Ideally, sentence test materials should reflect the distribution of phonemes within the target language. We developed and validated the Closed-set Mandarin Sentence (CMS) test to assess Mandarin speech intelligibility in noise. CMS test materials were selected to be familiar words and to represent the natural distribution of vowels, consonants, and lexical tones found in Mandarin Chinese. Ten key words in each of five categories (Name, Verb, Number, Color, and Fruit) were produced by a native Mandarin talker, resulting in a total of 50 words that could be combined to produce 100,000 unique sentences. Normative data were collected in 10 normal-hearing, adult Mandarin-speaking Chinese listeners using a closed-set test paradigm. Two test runs were conducted for each subject, and 20 sentences per run were randomly generated while ensuring that each word was presented only twice in each run. First, the level of the words in each category were adjusted to produce equal intelligibility in noise. Test-retest reliability for word-in-sentence recognition was excellent according to Cronbach's alpha (0.952). After the category level adjustments, speech reception thresholds (SRTs) for sentences in noise, defined as the signal-to-noise ratio (SNR) that produced 50% correct whole sentence recognition, were adaptively measured by adjusting the SNR according to the correctness of response. The mean SRT was -7.9 (SE=0.41) and -8.1 (SE=0.34) dB for runs 1 and 2, respectively. The mean standard deviation across runs was 0.93 dB, and paired t-tests showed no significant difference between runs 1 and 2 (p=0.74) despite random sentences being generated for each run and each subject. The results suggest that the CMS provides large stimulus set with which to repeatedly and reliably measure Mandarin-speaking listeners' speech understanding in noise using a closed-set paradigm.
The Effectiveness of Clear Speech as a Masker

ERIC Educational Resources Information Center

Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.

2010-01-01

Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…
Automatic speech recognition and training for severely dysarthric users of assistive technology: the STARDUST project.

PubMed

Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil

2006-01-01

The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use by those with severe dysarthria due to limited and inconsistent proximity to "normal" articulatory patterns. Severe dysarthric output may also be characterized by a small mass of distinguishable phonetic tokens making the acoustic differentiation of target words difficult. Speaker dependent computer speech recognition using Hidden Markov Models was achieved by the identification of robust phonetic elements within the individual speaker output patterns. A new system of speech training using computer generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.

PubMed

Borrie, Stephanie A

2015-03-01

This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal-the AV advantage-has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition.

PubMed

Chatterjee, Monita; Peng, Shu-Chen

2008-01-01

Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners' Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners' everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial-F0s to represent male vs. female voices, and different F0 contours (i.e. falling, flat, and rising) Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners' sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners' performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution.

Processing F0 with Cochlear Implants: Modulation Frequency Discrimination and Speech Intonation Recognition

PubMed Central

Chatterjee, Monita; Peng, Shu-Chen

2008-01-01

Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners’ Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners’ everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising) Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners’ sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners’ performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution. PMID:18093766
Benefits of adaptive FM systems on speech recognition in noise for listeners who use hearing aids.

PubMed

Thibodeau, Linda

2010-06-01

To compare the benefits of adaptive FM and fixed FM systems through measurement of speech recognition in noise with adults and students in clinical and real-world settings. Five adults and 5 students with moderate-to-severe hearing loss completed objective and subjective speech recognition in noise measures with the 2 types of FM processing. Sentence recognition was evaluated in a classroom for 5 competing noise levels ranging from 54 to 80 dBA while the FM microphone was positioned 6 in. from the signal loudspeaker to receive input at 84 dB SPL. The subjective measures included 2 classroom activities and 6 auditory lessons in a noisy, public aquarium. On the objective measures, adaptive FM processing resulted in significantly better speech recognition in noise than fixed FM processing for 68- and 73-dBA noise levels. On the subjective measures, all individuals preferred adaptive over fixed processing for half of the activities. Adaptive processing was also preferred by most (8-9) individuals for the remaining 4 activities. The adaptive FM processing resulted in significant improvements at the higher noise levels and was preferred by the majority of participants in most of the conditions.
Working Memory and Speech Recognition in Noise Under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type Among Adults With Hearing Loss.

PubMed

Miller, Christi W; Stewart, Erin K; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A; Tremblay, Kelly

2017-08-16

This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed.
Speech Understanding with a New Implant Technology: A Comparative Study with a New Nonskin Penetrating Baha System

PubMed Central

Caversaccio, Marco

2014-01-01

Objective. To compare hearing and speech understanding between a new, nonskin penetrating Baha system (Baha Attract) to the current Baha system using a skin-penetrating abutment. Methods. Hearing and speech understanding were measured in 16 experienced Baha users. The transmission path via the abutment was compared to a simulated Baha Attract transmission path by attaching the implantable magnet to the abutment and then by adding a sample of artificial skin and the external parts of the Baha Attract system. Four different measurements were performed: bone conduction thresholds directly through the sound processor (BC Direct), aided sound field thresholds, aided speech understanding in quiet, and aided speech understanding in noise. Results. The simulated Baha Attract transmission path introduced an attenuation starting from approximately 5 dB at 1000 Hz, increasing to 20–25 dB above 6000 Hz. However, aided sound field threshold shows smaller differences and aided speech understanding in quiet and in noise does not differ significantly between the two transmission paths. Conclusion. The Baha Attract system transmission path introduces predominately high frequency attenuation. This attenuation can be partially compensated by adequate fitting of the speech processor. No significant decrease in speech understanding in either quiet or in noise was found. PMID:25140314
Neuronal Spoken Word Recognition: The Time Course of Processing Variation in the Speech Signal

ERIC Educational Resources Information Center

Schild, Ulrike; Roder, Brigitte; Friedrich, Claudia K.

2012-01-01

Recent neurobiological studies revealed evidence for lexical representations that are not specified for the coronal place of articulation (PLACE; Friedrich, Eulitz, & Lahiri, 2006; Friedrich, Lahiri, & Eulitz, 2008). Here we tested when these types of underspecified representations influence neuronal speech recognition. In a unimodal…
Recognizing Speech under a Processing Load: Dissociating Energetic from Informational Factors

ERIC Educational Resources Information Center

Mattys, Sven L.; Brooks, Joanna; Cooke, Martin

2009-01-01

Effects of perceptual and cognitive loads on spoken-word recognition have so far largely escaped investigation. This study lays the foundations of a psycholinguistic approach to speech recognition in adverse conditions that draws upon the distinction between energetic masking, i.e., listening environments leading to signal degradation, and…
Bilingual Computerized Speech Recognition Screening for Depression Symptoms

ERIC Educational Resources Information Center

Gonzalez, Gerardo; Carter, Colby; Blanes, Erika

2007-01-01

The Voice-Interactive Depression Assessment System (VIDAS) is a computerized speech recognition application for screening depression based on the Center for Epidemiological Studies--Depression scale in English and Spanish. Study 1 included 50 English and 47 Spanish speakers. Study 2 involved 108 English and 109 Spanish speakers. Participants…
Multichannel Compression, Temporal Cues, and Audibility.

ERIC Educational Resources Information Center

Souza, Pamela E.; Turner, Christopher W.

1998-01-01

The effect of the reduction of the temporal envelope produced by multichannel compression on recognition was examined in 16 listeners with hearing loss, with particular focus on audibility of the speech signal. Multichannel compression improved speech recognition when superior audibility was provided by a two-channel compression system over linear…
The effect of hearing aid technologies on listening in an automobile.

PubMed

Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A; Stanziola, Rachel W

2013-06-01

Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile/road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the listener/driver, conventional directional processing that places the directivity beam toward the listener's front may not be helpful and, in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener's speech recognition performance and preference for communication in a traveling automobile. A single-blinded, repeated-measures design was used. Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 yr participated in the study. The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 mph on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound-treated booth to assess speech recognition performance and preference with each programmed condition. Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants' preferences for a given processing scheme were generally consistent with speech recognition results. The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. American Academy of Audiology.
Pulse-spreading harmonic complex as an alternative carrier for vocoder simulations of cochlear implants.

PubMed

Mesnildrey, Quentin; Hilkhuysen, Gaston; Macherey, Olivier

2016-02-01

Noise- and sine-carrier vocoders are often used to acoustically simulate the information transmitted by a cochlear implant (CI). However, sine-waves fail to mimic the broad spread of excitation produced by a CI and noise-bands contain intrinsic modulations that are absent in CIs. The present study proposes pulse-spreading harmonic complexes (PSHCs) as an alternative acoustic carrier in vocoders. Sentence-in-noise recognition was measured in 12 normal-hearing subjects for noise-, sine-, and PSHC-vocoders. Consistent with the amount of intrinsic modulations present in each vocoder condition, the average speech reception threshold obtained with the PSHC-vocoder was higher than with sine-vocoding but lower than with noise-vocoding.
Not all sounds sound the same: Parkinson's disease affects differently emotion processing in music and in speech prosody.

PubMed

Lima, César F; Garrett, Carolina; Castro, São Luís

2013-01-01

Does emotion processing in music and speech prosody recruit common neurocognitive mechanisms? To examine this question, we implemented a cross-domain comparative design in Parkinson's disease (PD). Twenty-four patients and 25 controls performed emotion recognition tasks for music and spoken sentences. In music, patients had impaired recognition of happiness and peacefulness, and intact recognition of sadness and fear; this pattern was independent of general cognitive and perceptual abilities. In speech, patients had a small global impairment, which was significantly mediated by executive dysfunction. Hence, PD affected differently musical and prosodic emotions. This dissociation indicates that the mechanisms underlying the two domains are partly independent.
Speech-on-speech masking with variable access to the linguistic content of the masker speech for native and nonnative english speakers.

PubMed

Calandruccio, Lauren; Bradlow, Ann R; Dhar, Sumitrajit

2014-04-01

Masking release for an English sentence-recognition task in the presence of foreign-accented English speech compared with native-accented English speech was reported in Calandruccio et al (2010a). The masking release appeared to increase as the masker intelligibility decreased. However, it could not be ruled out that spectral differences between the speech maskers were influencing the significant differences observed. The purpose of the current experiment was to minimize spectral differences between speech maskers to determine how various amounts of linguistic information within competing speech Affiliationect masking release. A mixed-model design with within-subject (four two-talker speech maskers) and between-subject (listener group) factors was conducted. Speech maskers included native-accented English speech and high-intelligibility, moderate-intelligibility, and low-intelligibility Mandarin-accented English. Normalizing the long-term average speech spectra of the maskers to each other minimized spectral differences between the masker conditions. Three listener groups were tested, including monolingual English speakers with normal hearing, nonnative English speakers with normal hearing, and monolingual English speakers with hearing loss. The nonnative English speakers were from various native language backgrounds, not including Mandarin (or any other Chinese dialect). Listeners with hearing loss had symmetric mild sloping to moderate sensorineural hearing loss. Listeners were asked to repeat back sentences that were presented in the presence of four different two-talker speech maskers. Responses were scored based on the key words within the sentences (100 key words per masker condition). A mixed-model regression analysis was used to analyze the difference in performance scores between the masker conditions and listener groups. Monolingual English speakers with normal hearing benefited when the competing speech signal was foreign accented compared with native accented, allowing for improved speech recognition. Various levels of intelligibility across the foreign-accented speech maskers did not influence results. Neither the nonnative English-speaking listeners with normal hearing nor the monolingual English speakers with hearing loss benefited from masking release when the masker was changed from native-accented to foreign-accented English. Slight modifications between the target and the masker speech allowed monolingual English speakers with normal hearing to improve their recognition of native-accented English, even when the competing speech was highly intelligible. Further research is needed to determine which modifications within the competing speech signal caused the Mandarin-accented English to be less effective with respect to masking. Determining the influences within the competing speech that make it less effective as a masker or determining why monolingual normal-hearing listeners can take advantage of these differences could help improve speech recognition for those with hearing loss in the future. American Academy of Audiology.
Application of speech recognition and synthesis in the general aviation cockpit

NASA Technical Reports Server (NTRS)

North, R. A.; Mountford, S. J.; Bergeron, H.

1984-01-01

Interactive speech recognition/synthesis technology is assessed as a method for the aleviation of single-pilot IFR flight workloads. Attention was given during this series of evaluations to the conditions typical of general aviation twin-engine aircrft cockpits, covering several commonly encountered IFR flight condition scenarios. The most beneficial speech command tasks are noted to be in the data retrieval domain, which would allow the pilot access to uplinked data, checklists, and performance charts. Data entry tasks also appear to benefit from this technology.
Simulated Critical Differences for Speech Reception Thresholds

ERIC Educational Resources Information Center

Pedersen, Ellen Raben; Juhl, Peter Møller

2017-01-01

Purpose: Critical differences state by how much 2 test results have to differ in order to be significantly different. Critical differences for discrimination scores have been available for several decades, but they do not exist for speech reception thresholds (SRTs). This study presents and discusses how critical differences for SRTs can be…
Noise-immune multisensor transduction of speech

NASA Astrophysics Data System (ADS)

Viswanathan, Vishu R.; Henry, Claudia M.; Derr, Alan G.; Roucos, Salim; Schwartz, Richard M.

1986-08-01

Two types of configurations of multiple sensors were developed, tested and evaluated in speech recognition application for robust performance in high levels of acoustic background noise: One type combines the individual sensor signals to provide a single speech signal input, and the other provides several parallel inputs. For single-input systems, several configurations of multiple sensors were developed and tested. Results from formal speech intelligibility and quality tests in simulated fighter aircraft cockpit noise show that each of the two-sensor configurations tested outperforms the constituent individual sensors in high noise. Also presented are results comparing the performance of two-sensor configurations and individual sensors in speaker-dependent, isolated-word speech recognition tests performed using a commercial recognizer (Verbex 4000) in simulated fighter aircraft cockpit noise.
Evaluation of speech recognizers for use in advanced combat helicopter crew station research and development

NASA Technical Reports Server (NTRS)

Simpson, Carol A.

1990-01-01

The U.S. Army Crew Station Research and Development Facility uses vintage 1984 speech recognizers. An evaluation was performed of newer off-the-shelf speech recognition devices to determine whether newer technology performance and capabilities are substantially better than that of the Army's current speech recognizers. The Phonetic Discrimination (PD-100) Test was used to compare recognizer performance in two ambient noise conditions: quiet office and helicopter noise. Test tokens were spoken by males and females and in isolated-word and connected-work mode. Better overall recognition accuracy was obtained from the newer recognizers. Recognizer capabilities needed to support the development of human factors design requirements for speech command systems in advanced combat helicopters are listed.
"Who" is saying "what"? Brain-based decoding of human voice and speech.

PubMed

Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

2008-11-07

Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
Effects of noise on speech recognition: Challenges for communication by service members.

PubMed

Le Prell, Colleen G; Clavier, Odile H

2017-06-01

Speech communication often takes place in noisy environments; this is an urgent issue for military personnel who must communicate in high-noise environments. The effects of noise on speech recognition vary significantly according to the sources of noise, the number and types of talkers, and the listener's hearing ability. In this review, speech communication is first described as it relates to current standards of hearing assessment for military and civilian populations. The next section categorizes types of noise (also called maskers) according to their temporal characteristics (steady or fluctuating) and perceptive effects (energetic or informational masking). Next, speech recognition difficulties experienced by listeners with hearing loss and by older listeners are summarized, and questions on the possible causes of speech-in-noise difficulty are discussed, including recent suggestions of "hidden hearing loss". The final section describes tests used by military and civilian researchers, audiologists, and hearing technicians to assess performance of an individual in recognizing speech in background noise, as well as metrics that predict performance based on a listener and background noise profile. This article provides readers with an overview of the challenges associated with speech communication in noisy backgrounds, as well as its assessment and potential impact on functional performance, and provides guidance for important new research directions relevant not only to military personnel, but also to employees who work in high noise environments. Copyright © 2016 Elsevier B.V. All rights reserved.
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation

PubMed Central

Banks, Briony; Gowen, Emma; Munro, Kevin J.; Adank, Patti

2015-01-01

Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker’s facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants’ eye gaze was recorded to verify that they looked at the speaker’s face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation. PMID:26283946
Audiovisual cues benefit recognition of accented speech in noise but not perceptual adaptation.

PubMed

Banks, Briony; Gowen, Emma; Munro, Kevin J; Adank, Patti

2015-01-01

Perceptual adaptation allows humans to recognize different varieties of accented speech. We investigated whether perceptual adaptation to accented speech is facilitated if listeners can see a speaker's facial and mouth movements. In Study 1, participants listened to sentences in a novel accent and underwent a period of training with audiovisual or audio-only speech cues, presented in quiet or in background noise. A control group also underwent training with visual-only (speech-reading) cues. We observed no significant difference in perceptual adaptation between any of the groups. To address a number of remaining questions, we carried out a second study using a different accent, speaker and experimental design, in which participants listened to sentences in a non-native (Japanese) accent with audiovisual or audio-only cues, without separate training. Participants' eye gaze was recorded to verify that they looked at the speaker's face during audiovisual trials. Recognition accuracy was significantly better for audiovisual than for audio-only stimuli; however, no statistical difference in perceptual adaptation was observed between the two modalities. Furthermore, Bayesian analysis suggested that the data supported the null hypothesis. Our results suggest that although the availability of visual speech cues may be immediately beneficial for recognition of unfamiliar accented speech in noise, it does not improve perceptual adaptation.

Speech Recognition of Bimodal Cochlear Implant Recipients Using a Wireless Audio Streaming Accessory for the Telephone.

PubMed

Wolfe, Jace; Morais, Mila; Schafer, Erin

2016-02-01

The goals of the present investigation were (1) to evaluate recognition of recorded speech presented over a mobile telephone for a group of adult bimodal cochlear implant users, and (2) to measure the potential benefits of wireless hearing assistance technology (HAT) for mobile telephone speech recognition using bimodal stimulation (i.e., a cochlear implant in one ear and a hearing aid on the other ear). A three-by-two-way repeated measures design was used to evaluate mobile telephone sentence-recognition performance differences obtained in quiet and in noise with and without the wireless HAT accessory coupled to the hearing aid alone, CI sound processor alone, and in the bimodal condition. Outpatient cochlear implant clinic. Sixteen bimodal users with Nucleus 24, Freedom, CI512, or CI422 cochlear implants participated in this study. Performance was measured with and without the use of a wireless HAT for the telephone used with the hearing aid alone, CI alone, and bimodal condition. CNC word recognition in quiet and in noise with and without the use of a wireless HAT telephone accessory in the hearing aid alone, CI alone, and bimodal conditions. Results suggested that the bimodal condition gave significantly better speech recognition on the mobile telephone with the wireless HAT. A wireless HAT for the mobile telephone provides bimodal users with significant improvement in word recognition in quiet and in noise over the mobile telephone.
Assessment of Severe Apnoea through Voice Analysis, Automatic Speech, and Speaker Recognition Techniques

NASA Astrophysics Data System (ADS)

Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.

2009-12-01

This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
Automatic speech recognition in air traffic control

NASA Technical Reports Server (NTRS)

Karlsson, Joakim

1990-01-01

Automatic Speech Recognition (ASR) technology and its application to the Air Traffic Control system are described. The advantages of applying ASR to Air Traffic Control, as well as criteria for choosing a suitable ASR system are presented. Results from previous research and directions for future work at the Flight Transportation Laboratory are outlined.
Transcribe Your Class: Using Speech Recognition to Improve Access for At-Risk Students

ERIC Educational Resources Information Center

Bain, Keith; Lund-Lucas, Eunice; Stevens, Janice

2012-01-01

Through a project supported by Canada's Social Development Partnerships Program, a team of leading National Disability Organizations, universities, and industry partners are piloting a prototype Hosted Transcription Service that uses speech recognition to automatically create multimedia transcripts that can be used by students for study purposes.…
Automatic Speech Recognition: Reliability and Pedagogical Implications for Teaching Pronunciation

ERIC Educational Resources Information Center

Kim, In-Seok

2006-01-01

This study examines the reliability of automatic speech recognition (ASR) software used to teach English pronunciation, focusing on one particular piece of software, "FluSpeak, as a typical example." Thirty-six Korean English as a Foreign Language (EFL) college students participated in an experiment in which they listened to 15 sentences…
Speech Recognition Technology for Disabilities Education

ERIC Educational Resources Information Center

Tang, K. Wendy; Kamoua, Ridha; Sutan, Victor; Farooq, Omer; Eng, Gilbert; Chu, Wei Chern; Hou, Guofeng

2005-01-01

Speech recognition is an alternative to traditional methods of interacting with a computer, such as textual input through a keyboard. An effective system can replace or reduce the reliability on standard keyboard and mouse input. This can especially assist dyslexic students who have problems with character or word use and manipulation in a textual…
Automatic Speech Recognition Technology as an Effective Means for Teaching Pronunciation

ERIC Educational Resources Information Center

Elimat, Amal Khalil; AbuSeileek, Ali Farhan

2014-01-01

This study aimed to explore the effect of using automatic speech recognition technology (ASR) on the third grade EFL students' performance in pronunciation, whether teaching pronunciation through ASR is better than regular instruction, and the most effective teaching technique (individual work, pair work, or group work) in teaching pronunciation…
An Investigation of the Compensatory Effectiveness of Assistive Technology on Postsecondary Students with Learning Disabilities. Final Report.

ERIC Educational Resources Information Center

Murphy, Harry; Higgins, Eleanor

This final report describes the activities and accomplishments of a 3-year study on the compensatory effectiveness of three assistive technologies, optical character recognition, speech synthesis, and speech recognition, on postsecondary students (N=140) with learning disabilities. These technologies were investigated relative to: (1) immediate…
Effects of Cognitive Load on Speech Recognition

ERIC Educational Resources Information Center

Mattys, Sven L.; Wiget, Lukas

2011-01-01

The effect of cognitive load (CL) on speech recognition has received little attention despite the prevalence of CL in everyday life, e.g., dual-tasking. To assess the effect of CL on the interaction between lexically-mediated and acoustically-mediated processes, we measured the magnitude of the "Ganong effect" (i.e., lexical bias on phoneme…
Auditory Training with Multiple Talkers and Passage-Based Semantic Cohesion

ERIC Educational Resources Information Center

Casserly, Elizabeth D.; Barney, Erin C.

2017-01-01

Purpose: Current auditory training methods typically result in improvements to speech recognition abilities in quiet, but learner gains may not extend to other domains in speech (e.g., recognition in noise) or self-assessed benefit. This study examined the potential of training involving multiple talkers and training emphasizing discourse-level…
Implicit Processing of Phonotactic Cues: Evidence from Electrophysiological and Vascular Responses

ERIC Educational Resources Information Center

Rossi, Sonja; Jurgenson, Ina B.; Hanulikova, Adriana; Telkemeyer, Silke; Wartenburger, Isabell; Obrig, Hellmuth

2011-01-01

Spoken word recognition is achieved via competition between activated lexical candidates that match the incoming speech input. The competition is modulated by prelexical cues that are important for segmenting the auditory speech stream into linguistic units. One such prelexical cue that listeners rely on in spoken word recognition is phonotactics.…
Spoken Word Recognition of Chinese Words in Continuous Speech

ERIC Educational Resources Information Center

Yip, Michael C. W.

2015-01-01

The present study examined the role of positional probability of syllables played in recognition of spoken word in continuous Cantonese speech. Because some sounds occur more frequently at the beginning position or ending position of Cantonese syllables than the others, so these kinds of probabilistic information of syllables may cue the locations…
Phonotactics Constraints and the Spoken Word Recognition of Chinese Words in Speech

ERIC Educational Resources Information Center

Yip, Michael C.

2016-01-01

Two word-spotting experiments were conducted to examine the question of whether native Cantonese listeners are constrained by phonotactics information in spoken word recognition of Chinese words in speech. Because no legal consonant clusters occurred within an individual Chinese word, this kind of categorical phonotactics information of Chinese…
Evaluating Automatic Speech Recognition-Based Language Learning Systems: A Case Study

ERIC Educational Resources Information Center

van Doremalen, Joost; Boves, Lou; Colpaert, Jozef; Cucchiarini, Catia; Strik, Helmer

2016-01-01

The purpose of this research was to evaluate a prototype of an automatic speech recognition (ASR)-based language learning system that provides feedback on different aspects of speaking performance (pronunciation, morphology and syntax) to students of Dutch as a second language. We carried out usability reviews, expert reviews and user tests to…
Motor Speech Disorders Associated with Primary Progressive Aphasia

PubMed Central

Duffy, Joseph R.; Strand, Edythe A.; Josephs, Keith A.

2014-01-01

Background Primary progressive aphasia (PPA) and conditions that overlap with it can be accompanied by motor speech disorders. Recognition and understanding of motor speech disorders can contribute to a fuller clinical understanding of PPA and its management as well as its localization and underlying pathology. Aims To review the types of motor speech disorders that may occur with PPA, its primary variants, and its overlap syndromes (progressive supranuclear palsy syndrome, corticobasal syndrome, motor neuron disease), as well as with primary progressive apraxia of speech. Main Contribution The review should assist clinicians' and researchers' understanding of the relationship between motor speech disorders and PPA and its major variants. It also highlights the importance of recognizing neurodegenerative apraxia of speech as a condition that can occur with little or no evidence of aphasia. Conclusion Motor speech disorders can occur with PPA. Their recognition can contribute to clinical diagnosis and management of PPA and to understanding and predicting the localization and pathology associated with PPA variants and conditions that can overlap with them. PMID:25309017
Quadcopter Control Using Speech Recognition

NASA Astrophysics Data System (ADS)

Malik, H.; Darma, S.; Soekirno, S.

2018-04-01

This research reported a comparison from a success rate of speech recognition systems that used two types of databases they were existing databases and new databases, that were implemented into quadcopter as motion control. Speech recognition system was using Mel frequency cepstral coefficient method (MFCC) as feature extraction that was trained using recursive neural network method (RNN). MFCC method was one of the feature extraction methods that most used for speech recognition. This method has a success rate of 80% - 95%. Existing database was used to measure the success rate of RNN method. The new database was created using Indonesian language and then the success rate was compared with results from an existing database. Sound input from the microphone was processed on a DSP module with MFCC method to get the characteristic values. Then, the characteristic values were trained using the RNN which result was a command. The command became a control input to the single board computer (SBC) which result was the movement of the quadcopter. On SBC, we used robot operating system (ROS) as the kernel (Operating System).
Participation of the Classical Speech Areas in Auditory Long-Term Memory

PubMed Central

Karabanov, Anke Ninija; Paine, Rainer; Chao, Chi Chao; Schulze, Katrin; Scott, Brian; Hallett, Mark; Mishkin, Mortimer

2015-01-01

Accumulating evidence suggests that storing speech sounds requires transposing rapidly fluctuating sound waves into more easily encoded oromotor sequences. If so, then the classical speech areas in the caudalmost portion of the temporal gyrus (pSTG) and in the inferior frontal gyrus (IFG) may be critical for performing this acoustic-oromotor transposition. We tested this proposal by applying repetitive transcranial magnetic stimulation (rTMS) to each of these left-hemisphere loci, as well as to a nonspeech locus, while participants listened to pseudowords. After 5 minutes these stimuli were re-presented together with new ones in a recognition test. Compared to control-site stimulation, pSTG stimulation produced a highly significant increase in recognition error rate, without affecting reaction time. By contrast, IFG stimulation led only to a weak, non-significant, trend toward recognition memory impairment. Importantly, the impairment after pSTG stimulation was not due to interference with perception, since the same stimulation failed to affect pseudoword discrimination examined with short interstimulus intervals. Our findings suggest that pSTG is essential for transforming speech sounds into stored motor plans for reproducing the sound. Whether or not the IFG also plays a role in speech-sound recognition could not be determined from the present results. PMID:25815813
Participation of the classical speech areas in auditory long-term memory.

PubMed

Karabanov, Anke Ninija; Paine, Rainer; Chao, Chi Chao; Schulze, Katrin; Scott, Brian; Hallett, Mark; Mishkin, Mortimer

2015-01-01

Accumulating evidence suggests that storing speech sounds requires transposing rapidly fluctuating sound waves into more easily encoded oromotor sequences. If so, then the classical speech areas in the caudalmost portion of the temporal gyrus (pSTG) and in the inferior frontal gyrus (IFG) may be critical for performing this acoustic-oromotor transposition. We tested this proposal by applying repetitive transcranial magnetic stimulation (rTMS) to each of these left-hemisphere loci, as well as to a nonspeech locus, while participants listened to pseudowords. After 5 minutes these stimuli were re-presented together with new ones in a recognition test. Compared to control-site stimulation, pSTG stimulation produced a highly significant increase in recognition error rate, without affecting reaction time. By contrast, IFG stimulation led only to a weak, non-significant, trend toward recognition memory impairment. Importantly, the impairment after pSTG stimulation was not due to interference with perception, since the same stimulation failed to affect pseudoword discrimination examined with short interstimulus intervals. Our findings suggest that pSTG is essential for transforming speech sounds into stored motor plans for reproducing the sound. Whether or not the IFG also plays a role in speech-sound recognition could not be determined from the present results.
Syntactic error modeling and scoring normalization in speech recognition

NASA Technical Reports Server (NTRS)

Olorenshaw, Lex

1991-01-01

The objective was to develop the speech recognition system to be able to detect speech which is pronounced incorrectly, given that the text of the spoken speech is known to the recognizer. Research was performed in the following areas: (1) syntactic error modeling; (2) score normalization; and (3) phoneme error modeling. The study into the types of errors that a reader makes will provide the basis for creating tests which will approximate the use of the system in the real world. NASA-Johnson will develop this technology into a 'Literacy Tutor' in order to bring innovative concepts to the task of teaching adults to read.
Tone classification of syllable-segmented Thai speech based on multilayer perception

NASA Astrophysics Data System (ADS)

Satravaha, Nuttavudh; Klinkhachorn, Powsiri; Lass, Norman

2002-05-01

Thai is a monosyllabic tonal language that uses tone to convey lexical information about the meaning of a syllable. Thus to completely recognize a spoken Thai syllable, a speech recognition system not only has to recognize a base syllable but also must correctly identify a tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system. Thai has five distinctive tones (``mid,'' ``low,'' ``falling,'' ``high,'' and ``rising'') and each tone is represented by a single fundamental frequency (F0) pattern. However, several factors, including tonal coarticulation, stress, intonation, and speaker variability, affect the F0 pattern of a syllable in continuous Thai speech. In this study, an efficient method for tone classification of syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress, and intonation, as well as a method to perform automatic syllable segmentation, were developed. Acoustic parameters were used as the main discriminating parameters. The F0 contour of a segmented syllable was normalized by using a z-score transformation before being presented to a tone classifier. The proposed system was evaluated on 920 test utterances spoken by 8 speakers. A recognition rate of 91.36% was achieved by the proposed system.

Effect of age at cochlear implantation on auditory and speech development of children with auditory neuropathy spectrum disorder.

PubMed

Liu, Yuying; Dong, Ruijuan; Li, Yuling; Xu, Tianqiu; Li, Yongxin; Chen, Xueqing; Gong, Shusheng

2014-12-01

To evaluate the auditory and speech abilities in children with auditory neuropathy spectrum disorder (ANSD) after cochlear implantation (CI) and determine the role of age at implantation. Ten children participated in this retrospective case series study. All children had evidence of ANSD. All subjects had no cochlear nerve deficiency on magnetic resonance imaging and had used the cochlear implants for a period of 12-84 months. We divided our children into two groups: children who underwent implantation before 24 months of age and children who underwent implantation after 24 months of age. Their auditory and speech abilities were evaluated using the following: behavioral audiometry, the Categories of Auditory Performance (CAP), the Meaningful Auditory Integration Scale (MAIS), the Infant-Toddler Meaningful Auditory Integration Scale (IT-MAIS), the Standard-Chinese version of the Monosyllabic Lexical Neighborhood Test (LNT), the Multisyllabic Lexical Neighborhood Test (MLNT), the Speech Intelligibility Rating (SIR) and the Meaningful Use of Speech Scale (MUSS). All children showed progress in their auditory and language abilities. The 4-frequency average hearing level (HL) (500Hz, 1000Hz, 2000Hz and 4000Hz) of aided hearing thresholds ranged from 17.5 to 57.5dB HL. All children developed time-related auditory perception and speech skills. Scores of children with ANSD who received cochlear implants before 24 months tended to be better than those of children who received cochlear implants after 24 months. Seven children completed the Mandarin Lexical Neighborhood Test. Approximately half of the children showed improved open-set speech recognition. Cochlear implantation is helpful for children with ANSD and may be a good optional treatment for many ANSD children. In addition, children with ANSD fitted with cochlear implants before 24 months tended to acquire auditory and speech skills better than children fitted with cochlear implants after 24 months. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Acoustic diagnosis of pulmonary hypertension: automated speech- recognition-inspired classification algorithm outperforms physicians

NASA Astrophysics Data System (ADS)

Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

2016-09-01

We hypothesized that an automated speech- recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74% compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% (p = 0.04) for clinicians. The false negative rate for the algorithm was 23% and 68% (p = 0.0002) for physicians. We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians that could be used to screen for PH and encourage earlier specialist referral.
Deactivating stimulation sites based on low-rate thresholds improves spectral ripple and speech reception thresholds in cochlear implant users.

PubMed

Zhou, Ning

2017-03-01

The study examined whether the benefit of deactivating stimulation sites estimated to have broad neural excitation was attributed to improved spectral resolution in cochlear implant users. The subjects' spatial neural excitation pattern was estimated by measuring low-rate detection thresholds across the array [see Zhou (2016). PLoS One 11, e0165476]. Spectral resolution, as assessed by spectral-ripple discrimination thresholds, significantly improved after deactivation of five high-threshold sites. The magnitude of improvement in spectral-ripple discrimination thresholds predicted the magnitude of improvement in speech reception thresholds after deactivation. Results suggested that a smaller number of relatively independent channels provide a better outcome than using all channels that might interact.
Relating hearing loss and executive functions to hearing aid users' preference for, and speech recognition with, different combinations of binaural noise reduction and microphone directionality

PubMed Central

Neher, Tobias

2014-01-01

Knowledge of how executive functions relate to preferred hearing aid (HA) processing is sparse and seemingly inconsistent with related knowledge for speech recognition outcomes. This study thus aimed to find out if (1) performance on a measure of reading span (RS) is related to preferred binaural noise reduction (NR) strength, (2) similar relations exist for two different, non-verbal measures of executive function, (3) pure-tone average hearing loss (PTA), signal-to-noise ratio (SNR), and microphone directionality (DIR) also influence preferred NR strength, and (4) preference and speech recognition outcomes are similar. Sixty elderly HA users took part. Six HA conditions consisting of omnidirectional or cardioid microphones followed by inactive, moderate, or strong binaural NR as well as linear amplification were tested. Outcome was assessed at fixed SNRs using headphone simulations of a frontal target talker in a busy cafeteria. Analyses showed positive effects of active NR and DIR on preference, and negative and positive effects of, respectively, strong NR and DIR on speech recognition. Also, while moderate NR was the most preferred NR setting overall, preference for strong NR increased with SNR. No relation between RS and preference was found. However, larger PTA was related to weaker preference for inactive NR and stronger preference for strong NR for both microphone modes. Equivalent (but weaker) relations between worse performance on one non-verbal measure of executive function and the HA conditions without DIR were found. For speech recognition, there were relations between HA condition, PTA, and RS, but their pattern differed from that for preference. Altogether, these results indicate that, while moderate NR works well in general, a notable proportion of HA users prefer stronger NR. Furthermore, PTA and executive functions can account for some of the variability in preference for, and speech recognition with, different binaural NR and DIR settings. PMID:25538547
Effect of different signal-processing options on speech-in-noise recognition for cochlear implant recipients with the cochlear CP810 speech processor.

PubMed

Potts, Lisa G; Kolb, Kelly A

2014-04-01

Difficulty understanding speech in the presence of background noise is a common report among cochlear implant (CI) recipients. Several speech-processing options designed to improve speech recognition, especially in noise, are currently available in the Cochlear Nucleus CP810 speech processor. These include adaptive dynamic range optimization (ADRO), autosensitivity control (ASC), Beam, and Zoom. The purpose of this study was to evaluate CI recipients' speech-in-noise recognition to determine which currently available processing option or options resulted in best performance in a simulated restaurant environment. Experimental study with one study group. The independent variable was speech-processing option, and the dependent variable was the reception threshold for sentences score. Thirty-two adult CI recipients. Eight processing options were tested: Beam, Beam + ASC, Beam + ADRO, Beam + ASC + ADRO, Zoom, Zoom + ASC, Zoom + ADRO, and Zoom + ASC + ADRO. Participants repeated Hearing in Noise Test sentences presented at a 0° azimuth, with R-Space restaurant noise presented from a 360° eight-loudspeaker array at 70 dB sound pressure level. A one-way repeated-measures analysis of variance was used to analyze differences in Beam options, Zoom options, and Beam versus Zoom options. Among the Beam options, Beam + ADRO was significantly poorer than Beam only, Beam + ASC, and Beam + ASC + ADRO. A 1.6-dB difference was observed between the best (Beam only) and poorest (Beam + ADRO) options. Among the Zoom options, Zoom only and Zoom + ADRO were significantly poorer than Zoom + ASC. A 2.2-dB difference was observed between the best (Zoom + ASC) and poorest (Zoom only) options. The comparison between Beam and Zoom options showed one significant difference, with Zoom only significantly poorer than Beam only. No significant difference was found between the other Beam and Zoom options (Beam + ASC vs Zoom + ASC, Beam + ADRO vs Zoom + ADRO, and Beam + ASC + ADRO vs Zoom + ASC + ADRO). The best processing option varied across subjects, with an almost equal number of participants performing best with a Beam option (n = 15) compared with a Zoom option (n = 17). There were no significant demographic or audiological moderating variables for any option. The results showed no significant differences between adaptive directionality (Beam) and fixed directionality (Zoom) when ASC was active in the R-Space environment. This finding suggests that noise-reduction processing is extremely valuable in loud semidiffuse environments in which the effectiveness of directional filtering might be diminished. However, there was no significant difference between the Beam-only and Beam + ASC options, which is most likely related to the additional noise cancellation performed by the Beam option (i.e., two-stage directional filtering and noise cancellation). In addition, the processing options with ADRO resulted in the poorest performances. This could be related to how the CI recipients were programmed or the loud noise level used in this study. The best processing option varied across subjects, but the majority performed best with directional filtering (Beam or Zoom) in combination with ASC. Therefore in a loud semidiffuse environment, the use of either Beam + ASC or Zoom + ASC is recommended. American Academy of Audiology.
Lessons Learned in Part-of-Speech Tagging of Conversational Speech

DTIC Science & Technology

2010-10-01

for conversational speech recognition. In Plenary Meeting and Symposium on Prosody and Speech Processing. Slav Petrov and Dan Klein. 2007. Improved...inference for unlexicalized parsing. In HLT-NAACL. Slav Petrov. 2010. Products of random latent variable grammars. In HLT-NAACL. Brian Roark, Yang Liu
Cleft Audit Protocol for Speech (CAPS-A): A Comprehensive Training Package for Speech Analysis

ERIC Educational Resources Information Center

Sell, D.; John, A.; Harding-Bell, A.; Sweeney, T.; Hegarty, F.; Freeman, J.

2009-01-01

Background: The previous literature has largely focused on speech analysis systems and ignored process issues, such as the nature of adequate speech samples, data acquisition, recording and playback. Although there has been recognition of the need for training on tools used in speech analysis associated with cleft palate, little attention has been…
Staff Report to the Senior Department Official on Recognition Compliance Issues. Recommendation Page: American Speech-Language-Hearing Association

ERIC Educational Resources Information Center

US Department of Education, 2010

2010-01-01

The American Speech-Language-Hearing Association, Council on Academic Accreditation in Audiology and Speech-Language Pathology (CAA) is a national accrediting agency of graduate education programs in audiology or speech-language pathology. The CAA currently accredits or or preaccredits 319 programs (247 in speech-language pathology and 72 in…
Working Memory and Speech Recognition in Noise Under Ecologically Relevant Listening Conditions: Effects of Visual Cues and Noise Type Among Adults With Hearing Loss

PubMed Central

Stewart, Erin K.; Wu, Yu-Hsiang; Bishop, Christopher; Bentler, Ruth A.; Tremblay, Kelly

2017-01-01

Purpose This study evaluated the relationship between working memory (WM) and speech recognition in noise with different noise types as well as in the presence of visual cues. Method Seventy-six adults with bilateral, mild to moderately severe sensorineural hearing loss (mean age: 69 years) participated. Using a cross-sectional design, 2 measures of WM were taken: a reading span measure, and Word Auditory Recognition and Recall Measure (Smith, Pichora-Fuller, & Alexander, 2016). Speech recognition was measured with the Multi-Modal Lexical Sentence Test for Adults (Kirk et al., 2012) in steady-state noise and 4-talker babble, with and without visual cues. Testing was under unaided conditions. Results A linear mixed model revealed visual cues and pure-tone average as the only significant predictors of Multi-Modal Lexical Sentence Test outcomes. Neither WM measure nor noise type showed a significant effect. Conclusion The contribution of WM in explaining unaided speech recognition in noise was negligible and not influenced by noise type or visual cues. We anticipate that with audibility partially restored by hearing aids, the effects of WM will increase. For clinical practice to be affected, more significant effect sizes are needed. PMID:28744550
Influence of auditory attention on sentence recognition captured by the neural phase.

PubMed

Müller, Jana Annina; Kollmeier, Birger; Debener, Stefan; Brand, Thomas

2018-03-07

The aim of this study was to investigate whether attentional influences on speech recognition are reflected in the neural phase entrained by an external modulator. Sentences were presented in 7 Hz sinusoidally modulated noise while the neural response to that modulation frequency was monitored by electroencephalogram (EEG) recordings in 21 participants. We implemented a selective attention paradigm including three different attention conditions while keeping physical stimulus parameters constant. The participants' task was either to repeat the sentence as accurately as possible (speech recognition task), to count the number of decrements implemented in modulated noise (decrement detection task), or to do both (dual task), while the EEG was recorded. Behavioural analysis revealed reduced performance in the dual task condition for decrement detection, possibly reflecting limited cognitive resources. EEG analysis revealed no significant differences in power for the 7 Hz modulation frequency, but an attention-dependent phase difference between tasks. Further phase analysis revealed a significant difference 500 ms after sentence onset between trials with correct and incorrect responses for speech recognition, indicating that speech recognition performance and the neural phase are linked via selective attention mechanisms, at least shortly after sentence onset. However, the neural phase effects identified were small and await further investigation. © 2018 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Hybrid simulated annealing and its application to optimization of hidden Markov models for visual speech recognition.

PubMed

Lee, Jong-Seok; Park, Cheol Hoon

2010-08-01

We propose a novel stochastic optimization algorithm, hybrid simulated annealing (SA), to train hidden Markov models (HMMs) for visual speech recognition. In our algorithm, SA is combined with a local optimization operator that substitutes a better solution for the current one to improve the convergence speed and the quality of solutions. We mathematically prove that the sequence of the objective values converges in probability to the global optimum in the algorithm. The algorithm is applied to train HMMs that are used as visual speech recognizers. While the popular training method of HMMs, the expectation-maximization algorithm, achieves only local optima in the parameter space, the proposed method can perform global optimization of the parameters of HMMs and thereby obtain solutions yielding improved recognition performance. The superiority of the proposed algorithm to the conventional ones is demonstrated via isolated word recognition experiments.
Speech-Enabled Interfaces for Travel Information Systems with Large Grammars

NASA Astrophysics Data System (ADS)

Zhao, Baoli; Allen, Tony; Bargiela, Andrzej

This paper introduces three grammar-segmentation methods capable of handling the large grammar issues associated with producing a real-time speech-enabled VXML bus travel application for London. Large grammars tend to produce relatively slow recognition interfaces and this work shows how this limitation can be successfully addressed. Comparative experimental results show that the novel last-word recognition based grammar segmentation method described here achieves an optimal balance between recognition rate, speed of processing and naturalness of interaction.
Memristive Computational Architecture of an Echo State Network for Real-Time Speech Emotion Recognition

DTIC Science & Technology

2015-05-28

recognition is simpler and requires less computational resources compared to other inputs such as facial expressions . The Berlin database of Emotional ...Processing Magazine, IEEE, vol. 18, no. 1, pp. 32– 80, 2001. [15] K. R. Scherer, T. Johnstone, and G. Klasmeyer, “Vocal expression of emotion ...Network for Real-Time Speech- Emotion Recognition 5a. CONTRACT NUMBER IN-HOUSE 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 62788F 6. AUTHOR(S) Q
Inferring Speaker Affect in Spoken Natural Language Communication

ERIC Educational Resources Information Center

Pon-Barry, Heather Roberta

2013-01-01

The field of spoken language processing is concerned with creating computer programs that can understand human speech and produce human-like speech. Regarding the problem of understanding human speech, there is currently growing interest in moving beyond speech recognition (the task of transcribing the words in an audio stream) and towards…
Automated Assessment of Speech Fluency for L2 English Learners

ERIC Educational Resources Information Center

Yoon, Su-Youn

2009-01-01

This dissertation provides an automated scoring method of speech fluency for second language learners of English (L2 learners) based that uses speech recognition technology. Non-standard pronunciation, frequent disfluencies, faulty grammar, and inappropriate lexical choices are crucial characteristics of L2 learners' speech. Due to the ease of…
Use of Computer Speech Technologies To Enhance Learning.

ERIC Educational Resources Information Center

Ferrell, Joe

1999-01-01

Discusses the design of an innovative learning system that uses new technologies for the man-machine interface, incorporating a combination of Automatic Speech Recognition (ASR) and Text To Speech (TTS) synthesis. Highlights include using speech technologies to mimic the attributes of the ideal tutor and design features. (AEF)
Enhancing Speech Intelligibility: Interactions among Context, Modality, Speech Style, and Masker

ERIC Educational Resources Information Center

Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath

2014-01-01

Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…
The Affordance of Speech Recognition Technology for EFL Learning in an Elementary School Setting

ERIC Educational Resources Information Center

Liaw, Meei-Ling

2014-01-01

This study examined the use of speech recognition (SR) technology to support a group of elementary school children's learning of English as a foreign language (EFL). SR technology has been used in various language learning contexts. Its application to EFL teaching and learning is still relatively recent, but a solid understanding of its…
The Effect of Automatic Speech Recognition Eyespeak Software on Iraqi Students' English Pronunciation: A Pilot Study

ERIC Educational Resources Information Center

Sidgi, Lina Fathi Sidig; Shaari, Ahmad Jelani

2017-01-01

The use of technology, such as computer-assisted language learning (CALL), is used in teaching and learning in the foreign language classrooms where it is most needed. One promising emerging technology that supports language learning is automatic speech recognition (ASR). Integrating such technology, especially in the instruction of pronunciation…
Machine Learning Through Signature Trees. Applications to Human Speech.

ERIC Educational Resources Information Center

White, George M.

A signature tree is a binary decision tree used to classify unknown patterns. An attempt was made to develop a computer program for manipulating signature trees as a general research tool for exploring machine learning and pattern recognition. The program was applied to the problem of speech recognition to test its effectiveness for a specific…

Apps, iPads, and Literacy: Examining the Feasibility of Speech Recognition in a First-Grade Classroom

ERIC Educational Resources Information Center

Baker, Elizabeth A.

2017-01-01

Informed by sociocultural and systems theory tenets, this study used ethnographic research methods to examine the feasibility of using speech recognition (SR) technology to support struggling readers in an early elementary classroom setting. Observations of eight first graders were conducted as they participated in a structured SR-supported…
User Experience of a Mobile Speaking Application with Automatic Speech Recognition for EFL Learning

ERIC Educational Resources Information Center

Ahn, Tae youn; Lee, Sangmin-Michelle

2016-01-01

With the spread of mobile devices, mobile phones have enormous potential regarding their pedagogical use in language education. The goal of this study is to analyse user experience of a mobile-based learning system that is enhanced by speech recognition technology for the improvement of EFL (English as a foreign language) learners' speaking…
[The role of temporal fine structure in tone recognition and music perception].

PubMed

Zhou, Q; Gu, X; Liu, B

2017-11-07

The sound signal can be decomposed into temporal envelope and temporal fine structure information. The temporal envelope information is crucial for speech perception in quiet environment, and the temporal fine structure information plays an important role in speech perception in noise, Mandarin tone recognition and music perception, especially the pitch and melody perception.
EduSpeak[R]: A Speech Recognition and Pronunciation Scoring Toolkit for Computer-Aided Language Learning Applications

ERIC Educational Resources Information Center

Franco, Horacio; Bratt, Harry; Rossier, Romain; Rao Gadde, Venkata; Shriberg, Elizabeth; Abrash, Victor; Precoda, Kristin

2010-01-01

SRI International's EduSpeak[R] system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. Automatic pronunciation scoring allows the computer to provide feedback on the overall quality of pronunciation and to point to…
Investigating an Innovative Computer Application to Improve L2 Word Recognition from Speech

ERIC Educational Resources Information Center

Matthews, Joshua; O'Toole, John Mitchell

2015-01-01

The ability to recognise words from the aural modality is a critical aspect of successful second language (L2) listening comprehension. However, little research has been reported on computer-mediated development of L2 word recognition from speech in L2 learning contexts. This report describes the development of an innovative computer application…
Using Automatic Speech Recognition Technology with Elicited Oral Response Testing

ERIC Educational Resources Information Center

Cox, Troy L.; Davies, Randall S.

2012-01-01

This study examined the use of automatic speech recognition (ASR) scored elicited oral response (EOR) tests to assess the speaking ability of English language learners. It also examined the relationship between ASR-scored EOR and other language proficiency measures and the ability of the ASR to rate speakers without bias to gender or native…
Performance Evaluation of Speech Recognition Systems as a Next-Generation Pilot-Vehicle Interface Technology

NASA Technical Reports Server (NTRS)

Arthur, Jarvis J., III; Shelton, Kevin J.; Prinzel, Lawrence J., III; Bailey, Randall E.

2016-01-01

During the flight trials known as Gulfstream-V Synthetic Vision Systems Integrated Technology Evaluation (GV-SITE), a Speech Recognition System (SRS) was used by the evaluation pilots. The SRS system was intended to be an intuitive interface for display control (rather than knobs, buttons, etc.). This paper describes the performance of the current "state of the art" Speech Recognition System (SRS). The commercially available technology was evaluated as an application for possible inclusion in commercial aircraft flight decks as a crew-to-vehicle interface. Specifically, the technology is to be used as an interface from aircrew to the onboard displays, controls, and flight management tasks. A flight test of a SRS as well as a laboratory test was conducted.
Cortical Auditory Evoked Potentials to Evaluate Cochlear Implant Candidacy in an Ear With Long-standing Hearing Loss: A Case Report.

PubMed

Patel, Tirth R; Shahin, Antoine J; Bhat, Jyoti; Welling, D Bradley; Moberly, Aaron C

2016-10-01

We describe a novel use of cortical auditory evoked potentials in the preoperative workup to determine ear candidacy for cochlear implantation. A 71-year-old male was evaluated who had a long-deafened right ear, had never worn a hearing aid in that ear, and relied heavily on use of a left-sided hearing aid. Electroencephalographic testing was performed using free field auditory stimulation of each ear independently with pure tones at 1000 and 2000 Hz at approximately 10 dB above pure-tone thresholds for each frequency and for each ear. Mature cortical potentials were identified through auditory stimulation of the long-deafened ear. The patient underwent successful implantation of that ear. He experienced progressively improving aided pure-tone thresholds and binaural speech recognition benefit (AzBio score of 74%). Findings suggest that use of cortical auditory evoked potentials may serve a preoperative role in ear selection prior to cochlear implantation. © The Author(s) 2016.
The effect of hearing aid technologies on listening in an automobile

PubMed Central

Wu, Yu-Hsiang; Stangl, Elizabeth; Bentler, Ruth A.; Stanziola, Rachel W.

2014-01-01

Background Communication while traveling in an automobile often is very difficult for hearing aid users. This is because the automobile /road noise level is usually high, and listeners/drivers often do not have access to visual cues. Since the talker of interest usually is not located in front of the driver/listener, conventional directional processing that places the directivity beam toward the listener’s front may not be helpful, and in fact, could have a negative impact on speech recognition (when compared to omnidirectional processing). Recently, technologies have become available in commercial hearing aids that are designed to improve speech recognition and/or listening effort in noisy conditions where talkers are located behind or beside the listener. These technologies include (1) a directional microphone system that uses a backward-facing directivity pattern (Back-DIR processing), (2) a technology that transmits audio signals from the ear with the better signal-to-noise ratio (SNR) to the ear with the poorer SNR (Side-Transmission processing), and (3) a signal processing scheme that suppresses the noise at the ear with the poorer SNR (Side-Suppression processing). Purpose The purpose of the current study was to determine the effect of (1) conventional directional microphones and (2) newer signal processing schemes (Back-DIR, Side-Transmission, and Side-Suppression) on listener’s speech recognition performance and preference for communication in a traveling automobile. Research design A single-blinded, repeated-measures design was used. Study Sample Twenty-five adults with bilateral symmetrical sensorineural hearing loss aged 44 through 84 years participated in the study. Data Collection and Analysis The automobile/road noise and sentences of the Connected Speech Test (CST) were recorded through hearing aids in a standard van moving at a speed of 70 miles/hour on a paved highway. The hearing aids were programmed to omnidirectional microphone, conventional adaptive directional microphone, and the three newer schemes. CST sentences were presented from the side and back of the hearing aids, which were placed on the ears of a manikin. The recorded stimuli were presented to listeners via earphones in a sound treated booth to assess speech recognition performance and preference with each programmed condition. Results Compared to omnidirectional microphones, conventional adaptive directional processing had a detrimental effect on speech recognition when speech was presented from the back or side of the listener. Back-DIR and Side-Transmission processing improved speech recognition performance (relative to both omnidirectional and adaptive directional processing) when speech was from the back and side, respectively. The performance with Side-Suppression processing was better than with adaptive directional processing when speech was from the side. The participants’ preferences for a given processing scheme were generally consistent with speech recognition results. Conclusions The finding that performance with adaptive directional processing was poorer than with omnidirectional microphones demonstrates the importance of selecting the correct microphone technology for different listening situations. The results also suggest the feasibility of using hearing aid technologies to provide a better listening experience for hearing aid users in automobiles. PMID:23886425
Identification of Pure-Tone Audiologic Thresholds for Pediatric Cochlear Implant Candidacy: A Systematic Review.

PubMed

de Kleijn, Jasper L; van Kalmthout, Ludwike W M; van der Vossen, Martijn J B; Vonck, Bernard M D; Topsakal, Vedat; Bruijnzeel, Hanneke

2018-05-24

Although current guidelines recommend cochlear implantation only for children with profound hearing impairment (HI) (>90 decibel [dB] hearing level [HL]), studies show that children with severe hearing impairment (>70-90 dB HL) could also benefit from cochlear implantation. To perform a systematic review to identify audiologic thresholds (in dB HL) that could serve as an audiologic candidacy criterion for pediatric cochlear implantation using 4 domains of speech and language development as independent outcome measures (speech production, speech perception, receptive language, and auditory performance). PubMed and Embase databases were searched up to June 28, 2017, to identify studies comparing speech and language development between children who were profoundly deaf using cochlear implants and children with severe hearing loss using hearing aids, because no studies are available directly comparing children with severe HI in both groups. If cochlear implant users with profound HI score better on speech and language tests than those with severe HI who use hearing aids, this outcome could support adjusting cochlear implantation candidacy criteria to lower audiologic thresholds. Literature search, screening, and article selection were performed using a predefined strategy. Article screening was executed independently by 4 authors in 2 pairs; consensus on article inclusion was reached by discussion between these 4 authors. This study is reported according to the Preferred Reporting Items for Systematic Review and Meta-analysis (PRISMA) statement. Title and abstract screening of 2822 articles resulted in selection of 130 articles for full-text review. Twenty-one studies were selected for critical appraisal, resulting in selection of 10 articles for data extraction. Two studies formulated audiologic thresholds (in dB HLs) at which children could qualify for cochlear implantation: (1) at 4-frequency pure-tone average (PTA) thresholds of 80 dB HL or greater based on speech perception and auditory performance subtests and (2) at PTA thresholds of 88 and 96 dB HL based on a speech perception subtest. In 8 of the 18 outcome measures, children with profound HI using cochlear implants performed similarly to children with severe HI using hearing aids. Better performance of cochlear implant users was shown with a picture-naming test and a speech perception in noise test. Owing to large heterogeneity in study population and selected tests, it was not possible to conduct a meta-analysis. Studies indicate that lower audiologic thresholds (≥80 dB HL) than are advised in current national and manufacturer guidelines would be appropriate as audiologic candidacy criteria for pediatric cochlear implantation.
Recognition ROCS Are Curvilinear--Or Are They? On Premature Arguments against the Two-High-Threshold Model of Recognition

ERIC Educational Resources Information Center

Broder, Arndt; Schutz, Julia

2009-01-01

Recent reviews of recognition receiver operating characteristics (ROCs) claim that their curvilinear shape rules out threshold models of recognition. However, the shape of ROCs based on confidence ratings is not diagnostic to refute threshold models, whereas ROCs based on experimental bias manipulations are. Also, fitting predicted frequencies to…
Speech Perception for Adult Cochlear Implant Recipients in a Realistic Background Noise: Effectiveness of Preprocessing Strategies and External Options for Improving Speech Recognition in Noise

PubMed Central

Gifford, René H.; Revit, Lawrence J.

2014-01-01

Background Although cochlear implant patients are achieving increasingly higher levels of performance, speech perception in noise continues to be problematic. The newest generations of implant speech processors are equipped with preprocessing and/or external accessories that are purported to improve listening in noise. Most speech perception measures in the clinical setting, however, do not provide a close approximation to real-world listening environments. Purpose To assess speech perception for adult cochlear implant recipients in the presence of a realistic restaurant simulation generated by an eight-loudspeaker (R-SPACE™) array in order to determine whether commercially available preprocessing strategies and/or external accessories yield improved sentence recognition in noise. Research Design Single-subject, repeated-measures design with two groups of participants: Advanced Bionics and Cochlear Corporation recipients. Study Sample Thirty-four subjects, ranging in age from 18 to 90 yr (mean 54.5 yr), participated in this prospective study. Fourteen subjects were Advanced Bionics recipients, and 20 subjects were Cochlear Corporation recipients. Intervention Speech reception thresholds (SRTs) in semidiffuse restaurant noise originating from an eight-loudspeaker array were assessed with the subjects’ preferred listening programs as well as with the addition of either Beam™ preprocessing (Cochlear Corporation) or the T-Mic® accessory option (Advanced Bionics). Data Collection and Analysis In Experiment 1, adaptive SRTs with the Hearing in Noise Test sentences were obtained for all 34 subjects. For Cochlear Corporation recipients, SRTs were obtained with their preferred everyday listening program as well as with the addition of Focus preprocessing. For Advanced Bionics recipients, SRTs were obtained with the integrated behind-the-ear (BTE) mic as well as with the T-Mic. Statistical analysis using a repeated-measures analysis of variance (ANOVA) evaluated the effects of the preprocessing strategy or external accessory in reducing the SRT in noise. In addition, a standard t-test was run to evaluate effectiveness across manufacturer for improving the SRT in noise. In Experiment 2, 16 of the 20 Cochlear Corporation subjects were reassessed obtaining an SRT in noise using the manufacturer-suggested “Everyday,” “Noise,” and “Focus” preprocessing strategies. A repeated-measures ANOVA was employed to assess the effects of preprocessing. Results The primary findings were (i) both Noise and Focus preprocessing strategies (Cochlear Corporation) significantly improved the SRT in noise as compared to Everyday preprocessing, (ii) the T-Mic accessory option (Advanced Bionics) significantly improved the SRT as compared to the BTE mic, and (iii) Focus preprocessing and the T-Mic resulted in similar degrees of improvement that were not found to be significantly different from one another. Conclusion Options available in current cochlear implant sound processors are able to significantly improve speech understanding in a realistic, semidiffuse noise with both Cochlear Corporation and Advanced Bionics systems. For Cochlear Corporation recipients, Focus preprocessing yields the best speech-recognition performance in a complex listening environment; however, it is recommended that Noise preprocessing be used as the new default for everyday listening environments to avoid the need for switching programs throughout the day. For Advanced Bionics recipients, the T-Mic offers significantly improved performance in noise and is recommended for everyday use in all listening environments. PMID:20807480
Automatic speech recognition using a predictive echo state network classifier.

PubMed

Skowronski, Mark D; Harris, John G

2007-04-01

We have combined an echo state network (ESN) with a competitive state machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that the model was significantly more noise robust compared to a hidden Markov model in noisy speech classification experiments by 8+/-1 dB signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.
A Mis-recognized Medical Vocabulary Correction System for Speech-based Electronic Medical Record

PubMed Central

Seo, Hwa Jeong; Kim, Ju Han; Sakabe, Nagamasa

2002-01-01

Speech recognition as an input tool for electronic medical record (EMR) enables efficient data entry at the point of care. However, the recognition accuracy for medical vocabulary is much poorer than that for doctor-patient dialogue. We developed a mis-recognized medical vocabulary correction system based on syllable-by-syllable comparison of speech text against medical vocabulary database. Using specialty medical vocabulary, the algorithm detects and corrects mis-recognized medical vocabularies in narrative text. Our preliminary evaluation showed 94% of accuracy in mis-recognized medical vocabulary correction.
Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

NASA Astrophysics Data System (ADS)

Malcangi, Mario

2009-08-01

Personal authentication is becoming increasingly important in many applications that have to protect proprietary data. Passwords and personal identification numbers (PINs) prove not to be robust enough to ensure that unauthorized people do not use them. Biometric authentication technology may offer a secure, convenient, accurate solution but sometimes fails due to its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.
Technological evaluation of gesture and speech interfaces for enabling dismounted soldier-robot dialogue

NASA Astrophysics Data System (ADS)

Kattoju, Ravi Kiran; Barber, Daniel J.; Abich, Julian; Harris, Jonathan

2016-05-01

With increasing necessity for intuitive Soldier-robot communication in military operations and advancements in interactive technologies, autonomous robots have transitioned from assistance tools to functional and operational teammates able to service an array of military operations. Despite improvements in gesture and speech recognition technologies, their effectiveness in supporting Soldier-robot communication is still uncertain. The purpose of the present study was to evaluate the performance of gesture and speech interface technologies to facilitate Soldier-robot communication during a spatial-navigation task with an autonomous robot. Gesture and speech semantically based spatial-navigation commands leveraged existing lexicons for visual and verbal communication from the U.S Army field manual for visual signaling and a previously established Squad Level Vocabulary (SLV). Speech commands were recorded by a Lapel microphone and Microsoft Kinect, and classified by commercial off-the-shelf automatic speech recognition (ASR) software. Visual signals were captured and classified using a custom wireless gesture glove and software. Participants in the experiment commanded a robot to complete a simulated ISR mission in a scaled down urban scenario by delivering a sequence of gesture and speech commands, both individually and simultaneously, to the robot. Performance and reliability of gesture and speech hardware interfaces and recognition tools were analyzed and reported. Analysis of experimental results demonstrated the employed gesture technology has significant potential for enabling bidirectional Soldier-robot team dialogue based on the high classification accuracy and minimal training required to perform gesture commands.
Hidden Markov models in automatic speech recognition

NASA Astrophysics Data System (ADS)

Wrzoskowicz, Adam

1993-11-01

This article describes a method for constructing an automatic speech recognition system based on hidden Markov models (HMMs). The author discusses the basic concepts of HMM theory and the application of these models to the analysis and recognition of speech signals. The author provides algorithms which make it possible to train the ASR system and recognize signals on the basis of distinct stochastic models of selected speech sound classes. The author describes the specific components of the system and the procedures used to model and recognize speech. The author discusses problems associated with the choice of optimal signal detection and parameterization characteristics and their effect on the performance of the system. The author presents different options for the choice of speech signal segments and their consequences for the ASR process. The author gives special attention to the use of lexical, syntactic, and semantic information for the purpose of improving the quality and efficiency of the system. The author also describes an ASR system developed by the Speech Acoustics Laboratory of the IBPT PAS. The author discusses the results of experiments on the effect of noise on the performance of the ASR system and describes methods of constructing HMM's designed to operate in a noisy environment. The author also describes a language for human-robot communications which was defined as a complex multilevel network from an HMM model of speech sounds geared towards Polish inflections. The author also added mandatory lexical and syntactic rules to the system for its communications vocabulary.
Speech-on-speech masking with variable access to the linguistic content of the masker speech for native and non-native speakers of English

PubMed Central

Calandruccio, Lauren; Bradlow, Ann R.; Dhar, Sumitrajit

2013-01-01

Background Masking release for an English sentence-recognition task in the presence of foreign-accented English speech compared to native-accented English speech was reported in Calandruccio, Dhar and Bradlow (2010). The masking release appeared to increase as the masker intelligibility decreased. However, it could not be ruled out that spectral differences between the speech maskers were influencing the significant differences observed. Purpose The purpose of the current experiment was to minimize spectral differences between speech maskers to determine how various amounts of linguistic information within competing speech affect masking release. Research Design A mixed model design with within- (four two-talker speech maskers) and between-subject (listener group) factors was conducted. Speech maskers included native-accented English speech, and high-intelligibility, moderate-intelligibility and low-intelligibility Mandarin-accented English. Normalizing the long-term average speech spectra of the maskers to each other minimized spectral differences between the masker conditions. Study Sample Three listener groups were tested including monolingual English speakers with normal hearing, non-native speakers of English with normal hearing, and monolingual speakers of English with hearing loss. The non-native speakers of English were from various native-language backgrounds, not including Mandarin (or any other Chinese dialect). Listeners with hearing loss had symmetrical, mild sloping to moderate sensorineural hearing loss. Data Collection and Analysis Listeners were asked to repeat back sentences that were presented in the presence of four different two-talker speech maskers. Responses were scored based on the keywords within the sentences (100 keywords/masker condition). A mixed-model regression analysis was used to analyze the difference in performance scores between the masker conditions and the listener groups. Results Monolingual speakers of English with normal hearing benefited when the competing speech signal was foreign-accented compared to native-accented allowing for improved speech recognition. Various levels of intelligibility across the foreign-accented speech maskers did not influence results. Neither the non-native English listeners with normal hearing, nor the monolingual English speakers with hearing loss benefited from masking release when the masker was changed from native-accented to foreign-accented English. Conclusions Slight modifications between the target and the masker speech allowed monolingual speakers of English with normal hearing to improve their recognition of native-accented English even when the competing speech was highly intelligible. Further research is needed to determine which modifications within the competing speech signal caused the Mandarin-accented English to be less effective with respect to masking. Determining the influences within the competing speech that make it less effective as a masker, or determining why monolingual normal-hearing listeners can take advantage of these differences could help improve speech recognition for those with hearing loss in the future. PMID:25126683
Fifty years of progress in acoustic phonetics

NASA Astrophysics Data System (ADS)

Stevens, Kenneth N.

2004-10-01

Three events that occurred 50 or 60 years ago shaped the study of acoustic phonetics, and in the following few decades these events influenced research and applications in speech disorders, speech development, speech synthesis, speech recognition, and other subareas in speech communication. These events were: (1) the source-filter theory of speech production (Chiba and Kajiyama; Fant); (2) the development of the sound spectrograph and its interpretation (Potter, Kopp, and Green; Joos); and (3) the birth of research that related distinctive features to acoustic patterns (Jakobson, Fant, and Halle). Following these events there has been systematic exploration of the articulatory, acoustic, and perceptual bases of phonological categories, and some quantification of the sources of variability in the transformation of this phonological representation of speech into its acoustic manifestations. This effort has been enhanced by studies of how children acquire language in spite of this variability and by research on speech disorders. Gaps in our knowledge of this inherent variability in speech have limited the directions of applications such as synthesis and recognition of speech, and have led to the implementation of data-driven techniques rather than theoretical principles. Some examples of advances in our knowledge, and limitations of this knowledge, are reviewed.
Restoring the missing features of the corrupted speech using linear interpolation methods

NASA Astrophysics Data System (ADS)

Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.

2017-10-01

One of the main challenges in the Automatic Speech Recognition (ASR) is the noise. The performance of the ASR system reduces significantly if the speech is corrupted by noise. In spectrogram representation of a speech signal, after deleting low Signal to Noise Ratio (SNR) elements, the incomplete spectrogram is obtained. In this case, the speech recognizer should make modifications to the spectrogram in order to restore the missing elements, which is one direction. In another direction, speech recognizer should be able to restore the missing elements due to deleting low SNR elements before performing the recognition. This is can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by some researchers are implemented as a toolbox. In these geometrical reconstruction methods, the linear interpolation along time or frequency methods are used to predict the missing elements between adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction method. The experiments are done under different conditions such as different lengths of the window and different lengths of utterances. Speech corpus consists of 20 males and 20 females; each one has two different utterances are used in the experiments. As a result, 80% recognition accuracy is achieved with 25% SNR ratio.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.