Speech intelligibility (SI) is important in many fields of research, engineering, and diagnostics, where it is used to quantify very different phenomena: the quality of recordings, communication and playback devices, the reverberation of auditoria, characteristics of hearing impairment, the benefit of hearing aids, or combinations of these.
Christensen, John M.; Dwyer, Patricia E.
Laryngectomized patients using esophageal speech or an electronic artificial larynx have difficulty producing correct voicing contrasts between homorganic consonants. This paper describes a therapy technique that emphasizes "pushing harder" on voiceless consonants to improve alaryngeal speech intelligibility and proposes focusing on the…
Ryherd, Erica E; Moeller, Michael; Hsu, Timothy
Effective communication between staff members is key to patient safety in hospitals. A variety of patient care activities including admittance, evaluation, and treatment rely on oral communication. Surprisingly, published information on speech intelligibility in hospitals is extremely limited. In this study, speech intelligibility measurements and occupant evaluations were conducted in 20 units of five different U.S. hospitals. A variety of unit types and locations were studied. Results show that overall, no unit had "good" intelligibility based on the speech intelligibility index (SII > 0.75) and several locations were found to have "poor" intelligibility (SII < 0.45). Further, occupied spaces were found to have 10%-15% lower SII than unoccupied spaces on average. Additionally, staff perception of communication problems at nurse stations was significantly correlated with SII ratings. In a targeted second phase, a unit treated with sound absorption had higher SII ratings for a larger percentage of time than an identical untreated unit. Taken as a whole, the study provides an extensive baseline evaluation of speech intelligibility across a variety of hospitals and unit types, offers some evidence of the positive impact of absorption on intelligibility, and identifies areas for future research.
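The rating thresholds quoted in the abstract above can be written as a small helper. This is a minimal sketch: only the "good" (SII > 0.75) and "poor" (SII < 0.45) cut-offs come from the abstract. The function name classify_sii and the label "intermediate" for values in between are assumptions made for illustration.

```python
def classify_sii(sii: float) -> str:
    """Map an SII value (0 to 1) to the rating bands quoted in the abstract."""
    if not 0.0 <= sii <= 1.0:
        raise ValueError("SII is defined on the interval [0, 1]")
    if sii > 0.75:
        return "good"
    if sii < 0.45:
        return "poor"
    # The abstract does not name this band; "intermediate" is a placeholder.
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical SII values, not measurements from the study.
    for value in (0.30, 0.60, 0.80):
        print(value, classify_sii(value))
```

Values exactly at 0.45 or 0.75 fall into the unnamed middle band here, because the abstract states both cut-offs as strict inequalities.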
Babel, Molly; Russell, Jamie
Socio-indexical cues and paralinguistic information are often beneficial to speech processing as this information assists listeners in parsing the speech stream. Associations that particular populations speak in a certain speech style can, however, make it such that socio-indexical cues have a cost. In this study, native speakers of Canadian English who identify as Chinese Canadian and White Canadian read sentences that were presented to listeners in noise. Half of the sentences were presented with a visual-prime in the form of a photo of the speaker and half were presented in control trials with fixation crosses. Sentences produced by Chinese Canadians showed an intelligibility cost in the face-prime condition, whereas sentences produced by White Canadians did not. In an accentedness rating task, listeners rated White Canadians as less accented in the face-prime trials, but Chinese Canadians showed no such change in perceived accentedness. These results suggest a misalignment between an expected and an observed speech signal for the face-prime trials, which indicates that social information about a speaker can trigger linguistic associations that come with processing benefits and costs.
Bent, Tessa; Bradlow, Ann R.
This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n=2), Korean (n=2), and English (n=1) were recorded reading simple English sentences. Native listeners of English (n=21), Chinese (n=21), Korean (n=10), and a mixed group from various native language backgrounds (n=12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.
Kates, James M.; Arehart, Kathryn H.
The speech intelligibility index (SII) (ANSI S3.5-1997) provides a means for estimating speech intelligibility under conditions of additive stationary noise or bandwidth reduction. The SII concept for estimating intelligibility is extended in this paper to include broadband peak-clipping and center-clipping distortion, with the coherence between the input and output signals used to estimate the noise and distortion effects. The speech intelligibility predictions using the new procedure are compared with intelligibility scores obtained from normal-hearing and hearing-impaired subjects for conditions of additive noise and peak-clipping and center-clipping distortion. The most effective procedure divides the speech signal into low-, mid-, and high-level regions, computes the coherence SII separately for the signal segments in each region, and then estimates intelligibility from a weighted combination of the three coherence SII values.
Chen, Fei; Loizou, Philipos C.
Objectives: The purpose of this study is to evaluate the performance of a number of speech intelligibility indices in terms of predicting the intelligibility of vocoded speech. Design: Noise-corrupted sentences were vocoded in a total of 80 conditions, involving three different SNR levels (-5, 0 and 5 dB) and two types of maskers (steady-state noise and two-talker). Tone-vocoder simulations were used as well as simulations of combined electric-acoustic stimulation (EAS). The vocoded sentences were presented to normal-hearing listeners for identification, and the resulting intelligibility scores were used to assess the correlation of various speech intelligibility measures. These included measures designed to assess speech intelligibility, including the speech-transmission index (STI) and articulation index (AI) based measures, as well as distortions in hearing aids (e.g., coherence-based measures). These measures employed primarily either the temporal-envelope or the spectral-envelope information in the prediction model. The underlying hypothesis in the present study is that measures that assess temporal envelope distortions, such as those based on the speech-transmission index, should correlate highly with the intelligibility of vocoded speech. This is based on the fact that vocoder simulations preserve primarily envelope information, similar to the processing implemented in current cochlear implant speech processors. Similarly, it is hypothesized that measures such as the coherence-based index that assess the distortions present in the spectral envelope could also be used to model the intelligibility of vocoded speech. Results: Of all the intelligibility measures considered, the coherence-based and the STI-based measures performed the best. High correlations (r=0.9-0.96) were maintained with the coherence-based measures in all noisy conditions. The highest correlation obtained with the STI-based measure was 0.92, and that was obtained when high modulation rates (100
PURPOSE: The purpose of this study was to compare speech intelligibility between men and women, to investigate the validity of objective acoustic parameters related to speech intelligibility, and to establish standard data for future studies in various fields of prosthodontics. MATERIALS AND METHODS: Twenty men and women served as subjects in the present study. After sample sounds were recorded, speech intelligibility tests by three speech pathologists and acoustic analyses were performed. Speech intelligibility test scores and acoustic parameters such as fundamental frequency, fundamental frequency range, formant frequency, formant ranges, vowel working space area, and vowel dispersion were compared between men and women. In addition, the correlations between the speech intelligibility values and acoustic variables were analyzed. RESULTS: Women showed significantly higher speech intelligibility scores than men, and there were significant differences between men and women in most of the acoustic parameters used in the present study. However, the correlations between the speech intelligibility scores and acoustic parameters were low. CONCLUSION: The speech intelligibility test and acoustic parameters used in the present study were effective in differentiating male voices from female voices, and their values might be used in future studies of patients involved with maxillofacial prosthodontics. However, further studies are needed on the correlation between speech intelligibility tests and objective acoustic parameters. PMID:21165272
Bender, Brenda K.; Cannito, Michael P.; Murry, Thomas; Woodson, Gayle E.
This study compared speech intelligibility in nondisabled speakers and speakers with adductor spasmodic dysphonia (ADSD) before and after botulinum toxin (Botox) injection. Standard speech samples were obtained from 10 speakers diagnosed with severe ADSD prior to and 1 month following Botox injection, as well as from 10 age- and gender-matched…
Jørgensen, Søren; Dau, Torsten
Jørgensen and Dau (J Acoust Soc Am 130:1475-1487, 2011) proposed the speech-based envelope power spectrum model (sEPSM) in an attempt to overcome the limitations of the classical speech transmission index (STI) and speech intelligibility index (SII) in conditions with nonlinearly processed speech. Instead of considering the reduction of the temporal modulation energy as the intelligibility metric, as assumed in the STI, the sEPSM applies the signal-to-noise ratio in the envelope domain (SNRenv). This metric was shown to be the key for predicting the intelligibility of reverberant speech as well as noisy speech processed by spectral subtraction. The key role of the SNRenv metric is further supported here by the ability of a short-term version of the sEPSM to predict speech masking release for different speech materials and modulated interferers. However, the sEPSM cannot account for speech subjected to phase jitter, a condition in which the spectral structure of the speech signal is strongly affected, while the broadband temporal envelope is kept largely intact. In contrast, the effects of this distortion can be predicted successfully by the spectro-temporal modulation index (STMI) (Elhilali et al., Speech Commun 41:331-348, 2003), which assumes an explicit analysis of the spectral "ripple" structure of the speech signal. However, since the STMI applies the same decision metric as the STI, it fails to account for spectral subtraction. The results from this study suggest that the SNRenv might reflect a powerful decision metric, while some explicit across-frequency analysis seems crucial in some conditions. How such across-frequency analysis is "realized" in the auditory system remains unresolved.
Fontan, Lionel; Tardieu, Julien; Gaillard, Pascal; Woisard, Virginie; Ruiz, Robert
Purpose: The authors investigated the relationship between the intelligibility and comprehension of speech presented in babble noise. Method: Forty participants listened to French imperative sentences (commands for moving objects) in a multitalker babble background for which intensity was experimentally controlled. Participants were instructed to…
Cruikshank, Matthew E.; Carney, Melinda J.; Cheenne, Dominique J.
Nine small volume classrooms in schools located in the Chicago suburbs were tested to quantify speech intelligibility at various seat locations. Several popular intelligibility metrics were investigated, including Speech Transmission Index (STI), %Alcons, Signal to Noise Ratios (SNR), and 80 ms Useful/Detrimental Ratios (U80). Incorrect STI values were experienced in high noise environments, while the U80s and the SNRs were found to be the most accurate methodologies. Test results are evaluated against the guidelines of ANSI S12.60-2002, and match the data from previous research.
Lam, Jennifer; Tjaden, Kris
Purpose: The authors investigated how clear speech instructions influence sentence intelligibility. Method: Twelve speakers produced sentences in habitual, clear, hearing impaired, and overenunciate conditions. Stimuli were amplitude normalized and mixed with multitalker babble for orthographic transcription by 40 listeners. The main analysis…
Lam, Choi Ling Coriolanus
One of the major acoustical concerns in classrooms is the establishment of effective verbal communication between teachers and students. Non-optimal acoustical conditions, resulting in reduced verbal communication, can cause two main problems. First, they can lead to reduced learning efficiency. Second, they can cause fatigue, stress, vocal strain and health problems, such as headaches and sore throats, among teachers who are forced to compensate for poor acoustical conditions by raising their voices. In addition, inadequate acoustical conditions can prompt the use of public address systems, and improper use of such amplifiers or loudspeakers can impair students' hearing. The social costs of poor classroom acoustics are large because they impair children's learning. This invisible problem has far-reaching implications for learning, but is easily solved. Many studies have been carried out, and their findings on classroom acoustics have been accurately and concisely summarized. Nevertheless, a number of challenging questions remain unanswered. Most objective indices for speech intelligibility are essentially based on studies of Western languages. Although several studies of tonal languages such as Mandarin have been conducted, there is much less work on Cantonese. In this research, measurements were made in unoccupied rooms to investigate the acoustical parameters and characteristics of the classrooms. Speech intelligibility tests based on English, Mandarin and Cantonese, together with a survey, were carried out on students aged from 5 to 22 years. The research aims to investigate the differences in intelligibility between English, Mandarin and Cantonese in classrooms in Hong Kong. The significance of the speech transmission index (STI) in relation to Phonetically Balanced (PB) word scores will be developed further. Together with developed empirical relationship between the speech intelligibility in classrooms with the variations
Wong, Lena L N; Ho, Amy H S; Chua, Elizabeth W W; Soli, Sigfrid D
A Speech Intelligibility Index (SII) for the sentences in the Cantonese version of the Hearing In Noise Test (CHINT) was derived using conventional procedures described previously in studies such as Studebaker and Sherbecoe [J. Speech Hear. Res. 34, 427-438 (1991)]. Two studies were conducted to determine the signal-to-noise ratios and high- and low-pass filtering conditions that should be used and to measure speech intelligibility in these conditions. Normal hearing subjects listened to the sentences presented in speech-spectrum shaped noise. Compared to other English speech assessment materials such as the English Hearing In Noise Test [Nilsson et al., J. Acoust. Soc. Am. 95, 1085-1099 (1994)], the frequency importance function of the CHINT suggests that low-frequency information is more important for Cantonese speech understanding. The difference in frequency importance weight in Chinese, compared to English, was attributed to the redundancy of test material, tonal nature of the Cantonese language, or a combination of these factors.
Englert, Sue Ellen
The present experiment was designed to investigate and understand the causes of failures of the Articulation Index as a predictive tool. An electroacoustic system was used in which: (1) The frequency response was optimally flattened at the listener's ear. (2) An ear-insert earphone was designed to give close electroacoustic control. (3) An infinite-impulse-response digital filter was used to filter the speech signal from a pre-recorded nonsense syllable test. (4) Four formant regions were filtered in fourteen different ways. It was found that the results agreed with past experiments in that: (1) The Articulation Index fails as a predictive tool when using band-pass filters. (2) Low frequencies seem to mask higher frequencies causing a decrease in intelligibility. It was concluded that: (1) It is inappropriate to relate the total fraction of the speech spectrum to a specific intelligibility score since the fraction remaining after filtering may be in the low-, mid-, or high-frequency range. (2) The relationship between intelligibility and the total area under the spectral curve is not monotonic. (3) The fourth formant region (2925Hz to 4200Hz) enhanced intelligibility when included with other formant regions. Methods for relating spectral regions and intelligibility were discussed.
Pearsons, K. S.; Bennett, R. L.
Recordings of the aircraft ambiance from ten different types of aircraft were used in conjunction with four distinct speech interference tests as stimuli to determine the effects of interior aircraft background levels and speech intelligibility on perceived annoyance in 36 subjects. Both speech intelligibility and background level significantly affected judged annoyance. However, the interaction between the two variables showed that above an 85 dB background level the speech intelligibility results had a minimal effect on annoyance ratings. Below this level, people rated the background as less annoying if there was adequate speech intelligibility.
Respirator Speech Intelligibility Testing with an Experienced... Report type: Final. Dates covered: Oct 2008 - Jun 2009. Abstract: The Modified Rhyme Test (MRT) is used by the National Institute for Occupational Safety and Health (NIOSH) to assess speech
Hustad, Katherine C.; Oakes, Ashley; Allison, Kristen
Purpose: We examined variability of speech intelligibility scores and how well intelligibility scores predicted group membership among 5-year-old children with speech motor impairment (SMI) secondary to cerebral palsy and an age-matched group of typically developing (TD) children. Method: Speech samples varying in length from 1-4 words were…
Loizou, Philipos C; Kim, Gibak
Existing speech enhancement algorithms can improve speech quality but not speech intelligibility, and the reasons for that are unclear. In the present paper, we present a theoretical framework that can be used to analyze potential factors that can influence the intelligibility of processed speech. More specifically, this framework focuses on the fine-grain analysis of the distortions introduced by speech enhancement algorithms. It is hypothesized that if these distortions are properly controlled, then large gains in intelligibility can be achieved. To test this hypothesis, intelligibility tests are conducted with human listeners in which we present processed speech with controlled speech distortions. The aim of these tests is to assess the perceptual effect of the various distortions that can be introduced by speech enhancement algorithms on speech intelligibility. Results with three different enhancement algorithms indicated that certain distortions are more detrimental to speech intelligibility degradation than others. When these distortions were properly controlled, however, large gains in intelligibility were obtained by human listeners, even by spectral-subtractive algorithms which are known to degrade speech quality and intelligibility.
Lee, Jimin; Hustad, Katherine C.; Weismer, Gary
Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystems approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and…
Schuster, Maria; Haderlein, Tino; Nöth, Elmar; Lohscheller, Jörg; Eysholdt, Ulrich; Rosanowski, Frank
Substitute speech after laryngectomy is characterized by restricted aero-acoustic properties in comparison with laryngeal speech and therefore has lower intelligibility. Until now, an objective means to determine and quantify the intelligibility has not existed, although the intelligibility can serve as a global outcome parameter of voice restoration after laryngectomy. An automatic speech recognition system was applied to recordings of a standard text read by 18 German male laryngectomees with tracheoesophageal substitute speech. The system was trained with normal laryngeal speakers and not adapted to severely disturbed voices. Substitute speech was compared to laryngeal speech of a control group. Subjective evaluation of intelligibility was performed by a panel of five experts and compared to automatic speech evaluation. Substitute speech showed lower syllables/s and lower word accuracy than laryngeal speech. Automatic speech recognition for substitute speech yielded word accuracy between 10.0 and 50% (28.7 ± 12.1%) with sufficient discrimination. It complied with experts' subjective evaluations of intelligibility. The multi-rater kappa of the experts alone did not differ from the multi-rater kappa of experts and the recognizer. Automatic speech recognition serves as a good means to objectify and quantify global speech outcome of laryngectomees. For clinical use, the speech recognition system will be adapted to disturbed voices and can also be applied in other languages.
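The word accuracy reported above is a standard recognizer score. The sketch below assumes the common definition, 100 * (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit-distance alignment. Whether the study's recognizer scored in exactly this way is an assumption; the function name and the example sentences are hypothetical.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent: 100 * (N - S - D - I) / N."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, computed row by row.
    # prev[j] is the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]  # S + D + I for the best alignment
    return 100.0 * (len(ref) - errors) / len(ref)


if __name__ == "__main__":
    # Made-up sentences, not material from the study.
    print(word_accuracy("the north wind and the sun", "the north wind in sun"))
```

Because insertions count as errors, this score can fall below zero when the hypothesis contains many extra words.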
Park, H K; Bradley, J S; Gover, B N
This paper reports on an evaluation of ratings of the sound insulation of simulated walls in terms of the intelligibility of speech transmitted through the walls. Subjects listened to speech modified to simulate transmission through 20 different walls with a wide range of sound insulation ratings, with constant ambient noise. The subjects' mean speech intelligibility scores were compared with various physical measures to test the success of the measures as sound insulation ratings. The standard Sound Transmission Class (STC) and Weighted Sound Reduction Index ratings were only moderately successful predictors of intelligibility scores, and eliminating the 8 dB rule from STC led to very modest improvements. Various previously established speech intelligibility measures (e.g., Articulation Index or Speech Intelligibility Index) and measures derived from them, such as the Articulation Class, were all relatively strongly related to speech intelligibility scores. In general, measures that involved arithmetic averages or summations of decibel values over frequency bands important for speech were most strongly related to intelligibility scores. The two most accurate predictors of the intelligibility of transmitted speech were an arithmetic average transmission loss over the frequencies from 200 Hz to 2.5 kHz and the addition of a new spectrum weighting term to R(w) that included frequencies from 400 Hz to 2.5 kHz.
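The first of the two predictors named above, an arithmetic average of transmission loss over a speech-relevant frequency range, is simple enough to sketch. The code assumes transmission loss is available per band as a mapping from band centre frequency (Hz) to decibels. The function name, the use of third-octave centre frequencies, and every number in the example are hypothetical, not values from the paper.

```python
def average_transmission_loss(tl_db, f_low=200.0, f_high=2500.0):
    """Arithmetic mean of band transmission loss (dB) for f_low <= f <= f_high.

    tl_db maps band centre frequency in Hz to transmission loss in dB.
    The decibel values are averaged directly, with no energy summation.
    """
    selected = [tl for freq, tl in tl_db.items() if f_low <= freq <= f_high]
    if not selected:
        raise ValueError("no bands fall inside the requested frequency range")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # Made-up third-octave transmission loss values for one wall.
    wall = {
        125: 22.0, 160: 25.0, 200: 28.0, 250: 31.0, 315: 33.0, 400: 36.0,
        500: 38.0, 630: 40.0, 800: 42.0, 1000: 44.0, 1250: 45.0, 1600: 46.0,
        2000: 44.0, 2500: 43.0, 3150: 45.0, 4000: 48.0,
    }
    print(round(average_transmission_loss(wall), 1))
```

Averaging the decibel values directly, rather than converting to energy first, follows the abstract's observation that arithmetic averages of decibel values over speech-relevant bands related most strongly to intelligibility scores.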
Van Nuffelen, Gwen; Middag, Catherine; De Bodt, Marc; Martens, Jean-Pierre
Background: Currently, clinicians mainly rely on perceptual judgements to assess intelligibility of dysarthric speech. Although often highly reliable, this procedure is subjective with a lot of intrinsic variables. Therefore, certain benefits can be expected from a speech technology-based intelligibility assessment. Previous attempts to develop an…
Eisenberg, Laurie S.; Dirks, Donald D.; Takayanagi, Sumiko; Martinez, Amy Schaefer
A study investigated subjective judgments of clarity and intelligibility in 20 listeners in conditions in which speech was equated for predicted intelligibility but varied in bandwidth. Listeners produced clarity and intelligibility ratings for the same speech material and experimental conditions that were highly related but differed in magnitude.…
Walshe, Margaret; Miller, Nick; Leahy, Margaret; Murray, Aisling
Background: Many factors influence listener perception of dysarthric speech. Final consensus on the role of gender and listener experience is still to be reached. The speaker's perception of his/her speech has largely been ignored. Aims: (1) To compare speaker and listener perception of the intelligibility of dysarthric speech; (2) to explore the…
Schubotz, Wiebke; Brand, Thomas; Kollmeier, Birger; Ewert, Stephan D
Speech intelligibility is strongly affected by the presence of maskers. Depending on the spectro-temporal structure of the masker and its similarity to the target speech, different masking aspects can occur which are typically referred to as energetic, amplitude modulation, and informational masking. In this study speech intelligibility and speech detection were measured in maskers that vary systematically in the time-frequency domain from steady-state noise to a single interfering talker. Male and female target speech was used in combination with maskers based on speech for the same or different gender. Observed data were compared to predictions of the speech intelligibility index, extended speech intelligibility index, multi-resolution speech-based envelope-power-spectrum model, and the short-time objective intelligibility measure. The different models served as analysis tools to help distinguish between the different masking aspects. Comparison shows that overall masking can to a large extent be explained by short-term energetic masking. However, the other masking aspects (amplitude modulation and informational masking) influence speech intelligibility as well. Additionally, it was obvious that all models showed considerable deviations from the data. Therefore, the current study provides a benchmark for further evaluation of speech prediction models.
Larm, Petra; Hongisto, Valtteri
During the acoustical design of, e.g., auditoria or open-plan offices, it is important to know how speech can be perceived in various parts of the room. Different objective methods have been developed to measure and predict speech intelligibility, and these have been extensively used in various spaces. In this study, two such methods were compared, the speech transmission index (STI) and the speech intelligibility index (SII). Also the simplification of the STI, the room acoustics speech transmission index (RASTI), was considered. These quantities are all based on determining an apparent speech-to-noise ratio on selected frequency bands and summing them using a specific weighting. For comparison, some data were needed on the possible differences of these methods resulting from the calculation scheme and also measuring equipment. Their prediction accuracy was also of interest. Measurements were made in a laboratory having adjustable noise level and absorption, and in a real auditorium. It was found that the measurement equipment, especially the selection of the loudspeaker, can greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found acceptable, if the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
Kates, James; Arehart, Kathryn
Noise and distortion reduce the sound quality in hearing aids, but there is no established procedure for calculating sound quality in these devices. This presentation introduces a new intelligibility and sound-quality calculation procedure based on the Speech Intelligibility Index [ANSI S3.5-1997]. The SII involves measuring the signal-to-noise ratio (SNR) in separate frequency bands, modifying the estimated noise levels to include auditory masking, and computing a weighted sum across frequency of the modified SNR values. In the new procedure, the estimated signal and noise levels are replaced with estimates based on the coherence between the input and output signals of the system under test. Coherence is unaffected by linear transformations of the input signal, but is reduced by nonlinear effects such as additive noise and distortion; the SII calculation is therefore modified to include nonlinear distortion as well as additive noise. For additive noise, the coherence calculation gives SII scores identical to those computed using the standard procedure. Experiments with normal-hearing listeners using additive noise, peak-clipping distortion, and center-clipping distortion are then used to relate the computed coherence SII scores with the subjects' intelligibility and quality ratings. [Work supported by GN ReSound (JMK) and the Whitaker Foundation (KHA).]
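The weighted-sum structure described above can be illustrated with a heavily simplified sketch. It assumes the basic band-audibility mapping of the SII, in which a band SNR between -15 dB and +15 dB is mapped linearly onto 0 to 1, and it leaves out auditory masking, hearing thresholds, and level distortion. The coherence step assumes the usual relation between magnitude-squared coherence and signal-to-distortion ratio, SNR = coherence / (1 - coherence); whether this matches the presentation's exact procedure is an assumption. The function names and the equal importance weights are placeholders, not the band importance table from the standard.

```python
import math


def band_audibility(snr_db):
    """Map a band SNR in dB linearly onto [0, 1] over the range -15 to +15 dB."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def sii_from_band_snr(snr_db, importance):
    """Importance-weighted sum of band audibilities (simplified SII)."""
    if len(snr_db) != len(importance):
        raise ValueError("need one importance weight per band")
    if abs(sum(importance) - 1.0) > 1e-6:
        raise ValueError("importance weights must sum to 1")
    return sum(w * band_audibility(s) for w, s in zip(importance, snr_db))


def snr_db_from_coherence(msc, eps=1e-12):
    """Band SNR in dB estimated from magnitude-squared coherence (0 to 1).

    The coherent fraction is treated as signal and the incoherent
    fraction as noise plus distortion.
    """
    msc = min(1.0 - eps, max(eps, msc))  # avoid log(0) and division by zero
    return 10.0 * math.log10(msc / (1.0 - msc))


if __name__ == "__main__":
    # Made-up coherence values for four bands with equal placeholder weights.
    coherence = [0.95, 0.80, 0.50, 0.20]
    weights = [0.25, 0.25, 0.25, 0.25]
    band_snr = [snr_db_from_coherence(c) for c in coherence]
    print(round(sii_from_band_snr(band_snr, weights), 3))
```

A coherence of 0.5 corresponds to 0 dB SNR in this sketch, which maps to a band audibility of 0.5.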
High speech intelligibility is imperative in classrooms where verbal communication is critical. However, the optimal acoustical conditions to achieve a high degree of speech intelligibility have previously been investigated with inconsistent results, and practical room-acoustical solutions to optimize the acoustical conditions for speech intelligibility have not been developed. This experimental study validated auralization for speech-intelligibility testing, investigated the optimal reverberation for speech intelligibility for both normal and hearing-impaired listeners using more realistic room-acoustical models, and proposed an optimal sound-control design for speech intelligibility based on the findings. The auralization technique was used to perform subjective speech-intelligibility tests. The validation study, comparing auralization results with those of real classroom speech-intelligibility tests, found that if the room to be auralized is not very absorptive or noisy, speech-intelligibility tests using auralization are valid. The speech-intelligibility tests were done in two different auralized sound fields (approximately diffuse and non-diffuse) using the Modified Rhyme Test and both normal and hearing-impaired listeners. A hybrid room-acoustical prediction program was used throughout the work, and it and a 1/8 scale-model classroom were used to evaluate the effects of ceiling barriers and reflectors. For both subject groups, in approximately diffuse sound fields, when the speech source was closer to the listener than the noise source, the optimal reverberation time was zero. When the noise source was closer to the listener than the speech source, the optimal reverberation time was 0.4 s (with another peak at 0.0 s) with relative output power levels of the speech and noise sources SNS = 5 dB, and 0.8 s with SNS = 0 dB. In non-diffuse sound fields, when the noise source was between the speaker and the listener, the optimal reverberation time was 0.6 s with
Lee, Jimin; Hustad, Katherine C.; Weismer, Gary
Purpose: Speech acoustic characteristics of children with cerebral palsy (CP) were examined with a multiple speech subsystem approach; speech intelligibility was evaluated using a prediction model in which acoustic measures were selected to represent three speech subsystems. Method: Nine acoustic variables reflecting different subsystems, and speech intelligibility, were measured in 22 children with CP. These children included 13 with a clinical diagnosis of dysarthria (SMI), and nine judged to be free of dysarthria (NSMI). Data from children with CP were compared to data from age-matched typically developing children (TD). Results: Multiple acoustic variables reflecting the articulatory subsystem were different in the SMI group, compared to the NSMI and TD groups. A significant speech intelligibility prediction model was obtained with all variables entered into the model (Adjusted R-squared = .801). The articulatory subsystem showed the most substantial independent contribution (58%) to speech intelligibility. Incremental R-squared analyses revealed that any single variable explained less than 9% of speech intelligibility variability. Conclusions: Children in the SMI group have articulatory subsystem problems as indexed by acoustic measures. As in the adult literature, the articulatory subsystem makes the primary contribution to speech intelligibility variance in dysarthria, with minimal or no contribution from other systems. PMID:24824584
Begault, Durand R.; Sunder, Kaushik; Godfroy, Martine; Otto, Peter
A speech intelligibility test conforming to the Modified Rhyme Test of ANSI S3.2 "Method for Measuring the Intelligibility of Speech Over Communication Systems" was conducted using a prototype 12-channel acoustic beamformer system. The target speech material (signal) was identified against speech babble (noise), with calculated signal-noise ratios of 0, 5 and 10 dB. The signal was delivered at a fixed beam orientation of 135 deg (re 90 deg as the frontal direction of the array) and the noise at 135 deg (co-located) and 0 deg (separated). A significant improvement in intelligibility from 57% to 73% was found for spatial separation for the same signal-noise ratio (0 dB). Significant effects for improved intelligibility due to spatial separation were also found for higher signal-noise ratios (5 and 10 dB).
Carroll, John B.; Cramer, H. Leslie
Time-compressed speech is now being used to present recorded lectures to groups at word rates up to two and one-half times that at which they were originally spoken. This process is particularly helpful to the blind. This study investigated the intelligibility of speech processed with seven different discard intervals and at seven rates from two…
Keintz, Connie K.; Bunton, Kate; Hoit, Jeannette D.
Purpose: To examine the influence of visual information on speech intelligibility for a group of speakers with dysarthria associated with Parkinson's disease. Method: Eight speakers with Parkinson's disease and dysarthria were recorded while they read sentences. Speakers performed a concurrent manual task to facilitate typical speech production.…
Chin, Steven B.; Bergeson, Tonya R.; Phan, Jennifer
Objectives: The purpose of the current study was to examine the relation between speech intelligibility and prosody production in children who use cochlear implants. Methods: The Beginner's Intelligibility Test (BIT) and Prosodic Utterance Production (PUP) task were administered to 15 children who use cochlear implants and 10 children with normal…
Kannenberg, Patricia; And Others
The intelligibility of two voice-output communication aids ("Personal Communicator" and "SpeechPAC") was evaluated by presenting synthesized words and sentences to 20 listeners. Analysis of listener transcriptions revealed significantly higher intelligibility scores for the "Personal Communicator" compared to the…
Mayr, S; Burkhardt, K; Schuster, M; Rogler, K; Maier, A; Iro, H
Altered nasality influences speech intelligibility. Automatic speech recognition (ASR) has proved suitable for quantifying speech intelligibility in patients with different degrees of nasal emissions. We investigated the influence of hyponasality on the results of speech recognition before and after nasal surgery using ASR. Speech recordings, nasal peak inspiratory flow and self-perception measurements were carried out in 20 German-speaking patients (8 women, 12 men; aged 38 ± 22 years) who underwent surgery for various nasal and sinus pathologies. The degree of speech intelligibility was quantified as the percentage of correctly recognized words of a standardized word chain by ASR (word recognition rate; WR). WR was measured 1 day before (t1), 1 day after with nasal packings (t2), and 3 months after (t3) surgery; nasal peak flow was measured at t1 and t3. WR was calculated with the program for the automatic evaluation of all kinds of speech disorders (PEAKS). WR as a parameter of speech intelligibility was significantly decreased immediately after surgery (t1 vs. t2, p < 0.01) but increased 3 months after surgery (t2 vs. t3, p < 0.01). WR showed no association with age or gender. There was no significant difference between WR at t1 and t3, despite a post-operative increase in nasal peak inspiratory flow measurements. The results show that ASR is capable of quantifying the influence of hyponasality on speech; nasal obstruction leads to significantly reduced WR, and nasal peak flow cannot replace evaluation of nasality.
McCloy, Daniel R.; Wright, Richard A.; Souza, Pamela E.
This study investigates the relative effects of talker-specific variation and dialect-based variation on speech intelligibility. Listeners from two dialects of American English performed speech-in-noise tasks with sentences spoken by talkers of each dialect. An initial statistical model showed no significant effects for either talker or listener dialect group, and no interaction. However, a mixed-effects regression model including several acoustic measures of the talker’s speech revealed a subtle effect of talker dialect once the various acoustic dimensions were accounted for. Results are discussed in relation to other recent studies of cross-dialect intelligibility. PMID:26529902
Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L.; Berry, James D.; Perry, Bridget; Green, Jordan R.
Purpose: To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Method: Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Results: Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Conclusion: Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred. PMID:27148967
Aubanel, Vincent; Davis, Chris; Kim, Jeesun
A growing body of evidence shows that brain oscillations track speech. This mechanism is thought to maximize processing efficiency by allocating resources to important speech information, effectively parsing speech into units of appropriate granularity for further decoding. However, some aspects of this mechanism remain unclear. First, while periodicity is an intrinsic property of this physiological mechanism, speech is only quasi-periodic, so it is not clear whether periodicity would present an advantage in processing. Second, it is still a matter of debate which aspect of speech triggers or maintains cortical entrainment, from bottom-up cues such as fluctuations of the amplitude envelope of speech to higher level linguistic cues such as syntactic structure. We present data from a behavioral experiment assessing the effect of isochronous retiming of speech on speech perception in noise. Two types of anchor points were defined for retiming speech, namely syllable onsets and amplitude envelope peaks. For each anchor point type, retiming was implemented at two hierarchical levels, a slow time scale around 2.5 Hz and a fast time scale around 4 Hz. Results show that while any temporal distortion resulted in reduced speech intelligibility, isochronous speech anchored to P-centers (approximated by stressed syllable vowel onsets) was significantly more intelligible than a matched anisochronous retiming, suggesting a facilitative role of periodicity defined on linguistically motivated units in processing speech in noise. PMID:27630552
Alvarsson, Jesper J; Nordström, Henrik; Lundén, Peter; Nilsson, Mats E
Studies of effects on speech intelligibility from aircraft noise in outdoor places are currently lacking. To explore these effects, first-order ambisonic recordings of aircraft noise were reproduced outdoors in a pergola. The average background level was 47 dB LAeq. Lists of phonetically balanced words (LASmax,word = 54 dB) were reproduced simultaneously with aircraft passage noise (LASmax,noise = 72-84 dB). Twenty individually tested listeners wrote down each presented word while seated in the pergola. The main results were (i) aircraft noise negatively affects speech intelligibility at sound pressure levels that exceed those of the speech sound (signal-to-noise ratio, S/N < 0), and (ii) the simple A-weighted S/N ratio was nearly as good an indicator of speech intelligibility as were two more advanced models, the Speech Intelligibility Index and Glasberg and Moore's [J. Audio Eng. Soc. 53, 906-918 (2005)] partial loudness model. This suggests that any of these indicators is applicable for predicting effects of aircraft noise on speech intelligibility outdoors.
Bowden, Erica E.; Wang, Lily M.; Palahanska, Milena S.
Currently there are a number of objective evaluation methods used to quantify the speech intelligibility in a built environment, including the Speech Transmission Index (STI), Rapid Speech Transmission Index (RASTI), Articulation Index (AI), and the Percentage Articulation Loss of Consonants (%ALcons). Many of these have been used for years; however, questions remain about their accuracy in predicting the acoustics of a space. Current widely used software programs can quickly evaluate STI, RASTI, and %ALcons from a measured impulse response. This project compares subjective human performance on modified rhyme and phonetically balanced word tests with objective results calculated from impulse response measurements in four different spaces. The results of these tests aid in understanding performance of various methods of speech intelligibility evaluation. [Work supported by the Univ. of Nebraska Center for Building Integration.] For Speech Communication Best Student Paper Award.
Van Engen, Kristin J.; Phelps, Jasmine E. B.; Smiljanic, Rajka; Chandrasekaran, Bharath
Purpose: The authors sought to investigate interactions among intelligibility-enhancing speech cues (i.e., semantic context, clearly produced speech, and visual information) across a range of masking conditions. Method: Sentence recognition in noise was assessed for 29 normal-hearing listeners. Testing included semantically normal and anomalous…
Leclère, Thibaud; Théry, David; Lavandier, Mathieu; Culling, John F
The speech intelligibility index (SII) calculation is based on the assumption that the effective range of signal-to-noise ratio (SNR) regarding speech intelligibility is [-15 dB; +15 dB]. In a specific frequency band, speech intelligibility would remain constant by varying the SNRs above +15 dB or below -15 dB. These assumptions were tested in four experiments measuring speech reception thresholds (SRTs) with a speech target and speech-spectrum noise, while attenuating target or noise above or below 1400 Hz, with different levels of attenuation in order to test different SNRs in the two bands. SRT varied linearly with attenuation at low-attenuation levels and an asymptote was reached for high-attenuation levels. However, this asymptote was reached (intelligibility was not influenced by further attenuation) for different attenuation levels across experiments. The -15-dB SII limit was confirmed for high-pass filtered targets, whereas for low-pass filtered targets, intelligibility was further impaired by decreasing the SNR below -15 dB (until -37 dB) in the high-frequency band. For high-pass and low-pass filtered noises, speech intelligibility kept improving when increasing the SNR in the rejected band beyond +15 dB (up to 43 dB). Before reaching the asymptote, a 10-dB increase of SNR obtained by filtering the noise resulted in a larger decrease of SRT than a corresponding 10-dB decrease of SNR obtained by filtering the target (the slopes SRT/attenuation were different depending on which source was filtered). These results question the use of the SNR range and the importance function adopted by the SII when considering sharply filtered signals.
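The SNR-range assumption examined in the abstract above can be illustrated with a minimal sketch. It assumes the simplest form of the SII band-audibility mapping, in which a band SNR is clipped to [-15 dB, +15 dB] and rescaled linearly to [0, 1]; level distortion, spread of masking and hearing thresholds are ignored. The function names and the two-band, equal-importance example are hypothetical and are not taken from the study.

```python
def band_audibility(snr_db, lower=-15.0, upper=15.0):
    """Map a band SNR (dB) to an audibility value in [0, 1].

    SNRs outside [lower, upper] are clipped, which is the assumption
    the abstract above puts to the test.
    """
    clipped = max(lower, min(upper, snr_db))
    return (clipped - lower) / (upper - lower)


def simple_sii(band_snrs_db, band_importance):
    """Importance-weighted mean of band audibilities (simplified SII).

    band_importance holds hypothetical weights; they are normalised here,
    so they do not need to sum to 1.
    """
    if len(band_snrs_db) != len(band_importance):
        raise ValueError("one importance weight is needed per band")
    total = sum(band_importance)
    weighted = sum(w * band_audibility(s)
                   for s, w in zip(band_snrs_db, band_importance))
    return weighted / total


# Hypothetical two-band split (below / above 1400 Hz) with equal importance.
low_band_snr_db, high_band_snr_db = 15.0, -37.0
index = simple_sii([low_band_snr_db, high_band_snr_db], [0.5, 0.5])
```

Under this mapping, lowering the high-band SNR from -15 dB to -37 dB leaves the index unchanged, whereas the listeners in the study were still affected in that range for low-pass filtered targets.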
It is well known that the presence of visual cues increases the intelligibility of a speech signal (Sumby and Pollack, 1954). Although much is known about segmental differences in visual-only perception, little is known about the contribution of visual cues to auditory-visual perception for individual segments. The purpose of this study was to examine (1) whether segments differ in their visual contribution to speech intelligibility, and (2) whether the contribution of visual cues is always to increase speech intelligibility. One talker produced triples of real words containing 15 different English consonants. Forced-choice word-identification experiments were carried out with these recordings under auditory-visual (AV) and auditory-only (A) conditions with varying S/N ratios, and identification accuracy for the 15 consonants was compared for A versus AV conditions. As expected, there were significant differences in the visual contribution for the different consonants, with visual cues greatly improving speech intelligibility for most segments. Among them, labio-dentals and interdentals show the largest improvement. Although individual perceivers differed in their performance, the results also suggest that for some consonants, the presence of visual cues can reduce intelligibility. In particular, the intelligibility of [r] decreased significantly in the AV condition, being perceived as [w] in most cases.
van Wijngaarden, Sander J.; Steeneken, Herman J. M.; Houtgast, Tammo; Bronkhorst, Adelbert W.
The calibration of the Speech Transmission Index (STI) is based on native speech, presented to native listeners. This means that the STI predicts speech intelligibility under the implicit assumption of fully native communication. In order to assess effects of both non-native production and non-native perception of speech, the intelligibility of short sentences was measured in various non-native scenarios, as a function of speech-to-noise ratio. Since each speech-to-noise ratio is associated with a unique STI value, this establishes the relation between sentence intelligibility and STI. The difference between native and non-native intelligibility as a function of STI was used to calculate a correction function for the STI for each separate non-native scenario. This correction function was applied to the STI ranges corresponding to certain intelligibility categories (bad-excellent). Depending on the proficiency of non-native talkers and listeners, the category boundaries were found to differ from the standard (native) boundaries by STI values up to 0.30 (on the standard 0-1 scale). The corrections needed for non-native listeners are greater than for non-native talkers with a similar level of proficiency. For some categories of non-native communicators, the qualification excellent requires an STI higher than 1.00, and therefore cannot be reached.
Cooke, Martin; Lecumberri, Maria Luisa García
Speech produced in the presence of noise--Lombard speech--is more intelligible in noise than speech produced in quiet, but the origin of this advantage is poorly understood. Some of the benefit appears to arise from auditory factors such as energetic masking release, but a role for linguistic enhancements similar to those exhibited in clear speech is possible. The current study examined the effect of Lombard speech in noise and in quiet for Spanish learners of English. Non-native listeners showed a substantial benefit of Lombard speech in noise, although not quite as large as that displayed by native listeners tested on the same task in an earlier study [Lu and Cooke (2008), J. Acoust. Soc. Am. 124, 3261-3275]. The difference between the two groups is unlikely to be due to energetic masking. However, Lombard speech was less intelligible in quiet for non-native listeners than normal speech. The relatively small difference in Lombard benefit in noise for native and non-native listeners, along with the absence of Lombard benefit in quiet, suggests that any contribution of linguistic enhancements in the Lombard benefit for natives is small.
LaBlance, G R; Rutherford, D R
This study investigated aspects of respiratory function, during quiet breathing and monologue, in six adult dystonic subjects and compared the findings to a control group of four neurologically intact adults. Additionally, breathing dynamics were compared with speech intelligibility. Respiratory inductive plethysmography was used to assess breathing rate, periodicity of the breathing pattern, and inspiratory lung volume. Ear oximetry was used to assess arterial blood oxygen saturation. Speech intelligibility was rated by a panel of five judges. Breathing patterns differed between groups; the dystonic subjects showed a faster breathing rate, less rhythmic breathing pattern, decreased lung volume, and apnealike periods accompanied by a decrease in arterial blood oxygen saturation. These differences were observed during quiet breathing and monologue. Decreased speech intelligibility was strongly related to differences in breathing dynamics.
Goddard, Helen M.
Paddington Station in London, UK is a large rail terminus for long distance electric and diesel powered trains. This magnificent train shed has four arched spans and is one of the remaining structural testaments to the architect Brunel. Given the current British and European legislative requirements for intelligible speech in public buildings, AMS Acoustics were engaged to design an electroacoustic solution. In this paper we outline how the significant problems of lively natural acoustics, high operational noise levels and strict aesthetic constraints were addressed. The resultant design is radical, using the most recent DSP-controlled line array loudspeakers. We detail the acoustic modeling undertaken to predict both even direct sound pressure level coverage and STI, and present the speech intelligibility measured upon handover of the new system. The design has proved to be successful and, given the nature of the space, outstanding speech intelligibility is achieved.
Whitmal, Nathaniel A; DeRoy, Kristina
The Articulation Index (AI) and Speech Intelligibility Index (SII) predict intelligibility scores from measurements of speech and hearing parameters. One component in the prediction is the "importance function," a weighting function that characterizes contributions of particular spectral regions of speech to speech intelligibility. Previous work with SII predictions for hearing-impaired subjects suggests that prediction accuracy might improve if importance functions for individual subjects were available. Unfortunately, previous importance function measurements have required extensive intelligibility testing with groups of subjects, using speech processed by various fixed-bandwidth low-pass and high-pass filters. A more efficient approach appropriate to individual subjects is desired. The purpose of this study was to evaluate the feasibility of measuring importance functions for individual subjects with adaptive-bandwidth filters. In two experiments, ten subjects with normal-hearing listened to vowel-consonant-vowel (VCV) nonsense words processed by low-pass and high-pass filters whose bandwidths were varied adaptively to produce specified performance levels in accordance with the transformed up-down rules of Levitt [(1971). J. Acoust. Soc. Am. 49, 467-477]. Local linear psychometric functions were fit to resulting data and used to generate an importance function for VCV words. Results indicate that the adaptive method is reliable and efficient, and produces importance function data consistent with that of the corresponding AI/SII importance function.
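The transformed up-down rules of Levitt mentioned above can be sketched as follows. This is an illustration of one such rule only, the two-down/one-up rule, which is known to converge on roughly 70.7% correct; the abstract does not say which rules or step sizes were used. The function name, the starting value and the idea of treating the tracked quantity as a generic "level" (for example a filter bandwidth, where a lower value makes the task harder) are assumptions made for this example.

```python
def two_down_one_up(start_level, step, responses):
    """Return the track of levels visited under a 2-down/1-up rule.

    responses is a sequence of booleans (True = correct response).
    Two consecutive correct responses lower the level by one step
    (harder task); any incorrect response raises it by one step.
    """
    level = start_level
    correct_streak = 0
    track = [level]
    for correct in responses:
        if correct:
            correct_streak += 1
            if correct_streak == 2:
                level -= step
                correct_streak = 0
        else:
            level += step
            correct_streak = 0
        track.append(level)
    return track


# Hypothetical run: two correct, one error, two correct.
example_track = two_down_one_up(10, 2, [True, True, False, True, True])
```

In an adaptive-bandwidth experiment, the levels visited near the end of such a track would cluster around the bandwidth that yields the targeted performance level.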
Many children with Down syndrome have difficulty with speech intelligibility. The present study used a parent survey to learn more about a specific factor that affects speech intelligibility, i.e. childhood verbal apraxia. One of the factors that affects speech intelligibility for children with Down syndrome is difficulty with voluntarily…
Silber, Ronnie F.
Two studies examined the modifications that adult speakers make in speech to disadvantaged listeners. Previous research that has focused on speech to deaf individuals and to young children has shown that adults clarify speech when addressing these two populations. Acoustic measurements suggest that the signal undergoes similar changes for both populations. Perceptual tests corroborate these results for the deaf population, but are nonsystematic in developmental studies. The differences in the findings for these populations and the nonsystematic results in the developmental literature may be due to methodological factors. The present experiments addressed these methodological questions. Studies of speech to hearing-impaired listeners have used read nonsense sentences, for which speakers received explicit clarification instructions and feedback, while in the child literature, excerpts of real-time conversations were used. Therefore, linguistic samples were not precisely matched. In this study, experiments used various linguistic materials. Experiment 1 used a children's story; experiment 2, nonsense sentences. Four mothers read both types of material in four ways: (1) in "normal" adult speech, (2) in "babytalk," (3) under the clarification instructions used in the "hearing impaired studies" (instructed clear speech) and (4) in (spontaneous) clear speech without instruction. No extra practice or feedback was given. Sentences were presented to 40 normal-hearing college students with and without simultaneous masking noise. Results were separately tabulated for content and function words, and analyzed using standard statistical tests. The major finding in the study was individual variation in speaker intelligibility. "Real world" speakers vary in their baseline intelligibility. The four speakers also showed unique patterns of intelligibility as a function of each independent variable. Results were as follows. Nonsense sentences were less intelligible than story
Killian, Nathan J.; Watkins, Paul V.; Davidson, Lisa S.; Barbour, Dennis L.
We have previously identified neurons tuned to spectral contrast of wideband sounds in auditory cortex of awake marmoset monkeys. Because additive noise alters the spectral contrast of speech, contrast-tuned neurons, if present in human auditory cortex, may aid in extracting speech from noise. Given that this cortical function may be underdeveloped in individuals with sensorineural hearing loss, incorporating biologically-inspired algorithms into external signal processing devices could provide speech enhancement benefits to cochlear implantees. In this study we first constructed a computational signal processing algorithm to mimic auditory cortex contrast tuning. We then manipulated the shape of contrast channels and evaluated the intelligibility of reconstructed noisy speech using a metric to predict cochlear implant user perception. Candidate speech enhancement strategies were then tested in cochlear implantees with a hearing-in-noise test. Accentuation of intermediate contrast values or all contrast values improved computed intelligibility. Cochlear implant subjects showed significant improvement in noisy speech intelligibility with a contrast shaping procedure. PMID:27555826
Lavandier, Mathieu; Culling, John F
In the presence of competing speech or noise, reverberation degrades speech intelligibility not only by its direct effect on the target but also by affecting the interferer. Two experiments were designed to validate a method for predicting the loss of intelligibility associated with this latter effect. Speech reception thresholds were measured under headphones, using spatially separated target sentences and speech-shaped noise interferers simulated in virtual rooms. To investigate the effect of reverberation on the interferer unambiguously, the target was always anechoic. The interferer was placed in rooms with different sizes and absorptions, and at different distances and azimuths from the listener. The interaural coherence of the interferer did not fully predict the effect of reverberation. The azimuth separation of the sources and the coloration introduced by the room also had to be taken into account. The binaural effects were modeled by computing the binaural masking level differences in the studied configurations, the monaural effects were predicted from the excitation pattern of the noises, and speech intelligibility index weightings were applied to both. These parameters were all calculated from the room impulse responses convolved with noise. A 0.95-0.97 correlation was obtained between the speech reception thresholds and their predicted value.
Peng, Jianxin; Yan, Nanjie; Wang, Dan
The present study investigated Chinese speech intelligibility in 28 classrooms from nine different elementary schools in Guangzhou, China. The subjective Chinese speech intelligibility in the classrooms was evaluated with children in grades 2, 4, and 6 (7 to 12 years old). Acoustical measurements were also performed in these classrooms. Subjective Chinese speech intelligibility scores and objective speech intelligibility parameters, such as speech transmission index (STI), were obtained at each listening position for all tests. The relationship between subjective Chinese speech intelligibility scores and STI was revealed and analyzed. The effects of age on Chinese speech intelligibility scores were compared. Results indicate high correlations between subjective Chinese speech intelligibility scores and STI for grades 2, 4, and 6 children. Chinese speech intelligibility scores increase with increase of age under the same STI condition. The differences in scores among different age groups decrease as STI increases. To achieve 95% Chinese speech intelligibility scores, the STIs required for grades 2, 4, and 6 children are 0.75, 0.69, and 0.63, respectively.
van Wijngaarden, Sander J; Drullman, Rob
Although the speech transmission index (STI) is a well-accepted and standardized method for objective prediction of speech intelligibility in a wide range of environments and applications, it is essentially a monaural model. Advantages of binaural hearing in speech intelligibility are disregarded. In specific conditions, this leads to considerable mismatches between subjective intelligibility and the STI. A binaural version of the STI was developed based on interaural cross correlograms, which shows a considerably improved correspondence with subjective intelligibility in dichotic listening conditions. The new binaural STI is designed to be a relatively simple model, which adds only few parameters to the original standardized STI and changes none of the existing model parameters. For monaural conditions, the outcome is identical to the standardized STI. The new model was validated on a set of 39 dichotic listening conditions, featuring anechoic, classroom, listening room, and strongly echoic environments. For these 39 conditions, speech intelligibility [consonant-vowel-consonant (CVC) word score] and binaural STI were measured. On the basis of these conditions, the relation between binaural STI and CVC word scores closely matches the STI reference curve (standardized relation between STI and CVC word score) for monaural listening. A better-ear STI appears to perform quite well in relation to the binaural STI model; the monaural STI performs poorly in these cases.
Payton, Karen L.; Shrestha, Mona
Several algorithms have been shown to generate a metric corresponding to the Speech Transmission Index (STI) using speech as a probe stimulus [e.g., Goldsworthy and Greenberg, J. Acoust. Soc. Am. 116, 3679–3689 (2004)]. The time-domain approaches work well on long speech segments and have the added potential to be used for short-time analysis. This study investigates the performance of the Envelope Regression (ER) time-domain STI method as a function of window length, in acoustically degraded environments with multiple talkers and speaking styles. The ER method is compared with a short-time Theoretical STI, derived from octave-band signal-to-noise ratios and reverberation times. For windows as short as 0.3 s, the ER method tracks short-time Theoretical STI changes in stationary speech-shaped noise, fluctuating restaurant babble and stationary noise plus reverberation. The metric is also compared to intelligibility scores on conversational speech and speech articulated clearly but at normal speaking rates (Clear/Norm) in stationary noise. Correlation between the metric and intelligibility scores is high and, consistent with the subject scores, the metrics are higher for Clear/Norm speech than for conversational speech and higher for the first word in a sentence than for the last word. PMID:24180791
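A theoretical STI "derived from octave-band signal-to-noise ratios and reverberation times", as described above, can be sketched with the classic modulation-transfer formulation of Houtgast and Steeneken. This is a minimal illustration and not necessarily the computation used in the study: the equal octave-band weights are an assumption (standardised STI uses specific band weights and further corrections), and all function names are hypothetical.

```python
import math

# Modulation frequencies (Hz) conventionally used for STI,
# 0.63 to 12.5 Hz in one-third-octave steps.
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def modulation_index(mod_freq_hz, snr_db, rt_s):
    """Modulation transfer for stationary noise plus exponential reverberation."""
    reverb_term = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * mod_freq_hz * rt_s / 13.8) ** 2)
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term


def transmission_index(m):
    """Convert a modulation index to a transmission index in [0, 1]."""
    if m >= 1.0:
        return 1.0
    if m <= 0.0:
        return 0.0
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    clipped = max(-15.0, min(15.0, apparent_snr_db))
    return (clipped + 15.0) / 30.0


def theoretical_sti(band_snrs_db, band_rts_s, band_weights=None):
    """Weighted mean over octave bands of the mean transmission index."""
    if len(band_snrs_db) != len(band_rts_s):
        raise ValueError("one SNR and one reverberation time per band")
    if band_weights is None:
        # Assumption: equal weights, not the standardised band weights.
        band_weights = [1.0] * len(band_snrs_db)
    total = 0.0
    for snr_db, rt_s, weight in zip(band_snrs_db, band_rts_s, band_weights):
        tis = [transmission_index(modulation_index(f, snr_db, rt_s))
               for f in MOD_FREQS_HZ]
        total += weight * sum(tis) / len(tis)
    return total / sum(band_weights)


# Seven octave bands, 0 dB SNR in each, no reverberation.
sti_noise_only = theoretical_sti([0.0] * 7, [0.0] * 7)
```

A short-time variant would recompute the band SNRs within each analysis window and apply the same mapping window by window.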
Freyman, Richard L.; Griffin, Amanda M.; Oxenham, Andrew J.
This study investigated the role of natural periodic temporal fine structure in helping listeners take advantage of temporal valleys in amplitude-modulated masking noise when listening to speech. Young normal-hearing participants listened to natural, whispered, and/or vocoded nonsense sentences in a variety of masking conditions. Whispering alters normal waveform temporal fine structure dramatically but, unlike vocoding, does not degrade spectral details created by vocal tract resonances. The improvement in intelligibility, or masking release, due to introducing 16-Hz square-wave amplitude modulations in an otherwise steady speech-spectrum noise was reduced substantially with vocoded sentences relative to natural speech, but was not reduced for whispered sentences. In contrast to natural speech, masking release for whispered sentences was observed even at positive signal-to-noise ratios. Whispered speech has a different short-term amplitude distribution relative to natural speech, and this appeared to explain the robust masking release for whispered speech at high signal-to-noise ratios. Recognition of whispered speech was not disproportionately affected by unpredictable modulations created by a speech-envelope modulated noise masker. Overall, the presence or absence of periodic temporal fine structure did not have a major influence on the degree of benefit obtained from imposing temporal fluctuations on a noise masker. PMID:23039445
Naranjo, Michel; Tsirigotis, Georgios
Rapid progress in speech recognition research suggests that vocal control systems will soon be widely deployed in production units. Communication between a human and a machine involves technical devices that emit, or are subjected to, significant noise disturbances. The vocal interface introduces a new problem: controlling a deterministic automaton using uncertain information. The aim is to place the automaton exactly in a final state, commanded by voice, from an unknown initial state. The complete speech processing procedure presented in this paper takes as input the temporal speech signal of a word and produces as output a recognised word labelled with an intelligibility index given by the recognition quality. In the first part, we present the essential psychoacoustic concepts for the automatic calculation of the loudness of a speech signal. The second part presents the architecture of a Time Delay Neural Network, together with the recognition results. In the third part, fuzzy subset theory is used to extract a recognised word and its intelligibility index at the same time. In the fourth part, an Anticipatory System models the control of a Sequential Machine. It comprises a prediction phase and an updating phase, both of which use data coming from the information system. A Bayesian decision strategy is used, and the criterion is a weighted sum of criteria defined from information, minimum path functions and a speech intelligibility measure.
Klein, Edward S.; Flint, Cari B.
PURPOSE: To determine empirically which of three frequently observed rules in children with phonological disorders contributes most to difficulties in speaker intelligibility. METHOD: To evaluate the relative effects on intelligibility of deletion of final consonants (DFC), stopping of fricatives and affricates (SFA), and fronting of velars (FV),…
LaBlance, Gary R.; Rutherford, David R.
This study compared respiratory function during quiet breathing and monologue, in six adult dystonic subjects and a control group of four neurologically intact adults. Dystonic subjects showed a faster breathing rate, less rhythmic breathing pattern, decreased lung volume, and apnea-like periods. Decreased speech intelligibility was related to…
While the auditory system undergoes tremendous maturation during the first few years of life, it has become clear that in complex scenarios, when multiple sounds occur and echoes are present, children's performance is significantly worse than that of adults. The ability of children (3-7 years of age) to understand speech in a simulated multi-talker environment and to benefit from spatial separation of the target and competing sounds was investigated. In these studies, competing sources varied in number, location, and content (speech, modulated or unmodulated speech-shaped noise, and time-reversed speech). The acoustic spaces were also varied in size and amount of reverberation. Finally, children with chronic otitis media who received binaural training were tested pre- and post-training on a subset of conditions. Results indicated the following. (1) Children experienced significantly more masking than adults, even in the simplest conditions tested. (2) When the target and competing sounds were spatially separated, speech intelligibility improved, but the amount varied with age, type of competing sound, and number of competitors. (3) In a large reverberant classroom there was no benefit of spatial separation. (4) Binaural training improved speech intelligibility performance in children with otitis media. Future work includes similar studies in children with unilateral and bilateral cochlear implants. [Work supported by NIDCD, DRF, and NOHR.]
Goehring, Tobias; Bolner, Federico; Monaghan, Jessica J M; van Dijk, Bas; Zarowski, Andrzej; Bleeck, Stefan
Speech understanding in noisy environments is still one of the major challenges for cochlear implant (CI) users in everyday life. We evaluated a speech enhancement algorithm based on neural networks (NNSE) for improving speech intelligibility in noise for CI users. The algorithm decomposes the noisy speech signal into time-frequency units, extracts a set of auditory-inspired features and feeds them to the neural network to produce an estimation of which frequency channels contain more perceptually important information (higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated and retain speech-dominated CI channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 14 CI users using three types of background noise. Two NNSE algorithms were compared: a speaker-dependent algorithm, that was trained on the target speaker used for testing, and a speaker-independent algorithm, that was trained on different speakers. Significant improvements in the intelligibility of speech in stationary and fluctuating noises were found relative to the unprocessed condition for the speaker-dependent algorithm in all noise types and for the speaker-independent algorithm in 2 out of 3 noise types. The NNSE algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of SNRs. The proposed algorithm has the potential to improve the intelligibility of speech in noise for CI users while meeting the requirements of low computational complexity and processing delay for application in CI devices.
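The channel-selection step described above (retain speech-dominated channels, attenuate noise-dominated ones, in the manner of n-of-m coding) can be sketched as follows. This is only an illustration under stated assumptions: the per-channel SNR estimates are taken as given (the neural network that produces them is not modelled), the function name `select_n_of_m` is hypothetical, and discarded channels are set to zero although the abstract only says they are attenuated.

```python
def select_n_of_m(channel_envelopes, estimated_snr_db, n):
    """Keep the n channels with the highest estimated SNR and zero the rest.

    channel_envelopes: envelope value per channel for one analysis frame.
    estimated_snr_db:  hypothetical per-channel SNR estimates (e.g. from a
                       trained network); same length as channel_envelopes.
    n:                 number of channels to retain out of m.
    """
    m = len(channel_envelopes)
    if len(estimated_snr_db) != m:
        raise ValueError("need one SNR estimate per channel")
    if not 0 <= n <= m:
        raise ValueError("n must be between 0 and the number of channels")

    # Rank channel indices from highest to lowest estimated SNR.
    ranked = sorted(range(m), key=lambda i: estimated_snr_db[i], reverse=True)
    keep = set(ranked[:n])

    # Retained channels pass through unchanged; the others are set to zero.
    return [env if i in keep else 0.0
            for i, env in enumerate(channel_envelopes)]
```

A conventional n-of-m strategy would typically rank channels by envelope amplitude instead; swapping the ranking key from `estimated_snr_db[i]` to `channel_envelopes[i]` gives that baseline for comparison.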
Lee, Yune-Sang; Min, Nam Eun; Wingfield, Arthur; Grossman, Murray; Peelle, Jonathan E
The information contained in a sensory signal plays a critical role in determining what neural processes are engaged. Here we used interleaved silent steady-state (ISSS) functional magnetic resonance imaging (fMRI) to explore how human listeners cope with different degrees of acoustic richness during auditory sentence comprehension. Twenty-six healthy young adults underwent scanning while hearing sentences that varied in acoustic richness (high vs. low spectral detail) and syntactic complexity (subject-relative vs. object-relative center-embedded clause structures). We manipulated acoustic richness by presenting the stimuli as unprocessed full-spectrum speech, or noise-vocoded with 24 channels. Importantly, although the vocoded sentences were spectrally impoverished, all sentences were highly intelligible. These manipulations allowed us to test how intelligible speech processing was affected by orthogonal linguistic and acoustic demands. Acoustically rich speech showed stronger activation than acoustically less-detailed speech in a bilateral temporoparietal network with more pronounced activity in the right hemisphere. By contrast, listening to sentences with greater syntactic complexity resulted in increased activation of a left-lateralized network including left posterior lateral temporal cortex, left inferior frontal gyrus, and left dorsolateral prefrontal cortex. Significant interactions between acoustic richness and syntactic complexity occurred in left supramarginal gyrus, right superior temporal gyrus, and right inferior frontal gyrus, indicating that the regions recruited for syntactic challenge differed as a function of acoustic properties of the speech. Our findings suggest that the neural systems involved in speech perception are finely tuned to the type of information available, and that reducing the richness of the acoustic signal dramatically alters the brain's response to spoken language, even when intelligibility is high.
Ma, Jianfen; Loizou, Philipos C.
Most of the existing intelligibility measures do not account for the distortions present in processed speech, such as those introduced by speech-enhancement algorithms. In the present study, we propose three new objective measures that can be used for prediction of intelligibility of processed (e.g., via an enhancement algorithm) speech in noisy conditions. All three measures use a critical-band spectral representation of the clean and noise-suppressed signals and are based on the measurement of the SNR loss incurred in each critical band after the corrupted signal goes through a speech enhancement algorithm. The proposed measures are flexible in that they can provide different weights to the two types of spectral distortions introduced by enhancement algorithms, namely spectral attenuation and spectral amplification distortions. The proposed measures were evaluated with intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech (consonants and sentences) corrupted by four different maskers (car, babble, train and street interferences). Highest correlation (r=−0.85) with sentence recognition scores was obtained using a variant of the SNR loss measure that only included vowel/consonant transitions and weak consonant information. High correlation was maintained for all noise types, with a maximum correlation (r=−0.88) achieved in street noise conditions. PMID:21503274
Rhebergen, Koenraad S; Versfeld, Niek J; Dreschler, Wouter A
The extension to the speech intelligibility index (SII; ANSI S3.5-1997 (1997)) proposed by Rhebergen and Versfeld [Rhebergen, K.S., and Versfeld, N.J. (2005). J. Acoust. Soc. Am. 117(4), 2181-2192] is able to predict for normal-hearing listeners the speech intelligibility in both stationary and fluctuating noise maskers with reasonable accuracy. The extended SII model was validated with speech reception threshold (SRT) data from the literature. However, further validation is required and the present paper describes SRT experiments with nonstationary noise conditions that are critical to the extended model. From these data, it can be concluded that the extended SII model is able to predict the SRTs for the majority of conditions, but that predictions are better when the extended SII model includes a function to account for forward masking.
Lagerberg, Tove B.; Åsberg, Jakob; Hartelius, Lena; Persson, Christina
Background: Intelligibility is a speaker's ability to convey a message to a listener. Including an assessment of intelligibility is essential in both research and clinical work relating to individuals with communication disorders due to speech impairment. Assessment of the intelligibility of spontaneous speech can be used as an overall…
Warren, Richard M; Bashford, James A; Lenz, Peter W
The need for determining the relative intelligibility of passbands spanning the speech spectrum has been addressed by publications of the American National Standards Institute (ANSI). When the Articulation Index (AI) standard (ANSI, S3.5, 1969, R1986) was developed, available filters confounded passband and slope contributions. The AI procedure and its updated successor, the Speech Intelligibility Index (SII) standard (ANSI, S3.5, 1997, R2007), cancel slope contributions by using intelligibility scores for partially masked highpass and lowpass speech to calculate passband importance values; these values can be converted to passband intelligibility predictions using transfer functions. However, by using very high-order digital filtering, it is now possible to eliminate contributions from filter skirts and produce rectangular passbands. Employing the same commercial recording and the same one-octave passbands published in the SII standard (Table B.3), the present study compares Rectangular Passband Intelligibility (RPI) with SII estimates of intelligibility. The directly measured RPI differs from the computational SII predictions. Advantages resulting from direct measurement are discussed.
sciences cognitives, BP 73, 91223 Brétigny sur Orge, France. ABSTRACT: The FELIN project (Foot soldier with Integrated Equipment ... must be made to reach levels allowing for intelligibility in noisy environments (notably for use in armoured vehicles). INTRODUCTION: Project FELIN ... contact with the skin. Mechanical vibrations are transmitted through the skin, towards skull bones. Parts of the vibrations are channeled through
Shah, Amee P.; Vavva, Zoi
This study investigates how the degree of similarity or difference between the language backgrounds of speakers and listeners affects intelligibility judgments of foreign-accented speech (Bent and Bradlow, 2003). It aims to clarify the distinction between matched and mismatched listening conditions in the context of an overarching question: can auditory exposure to a language alone, without corresponding proficiency in producing that language, provide a listening advantage? In particular, do listeners understand accented English spoken by native speakers of the language to which they have been exposed better than listeners without that exposure do? Greek-accented English speakers (and native monolingual English speakers) were judged for their speech intelligibility by four groups of listeners (n=10 each): native Greek speakers (matched), Greek-Americans (matched only through auditory exposure to Greek, without any corresponding spoken proficiency), native monolingual American-English speakers (unmatched), and a mixed group (mismatched). Pilot data show that the intelligibility judgments of Greek-American listeners fall between those of the native Greek listeners on the one hand and those of the American-English and mixed groups on the other. Further data collection is underway, and the results will be presented, as they bear important theoretical and clinical implications.
Najmul Imam, Sheikh Muhammad
A mosque serves a Muslim community through religious activities such as congregational prayers, recitation and theological education. Speech in a mosque is usually delivered by unamplified voice, although sound amplification systems are also used. Since no musical instrument is used in any liturgy, a mosque involves only speech acoustics. The community mosques of Dhaka city, the densely populated capital of Bangladesh, are usually designed and constructed by ordinary people motivated by religious devotion. Acoustical design consultancy is almost never sought. As a consequence, poor speech intelligibility is common across these mosques, except in those spared by their smaller volume or by other parameters that happen to be favourable. In a very few cases, a trial-and-error method is applied to solve the problem. In most cases, however, the problem remains unsolved, leaving worshippers to suffer indefinitely. This paper identifies the type and magnitude of the prevailing speech intelligibility problems in these community mosques through instrumental measurements and a questionnaire survey. It also aims to establish a research rationale and hypotheses for further research, which will propose acoustical design parameters for mosques in Dhaka city in particular and in Bangladesh in general.
Astolfi, Arianna; Bottalico, Pasquale; Barbato, Giulio
This work concerns speech intelligibility tests and measurements in three primary schools in Italy, in one of which the tests were conducted before and after an acoustical treatment. Speech intelligibility scores (IS) with different reverberation times (RT) and types of noise were obtained using diagnostic rhyme tests on 983 pupils from grades 2-5 (nominally 7-10 year olds), and these scores were then correlated with the Speech Transmission Index (STI). The grade 2 pupils understood fewer words in the lower STI range than the pupils in the higher grades, whereas an IS of ~97% was achieved by all the grades at an STI of 0.9. In the presence of traffic noise, which proved to be the most interfering noise, a decrease in RT from 1.6 to 0.4 s produced an IS increase, at equal A-weighted speech-to-noise level difference S/N(A), that varied from 13% to 6% over the S/N(A) range of -15 to +6 dB, respectively. In the case of babble noise, whose source was located in the middle of the classroom, the same decrease in reverberation time led to a negligible variation in IS over a similar S/N(A) range.
Müsch, Hannes; Florentine, Mary
In addition to his work in psychoacoustics, Søren Buus also contributed to the field of speech intelligibility prediction by developing a model that predicts the results of speech recognition tests [H. Müsch and S. Buus, J. Acoust. Soc. Am. 109, 2896-2909 (2001)]. The model was successful in test conditions that are outside the scope of the Articulation Index. It builds on Green and Birdsall's concept of describing a speech recognition task as selecting one of several response alternatives [in D. Green and J. Swets, Signal Detection Theory (1966), pp. 609-619], and on Durlach et al.'s model for discriminating broadband sounds [J. Acoust. Soc. Am. 80, 63-72 (1986)]. Experimental evidence suggests that listeners can extract redundant, independent, or synergistic information from spectrally distinct speech bands. One of the main accomplishments of the model is to reflect this ability. The model also provides for a measure of linguistic entropy to enter the intelligibility prediction. Recent model development has focused on investigating whether this measure, the cognitive noise, can account for the effects of semantic and syntactic context. This presentation will review the model and present new model predictions. [Work supported by NIH grant R01DC00187.]
Kates, James M.; Arehart, Kathryn H.
This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships. PMID:26520329
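The abstract above computes envelope modulation fidelity as the normalized cross-covariance between a degraded signal envelope and a reference envelope. A minimal sketch of that quantity is given below. It assumes the two envelopes are already extracted, time-aligned and equally sampled, and it evaluates only zero lag; the auditory-periphery model, the modulation filtering and the function name are not taken from the paper.

```python
import math


def normalized_cross_covariance(reference_env, degraded_env):
    """Zero-lag normalized cross-covariance of two equal-length envelopes.

    Returns a value in [-1, 1]; 1 means the degraded envelope follows the
    reference exactly up to a positive gain and an offset.
    """
    n = len(reference_env)
    if n != len(degraded_env):
        raise ValueError("envelopes must have the same length")
    if n < 2:
        raise ValueError("need at least two samples")

    mean_ref = sum(reference_env) / n
    mean_deg = sum(degraded_env) / n

    # Sums of mean-removed products and squares (the common 1/n factor cancels).
    cov = sum((r - mean_ref) * (d - mean_deg)
              for r, d in zip(reference_env, degraded_env))
    var_ref = sum((r - mean_ref) ** 2 for r in reference_env)
    var_deg = sum((d - mean_deg) ** 2 for d in degraded_env)

    if var_ref == 0.0 or var_deg == 0.0:
        return 0.0  # a flat envelope carries no modulation to compare
    return cov / math.sqrt(var_ref * var_deg)
```

In a fuller analysis this value would be computed separately per auditory band and per modulation-rate range, which is where the band and rate weightings discussed in the abstract would enter.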
Dimitrijevic, Andrew; Smith, Michael L; Kadis, Darren S; Moore, David R
Understanding speech in noise (SiN) is a complex task involving sensory encoding and cognitive resources including working memory and attention. Previous work has shown that brain oscillations, particularly alpha rhythms (8-12 Hz) play important roles in sensory processes involving working memory and attention. However, no previous study has examined brain oscillations during performance of a continuous speech perception test. The aim of this study was to measure cortical alpha during attentive listening in a commonly used SiN task (digits-in-noise, DiN) to better understand the neural processes associated with "top-down" cognitive processing in adverse listening environments. We recruited 14 normal hearing (NH) young adults. DiN speech reception threshold (SRT) was measured in an initial behavioral experiment. EEG activity was then collected: (i) while performing the DiN near SRT; and (ii) while attending to a silent, close-caption video during presentation of identical digit stimuli that the participant was instructed to ignore. Three main results were obtained: (1) during attentive ("active") listening to the DiN, a number of distinct neural oscillations were observed (mainly alpha with some beta; 15-30 Hz). No oscillations were observed during attention to the video ("passive" listening); (2) overall, alpha event-related synchronization (ERS) of central/parietal sources were observed during active listening when data were grand averaged across all participants. In some participants, a smaller magnitude alpha event-related desynchronization (ERD), originating in temporal regions, was observed; and (3) when individual EEG trials were sorted according to correct and incorrect digit identification, the temporal alpha ERD was consistently greater on correctly identified trials. No such consistency was observed with the central/parietal alpha ERS. These data demonstrate that changes in alpha activity are specific to listening conditions. To our knowledge, this is the
Wan, Rui; Durlach, Nathaniel I; Colburn, H Steven
A short-time-processing version of the Equalization-Cancellation (EC) model of binaural processing is described and applied to speech intelligibility tasks in the presence of multiple maskers, including multiple speech maskers. This short-time EC model, called the STEC model, extends the model described by Wan et al. [J. Acoust. Soc. Am. 128, 3678-3690 (2010)] to allow the EC model's equalization parameters τ and α to be adjusted as a function of time, resulting in improved masker cancellation when the dominant masker location varies in time. Using the Speech Intelligibility Index, the STEC model is applied to speech intelligibility with maskers that vary in number, type, and spatial arrangements. Most notably, when maskers are located on opposite sides of the target, this STEC model predicts improved thresholds when the maskers are modulated independently with speech-envelope modulators; this includes the most relevant case of independent speech maskers. The STEC model describes the spatial dependence of the speech reception threshold with speech maskers better than the steady-state model. Predictions are also improved for independently speech-modulated noise maskers but are poorer for reversed-speech maskers. In general, short-term processing is useful, but much remains to be done in the complex task of understanding speech in speech maskers.
Castellanos, Irina; Kronenberger, William G; Beer, Jessica; Henning, Shirley C; Colson, Bethany G; Pisoni, David B
Speech and language measures during grade school predict adolescent speech-language outcomes in children who receive cochlear implants (CIs), but no research has examined whether speech and language functioning at even younger ages is predictive of long-term outcomes in this population. The purpose of this study was to examine whether early preschool measures of speech and language performance predict speech-language functioning in long-term users of CIs. Early measures of speech intelligibility and receptive vocabulary (obtained during preschool ages of 3-6 years) in a sample of 35 prelingually deaf, early-implanted children predicted speech perception, language, and verbal working memory skills up to 18 years later. Age of onset of deafness and age at implantation added additional variance to preschool speech intelligibility in predicting some long-term outcome scores, but the relationship between preschool speech-language skills and later speech-language outcomes was not significantly attenuated by the addition of these hearing history variables. These findings suggest that speech and language development during the preschool years is predictive of long-term speech and language functioning in early-implanted, prelingually deaf children. As a result, measures of speech-language functioning at preschool ages can be used to identify and adjust interventions for very young CI users who may be at long-term risk for suboptimal speech and language outcomes.
Srinivasan, Nirmal Kumar; Zahorik, Pavel
The temporal envelope and fine structure of speech make distinct contributions to the perception of speech in normal-hearing listeners, and are differentially affected by room reverberation. Previous work has demonstrated enhanced speech intelligibility in reverberant rooms when prior exposure to the room was provided. Here, the relative contributions of envelope and fine structure cues to this intelligibility enhancement were tested using an open-set speech corpus and virtual auditory space techniques to independently manipulate the speech cues within a simulated room. Intelligibility enhancement was observed only when the envelope was reverberant, indicating that the enhancement is envelope-based.
van Wijngaarden, Sander J.; Bronkhorst, Adelbert W.; Houtgast, Tammo; Steeneken, Herman J. M.
While the Speech Transmission Index (STI) is widely applied for prediction of speech intelligibility in room acoustics and telecommunication engineering, it is unclear how to interpret STI values when non-native talkers or listeners are involved. Based on subjectively measured psychometric functions for sentence intelligibility in noise, for populations of native and non-native communicators, a correction function for the interpretation of the STI is derived. This function is applied to determine the appropriate STI ranges with qualification labels ("bad"-"excellent"), for specific populations of non-natives. The correction function is derived by relating the non-native psychometric function to the native psychometric function by a single parameter (ν). For listeners, the ν parameter is found to be highly correlated with linguistic entropy. It is shown that the proposed correction function is also valid for conditions featuring bandwidth limiting and reverberation.
Aniansson, G.; Björkman, M.
Annoyance ratings in speech intelligibility tests at 45 dB(A) and 55 dB(A) traffic noise were investigated in a laboratory study. Subjects were chosen according to their hearing acuity to be representative of 70-year-old men and women, and of noise-induced hearing losses typical for a great number of industrial workers. These groups were compared with normal hearing subjects of the same sex and, when possible, the same age. The subjects rated their annoyance on an open 100 mm scale. Significant correlations were found between annoyance expressed in millimetres and speech intelligibility in percent when all subjects were taken as one sample. Speech intelligibility was also calculated from physical measurements of speech and noise by using the articulation index method. Observed and calculated speech intelligibility scores are compared and discussed. Also treated is the estimation of annoyance by traffic noise at moderate noise levels via speech intelligibility scores.
Kim, Yunjung; Weismer, Gary; Kent, Ray D.
In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that were different than values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]
Warren, Richard M.; Bashford, James A.; Lenz, Peter W.
The intelligibility of individual bands spanning the speech spectrum is of interest for theoretical and practical reasons, and has been the subject of considerable experimental investigation. Direct measurement of passband intelligibility can be confounded with contributions from filter slopes, but by employing sufficiently high orders of FIR filtering, the present study has removed all slope contributions and measured directly intelligibilities of 1-octave and 1/3-octave passbands. Stimuli employed were based upon the same commercial recording of monosyllabic words and the same frequency bands used for the Speech Intelligibility Index (SII) [American National Standards Institute, S3.5, 1997]. SII employs an indirect procedure for estimating intelligibility: lists of band "importance" values are derived from intelligibility scores for high-pass and low-pass speech having incrementally varied cutoff frequencies. These importance values are correlated with intelligibility, and were transformed into intelligibility estimates using the published transfer function. Directly measured intelligibilities differ for some, but not all, SII-based intelligibility estimates for bands heard singly and in combination. Direct determination of intelligibilities of individual and multiple passbands is suggested as a simple and accurate alternative to the methods based upon SII and other indirect procedures for estimating the intelligibility of frequency-limited speech. [Work supported by NIH.]
Mencke, E O; Ochsner, G J; Testut, E W
Speech samples (41 CNC monosyllables) of 22 deaf children were analyzed using two distinctive-feature systems, one acoustic and one physiologic. Moderate to high correlations between intelligibility scores by listener judges vs correct feature usage were obtained for positive as well as negative features of both systems. Further, higher correlations between percent-correct feature usage scores vs listener intelligibility scores were observed for phonemes in the initial vs final position-in-word regardless of listener-judge experience, feature system, or presentation mode. These findings suggest that either acoustic or physiologic feature analysis can be employed in describing the articulation of deaf talkers. In general, either of these feature systems also predicts with fair to good accuracy the intelligibility of deaf speakers as judged by either experienced or inexperienced listeners. In view of the appreciably higher correlations obtained between feature use and intelligibility scores in initial compared to final position-in-word, however, caution should be exercised with either of the feature systems studied in predicting the intelligibility of a deaf speaker's final phoneme.
Souza, Pamela E.; Arehart, Kathryn H.; Shen, Jing; Anderson, Melinda; Kates, James M.
Previous work suggested that individuals with low working memory capacity may be at a disadvantage in adverse listening environments, including situations with background noise or substantial modification of the acoustic signal. This study explored the relationship between patient factors (including working memory capacity) and intelligibility and quality of modified speech for older individuals with sensorineural hearing loss. The modification was created using a combination of hearing aid processing [wide-dynamic range compression (WDRC) and frequency compression (FC)] applied to sentences in multitalker babble. The extent of signal modification was quantified via an envelope fidelity index. We also explored the contribution of components of working memory by including measures of processing speed and executive function. We hypothesized that listeners with low working memory capacity would perform more poorly than those with high working memory capacity across all situations, and would also be differentially affected by high amounts of signal modification. Results showed a significant effect of working memory capacity for speech intelligibility, and an interaction between working memory, amount of hearing loss and signal modification. Signal modification was the major predictor of quality ratings. These data add to the literature on hearing-aid processing and working memory by suggesting that the working memory-intelligibility effects may be related to aggregate signal fidelity, rather than to the specific signal manipulation. They also suggest that for individuals with low working memory capacity, sensorineural loss may be most appropriately addressed with WDRC and/or FC parameters that maintain the fidelity of the signal envelope. PMID:25999874
Humes, Larry E; Kidd, Gary R
The Speech Intelligibility Index (SII) assumes additivity of the importance of acoustically independent bands of speech. To further evaluate this assumption, open-set speech recognition was measured for words and sentences, in quiet and in noise, when the speech stimuli were presented to the listener in selected frequency bands. The filter passbands were constructed from various combinations of 20 bands having equivalent (0.05) importance in the SII framework. This permitted the construction of a variety of equal-SII band patterns that were then evaluated by nine different groups of young adults with normal hearing. For monosyllabic words, a similar dependence on band pattern was observed for SII values of 0.4, 0.5, and 0.6 in both quiet and noise conditions. Specifically, band patterns concentrated toward the lower and upper frequency range tended to yield significantly lower scores than those more evenly sampling a broader frequency range. For all stimuli and test conditions, equal SII values did not yield equal performance. Because the spectral distortions of speech evaluated here may not commonly occur in everyday listening conditions, this finding does not necessarily represent a serious deficit for the application of the SII. These findings, however, challenge the band-independence assumption of the theory underlying the SII.
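The additivity assumption tested in this abstract can be illustrated with a minimal sketch. It assumes the SII reduces to an importance-weighted sum of per-band audibilities, and it omits the masking and level-distortion terms of the full ANSI S3.5 procedure. The function names and the two band patterns are hypothetical and are not taken from the study.

```python
N_BANDS = 20  # the abstract's 20 bands of equal (0.05) importance


def sii(audibility, importance=None):
    """Importance-weighted sum of band audibilities, each in [0, 1].

    Simplified additive core only; equal importance is assumed
    when no importance list is given.
    """
    n = len(audibility)
    if importance is None:
        importance = [1.0 / n] * n
    return sum(i * a for i, a in zip(importance, audibility))


def pattern(passed_bands):
    """Audibility 1.0 for bands that are passed, 0.0 for all others."""
    return [1.0 if b in passed_bands else 0.0 for b in range(N_BANDS)]


# Hypothetical patterns, ten bands each:
edges = pattern(set(range(5)) | set(range(15, 20)))  # lowest 5 + highest 5
spread = pattern(set(range(0, 20, 2)))               # every other band

print(round(sii(edges), 2), round(sii(spread), 2))
```

Under this additive model both patterns receive the same index. The abstract reports that listeners nevertheless scored differently on such equal-SII patterns, which is the basis for its challenge to band independence.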
van Wijngaarden, Sander J.; Steeneken, Herman J. M.; Houtgast, Tammo
The intelligibility of speech pronounced by non-native talkers is generally lower than speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with a reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.
Schepker, Henning; Rennies, Jan; Doclo, Simon
In many speech communication applications, such as public address systems, speech is degraded by additive noise, leading to reduced speech intelligibility. In this paper a pre-processing algorithm is proposed that is capable of increasing speech intelligibility under an equal-power constraint. The proposed AdaptDRC algorithm comprises two time- and frequency-dependent stages, i.e., an amplification stage and a dynamic range compression stage that are both dependent on the Speech Intelligibility Index (SII). Experiments using two objective measures, namely, the extended SII and the short-time objective intelligibility measure (STOI), and a formal listening test were conducted to compare the AdaptDRC algorithm with a modified version of a recently proposed algorithm in three different noise conditions (stationary car noise and speech-shaped noise and non-stationary cafeteria noise). While the objective measures indicate a similar performance for both algorithms, results from the formal listening test indicate that for the two stationary noises both algorithms lead to statistically significant improvements in speech intelligibility and for the non-stationary cafeteria noise only the proposed AdaptDRC algorithm leads to statistically significant improvements. A comparison of both objective measures and results from the listening test shows high correlations, although, in general, the performance of both algorithms is overestimated.
Pennig, Sibylle; Quehl, Julia; Wittkowski, Martin
Acoustic modifications of loudspeaker announcements were investigated in a simulated aircraft cabin to improve passengers' speech intelligibility and quality of communication in this specific setting. Four experiments with 278 participants in total were conducted in an acoustic laboratory using a standardised speech test and subjective rating scales. In experiments 1 and 2 the sound pressure level (SPL) of the announcements was varied (ranging from 70 to 85 dB(A)). Experiments 3 and 4 focused on frequency modification (octave bands) of the announcements. All studies used a background noise with the same SPL (74 dB(A)), but recorded at different seat positions in the aircraft cabin (front, rear). The results quantify speech intelligibility improvements with increasing signal-to-noise ratio and amplification of particular octave bands, especially the 2 kHz and the 4 kHz band. Thus, loudspeaker power in an aircraft cabin can be reduced by using appropriate filter settings in the loudspeaker system.
Haderlein, Tino; Nöth, Elmar; Batliner, Anton; Eysholdt, Ulrich; Rosanowski, Frank
Objective assessment of intelligibility on the telephone is desirable for voice and speech assessment and rehabilitation. A total of 82 patients after partial laryngectomy read a standardized text which was synchronously recorded by a headset and via telephone. Five experienced raters assessed intelligibility perceptually on a five-point scale. Objective evaluation was performed by support vector regression on the word accuracy (WA) and word correctness (WR) of a speech recognition system, and a set of prosodic features. WA and WR alone exhibited correlations to human evaluation between |r| = 0.57 and |r| = 0.75. The correlation was r = 0.79 for headset and r = 0.86 for telephone recordings when prosodic features and WR were combined. The best feature subset was optimal for both signal qualities. It consists of WR, the average duration of the silent pauses before a word, the standard deviation of the fundamental frequency on the entire sample, the standard deviation of jitter, and the ratio of the durations of the voiced sections and the entire recording.
Garadat, Soha N.; Litovsky, Ruth Y.
This study introduces a new test (CRISP-Jr.) for measuring speech intelligibility and spatial release from masking (SRM) in young children ages 2.5–4 years. Study 1 examined whether thresholds, masking, and SRM obtained with a test designed for older children (CRISP) and CRISP-Jr. are comparable in 4 to 5-year-old children. Thresholds were measured for target speech in front, in quiet, and with a different-sex masker either in front or on the right. CRISP-Jr. yielded higher speech reception thresholds (SRTs) than CRISP, but the amount of masking and SRM did not differ across the tests. In study 2, CRISP-Jr. was extended to a group of 3-year-old children. Results showed that while SRTs were higher in the younger group, there were no age differences in masking and SRM. These findings indicate that children as young as 3 years old are able to use spatial cues in sound source segregation, which suggests that some of the auditory mechanisms that mediate this ability develop early in life. In addition, the findings suggest that measures of SRM in young children are not limited to a particular set of stimuli. These tests have potentially useful applications in clinical settings, where bilateral fittings of amplification devices are evaluated. PMID:17348527
Cooke, Martin; Mayo, Catherine; Villegas, Julián
Speech produced in the presence of noise (Lombard speech) is typically more intelligible than speech produced in quiet (plain speech) when presented at the same signal-to-noise ratio, but the factors responsible for the Lombard intelligibility benefit remain poorly understood. Previous studies have demonstrated a clear effect of spectral differences between the two speech styles and a lack of effect of fundamental frequency differences. The current study investigates a possible role for durational differences alongside spectral changes. Listeners identified keywords in sentences manipulated to possess either durational or spectral characteristics of plain or Lombard speech. Durational modifications were produced using linear or nonlinear time warping, while spectral changes were applied at the global utterance level or to individual time frames. Modifications were made to both plain and Lombard speech. No beneficial effects of durational increases were observed in any condition. Lombard sentences spoken at a speech rate substantially slower than their plain counterparts also failed to reveal a durational benefit. Spectral changes to plain speech resulted in large intelligibility gains, although not to the level of Lombard speech. These outcomes suggest that the durational increases seen in Lombard speech have little or no role in the Lombard intelligibility benefit.
Heinrich, Antje; Henshaw, Helen; Ferguson, Melanie A
Listeners vary in their ability to understand speech in noisy environments. Hearing sensitivity, as measured by pure-tone audiometry, can only partly explain these results, and cognition has emerged as another key concept. Although cognition relates to speech perception, the exact nature of the relationship remains to be fully understood. This study investigates how different aspects of cognition, particularly working memory and attention, relate to speech intelligibility for various tests. Perceptual accuracy of speech perception represents just one aspect of functioning in a listening environment. Activity and participation limits imposed by hearing loss, in addition to the demands of a listening environment, are also important and may be better captured by self-report questionnaires. Understanding how speech perception relates to self-reported aspects of listening forms the second focus of the study. Forty-four listeners aged between 50 and 74 years with mild sensorineural hearing loss were tested on speech perception tests differing in complexity from low (phoneme discrimination in quiet), to medium (digit triplet perception in speech-shaped noise) to high (sentence perception in modulated noise); cognitive tests of attention, memory, and non-verbal intelligence quotient; and self-report questionnaires of general health-related and hearing-specific quality of life. Hearing sensitivity and cognition related to intelligibility differently depending on the speech test: neither was important for phoneme discrimination, hearing sensitivity alone was important for digit triplet perception, and hearing and cognition together played a role in sentence perception. Self-reported aspects of auditory functioning were correlated with speech intelligibility to different degrees, with digit triplets in noise showing the richest pattern. The results suggest that intelligibility tests can vary in their auditory and cognitive demands and their sensitivity to the challenges that
Jongmans, P; Hilgers, F J M; Pols, L C W; van As-Brooks, C J
Total laryngectomy has far-reaching effects on vocal tract anatomy and physiology. The preferred method for restoring postlaryngectomy oral communication is prosthetic tracheoesophageal (TE) speech, which like laryngeal speech is pulmonary driven. TE speech quality is better than esophageal or electrolarynx speech quality, but still very deviant from laryngeal speech. For a better understanding of neoglottis physiology and for improving rehabilitation results, study of TE speech intelligibility remains important. Methods used were perceptual evaluation, acoustic analyses, and digital high-speed imaging. First results show large variations between speakers and especially difficulty in producing voiced-voiceless distinction. This paper discusses first results of our experiment.
Carney, Arlene E.; Nie, Yingjiu; Main, Jennifer; Carney, Edward; Higgins, Maureen B.
Studies of clear versus conversational speech have shown perceptual and acoustic differences. The basic premise is that the speaker understands the direction to speak more clearly, and translates the direction to speech motor acts. Children with hearing losses receive intervention during which they are instructed to use their best speech. The purpose of this study was to determine: (1) whether hearing-impaired children's intelligibility changed with directions to use better speech, and (2) whether these children's speech was judged to be clearer when they had intended to produce clear speech. There were two groups of speakers: 14 deaf children with cochlear implants and 7 hard-of-hearing children with hearing aids. Each produced ten short sentences using typical speech, better speech, and best speech. All sentences were presented to a total of 189 adult listeners with normal hearing who wrote down what they heard. Hard-of-hearing children had average speech intelligibility of 98%; those with implants averaged 66%. Both groups had very small increases across conditions. All sentences in three speech conditions were presented in a paired-comparison task to ten additional listeners. Results of clarity judgments will be discussed in relation to the relatively small changes in speech intelligibility. [Research supported by NIH.]
Chen, Fei; Loizou, Philipos C
Most noise reduction algorithms rely on obtaining reliable estimates of the SNR of each frequency bin. For that reason, much work has been done in analyzing the behavior and performance of SNR estimation algorithms in the context of improving speech quality and reducing speech distortions (e.g., musical noise). Comparatively little work has been reported, however, regarding the analysis and investigation of the effect of errors in SNR estimation on speech intelligibility. It is not known, for instance, whether it is the errors in SNR overestimation, errors in SNR underestimation, or both that are harmful to speech intelligibility. Errors in SNR estimation produce concomitant errors in the computation of the gain (suppression) function, and the impact of gain estimation errors on speech intelligibility is unclear. The present study assesses the effect of SNR estimation errors on gain function estimation via sensitivity analysis. Intelligibility listening studies were conducted to validate the sensitivity analysis. Results indicated that speech intelligibility is severely compromised when SNR and gain over-estimation errors are introduced in spectral components with negative SNR. A theoretical upper bound on the gain function is derived that can be used to constrain the values of the gain function so as to ensure that SNR overestimation errors are minimized. Speech enhancement algorithms that can limit the values of the gain function to fall within this upper bound can improve speech intelligibility.
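The link between SNR estimation error and gain error can be sketched under a stated assumption: the abstract does not name a suppression rule, so the Wiener gain G = SNR / (1 + SNR) is used here purely for illustration. The 10 dB estimation error and the function names are also hypothetical.

```python
import math


def wiener_gain(snr_db):
    """Wiener suppression gain for an SNR given in dB (assumed rule)."""
    snr_lin = 10.0 ** (snr_db / 10.0)
    return snr_lin / (1.0 + snr_lin)


def gain_error_db(true_snr_db, estimated_snr_db):
    """Gain applied with the estimated SNR relative to the ideal gain, in dB."""
    ratio = wiener_gain(estimated_snr_db) / wiener_gain(true_snr_db)
    return 20.0 * math.log10(ratio)


# The same +10 dB overestimate at a noise-dominated bin and a speech-dominated bin.
for true_snr in (-10.0, 10.0):
    err = gain_error_db(true_snr, true_snr + 10.0)
    print(f"true SNR {true_snr:+.0f} dB: gain too high by {err:.1f} dB")
```

With this rule, the same overestimate inflates the gain far more in a negative-SNR bin than in a positive-SNR bin, because the Wiener gain saturates near 1 at high SNR. That is consistent in direction with the abstract's finding that overestimation errors in negative-SNR components are the harmful ones. The paper's actual upper bound on the gain is not reproduced here.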
van Wijngaarden, Sander J.; Steeneken, Herman J. M.; Houtgast, Tammo
When listening to languages learned at a later age, speech intelligibility is generally lower than when listening to one's native language. The main purpose of this study is to quantify speech intelligibility in noise for specific populations of non-native listeners, only broadly addressing the underlying perceptual and linguistic processing. An easy method is sought to extend these quantitative findings to other listener populations. Dutch subjects listening to German and English speech, ranging from reasonable to excellent proficiency in these languages, were found to require a 1-7 dB better speech-to-noise ratio to obtain 50% sentence intelligibility than native listeners. Also, the psychometric function for sentence recognition in noise was found to be shallower for non-native than for native listeners (worst-case slope around the 50% point of 7.5%/dB, compared to 12.6%/dB for native listeners). Differences between native and non-native speech intelligibility are largely predicted by linguistic entropy estimates as derived from a letter guessing task. Less effective use of context effects (especially semantic redundancy) explains the reduced speech intelligibility for non-native listeners. While measuring speech intelligibility for many different populations of listeners (languages, linguistic experience) may be prohibitively time consuming, obtaining predictions of non-native intelligibility from linguistic entropy may help to extend the results of this study to other listener populations.
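The reported slopes and threshold shifts can be placed on a common axis with a small sketch. It assumes a logistic psychometric function, which the abstract does not specify, parameterised so that the slope at the 50% point equals the stated value. The 4 dB shift is a hypothetical value chosen from within the reported 1-7 dB range, and the SNR axis is expressed relative to the native speech reception threshold.

```python
import math


def psychometric(snr_db, srt_db, slope_per_db):
    """Logistic psychometric function (assumed shape).

    srt_db is the SNR giving 50% intelligibility; slope_per_db is the
    slope at that point as a proportion per dB (0.126 for 12.6%/dB).
    """
    return 1.0 / (1.0 + math.exp(-4.0 * slope_per_db * (snr_db - srt_db)))


NATIVE_SRT = 0.0          # reference point of the relative SNR axis
NATIVE_SLOPE = 0.126      # 12.6%/dB, from the abstract
NON_NATIVE_SHIFT = 4.0    # hypothetical, within the reported 1-7 dB
NON_NATIVE_SLOPE = 0.075  # 7.5%/dB, the reported worst case

for snr in (-4.0, 0.0, 4.0, 8.0):
    native = psychometric(snr, NATIVE_SRT, NATIVE_SLOPE)
    non_native = psychometric(snr, NATIVE_SRT + NON_NATIVE_SHIFT, NON_NATIVE_SLOPE)
    print(f"{snr:+.0f} dB re native SRT: native {native:.2f}, non-native {non_native:.2f}")
```

The factor of 4 in the exponent follows from the derivative of the logistic function, which equals one quarter of its rate constant at the midpoint.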
The effectiveness of in-vehicle speech communication can be a good indicator of the perception of the overall vehicle quality and customer satisfaction. Currently available speech intelligibility metrics do not account in their procedures for essential parameters needed for a complete and accurate evaluation of in-vehicle speech intelligibility. These include the directivity and the distance of the talker with respect to the listener, binaural listening, hearing profile of the listener, vocal effort, and multisensory hearing. In the first part of this research the effectiveness of in-vehicle application of these metrics is investigated in a series of studies to reveal their shortcomings, including a wide range of scores resulting from each of the metrics for a given measurement configuration and vehicle operating condition. In addition, the nature of a possible correlation between the scores obtained from each metric is unknown. The metrics and the subjective perception of speech intelligibility using, for example, the same speech material have not been compared in the literature. As a result, in the second part of this research, an alternative method for speech intelligibility evaluation is proposed for use in the automotive industry by utilizing a virtual reality driving environment for ultimately setting targets, including the associated statistical variability, for future in-vehicle speech intelligibility evaluation. The Speech Intelligibility Index (SII) was evaluated at the sentence Speech Reception Threshold (sSRT) for various listening situations and hearing profiles using acoustic perception jury testing and a variety of talker and listener configurations and background noise. In addition, the effect of individual sources and transfer paths of sound in an operating vehicle on the vehicle interior sound, specifically their effect on speech intelligibility, was quantified in the framework of the newly developed speech intelligibility evaluation method. Lastly
Begault, Durand R.
The effect on speech intelligibility was measured for speech where talkers reading Diagnostic Rhyme Test material were exposed to 0.7 g whole body vibration to simulate space vehicle launch. Across all talkers, the effect of vibration was to degrade the percentage of correctly transcribed words from 83% to 74%. The magnitude of the effect of vibration on speech communication varies between individuals, for both talkers and listeners. A worst case scenario for intelligibility would be the most sensitive listener hearing the most sensitive talker; one participant's intelligibility was reduced by 26% (97% to 71%) for one of the talkers.
Although RaSTI is a good indicator of the speech intelligibility capability of auditoria and similar spaces, during the past 2-3 years it has been shown that RaSTI is not a robust predictor of sound system intelligibility performance. Instead, it is now recommended, within both national and international codes and standards, that full STI measurement and analysis be employed. However, new research is reported, that indicates that STI is not as flawless, nor robust as many believe. The paper highlights a number of potential error mechanisms. It is shown that the measurement technique and signal excitation stimulus can have a significant effect on the overall result and accuracy, particularly where DSP-based equipment is employed. It is also shown that in its current state of development, STI is not capable of appropriately accounting for a number of fundamental speech and system attributes, including typical sound system frequency response variations and anomalies. This is particularly shown to be the case when a system is operating under reverberant conditions. Comparisons between actual system measurements and corresponding word score data are reported where errors of up to 50 implications for VA and PA system performance verification will be discussed.
Soli, Sigfrid D.; Laroche, Chantal; Giguere, Christian
Many jobs require auditory abilities such as speech communication, sound localization, and sound detection. An employee for whom these abilities are impaired may constitute a safety risk for himself or herself, for fellow workers, and possibly for the general public. A number of methods have been used to predict these abilities from diagnostic measures of hearing (e.g., the pure-tone audiogram); however, these methods have not proved to be sufficiently accurate for predicting performance in the noise environments where hearing-critical jobs are performed. We have taken an alternative and potentially more accurate approach. A direct measure of speech intelligibility in noise, the Hearing in Noise Test (HINT), is instead used to screen individuals. The screening criteria are validated by establishing the empirical relationship between the HINT score and the auditory abilities of the individual, as measured in laboratory recreations of real-world workplace noise environments. The psychometric properties of the HINT enable screening of individuals with an acceptable amount of error. In this presentation, we will describe the predictive model and report the results of field measurements and laboratory studies used to provide empirical validation of the model. [Work supported by Fisheries and Oceans Canada.]
Dubbelboer, Finn; Houtgast, Tammo
A new concept is proposed that relates to intelligibility of speech in noise. The concept combines traditional estimations of signal-to-noise ratios (S/N) with elements from the modulation transfer function model, which results in the definition of the signal-to-noise ratio in the modulation domain: the (S/N)_mod. It is argued that this (S/N)_mod, quantifying the strength of speech modulations relative to a floor of spurious modulations arising from the speech-noise interaction, is the key factor in relation to speech intelligibility. It is shown that, by using a specific test signal, the strength of these spurious modulations can be measured, allowing an estimation of the (S/N)_mod for various conditions of additive noise, noise suppression, and amplitude compression. By relating these results to intelligibility data for these same conditions, the relevance of the (S/N)_mod as the key factor underlying speech intelligibility is clearly illustrated. For instance, it is shown that the commonly observed limited effect of noise suppression on speech intelligibility is correctly "predicted" by the (S/N)_mod, whereas traditional measures such as the speech transmission index, considering only the changes in the speech modulations, fall short in this respect. It is argued that (S/N)_mod may provide a relevant tool in the design of successful noise-suppression systems.
Hosoi, H; Murata, K; Ohta, F; Imaizumi, S
The relationship between speaking rate and speech intelligibility was investigated in normal-hearing and hearing-impaired subjects. It is commonly observed that slowly and clearly delivered speech is easier for hearing-impaired patients to understand. The purpose of this paper is to analyze this phenomenon clinically and to present data useful for developing new hearing aids. Lists of four- or five-syllable words were prepared for this experiment, and speech stimuli were chosen from these lists. The subjects consisted of 15 normal-hearing subjects and 79 hearing-impaired patients (57 with inner ear hearing loss and 22 with retrocochlear hearing loss). Hearing tests were performed in a soundproof room using a tape recorder with various speech control systems. Speech samples were presented at three speaking rates: a conversational speech rate, a rate one and a half times as fast as the conversational speech rate, and a rate twice as fast as that rate. The results obtained in the normal-hearing subjects confirmed that the faster the speaking rate, the lower the word intelligibility. In the hearing-impaired subjects, both the correlation coefficient and the regression parameter between word intelligibility in this experiment and speech discrimination scores measured with 57S-monosyllabic lists were low at the conversational speaking rate, but the higher the speaking rate, the closer the relation between the two measures. Separate analysis of the data for inner ear hearing loss and retrocochlear hearing loss indicated that the subjects with retrocochlear hearing loss had more difficulty in speech-mediated communication than the subjects with inner ear hearing loss.
Peng, Shu-Chen; Spencer, Linda J.; Tomblin, J. Bruce
Speech intelligibility of 24 prelingually deaf pediatric cochlear implant (CI) recipients with 84 months of device experience was investigated. Each CI participant's speech samples were judged by a panel of 3 listeners. Intelligibility scores were calculated as the average of the 3 listeners' responses. The average write-down intelligibility score was 71.54% (SD = 29.89), and the average rating-scale intelligibility score was 3.03 points (SD = 1.01). Write-down and rating-scale intelligibility scores were highly correlated (r = .91, p < .001). Linear regression analyses revealed that both age at implantation and different speech-coding strategies contribute to the variability of CI participants' speech intelligibility. Implantation at a younger age and the use of the spectral-peak speech-coding strategy yielded higher intelligibility scores than implantation at an older age and the use of the multipeak speech-coding strategy. These results serve as indices for clinical applications when long-term advancements in spoken-language development are considered for pediatric CI recipients. PMID:15842006
Wan, Rui; Durlach, Nathaniel I; Colburn, H Steven
An extended version of the equalization-cancellation (EC) model of binaural processing is described and applied to speech intelligibility tasks in the presence of multiple maskers. The model incorporates time-varying jitters, both in time and amplitude, and implements the equalization and cancellation operations in each frequency band independently. The model is consistent with the original EC model in predicting tone-detection performance for a large set of configurations. When the model is applied to speech, the speech intelligibility index is used to predict speech intelligibility performance in a variety of conditions. Specific conditions addressed include different types of maskers, different numbers of maskers, and different spatial locations of maskers. Model predictions are compared with empirical measurements reported by Hawley et al. [J. Acoust. Soc. Am. 115, 833-843 (2004)] and by Marrone et al. [J. Acoust. Soc. Am. 124, 1146-1158 (2008)]. The model succeeds in predicting speech intelligibility performance when maskers are speech-shaped noise or broadband-modulated speech-shaped noise but fails when the maskers are speech or reversed speech.
Koning, Raphael; Wouters, Jan
Recent studies have shown that transient parts of a speech signal contribute most to speech intelligibility in normal-hearing listeners. In this study, the influence of enhancing the onsets of the envelope of the speech signal on speech intelligibility in noisy conditions using an eight channel cochlear implant vocoder simulation was investigated. The enhanced envelope (EE) strategy emphasizes the onsets of the speech envelope by deriving an additional peak signal at the onsets in each frequency band. A sentence recognition task in stationary speech shaped noise showed a significant speech reception threshold (SRT) improvement of 2.5 dB for the EE in comparison to the reference continuous interleaved sampling strategy and of 1.7 dB when an ideal Wiener filter was used for the onset extraction on the noisy signal. In a competitive talker condition, a significant SRT improvement of 2.6 dB was measured. A benefit was obtained in all experiments with the peak signal derived from the clean speech. Although the EE strategy is not effective in many real-life situations, the results suggest that there is potential for speech intelligibility improvement when an enhancement of the onsets of the speech envelope is included in the signal processing of auditory prostheses.
Kyong, Jeong S; Scott, Sophie K; Rosen, Stuart; Howe, Timothy B; Agnew, Zarinah K; McGettigan, Carolyn
The melodic contour of speech forms an important perceptual aspect of tonal and nontonal languages and an important limiting factor on the intelligibility of speech heard through a cochlear implant. Previous work exploring the neural correlates of speech comprehension identified a left-dominant pathway in the temporal lobes supporting the extraction of an intelligible linguistic message, whereas the right anterior temporal lobe showed an overall preference for signals clearly conveying dynamic pitch information [Johnsrude, I. S., Penhune, V. B., & Zatorre, R. J. Functional specificity in the right human auditory cortex for perceiving pitch direction. Brain, 123, 155-163, 2000; Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000]. The current study combined modulations of overall intelligibility (through vocoding and spectral inversion) with a manipulation of pitch contour (normal vs. falling) to investigate the processing of spoken sentences in functional MRI. Our overall findings replicate and extend those of Scott et al. [Scott, S. K., Blank, C. C., Rosen, S., & Wise, R. J. Identification of a pathway for intelligible speech in the left temporal lobe. Brain, 123, 2400-2406, 2000], where greater sentence intelligibility was predominately associated with increased activity in the left STS, and the greatest response to normal sentence melody was found in right superior temporal gyrus. These data suggest a spatial distinction between brain areas associated with intelligibility and those involved in the processing of dynamic pitch information in speech. By including a set of complexity-matched unintelligible conditions created by spectral inversion, this is additionally the first study reporting a fully factorial exploration of spectrotemporal complexity and spectral inversion as they relate to the neural processing of speech intelligibility. Perhaps
Dubbelboer, Finn; Houtgast, Tammo
A wavelet representation of speech was used to display the instantaneous amplitude and phase within 1/4-octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, which can be understood as a combination of three simultaneously occurring subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to speech, which allowed each effect to be either included or excluded. The impact of each effect (and of combinations) on speech intelligibility was measured with consonant-vowel-consonant (CVC) words. It appeared that the systematic level effect (i.e., the increase of each speech wavelet intensity with the mean noise intensity) has the most degrading effect on speech intelligibility, which is in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute seriously to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means to improve speech intelligibility. Results can provide clues for effective noise suppression with respect to intelligibility.
Bridger, Joseph F.
The impact of room acoustics and speech intelligibility conditions of different school cafeterias on the voice levels of children is examined. Methods of evaluating cafeteria designs and predicting noise levels are discussed. Children are shown to modify their voice levels with changes in speech intelligibility, as adults do. Reverberation and signal-to-noise ratio are the important acoustical factors affecting speech intelligibility. Children have much more difficulty than adults in conditions where noise and reverberation are present. To evaluate the relationship between voice level and speech intelligibility, a database of real sound levels and room acoustics data was generated from measurements and data recorded during visits to a variety of existing cafeterias under different occupancy conditions. The effects of speech intelligibility and room acoustics on children's voice levels are demonstrated. A new method is presented for predicting speech intelligibility conditions and resulting noise levels for the design of new cafeterias and renovation of existing facilities. Measurements are provided for an existing school cafeteria before and after new room acoustics treatments were added. This will be helpful for acousticians, architects, school systems, regulatory agencies, and Parent Teacher Associations to create less noisy cafeteria environments.
Byrne, David C; Palmer, Catherine V
The purpose of this study was to identify any differences between speech intelligibility measures obtained with MineEars electronic earmuffs (ProEars, Westcliffe, CO, USA) and the Bilsom model 847 (Sperian Hearing Protection, San Diego, CA, USA), which is a conventional passive-attenuation earmuff. These two devices are closely related, since the MineEars device consisted of a Bilsom 847 earmuff with the addition of electronic amplification circuits. Intelligibility scores were obtained by conducting listening tests with 15 normal-hearing human subject volunteers wearing the earmuffs. The primary research objective was to determine whether speech understanding differs between the passive earmuffs and the electronic earmuffs (with the volume control set at three different positions) in a background of 90 dB(A) continuous noise. As expected, results showed that speech intelligibility increased with higher speech-to-noise ratios; however, the electronic earmuff with the volume control set at full-on performed worse than when it was set to off or the lowest on setting. This finding suggests that the maximum volume control setting for these electronic earmuffs may not provide any benefits in terms of increased speech intelligibility in the background noise condition that was tested. Other volume control settings would need to be evaluated for their ability to produce higher speech intelligibility scores. Additionally, since an extensive electro-acoustic evaluation of the electronic earmuff was not performed as a part of this study, the exact cause of the reduced intelligibility scores at full volume remains unknown.
Sussman, Joan E.; Tjaden, Kris
Purpose: The primary purpose of this study was to compare percent correct word and sentence intelligibility scores for individuals with multiple sclerosis (MS) and Parkinson's disease (PD) with scaled estimates of speech severity obtained for a reading passage. Method: Speech samples for 78 talkers were judged, including 30 speakers with MS, 16…
Ferguson, Sarah Hargus
Purpose: To establish the range of talker variability for vowel intelligibility in clear versus conversational speech for older adults with hearing loss and to determine whether talkers who produced a clear speech benefit for young listeners with normal hearing also did so for older adults with hearing loss. Method: Clear and conversational vowels…
Haley, Katarina L.; Martin, Gwenyth
This study was designed to estimate test-retest reliability of orthographic speech intelligibility testing in speakers with aphasia and AOS and to examine its relationship to the consistency of speaker and listener responses. Monosyllabic single word speech samples were recorded from 13 speakers with coexisting aphasia and AOS. These words were…
Neel, Amy T.
Purpose: In the two experiments in this study, the author examined the effects of increased vocal effort (loud speech) and amplification on sentence and word intelligibility in speakers with Parkinson disease (PD). Methods: Five talkers with PD produced sentences and words at habitual levels of effort and using loud speech techniques. Amplified…
Kishida, Takuya; Nakajima, Yoshitaka; Ueda, Kazuo; Remijn, Gerard B
Factor analysis (principal component analysis followed by varimax rotation) had shown that 3 common factors appear across 20 critical-band power fluctuations derived from spoken sentences of eight different languages [Ueda et al. (2010). Fechner Day 2010, Padua]. The present study investigated the contributions of such power-fluctuation factors to speech intelligibility. The method of factor analysis was modified to obtain factors suitable for resynthesizing speech sounds as 20-critical-band noise-vocoded speech. The resynthesized speech sounds were used for an intelligibility test. The modification of factor analysis ensured that the resynthesized speech sounds were not accompanied by a steady background noise caused by the data reduction procedure. Spoken sentences of British English, Japanese, and Mandarin Chinese were subjected to this modified analysis. Confirming the earlier analysis, indeed 3-4 factors were common to these languages. The number of power-fluctuation factors needed to make noise-vocoded speech intelligible was then examined. Critical-band power fluctuations of the Japanese spoken sentences were resynthesized from the obtained factors, resulting in noise-vocoded-speech stimuli, and the intelligibility of these speech stimuli was tested by 12 native Japanese speakers. Japanese mora (syllable-like phonological unit) identification performances were measured when the number of factors was 1-9. Statistically significant improvement in intelligibility was observed when the number of factors was increased stepwise up to 6. The 12 listeners identified 92.1% of the morae correctly on average in the 6-factor condition. The intelligibility improved sharply when the number of factors changed from 2 to 3. In this step, the cumulative contribution ratio of factors improved only by 10.6%, from 37.3 to 47.9%, but the average mora identification leaped from 6.9 to 69.2%. The results indicated that, if the number of factors is 3 or more, elementary linguistic
Ertmer, David J.
Background: Newborn hearing screening, early intervention programs, and advancements in cochlear implant and hearing aid technology have greatly increased opportunities for children with hearing loss to become intelligible talkers. Optimizing speech intelligibility requires that progress be monitored closely. Although direct assessment of…
Munro, Murray J.; Derwing, Tracey M.
Examines the interrelationships among accentedness, perceived comprehensibility, and intelligibility in the speech of second-language (L2) learners. The findings suggest that although strength of foreign accent is correlated with perceived comprehensibility and intelligibility, a strong foreign accent does not necessarily reduce the…
Ertmer, David J.
Purpose: This investigation sought to determine whether scores from a commonly used word-based articulation test are closely associated with speech intelligibility in children with hearing loss. If the scores are closely related, articulation testing results might be used to estimate intelligibility. If not, the importance of direct assessment of…
Chen, Fei; Hazrati, Oldooz; Loizou, Philipos C.
Reverberation is known to reduce the temporal envelope modulations present in the signal and affect the shape of the modulation spectrum. A non-intrusive intelligibility measure for reverberant speech is proposed, motivated by the fact that the area of the modulation spectrum decreases with increasing reverberation. The proposed measure is based on the average modulation area computed across four acoustic frequency bands spanning the signal bandwidth. High correlations (r = 0.98) were observed with sentence intelligibility scores obtained by cochlear implant listeners. The proposed measure outperformed other measures, including an intrusive measure based on the speech-transmission index. PMID:23710246
Background/Aims: This study investigates the effects of familiarization on naïve listeners' ability to identify consonants in dysarthric speech. Methods: A total of 120 listeners (30 listeners/speaker) participated in experiments over a 6-week period. Listeners were randomly assigned to one of the three familiarization conditions: a passive condition in which listeners heard audio recordings of words, an active condition in which listeners heard audio recordings of words while viewing the written material of words, and a control condition in which listeners had no exposure to the audio signal prior to identification tasks. Results: Familiarization improved naïve listeners' ability to identify consonants produced by a speaker with dysarthria. The active familiarization method exhibited an advantage over the other conditions, in terms of the magnitude and rapidness of improvement. One-month delayed test scores were higher than pre-familiarization scores, but the advantage of active familiarization was not present for all speakers. Conclusion: This study supports familiarization benefits in enhancing consonant intelligibility in dysarthria and suggests that perceptual learning mechanisms be harnessed for developing effective listener-oriented intervention techniques in the management of dysarthria. Current findings call for further research on a familiarization protocol that can subserve segmental learning with maximum efficacy. PMID:26906426
Chuang, Hsiu-Feng; Yang, Cheng-Chieh; Chi, Lin-Yang; Weismer, Gary; Wang, Yu-Tsai
The effects of the use of cochlear implant (CI) on speech intelligibility, speaking rate, and vowel formant characteristics and the relationships between speech intelligibility, speaking rate, and vowel formant characteristics for children are clinically important. The purposes of this study were to report on the comparisons for speaking rate and vowel space area, and their relationship with speech intelligibility, between 24 Mandarin-speaking children with CI and 24 age-sex-education level matched normal hearing (NH) controls. Participants were audio recorded as they read a designed Mandarin intelligibility test, repeated prolongation of each of the three point vowels /i/, /a/, and /u/ five times, and repeated each of three sentences carrying one point vowel five times. Compared to the NH group, the CI group exhibited: (1) mild-to-moderate speech intelligibility impairment; (2) significantly reduced speaking rate mainly due to significantly longer inter-word pauses and larger pause proportion; and (3) significantly less vowel reduction in the horizontal dimension in sustained vowel phonation. The limitations of speech intelligibility development in children after cochlear implantation were related to atypical patterns and to a smaller degree in vowel reduction and slower speaking rate resulting from less efficient articulatory movement transition.
Warren, Richard M.; Bashford, James A.; Lenz, Peter W.
There is a need, both for speech theory and for many practical applications, to know the intelligibilities of individual passbands that span the speech spectrum when they are heard singly and in combination. While indirect procedures have been employed for estimating passband intelligibilities (e.g., the Speech Intelligibility Index), direct measurements have been blocked by the confounding contributions from transition band slopes that accompany filtering. A recent study has reported that slopes of several thousand dBA/octave produced by high-order finite impulse response filtering were required to produce the effectively rectangular bands necessary to eliminate appreciable contributions from transition bands [Warren et al., J. Acoust. Soc. Am. 115, 1292-1295 (2004)]. Using such essentially vertical slopes, the present study employed sentences, and reports the intelligibilities of their six 1-octave contiguous passbands having center frequencies from 0.25 to 8 kHz when heard alone, and for each of their 15 possible pairings.
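The band layout in this abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' procedure: it assumes the six center frequencies are spaced exactly one octave apart and that each band's edges sit half an octave either side of its center (the name `octave_edges` is hypothetical).

```python
from itertools import combinations
import math

# Six contiguous 1-octave bands with center frequencies from 0.25 to 8 kHz.
centers_khz = [0.25 * 2 ** k for k in range(6)]

def octave_edges(center_khz):
    """Assumed band edges: half an octave below and above the center."""
    return center_khz / math.sqrt(2), center_khz * math.sqrt(2)

bands = [octave_edges(fc) for fc in centers_khz]

# Every unordered pair of distinct bands: "6 choose 2" pairings.
pairings = list(combinations(centers_khz, 2))

print(centers_khz)    # [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
print(len(pairings))  # 15
```

Under these assumptions the upper edge of each band coincides with the lower edge of the next, which is what "contiguous" requires, and the count of pairings matches the 15 reported.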
Rhebergen, Koenraad S; Lyzenga, Johannes; Dreschler, Wouter A; Festen, Joost M
The speech intelligibility index (SII) is an often used calculation method for estimating the proportion of audible speech in noise. For speech reception thresholds (SRTs), measured in normally hearing listeners using various types of stationary noise, this model predicts a fairly constant speech proportion of about 0.33, necessary for Dutch sentence intelligibility. However, when the SII model is applied for SRTs in quiet, the estimated speech proportions are often higher, and show a larger inter-subject variability, than found for speech in noise near normal speech levels [65 dB sound pressure level (SPL)]. The present model attempts to alleviate this problem by including cochlear compression. It is based on a loudness model for normally hearing and hearing-impaired listeners of Moore and Glasberg [(2004). Hear. Res. 188, 70-88]. It estimates internal excitation levels for speech and noise and then calculates the proportion of speech above noise and threshold using similar spectral weighting as used in the SII. The present model and the standard SII were used to predict SII values in quiet and in stationary noise for normally hearing and hearing-impaired listeners. The present model predicted SIIs for three listener types (normal hearing, noise-induced, and age-induced hearing loss) with markedly less variability than the standard SII.
Hayes-Harb, Rachel; Smith, Bruce L.; Bent, Tessa; Bradlow, Ann R.
This study investigated the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. The word-final voicing contrast was considered (as in minimal pairs such as `cub' and `cup') in a forced-choice word identification task. For these particular talkers and listeners, there was evidence of an interlanguage speech intelligibility benefit for listeners (i.e., native Mandarin listeners were more accurate than native English listeners at identifying Mandarin-accented English words). However, there was no evidence of an interlanguage speech intelligibility benefit for talkers (i.e., native Mandarin listeners did not find Mandarin-accented English speech more intelligible than native English speech). When listener and talker phonological proficiency (operationalized as accentedness) was taken into account, it was found that the interlanguage speech intelligibility benefit for listeners held only for the low phonological proficiency listeners and low phonological proficiency speech. The intelligibility data were also considered in relation to various temporal-acoustic properties of native English and Mandarin-accented English speech in effort to better understand the properties of speech that may contribute to the interlanguage speech intelligibility benefit. PMID:19606271
Recent hypotheses on the potential role of neuronal oscillations in speech perception propose that speech is processed on multi-scale temporal analysis windows formed by a cascade of neuronal oscillators locked to the input pseudo-rhythm. In particular, Ghitza (2011) proposed that the oscillators are in the theta, beta, and gamma frequency bands with the theta oscillator the master, tracking the input syllabic rhythm and setting a time-varying, hierarchical window structure synchronized with the input. In the study described here the hypothesized role of theta was examined by measuring the intelligibility of speech with a manipulated modulation spectrum. Each critical-band signal was manipulated by controlling the degree of temporal envelope flatness. Intelligibility of speech with critical-band envelopes that are flat is poor; inserting extra information, restricted to the input syllabic rhythm, markedly improves intelligibility. It is concluded that flattening the critical-band envelopes prevents the theta oscillator from tracking the input rhythm, hence the disruption of the hierarchical window structure that controls the decoding process. Reinstating the input-rhythm information revives the tracking capability, hence restoring the synchronization between the window structure and the input, resulting in the extraction of additional information from the flat modulation spectrum. PMID:22811672
Shafiro, Valeriy; Sheft, Stanley; Risley, Robert; Gygi, Brian
How age and hearing loss affect the perception of interrupted speech may vary based on both the physical properties of preserved or obliterated speech fragments and individual listener characteristics. To investigate perceptual processes and interruption parameters influencing intelligibility across interruption rates, participants of different age and hearing status heard sentences interrupted by silence at either a single primary rate (0.5-8 Hz; 25%, 50%, 75% duty cycle) or at an additional concurrent secondary rate (24 Hz; 50% duty cycle). Although age and hearing loss significantly affected intelligibility, the ability to integrate sub-phonemic speech fragments produced by the fast secondary rate was similar in all listener groups. Age and hearing loss interacted with rate with smallest group differences observed at the lowest and highest interruption rates of 0.5 and 24 Hz. Furthermore, intelligibility of dual-rate gated sentences was higher than single-rate gated sentences with the same proportion of retained speech. Correlations of intelligibility of interrupted speech to pure-tone thresholds, age, or measures of working memory and auditory spectro-temporal pattern discrimination were generally low-to-moderate and mostly nonsignificant. These findings demonstrate rate-dependent effects of age and hearing loss on the perception of interrupted speech, suggesting complex interactions of perceptual processes across different time scales.
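The interruption scheme described here can be illustrated with a simple square-wave gate. The sketch below is an assumption-laden illustration rather than the study's stimulus code: it assumes abrupt on/off gating, assumes the secondary gate is applied only within the segments the primary gate leaves intact, and uses a hypothetical sampling rate chosen so that both gate periods are whole numbers of samples.

```python
def gate_is_open(n, fs_hz, rate_hz, duty):
    """True if sample n falls in the 'on' part of a square-wave gate."""
    period = round(fs_hz / rate_hz)  # gate period in samples
    return (n % period) < duty * period

def interrupt(samples, fs_hz, primary_rate_hz, primary_duty,
              secondary_rate_hz=None, secondary_duty=0.5):
    """Replace gated-off samples with silence (0.0)."""
    out = []
    for n, s in enumerate(samples):
        keep = gate_is_open(n, fs_hz, primary_rate_hz, primary_duty)
        if keep and secondary_rate_hz is not None:
            # Assumed combination rule: both gates must be open.
            keep = gate_is_open(n, fs_hz, secondary_rate_hz, secondary_duty)
        out.append(s if keep else 0.0)
    return out

def retained_proportion(gated):
    """Share of samples that survived gating (input must be nonzero everywhere)."""
    return sum(1 for s in gated if s != 0.0) / len(gated)

FS_HZ = 4800                    # hypothetical sampling rate
signal = [1.0] * (2 * FS_HZ)    # 2 s stand-in for a sentence

single_25 = interrupt(signal, FS_HZ, 2.0, 0.25)
dual_50_50 = interrupt(signal, FS_HZ, 2.0, 0.50, secondary_rate_hz=24.0)
print(retained_proportion(single_25), retained_proportion(dual_50_50))  # 0.25 0.25
```

The two conditions in the usage lines retain the same proportion of the signal but distribute it differently in time, which is the kind of matched comparison the abstract refers to when it contrasts dual-rate and single-rate gating.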
Jokinen, Emma; Yrttiaho, Santeri; Pulakka, Hannu; Vainio, Martti; Alku, Paavo
Post-filtering can be utilized to improve the quality and intelligibility of telephone speech. Previous studies have shown that energy reallocation with a high-pass type filter works effectively in improving the intelligibility of speech in difficult noise conditions. The present study introduces a signal-to-noise ratio adaptive post-filtering method that utilizes energy reallocation to transfer energy from the first formant to higher frequencies. The proposed method adapts to the level of the background noise so that, in favorable noise conditions, the post-filter has a flat frequency response and the effect of the post-filtering is increased as the level of the ambient noise increases. The performance of the proposed method is compared with a similar post-filtering algorithm and unprocessed speech in subjective listening tests which evaluate both intelligibility and listener preference. The results indicate that both of the post-filtering methods maintain the quality of speech in negligible noise conditions and are able to provide intelligibility improvement over unprocessed speech in adverse noise conditions. Furthermore, the proposed post-filtering algorithm performs better than the other post-filtering method under evaluation in moderate to difficult noise conditions, where intelligibility improvement is mostly required.
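The general idea of a noise-adaptive post-filter can be sketched as follows. This is not the filter proposed in the study: it swaps in a first-order high-pass tilt as a stand-in for formant-based energy reallocation, and the SNR breakpoints, maximum tilt, and function names are all hypothetical. It only illustrates the two properties the abstract states, namely a flat response in favorable conditions and a stronger effect as noise increases, with total energy held constant so that energy is moved rather than added.

```python
import math

def tilt_strength(snr_db, snr_clean_db=20.0, snr_noisy_db=0.0, max_strength=0.9):
    """Map SNR to a high-pass tilt coefficient: 0 (flat) when clean, larger when noisy."""
    if snr_db >= snr_clean_db:
        return 0.0
    if snr_db <= snr_noisy_db:
        return max_strength
    fraction = (snr_clean_db - snr_db) / (snr_clean_db - snr_noisy_db)
    return max_strength * fraction

def adaptive_postfilter(samples, snr_db):
    """First-order tilt y[n] = x[n] - a*x[n-1], rescaled to the input energy."""
    a = tilt_strength(snr_db)
    filtered = []
    previous = 0.0
    for s in samples:
        filtered.append(s - a * previous)
        previous = s
    energy_in = sum(s * s for s in samples)
    energy_out = sum(s * s for s in filtered)
    if energy_out == 0.0:
        return filtered
    gain = math.sqrt(energy_in / energy_out)  # keep total energy unchanged
    return [gain * s for s in filtered]

# Hypothetical test signal: a low and a high component at 8 kHz sampling.
fs_hz = 8000
x = [math.sin(2 * math.pi * 300 * n / fs_hz) + 0.3 * math.sin(2 * math.pi * 3000 * n / fs_hz)
     for n in range(800)]
y_quiet = adaptive_postfilter(x, snr_db=30.0)   # identical to x: flat response
y_noisy = adaptive_postfilter(x, snr_db=0.0)    # low frequencies attenuated relative to high
```

A linear ramp between the two breakpoints is the simplest possible adaptation rule; a practical system would presumably smooth the SNR estimate over time to avoid audible changes in timbre.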
Mikamo, S; Kodama, N; Pan, Q; Maeda, N; Minagi, S
Velopharyngeal incompetence is known as a contributing factor to speech disorders. Suwaki et al. reported that a nasal speaking valve (NSV) could improve dysarthria by regulating nasal emission using a one-way valve. However, the diseases or conditions that would respond to treatment with an NSV have not yet been clarified. This study aimed to evaluate the effect of the NSV through a questionnaire survey using a ready-made NSV. Subjects were recruited through an internet bulletin, and an NSV survey set was sent to each applicant. Sixty-six participants, who agreed to participate in this study, used the NSV and mailed back the questionnaire, which included self-evaluation and third-party evaluation of speech intelligibility. Statistical analysis revealed that the use of the NSV resulted in significant speech intelligibility improvement in both self-evaluation and third-party evaluation (P < 0.01). Regarding the type of disease underlying the dysarthria, a significant effect of the NSV on self-evaluation of speech intelligibility was observed in cerebrovascular disease and neurodegenerative disease (P < 0.01), and on third-party evaluation in neurodegenerative disease (P < 0.01). Eighty-six percent of subjects showed improvement of speech intelligibility when the nostrils were closed with the fingers, and a significant effect of the NSV on both self-evaluation and third-party evaluation of speech intelligibility was observed (P < 0.001). The results suggest that the NSV would be effective in cerebrovascular disease and neurodegenerative disease, as well as in subjects whose speech intelligibility improved when the nostrils were closed.
Rennies, Jan; Brand, Thomas; Kollmeier, Birger
Reverberation usually degrades speech intelligibility for spatially separated speech and noise sources since spatial unmasking is reduced and late reflections decrease the fidelity of the received speech signal. The latter effect could not satisfactorily be predicted by a recently presented binaural speech intelligibility model [Beutelmann et al. (2010). J. Acoust. Soc. Am. 127, 2479-2497]. This study therefore evaluated three extensions of the model to improve its predictions: (1) an extension of the speech intelligibility index based on modulation transfer functions, (2) a correction factor based on the room acoustical quantity "definition," and (3) a separation of the speech signal into useful and detrimental parts. The predictions were compared to results of two experiments in which speech reception thresholds were measured in a reverberant room in quiet and in the presence of a noise source for listeners with normal hearing. All extensions yielded better predictions than the original model when the influence of reverberation was strong, while predictions were similar for conditions with less reverberation. Although model (3) differed substantially in the assumed interaction of binaural processing and early reflections, its predictions were very similar to model (2) that achieved the best fit to the data.
Rennies, Jan; Schepker, Henning; Holube, Inga; Kollmeier, Birger
This study compared the combined effect of noise and reverberation on listening effort and speech intelligibility to predictions of the speech transmission index (STI). Listening effort was measured in normal-hearing subjects using a scaling procedure. Speech intelligibility scores were measured in the same subjects and conditions: (a) Speech-shaped noise as the only interfering factor, (b) + (c) fixed signal-to-noise ratios (SNRs) of 0 or 7 dB and reverberation as detrimental factors, and (d) reverberation as the only detrimental factor. In each condition, SNR and reverberation were combined to produce STI values of 0.17, 0.30, 0.43, 0.57, and 0.70, respectively. Listening effort always decreased with increasing STI, thus enabling a rough prediction, but a significant bias was observed indicating that listening effort was lower in reverberation only than in noise only at the same STI for one type of impulse responses. Accordingly, speech intelligibility increased with increasing STI and was significantly better in reverberation only than in noise only at the same STI. Further analyses showed that the broadband reverberation time is not always a good estimate of speech degradation in reverberation and that different speech materials may differ in their robustness toward detrimental effects of reverberation.
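For readers unfamiliar with how noise and reverberation combine into a single STI value, the following is a simplified sketch based on the classical modulation-transfer-function formulation. It uses a single band, an ideal exponential decay described only by a reverberation time, and unweighted averaging over the 14 standard modulation frequencies. The full standardized STI additionally uses seven octave bands with weighting and masking corrections, which are omitted here. The function names are assumptions for illustration and do not come from the study.

```python
import math

# Standard 1/3-octave modulation frequencies used in STI calculations (Hz).
MOD_FREQS_HZ = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]

def modulation_transfer(f_mod, snr_db, rt60_s):
    """Modulation reduction caused by reverberation and stationary noise."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60_s / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise

def simplified_sti(snr_db, rt60_s):
    """Single-band STI approximation: mean transmission index over F."""
    indices = []
    for f_mod in MOD_FREQS_HZ:
        m = modulation_transfer(f_mod, snr_db, rt60_s)
        m = min(max(m, 1e-12), 1.0 - 1e-12)          # avoid log of 0 or inf
        snr_apparent = 10.0 * math.log10(m / (1.0 - m))
        snr_apparent = max(-15.0, min(15.0, snr_apparent))  # clip to +/-15 dB
        indices.append((snr_apparent + 15.0) / 30.0)
    return sum(indices) / len(indices)
```

Under this simplification, a noise-only condition maps linearly from SNR to STI, so SNRs of -10, -6, -2, +2, and +6 dB give approximately 0.17, 0.30, 0.43, 0.57, and 0.70. Whether the study derived its conditions this way is not stated in the abstract. Adding reverberation at a fixed SNR lowers the value, which is how different noise and reverberation mixes can be matched to the same STI.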
Weiss, Michael W; Bidelman, Gavin M
Auditory experiences including musicianship and bilingualism have been shown to enhance subcortical speech encoding operating below conscious awareness. Yet, the behavioral consequence of such enhanced subcortical auditory processing remains undetermined. Exploiting their remarkable fidelity, we examined the intelligibility of auditory playbacks (i.e., "sonifications") of brainstem potentials recorded in human listeners. We found naive listeners' behavioral classification of sonifications was faster and more categorical when evaluating brain responses recorded in individuals with extensive musical training versus those recorded in nonmusicians. These results reveal stronger behaviorally relevant speech cues in musicians' neural representations and demonstrate causal evidence that superior subcortical processing creates a more comprehensible speech signal (i.e., to naive listeners). We infer that neural sonifications of speech-evoked brainstem responses could be used in the early detection of speech-language impairments due to neurodegenerative disorders, or in objectively measuring individual differences in speech reception solely by listening to individuals' brain activity.
Grange, Jacques A; Culling, John F
Spatial release from masking is traditionally measured with speech in front. The effect of head-orientation with respect to the speech direction has rarely been studied. Speech-reception thresholds (SRTs) were measured for eight head orientations and four spatial configurations. Benefits of head orientation away from the speech source of up to 8 dB were measured. These correlated with predictions of a model based on better-ear listening and binaural unmasking (r = 0.96). Use of spontaneous head orientations was measured when listeners attended to long speech clips of gradually diminishing speech-to-noise ratio in a sound-deadened room. Speech was presented from the loudspeaker that initially faced the listener and noise from one of four other locations. In an undirected paradigm, listeners spontaneously turned their heads away from the speech in 56% of trials. When instructed to rotate their heads in the diminishing speech-to-noise ratio, all listeners turned away from the speech and reached head orientations associated with lower SRTs. Head orientation may prove valuable for hearing-impaired listeners.
Noordhoek, Ingrid M.; Houtgast, Tammo; Festen, Joost M.
In an adaptive listening test, the bandwidth of speech in complementary notched noise was varied. The bandwidth (center frequency 1 kHz) required for 50% speech intelligibility is called Speech Reception Bandwidth Threshold (SRBT). The SRBT was measured for 10 normal-hearing and 30 hearing-impaired listeners. The average SRBT of the normal-hearing listeners is 1.4 octave. The performance of seven hearing-impaired listeners is considered normal, whereas 23 hearing-impaired listeners have a wider-than-normal SRBT. The SRBT of a hearing-impaired listener may be wider than normal, due to inaudibility of a part of the speech band, or to an impairment in the processing of speech. The Speech Intelligibility Index (SII) is used to separate these effects. The SII may be regarded as the proportion of the total speech information that is available to the listener. Each individual SRBT is converted to an SII value. For the normal-hearing listeners, the SII is about 0.3. For 21 hearing-impaired listeners, the SII is higher. This points to a speech-processing impairment in the 1-kHz frequency region. The deviation of an individual SII value from 0.3 can be used to "quantify" the degree of processing impairment.
Lagerberg, Tove B.; Johnels, Jakob Åsberg; Hartelius, Lena; Persson, Christina
Background: The assessment of intelligibility is an essential part of establishing the severity of a speech disorder. The intelligibility of a speaker is affected by a number of different variables relating, "inter alia," to the speech material, the listener and the listener task. Aims: To explore the impact of the number of…
Katongo, Emily Mwamba; Ndhlovu, Daniel
This study sought to establish the role of music in speech intelligibility of learners with Post Lingual Hearing Impairment (PLHI) and strategies teachers used to enhance speech intelligibility in learners with PLHI in selected special units for the deaf in Lusaka district. The study used a descriptive research design. Qualitative and quantitative…
Koning, Raphael; Wouters, Jan
Speech perception by cochlear implant (CI) users can be very good in quiet but their speech intelligibility (SI) performance decreases in noisy environments. Because recent studies have shown that transient parts of the speech envelope are most important for SI in normal-hearing (NH) listeners, the enhanced envelope (EE) strategy was developed to emphasize onset cues of the speech envelope in the CI signal processing chain. The influence of enhancement of the onsets of the speech envelope on SI was investigated with CI users for speech in stationary speech-shaped noise (SSN) and with an interfering talker. All CI users showed an immediate benefit when a priori knowledge was used for the onset enhancement. A SI improvement was obtained at signal-to-noise ratios (SNRs) below 6 dB, corresponding to a speech reception threshold (SRT) improvement of 2.1 dB. Furthermore, stop consonant reception was improved with the EE strategy in quiet and in SSN at 6 dB SNR. For speech in speech, the SRT improvements were 2.1 dB and 1 dB when the onsets of the target speaker with a priori knowledge of the signal components or of the mixture of the target and the interfering speaker were enhanced, respectively. The latter demonstrates that a small benefit can be obtained without a priori knowledge.
Monaghan, Jessica J M; Goehring, Tobias; Yang, Xin; Bolner, Federico; Wang, Shangqiguo; Wright, Matthew C M; Bleeck, Stefan
Machine-learning based approaches to speech enhancement have recently shown great promise for improving speech intelligibility for hearing-impaired listeners. Here, the performance of three machine-learning algorithms and one classical algorithm, Wiener filtering, was compared. Two algorithms based on neural networks were examined, one using a previously reported feature set and one using a feature set derived from an auditory model. The third machine-learning approach was a dictionary-based sparse-coding algorithm. Speech intelligibility and quality scores were obtained for participants with mild-to-moderate hearing impairments listening to sentences in speech-shaped noise and multi-talker babble following processing with the algorithms. Intelligibility and quality scores were significantly improved by each of the three machine-learning approaches, but not by the classical approach. The largest improvements for both speech intelligibility and quality were found by implementing a neural network using the feature set based on auditory modeling. Furthermore, neural network based techniques appeared more promising than dictionary-based, sparse coding in terms of performance and ease of implementation.
Jørgensen, Søren; Ewert, Stephan D; Dau, Torsten
The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well for changes of speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index [(2003). IEC 60268-16] fails. However, the sEPSM is limited to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
Cogger, M. K.
The intelligibility of speech presented over the earphones of the Mk 4 flying helmet was assessed using the procedure laid down in the Type Test Schedule. Results obtained using three phonetically balanced word lists presented to six subjects on two occasions indicate that speech intelligibility reaches 80 percent, the criterion of acceptability laid down in the schedule. Frequency response curves for the transducer earpiece assemblies of the helmet are given. The total harmonic distortion of the equipment used to present the spoken word lists is shown.
Otero, Devon Sawin; Bolt, Susan; Kapsner-Smith, Mara; Sullivan, Jessica R.
Purpose: The purpose of this study was to examine how sentence intelligibility relates to self-reported communication in tracheoesophageal speakers when speech intelligibility is measured in quiet and noise. Method: Twenty-four tracheoesophageal speakers who were at least 1 year postlaryngectomy provided audio recordings of 5 sentences from the Sentence Intelligibility Test. Speakers also completed self-reported measures of communication: the Voice Handicap Index-10 and the Communicative Participation Item Bank short form. Speech recordings were presented to 2 groups of inexperienced listeners who heard sentences in quiet or noise. Listeners transcribed the sentences to yield speech intelligibility scores. Results: Very weak relationships were found between intelligibility in quiet and measures of voice handicap and communicative participation. Slightly stronger, but still weak and nonsignificant, relationships were observed between measures of intelligibility in noise and both self-reported measures. However, 12 speakers who were more than 65% intelligible in noise showed strong and statistically significant relationships with both self-reported measures (R² = .76–.79). Conclusions: Speech intelligibility in quiet is a weak predictor of self-reported communication measures in tracheoesophageal speakers. Speech intelligibility in noise may be a better metric of self-reported communicative function for speakers who demonstrate higher speech intelligibility in noise. PMID:27379754
Ryan, Timothy James
The effects of multiple arrivals on the intelligibility of speech produced by live-sound reinforcement systems are examined. The intent is to determine if correlations exist between the manipulation of sound system optimization parameters and the subjective attribute speech intelligibility. Given the number, and wide range, of variables involved, this exploratory research project attempts to narrow the focus of further studies. Investigated variables are delay time between signals arriving from multiple elements of a loudspeaker array, array type and geometry and the two-way interactions of speech-to-noise ratio and array geometry with delay time. Intelligibility scores were obtained through subjective evaluation of binaural recordings, reproduced via headphone, using the Modified Rhyme Test. These word-score results are compared with objective measurements of Speech Transmission Index (STI). Results indicate that both variables, delay time and array geometry, have significant effects on intelligibility. Additionally, it is seen that all three of the possible two-way interactions have significant effects. Results further reveal that the STI measurement method overestimates the decrease in intelligibility due to short delay times between multiple arrivals.
Versfeld, Niek J.; Dreschler, Wouter A.
A conventional measure to determine the ability to understand speech in noisy backgrounds is the so-called speech reception threshold (SRT) for sentences. It yields the signal-to-noise ratio (in dB) for which half of the sentences are correctly perceived. The SRT defines to what degree speech must be audible to a listener in order to become just intelligible. There are indications that elderly listeners have greater difficulty in understanding speech in adverse listening conditions than young listeners. This may be partly due to the differences in hearing sensitivity (presbycusis), hence audibility, but other factors, such as temporal acuity, may also play a significant role. A potential measure for the temporal acuity may be the threshold to which speech can be accelerated, or compressed in time. A new test is introduced where the speech rate is varied adaptively. In analogy to the SRT, the time-compression threshold (or TCT) then is defined as the speech rate (expressed in syllables per second) for which half of the sentences are correctly perceived. In experiment I, the TCT test is introduced and normative data are provided. In experiment II, four groups of subjects (young and elderly normal-hearing and hearing-impaired subjects) participated, and the SRT's in stationary and fluctuating speech-shaped noise were determined, as well as the TCT. The results show that the SRT in fluctuating noise and the TCT are highly correlated. All tests indicate that, even after correction for the hearing loss, elderly normal-hearing subjects perform worse than young normal-hearing subjects. The results indicate that the use of the TCT test or the SRT test in fluctuating noise is preferred over the SRT test in stationary noise.
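The adaptive logic behind thresholds such as the SRT can be sketched as a simple staircase. The abstract does not specify the adaptive rule, so the 1-up/1-down rule, the step size, the number of trials, and the names `run_staircase` and `respond` below are assumptions for illustration. A 1-up/1-down track converges on the level at which half of the sentences are perceived correctly.

```python
def run_staircase(respond, start_db=0.0, step_db=2.0, n_trials=30, n_discard=4):
    """Estimate a 50%-correct threshold with a 1-up/1-down staircase.

    respond(level) must return True when the sentence presented at `level`
    was repeated correctly. After a correct response the task is made
    harder (level lowered); after an incorrect one it is made easier.
    The threshold is the mean presentation level after discarding the
    first n_discard trials.
    """
    level = start_db
    levels = []
    for _ in range(n_trials):
        levels.append(level)
        if respond(level):
            level -= step_db
        else:
            level += step_db
    tail = levels[n_discard:]
    return sum(tail) / len(tail)

# Idealized listener who is correct whenever the SNR is at least -5 dB.
estimate = run_staircase(lambda snr_db: snr_db >= -5.0)
```

For this idealized listener the track oscillates between -4 and -6 dB and the estimate is -5 dB. The same loop could track a time-compression threshold by treating `level` as speech rate and reversing the direction of the steps, since a higher rate makes the task harder.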
Prendergast, Garreth; Green, Gary G. R.
Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…
Li, Junfeng; Xia, Risheng; Ying, Dongwen; Yan, Yonghong; Akagi, Masato
Many objective measures have been reported to predict speech intelligibility in noise, most of which were designed and evaluated with English speech corpora. Given the different perceptual cues used by native listeners of different languages, examining whether there is any language effect when the same objective measure is used to predict speech intelligibility in different languages is of great interest, particularly when non-linear noise-reduction processing is involved. In the present study, an extensive evaluation was undertaken of objective measures for speech intelligibility prediction of noisy speech processed by noise-reduction algorithms in Chinese, Japanese, and English. Of all the objective measures tested, the short-time objective intelligibility (STOI) measure produced the most accurate results in speech intelligibility prediction for Chinese, while the normalized covariance metric (NCM) and middle-level coherence speech intelligibility index (CSIIm) incorporating the signal-dependent band-importance functions (BIFs) produced the most accurate results for Japanese and English, respectively. The objective measures that performed best in predicting the effect of non-linear noise-reduction processing on speech intelligibility were found to be the BIF-modified NCM measure for Chinese, the STOI measure for Japanese, and the BIF-modified CSIIm measure for English. Most of the objective measures examined performed differently for different languages even under the same conditions.
Durisala, Naresh; Prakash, S G R; Nambi, Arivudai; Batra, Ridhima
The overall goal of this study is to examine the intelligibility differences between clear and conversational speech and to objectively analyze the acoustic properties contributing to these differences. Seventeen listeners with stable post-lingual sensorineural hearing impairment, aged 17-40 years, were recruited for the study. Forty Telugu sentences spoken by a female Telugu speaker in both clear and conversational speech styles were used as stimuli. Results revealed that mean scores for clear speech were higher (mean = 84.5) than for conversational speech (mean = 61.4), an advantage of 23.1 percentage points. Acoustic analysis revealed greater fundamental frequency (f0) and intensity, longer duration, higher consonant-vowel ratio (CVR) and greater temporal energy in clear speech.
Jørgensen, Søren; Dau, Torsten
A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNR(env), at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility.
Coyne, Karen M; Barker, Daniel J
Intelligible speech communication while wearing air-purifying respirators is critical for law enforcement officers, particularly when they are communicating with each other or the public. The National Institute for Occupational Safety and Health (NIOSH) requires a 70% overall performance rating to pass speech intelligibility certification for commercial chemical, biological, radiological, and nuclear air-purifying respirators. However, the speech intelligibility of certified respirators is not reported and the impact on operational performance is unknown. The objective of this effort was to assess the speech intelligibility of 12 certified air-purifying respirators and to predict their impact on operational performance. The NIOSH respirator certification standard testing procedures were followed. Regression equations were fit to data from studies that examined the impact of degraded speech intelligibility on operational performance of simple and complex missions. The impact of the tested respirators on operational performance was estimated from these equations. Performance ratings observed for each respirator were: MSA Millennium (90%), 3M FR-M40 (88%), MSA Ultra Elite (87%), Scott M110 (86%), North 5400 (85%), Scott M120 (85%), Avon C50 (84%), Avon FM12 (84%), Survivair Optifit (81%), Drager CDR 4500 (81%), Peltor-AOSafety M-TAC (79%), and 3M FR-7800B (78%). The Millennium and FR-M40 had statistically significantly higher scores than the FR-7800B. The Millennium also scored significantly higher than the M-TAC. All of the tested respirators were predicted to have little impact on simple and complex mission performance times and on simple mission success rate. However, the regression equations showed that 75% of missions that require complex communications would be completed while wearing the Millennium, FR-M40, or Ultra Elite but that only 60% would be completed successfully while wearing the FR-7800B. These results suggest that some certified respirators may have
Balasundaram, Aruna; Vinayagavel, Mythreyi; Bandi, Dhathri Priya
This report aims to appreciate any enhancement in speech following gingivectomy of enlarged anterior palatal gingiva. The periodontal literature has documented various conditions, pathophysiology, and treatment modalities of gingival enlargement, but the relationship between gingival maladies and speech alteration has received scant attention. This case report describes the improvement of an altered speech pattern secondary to gingivectomy. A systemically healthy 24-year-old female patient reported with bilateral anterior gingival enlargement, provisionally diagnosed as "gingival abscess with inflammatory enlargement" in relation to the palatal aspect of the right maxillary canine to the left maxillary canine. Bilateral gingivectomy was performed by external bevel incision in relation to the anterior palatal gingiva, and a large wedge of epithelium and connective tissue was removed. The patient and her close acquaintances noticed a great improvement in her pronunciation and enunciation of sounds such as "t", "d", "n", "l", and "th" following removal of the excess palatal gingival tissue, which was also reflected in the visual analog scale score. Linguistic research has documented the significance of tongue-palate contact during speech. Any excess gingival tissue in the palatal region disrupts speech by altering tongue-palate contact. Periodontal surgery such as gingivectomy may improve disrupted phonetics. Excess palatal gingival tissue impedes tongue-palate contact and interferes with speech. Pronunciation of consonants such as "t", "d", "n", "l", and "th" is altered by enlarged anterior palatal gingiva. Excision of the enlarged palatal tissue results in improvement of speech.
Xie, Xin; Fowler, Carol A.
This study examined the intelligibility of native and Mandarin-accented English speech for native English and native Mandarin listeners. In the latter group, it also examined the role of the language environment and English proficiency. Three groups of listeners were tested: native English listeners (NE), Mandarin-speaking Chinese listeners in the US (M-US) and Mandarin listeners in Beijing, China (M-BJ). As a group, M-US and M-BJ listeners were matched on English proficiency and age of acquisition. A nonword transcription task was used. Identification accuracy for word-final stops in the nonwords established two independent interlanguage intelligibility effects. An interlanguage speech intelligibility benefit for listeners (ISIB-L) was manifest by both groups of Mandarin listeners outperforming native English listeners in identification of Mandarin-accented speech. In the benefit for talkers (ISIB-T), only M-BJ listeners were more accurate identifying Mandarin-accented speech than native English speech. Thus, both Mandarin groups demonstrated an ISIB-L while only the M-BJ group overall demonstrated an ISIB-T. The English proficiency of listeners was found to modulate the magnitude of the ISIB-T in both groups. Regression analyses also suggested that the listener groups differ in their use of acoustic information to identify voicing in stop consonants. PMID:24293741
Metz, Dale Evan; And Others
A preliminary scheme for estimating the speech intelligibility of hearing-impaired speakers from acoustic parameters, using a computerized artificial neural network to process mathematically the acoustic input variables, is outlined. Tests with 60 hearing-impaired speakers found the scheme to be highly accurate in identifying speakers separated by…
Joubert, Karin; Bornman, Juan; Alant, Erna
Amyotrophic lateral sclerosis (ALS), a rapidly progressive neuromuscular disease, has a devastating impact not only on individuals diagnosed with ALS but also their spouses. Speech intelligibility, often compromised as a result of dysarthria, affects the couple's ability to maintain effective, intimate communication. The purpose of this…
Lee, Youngmee; Sung, Jee Eu; Sim, Hyunsub
The purpose of this study was to investigate the effects of listeners' working memory (WM), types of noise and signal to noise ratios (SNRs) on speech intelligibility in dysarthria. Speech intelligibility was judged by using a word transcription task. A three-way mixed design (2 × 3 × 2) was used with the WM group (high/low group) as a between-subject factor and the types of noise (multi-talker babble/environmental noise) and SNRs (0, +10 and +20 dB) as within-subject factors. The dependent measure was the percentage of correctly transcribed words. The results revealed that the high WM group performed significantly better than the low WM group and listeners performed significantly better at higher levels of SNRs on the speech intelligibility test. The findings of this study suggested that listeners' cognitive abilities and SNRs should be considered as important factors when evaluating speech intelligibility in dysarthria.
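The dependent measure described here, the percentage of correctly transcribed words, can be computed along the following lines. This is a minimal sketch with assumed scoring rules (case-insensitive, punctuation ignored, word order ignored, each target word credited at most as often as it occurs). The study's actual scoring criteria are not given in the abstract, and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words, ignoring punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def percent_words_correct(target, transcript):
    """Percentage of target words that also appear in the transcript."""
    target_counts = Counter(tokenize(target))
    heard_counts = Counter(tokenize(transcript))
    total = sum(target_counts.values())
    if total == 0:
        return 0.0
    hits = sum(min(count, heard_counts[word])
               for word, count in target_counts.items())
    return 100.0 * hits / total

score = percent_words_correct("The boy ran home", "the boy went home")
```

In this example three of the four target words are matched, so `score` is 75.0. Scores like this, averaged per listener and condition, would then feed the kind of mixed-design analysis the abstract describes.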
Watson, Peter J.; Schlauch, Robert S.
Purpose: To examine the effect of fundamental frequency (F0) on the intelligibility of speech with flattened F0 contours in noise. Method: Participants listened to sentences produced by 2 female talkers in white noise. The listening conditions included the unmodified original sentences and sentences with resynthesized F0 that reflected the average…
Stelzle, F; Knipfer, C; Schuster, M; Bocklet, T; Nöth, E; Adler, W; Schempf, L; Vieler, P; Riemann, M; Neukam, F W; Nkenke, E
Oral squamous cell carcinoma (OSCC) and its treatment impair speech intelligibility by alteration of the vocal tract. The aim of this study was to identify the factors of oral cancer treatment that influence speech intelligibility by means of an automatic, standardized speech-recognition system. The study group comprised 71 patients (mean age 59.89, range 35-82 years) with OSCC ranging from stage T1 to T4 (TNM staging). Tumours were located on the tongue (n=23), lower alveolar crest (n=27), and floor of the mouth (n=21). Reconstruction was conducted through local tissue plasty or microvascular transplants. Adjuvant radiotherapy was performed in 49 patients. Speech intelligibility was evaluated before, and at 3, 6, and 12 months after tumour resection, and compared to that of a healthy control group (n=40). Postoperatively, significant influences on speech intelligibility were tumour localization (P=0.010) and resection volume (P=0.019). Additionally, adjuvant radiotherapy (P=0.049) influenced intelligibility at 3 months after surgery. At 6 months after surgery, influences were resection volume (P=0.028) and adjuvant radiotherapy (P=0.034). The influence of tumour localization (P=0.001) and adjuvant radiotherapy (P=0.022) persisted after 12 months. Tumour localization, resection volume, and radiotherapy are crucial factors for speech intelligibility. Radiotherapy significantly impaired word recognition rate (WR) values with a progression of the impairment for up to 12 months after surgery.
Arıöz, Umut; Günel, Banu
High-frequency hearing loss is a growing problem for both children and adults. Different frequency lowering methods (FLMs) have been tried since the 1930s to overcome this impairment, but none has proved satisfactory so far. In this study, eight originally designed combinations of FLMs were tested with simulated sounds on normal-hearing subjects in order to obtain higher speech intelligibility. Improvements were calculated as the difference from the standard hearing aid method, amplification. High-frequency hearing loss was simulated with combined suprathreshold effects. An offline study was carried out for each subject to determine the significant methods to be used in the modified rhyme test (MRT), a subjective measure of intelligibility. Significant methods were determined according to their speech intelligibility index (SII), an objective measure of intelligibility. All cases were tested in four noisy environments and one noise-free environment. Twelve hearing-impaired subjects were simulated by hearing loss simulation (HLS). The MRT was developed for the Turkish language for the first time. In total, 71 cases showed statistically significant improvements across the twelve subjects. The FLMs achieved an 83% success rate against amplification as an alternative to amplification in noisy environments. For four subjects, all significant methods gave higher improvements than amplification. In conclusion, specific method recommendations for different noisy environments were made for each subject to obtain higher speech intelligibility.
Adank, Patti; Rueschemeyer, Shirley-Ann; Bekkering, Harold
Recent theories on how listeners maintain perceptual invariance despite variation in the speech signal allocate a prominent role to imitation mechanisms. Notably, these simulation accounts propose that motor mechanisms support perception of ambiguous or noisy signals. Indeed, imitation of ambiguous signals, e.g., accented speech, has been found to aid effective speech comprehension. Here, we explored the possibility that imitation in speech benefits perception by increasing activation in speech perception and production areas. Participants rated the intelligibility of sentences spoken in an unfamiliar accent of Dutch in a functional Magnetic Resonance Imaging experiment. Next, participants in one group repeated the sentences in their own accent, while a second group vocally imitated the accent. Finally, both groups rated the intelligibility of accented sentences in a post-test. The neuroimaging results showed an interaction between type of training and pre- and post-test sessions in left Inferior Frontal Gyrus, Supplementary Motor Area, and left Superior Temporal Sulcus. Although alternative explanations such as task engagement and fatigue need to be considered as well, the results suggest that imitation may aid effective speech comprehension by supporting sensorimotor integration. PMID:24109447
Trimmer, Christopher G; Cuddy, Lola L
Is music training associated with greater sensitivity to emotional prosody in speech? University undergraduates (n = 100) were asked to identify the emotion conveyed in both semantically neutral utterances and melodic analogues that preserved the fundamental frequency contour and intensity pattern of the utterances. Utterances were expressed in four basic emotional tones (anger, fear, joy, sadness) and in a neutral condition. Participants also completed an extended questionnaire about music education and activities, and a battery of tests to assess emotional intelligence, musical perception and memory, and fluid intelligence. Emotional intelligence, not music training or music perception abilities, successfully predicted identification of intended emotion in speech and melodic analogues. The ability to recognize cues of emotion accurately and efficiently across domains may reflect the operation of a cross-modal processor that does not rely on gains of perceptual sensitivity such as those related to music training.
Cox, R M; McDaniel, D M
The Speech Intelligibility Rating (SIR) Test has been developed for use in clinical comparisons of hearing aid conditions. After listening to a short passage of connected speech, subjects generate a rating proportional to its intelligibility using an equal-appearing interval scale from 0 to 10. Before test passages are presented, the signal-to-babble ratio (SBR) is adjusted to a level that elicits intelligibility ratings of 7-8 for a "setup" passage. Then, with SBR held constant, three or more test passages are rated and the results averaged for each aided condition. This paper describes the generation of recorded test materials and their investigation using normally hearing listeners. Based on these data, a critical difference of about 2 scale intervals is recommended. A future paper will deal with results for hearing-impaired subjects.
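The averaging and comparison steps described in this abstract can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating "about 2 scale intervals" as a greater-than-or-equal threshold is an assumption, not something the abstract specifies.

```python
from statistics import mean

# Critical difference in scale intervals, as recommended in the abstract.
CRITICAL_DIFFERENCE = 2.0

def mean_rating(ratings):
    """Average the 0-10 ratings given to the test passages of one aided condition."""
    if len(ratings) < 3:
        raise ValueError("the procedure averages three or more test passages")
    if any(not 0 <= r <= 10 for r in ratings):
        raise ValueError("ratings must lie on the 0-10 scale")
    return mean(ratings)

def conditions_differ(ratings_a, ratings_b, critical=CRITICAL_DIFFERENCE):
    """Assumed decision rule: flag a difference when the means differ by >= critical."""
    return abs(mean_rating(ratings_a) - mean_rating(ratings_b)) >= critical
```

For example, ratings of 7, 8, 7 against 4, 5, 5 differ by roughly 2.7 intervals and would be flagged, whereas 7, 8, 7 against 6, 7, 7 would not.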
Lawrence, Halcyon M.
There continues to be significant growth in the development and use of speech-mediated devices and technology products; however, there is no evidence that non-native English speech is used in these devices, despite the fact that English is now spoken by more non-native speakers than native speakers, worldwide. This relative absence of nonnative…
Reinten, Jikke; van Hout, Nicole; Hak, Constant; Kort, Helianthe
Adapting the built environment to the needs of nursing- or care-home residents has become common practice. Even though hearing loss due to ageing is a normally occurring biological process, little research has been performed on the effects of room acoustic parameters on speech intelligibility for older adults. This article presents the results of room acoustic measurements in common rooms for older adults and the effect on speech intelligibility. Perceived speech intelligibility amongst the users of the rooms was also investigated. The results have led to ongoing research at Utrecht University of Applied Sciences and Eindhoven University of Technology, aimed at the development of acoustical guidelines for elderly care facilities.
Bradlow, Ann R.; Torretta, Gina M.; Pisoni, David B.
This study used a multi-talker database containing intelligibility scores for 2000 sentences (20 talkers, 100 sentences), to identify talker-related correlates of speech intelligibility. We first investigated “global” talker characteristics (e.g., gender, F0 and speaking rate). Findings showed female talkers to be more intelligible as a group than male talkers. Additionally, we found a tendency for F0 range to correlate positively with higher speech intelligibility scores. However, F0 mean and speaking rate did not correlate with intelligibility. We then examined several fine-grained acoustic-phonetic talker-characteristics as correlates of overall intelligibility. We found that talkers with larger vowel spaces were generally more intelligible than talkers with reduced spaces. In investigating two cases of consistent listener errors (segment deletion and syllable affiliation), we found that these perceptual errors could be traced directly to detailed timing characteristics in the speech signal. Results suggest that a substantial portion of variability in normal speech intelligibility is traceable to specific acoustic-phonetic characteristics of the talker. Knowledge about these factors may be valuable for improving speech synthesis and recognition strategies, and for special populations (e.g., the hearing-impaired and second-language learners) who are particularly sensitive to intelligibility differences among talkers. PMID:21461127
Baumgärtel, Regina M; Hu, Hongmei; Krawczyk-Becker, Martin; Marquardt, Daniel; Herzke, Tobias; Coleman, Graham; Adiloğlu, Kamil; Bomke, Katrin; Plotz, Karsten; Gerkmann, Timo; Doclo, Simon; Kollmeier, Birger; Hohmann, Volker; Dietz, Mathias
Several binaural audio signal enhancement algorithms were evaluated with respect to their potential to improve speech intelligibility in noise for users of bilateral cochlear implants (CIs). 50% speech reception thresholds (SRT50) were assessed using an adaptive procedure in three distinct, realistic noise scenarios. All scenarios were highly nonstationary, complex, and included a significant amount of reverberation. Other aspects, such as the perfectly frontal target position, were idealized laboratory settings, allowing the algorithms to perform better than in corresponding real-world conditions. Eight bilaterally implanted CI users, wearing devices from three manufacturers, participated in the study. In all noise conditions, a substantial improvement in SRT50 compared to the unprocessed signal was observed for most of the algorithms tested, with the largest improvements generally provided by binaural minimum variance distortionless response (MVDR) beamforming algorithms. The largest overall improvement in speech intelligibility was achieved by an adaptive binaural MVDR in a spatially separated, single competing talker noise scenario. A no-pre-processing condition and adaptive differential microphones without a binaural link served as the two baseline conditions. SRT50 improvements provided by the binaural MVDR beamformers surpassed the performance of the adaptive differential microphones in most cases. Speech intelligibility improvements predicted by instrumental measures were shown to account for some but not all aspects of the perceptually obtained SRT50 improvements measured in bilaterally implanted CI users.
Utianski, Rene L; Caviness, John N; Liss, Julie M
High-density electroencephalography was used to evaluate cortical activity during speech comprehension via a sentence verification task. Twenty-four participants assigned true or false to sentences produced with 3 noise-vocoded channel levels (1: unintelligible, 6: decipherable, 16: intelligible), during simultaneous EEG recording. Participant data were sorted into higher- (HP) and lower-performing (LP) groups. The identification of a late event-related potential for LP listeners in the intelligible condition and in all listeners when challenged with a 6-Ch signal supports the notion that this induced potential may be related to either processing degraded speech, or degraded processing of intelligible speech. Different cortical locations are identified as neural generators responsible for this activity; HP listeners are engaging motor aspects of their language system, utilizing an acoustic-phonetic based strategy to help resolve the sentence, while LP listeners do not. This study presents evidence for neurophysiological indices associated with more or less successful speech comprehension performance across listening conditions.
Mechergui, Nader; Djaziri-Larbi, Sonia; Jaïdane, Mériem
A method to measure the speech intelligibility in public address systems for normal hearing and hearing impaired persons is presented. The proposed metric is an extension of the speech based Speech Transmission Index to account for accurate perceptual masking and variable hearing ability: The sound excitation pattern generated at the ear is accurately computed using an auditory filter model, and its shapes depend on frequency, sound level, and hearing impairment. This extension yields a better prediction of the intensity of auditory masking which is used to rectify the modulation transfer function and thus to objectively assess the speech intelligibility experienced by hearing impaired as well as by normal hearing persons in public spaces. The proposed metric was developed within the framework of the European Active and Assisted Living research program, and was labeled "SB-STI for All." Extensive subjective in-Lab and in vivo tests have been conducted and the proposed metric proved to have a good correlation with subjective intelligibility scores.
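For orientation, the conventional step that turns a modulation transfer function into a Speech Transmission Index can be sketched as follows. This is the textbook mapping (modulation index to apparent signal-to-noise ratio, clipped to ±15 dB and rescaled to 0..1), not the authors' "SB-STI for All" extension: the auditory-filter masking correction is omitted, and the band weights passed in are hypothetical placeholders supplied by the caller.

```python
import math

def transmission_index(m):
    """Map one modulation index m (0..1) to a transmission index (0..1)."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)          # avoid log of 0 or division by 0
    snr_db = 10.0 * math.log10(m / (1.0 - m))  # apparent signal-to-noise ratio
    snr_db = max(-15.0, min(15.0, snr_db))     # conventional +/-15 dB clipping
    return (snr_db + 15.0) / 30.0

def sti_from_mtf(mtf, band_weights):
    """Weighted mean of per-band transmission indices.

    mtf[k] is the list of modulation indices measured in frequency band k.
    band_weights are hypothetical; a real implementation would take them
    from the relevant standard or, here, from the authors' model.
    """
    band_mti = [sum(transmission_index(m) for m in band) / len(band) for band in mtf]
    return sum(w * mti for w, mti in zip(band_weights, band_mti)) / sum(band_weights)
```

A modulation index of 0.5 corresponds to an apparent signal-to-noise ratio of 0 dB and therefore to a transmission index of 0.5.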
Ferguson, Sarah Hargus
It is well known that, for listeners with normal hearing, speech produced by non-native speakers of the listener's first language is less intelligible than speech produced by native speakers. Intelligibility is well correlated with listeners' ratings of talker comprehensibility and accentedness, which have been shown to be related to several talker factors, including age of second language acquisition and level of similarity between the talker's native and second language phoneme inventories. Relatively few studies have focused on factors extrinsic to the talker. The current project explored the effects of listener and environmental factors on the intelligibility of foreign-accented speech. Specifically, monosyllabic English words previously recorded from two talkers, one a native speaker of American English and the other a native speaker of Spanish, were presented to three groups of listeners (young listeners with normal hearing, elderly listeners with normal hearing, and elderly listeners with hearing impairment; n=20 each) in three different listening conditions (undistorted words in quiet, undistorted words in 12-talker babble, and filtered words in quiet). Data analysis will focus on interactions between talker accent, listener age, listener hearing status, and listening condition. [Project supported by American Speech-Language-Hearing Association AARC Award.]
Wójcicki, Kamil K.; Loizou, Philipos C.
Background noise reduces the depth of the low-frequency envelope modulations known to be important for speech intelligibility. The relative strength of the target and masker envelope modulations can be quantified using a modulation signal-to-noise ratio, (S/N)mod, measure. Such a measure can be used in noise-suppression algorithms to extract target-relevant modulations from the corrupted (target + masker) envelopes for potential improvement in speech intelligibility. In the present study, envelopes are decomposed in the modulation spectral domain into a number of channels spanning the range of 0–30 Hz. Target-dominant modulations are identified and retained in each channel based on the (S/N)mod selection criterion, while modulations which potentially interfere with perception of the target (i.e., those dominated by the masker) are discarded. The impact of modulation-selective processing on the speech-reception threshold for sentences in noise is assessed with normal-hearing listeners. Results indicate that the intelligibility of noise-masked speech can be improved by as much as 13 dB when preserving target-dominant modulations, present up to a modulation frequency of 18 Hz, while discarding masker-dominant modulations from the mixture envelopes. PMID:22501068
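The channel-selection idea in this abstract can be illustrated with a short Python sketch. It assumes that the target and masker modulation powers are already available per modulation channel (as they would be in an ideal, oracle-style experiment), and it assumes a 0 dB selection threshold; the abstract does not state the actual criterion value, and all names below are hypothetical.

```python
import math

def modulation_snr_db(target_power, masker_power, eps=1e-12):
    """(S/N)mod for one modulation channel, in dB."""
    return 10.0 * math.log10((target_power + eps) / (masker_power + eps))

def select_target_dominant(target_powers, masker_powers, threshold_db=0.0):
    """Return one boolean per modulation channel: True where the target dominates.

    threshold_db is an assumed value used only for illustration.
    """
    return [modulation_snr_db(t, m) >= threshold_db
            for t, m in zip(target_powers, masker_powers)]

def apply_selection(mixture_modulations, keep):
    """Retain mixture modulations in selected channels and discard the rest."""
    return [x if k else 0.0 for x, k in zip(mixture_modulations, keep)]
```

With target powers 4, 1, 0.5 and masker powers 1, 2, 2, only the first channel is retained, so the mixture modulations in the other two channels are zeroed.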
Hazrati, Oldooz; Loizou, Philipos C.
Objective: The purpose of this study is to assess the individual effect of reverberation and noise, as well as their combined effect, on speech intelligibility by cochlear implant (CI) users. Design: Sentence stimuli corrupted by reverberation, noise, and reverberation + noise are presented to 11 CI listeners for word identification. They are tested in two reverberation conditions (T60 = 0.6 s, 0.8 s), two noise conditions (SNR = 5 dB, 10 dB), and four reverberation + noise conditions. Study sample: Eleven CI users participated. Results: Results indicated that reverberation degrades speech intelligibility to a greater extent than additive noise (speech-shaped noise), at least for the SNR levels tested. The combined effects were greater than those introduced by either reverberation or noise alone. Conclusions: The effect of reverberation on speech intelligibility by CI users was found to be larger than that by noise. The results from the present study highlight the importance of testing CI users in reverberant conditions, since testing in noise-alone conditions might underestimate the difficulties they experience in their daily lives where reverberation and noise often coexist. PMID:22356300
Vocoder simulation has long been applied as an effective tool to assess factors influencing speech intelligibility for cochlear implant listeners. Considering that the temporal envelope information contained in contiguous bands of vocoded speech is correlated and redundant, this study examined the hypothesis that an intelligibility measure evaluating the distortions from a small number of selected envelope cues is sufficient to predict intelligibility scores well. Speech intelligibility data from 80 conditions were collected from vocoder simulation experiments involving 22 normal-hearing listeners. The relative importance of temporal envelope information in cochlear-implant vocoded speech was modeled by correlating its speech-transmission indices (STIs) with the intelligibility scores. The relative importance pattern was subsequently utilized to determine a binary weight vector for STIs of all envelopes to compute the index predicting the speech intelligibility. A high correlation (r=0.95) was obtained when selecting a small number (e.g., 4 out of 20) of temporal envelope cues from disjoint bands to predict the intelligibility of cochlear-implant vocoded speech.
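A binary weight vector over per-band STIs can be combined into a single index along these lines. The sketch is an assumption about the general form only: normalizing by the number of selected bands is a choice made here for illustration, and the abstract does not say how the authors combine the selected STIs.

```python
def binary_weighted_index(band_stis, weights):
    """Average the STIs of the bands whose binary weight is 1.

    band_stis: one STI value per envelope band.
    weights:   0/1 selection vector of the same length (e.g. 4 ones out of 20).
    """
    if len(band_stis) != len(weights):
        raise ValueError("band_stis and weights must have the same length")
    if any(w not in (0, 1) for w in weights):
        raise ValueError("weights must be binary")
    selected = sum(weights)
    if selected == 0:
        raise ValueError("at least one band must be selected")
    return sum(s * w for s, w in zip(band_stis, weights)) / selected
```

Selecting the first and third of four bands with STIs 0.2, 0.4, 0.6, 0.8 gives an index of 0.4.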
Hossain, Mohammad E.; Jassim, Wissam A.; Zilany, Muhammad S. A.
Sensorineural hearing loss occurs due to damage to the inner and outer hair cells of the peripheral auditory system. Hearing loss can cause decreases in audibility, dynamic range, frequency and temporal resolution of the auditory system, and all of these effects are known to affect speech intelligibility. In this study, a new reference-free speech intelligibility metric is proposed using 2-D neurograms constructed from the output of a computational model of the auditory periphery. The responses of the auditory-nerve fibers with a wide range of characteristic frequencies were simulated to construct neurograms. The features of the neurograms were extracted using third-order statistics referred to as bispectrum. The phase coupling of neurogram bispectrum provides a unique insight for the presence (or deficit) of supra-threshold nonlinearities beyond audibility for listeners with normal hearing (or hearing loss). The speech intelligibility scores predicted by the proposed method were compared to the behavioral scores for listeners with normal hearing and hearing loss both in quiet and under noisy background conditions. The results were also compared to the performance of some existing methods. The predicted results showed a good fit with a small error suggesting that the subjective scores can be estimated reliably using the proposed neural-response-based metric. The proposed metric also had a wide dynamic range, and the predicted scores were well-separated as a function of hearing loss. The proposed metric successfully captures the effects of hearing loss and supra-threshold nonlinearities on speech intelligibility. This metric could be applied to evaluate the performance of various speech-processing algorithms designed for hearing aids and cochlear implants. PMID:26967160
Mi, Jing; Colburn, H. Steven
Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. PMID:27698261
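The binary decision rule described here can be sketched as a threshold on the energy change between EC input and output in each time-frequency unit. The sketch assumes the per-unit energies have already been computed by some EC front end, which is not shown, and the 3 dB threshold is a hypothetical value chosen for illustration rather than one reported in the abstract.

```python
import math

def ec_binary_mask(energy_in, energy_out, threshold_db=3.0, eps=1e-12):
    """Estimate a time-frequency binary mask from EC energy change.

    energy_in[t][f]  : energy of the unit before cancelling the target direction
    energy_out[t][f] : energy of the same unit after cancellation
    A large drop means the unit was dominated by the target, so it is marked 1.
    """
    mask = []
    for row_in, row_out in zip(energy_in, energy_out):
        mask.append([
            1 if 10.0 * math.log10((e_in + eps) / (e_out + eps)) >= threshold_db else 0
            for e_in, e_out in zip(row_in, row_out)
        ])
    return mask
```

Units whose energy falls by about 10 dB after cancellation are marked as target-dominated, while units whose energy barely changes are marked as masker-dominated.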
Pollard, Kimberly A; Tran, Phuong K; Letowski, Tomasz
Bone conduction (BC) communication systems provide benefits over air conduction systems but are not in widespread use, partly due to problems with speech intelligibility. Contributing factors like device location and background noise have been explored, but little attention has been paid to the role of individual user differences. Because BC signals travel through an individual's skull and facial tissues, demographic factors such as user age, sex, race, or regional origin may influence sound transmission. Vocal traits such as pitch, spectral tilt, jitter, and shimmer may also play a role. Along with microphone placement and background noise, these factors can affect BC speech intelligibility. Eight diverse talkers were recorded with bone microphones on two different skull locations and in different background noise conditions. Twenty-four diverse listeners listened to these samples over BC and completed Modified Rhyme Tests for speech intelligibility. Forehead bone recordings were more intelligible than condyle recordings. In condyle recordings, female talkers, talkers with high fundamental frequency, and talkers in background noise were understood better, as were communications between talkers and listeners of the same regional origin. Listeners' individual traits had no significant effects. Thoughtful application of this knowledge can help improve BC communication for diverse users.
Yunusova, Yana; Weismer, Gary; Westbury, John; Rusche, Nicole
Understanding factors underlying intelligibility deficits in dysarthria is important for clinical and theoretical reasons. Correlation/regression analyses between intelligibility measures and various speech production measures (e.g., acoustic or phonetic) are often reported in the literature. However, the analyses rarely control for the effect of a third variable (severity of speech disorder, in this case) likely to be correlated with the primary correlated variables. The current report controls for this effect by using a within-speaker analysis approach. Factors that were hypothesized to underlie the intelligibility variations in multiple breath groups within a connected discourse included structural elements (e.g., number of total words) as well as acoustic measures (e.g., F2 variation). Results showed that speech intelligibility in dysarthric speakers with two forms of neurological disease (Parkinson and ALS) does, in fact, vary across breath groups extracted from a connected discourse, and that these variations are related in some cases to a per breath estimate of F2 variation. [Work supported by NIDCD Award No. R01 DC03723.]
Summers, Van; Cord, Mary T
These experiments examined how high presentation levels influence speech recognition for high- and low-frequency stimuli in noise. Normally hearing (NH) and hearing-impaired (HI) listeners were tested. In Experiment 1, high- and low-frequency bandwidths yielding 70%-correct word recognition in quiet were determined at levels associated with broadband speech at 75 dB SPL. In Experiment 2, broadband and band-limited sentences (based on passbands measured in Experiment 1) were presented at this level in speech-shaped noise filtered to the same frequency bandwidths as targets. Noise levels were adjusted to produce approximately 30%-correct word recognition. Frequency bandwidths and signal-to-noise ratios supporting criterion performance in Experiment 2 were tested at 75, 87.5, and 100 dB SPL in Experiment 3. Performance tended to decrease as levels increased. For NH listeners, this "rollover" effect was greater for high-frequency and broadband materials than for low-frequency stimuli. For HI listeners, the 75- to 87.5-dB increase improved signal audibility for high-frequency stimuli and rollover was not observed. However, the 87.5- to 100-dB increase produced qualitatively similar results for both groups: scores decreased most for high-frequency stimuli and least for low-frequency materials. Predictions of speech intelligibility by quantitative methods such as the Speech Intelligibility Index may be improved if rollover effects are modeled as frequency dependent.
Ma, Jianfen; Hu, Yi; Loizou, Philipos C
The articulation index (AI), speech-transmission index (STI), and coherence-based intelligibility metrics have been evaluated primarily in steady-state noisy conditions and have not been tested extensively in fluctuating noise conditions. The aim of the present work is to evaluate the performance of new speech-based STI measures, modified coherence-based measures, and AI-based measures operating on short-term (30 ms) intervals in realistic noisy conditions. Much emphasis is placed on the design of new band-importance weighting functions which can be used in situations wherein speech is corrupted by fluctuating maskers. The proposed measures were evaluated with intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech (consonants and sentences) corrupted by four different maskers (car, babble, train, and street interferences). Of all the measures considered, the modified coherence-based measures and speech-based STI measures incorporating signal-specific band-importance functions yielded the highest correlations (r=0.89-0.94). The modified coherence measure, in particular, that only included vowel/consonant transitions and weak consonant information yielded the highest correlation (r=0.94) with sentence recognition scores. The results from this study clearly suggest that the traditional AI and STI indices could benefit from the use of the proposed signal- and segment-dependent band-importance functions.
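To make the role of band-importance weights concrete, a generic short-term AI-style index can be sketched as below. This is not one of the measures proposed in the abstract: it uses the familiar ±15 dB clipping of band signal-to-noise ratios, the per-frame values are assumed to come from short (for instance 30 ms) analysis intervals, and the weights are placeholders for whichever band-importance function is being evaluated.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to 0..1 using the conventional +/-15 dB range."""
    return (min(max(snr_db, -15.0), 15.0) + 15.0) / 30.0

def short_term_ai(frame_band_snrs_db, band_importance):
    """Average a band-importance-weighted audibility over short-term frames.

    frame_band_snrs_db[i][k] : SNR in dB of band k in frame i
    band_importance[k]       : non-negative weight of band k (hypothetical values)
    """
    total = sum(band_importance)
    per_frame = [
        sum(w * band_audibility(s) for w, s in zip(band_importance, frame)) / total
        for frame in frame_band_snrs_db
    ]
    return sum(per_frame) / len(per_frame)
```

Swapping in a signal- or segment-dependent weight vector changes only the `band_importance` argument, which is why the weighting function can be studied independently of the rest of the measure.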
Fang, Chunying; Li, Haifeng; Ma, Lin; Zhang, Mancai
Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting the experts, while automatic evaluation of speech intelligibility is difficult because it is usually nonstationary and mutational. In this paper, we carry out an independent innovation of feature extraction and reduction, and we describe a multigranularity combined feature scheme which is optimized by the hierarchical visual method. A novel method of generating feature set based on S-transform and chaotic analysis is proposed. There are BAFS (430, basic acoustics feature), local spectral characteristics MSCC (84, Mel S-transform cepstrum coefficients), and chaotic features (12). Finally, radar chart and F-score are proposed to optimize the features by the hierarchical visual fusion. The feature set could be optimized from 526 to 96 dimensions based on NKI-CCRT corpus and 104 dimensions based on SVD corpus. The experimental results denote that new features by support vector machine (SVM) have the best performance, with a recognition rate of 84.4% on NKI-CCRT corpus and 78.7% on SVD corpus. The proposed method is thus approved to be effective and reliable for pathological speech intelligibility evaluation. PMID:28194222
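One common way to rank features by F-score for a two-class problem is sketched below. It is an assumption that the abstract refers to this particular between-class over within-class variance ratio; the abstract does not define its F-score, and the radar-chart fusion step is not reproduced. The function and parameter names are hypothetical.

```python
from statistics import mean, variance

def f_score(pos, neg):
    """Two-class F-score of a single feature.

    pos, neg: values of the feature for the two classes (at least 2 samples each).
    Larger values mean the feature separates the classes better.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)   # sample variances (n - 1)
    return between / within if within > 0 else float("inf")

def top_k_features(features_pos, features_neg, k):
    """Return the indices of the k features with the highest F-score."""
    scores = [f_score(p, n) for p, n in zip(features_pos, features_neg)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

Reducing a 526-dimensional set to 96 dimensions would then amount to calling `top_k_features` with `k=96`, under the assumptions stated above.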
Rong, Panying; Loucks, Torrey; Kim, Heejin; Hasegawa-Johnson, Mark
A multimodal approach combining acoustics, intelligibility ratings, articulography and surface electromyography was used to examine the characteristics of dysarthria due to cerebral palsy (CP). CV syllables were studied by obtaining the slope of F2 transition during the diphthong, tongue-jaw kinematics during the release of the onset consonant, and the related submental muscle activities and relating these measures to speech intelligibility. The results show that larger reductions of F2 slope are correlated with lower intelligibility in CP-related dysarthria. Among the three speakers with CP, the speaker with the lowest F2 slope and intelligibility showed smallest tongue release movement and largest jaw opening movement. The other two speakers with CP were comparable in the amplitude and velocity of tongue movements, but one speaker had abnormally prolonged jaw movement. The tongue-jaw coordination pattern found in the speakers with CP could be either compensatory or subject to an incompletely developed oromotor control system.
Cervera, Teresa; González-Alvarez, Julio
This article describes the development of a test for measuring the intelligibility of speech in noise for the Spanish language, similar to the test developed by Kalikow, Stevens, and Elliot (Journal of the Acoustical Society of America, 5, 1337-1360, 1977) for the English language. The test consists of six forms, each comprising 25 high-predictability (HP) sentences and 25 low-predictability (LP) sentences. The sentences were used in a perceptual task to assess their intelligibility in babble noise across three different signal-to-noise ratio (SNR) conditions in a sample of 474 normal-hearing listeners. The results showed that the listeners obtained higher scores of intelligibility for HP sentences than for LP sentences, and the scores were lower for the higher SNRs, as was expected. The final six forms were equivalent in intelligibility and phonetic content.
Mencke, E O; Ochsner, G J; Testut, E W
Two listener groups, one experienced and the other inexperienced in listening to deaf speakers, were asked to recognize speech sounds in word contexts presented in two modes: auditory only and auditory-visual. In contrast to previous studies, the experienced and inexperienced listener groups performed similarly. The one exception occurred for speech sounds in the final position presented auditory-visually, where the experienced listeners' performance surpassed that of inexperienced listeners. For both groups, performance in correct phoneme identification was better in the auditory-visual mode of presentation. Furthermore, an interaction between the position-in-word and the mode of stimulus presentation was present in both groups. In the auditory-only task, listeners correctly identified more target phonemes when they were in initial rather than final position in a word. With supplemental visual information, listener performance increased for phonemes in both positions within a word, although the increase was greater for final position phonemes.
Holdgraf, Christopher R.; de Heer, Wendy; Pasley, Brian; Rieger, Jochem; Crone, Nathan; Lin, Jack J.; Knight, Robert T.; Theunissen, Frédéric E.
Experience shapes our perception of the world on a moment-to-moment basis. This robust perceptual effect of experience parallels a change in the neural representation of stimulus features, though the nature of this representation and its plasticity are not well-understood. Spectrotemporal receptive field (STRF) mapping describes the neural response to acoustic features, and has been used to study contextual effects on auditory receptive fields in animal models. We performed a STRF plasticity analysis on electrophysiological data from recordings obtained directly from the human auditory cortex. Here, we report rapid, automatic plasticity of the spectrotemporal response of recorded neural ensembles, driven by previous experience with acoustic and linguistic information, and with a neurophysiological effect in the sub-second range. This plasticity reflects increased sensitivity to spectrotemporal features, enhancing the extraction of more speech-like features from a degraded stimulus and providing the physiological basis for the observed 'perceptual enhancement' in understanding speech. PMID:27996965
Pinkoski-Ball, Carrie L.; Reichle, Joe; Munson, Benjamin
Purpose: This investigation examined the effect of repeated exposure to novel and repeated spoken words in typical environments on the intelligibility of 2 synthesized voices and human recorded speech in preschools. Method: Eighteen preschoolers listened to and repeated single words presented in human-recorded speech, DECtalk Paul, and AT&T Voice…
Barnes, Elizabeth; Roberts, Joanne; Long, Steven H.; Martin, Gary E.; Berni, Mary C.; Mandulak, Kerry C.; Sideris, John
Purpose: To compare the phonological accuracy and speech intelligibility of boys with fragile X syndrome with autism spectrum disorder (FXS-ASD), fragile X syndrome only (FXS-O), Down syndrome (DS), and typically developing (TD) boys. Method: Participants were 32 boys with FXS-O (3-14 years), 31 with FXS-ASD (5-15 years), 34 with DS (4-16 years),…
Chen, Jing; Baer, Thomas; Moore, Brian C J
Most information in speech is carried in spectral changes over time, rather than in static spectral shape per se. A form of signal processing aimed at enhancing spectral changes over time was developed and evaluated using hearing-impaired listeners. The signal processing was based on the overlap-add method, and the degree and type of enhancement could be manipulated via four parameters. Two experiments were conducted to assess speech intelligibility and clarity preferences. Three sets of parameter values (one corresponding to a control condition), two types of masker (steady speech-spectrum noise and two-talker speech) and two signal-to-masker ratios (SMRs) were used for each masker type. Generally, the effects of the processing were small, although intelligibility was improved by about 8 percentage points relative to the control condition for one set of parameter values using the steady noise masker at -6 dB SMR. The processed signals were not preferred over those for the control condition, except for the steady noise masker at -6 dB SMR. Further work is needed to determine whether tailoring the processing to the characteristics of the individual hearing-impaired listener is beneficial.
The MTF-STI method, a physical method for measuring the quality of speech transmission, was investigated in a tunnel; the STI, which can be deduced from the modulation transfer function (MTF), correlates highly with the sound articulation score. The character of the information loss represented by the MTF and the system used to calculate the MTF are considered. In this system, the effect of reverberation on the MTF is calculated from the impulse response in the tunnel, and the effect of noise is considered separately from that of reverberation. The MTF is converted to the STI (Speech Transmission Index), which corresponds directly to speech intelligibility. Essentially, the STI represents an extension of the Articulation Index (AI) concept; therefore, the values of the parameters used in the STI calculation are determined from the parameters of the AI for Japanese. The resulting STI correlates highly with -log(1-s), where s is the sound articulation score. The data suggest that the STI may serve as a convenient predictor of speech intelligibility in a tunnel.
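The MTF-to-STI conversion described above can be sketched roughly as follows. This is a minimal illustration, not the authors' system: it assumes an ideal exponential reverberant decay (so the MTF follows the standard closed form in terms of reverberation time, rather than being computed from a measured tunnel impulse response), a single frequency band, and equal weighting of the modulation frequencies. The function names and the parameters `rt60` and `snr_db` are hypothetical, and the AI-derived parameters for Japanese mentioned in the abstract are not reproduced.

```python
import math

# 14 modulation frequencies (Hz), one-third-octave spacing from 0.63 to 12.5 Hz
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
             3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf(f_mod, rt60, snr_db):
    """Modulation reduction factor for reverberation plus stationary noise.

    Assumes an exponential decay with reverberation time rt60 (seconds);
    the two effects are computed separately and multiplied.
    """
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def sti_from_mtf(m_values):
    """Convert modulation factors to an unweighted, single-band STI."""
    indices = []
    for m in m_values:
        m = min(max(m, 1e-6), 1.0 - 1e-6)            # avoid log of 0 or division by 0
        snr_app = 10.0 * math.log10(m / (1.0 - m))    # apparent signal-to-noise ratio
        snr_app = min(max(snr_app, -15.0), 15.0)      # clip to the +/-15 dB range
        indices.append((snr_app + 15.0) / 30.0)       # map to a 0..1 transmission index
    return sum(indices) / len(indices)


def sti(rt60, snr_db):
    """STI for a given reverberation time and signal-to-noise ratio."""
    return sti_from_mtf([mtf(f, rt60, snr_db) for f in MOD_FREQS])
```

In this sketch, longer reverberation or lower SNR reduces each modulation factor and therefore the index; a full implementation would repeat the calculation per octave band and apply band weights.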
Åström, Mattias; Tripoliti, Elina; Hariz, Marwan I.; Zrinzo, Ludvic U.; Martinez-Torres, Irene; Limousin, Patricia; Wårdell, Karin
Background/Aims: Deep brain stimulation (DBS) is widely used to treat motor symptoms in patients with advanced Parkinson's disease. The aim of this study was to investigate the anatomical aspects of the electric field in relation to effects on speech and movement during DBS in the subthalamic nucleus. Methods: Patient-specific finite element models of DBS were developed for simulation of the electric field in 10 patients. In each patient, speech intelligibility and movement were assessed during 2 electrical settings, i.e. 4 V (high) and 2 V (low). The electric field was simulated for each electrical setting. Results: Movement was improved in all patients for both high and low electrical settings. In general, high-amplitude stimulation was more consistent in improving the motor scores than low-amplitude stimulation. In 6 cases, speech intelligibility was impaired during high-amplitude electrical settings. Stimulation of part of the fasciculus cerebellothalamicus from electrodes positioned medial and/or posterior to the center of the subthalamic nucleus was recognized as a possible cause of the stimulation-induced dysarthria. Conclusion: Special attention to stimulation-induced speech impairments should be taken in cases when active electrodes are positioned medial and/or posterior to the center of the subthalamic nucleus. PMID:20460952
Hwang, Chung-Feng; Ko, Hui-Chen; Tsou, Yung-Ting; Chan, Kai-Chieh; Fang, Hsuan-Yeh; Wu, Che-Ming
Objectives. We evaluated the causes, hearing, and speech performance before and after cochlear implant reimplantation in Mandarin-speaking users. Methods. In total, 589 patients who underwent cochlear implantation in our medical center between 1999 and 2014 were reviewed retrospectively. Data related to demographics, etiologies, implant-related information, complications, and hearing and speech performance were collected. Results. In total, 22 (3.74%) cases were found to have major complications. Infection (n = 12) and hard failure of the device (n = 8) were the most common major complications. Among them, 13 were reimplanted in our hospital. The mean scores of the Categorical Auditory Performance (CAP) and the Speech Intelligibility Rating (SIR) obtained before and after reimplantation were 5.5 versus 5.8 and 3.7 versus 4.3, respectively. The SIR score after reimplantation was significantly better than preoperation. Conclusions. Cochlear implantation is a safe procedure with low rates of postsurgical revisions and device failures. The Mandarin-speaking patients in this study who received reimplantation had restored auditory performance and speech intelligibility after surgery. Device soft failure was rare in our series, calling attention to Mandarin-speaking CI users requiring revision of their implants due to undesirable symptoms or decreasing performance of uncertain cause. PMID:27413753
Liu, Chang; Jin, Su-Hyun
This study examined intelligibility of twelve American English vowels produced by English, Chinese, and Korean native speakers in quiet and speech-shaped noise in which vowels were presented at six sensation levels from 0 dB to 10 dB. The slopes of vowel intelligibility functions and the processing time for listeners to identify vowels were…
Rhebergen, Koenraad S; Versfeld, Niek J
The SII model in its present form (ANSI S3.5-1997, American National Standards Institute, New York) can accurately describe intelligibility for speech in stationary noise but fails to do so for nonstationary noise maskers. Here, an extension to the SII model is proposed with the aim to predict the speech intelligibility in both stationary and fluctuating noise. The basic principle of the present approach is that both speech and noise signal are partitioned into small time frames. Within each time frame the conventional SII is determined, yielding the speech information available to the listener at that time frame. Next, the SII values of these time frames are averaged, resulting in the SII for that particular condition. Using speech reception threshold (SRT) data from the literature, the extension to the present SII model can give a good account for SRTs in stationary noise, fluctuating speech noise, interrupted noise, and multiple-talker noise. The predictions for sinusoidally intensity modulated (SIM) noise and real speech or speech-like maskers are better than with the original SII model, but are still not accurate. For the latter type of maskers, informational masking may play a role.
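The frame-wise averaging idea in this abstract can be sketched as below. This is a heavily simplified stand-in for the conventional SII, not the ANSI S3.5-1997 procedure: it keeps only the band audibility mapping (SNR clipped to the range -15 to +15 dB and scaled to 0..1) and omits masking spread, level distortion and hearing thresholds. The band-importance weights, the band levels and all function names are hypothetical.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to an audibility value between 0 and 1."""
    return min(max((snr_db + 15.0) / 30.0, 0.0), 1.0)


def simple_sii(speech_db, noise_db, weights):
    """Simplified SII for one time frame: importance-weighted band audibility."""
    return sum(w * band_audibility(s - n)
               for s, n, w in zip(speech_db, noise_db, weights))


def frame_averaged_sii(speech_frames, noise_frames, weights):
    """Compute the simplified SII in each time frame, then average over frames."""
    values = [simple_sii(s, n, weights)
              for s, n in zip(speech_frames, noise_frames)]
    return sum(values) / len(values)


# Hypothetical three-band example with equal band importance.
weights = [1.0 / 3.0] * 3
speech = [60.0, 60.0, 60.0]  # speech band levels in dB, same in every frame

# Stationary noise at the speech level in every frame.
stationary = frame_averaged_sii([speech] * 4, [[60.0] * 3] * 4, weights)

# Fluctuating noise: loud frames (63 dB) alternating with gaps (30 dB),
# chosen so that the long-term noise power is close to the stationary case.
fluctuating = frame_averaged_sii(
    [speech] * 4, [[63.0] * 3, [30.0] * 3] * 2, weights)
```

With these made-up levels the two noises have nearly the same long-term spectrum, so a spectrum-based index would treat them alike, while the frame-averaged value is higher for the fluctuating noise because the gaps contribute frames with full audibility.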
Bishop, J; Bahr, R H; Gelfer, M P
It is common knowledge among field personnel that poor speech intelligibility can occur when chemical-biological warfare (CBW) masks are worn: indeed, many users resort to hand signals for person-to-person communicative purposes. This study was conducted in an effort to generate basic information about the problem; its focus was on the assessment of, and comparisons among, the communicative efficiency of seven different CBW units. Near-field word intelligibility was assessed by use of rhyming minimal contrast tests; user and acoustic restrictions were studied by means of diadochokinetic tests and system frequency response. The near-field word intelligibility of six American-designed masks varied somewhat, but overall it was reasonably good; however, a Russian unit did not perform well. Second, three of the U.S. masks were found to produce less physiological restraint than the others, and the Soviet mask produced the greatest physiological restraint. Finally, a few of the CBW masks also exhibited very low levels of acoustic distortion. Accordingly, it was concluded that two of the several configurations studied exhibited superior features. Other factors being equal, they can be recommended for field use and as a basis for the development of future generations of CBW masks. However, it also should be noted that although these devices provided reasonably good speech intelligibility when the listener was close to the talker, they do not appear to do so even at minimal distances.
Smiljanić, Rajka; Bradlow, Ann R.
This study investigated how native language background interacts with speaking style adaptations in determining levels of speech intelligibility. The aim was to explore whether native and high proficiency non-native listeners benefit similarly from native and non-native clear speech adjustments. The sentence-in-noise perception results revealed that fluent non-native listeners gained a large clear speech benefit from native clear speech modifications. Furthermore, proficient non-native talkers in this study implemented conversational-to-clear speaking style modifications in their second language (L2) that resulted in significant intelligibility gain for both native and non-native listeners. The results of the accentedness ratings obtained for native and non-native conversational and clear speech sentences showed that while intelligibility was improved, the presence of foreign accent remained constant in both speaking styles. This suggests that objective intelligibility and subjective accentedness are two independent dimensions of non-native speech. Overall, these results provide strong evidence that greater experience in L2 processing leads to improved intelligibility in both production and perception domains. These results also demonstrated that speaking style adaptations along with less signal distortion can contribute significantly towards successful native and non-native interactions. PMID:22225056
Rhebergen, Koenraad S.; Versfeld, Niek J.
The speech intelligibility index (SII) is frequently used to predict the speech intelligibility for speech in a given interfering noise. However, the SII model only has been validated for speech in stationary noise. Since the SII departs from speech and noise spectra, it does not take into account any fluctuations in the masking noise. Hence, the model will yield similar SII values, regardless of the degree of fluctuation. In contrast, from the literature it is clear that normal-hearing listeners can benefit from the fluctuations in the noise. The present paper describes an SII-based approach to model speech reception thresholds (SRTs) for speech in both stationary and fluctuating noise. The basic principle of this approach is that both speech and noise signals are partitioned into small time frames. Within each time frame, the conventional SII is determined, yielding the speech information available to the listener at that time frame. Next, the SII values of these time frames are averaged, resulting in the SII for that particular condition. With the aid of SRT data from the literature, it will be shown that this approach can give a good account for most existing data.
Hu, Yi; Loizou, Philipos C
Attempts to develop noise-suppression algorithms that can significantly improve speech intelligibility in noise by cochlear implant (CI) users have met with limited success. This is partly because algorithms were sought that would work equally well in all listening situations. Accomplishing this has been quite challenging given the variability in the temporal/spectral characteristics of real-world maskers. A different approach is taken in the present study focused on the development of environment-specific noise suppression algorithms. The proposed algorithm selects a subset of the envelope amplitudes for stimulation based on the signal-to-noise ratio (SNR) of each channel. Binary classifiers, trained using data collected from a particular noisy environment, are first used to classify the mixture envelopes of each channel as either target-dominated (SNR ≥ 0 dB) or masker-dominated (SNR < 0 dB). Only target-dominated channels are subsequently selected for stimulation. Results with CI listeners indicated substantial improvements (by nearly 44 percentage points at 5 dB SNR) in intelligibility with the proposed algorithm when tested with sentences embedded in three real-world maskers. The present study demonstrated that the environment-specific approach to noise reduction has the potential to restore speech intelligibility in noise to a level near to that attained in quiet.
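The channel-selection rule can be illustrated with a small sketch. It assumes the per-channel SNR is already known (an oracle), which stands in for the trained binary classifiers of the study; the function name, the threshold parameter and the example values are hypothetical.

```python
def select_channels(envelopes, snr_db, threshold_db=0.0):
    """Keep envelope amplitudes of target-dominated channels, zero the rest.

    A channel counts as target-dominated when its SNR is at or above
    threshold_db (0 dB here), and as masker-dominated otherwise.
    """
    return [env if snr >= threshold_db else 0.0
            for env, snr in zip(envelopes, snr_db)]


# One stimulation frame with three channels (made-up values).
frame_envelopes = [0.5, 0.2, 0.9]
frame_snr_db = [3.0, -2.0, 0.0]
selected = select_channels(frame_envelopes, frame_snr_db)
# selected == [0.5, 0.0, 0.9]
```

In a real system the SNR is not available at run time, which is why the abstract describes classifiers trained on data from a particular noisy environment to make the target/masker decision.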
Yousefian Jazi, Nima
Spatial filtering and directional discrimination have been shown to be an effective pre-processing approach for noise reduction in microphone array systems. In dual-microphone hearing aids, fixed and adaptive beamforming techniques are the most common solutions for enhancing the desired speech and rejecting unwanted signals captured by the microphones. In fact, beamformers are widely utilized in systems where the spatial properties of the target source (usually in front of the listener) are assumed to be known. In this dissertation, some dual-microphone coherence-based speech enhancement techniques applicable to hearing aids are proposed. All proposed algorithms operate in the frequency domain and (like traditional beamforming techniques) are purely based on the spatial properties of the desired speech source and do not require any knowledge of noise statistics for calculating the noise reduction filter. This benefit gives our algorithms the ability to address adverse noise conditions, such as situations where interfering talker(s) speak simultaneously with the target speaker. In such cases, the (adaptive) beamformers lose their effectiveness in suppressing interference, since the noise channel (reference) cannot be built and updated accordingly. This difference is the main advantage of the proposed techniques in the dissertation over traditional adaptive beamformers. Furthermore, since the suggested algorithms are independent of noise estimation, they offer significant improvement in scenarios where the power level of the interfering sources is much higher than that of the target speech. The dissertation also shows that the premise behind the proposed algorithms can be extended and applied to binaural hearing aids. The main purpose of the investigated techniques is to enhance the intelligibility level of speech, measured through subjective listening tests with normal hearing and cochlear implant listeners. However, the improvement in quality of the output speech achieved by the
Houtgast, T.; Steeneken, H. J. M.
The physical measure Rapid Speech Transmission Index (RASTI) was developed to assess speech intelligibility in auditoria. In order to evaluate this method, a set of 14 auditorium conditions (plus 2 replicas) with various degrees of reverberation and/or interfering noise were subjected to: (1) RASTI measurements; (2) articulation tests performed by laboratories in 11 different countries; and (3) additional quality rating experiment by 4 of these laboratories. The various listening experiments show substantial differences in the ranking of the 14 conditions. For instance, it appears that the absence of a carrier phrase in some of the articulation tests has great influence on the relative importance of reverberation as compared to noise interference. When considering only the tests which use an appropriate carrier phrase (7 countries), it is found that the RASTI values are in good agreement with the mean results of these articulation tests.
...two or more) speakers. These readings were recorded on audiotape and then digitized at 10 kHz, 16 bits/sample. The ... input test data was generated ... signal. The result that the Sl output is intelligible is expected because exciting the desired speech envelope with only random noise is known to ... 1015, 1982. "Methods for the Calculation of the Articulation Index", S3.5, 1969. C.I. Berlin & M.R. McNeil, "Dichotic Listening", in Issues in ...
Chen, Fei; Loizou, Philipos C.
Recent evidence suggests that spectral change, as measured by cochlea-scaled entropy (CSE), predicts speech intelligibility better than the information carried by vowels or consonants in sentences. Motivated by this finding, the present study investigates whether intelligibility indices implemented to include segments marked with significant spectral change better predict speech intelligibility in noise than measures that include all phonetic segments paying no attention to vowels/consonants or spectral change. The prediction of two intelligibility measures [normalized covariance measure (NCM), coherence-based speech intelligibility index (CSII)] is investigated using three sentence-segmentation methods: relative root-mean-square (RMS) levels, CSE, and traditional phonetic segmentation of obstruents and sonorants. While the CSE method makes no distinction between spectral changes occurring within vowels/consonants, the RMS-level segmentation method places more emphasis on the vowel-consonant boundaries wherein the spectral change is often most prominent, and perhaps most robust, in the presence of noise. Higher correlation with intelligibility scores was obtained when including sentence segments containing a large number of consonant-vowel boundaries than when including segments with highest entropy or segments based on obstruent/sonorant classification. These data suggest that in the context of intelligibility measures the type of spectral change captured by the measure is important. PMID:22559382
Chan, Jeffrey W.; Simpson, Carol A.
Active Noise Reduction (ANR) is a new technology which can reduce the level of aircraft cockpit noise that reaches the pilot's ear while simultaneously improving the signal to noise ratio for voice communications and other information bearing sound signals in the cockpit. A miniature, ear-cup mounted ANR system was tested to determine whether speech intelligibility is better for helicopter pilots using ANR compared to a control condition of ANR turned off. Two signal to noise ratios (S/N), representative of actual cockpit conditions, were used for the ratio of the speech to cockpit noise sound pressure levels. Speech intelligibility was significantly better with ANR compared to no ANR for both S/N conditions. Variability of speech intelligibility among pilots was also significantly less with ANR. When the stock helmet was used with ANR turned off, the average PB Word speech intelligibility score was below the Normally Acceptable level. In comparison, it was above that level with ANR on in both S/N levels.
Beutelmann, Rainer; Brand, Thomas
Binaural speech intelligibility of individual listeners under realistic conditions was predicted using a model consisting of a gammatone filter bank, an independent equalization-cancellation (EC) process in each frequency band, a gammatone resynthesis, and the speech intelligibility index (SII). Hearing loss was simulated by adding uncorrelated masking noises (according to the pure-tone audiogram) to the ear channels. Speech intelligibility measurements were carried out with 8 normal-hearing and 15 hearing-impaired listeners, collecting speech reception threshold (SRT) data for three different room acoustic conditions (anechoic, office room, cafeteria hall) and eight directions of a single noise source (speech in front). Artificial EC processing errors derived from binaural masking level difference data using pure tones were incorporated into the model. Except for an adjustment of the SII-to-intelligibility mapping function, no model parameter was fitted to the SRT data of this study. The overall correlation coefficient between predicted and observed SRTs was 0.95. The dependence of the SRT of an individual listener on the noise direction and on room acoustics was predicted with a median correlation coefficient of 0.91. The effect of individual hearing impairment was predicted with a median correlation coefficient of 0.95. However, for mild hearing losses the release from masking was overestimated.
Brons, Inge; Houben, Rolph; Dreschler, Wouter A
This study evaluates the perceptual effects of single-microphone noise reduction in hearing aids. Twenty subjects with moderate sensorineural hearing loss listened to speech in babble noise processed via noise reduction from three different linearly fitted hearing aids. Subjects performed (a) speech-intelligibility tests, (b) listening-effort ratings, and (c) paired-comparison ratings on noise annoyance, speech naturalness, and overall preference. The perceptual effects of noise reduction differ between hearing aids. The results agree well with those of normal-hearing listeners in a previous study. None of the noise-reduction algorithms improved speech intelligibility, but all reduced the annoyance of noise. The noise reduction that scored best with respect to noise annoyance and preference had the worst intelligibility scores. The trade-off between intelligibility and listening comfort shows that preference measurements might be useful in addition to intelligibility measurements in the selection of noise reduction. Additionally, this trade-off should be taken into consideration to create realistic expectations in hearing-aid users.
Dillier, Norbert; Lai, Wai Kong
The Nucleus® 5 System Sound Processor (CP810, Cochlear™, Macquarie University, NSW, Australia) contains two omnidirectional microphones. They can be configured as a fixed directional microphone combination (called Zoom) or as an adaptive beamformer (called Beam), which adjusts the directivity continuously to maximally reduce the interfering noise. Initial evaluation studies with the CP810 had compared performance and usability of the new processor in comparison with the Freedom™ Sound Processor (Cochlear™) for speech in quiet and noise for a subset of the processing options. This study compares the two processing options suggested to be used in noisy environments, Zoom and Beam, for various sound field conditions using a standardized speech in noise matrix test (Oldenburg sentences test). Nine German-speaking subjects who previously had been using the Freedom speech processor and subsequently were upgraded to the CP810 device participated in this series of additional evaluation tests. The speech reception threshold (SRT for 50% speech intelligibility in noise) was determined using sentences presented via loudspeaker at 65 dB SPL in front of the listener and noise presented either via the same loudspeaker (S0N0) or at 90 degrees at either the ear with the sound processor (S0NCI+) or the opposite unaided ear (S0NCI-). The fourth noise condition consisted of three uncorrelated noise sources placed at 90, 180 and 270 degrees. The noise level was adjusted through an adaptive procedure to yield a signal to noise ratio where 50% of the words in the sentences were correctly understood. In spatially separated speech and noise conditions both Zoom and Beam could improve the SRT significantly. For single noise sources, either ipsilateral or contralateral to the cochlear implant sound processor, average improvements with Beam of 12.9 and 7.9 dB in SRT were found. The average SRT of -8 dB for Beam in the diffuse noise condition (uncorrelated noise from both sides and
Zekveld, Adriana A; Rudner, Mary; Johnsrude, Ingrid S; Heslenfeld, Dirk J; Rönnberg, Jerker
Text cues facilitate the perception of spoken sentences to which they are semantically related (Zekveld, Rudner, et al., 2011). In this study, semantically related and unrelated cues preceding sentences evoked more activation in middle temporal gyrus (MTG) and inferior frontal gyrus (IFG) than nonword cues, regardless of acoustic quality (speech in noise or speech in quiet). Larger verbal working memory (WM) capacity (reading span) was associated with greater intelligibility benefit obtained from related cues, with less speech-related activation in the left superior temporal gyrus and left anterior IFG, and with more activation in right medial frontal cortex for related versus unrelated cues. Better ability to comprehend masked text was associated with greater ability to disregard unrelated cues, and with more activation in left angular gyrus (AG). We conclude that individual differences in cognitive abilities are related to activation in a speech-sensitive network including left MTG, IFG and AG during cued speech perception.
Shafiro, Valeriy; Sheft, Stanley; Risley, Robert
Perception of interrupted speech and the influence of speech materials and memory load were investigated using one or two concurrent square-wave gating functions. Sentences (Experiment 1) and random one-, three-, and five-word sequences (Experiment 2) were interrupted using either a primary gating rate alone (0.5−24 Hz) or a combined primary and faster secondary rate. The secondary rate interrupted only speech left intact after primary gating, reducing the original speech to 25%. In both experiments, intelligibility increased with primary rate, but varied with memory load and speech material (highest for sentences, lowest for five-word sequences). With dual-rate gating of sentences, intelligibility with fast secondary rates was superior to that with single rates and a 25% duty cycle, approaching that of single rates with a 50% duty cycle for some low and high rates. For dual-rate gating of words, the positive effect of fast secondary gating was smaller than for sentences, and the advantage of sentences over word-sequences was not obtained in many dual-rate conditions. These findings suggest that integration of interrupted speech fragments after gating depends on the duration of the gated speech interval and that sufficiently robust acoustic-phonetic word cues are needed to access higher-level contextual sentence information. PMID:21973362
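The single-rate and dual-rate gating described here can be sketched as follows. This is an illustrative construction only: it assumes 50% duty-cycle square waves and simple multiplication of the two gates, and the sampling rate, gating rates and function names are hypothetical rather than taken from the study.

```python
def square_gate(n_samples, fs, rate_hz, duty=0.5):
    """Square-wave gate: 1 during the 'on' part of each cycle, 0 otherwise."""
    return [1 if ((n * rate_hz) / fs) % 1.0 < duty else 0
            for n in range(n_samples)]


def apply_gate(signal, gate):
    """Interrupt a signal by multiplying it sample by sample with a gate."""
    return [x * g for x, g in zip(signal, gate)]


fs = 1000                     # samples per second (hypothetical)
n = 1000                      # one second of signal
signal = [1.0] * n            # placeholder for a speech waveform

primary = square_gate(n, fs, rate_hz=2.0)      # slow primary rate
secondary = square_gate(n, fs, rate_hz=20.0)   # faster secondary rate

# The secondary gate only matters where the primary gate is open, so the
# combined gate is the product of the two.
dual = [p * s for p, s in zip(primary, secondary)]

single_gated = apply_gate(signal, primary)     # 50% of the signal remains
dual_gated = apply_gate(signal, dual)          # 25% of the signal remains
```

With these rates the primary gate keeps half of the samples and the secondary gate keeps half of what is left, which matches the reduction to 25% of the original speech mentioned in the abstract.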
Nogueira, Waldo; Rode, Thilo; Büchner, Andreas
Spectral smearing is at least partly responsible for cochlear implant (CI) users requiring a higher signal-to-noise ratio than normal-hearing listeners to obtain the same speech intelligibility. A spectral contrast enhancement (SCE) algorithm has been designed and evaluated as an additional feature for a standard CI strategy. The algorithm keeps the most prominent peaks within a speech signal constant while attenuating valleys in the spectrum. The goal is to partly compensate for the spectral smearing produced by the limited number of stimulation electrodes and the overlap of electrical fields produced in CIs. Twelve CI users were tested for their speech reception threshold (SRT) using the standard CI coding strategy with and without SCE. No significant differences in SRT were observed between conditions. However, an analysis of the electrical stimulation patterns shows a reduction in stimulation current when using SCE. In a second evaluation, 12 CI users were tested in a similar configuration of the SCE strategy with the stimulation being balanced between the SCE and the non-SCE variants such that the loudness perception delivered by the strategies was the same. Results show a significant improvement in SRT of 0.57 dB (p < 0.0005) for the SCE algorithm.
Hakonen, Maria; May, Patrick J C; Alho, Jussi; Alku, Paavo; Jokinen, Emma; Jääskeläinen, Iiro P; Tiitinen, Hannu
Recent studies have shown that acoustically distorted sentences can be perceived as either unintelligible or intelligible depending on whether one has previously been exposed to the undistorted, intelligible versions of the sentences. This allows studying processes specifically related to speech intelligibility since any change between the responses to the distorted stimuli before and after the presentation of their undistorted counterparts cannot be attributed to acoustic variability but, rather, to the successful mapping of sensory information onto memory representations. To estimate how the complexity of the message is reflected in speech comprehension, we applied this rapid change in perception to behavioral and magnetoencephalography (MEG) experiments using vowels, words and sentences. In the experiments, stimuli were initially presented to the subject in a distorted form, after which undistorted versions of the stimuli were presented. Finally, the original distorted stimuli were presented once more. The resulting increase in intelligibility observed for the second presentation of the distorted stimuli depended on the complexity of the stimulus: vowels remained unintelligible (behaviorally measured intelligibility 27%) whereas the intelligibility of the words increased from 19% to 45% and that of the sentences from 31% to 65%. This increase in the intelligibility of the degraded stimuli was reflected as an enhancement of activity in the auditory cortex and surrounding areas at early latencies of 130-160 ms. In the same regions, increasing stimulus complexity attenuated mean currents at latencies of 130-160 ms whereas at latencies of 200-270 ms the mean currents increased. These modulations in cortical activity may reflect feedback from top-down mechanisms enhancing the extraction of information from speech. The behavioral results suggest that memory-driven expectancies can have a significant effect on speech comprehension, especially in acoustically adverse conditions.
Tjaden, Kris; Kain, Alexander; Lam, Jennifer
Purpose: A speech analysis-resynthesis paradigm was used to investigate segmental and suprasegmental acoustic variables explaining intelligibility variation for 2 speakers with Parkinson's disease (PD). Method: Sentences were read in conversational and clear styles. Acoustic characteristics from clear sentences were extracted and applied to…
Reinhart, Paul N.; Souza, Pamela E.
Purpose: The purpose of this study was to examine the effects of varying wide dynamic range compression (WDRC) release time on intelligibility and clarity of reverberant speech. The study also considered the role of individual working memory. Method: Thirty older listeners with mild to moderately-severe sloping sensorineural hearing loss…
Van Esch, T E M; Dreschler, W A
The aim of the present study was to determine the relations between the intelligibility of speech in noise and measures of auditory resolution, loudness recruitment, and cognitive function. The analyses were based on data published earlier as part of the presentation of the Auditory Profile, a test battery implemented in four languages. Tests of the intelligibility of speech, resolution, loudness recruitment, and lexical decision making were measured using headphones in five centers: in Germany, the Netherlands, Sweden, and the United Kingdom. Correlations and stepwise linear regression models were calculated. In sum, 72 hearing-impaired listeners aged 22 to 91 years with a broad range of hearing losses were included in the study. Several significant correlations were found with the intelligibility of speech in noise. Stepwise linear regression analyses showed that pure-tone average, age, spectral and temporal resolution, and loudness recruitment were significant predictors of the intelligibility of speech in fluctuating noise. Complex interrelationships between auditory factors and the intelligibility of speech in noise were revealed using the Auditory Profile data set in four languages. After taking into account the effects of pure-tone average and age, spectral and temporal resolution and loudness recruitment had an added value in the prediction of variation among listeners with respect to the intelligibility of speech in noise. The results of the lexical decision making test were not related to the intelligibility of speech in noise, in the population studied.
Lopez-Poveda, Enrique A; Eustaquio-Martín, Almudena; Stohl, Joshua S; Wolford, Robert D; Schatzer, Reinhold; Gorospe, José M; Ruiz, Santiago Santa Cruz; Benito, Fernando; Wilson, Blake S
We have recently proposed a binaural cochlear implant (CI) sound processing strategy inspired by the contralateral medial olivocochlear reflex (the MOC strategy) and shown that it improves intelligibility in steady-state noise (Lopez-Poveda et al., 2016, Ear Hear 37:e138-e148). The aim here was to evaluate possible speech-reception benefits of the MOC strategy for speech maskers, a more natural type of interferer. Speech reception thresholds (SRTs) were measured in six bilateral and two single-sided deaf CI users with the MOC strategy and with a standard (STD) strategy. SRTs were measured in unilateral and bilateral listening conditions, and for target and masker stimuli located at azimuthal angles of (0°, 0°), (-15°, +15°), and (-90°, +90°). Mean SRTs were 2-5 dB better with the MOC than with the STD strategy for spatially separated target and masker sources. For bilateral CI users, the MOC strategy (1) facilitated the intelligibility of speech in competition with spatially separated speech maskers in both unilateral and bilateral listening conditions; and (2) led to an overall improvement in spatial release from masking in the two listening conditions. Insofar as speech is a more natural type of interferer than steady-state noise, the present results suggest that the MOC strategy holds potential for promising outcomes for CI users.
Khing, Phyu P.; Swanson, Brett A.; Ambikairajah, Eliathamby
Nucleus cochlear implant systems incorporate a fast-acting front-end automatic gain control (AGC), sometimes called a compression limiter. The objective of the present study was to determine the effect of replacing the front-end compression limiter with a newly proposed envelope profile limiter. A secondary objective was to investigate the effect of AGC speed on cochlear implant speech intelligibility. The envelope profile limiter was located after the filter bank and reduced the gain when the largest of the filter bank envelopes exceeded the compression threshold. The compression threshold was set equal to the saturation level of the loudness growth function (i.e. the envelope level that mapped to the maximum comfortable current level), ensuring that no envelope clipping occurred. To preserve the spectral profile, the same gain was applied to all channels. Experiment 1 compared sentence recognition with the front-end limiter and with the envelope profile limiter, each with two release times (75 and 625 ms). Six implant recipients were tested in quiet and in four-talker babble noise, at a high presentation level of 89 dB SPL. Overall, release time had a larger effect than the AGC type. With both AGC types, speech intelligibility was lower for the 75 ms release time than for the 625 ms release time. With the shorter release time, the envelope profile limiter provided higher group mean scores than the front-end limiter in quiet, but there was no significant difference in noise. Experiment 2 measured sentence recognition in noise as a function of presentation level, from 55 to 89 dB SPL. The envelope profile limiter with 625 ms release time yielded better scores than the front-end limiter with 75 ms release time. A take-home study showed no clear pattern of preferences. It is concluded that the envelope profile limiter is a feasible alternative to a front-end compression limiter. PMID:24312408
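The envelope profile limiter described above lends itself to a short sketch. The code below is a simplified illustration under stated assumptions, not the Nucleus implementation: the function name, the frame-based processing, the instantaneous attack, and the one-pole release controlled by `release_coeff` are all hypothetical choices. It does follow the two properties stated in the abstract: gain is reduced when the largest filter bank envelope exceeds the compression threshold, and the same gain is applied to every channel.

```python
def envelope_profile_limiter(frames, threshold, release_coeff=0.0):
    """Limit a sequence of filter bank envelope frames.

    frames        : list of frames, each a list of per-channel envelope values
    threshold     : compression threshold (assumed equal to the saturation level)
    release_coeff : 0.0 gives instant release; values near 1.0 give slow release
                    (hypothetical smoothing parameter)
    """
    gain = 1.0
    limited = []
    for env in frames:
        peak = max(max(env), 1e-12)            # largest envelope in this frame
        target = min(1.0, threshold / peak)    # gain needed to avoid clipping
        if target < gain:
            gain = target                      # attack: drop gain immediately
        else:
            # release: move gradually back toward the target gain
            gain = target + release_coeff * (gain - target)
        # one common gain for all channels preserves the spectral profile
        limited.append([e * gain for e in env])
    return limited


frames = [
    [0.2, 0.5, 0.1],   # below threshold: passed unchanged
    [0.4, 2.0, 1.0],   # peak of 2.0 exceeds the threshold: gain drops to 0.5
    [0.2, 0.5, 0.1],   # quiet again: gain recovers only partway
]
out = envelope_profile_limiter(frames, threshold=1.0, release_coeff=0.5)
for frame in out:
    print(frame)
```

Because the release branch never raises the gain above the value needed for the current frame, no output envelope exceeds the threshold, and the ratio between channels within a frame is unchanged.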
Liu, Huei-Mei; Tsao, Feng-Ming; Kuhl, Patricia K.
The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ using their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correct vowel and word identification were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas compared to ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy compose a smaller acoustic space that results in shrunken intervowel perceptual distances for listeners.
Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power allowed best for above-chance classification of word intelligibility. Results suggest that both magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354
Jenstad, Lorienne M; Souza, Pamela E
Compression hearing aids have the inherent, and often adjustable, feature of release time from compression. Research to date does not provide a consensus on how to choose or set release time. The current study had 2 purposes: (a) a comprehensive evaluation of the acoustic effects of release time for a single-channel compression system in quiet and (b) an evaluation of the relation between the acoustic changes and speech recognition. The release times under study were 12, 100, and 800 ms. All of the stimuli were VC syllables from the Nonsense Syllable Task spoken by a female talker. The stimuli were processed through a hearing aid simulator at 3 input levels. Two acoustic measures were made on individual syllables: the envelope-difference index and CV ratio. These measurements allowed for quantification of the short-term amplitude characteristics of the speech signal and the changes to these amplitude characteristics caused by compression. The acoustic analyses revealed statistically significant effects among the 3 release times. The size of the effect was dependent on characteristics of the phoneme. Twelve listeners with moderate sensorineural hearing loss were tested for their speech recognition for the same stimuli. Although release time for this single-channel, 3:1 compression ratio system did not directly predict overall intelligibility for these nonsense syllables in quiet, the acoustic measurements reflecting the changes due to release time were significant predictors of phoneme recognition. Increased temporal-envelope distortion was predictive of reduced recognition for some individual phonemes, which is consistent with previous research on the importance of relative amplitude as a cue to syllable recognition for some phonemes.
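As a rough illustration of the envelope-difference index (EDI) mentioned above, the sketch below compares two amplitude envelopes. The abstract does not give the formula, so the definition used here is an assumption: each envelope is scaled to a mean of 1, and the index is the mean absolute difference divided by 2, giving 0 for identical envelope shapes and 1 for completely non-overlapping ones. The function name and the toy envelopes are hypothetical.

```python
def envelope_difference_index(env_a, env_b):
    """Assumed EDI definition: mean absolute difference between two
    mean-normalized envelopes, divided by 2 (range 0 to 1)."""
    if len(env_a) != len(env_b) or not env_a:
        raise ValueError("envelopes must be non-empty and of equal length")
    mean_a = sum(env_a) / len(env_a)
    mean_b = sum(env_b) / len(env_b)
    # scale each envelope so that its mean amplitude is 1
    norm_a = [x / mean_a for x in env_a]
    norm_b = [x / mean_b for x in env_b]
    total_diff = sum(abs(a - b) for a, b in zip(norm_a, norm_b))
    return total_diff / (2 * len(env_a))


# Toy envelopes: an unprocessed syllable and a "compressed" version
# in which the intense segment is reduced relative to the weak one.
unprocessed = [0.1, 0.1, 1.0, 1.0]
compressed = [0.3, 0.3, 0.8, 0.8]

edi_same = envelope_difference_index(unprocessed, unprocessed)
edi_scaled = envelope_difference_index(unprocessed, [2 * x for x in unprocessed])
edi_compressed = envelope_difference_index(unprocessed, compressed)
print(edi_same, edi_scaled, edi_compressed)
```

Under this definition a pure level change leaves the index at 0, so only changes in envelope shape, such as those introduced by short release times, contribute to it.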
Kim, Gibak; Loizou, Philipos C
Most noise-reduction algorithms used in hearing aids apply a gain to the noisy envelopes to reduce noise interference. The present study assesses the impact of two types of speech distortion introduced by noise-suppressive gain functions: amplification distortion occurring when the amplitude of the target signal is over-estimated, and attenuation distortion occurring when the target amplitude is under-estimated. Sentences corrupted by steady noise and competing talker were processed through a noise-reduction algorithm and synthesized to contain either amplification distortion, attenuation distortion or both. The attenuation distortion was found to have a minimal effect on speech intelligibility. In fact, substantial improvements (>80 percentage points) in intelligibility, relative to noise-corrupted speech, were obtained when the processed sentences contained only attenuation distortion. When the amplification distortion was limited to be smaller than 6 dB, performance was nearly unaffected in the steady-noise conditions, but was severely degraded in the competing-talker conditions. Overall, the present data suggest that one reason that existing algorithms do not improve speech intelligibility is because they allow amplification distortions in excess of 6 dB. These distortions are shown in this study to be always associated with masker-dominated envelopes and should thus be eliminated.
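The two distortion types can be made concrete with a small sketch that labels a single envelope value by comparing the processed magnitude with the true target magnitude. This is an illustrative reading of the abstract rather than the authors' code: the function names and labels are hypothetical, and the comparison presumes access to the clean target magnitude, which is available in a controlled experiment but not in a deployed hearing aid.

```python
import math


def distortion_db(estimated, target):
    """Level of the processed magnitude relative to the true target, in dB.
    Positive values mean the target was over-estimated (amplified)."""
    return 20.0 * math.log10(estimated / target)


def classify_distortion(estimated, target, limit_db=6.0):
    """Label one envelope value by distortion type (hypothetical labels).
    limit_db is the 6 dB bound on amplification distortion from the abstract."""
    d = distortion_db(estimated, target)
    if d < 0.0:
        return "attenuation"
    if d == 0.0:
        return "none"
    if d < limit_db:
        return "amplification below limit"
    return "amplification at or above limit"


# (estimated magnitude, true target magnitude) pairs, chosen for illustration
examples = [(1.0, 2.0), (1.0, 1.0), (1.5, 1.0), (3.0, 1.0)]
labels = [classify_distortion(est, tgt) for est, tgt in examples]
for (est, tgt), label in zip(examples, labels):
    print(f"{distortion_db(est, tgt):+.2f} dB -> {label}")
```

A 6 dB limit corresponds to an estimated magnitude of roughly twice the target, so the last example (three times the target) falls into the category that the study associates with masker-dominated envelopes.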
The physical characteristics of speech, the methods of speech masking measurement, and the effects of noise on speech communication are investigated. Topics include the speech signal and intelligibility, the effects of noise on intelligibility, the articulation index, and various devices for evaluating speech systems.
Zamaninezhad, Ladan; Hohmann, Volker; Büchner, Andreas; Schädler, Marc René; Jürgens, Tim
This study introduces a speech intelligibility model for cochlear implant users with ipsilateral preserved acoustic hearing that aims at simulating the observed speech-in-noise intelligibility benefit when receiving simultaneous electric and acoustic stimulation (EA-benefit). The model simulates the auditory nerve spiking in response to electric and/or acoustic stimulation. The temporally and spatially integrated spiking patterns were used as the final internal representation of noisy speech. Speech reception thresholds (SRTs) in stationary noise were predicted for a sentence test using an automatic speech recognition framework. The model was employed to systematically investigate the effect of three physiologically relevant model factors on simulated SRTs: (1) the spatial spread of the electric field which co-varies with the number of electrically stimulated auditory nerves, (2) the "internal" noise simulating the deprivation of auditory system, and (3) the upper bound frequency limit of acoustic hearing. The model results show that the simulated SRTs increase monotonically with increasing spatial spread for fixed internal noise, and also increase with increasing the internal noise strength for a fixed spatial spread. The predicted EA-benefit does not follow such a systematic trend and depends on the specific combination of the model parameters. Beyond 300 Hz, the upper bound limit for preserved acoustic hearing is less influential on speech intelligibility of EA-listeners in stationary noise. The proposed model-predicted EA-benefits are within the range of EA-benefits shown by 18 out of 21 actual cochlear implant listeners with preserved acoustic hearing.
MacPherson, Alexandra; Akeroyd, Michael A
Although many studies have looked at the effects of different listening conditions on the intelligibility of speech, their analyses have often concentrated on changes to a single value on the psychometric function, namely, the threshold. Far less commonly has the slope of the psychometric function, that is, the rate at which intelligibility changes with level, been considered. The slope of the function is crucial because it is the slope, rather than the threshold, that determines the improvement in intelligibility caused by any given improvement in signal-to-noise ratio by, for instance, a hearing aid. The aim of the current study was to systematically survey and reanalyze the psychometric function data available in the literature in an attempt to quantify the range of slope changes across studies and to identify listening conditions that affect the slope of the psychometric function. The data for 885 individual psychometric functions, taken from 139 different studies, were fitted with a common logistic equation from which the slope was calculated. Large variations in slope across studies were found, with slope values ranging from as shallow as 1% per dB to as steep as 44% per dB (median = 6.6% per dB), suggesting that the perceptual benefit offered by an improvement in signal-to-noise ratio depends greatly on listening environment. The type and number of maskers used were found to be major factors on the value of the slope of the psychometric function while other minor effects of target predictability, target corpus, and target/masker similarity were also found. PMID:24906905
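To illustrate why the slope matters, the sketch below evaluates a logistic psychometric function parameterized by its threshold and its slope at the midpoint. The abstract only says that "a common logistic equation" was fitted, so the specific parameterization, the function names, and the example thresholds are assumptions. For a logistic curve running from 0% to 100%, the slope at the 50% point equals one quarter of the rate constant, which is how the slope in % per dB is converted below.

```python
import math


def percent_correct(snr_db, srt_db, slope_pct_per_db):
    """Logistic psychometric function (assumed form).

    srt_db           : SNR at which intelligibility is 50%
    slope_pct_per_db : slope at the 50% point, in percentage points per dB
    """
    k = 4.0 * slope_pct_per_db / 100.0   # rate constant; midpoint slope = k / 4
    return 100.0 / (1.0 + math.exp(-k * (snr_db - srt_db)))


def benefit(snr_gain_db, srt_db, slope_pct_per_db):
    """Intelligibility gain (percentage points) from raising the SNR by
    snr_gain_db, starting at the threshold (hypothetical helper)."""
    before = percent_correct(srt_db, srt_db, slope_pct_per_db)
    after = percent_correct(srt_db + snr_gain_db, srt_db, slope_pct_per_db)
    return after - before


srt = -5.0  # example threshold in dB SNR, chosen arbitrarily
# Shallowest, median, and steepest slopes reported in the abstract
gains = {s: benefit(3.0, srt, s) for s in (1.0, 6.6, 44.0)}
for slope, gain in gains.items():
    print(f"slope {slope:5.1f} %/dB -> {gain:5.1f} points for a 3 dB SNR gain")
```

The same 3 dB improvement in signal-to-noise ratio therefore yields only a few percentage points on the shallowest reported function and close to the full remaining range on the steepest one.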
Adi-Bensaid, Limor; Michael, Rinat; Most, Tova; Gali-Cinamon, Rachel
This study examined the parental and spousal self-efficacy (SE) of adults who are deaf and who are hard of hearing (d/hh) in relation to their speech intelligibility. Forty individuals with hearing loss completed self-report measures: Spousal SE in a relationship with a spouse who was hearing/deaf, parental SE to a child who was hearing/deaf, and…
Khaiyat, Sami Abdulrahman
Three different methods are used to assess speech intelligibility in spaces with and without concave sound-reflecting surfaces: calculated articulation index (AI), measured rapid speech transmission index (RASTI), and modified rhyme tests (MRT) with occupants. Factors such as the room size, size of curvature, the speaker's position, and the background noise level are considered in the two on-site testing methods. The MRT results show unexpectedly significant deviation from results obtained through the other methods such that they are de-emphasized in all discussions. Results from rooms without curvatures show no significant differences between the AI and RASTI values, whereas these differences are significant when rooms with curvatures are considered. A modification factor to be subtracted from calculated AI values to account for erosional effects of the curved surfaces is developed according to further analysis of the differences between the AI and RASTI values. The magnitude of the modification factors depends on all the above factors as well as the location of the listeners within the room. There are no clear indications of any dead spots; however, the sound foci from both the 2 ft and 8 ft curvatures have caused certain group locations to have smaller modification factors than those of all other locations. The magnitude of the developed modification factors ranges from 0.01, for the 16 ft curvature in the small rooms, to 0.17, for the 8 ft curvature in the large room with NC-45 and the speaker positioned at the center. This range is of almost the same magnitude as that of the erosional corrections to calculated AI due to elevated reverberation time. This range is also of almost the same magnitude as that of the improvement in calculated AI due to the presence of visual cues.
Palmiero, Andrew J; Symons, Daniel; Morgan, Judge W; Shaffer, Ronald E
Speech Intelligibility (SI) is the perceived quality of sound transmission. In healthcare settings, the ability to communicate clearly with coworkers, patients, etc. is crucial to quality patient care and safety. The objectives of this study were to 1) assess the suitability of the Speech Transmission Index (STI) methods for testing reusable and disposable facial and respiratory personal protective equipment (protective facemasks [PF], N95 filtering facepiece respirators [N95 FFR], and elastomeric half-mask air-purifying respirators [EAPR]) commonly worn by healthcare workers, 2) quantify STI levels of these devices, and 3) contribute to the scientific body of knowledge in the area of SI. SI was assessed using the STI under two experimental conditions: 1) a modified version of the National Fire Protection Association 1981 Supplementary Voice Communications System Performance Test at a Signal to Noise Ratio (SNR) of −15 (66 dBA) and 2) STI measurements utilizing a range of modified pink noise levels (52.5 dBA (−2 SNR) − 72.5 dBA (+7 SNR)) in 5.0 dBA increments. The PF models (Kimberly Clark 49214 and 3M 1818) had the least effect on SI interference, typically deviating from the STI baseline (no-mask condition) by 3% and 4% STI, respectively. The N95FFR (3M 1870, 3M 1860) had more effect on SI interference, typically differing from baseline by 13% and 17%, respectively for models tested. The EAPR models (Scott Xcel and North 5500) had the most significant impact on SI, differing from baseline by 42% for models tested. This data offers insight into the performance of these apparatus with respect to STI and may serve as a reference point for future respirator design considerations, standards development, testing and certification activities. PMID:27362358
An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition. PMID:25751040
Vahanesa, Chetan; Reddy, Chandan K A; Panahi, Issa M S
Functional Magnetic Resonance Imaging (fMRI) is used in many diagnostic procedures for neurological related disorders. Strong broadband acoustic noise generated during fMRI scan interferes with the speech communication between the physician and the patient. In this paper, we propose a single microphone Speech Enhancement (SE) technique which is based on the supervised machine learning technique and a statistical model based SE technique. The proposed algorithm is robust and computationally efficient and has capability to run in real-time. Objective and Subjective evaluations show that the proposed SE method outperforms the existing state-of-the-art algorithms in terms of quality and intelligibility of the recovered speech at low Signal to Noise Ratios (SNRs).
Samango-Sprouse, Carole; Lawson, Patrick; Sprouse, Courtney; Stapleton, Emily; Sadeghin, Teresa; Gropman, Andrea
Kleefstra syndrome (KS) is a rare neurogenetic disorder most commonly caused by deletion in the 9q34.3 chromosomal region and is associated with intellectual disabilities, severe speech delay, and motor planning deficits. To our knowledge, this is the first patient (PQ, a 6-year-old female) with a 9q34.3 deletion who has near normal intelligence, and developmental dyspraxia with childhood apraxia of speech (CAS). At 6, the Wechsler Preschool and Primary Intelligence testing (WPPSI-III) revealed a Verbal IQ of 81 and Performance IQ of 79. The Beery Buktenica Test of Visual Motor Integration, 5th Edition (VMI) indicated severe visual motor deficits: VMI = 51; Visual Perception = 48; Motor Coordination < 45. On the Receptive One Word Picture Vocabulary Test-R (ROWPVT-R), she had standard scores of 96 and 99 in contrast to an Expressive One Word Picture Vocabulary-R (EOWPVT-R) standard scores of 73 and 82, revealing a discrepancy in vocabulary domains on both evaluations. Preschool Language Scale-4 (PLS-4) on PQ's first evaluation reveals a significant difference between auditory comprehension and expressive communication with standard scores of 78 and 57, respectively, further supporting the presence of CAS. This patient's near normal intelligence expands the phenotypic profile as well as the prognosis associated with KS. The identification of CAS in this patient provides a novel explanation for the previously reported speech delay and expressive language disorder. Further research is warranted on the impact of CAS on intelligence and behavioral outcome in KS. Therapeutic and prognostic implications are discussed.
Rashid, Marya Sheikh; Leensen, Monique C.J.; Dreschler, Wouter A.
Objective: The objective was to describe the speech intelligibility in noise test results among Dutch teenagers and young adults aged 12–24 years, using a national online speech reception threshold (SRT) test, the Earcheck. A secondary objective was to assess the effect of age and gender on speech intelligibility in noise. Design: Cross-sectional SRT data were collected over a 5-year period (2010–2014), from participants of Earcheck. Regression analyses were performed, with SRT as the dependent variable, and age and gender as explanatory variables. To cross-validate the model, data from 12- to 24-year-olds from the same test distributed by a hearing aid dispenser (Hoorscan) were used. Results: In total, 96,803 valid test results were analyzed. The mean SRT score was −18.3 dB signal-to-noise ratio (SNR) (standard deviation (SD) = 3.7). Twenty-five percent of the scores were rated as insufficient or poor. SRT performance significantly improved with increasing age for teenagers aged 12–18 years by 0.49 dB SNR per age-year. A smaller age-effect (0.09 dB SNR per age-year) was found for young adults aged 19–24 years. Small differences between male and female users were found. Conclusion: Earcheck generated large quantities of national SRT data. The data implied that a substantial number of users of Earcheck may have some difficulty in understanding speech in noise. Furthermore, the results of this study showed an effect of gender and age on SRT performance, suggesting an ongoing maturation of speech-in-noise performance into late adolescence. This suggests the use of age-dependent reference values, but for this purpose, more research is required. PMID:27991462
Purpose: Temporal interruption limits the perception of speech to isolated temporal glimpses. An analysis was conducted to determine the acoustic parameter that best predicts speech recognition from temporal fragments that preserve different types of speech information--namely, consonants and vowels. Method: Young listeners with normal hearing…
Robles-Bykbaev, Vladimir; López-Nores, Martín; Pazos-Arias, José; Quisi-Peralta, Diego; García-Duque, Jorge
Language and communication are mainstays in the development of many intellectual and cognitive skills in humans. However, millions of people around the world have disabilities and disorders related to language and communication, while most countries lack the corresponding health care and rehabilitation services. On these grounds, we are working to develop an ecosystem of intelligent ICT tools to support speech and language pathologists, doctors, students, patients and their relatives. This ecosystem has several layers and components, integrating Electronic Health Records management, standardized vocabularies, a knowledge database, an ontology of concepts from the speech-language domain, and an expert system. We discuss the advantages of such an approach through experiments carried out in several institutions assisting children with a wide spectrum of disabilities.
Völker, Christoph; Warzybok, Anna; Ernst, Stephan M A
A comprehensive evaluation of eight signal pre-processing strategies, including directional microphones, coherence filters, single-channel noise reduction, binaural beamformers, and their combinations, was undertaken with normal-hearing (NH) and hearing-impaired (HI) listeners. Speech reception thresholds (SRTs) were measured in three noise scenarios (multitalker babble, cafeteria noise, and single competing talker). Predictions of three common instrumental measures were compared with the general perceptual benefit caused by the algorithms. The individual SRTs measured without pre-processing and individual benefits were objectively estimated using the binaural speech intelligibility model. Ten listeners with NH and 12 HI listeners participated. The participants varied in age and pure-tone threshold levels. Although HI listeners required a better signal-to-noise ratio to obtain 50% intelligibility than listeners with NH, no differences in SRT benefit from the different algorithms were found between the two groups. With the exception of single-channel noise reduction, all algorithms showed an improvement in SRT of between 2.1 dB (in cafeteria noise) and 4.8 dB (in single competing talker condition). Model predictions with binaural speech intelligibility model explained 83% of the measured variance of the individual SRTs in the no pre-processing condition. Regarding the benefit from the algorithms, the instrumental measures were not able to predict the perceptual data in all tested noise conditions. The comparable benefit observed for both groups suggests a possible application of noise reduction schemes for listeners with different hearing status. Although the model can predict the individual SRTs without pre-processing, further development is necessary to predict the benefits obtained from the algorithms at an individual level.
Bernstein, Joshua G W; Grant, Ken W
Speech intelligibility for audio-alone and audiovisual (AV) sentences was estimated as a function of signal-to-noise ratio (SNR) for a female target talker presented in a stationary noise, an interfering male talker, or a speech-modulated noise background, for eight hearing-impaired (HI) and five normal-hearing (NH) listeners. At the 50% keywords-correct performance level, HI listeners showed 7-12 dB less fluctuating-masker benefit (FMB) than NH listeners, consistent with previous results. Both groups showed significantly more FMB under AV than audio-alone conditions. When compared at the same stationary-noise SNR, FMB differences between listener groups and modalities were substantially smaller, suggesting that most of the FMB differences at the 50% performance level may reflect a SNR dependence of the FMB. Still, 1-5 dB of the FMB difference between listener groups remained, indicating a possible role for reduced audibility, limited spectral or temporal resolution, or an inability to use auditory source-segregation cues, in directly limiting the ability to listen in the dips of a fluctuating masker. A modified version of the extended speech-intelligibility index that predicts a larger FMB at less favorable SNRs accounted for most of the FMB differences between listener groups and modalities. Overall, these data suggest that HI listeners retain more of an ability to listen in the dips of a fluctuating masker than previously thought. Instead, the fluctuating-masker difficulties exhibited by HI listeners may derive from the reduced FMB associated with the more favorable SNRs they require to identify a reasonable proportion of the target speech.
this study: the analog formant frequency synthesis technique. A second definition of "synthetic" speech is related to basic data sampling theory ... Analog formant frequency synthesis is a typical synthetic speech methodology, used here as an illustration of the technique. The waveform encoding and ... reconstruction technique (discussed above) is similar to a "photograph" of speech. Analog formant frequency synthesis is more like an artist’s
Mehraei, Golbarg; Gallun, Frederick J.; Leek, Marjorie R.; Bernstein, Joshua G. W.
Poor speech understanding in noise by hearing-impaired (HI) listeners is only partly explained by elevated audiometric thresholds. Suprathreshold-processing impairments such as reduced temporal or spectral resolution or temporal fine-structure (TFS) processing ability might also contribute. Although speech contains dynamic combinations of temporal and spectral modulation and TFS content, these capabilities are often treated separately. Modulation-depth detection thresholds for spectrotemporal modulation (STM) applied to octave-band noise were measured for normal-hearing and HI listeners as a function of temporal modulation rate (4–32 Hz), spectral ripple density [0.5–4 cycles/octave (c/o)] and carrier center frequency (500–4000 Hz). STM sensitivity was worse than normal for HI listeners only for a low-frequency carrier (1000 Hz) at low temporal modulation rates (4–12 Hz) and a spectral ripple density of 2 c/o, and for a high-frequency carrier (4000 Hz) at a high spectral ripple density (4 c/o). STM sensitivity for the 4-Hz, 4-c/o condition for a 4000-Hz carrier and for the 4-Hz, 2-c/o condition for a 1000-Hz carrier were correlated with speech-recognition performance in noise after partialling out the audiogram-based speech-intelligibility index. Poor speech-reception and STM-detection performance for HI listeners may be related to a combination of reduced frequency selectivity and a TFS-processing deficit limiting the ability to track spectral-peak movements. PMID:24993215
Stone, Michael A.; Moore, Brian C. J.
Using a "noise-vocoder" cochlear implant simulator [Shannon et al., Science 270, 303-304 (1995)], the effect of the speed of dynamic range compression on speech intelligibility was assessed, using normal-hearing subjects. The target speech had a level 5 dB above that of the competing speech. Initially, baseline performance was measured with no compression active, using between 4 and 16 processing channels. Then, performance was measured using a fast-acting compressor and a slow-acting compressor, each operating prior to the vocoder simulation. The fast system produced significant gain variation over syllabic timescales. The slow system produced significant gain variation only over the timescale of sentences. With no compression active, about six channels were necessary to achieve 50% correct identification of words in sentences. Sixteen channels produced near-maximum performance. Slow-acting compression produced no significant degradation relative to the baseline. However, fast-acting compression consistently reduced performance relative to that for the baseline, over a wide range of performance levels. It is suggested that fast-acting compression degrades performance for two reasons: (1) because it introduces correlated fluctuations in amplitude in different frequency bands, which tends to produce perceptual fusion of the target and background sounds and (2) because it reduces amplitude modulation depth and intensity contrasts.
Roman, Nicoleta; Woodruff, John
Ideal binary masking is a signal processing technique that separates a desired signal from a mixture by retaining only the time-frequency units where the signal-to-noise ratio (SNR) exceeds a predetermined threshold. In reverberant conditions there are multiple possible definitions of the ideal binary mask in that one may choose to treat the target early reflections as either desired signal or noise. The ideal binary mask may therefore be parameterized by the reflection boundary, a predetermined division point between early and late reflections. Another important parameter is the local SNR threshold used in labeling the time-frequency units as either target or background. Two experiments were designed to assess the impact of these two parameters on speech intelligibility with ideal binary masking for normal-hearing listeners in reverberant conditions. Experiment 1 shows that in order to achieve intelligibility improvements only the early reflections should be preserved by the binary mask. Moreover, it shows that the effective SNR should be accounted for when deciding the local threshold optimal range. Experiment 2 shows that with long reverberation times, intelligibility improvements are only obtained when the reflection boundary is 100 ms or less. Also, the experiment suggests that binary masking can be used for dereverberation.
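The masking rule described in this abstract (keep a time-frequency unit only where the local SNR exceeds a threshold) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the `lc_db` parameter name, and the toy power values are assumptions. In a reverberant setting, `target_power` would hold whatever is counted as desired signal (for example direct sound plus reflections arriving before the chosen reflection boundary), with everything else folded into `noise_power`.

```python
import numpy as np

def ideal_binary_mask(target_power, noise_power, lc_db=0.0, eps=1e-12):
    """Return 1 for time-frequency units whose local SNR (dB) exceeds lc_db, else 0.

    target_power, noise_power: arrays of per-unit power, shape (bands, frames).
    lc_db: local SNR threshold in dB (hypothetical parameter name).
    """
    target_power = np.asarray(target_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # eps guards against division by zero and log of zero in silent units.
    local_snr_db = 10.0 * np.log10((target_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(np.uint8)

# Toy 2-band x 3-frame example with made-up power values.
target = np.array([[4.0, 1.0, 0.1],
                   [0.5, 9.0, 2.0]])
noise = np.array([[1.0, 1.0, 1.0],
                  [2.0, 1.0, 2.0]])

mask_0db = ideal_binary_mask(target, noise, lc_db=0.0)
mask_m6db = ideal_binary_mask(target, noise, lc_db=-6.0)

# Applying the mask keeps only the target-dominated units of the mixture.
masked_mixture = mask_0db * (target + noise)
```

Lowering the threshold from 0 dB to -6 dB admits units where target and noise are equally strong, which shows how the choice of local threshold changes how much of the mixture is retained.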
Stiles, Derek J.; Bentler, Ruth A.; McGregor, Karla K.
Purpose: To determine whether a clinically obtainable measure of audibility, the aided Speech Intelligibility Index (SII; American National Standards Institute, 2007), is more sensitive than the pure-tone average (PTA) at predicting the lexical abilities of children who wear hearing aids (CHA). Method: School-age CHA and age-matched children with…
Törnqvist, Anna Lena; Schalén, Lucyna; Rehncrona, Stig
We evaluated the effects of different electrical parameter settings on the intelligibility of speech in patients with Parkinson's disease (PD) bilaterally treated with deep brain stimulation (DBS) in the subthalamic nucleus (STN). Ten patients treated with DBS for 15 ± 5 months (mean, SD) with significant (P < 0.01) symptom reduction (Unified Parkinson's Disease Rating Scale III) were included. In the medication off condition, video laryngostroboscopy was performed and then, in random order, 11 DBS parameter settings were tested. Amplitude was increased and decreased by 25%, frequency was varied in the range 70 to 185 pps, and each of the contacts was tested separately as a cathode. The patients read a standard running text and five nonsense sentences per setting. A listener panel transcribed the nonsense sentences as perceived and rated the quality of speech on a visual analogue scale. With the patients' normally used settings, there was no significant (P = 0.058) group difference between DBS OFF and ON, but in four patients the intelligibility deteriorated with DBS ON. The higher frequencies or increased amplitude caused significant (P < 0.02) impairments of intelligibility, whereas changing the polarity between the separate contacts did not. The settings of amplitude and frequency have a major influence on the intelligibility of speech, emphasizing the importance of meticulous parameter adjustments when programming DBS to minimize side effects related to speech.
Kloiber, Diana True; Ertmer, David J.
Purpose: Assessments of the intelligibility of speech produced by children who are deaf or hard of hearing (D/HH) provide unique insights into functional speaking ability, readiness for mainstream classroom placements, and intervention effectiveness. The development of sentence lists for a wide age range of children and the advent of handheld…
This study examined the sense of coherence and loneliness of 19 children aged 12-14 years with severe to profound hearing loss. These feelings and their interrelations with speech intelligibility (SI) were examined in 2 settings: in special classes within regular schools (group inclusion) or individuals integrated into regular classes (individual…
Johannesen, Peter T.; Pérez-González, Patricia; Kalluri, Sridhar; Blanco, José L.
The aim of this study was to assess the relative importance of cochlear mechanical dysfunction, temporal processing deficits, and age on the ability of hearing-impaired listeners to understand speech in noisy backgrounds. Sixty-eight listeners took part in the study. They were provided with linear, frequency-specific amplification to compensate for their audiometric losses, and intelligibility was assessed for speech-shaped noise (SSN) and a time-reversed two-talker masker (R2TM). Behavioral estimates of cochlear gain loss and residual compression were available from a previous study and were used as indicators of cochlear mechanical dysfunction. Temporal processing abilities were assessed using frequency modulation detection thresholds. Age, audiometric thresholds, and the difference between audiometric threshold and cochlear gain loss were also included in the analyses. Stepwise multiple linear regression models were used to assess the relative importance of the various factors for intelligibility. Results showed that (a) cochlear gain loss was unrelated to intelligibility, (b) residual cochlear compression was related to intelligibility in SSN but not in a R2TM, (c) temporal processing was strongly related to intelligibility in a R2TM and much less so in SSN, and (d) age per se impaired intelligibility. In summary, all factors affected intelligibility, but their relative importance varied across maskers. PMID:27604779
Van Lierde, K M; Browaeys, H; Corthals, P; Matthys, C; Mussche, P; Van Kerckhove, E; De Bruyn, H
The purpose of this case control study is to determine the impact of screw-retained fixed cross-arch prostheses, supported by four osseointegrated implants, on articulation and oromyofunctional behaviour. Objective (acoustic analysis) and subjective assessment techniques were used to determine the overall intelligibility, phonetic characteristics and oromyofunctional behaviour at an average period of 7.3 months after placement of the fixed implant prosthesis in 15 patients and 9 age-matched controls with intact dentition and without prosthetic appliances. Overall satisfaction with the prosthesis was 87%, but 53% of the subjects mentioned an impact on speech. 87% of the subjects presented with one or more distortions of the consonants. The most common distortions were distortions of the sound /s/ (sigmatismus simplex, 40% and sigmatismus stridens, 33%), simplex /z/ (27%), insufficient frication of /f/ (20%), /[symbol in text]/ (20%), addental production of /d/ (20%), /t/ (20%) or /s/ sound (13%) and devoiced /d/ (7%). In the control group, no articulation disorders were noted. Oromyofunctional behaviour in both groups was normal. To what extent motor-oriented speech therapy (with focus on tongue function) immediately after periodontal treatment (after wound healing) would decrease the persistent phonetic distortions is a subject for further research.
Kloiber, Diana True
Purpose: Assessments of the intelligibility of speech produced by children who are deaf or hard of hearing (D/HH) provide unique insights into functional speaking ability, readiness for mainstream classroom placements, and intervention effectiveness. The development of sentence lists for a wide age range of children and the advent of handheld digital recording devices have overcome two barriers to routine use of this tool. Yet, difficulties in recruiting adequate numbers of adults to judge speech samples continue to make routine assessment impractical. In response to this barrier, it has been proposed that children who are 9 years or older might be adequate substitutes for adult listener-judges (Ertmer, 2011). Method: To examine this possibility, 22 children from the 3rd, 4th, and 5th grades identified words from speech samples previously judged by adults. Results: Children in the 3rd and 4th grades identified fewer words than adults, whereas scores for 5th graders were not significantly different from those of the adults. All grade levels showed increasing scores across low, mid, and high levels of intelligibility. Conclusions: Children who are functioning at a 5th grade level or higher can act as listener-judges in speech intelligibility assessments. Suggestions for implementing assessments and scoring child-listeners' written responses are discussed. PMID:25381439
Weismer, Gary; Laures, Jacqueline S.
This study applied four direct magnitude estimation (DME) standards to the evaluation of speech from four individuals with dysarthria and three neurologically normal speakers. It found a fixed set of sentence-level utterances was scaled differently depending on the specific standard used. Results are discussed in terms of possible standardization…
Ge, Jianqiao; Peng, Gang; Lyu, Bingjiang; Wang, Yi; Zhuo, Yan; Niu, Zhendong; Tan, Li Hai; Leff, Alexander P; Gao, Jia-Hong
How is language processed in the brain by native speakers of different languages? Is there one brain system for all languages or are different languages subserved by different brain systems? The first view emphasizes commonality, whereas the second emphasizes specificity. We investigated the cortical dynamics involved in processing two very diverse languages: a tonal language (Chinese) and a nontonal language (English). We used functional MRI and dynamic causal modeling analysis to compute and compare brain network models exhaustively with all possible connections among nodes of language regions in temporal and frontal cortex and found that the information flow from the posterior to anterior portions of the temporal cortex was commonly shared by Chinese and English speakers during speech comprehension, whereas the inferior frontal gyrus received neural signals from the left posterior portion of the temporal cortex in English speakers and from the bilateral anterior portion of the temporal cortex in Chinese speakers. Our results revealed that, although speech processing is largely carried out in the common left hemisphere classical language areas (Broca's and Wernicke's areas) and anterior temporal cortex, speech comprehension across different language groups depends on how these brain regions interact with each other. Moreover, the right anterior temporal cortex, which is crucial for tone processing, is equally important as its left homolog, the left anterior temporal cortex, in modulating the cortical dynamics in tone language comprehension. The current study pinpoints the importance of the bilateral anterior temporal cortex in language comprehension that is downplayed or even ignored by popular contemporary models of speech comprehension.
Gabrielsson, Alf; And Others
Twelve hearing-impaired and eight normal-hearing adults listened to speech and music programs that were reproduced using five different frequency responses (one flat, the others combinations of reduced lower frequencies and/or increased higher frequencies). Most preferred was a flat response at lower frequencies and a 6dB/octave increase…
means was estimated at 9% (table 3.2). The estimated sample size for a =.0 and $=.I is thus six assuming a Student's t-distribution for the data (Ostle ... Status Report on Speech Research, 38, 169-190, New Haven, CT: Haskins Laboratories. Ostle, B. (1963). "Statistics in Research," Iowa State Univ
Rong, Panying; Loucks, Torrey; Kim, Heejin; Hasegawa-Johnson, Mark
A multimodal approach combining acoustics, intelligibility ratings, articulography and surface electromyography was used to examine the characteristics of dysarthria due to cerebral palsy (CP). CV syllables were studied by obtaining the slope of F2 transition during the diphthong, tongue-jaw kinematics during the release of the onset consonant,…
An Ambient Intelligence Environment is meant to sense and respond to the presence of people, using its embedded technology. In order to effectively sense the activities and intentions of its inhabitants, such an environment needs to utilize information captured from multiple sensors and modalities. By doing so, the interaction becomes more natural…
Coleman, R. F.; Hollien, H.
Underwater intelligibility of three standard word lists is evaluated in two experiments. Results indicate that words which are equated for difficulty in normal conditions are likewise equated under water. Phoneme distortion was examined in a multiple choice test which showed fricatives and place of production to be most affected under water. (SC)
Van Lierde, K M; Mortier, G; Huysman, E; Vermeersch, H
The purpose of the present case study was to determine the long-term impact of partial glossectomy (using the keyhole technique) on overall speech intelligibility and articulation in a Dutch-speaking child with Beckwith-Wiedemann syndrome (BWS). Furthermore the present study is meant as a contribution to the further delineation of the phonation, resonance, articulation and language characteristics and oral behaviour in a child with BWS. Detailed information on the speech and language characteristics of children with BWS may lead to better guidance of pediatric management programs. The child's speech was assessed 9 years after partial glossectomy with regard to ENT characteristics, overall intelligibility (perceptual consensus evaluation), articulation (phonetic and phonological errors), voice (videostroboscopy, vocal quality), resonance (perceptual, nasometric assessment), language (expressive and receptive) and oral behaviour. A class III malocclusion, an anterior open bite, diastema, overangulation of lower incisors and an enlarged but normal symmetric shaped tongue were present. The overall speech intelligibility improved from severely impaired (presurgical) to slightly impaired (5 months post-glossectomy) to normal (9 years postoperative). Comparative phonetic inventory showed a remarkable improvement of articulation. Nine years post-glossectomy three types of distortions seemed to predominate: a rhotacism and sigmatism and the substitution of the alveolar /z/. Oral behaviour, vocal characteristics and resonance were normal, but problems with expressive syntactic abilities were present. The long-term impact of partial glossectomy, using the keyhole technique (preserving the vascularity and the nervous input of the remaining intrinsic tongue muscles), on speech intelligibility, articulation, and oral behaviour in this Dutch-speaking child with congenital macroglossia can be regarded as successful. It is not clear how these expressive syntactical problems
Kressner, Abigail A; Westermann, Adam; Buchholz, Jörg M; Rozell, Christopher J
It has been shown that intelligibility can be improved for cochlear implant (CI) recipients with the ideal binary mask (IBM). In realistic scenarios where prior information is unavailable, however, the IBM must be estimated, and these estimations will inevitably contain errors. Although the effects of both unstructured and structured binary mask errors have been investigated with normal-hearing (NH) listeners, they have not been investigated with CI recipients. This study assesses these effects with CI recipients using masks that have been generated systematically with a statistical model. The results demonstrate that clustering of mask errors substantially decreases the tolerance of errors, that incorrectly removing target-dominated regions can be as detrimental to intelligibility as incorrectly adding interferer-dominated regions, and that the individual tolerances of the different types of errors can change when both are present. These trends follow those of NH listeners. However, analysis with a mixed effects model suggests that CI recipients tend to be less tolerant than NH listeners to mask errors in most conditions, at least with respect to the testing methods in each of the studies. This study clearly demonstrates that structure influences the tolerance of errors and therefore should be considered when analyzing binary-masking algorithms.
Yang, L; Shield, B M
Long enclosures are spaces with nondiffuse sound fields, for which the classical theory of acoustics is not appropriate. Thus, the modeling of the sound field in a long enclosure is very different from the prediction of the behavior of sound in a diffuse space. Ray-tracing computer models have been developed for the prediction of the sound field in long enclosures, with particular reference to spaces such as underground stations which are generally long spaces of rectangular or curved cross section. This paper describes the development of a model for use in underground stations of rectangular cross section. The model predicts the sound-pressure level, early decay time, clarity index, and definition at receiver points along the enclosure. The model also calculates the value of the speech transmission index at individual points. Measurements of all parameters have been made in a station of rectangular cross section, and compared with the predicted values. The predictions of all parameters show good agreement with measurements at all frequencies, particularly in the far field of the sound source, and the trends in the behavior of the parameters along the enclosure have been correctly predicted.
Saweikis, Meghan; Surprenant, Aimée M.; Davies, Patricia; Gallant, Don
While young and old subjects with comparable audiograms tend to perform comparably on speech recognition tasks in quiet environments, the older subjects have more difficulty than the younger subjects with recognition tasks in degraded listening conditions. This suggests that factors other than an absolute threshold may account for some of the difficulty older listeners have on recognition tasks in noisy environments. Many metrics, including the Speech Intelligibility Index (SII), used to measure speech intelligibility, only consider an absolute threshold when accounting for age related hearing loss. Therefore these metrics tend to overestimate the performance for elderly listeners in noisy environments [Tobias et al., J. Acoust. Soc. Am. 83, 859-895 (1988)]. The present studies examine the predictive capabilities of the SII in an environment with automobile noise present. This is of interest because people's evaluation of the automobile interior sound is closely linked to their ability to carry on conversations with their fellow passengers. The four studies examine whether, for subjects with age related hearing loss, the accuracy of the SII can be improved by incorporating factors other than an absolute threshold into the model. [Work supported by Ford Motor Company.]
Chen, Fei; Loizou, Philipos C
The normalized covariance measure (NCM) has been shown previously to predict reliably the intelligibility of noise-suppressed speech containing non-linear distortions. This study analyzes a simplified NCM measure that requires only a small number of bands (not necessarily contiguous) and uses simple binary (1 or 0) weighting functions. The rationale behind the use of a small number of bands is to account for the fact that the spectral information contained in contiguous or nearby bands is correlated and redundant. The modified NCM measure was evaluated with speech intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech corrupted by four different types of maskers (car, babble, train, and street interferences). High correlation (r = 0.8) was obtained with the modified NCM measure even when only one band was used. Further analysis revealed a masker-specific pattern of correlations when only one band was used, and bands with low correlation signified the corresponding envelopes that have been severely distorted by the noise-suppression algorithm and/or the masker. Correlation improved to r = 0.84 when only two disjoint bands (centered at 325 and 1874 Hz) were used. Even further improvements in correlation (r = 0.85) were obtained when three or four lower-frequency (<700 Hz) bands were selected.
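The idea of an NCM with binary band weights can be sketched as follows. This is a hedged illustration of the general NCM recipe as commonly described (per-band correlation between clean and processed envelopes, conversion to an apparent SNR limited to ±15 dB, mapping to a 0..1 transmission index, then averaging over the selected bands), not the authors' exact procedure. Band filtering and envelope extraction are omitted, and the function name, array layout, and band indices are assumptions.

```python
import numpy as np

def ncm_binary(clean_env, proc_env, selected_bands):
    """Simplified NCM sketch with binary (1 or 0) band weights.

    clean_env, proc_env: band envelopes, shape (bands, frames).
    selected_bands: indices of bands given weight 1; all others get weight 0.
    """
    clean_env = np.asarray(clean_env, dtype=float)
    proc_env = np.asarray(proc_env, dtype=float)
    transmission = []
    for b in selected_bands:
        # Normalized covariance (Pearson correlation) of the two envelopes.
        r = np.corrcoef(clean_env[b], proc_env[b])[0, 1]
        r2 = np.clip(r * r, 1e-12, 1.0 - 1e-12)   # keep the log finite
        snr_db = 10.0 * np.log10(r2 / (1.0 - r2))  # apparent SNR in dB
        snr_db = np.clip(snr_db, -15.0, 15.0)      # limit to [-15, 15] dB
        transmission.append((snr_db + 15.0) / 30.0)  # map to [0, 1]
    # With binary weights, the weighted average is a plain mean over selected bands.
    return float(np.mean(transmission))

# Toy envelopes (made-up values): band 0 is preserved, band 1 is decorrelated.
clean = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.0, 2.0, 3.0, 4.0]])
processed = np.array([[1.0, 2.0, 3.0, 4.0],
                      [1.0, -1.0, -1.0, 1.0]])

score_band0 = ncm_binary(clean, processed, selected_bands=[0])
score_band1 = ncm_binary(clean, processed, selected_bands=[1])
score_both = ncm_binary(clean, processed, selected_bands=[0, 1])
```

Selecting a single band or a pair of disjoint bands, as the abstract describes, only changes the `selected_bands` argument; which bands correlate best with listener scores is an empirical question that this sketch does not address.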
Su, Qiaotong; Galvin, John J; Zhang, Guoping; Li, Yongxin; Fu, Qian-Jie
Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714
Sternberg, Robert J.
Intelligence is the ability to learn from experience and to adapt to, shape, and select environments. Intelligence as measured by (raw scores on) conventional standardized tests varies across the lifespan, and also across generations. Intelligence can be understood in part in terms of the biology of the brain—especially with regard to the functioning in the prefrontal cortex—and also correlates with brain size, at least within humans. Studies of the effects of genes and environment suggest that the heritability coefficient (ratio of genetic to phenotypic variation) is between .4 and .8, although heritability varies as a function of socioeconomic status and other factors. Racial differences in measured intelligence have been observed, but race is a socially constructed rather than biological variable, so such differences are difficult to interpret. PMID:22577301
The Diagnostic Rhyme Test (DRT) is widely used to evaluate digital voice systems. Would-be users often have no frame of reference for interpreting DRT scores in terms of performance measures that they can understand, e.g., how many operational words are correctly understood. This research was aimed at providing a better understanding of the effects of very poor quality speech on human communication performance. It is especially important to determine how successful communications are likely to be when the speech quality is severely degraded. This report compares the recognition of ICAO spelling alphabet words (ALFA, BRAVO, CHARLIE, etc.) with DRT scores for the same conditions. Confusions among the spelling alphabet words are also given. Two types of speech degradation were selected for investigation: narrowband digital speech (the DoD standard linear predictive coding algorithm operating at 2400 bits/s) with varying bit-error rates, and analog jamming. The report will be in two parts. Part 1 covers the narrowband digital speech research, and Part 2 will cover the analog speech research.
Knight, Sarah; Heinrich, Antje
Inhibition—the ability to suppress goal-irrelevant information—is thought to be an important cognitive skill in many situations, including speech-in-noise (SiN) perception. One way to measure inhibition is by means of Stroop tasks, in which one stimulus dimension must be named while a second, more prepotent dimension is ignored. The to-be-ignored dimension may be relevant or irrelevant to the target dimension, and the inhibition measure—Stroop interference (SI)—is calculated as the reaction time difference between the relevant and irrelevant conditions. Both SiN perception and inhibition are suggested to worsen with age, yet attempts to connect age-related declines in these two abilities have produced mixed results. We suggest that the inconsistencies between studies may be due to methodological issues surrounding the use of Stroop tasks. First, the relationship between SI and SiN perception may differ depending on the modality of the Stroop task; second, the traditional SI measure may not account for generalized slowing or sensory declines, and thus may not provide a pure interference measure. We investigated both claims in a group of 50 older adults, who performed two Stroop tasks (visual and auditory) and two SiN perception tasks. For each Stroop task, we calculated interference scores using both the traditional difference measure and methods designed to address its various problems, and compared the ability of these different scoring methods to predict SiN performance, alone and in combination with hearing sensitivity. Results from the two Stroop tasks were uncorrelated and had different relationships to SiN perception. Changing the scoring method altered the nature of the predictive relationship between Stroop scores and SiN perception, which was additionally influenced by hearing sensitivity. These findings raise questions about the extent to which different Stroop tasks and/or scoring methods measure the same aspect of cognition. They also highlight
Koning, Raphael; Madhu, Nilesh; Wouters, Jan
Hearing-impaired listeners using cochlear implants (CIs) suffer from a decrease in speech intelligibility (SI) in adverse listening conditions. Time-frequency masks are often applied to perform noise suppression in an attempt to increase SI. Two important masks are the so-called ideal binary mask (IBM) with its binary weights and the ideal Wiener filter (IWF) with its continuous weights. It is unclear which of the masks has the highest potential for SI and speech quality enhancement in CI users. In this study, both approaches for SI and quality enhancement were compared. The investigations were conducted in normal-hearing (NH) subjects listening to noise vocoder CI simulations and in CI users. The potential for SI improvement was assessed in a sentence recognition task with ideal mask estimates in multitalker babble and with an interfering talker. The robustness of the approaches was evaluated with simulated estimation errors. CI users assessed the speech quality in a preference rating. The IWF outperformed the IBM in NH listeners. In contrast, no significant difference was obtained in CI users. Estimation errors degraded SI in CI users for both approaches. In terms of quality, the IWF-processed signals slightly outperformed the IBM-processed signals. The outcomes of this study suggest that the mask pattern is not that crucial for CIs. Results of speech enhancement algorithms obtained with NH subjects listening to vocoded or normally processed stimuli do not translate to CI users. This outcome means that the effect of new strategies has to be quantified with the user group considered.
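As a rough illustration of the two mask types compared in this abstract, the following Python sketch computes a binary mask and a Wiener-style gain for each time-frequency unit from known speech and noise powers. The function names, the 0 dB local criterion, and the toy power values are assumptions made for illustration; they are not taken from the study.

```python
import math

def local_snr_db(speech_pow, noise_pow):
    """Local SNR in dB for one time-frequency unit."""
    if speech_pow <= 0.0:
        return float("-inf")
    if noise_pow <= 0.0:
        return float("inf")
    return 10.0 * math.log10(speech_pow / noise_pow)

def ideal_binary_mask(speech_pows, noise_pows, lc_db=0.0):
    """Binary weights: 1.0 where the local SNR exceeds lc_db, else 0.0.

    lc_db (the local criterion) is an assumed, illustrative threshold.
    """
    return [1.0 if local_snr_db(s, n) > lc_db else 0.0
            for s, n in zip(speech_pows, noise_pows)]

def ideal_wiener_gain(speech_pows, noise_pows):
    """Continuous weights: speech power divided by total power."""
    gains = []
    for s, n in zip(speech_pows, noise_pows):
        total = s + n
        gains.append(s / total if total > 0.0 else 0.0)
    return gains

# Toy powers for four time-frequency units (hypothetical values).
speech = [4.0, 1.0, 0.25, 0.0]
noise = [1.0, 1.0, 1.0, 1.0]

ibm = ideal_binary_mask(speech, noise)
iwf = ideal_wiener_gain(speech, noise)
print(ibm)
print(iwf)
```

Both masks are "ideal" in the sense that they need the clean speech and noise powers separately, which is only possible in a controlled experiment; the binary mask discards units outright, while the Wiener gain attenuates them gradually.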
ARTIFICIAL INTELLIGENCE, GAME THEORY, DECISION MAKING, BIONICS, AUTOMATA, SPEECH RECOGNITION, GEOMETRIC FORMS, LEARNING MACHINES, MATHEMATICAL MODELS, PATTERN RECOGNITION, SERVOMECHANISMS, SIMULATION, BIBLIOGRAPHIES.
Tjaden, Kris; Sussman, Joan E.; Wilding, Gregory E.
Purpose: The perceptual consequences of rate reduction, increased vocal intensity, and clear speech were studied in speakers with multiple sclerosis (MS), Parkinson's disease (PD), and healthy controls. Method: Seventy-eight speakers read sentences in habitual, clear, loud, and slow conditions. Sentences were equated for peak amplitude and…
Ravishankar, C., Hughes Network Systems, Germantown, MD
coding techniques are equally applicable to any voice signal whether or not it carries any intelligible information, as the term speech implies. Other terms that are commonly used are speech compression and voice compression, since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or equivalently the bandwidth) and/or reduce storage requirements. In this document the terms speech and voice shall be used interchangeably.
Lousada, M.; Jesus, Luis M. T.; Hall, A.; Joffe, V.
Background: The effectiveness of two treatment approaches (phonological therapy and articulation therapy) for treatment of 14 children, aged 4;0-6;7 years, with phonologically based speech-sound disorder (SSD) has been previously analysed with severity outcome measures (percentage of consonants correct score, percentage occurrence of phonological…
Vouloumanos, Athena; Gelfand, Hanna M.
The ability to decode atypical and degraded speech signals as intelligible is a hallmark of speech perception. Human adults can perceive sounds as speech even when they are generated by a variety of nonhuman sources including computers and parrots. We examined how infants perceive the speech-like vocalizations of a parrot. Further, we examined how…
Cornelis, Bram; Moonen, Marc; Wouters, Jan
This paper evaluates noise reduction techniques in bilateral and binaural hearing aids. Adaptive implementations (on a real-time test platform) of the bilateral and binaural speech distortion weighted multichannel Wiener filter (SDW-MWF) and a competing bilateral fixed beamformer are evaluated. As the SDW-MWF relies on a voice activity detector (VAD), a realistic binaural VAD is also included. The test subjects (both normal hearing subjects and hearing aid users) are tested by an adaptive speech reception threshold (SRT) test in different spatial scenarios, including a realistic cafeteria scenario with nonstationary noise. The main conclusions are: (a) The binaural SDW-MWF can further improve the SRT (up to 2 dB) over the improvements achieved by bilateral algorithms, although a significant difference is only achievable if the binaural SDW-MWF uses a perfect VAD. However, in the cafeteria scenario only the binaural SDW-MWF achieves a significant SRT improvement (2.6 dB with perfect VAD, 2.2 dB with real VAD), for the group of hearing aid users. (b) There is no significant degradation when using a real VAD at the input signal-to-noise ratio (SNR) levels where the hearing aid users reach their SRT. (c) The bilateral SDW-MWF achieves no SRT improvements compared to the bilateral fixed beamformer.
The three general approaches for measuring speech intelligibility in young children are open-set word identification, closed-set word identification, and rating scales. Positive and negative aspects of the various procedures for measuring speech intelligibility are discussed. (Author/DB)
Yu Rao; Yiya Hao; Panahi, Issa M S; Kehtarnavaz, Nasser
In this paper, the development of a speech processing pipeline on smartphones for hearing aid devices (HADs) is presented. This pipeline is used for noise suppression and speech enhancement (SE) to improve speech quality and intelligibility. The proposed method is implemented to run in real time on Android smartphones. The test results indicate that the proposed method suppresses the noise and improves the perceptual quality of speech in terms of three objective measures: perceptual evaluation of speech quality (PESQ), noise attenuation level (NAL), and the coherent speech intelligibility index (CSII).
Maier, Joost X.; Di Luca, Massimiliano; Noppeney, Uta
Combining information from the visual and auditory senses can greatly enhance intelligibility of natural speech. Integration of audiovisual speech signals is robust even when temporal offsets are present between the component signals. In the present study, we characterized the temporal integration window for speech and nonspeech stimuli with…
Lam, Jennifer; Tjaden, Kris; Wilding, Greg
Purpose: This study investigated how different instructions for eliciting clear speech affected selected acoustic measures of speech. Method: Twelve speakers were audio-recorded reading 18 different sentences from the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1984). Sentences were produced in habitual, clear,…
Zeng, Fan-Gang; Liu, Sheng
Purpose: Speech perception in participants with auditory neuropathy (AN) was systematically studied to answer the following 2 questions: Does noise present a particular problem for people with AN? Can clear speech and cochlear implants alleviate this problem? Method: The researchers evaluated the advantage in intelligibility of clear speech over…
Nijland, Lian; Terband, Hayo; Maassen, Ben
Purpose: Childhood apraxia of speech (CAS) is diagnosed on the basis of specific speech characteristics, in the absence of problems in hearing, intelligence, and language comprehension. This does not preclude the possibility that children with this speech disorder might demonstrate additional problems. Method: Cognitive functions were investigated…
Smith, Elizabeth G.; Bennetto, Loisa
Background: During speech perception, the ability to integrate auditory and visual information causes speech to sound louder and be more intelligible, and leads to quicker processing. This integration is important in early language development, and also continues to affect speech comprehension throughout the lifespan. Previous research shows that…
Bracken, Bruce A.; McCallum, R. Steve
This kit presents all components of the Universal Nonverbal Intelligence Test (UNIT), a newly developed instrument designed to measure the general intelligence and cognitive abilities of children and adolescents (ages 5 through 17) who may be disadvantaged by traditional verbal and language-loaded measures such as children with speech, language,…
Li, F. F.; Cox, T. J.
Speech transmission index (STI) is an important objective parameter concerning speech intelligibility for sound transmission channels. It is normally measured with specific test signals to ensure high accuracy and good repeatability. Measurement with running speech was previously proposed, but accuracy is compromised and hence applications limited. A new approach that uses artificial neural networks to accurately extract the STI from received running speech is developed in this paper. Neural networks are trained on a large set of transmitted speech examples with prior knowledge of the transmission channels' STIs. The networks perform complicated nonlinear function mappings and spectral feature memorization to enable accurate objective parameter extraction from transmitted speech. Validations via simulations demonstrate the feasibility of this new method on a one-net-one-speech extract basis. In this case, accuracy is comparable with normal measurement methods. This provides an alternative to standard measurement techniques, and it is intended that the neural network method can facilitate occupied room acoustic measurements.
Calandruccio, Lauren; Van Engen, Kristin; Dhar, Sumitrajit; Bradlow, Ann R.
Purpose: It is established that speaking clearly is an effective means of enhancing intelligibility. Because any signal-processing scheme modeled after known acoustic-phonetic features of clear speech will likely affect both target and competing speech, it is important to understand how speech recognition is affected when a competing speech signal…
Elberling, C., Keidser, G. and Poulsen, T. (1990). Prediction of intelligibility of non-linearly processed speech. Acta Otolaryngol Suppl., 469, 190-5... Acustica 46, 59-72. Steeneken, H.J.M. (1992) On measuring and predicting speech intelligibility. Soesterberg: TNO Institute for Perception
Reetzke, Rachel; Lam, Boji Pak-Wing; Xie, Zilong; Sheng, Li; Chandrasekaran, Bharath
Recognizing speech in adverse listening conditions is a significant cognitive, perceptual, and linguistic challenge, especially for children. Prior studies have yielded mixed results on the impact of bilingualism on speech perception in noise. Methodological variations across studies make it difficult to converge on a conclusion regarding the effect of bilingualism on speech-in-noise performance. Moreover, there is a dearth of speech-in-noise evidence for bilingual children who learn two languages simultaneously. The aim of the present study was to examine the extent to which various adverse listening conditions modulate differences in speech-in-noise performance between monolingual and simultaneous bilingual children. To that end, sentence recognition was assessed in twenty-four school-aged children (12 monolinguals; 12 simultaneous bilinguals, age of English acquisition ≤ 3 yrs.). We implemented a comprehensive speech-in-noise battery to examine recognition of English sentences across different modalities (audio-only, audiovisual), masker types (steady-state pink noise, two-talker babble), and a range of signal-to-noise ratios (SNRs; 0 to -16 dB). Results revealed no difference in performance between monolingual and simultaneous bilingual children across each combination of modality, masker, and SNR. Our findings suggest that when English age of acquisition and socioeconomic status are similar between groups, monolingual and bilingual children exhibit comparable speech-in-noise performance across a range of conditions analogous to everyday listening environments. PMID:27936212
Selouani, Sid-Ahmed; Sidi Yakoub, Mohammed; O'Shaughnessy, Douglas
Assistive speech-enabled systems are proposed to help both French and English speaking persons with various speech disorders. The proposed assistive systems use automatic speech recognition (ASR) and speech synthesis in order to enhance the quality of communication. These systems aim at improving the intelligibility of pathologic speech making it as natural as possible and close to the original voice of the speaker. The resynthesized utterances use new basic units, a new concatenating algorithm and a grafting technique to correct the poorly pronounced phonemes. The ASR responses are uttered by the new speech synthesis system in order to convey an intelligible message to listeners. Experiments involving four American speakers with severe dysarthria and two Acadian French speakers with sound substitution disorders (SSDs) are carried out to demonstrate the efficiency of the proposed methods. An improvement of the Perceptual Evaluation of the Speech Quality (PESQ) value of 5% and more than 20% is achieved by the speech synthesis systems that deal with SSD and dysarthria, respectively.
Khwaileh, Fadwa A.; Flipsen, Peter, Jr.
This study examined the intelligibility of speech produced by 17 children (aged 4-11 years) with cochlear implants. Stimulus items included sentences from the Beginners' Intelligibility Test (BIT) and words from the Children Speech Intelligibility Measure (CSIM). Naive listeners responded by writing sentences heard or with two types of responses…
For some 30 years, intelligibility has been recognized as an appropriate goal for pronunciation instruction, yet remarkably little is known about the factors that make a language learner's speech intelligible. Studies have traced correlations between features of nonnative speech and native speakers' intelligibility judgements. They have tended to…
encoder and identifies individual words. This use of neural networks offers two advantages over conventional algorithmic detectors: the detection...environment. Keywords: Artificial intelligence; Neural networks; Back propagation; Speech recognition.
This book is an introduction to the field of artificial intelligence. The volume sets AI in a broad context of historical attitudes, imaginative insights, and ideas about intelligence in general. The author offers a wide-ranging survey of AI concerns, including cognition, knowledge engineering, problem inference, speech understanding, and perception. He also discusses expert systems, LISP, smart robots, and other AI products, and provides a listing of all major AI systems.
Klopfenstein, Marie I.
Despite the importance of speech naturalness to treatment outcomes, little research has been done on what constitutes speech naturalness and how to best maximize naturalness in relationship to other treatment goals like intelligibility. In addition, previous literature alludes to the relationship between prosodic aspects of speech and speech…
Durand, V. Mark; Crimmins, Daniel B.
Analysis of the psychotic speech of a nine-year-old autistic boy suggested that psychotic speech (intelligible but out of context phrases) was maintained through escape from task demands and that teaching an appropriate equivalent phrase ("Help me") reduced the frequency of psychotic speech. (Author/DB)
Nagle, Kathy F.; Eadie, Tanya L.; Wright, Derek R.; Sumida, Yumi A.
Purpose: To determine (a) the effect of fundamental frequency (f0) on speech intelligibility, acceptability, and perceived gender in electrolaryngeal (EL) speakers, and (b) the effect of known gender on speech acceptability in EL speakers. Method: A 2-part study was conducted. In Part 1, 34 healthy adults provided speech recordings using…
Bradlow, Ann R.
When a talker believes that the listener is likely to have speech perception difficulties due to a hearing loss, background noise, or a different native language, she or he will typically adopt a clear speaking style. Previous research has established that, with a simple set of instructions to the talker, "clear speech" can be produced by most talkers under laboratory recording conditions. Furthermore, there is reliable evidence that adult listeners with either impaired or normal hearing typically find clear speech more intelligible than conversational speech. Since clear speech production involves listener-oriented articulatory adjustments, a careful examination of the acoustic-phonetic and perceptual consequences of the conversational-to-clear speech transformation can serve as an effective window into talker- and listener-related forces in speech communication. Furthermore, clear speech research has considerable potential for the development of speech enhancement techniques. After reviewing previous and current work on the acoustic properties of clear versus conversational speech, this talk will present recent data from a cross-linguistic study of vowel production in clear speech and a cross-population study of clear speech perception. Findings from these studies contribute to an evolving view of clear speech production and perception as reflecting both universal, auditory and language-specific, phonological contrast enhancement features.
... and the respiratory system . The ability to understand language and produce speech is coordinated by the brain. So a person with brain damage from an accident, stroke, or birth defect may have speech and language problems. Some people with speech problems, particularly articulation ...
A speech modulated white noise device is reported that gives the rhythmic characteristics of a speech signal for intelligible reception by deaf persons. The signal is composed of random amplitudes and frequencies as modulated by the speech envelope characteristics of rhythm and stress. Time intensity parameters of speech are conveyed through the vibro-tactile sensation stimuli.
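The principle described here, noise whose amplitude follows the speech envelope, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the moving-average envelope follower, the window length, the uniform noise source, and the toy input are hypothetical choices, not details of the reported device.

```python
import random

def envelope(signal, win):
    """Crude envelope: moving average of the rectified signal over win samples."""
    rect = [abs(v) for v in signal]
    env = []
    for i in range(len(rect)):
        start = max(0, i - win + 1)
        env.append(sum(rect[start:i + 1]) / win)
    return env

def speech_modulated_noise(signal, win=4, seed=0):
    """Multiply uniform noise in [-1, 1] by the envelope of the input signal."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    env = envelope(signal, win)
    return [e * rng.uniform(-1.0, 1.0) for e in env]

# Toy input: a short burst followed by silence (hypothetical values).
speech_like = [0.0, 0.8, -0.9, 0.7, -0.6] + [0.0] * 8
env = envelope(speech_like, 4)
out = speech_modulated_noise(speech_like, win=4)
```

The output carries no spectral detail of the input; only the timing and intensity pattern (rhythm and stress) survives, which matches the kind of information the abstract says is conveyed.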
City Univ. of New York, Flushing. Queens Coll. Dept. of Communication Arts and Sciences.
Seven papers report on speech language pathology and audiology studies performed by graduate students. The first paper reports on intelligibility of two popular synthetic speech systems used in communication aids for the speech impaired, the Votrax Personal Speech System and the Echo II synthesizer. The second paper reports facilitation of tense…
Pennington, Lindsay; Miller, Nick; Robson, Sheila; Steen, Nick
Aim: To investigate whether speech therapy using a speech systems approach to controlling breath support, phonation, and speech rate can increase the speech intelligibility of children with dysarthria and cerebral palsy (CP). Method: Sixteen children with dysarthria and CP participated in a modified time series design. Group characteristics were…
Nasir, Sazzad M; Ostry, David J
Speech production, like other sensorimotor behaviors, relies on multiple sensory inputs--audition, proprioceptive inputs from muscle spindles and cutaneous inputs from mechanoreceptors in the skin and soft tissues of the vocal tract. However, the capacity for intelligible speech by deaf speakers suggests that somatosensory input alone may contribute to speech motor control and perhaps even to speech learning. We assessed speech motor learning in cochlear implant recipients who were tested with their implants turned off. A robotic device was used to alter somatosensory feedback by displacing the jaw during speech. We found that implant subjects progressively adapted to the mechanical perturbation with training. Moreover, the corrections that we observed were for movement deviations that were exceedingly small, on the order of millimeters, indicating that speakers have precise somatosensory expectations. Speech motor learning is substantially dependent on somatosensory input.
Ma, Yong Ma; Caswell, Daryl J.; Dai, Liming; Goodchild, Jim T.
Speech privacy is the opposite concept of speech intelligibility and can be assessed by the predictors of speech intelligibility. Based on the existing standards and the research to date, most objective assessments for speech privacy and speech intelligibility, such as articulation index (AI) or speech intelligibility index (SII), speech transmission index (STI), and sound early-to-late ratio (C50), are evaluated by subjective measurements. However, these subjective measurements are based on studies of English or other Western languages. The impact of language on speech privacy has been overlooked. It is therefore necessary to study the impact of different languages and accents in multicultural environments on speech privacy. In this study, subjective measurements were conducted in closed office environments by using English and a tonal language, Mandarin. Detailed investigations of the impact of language on speech privacy were carried out with the two languages. The results of this study reveal significant evaluation variations in speech privacy when different languages are used. The subjective measurement results obtained in this study were also compared with the objective measurement employing articulation indices.
Klein, Harriet B.; Liu-Shea, May
Purpose: This study was designed to identify and describe between-word simplification patterns in the continuous speech of children with speech sound disorders. It was hypothesized that word combinations would reveal phonological changes that were unobserved with single words, possibly accounting for discrepancies between the intelligibility of…
Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.
In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency and voice emerge more saliently in conversation than in repetition, reading or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have…
Bell, W L; Horner, J; Logue, P; Radtke, R A
There are no documented cases of seizures causing reiterative neologistic speech automatisms. We report an 18-year-old right-handed woman with stereotypic ictal speech automatisms characterized by phonemic jargon and reiterative neologisms. Video-EEG during the reiterative neologisms demonstrated rhythmic delta activity, which was most prominent in the left posterior temporal region. At surgery, there was an arteriovenous malformation impinging on the left supramarginal gyrus and the posterior portion of the superior temporal gyrus. Though intelligible speech automatisms can result from seizure foci in either hemisphere, neologistic speech automatisms may implicate a focus in the language-dominant hemisphere.
Yunusova, Yana; Weismer, Gary; Kent, Ray D.; Rusche, Nicole M.
Purpose: This study was designed to determine whether within-speaker fluctuations in speech intelligibility occurred among speakers with dysarthria who produced a reading passage, and, if they did, whether selected linguistic and acoustic variables predicted the variations in speech intelligibility. Method: Participants with dysarthria included a…
Tjaden, Kris; Wilding, Greg
Intelligibility tests for dysarthria typically provide an estimate of overall severity for speech materials elicited through imitation or read from a printed script. The extent to which these types of tasks and procedures reflect intelligibility for extemporaneous speech is not well understood. The purpose of this study was to compare…
Schroeder, M R; Strube, H W
Flat-spectrum stimuli, consisting of many equal-amplitude harmonics, produce timbre sensations that can depend strongly on the phase angles of the individual harmonics. For fundamental frequencies in the human pitch range, many realizable timbres have vowel-like perceptual qualities. This observation suggests the possibility of constructing intelligible voiced speech signals that have flat-amplitude spectra. This paper describes a successful experiment of creating several different diphthongs by judicious choice of the phase angles of a flat-spectrum waveform. A possible explanation of the observed vowel timbres lies in the dependence of the short-time amplitude spectra on phase changes.
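The claim that phase alone reshapes a flat-spectrum waveform can be checked with a small Python sketch. It builds two signals from the same equal-amplitude harmonics, one with all phases zero and one with a quadratic phase pattern, and compares their harmonic magnitudes and peak values. The fundamental, harmonic count, sampling rate, and the quadratic phase rule are assumptions chosen for illustration; the abstract does not state which phase angles the authors used.

```python
import cmath
import math

def harmonic_complex(f0, phases, fs, n_samples):
    """Sum of unit-amplitude cosine harmonics of f0 with the given phases."""
    return [
        sum(math.cos(2.0 * math.pi * k * f0 * t / fs + phases[k - 1])
            for k in range(1, len(phases) + 1))
        for t in range(n_samples)
    ]

def bin_amplitude(x, k_bin):
    """Amplitude of the cosine component at DFT bin k_bin."""
    n = len(x)
    acc = sum(x[t] * cmath.exp(-2j * math.pi * k_bin * t / n) for t in range(n))
    return 2.0 * abs(acc) / n

FS = 8000       # sampling rate in Hz (assumed)
F0 = 100        # fundamental in Hz (assumed)
N_HARM = 10     # number of harmonics (assumed)
N = FS // F0    # exactly one period, so harmonic k falls on DFT bin k

zero_phases = [0.0] * N_HARM
quad_phases = [math.pi * k * (k - 1) / N_HARM for k in range(1, N_HARM + 1)]

x_zero = harmonic_complex(F0, zero_phases, FS, N)
x_quad = harmonic_complex(F0, quad_phases, FS, N)

amps_zero = [bin_amplitude(x_zero, k) for k in range(1, N_HARM + 1)]
amps_quad = [bin_amplitude(x_quad, k) for k in range(1, N_HARM + 1)]
peak_zero = max(abs(v) for v in x_zero)
peak_quad = max(abs(v) for v in x_quad)
```

Both signals have the same flat long-term amplitude spectrum (every harmonic amplitude is 1), yet the zero-phase version is a sharp pulse while the quadratic-phase version spreads its energy across the period. Differences of this kind in the short-time structure are what the abstract offers as a possible explanation for the perceived vowel timbres.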
Designed to assist deaf and hearing-impaired persons in achieving better speech, Resnick Worldwide Inc.'s device provides a visual means of cuing the deaf as a speech-improvement measure. This is done by electronically processing the subjects' sounds and comparing them with optimum values which are displayed for comparison.
Brooks, William D.
Presented in this book is a view of speech communication which enables an individual to become fully aware of his or her role as both initiator and recipient of messages. Communication is treated broadly with emphasis on the understanding and skills relating to various types of speech communication across the broad spectrum of human communication.…
Podgor, Ellen S.
The concept of symbolic speech emanates from the 1967 case of United States v. O'Brien. These discussions of flag desecration, grooming and dress codes, nude entertainment, buttons and badges, and musical expression show that the courts place symbolic speech in different strata from verbal communication. (LBH)
Mochida, Takemi; Kimura, Toshitaka; Hiroya, Sadao; Kitagawa, Norimichi; Gomi, Hiroaki; Kondo, Tadahisa
Speech perception is thought to be linked to speech motor production. This linkage is considered to mediate multimodal aspects of speech perception, such as audio-visual and audio-tactile integration. However, direct coupling between articulatory movement and auditory perception has been little studied. The present study reveals a clear dissociation between the effects of a listener’s own speech action and the effects of viewing another’s speech movements on the perception of auditory phonemes. We assessed the intelligibility of the syllables [pa], [ta], and [ka] when listeners silently and simultaneously articulated syllables that were congruent/incongruent with the syllables they heard. The intelligibility was compared with a condition where the listeners simultaneously watched another’s mouth producing congruent/incongruent syllables, but did not articulate. The intelligibility of [ta] and [ka] was degraded by articulating [ka] and [ta] respectively, which are associated with the same primary articulator (tongue) as the heard syllables. But it was not affected by articulating [pa], which is associated with a different primary articulator (lips) from the heard syllables. In contrast, the intelligibility of [ta] and [ka] was degraded by watching the production of [pa]. These results indicate that the articulatory-induced distortion of speech perception occurs in an articulator-specific manner while visually induced distortion does not. The articulator-specific nature of the auditory-motor interaction in speech perception suggests that speech motor processing directly contributes to our ability to hear speech. PMID:23844227
Wang, Emily; Verhagen, Leo; de Vries, Meinou H.
Patients with advanced Parkinson's disease tend to have dysarthric speech that is hesitant, accelerated, and repetitive, and that is often resistant to behavioral speech therapy. In this pilot study, the speech disturbances were treated using on-line altered feedback (AF) provided by SpeechEasy (SE), an in-the-ear device registered with the FDA for use in humans to treat chronic stuttering. Eight PD patients participated in the study. All had moderate to severe speech disturbances. In addition, two patients had moderate recurring stuttering at the onset of PD after long remission since adolescence, two had bilateral STN DBS, and two had bilateral pallidal DBS. An effective combination of delayed auditory feedback and frequency-altered feedback was selected for each subject and provided via SE worn in one ear. All subjects produced speech samples (structured monologue and reading) under three conditions: baseline, with SE without feedback, and with SE with feedback. The speech samples were randomly presented and rated for speech intelligibility goodness using UPDRS-III item 18 and the speaking rate. The results indicated that SpeechEasy is well tolerated and AF can improve speech intelligibility in spontaneous speech. Further investigational use of this device for treating speech disorders in PD is warranted [Work partially supported by Janus Dev. Group, Inc.].
Ferguson, Sarah Hargus; Kewley-Port, Diane
Purpose: To determine the specific acoustic changes that underlie improved vowel intelligibility in clear speech. Method: Seven acoustic metrics were measured for conversational and clear vowels produced by 12 talkers--6 who previously were found (S. H. Ferguson, 2004) to produce a large clear speech vowel intelligibility effect for listeners with…
Drager, Kathryn D. R.; Clark-Serpentine, Elizabeth A.; Johnson, Kate E.; Roeser, Jennifer L.
Purpose: The present study investigated the intelligibility of digitized and synthesized speech output in background noise for children 3-5 years old. The purpose of the study was to determine whether there was a difference in the intelligibility (ability to repeat) of 3 types of speech output (digitized, DECTalk synthesized, and MacinTalk…
Goldsworthy, Ray L.; Greenberg, Julie E.
The Speech Transmission Index (STI) is a physical metric that is well correlated with the intelligibility of speech degraded by additive noise and reverberation. The traditional STI uses modulated noise as a probe signal and is valid for assessing degradations that result from linear operations on the speech signal. Researchers have attempted to extend the STI to predict the intelligibility of nonlinearly processed speech by proposing variations that use speech as a probe signal. This work considers four previously proposed speech-based STI methods and four novel methods, studied under conditions of additive noise, reverberation, and two nonlinear operations (envelope thresholding and spectral subtraction). Analyzing intermediate metrics in the STI calculation reveals why some methods fail for nonlinear operations. Results indicate that none of the previously proposed methods is adequate for all of the conditions considered, while four proposed methods produce qualitatively reasonable results and warrant further study. The discussion considers the relevance of this work to predicting the intelligibility of cochlear-implant processed speech.
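As a rough illustration of how a modulation-based STI value is formed, the sketch below converts modulation indices into apparent signal-to-noise ratios, clips them to ±15 dB, and rescales them to transmission indices between 0 and 1. It is a simplified sketch and not one of the speech-based methods compared in the study: it assumes stationary additive noise only, uses an unweighted mean in place of the band weighting of the standardized procedure, and all function names are hypothetical.

```python
import math

def modulation_index_noise(snr_db):
    """Modulation reduction caused by stationary additive noise at a given SNR (dB)."""
    return 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))

def transmission_index(m):
    """Map a modulation index m in (0, 1) to a transmission index in [0, 1]."""
    # Keep m away from 0 and 1 so the logarithm stays finite.
    m = min(max(m, 1e-6), 1.0 - 1e-6)
    apparent_snr = 10.0 * math.log10(m / (1.0 - m))
    # Clip the apparent SNR to the +/-15 dB range used by the STI.
    apparent_snr = max(-15.0, min(15.0, apparent_snr))
    return (apparent_snr + 15.0) / 30.0

def simple_sti(modulation_indices):
    """Unweighted mean of transmission indices (assumption: no band weighting)."""
    tis = [transmission_index(m) for m in modulation_indices]
    return sum(tis) / len(tis)
```

Under these assumptions, an SNR of 0 dB gives a modulation index of 0.5 and an index value of 0.5, while SNRs at or beyond ±15 dB saturate at 1 and 0.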
... thinking, but it becomes disorganized as they're speaking. So, someone who clutters may speak in bursts ... refuse to wait patiently for them to finish speaking. If you have a speech problem, it's fine ...
Montazeri, Vahid; Khoubrouy, Soudeh A; Panahi, Issa M S
Several studies of hearing-impaired people who use hearing aids reveal that speech enhancement algorithms implemented in hearing aids improve listening comfort. However, these algorithms do not improve speech intelligibility much, and in many cases they decrease it, both in hearing-impaired and in normal-hearing listeners. In fact, current approaches to developing speech enhancement algorithms (e.g., minimum mean square error (MMSE)) are not optimal for intelligibility improvement. Some recent studies investigated the effect of different distortions on the enhanced speech and found that controlling the amplification distortion improves intelligibility dramatically. In this paper, we examined, subjectively and objectively, the effects of amplification distortion on the speech enhanced by two algorithms in three background noises at different SNR levels.
Smith, Shelley D.; Pennington, Bruce F.; Boada, Richard; Shriberg, Lawrence D.
Background: Speech sound disorder (SSD) is a common childhood disorder characterized by developmentally inappropriate errors in speech production that greatly reduce intelligibility. SSD has been found to be associated with later reading disability (RD), and there is also evidence for both a cognitive and etiological overlap between the two…
Atagi, Eriko; Bent, Tessa
Through experience with speech variability, listeners build categories of indexical speech characteristics including categories for talker, gender, and dialect. The auditory free classification task—a task in which listeners freely group talkers based on audio samples—has been a useful tool for examining listeners’ representations of some of these characteristics including regional dialects and different languages. The free classification task was employed in the current study to examine the perceptual representation of nonnative speech. The category structure and salient perceptual dimensions of nonnative speech were investigated from two perspectives: general similarity and perceived native language background. Talker intelligibility and whether native talkers were included were manipulated to test stimulus set effects. Results showed that degree of accent was a highly salient feature of nonnative speech for classification based on general similarity and on perceived native language background. This salience, however, was attenuated when listeners were listening to highly intelligible stimuli and attending to the talkers’ native language backgrounds. These results suggest that the context in which nonnative speech stimuli are presented—such as the listeners’ attention to the talkers’ native language and the variability of stimulus intelligibility—can influence listeners’ perceptual organization of nonnative speech. PMID:24363470
Nickerson, Raymond S.; And Others
The factor of timing on intelligible speech among normally-hearing persons was studied to determine some ways in which it differs from the speech of the deaf. Additional data was gathered on the temporal aspects of the speech of deaf and hearing children and hearing adults. The data corroborated earlier studies indicating that (1) deaf speakers…
Shriberg, Lawrence D.; Paul, Rhea; Black, Lois M.; van Santen, Jan P.
In a sample of 46 children aged 4-7 years with Autism Spectrum Disorder (ASD) and intelligible speech, there was no statistical support for the hypothesis of concomitant Childhood Apraxia of Speech (CAS). Perceptual and acoustic measures of participants' speech, prosody, and voice were compared with data from 40 typically-developing children, 13…
Bilodeau-Mercure, Mylène; Lortie, Catherine L; Sato, Marc; Guitton, Matthieu J; Tremblay, Pascale
Speech perception difficulties are common among older adults; yet the underlying neural mechanisms are still poorly understood. New empirical evidence suggesting that brain senescence may be an important contributor to these difficulties has challenged the traditional view that peripheral hearing loss was the main factor in the etiology of these difficulties. Here, we investigated the relationship between structural and functional brain senescence and speech perception skills in aging. Following audiometric evaluations, participants underwent MRI while performing a speech perception task at different intelligibility levels. As expected, with age speech perception declined, even after controlling for hearing sensitivity using an audiological measure (pure tone averages), and a bioacoustical measure (DPOAEs recordings). Our results reveal that the core speech network, centered on the supratemporal cortex and ventral motor areas bilaterally, decreased in spatial extent in older adults. Importantly, our results also show that speech skills in aging are affected by changes in cortical thickness and in brain functioning. Age-independent intelligibility effects were found in several motor and premotor areas, including the left ventral premotor cortex and the right supplementary motor area (SMA). Age-dependent intelligibility effects were also found, mainly in sensorimotor cortical areas, and in the left dorsal anterior insula. In this region, changes in BOLD signal modulated the relationship between age and speech perception skills, suggesting a role for this region in maintaining speech perception at older ages. These results provide important new insights into the neurobiology of speech perception in aging.
Suter, Alice H.
To assess the effects of noise on speech communication it is necessary to examine certain characteristics of the speech signal. Speech level can be measured by a variety of methods, none of which has yet been standardized, and it should be kept in mind that vocal effort increases with background noise level and with different types of activity. Noise and filtering commonly degrade the speech signal, especially as it is transmitted through communications systems. Intelligibility is also adversely affected by distance, reverberation, and monaural listening. Communication systems currently in use may cause strain and delays on the part of the listener, but there are many possibilities for improvement. Individuals who need to communicate in noise may be subject to voice disorders. Shouted speech becomes progressively less intelligible at high voice levels, but improvements can be realized when talkers use clear speech. Tolerable listening levels are lower for negative than for positive S/Ns, and comfortable listening levels should be at a S/N of at least 5 dB, and preferably above 10 dB. Popular methods to predict speech intelligibility in noise include the Articulation Index, Speech Interference Level, Speech Transmission Index, and the sound level meter's A-weighting network. This report describes these methods, discussing certain advantages and disadvantages of each, and shows their interrelations.
Villarreal, James A.; Wang, Lui
Vital to the success of an expert system is an interface to the user which performs intelligently. A generic intelligent interface is being developed for expert systems. This intelligent interface was developed around the in-house developed Expert System for the Flight Analysis System (ESFAS). The Flight Analysis System (FAS) is comprised of 84 configuration controlled FORTRAN subroutines that are used in the preflight analysis of the space shuttle. In order to use FAS proficiently, a person must be knowledgeable in the areas of flight mechanics, the procedures involved in deploying a certain payload, and an overall understanding of the FAS. ESFAS, still in its developmental stage, is taking into account much of this knowledge. The generic intelligent interface involves the integration of a speech recognizer and synthesizer, a preparser, and a natural language parser to ESFAS. The speech recognizer being used is capable of recognizing 1000 words of connected speech. The natural language parser is a commercial software package which uses caseframe instantiation in processing the streams of words from the speech recognizer or the keyboard. The systems configuration is described along with capabilities and drawbacks.
Herff, Christian; Schultz, Tanja
Speech interfaces have become widely accepted and are nowadays integrated in various real-life applications and devices. They have become a part of our daily life. However, speech interfaces presume the ability to produce intelligible speech, which might be impossible due to either loud environments, bothering bystanders or an inability to produce speech (e.g., in patients suffering from locked-in syndrome). For these reasons it would be highly desirable to not speak but to simply envision oneself to say words or sentences. Interfaces based on imagined speech would enable fast and natural communication without the need for audible speech and would give a voice to otherwise mute people. This focused review analyzes the potential of different brain imaging techniques to recognize speech from neural signals by applying Automatic Speech Recognition technology. We argue that modalities based on metabolic processes, such as functional Near Infrared Spectroscopy and functional Magnetic Resonance Imaging, are less suited for Automatic Speech Recognition from neural signals due to low temporal resolution but are very useful for the investigation of the underlying neural mechanisms involved in speech processes. In contrast, electrophysiologic activity is fast enough to capture speech processes and is therefore better suited for ASR. Our experimental results indicate the potential of these signals for speech recognition from neural data with a focus on invasively measured brain activity (electrocorticography). As a first example of Automatic Speech Recognition techniques used from neural signals, we discuss the Brain-to-text system. PMID:27729844
Tedford, Thomas L., Ed.
This book is a collection of essays on free speech issues and attitudes, compiled by the Commission on Freedom of Speech of the Speech Communication Association. Four articles focus on freedom of speech in classroom situations as follows: a philosophic view of teaching free speech, effects of a course on free speech on student attitudes,…
The book covers the principles of AI and the main areas of application, and considers some of the social implications. The applications chapters have a common format structured as follows: definition of the topic; approach with conventional computing techniques; why 'intelligence' would provide a better approach; and how AI techniques would be used and the limitations. The contents discussed are: Principles of artificial intelligence; AI programming environments; LISP, list processing and pattern-making; AI programming with POP-11; Computer processing of natural language; Speech synthesis and recognition; Computer vision; Artificial intelligence and robotics; The anatomy of expert systems - Forsyth; Machine learning; Memory models of man and machine; Artificial intelligence and cognitive psychology; Breaking out of the Chinese room; Social implications of artificial intelligence; and Index.
Guntupalli, Vijaya K; Kalinowski, Joseph; Saltuklaroglu, Tim; Nanjundeswaran, Chayadevie
The recovery of 'gestural' speech information via the engagement of mirror neurons has been suggested to be the key agent in stuttering inhibition during the presentation of exogenous second speech signals. Based on this hypothesis, we expect the amount of stuttering inhibition to depend on the ease of recovery of exogenous speech gestures. To examine this possibility, linguistically non-congruent second speech signals were temporally compressed and expanded in two experiments. In Experiment 1, 12 participants who stutter read passages aloud at normal and fast speech rates while listening to second speech signals that were 0, 40, and 80% compressed, and 40 and 80% expanded. Except for the 80% compressed speech signal, all other stimuli induced significant stuttering inhibition relative to the control condition. The 80% compressed speech signal was the first exogenously presented speech signal that failed to significantly reduce stuttering frequency by 60-70%, as has been the case in our research over the years. It was hypothesized that at a compression ratio of 80%, exogenous speech signals generated too many gestures per unit time to allow for adequate gestural recovery via mirror neurons. However, considering that the 80% compressed signal was also highly unintelligible, a second experiment was conducted to further examine whether the effects of temporal compression on stuttering inhibition are mediated by speech intelligibility. In Experiment 2, 10 participants who stutter read passages at a normal rate while listening to linguistically non-congruent second speech signals that were compressed by 0, 20, 40, 60, and 80%. Results revealed that 0 and 20% compressed speech signals induced approximately 52% stuttering inhibition. In contrast, compression ratios of 40% and beyond induced only 27% stuttering inhibition although 40 and 60% compressed signals were perceptually intelligible. Our findings suggest that recovery of gestural information is affected by temporal
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.
Lokerson, D. C. (Inventor)
A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train having approximately a pulse rate representing the average frequency of the first formant is derived; second and third pulse trains having pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
The premise of this study is that current models of speech perception, which are driven by acoustic features alone, are incomplete, and that the role of decoding time during memory access must be incorporated to account for the patterns of observed recognition phenomena. It is postulated that decoding time is governed by a cascade of neuronal oscillators, which guide template-matching operations at a hierarchy of temporal scales. Cascaded cortical oscillations in the theta, beta, and gamma frequency bands are argued to be crucial for speech intelligibility. Intelligibility is high so long as these oscillations remain phase locked to the auditory input rhythm. A model (Tempo) is presented which is capable of emulating recent psychophysical data on the intelligibility of speech sentences as a function of “packaging” rate (Ghitza and Greenberg, 2009). The data show that intelligibility of speech that is time-compressed by a factor of 3 (i.e., a high syllabic rate) is poor (above 50% word error rate), but is substantially restored when the information stream is re-packaged by the insertion of silent gaps in between successive compressed-signal intervals – a counterintuitive finding, difficult to explain using classical models of speech perception, but emerging naturally from the Tempo architecture. PMID:21743809
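The re-packaging manipulation described in this abstract, inserting silent gaps between successive intervals of an already time-compressed signal, can be sketched on a plain list of samples. This is an illustration only and not the Tempo model or the procedure of the cited experiment: the function name `repackage` and the parameters `interval_len` and `gap_len` are hypothetical, and the time compression itself is assumed to have been done beforehand.

```python
def repackage(compressed, interval_len, gap_len):
    """Insert gap_len silent (zero) samples between successive intervals.

    compressed   -- list of samples of an already time-compressed signal
    interval_len -- number of samples per compressed-signal interval (assumed)
    gap_len      -- number of zero samples inserted between intervals (assumed)
    """
    if interval_len <= 0 or gap_len < 0:
        raise ValueError("interval_len must be positive and gap_len non-negative")
    out = []
    for start in range(0, len(compressed), interval_len):
        if start > 0:
            # Silence goes only between intervals, never before the first one.
            out.extend([0.0] * gap_len)
        out.extend(compressed[start:start + interval_len])
    return out
```

With `gap_len` set to twice `interval_len`, a signal compressed by a factor of 3 regains roughly its original duration, which is one way to picture how the information stream is slowed without altering the compressed intervals themselves.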
Yi, Astrid; Wong, Willy; Eizenman, Moshe
Purpose: In this study, the authors sought to quantify the relationships between speech intelligibility (perception) and gaze patterns under different auditory-visual conditions. Method: Eleven subjects listened to low-context sentences spoken by a single talker while viewing the face of one or more talkers on a computer display. Subjects either…
Espy-Wilson, Carol Y.; Chari, Venkatesh R.; MacAuslan, Joel M.; Huang, Caroline B.; Walsh, Michael J.
A study tested the quality and intelligibility, as judged by several listeners, of four users' electrolaryngeal speech, with and without filtering to compensate for perceptually objectionable acoustic characteristics. Results indicated that an adaptive filtering technique produced a noticeable improvement in the quality of the Transcutaneous…
Investigations in recent years have indicated that only about 20% of the speech output of the deaf is understood by the "person on the street." This lack of intelligibility has been associated with some frequently occurring segmental and suprasegmental errors. Journal Availability: Elsevier North Holland, Inc., 52 Vanderbilt Avenue, New York, NY…
Witbrock, Michael J.; Hauptmann, Alexander G.
Production of the meta-data supporting the Informedia Digital Video Library interface is automated using techniques derived from artificial intelligence research. Speech recognition and natural-language processing, information retrieval, and image analysis are applied to produce an interface that helps users locate information and navigate more…
Sørensen, Sarah Mejer; Bjørn, Signe Frahm; Jochumsen, Kirsten Marie; Jensen, Pernille Tine; Thranov, Ingrid Regitze; Hare-Bruun, Helle; Seibæk, Lene; Høgdall, Claus
Aim of database The Danish Gynecological Cancer Database (DGCD) is a nationwide clinical cancer database and its aim is to monitor the treatment quality of Danish gynecological cancer patients, and to generate data for scientific purposes. DGCD also records detailed data on the diagnostic measures for gynecological cancer. Study population DGCD was initiated January 1, 2005, and includes all patients treated at Danish hospitals for cancer of the ovaries, peritoneum, fallopian tubes, cervix, vulva, vagina, and uterus, including rare histological types. Main variables DGCD data are organized within separate data forms as follows: clinical data, surgery, pathology, pre- and postoperative care, complications, follow-up visits, and final quality check. DGCD is linked with additional data from the Danish “Pathology Registry”, the “National Patient Registry”, and the “Cause of Death Registry” using the unique Danish personal identification number (CPR number). Descriptive data Data from DGCD and registers are available online in the Statistical Analysis Software portal. The DGCD forms cover almost all possible clinical variables used to describe gynecological cancer courses. The only limitation is the registration of oncological treatment data, which is incomplete for a large number of patients. Conclusion The very complete collection of available data from more registries form one of the unique strengths of DGCD compared to many other clinical databases, and provides unique possibilities for validation and completeness of data. The success of the DGCD is illustrated through annual reports, high coverage, and several peer-reviewed DGCD-based publications. PMID:27822089
Yuan, Meng; Sun, Yang; Feng, Haihong; Lee, Tan
This paper discusses a single-channel speech enhancement method for cochlear implant listeners. It is assumed that the Fourier Transform coefficients of speech and background noise have different statistical distributions. A statistical-model-based method is adopted to update the signal-to-noise ratio and estimate the background noise so that the musical noise and speech distortion induced by the traditional spectral subtraction method can be effectively reduced. This enhancement method was evaluated on seven postlingually deaf Chinese cochlear implant listeners in comparison with two other speech enhancement methods. Test materials were Mandarin sentences corrupted by three different types of background noise. Experimental results showed that the proposed speech enhancement method could benefit the speech intelligibility of Chinese cochlear implant listeners. The results suggest that different noise types may affect the performance of different speech enhancement algorithms.
Kayasith, Prakasith; Theeramunkong, Thanaruk
Measuring the severity of dysarthria by manually evaluating a speaker's speech with the available standard assessment methods, which are based on human perception, is a tedious and subjective task. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. Considering two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce a consistent speech signal for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and forecast the recognition rate of speech from an individual dysarthric speaker before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
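The three evaluation measures named in this abstract can be sketched as follows for two equal-length lists of predicted and observed recognition rates. This is a minimal sketch under stated assumptions: the abstract does not define rank-order inconsistency, so the version here (the fraction of speaker pairs whose ordering differs between prediction and observation) is one plausible reading rather than the authors' definition, and all function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def rms_difference(x, y):
    """Root-mean-square of the element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def rank_order_inconsistency(predicted, observed):
    """Fraction of pairs ordered differently by the two sequences (assumed definition)."""
    n = len(predicted)
    discordant = 0
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1
            # A negative product means the pair is ranked in opposite orders.
            if (predicted[i] - predicted[j]) * (observed[i] - observed[j]) < 0:
                discordant += 1
    return discordant / total
```

A good predictor under these measures would show a correlation near 1, a small root-mean-square difference, and an inconsistency near 0.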
Hölmich, Lisbet Rosenkrantz; Klausen, Siri; Spaun, Eva; Schmidt, Grethe; Gad, Dorte; Svane, Inge Marie; Schmidt, Henrik; Lorentzen, Henrik Frank; Ibfelt, Else Helene
Aim of database The aim of the database is to monitor and improve the treatment and survival of melanoma patients. Study population All Danish patients with cutaneous melanoma and in situ melanomas must be registered in the Danish Melanoma Database (DMD). In 2014, 2,525 patients with invasive melanoma and 780 with in situ tumors were registered. The coverage is currently 93% compared with the Danish Pathology Register. Main variables The main variables include demographic, clinical, and pathological characteristics, including Breslow’s tumor thickness, ± ulceration, mitoses, and tumor–node–metastasis stage. Information about the date of diagnosis, treatment, type of surgery, including safety margins, results of lymphoscintigraphy in patients for whom this was indicated (tumors > T1a), results of sentinel node biopsy, pathological evaluation hereof, and follow-up information, including recurrence, nature, and treatment hereof is registered. In case of death, the cause and date are included. Currently, all data are entered manually; however, data catchment from the existing registries is planned to be included shortly. Descriptive data The DMD is an old research database, but new as a clinical quality register. The coverage is high, and the performance in the five Danish regions is quite similar due to strong adherence to guidelines provided by the Danish Melanoma Group. The list of monitored indicators is constantly expanding, and annual quality reports are issued. Several important scientific studies are based on DMD data. Conclusion DMD holds unique detailed information about tumor characteristics, the surgical treatment, and follow-up of Danish melanoma patients. Registration and monitoring is currently expanding to encompass even more clinical parameters to benefit both patient treatment and research. PMID:27822097
This document contains the six of the seven keynote speeches from an international conference on vocational education and training (VET) for lifelong learning in the information era. "IVETA (International Vocational Education and Training Association) 2000 Conference 6-9 August 2000" (K.Y. Yeung) discusses the objectives and activities…
Paulson, L. M.; MacArthur, C. J.; Beaulieu, K. B.; Brockman, J. H.; Milczuk, H. A.
Introduction. Controversy exists over whether tonsillectomy will affect speech in patients with known velopharyngeal insufficiency (VPI), particularly in those with cleft palate. Methods. All patients seen at the OHSU Doernbecher Children's Hospital VPI clinic between 1997 and 2010 with VPI who underwent tonsillectomy were reviewed. Speech parameters were assessed before and after tonsillectomy. Wilcoxon rank-sum testing was used to evaluate for significance. Results. A total of 46 patients with VPI underwent tonsillectomy during this period. Twenty-three had pre- and postoperative speech evaluation sufficient for analysis. The majority (87%) had a history of cleft palate. Indications for tonsillectomy included obstructive sleep apnea in 11 (48%) and staged tonsillectomy prior to pharyngoplasty in 10 (43%). There was no significant difference between pre- and postoperative speech intelligibility or velopharyngeal competency in this population. Conclusion. In this study, tonsillectomy in patients with VPI did not significantly alter speech intelligibility or velopharyngeal competence. PMID:22164175
Peelle, Jonathan E; Sommers, Mitchell S
During face-to-face conversational speech listeners must efficiently process a rapid and complex stream of multisensory information. Visual speech can serve as a critical complement to auditory information because it provides cues to both the timing of the incoming acoustic signal (the amplitude envelope, influencing attention and perceptual sensitivity) and its content (place and manner of articulation, constraining lexical selection). Here we review behavioral and neurophysiological evidence regarding listeners' use of visual speech information. Multisensory integration of audiovisual speech cues improves recognition accuracy, particularly for speech in noise. Even when speech is intelligible based solely on auditory information, adding visual information may reduce the cognitive demands placed on listeners through increasing the precision of prediction. Electrophysiological studies demonstrate that oscillatory cortical entrainment to speech in auditory cortex is enhanced when visual speech is present, increasing sensitivity to important acoustic cues. Neuroimaging studies also suggest increased activity in auditory cortex when congruent visual information is available, but additionally emphasize the involvement of heteromodal regions of posterior superior temporal sulcus as playing a role in integrative processing. We interpret these findings in a framework of temporally-focused lexical competition in which visual speech information affects auditory processing to increase sensitivity to acoustic information through an early integration mechanism, and a late integration stage that incorporates specific information about a speaker's articulators to constrain the number of possible candidates in a spoken utterance. Ultimately it is words compatible with both auditory and visual information that most strongly determine successful speech perception during everyday listening. Thus, audiovisual speech perception is accomplished through multiple stages of integration
King, Simon; Frankel, Joe; Livescu, Karen; McDermott, Erik; Richmond, Korin; Wester, Mirjam
Although much is known about how speech is produced, and research into speech production has resulted in measured articulatory data, feature systems of different kinds, and numerous models, speech production knowledge is almost totally ignored in current mainstream approaches to automatic speech recognition. Representations of speech production allow simple explanations for many phenomena observed in speech which cannot be easily analyzed from either acoustic signal or phonetic transcription alone. In this article, a survey of a growing body of work in which such representations are used to improve automatic speech recognition is provided.
Hustad, Katherine C.; Schueler, Brynn; Schultz, Laurel; DuHadway, Caitlin
Purpose: The authors examined speech intelligibility in typically developing (TD) children and 3 groups of children with cerebral palsy (CP) who were classified into speech/language profile groups following Hustad, Gorton, and Lee (2010). Questions addressed differences in transcription intelligibility scores among groups, the effects of utterance…
Whitehead, John S.
The paper is a supplement to an earlier paper in the same series which reviews Danish higher education until 1977. Expansion in higher education in the last 20 years, approaching the scale of mass higher education, culminated in a crisis in 1977. At that time, a trend toward self-government and participatory governing boards was seen as the end of…
Johnsen, Søren Paaske; Ingeman, Annette; Hundborg, Heidi Holmager; Schaarup, Susanne Zielke; Gyllenborg, Jesper
Aim of database: The aim of the Danish Stroke Registry is to monitor and improve the quality of care among all patients with acute stroke and transient ischemic attack (TIA) treated at Danish hospitals. Study population: All patients with acute stroke (from 2003) or TIA (from 2013) treated at Danish hospitals. Reporting is mandatory by law for all hospital departments treating these patients. The registry included >130,000 events by the end of 2014, including 10,822 strokes and 4,227 TIAs registered in 2014. Main variables: The registry holds prospectively collected data on key processes of care, mainly covering the early phase after stroke, including data on time of delivery of the processes and the eligibility of the individual patients for each process. The data are used for assessing 18 process indicators reflecting recommendations in the national clinical guidelines for patients with acute stroke and TIA. Patient outcomes are currently monitored using 30-day mortality, unplanned readmission, and for patients receiving revascularization therapy, also functional level at 3 months poststroke. Descriptive data: Sociodemographic, clinical, and lifestyle factors with potential prognostic impact are registered. Conclusion: The Danish Stroke Registry is a well-established clinical registry which plays a key role for monitoring and improving stroke and TIA care in Denmark. In addition, the registry is increasingly used for research. PMID:27843349
Academic Press, 1973. Kimura, D. The neural basis of language qua gesture. In H. Whitaker & H. A. Whitaker (Eds.), Studies in neurolinguistics (Vol. 3...Lubker, J., & Gay, T. Formant frequencies of some fixed-mandible vowels and a model of speech motor programming. Journal of Phonetics, 1979, 7, 147-162...A. Interarticulator programming in stop production. To appear in Journal of Phonetics, in press. Löfqvist, A., & Yoshioka, H. Laryngeal activity in
Liu, Fang; Jiang, Cunmei; Wang, Bei; Xu, Yi; Patel, Aniruddh D
This study investigated the underlying link between speech and music by examining whether and to what extent congenital amusia, a musical disorder characterized by degraded pitch processing, would impact spoken sentence comprehension for speakers of Mandarin, a tone language. Sixteen Mandarin-speaking amusics and 16 matched controls were tested on the intelligibility of news-like Mandarin sentences with natural and flat fundamental frequency (F0) contours (created via speech resynthesis) under four signal-to-noise (SNR) conditions (no noise, +5, 0, and -5dB SNR). While speech intelligibility in quiet and extremely noisy conditions (SNR=-5dB) was not significantly compromised by flattened F0, both amusic and control groups achieved better performance with natural-F0 sentences than flat-F0 sentences under moderately noisy conditions (SNR=+5 and 0dB). Relative to normal listeners, amusics demonstrated reduced speech intelligibility in both quiet and noise, regardless of whether the F0 contours of the sentences were natural or flattened. This deficit in speech intelligibility was not associated with impaired pitch perception in amusia. These findings provide evidence for impaired speech comprehension in congenital amusia, suggesting that the deficit of amusics extends beyond pitch processing and includes segmental processing.
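The SNR conditions described above (+5, 0, and -5 dB) can be illustrated with a minimal sketch of how a noise signal might be scaled before being added to speech. This is not the authors' procedure: the function names `mean_power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, and SNR is assumed to be defined on mean signal power over the whole utterance.

```python
import math

def mean_power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)

def measured_snr_db(speech, noise):
    """Speech-to-noise power ratio in dB."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))

def mix_at_snr(speech, noise, target_snr_db):
    """Scale `noise` so the speech-to-noise ratio equals target_snr_db.

    Returns (mixture, scaled_noise). Assumes both inputs have the same
    length and that SNR is defined on mean power (an assumption here).
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    target_noise_power = mean_power(speech) / (10.0 ** (target_snr_db / 10.0))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise

# Toy signals standing in for real recordings (hypothetical values).
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixture, scaled_noise = mix_at_snr(speech, noise, -5.0)
```

Scaling the noise rather than the speech keeps the speech level constant across conditions, which is one common design choice; the abstract does not say which signal was scaled.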
Brandewie, Eugene; Zahorik, Pavel
Speech intelligibility has been shown to improve with prior exposure to a reverberant room environment [Brandewie and Zahorik (2010). J. Acoust. Soc. Am. 128, 291-299] with a spatially separated noise masker. Here, this speech enhancement effect was examined in multiple room environments using carrier phrases of varying lengths in order to control the amount of exposure. Speech intelligibility enhancement of between 5% and 18% was observed with as little as 850 ms of exposure, although the effect's time course varied considerably with reverberation and signal-to-noise ratio. In agreement with previous work, greater speech enhancement was found for reverberant environments compared to anechoic space.
each category containing 16 word pairs that differ only in the initial consonant. The six consonant categories are voicing, nasality, sustention ...voiced) are paired with their bilabial stop counterparts. meat (nasal) vs. beat (voiced, bilabial stop) Sustention (Sust) No movement compared...and sustention categories. The current results clearly demonstrate that while the throat microphone enhances the signal-to-noise ratio, the
Gregg, Jean Westerman; Scherer, Ronald C
Vowel intelligibility during singing is an important aspect of communication during performance. The intelligibility of isolated vowels sung by Western classically trained singers has been found to be relatively low, in fact, decreasing as pitch rises, and it is lower for women than for men. The lack of contextual cues significantly deteriorates vowel intelligibility. It was postulated in this study that the reduced intelligibility of isolated sung vowels may be partly from the vowels used by the singers in their daily vocalises. More specifically, if classically trained singers sang only a few American English vowels during their vocalises, their intelligibility for American English vowels would be less than for those classically trained singers who usually vocalize on most American English vowels. In this study, there were 21 subjects (15 women, 6 men), all Western classically trained performers as well as teachers of classical singing. They sang 11 words containing 11 different American English vowels, singing on two pitches a musical fifth apart. Subjects were divided into two groups, those who normally vocalize on 4, 5, or 6 vowels, and those who sing all 11 vowels during their daily vocalises. The sung words were cropped to isolate the vowels, and listening tapes were created. Two listening groups, four singing teachers and five speech-language pathologists, were asked to identify the vowels intended by the singers. Results suggest that singing fewer vowels during daily vocalises does not decrease intelligibility compared with singing the 11 American English vowels. Also, in general, vowel intelligibility was lower with the higher pitch, and vowels sung by the women were less intelligible than those sung by the men. Identification accuracy was about the same for the singing teacher listeners and the speech-language pathologist listeners except for the lower pitch, where the singing teachers were more accurate.
Mekonnen, Abebayehu Messele
This article presents a case study of speech production in a 14-year-old Amharic-speaking boy. The boy had developed secondary macroglossia, related to a disturbance of growth hormones, following a history of normal speech development. Perceptual analysis combined with acoustic analysis and static palatography is used to investigate the specific articulatory compensations arising from the macroglossia. The subset of sounds chosen for study were the denti-alveolar and alveolar plosives, fricatives, ejectives, nasal, lateral and trill produced in single words, as well as in short phrases. The phonetic analysis revealed both spatial and temporal atypicalities in the realisations of the sounds in question. Speaking rate was slow relative to his peers' speech and attempts to increase speech rate resulted in dysfluent speech. Given the phonological system of Amharic, however, the atypical segmental realisations, while reducing both the intelligibility and acceptability of the participant's speech production, did not result in loss of phonological contrasts.
Sato, Hayato; Ota, Ryo; Morimoto, Masayuki; Sato, Hiroshi
Assessing the sound environment of classrooms for the aged is a very important issue, because classrooms can be used by the aged for their lifelong learning, especially in an aging society. Hence hearing loss due to aging is a considerable factor for classrooms. In this study, the optimal speech level in noisy fields for both young adults and aged persons was investigated. Listening difficulty ratings and word intelligibility scores for familiar words were used to evaluate speech transmission performance. The results of the tests demonstrated that the optimal speech level for moderate background noise (i.e., less than around 60 dBA) was fairly constant. Meanwhile, the optimal speech level depended on the speech-to-noise ratio when the background noise level exceeded around 60 dBA. The minimum required speech level to minimize difficulty ratings for the aged was higher than that for the young. However, the minimum difficulty ratings for both the young and the aged were obtained at speech levels in the range of 70 to 80 dBA.
Steeneken, H. J. M.; Houtgast, T.
The use of an objective method of measuring speech intelligibility in auditoria is illustrated. The applications involve the mapping of iso-intelligibility contours for tracing areas with poor intelligibility, and also for assessing the gain of a public address system. The method, based on the modulation transfer function (MTF), presents valuable diagnostic information about the effect of reverberation, noise, echoes and of public address systems on intelligibility. The measuring time is about 3 minutes for the MTFs of the octave bands 500 Hz and 2000 Hz.
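The abstract does not give the formulas behind the MTF-based method. As a rough illustration only, the sketch below uses the commonly cited textbook form of the modulation transfer function for exponential reverberation plus steady noise, and the usual conversion of a modulation index to a transmission index via an apparent signal-to-noise ratio clipped to ±15 dB. The function names and the example inputs are assumptions, and band weighting and averaging over modulation frequencies are omitted.

```python
import math

def mtf(mod_freq_hz, rt60_s, snr_db):
    """Modulation index m(F) for reverberation time rt60_s and a given SNR.

    Textbook form: a low-pass term for exponential reverberant decay
    multiplied by a modulation-reduction term for steady noise.
    """
    reverb_term = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2
    )
    noise_term = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb_term * noise_term

def transmission_index(m):
    """Map a modulation index in [0, 1] to a transmission index in [0, 1]."""
    if m <= 0.0:
        return 0.0
    if m >= 1.0:
        return 1.0
    apparent_snr_db = 10.0 * math.log10(m / (1.0 - m))
    clipped = max(-15.0, min(15.0, apparent_snr_db))
    return (clipped + 15.0) / 30.0

# Hypothetical room: 1.2 s reverberation time, 10 dB SNR, 2 Hz modulation.
m = mtf(2.0, 1.2, 10.0)
ti = transmission_index(m)
```

In a full calculation the transmission indices would be averaged over a set of modulation frequencies and combined across octave bands; the abstract mentions only the 500 Hz and 2000 Hz bands.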
Gover, Bradford N.; Bradley, John S.
Objective measures were investigated as predictors of the speech security of closed offices and rooms. A new signal-to-noise type measure is shown to be a better indicator of security than existing measures such as the Articulation Index, the Speech Intelligibility Index, the ratio of the loudness of speech to that of noise, and the A-weighted level difference of speech and noise. This new measure is a weighted sum of clipped one-third-octave-band signal-to-noise ratios; various weightings and clipping levels are explored. Listening tests had 19 subjects rate the audibility and intelligibility of 500 English sentences, filtered to simulate transmission through various wall constructions, and presented along with background noise. The results of the tests indicate that the new measure is highly correlated with sentence intelligibility scores and also with three security thresholds: the threshold of intelligibility (below which speech is unintelligible), the threshold of cadence (below which the cadence of speech is inaudible), and the threshold of audibility (below which speech is inaudible). The ratio of the loudness of speech to that of noise, and simple A-weighted level differences are both shown to be well correlated with these latter two thresholds (cadence and audibility), but not well correlated with intelligibility.
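A weighted sum of clipped band signal-to-noise ratios, as described above, might look like the following minimal sketch. The abstract reports that several weightings and clipping levels were explored but does not give them, so the uniform weights and the -32 dB lower clip used as defaults here are placeholder assumptions, and the function name `clipped_weighted_snr` is hypothetical.

```python
def clipped_weighted_snr(speech_levels_db, noise_levels_db,
                         weights=None, clip_low_db=-32.0, clip_high_db=None):
    """Weighted sum of per-band SNRs after clipping.

    speech_levels_db, noise_levels_db: band levels in dB, one per
    one-third-octave band, in the same band order.
    weights: per-band weights; uniform weights summing to 1 if omitted
    (an assumption, not a value from the abstract).
    clip_low_db / clip_high_db: clipping bounds in dB; None disables a bound.
    """
    if len(speech_levels_db) != len(noise_levels_db):
        raise ValueError("speech and noise must cover the same bands")
    n_bands = len(speech_levels_db)
    if weights is None:
        weights = [1.0 / n_bands] * n_bands
    if len(weights) != n_bands:
        raise ValueError("one weight per band is required")

    total = 0.0
    for speech_db, noise_db, w in zip(speech_levels_db, noise_levels_db, weights):
        snr = speech_db - noise_db
        if clip_low_db is not None:
            snr = max(clip_low_db, snr)
        if clip_high_db is not None:
            snr = min(clip_high_db, snr)
        total += w * snr
    return total

# Two hypothetical bands: one audible (+10 dB), one deeply masked (-50 dB).
value = clipped_weighted_snr([50.0, 40.0], [40.0, 90.0])
```

Clipping from below stops bands in which speech is far below the noise from dominating the sum, which is presumably why a clipped measure tracks thresholds better than a raw level difference.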
Peelle, Jonathan E.; Davis, Matthew H.
A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251
Following the Japanese announcement that they intend to devise, make, and market, in the 1990s, computers incorporating a level of intelligence, a vast amount of energy and expense has been diverted to the field of Artificial Intelligence. Workers in this discipline have tried for the past 25 years to reproduce human behavior on computers, and this book presents their achievements and the problems. Subjects include: computer vision, speech processing, robotics, natural language processing, expert systems and machine learning. The book also attempts to show the general principles behind the various applications and finally attempts to show their implications for other human endeavors such as philosophy, psychology, and the development of modern society.
... Speech-to-Speech (STS) is one form of Telecommunications Relay Service (TRS). TRS is a service that ... to STS, go to www.fcc.gov/guides/telecommunications-relay-service-trs. Filing a Complaint If you ...
Phonology is traditionally seen as the discipline that concerns itself with the building blocks of linguistic messages. It is the study of the structure of sound inventories of languages and of the participation of sounds in rules or processes. Phonetics, in contrast, concerns speech sounds as produced and perceived. Two extreme positions on the relationship between phonological messages and phonetic realizations are represented in the literature. One holds that the primary home for linguistic symbols, including phonological ones, is the human mind, itself housed in the human brain. The second holds that their primary home is the human vocal tract.
Drullman, Rob; Bronkhorst, Adelbert W.
Sentence intelligibility for interfering speech was investigated as a function of level difference, pitch difference, and presence of tactile support. A previous study by the present authors [J. Acoust. Soc. Am. 111, 2432-2433 (2002)] had shown a small benefit of tactile support in the speech-reception threshold measured against a background of one to eight competing talkers. The present experiment focused on the effects of informational and energetic masking for one competing talker. Competing speech was obtained by manipulating the speech of the male target talker (different sentences). The PSOLA technique was used to increase the average pitch of competing speech by 2, 4, 8, or 12 semitones. Level differences between target and competing speech ranged from -16 to +4 dB. Tactile support (B&K 4810 shaker) was given to the index finger by presenting the temporal envelope of the low-pass-filtered speech (0-200 Hz). Sentences were presented diotically and the percentage of correctly perceived words was measured. Results show a significant overall increase in intelligibility score from 71% to 77% due to tactile support. Performance improves monotonically with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences.
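The pitch and level manipulations above are expressed in semitones and decibels. The sketch below converts them to plain ratios, assuming equal-tempered semitones (a frequency ratio of 2^(1/12) per semitone) and level differences applied to amplitude. The function names and the 100 Hz example F0 are hypothetical, not taken from the study.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a pitch shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)

def shift_f0(f0_hz, semitones):
    """Fundamental frequency after shifting by the given number of semitones."""
    return f0_hz * semitone_ratio(semitones)

def amplitude_ratio(level_difference_db):
    """Amplitude ratio corresponding to a level difference in dB."""
    return 10.0 ** (level_difference_db / 20.0)

# Shifts used for the competing talker, applied to a hypothetical 100 Hz F0.
shifted = {n: shift_f0(100.0, n) for n in (2, 4, 8, 12)}

# Range of target-to-competing level differences reported in the abstract.
gains = {db: amplitude_ratio(db) for db in (-16.0, 4.0)}
```

A 12-semitone shift doubles the average pitch, so the largest manipulation places the competing talker a full octave above the target.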
Liégeois, Frédérique; Morgan, Angela T; Stewart, Lorna H; Helen Cross, J; Vogel, Adam P; Vargha-Khadem, Faraneh
Hemispherectomy (disconnection or removal of an entire cerebral hemisphere) is a rare surgical procedure used for the relief of drug-resistant epilepsy in children. After hemispherectomy, contralateral hemiplegia persists whereas gross expressive and receptive language functions can be remarkably spared. Motor speech deficits have rarely been examined systematically, thus limiting the accuracy of postoperative prognosis. We describe the speech profiles of hemispherectomized participants characterizing their intelligibility, articulation, phonological speech errors, dysarthric features, and execution and sequencing of orofacial speech and non-speech movements. Thirteen participants who had undergone hemispherectomy (six left, seven right; nine with congenital, four with acquired hemiplegia; operated between four months and 13 years) were investigated. Results showed that all participants were intelligible but showed a mild dysarthric profile characterized by neuromuscular asymmetry and reduced quality and coordination of movements, features that are characteristic of adult-onset unilateral upper motor neuron dysarthria, flaccid-ataxic variant. In addition, one left and four right hemispherectomy cases presented with impaired production of speech and non-speech sequences. No participant showed evidence of verbal or oral dyspraxia. It is concluded that mild dysarthria is persistent after left or right hemispherectomy, irrespective of age at onset of hemiplegia. These results indicate incomplete functional re-organization for the control of fine speech motor movements throughout childhood, and provide no evidence of hemispheric differences.
Danish auroral science history begins with the early auroral observations made by the Danish astronomer Tycho Brahe during the years from 1582 to 1601 preceding the Maunder minimum in solar activity. Included are also the brilliant observations made by another astronomer, Ole Rømer, from Copenhagen in 1707, as well as the early auroral observations made from Greenland by missionaries during the 18th and 19th centuries. The relations between auroras and geomagnetic variations were analysed by H. C. Ørsted, who also played a vital role in the development of Danish meteorology that came to include comprehensive auroral observations from Denmark, Iceland and Greenland as well as auroral and geomagnetic research. The very important auroral investigations made by Sophus Tromholt are outlined. His analysis from 1880 of auroral observations from Greenland prepared for the significant contributions from the Danish Meteorological Institute, DMI, (founded in 1872) to the first International Polar Year 1882/83, where an expedition headed by Adam Paulsen was sent to Greenland to conduct auroral and geomagnetic observations. Paulsen's analyses of the collected data gave many important results but also raised many new questions that gave rise to auroral expeditions to Iceland in 1899 to 1900 and to Finland in 1900 to 1901. Among the results from these expeditions were 26 unique paintings of the auroras made by the artist painter, Harald Moltke. The expedition to Finland was headed by Dan la Cour, who later as director of the DMI came to be in charge of the comprehensive international geomagnetic and auroral observations made during the Second International Polar Year in 1932/33. Finally, the article describes the important investigations made by Knud Lassen during, among others, the International Geophysical Year 1957/58 and during the International Quiet Sun Year (IQSY) in 1964/65. With his leadership the auroral and geomagnetic research at DMI reached a high international
Waltz, David L.
Describes kinds of results achieved by computer programs in artificial intelligence. Topics discussed include heuristic searches, artificial intelligence/psychology, planning program, backward chaining, learning (focusing on Winograd's blocks to explore learning strategies), concept learning, constraint propagation, language understanding…
Purpose: This study was designed to assess potential contributors to listener variability in judgments of intelligibility. Method: A total of 228 unfamiliar everyday listeners judged speech samples from 3 individuals with dysarthria. Samples were the single-word phonetic contrast test, the Sentence Intelligibility Test, an unpredictable sentence…
Experiments were conducted to investigate the effect of noise and reverberation on speech intelligibility and to find a simple method for evaluating the quality of speech transmission. Articulation scores and subjectively judged intelligibility of sentences were obtained in a laboratory situation together with the physical measure of the cross-correlation between the source and the listening point. The effect of broadband steady noise was examined by a monosyllable articulation test and subjective judgment of speech interference. A three-syllable articulation test was used to detect the effect of reverberation. These subjective indices showed correspondence with the source to listening point cross-correlation. The measurement of cross-correlation appeared to be effective as an objective method for estimating speech interference.
Forty years ago Lisker and Abramson published their landmark paper on VOT; the speech-research world has never been the same. The concept of VOT as a measure relevant to phonology, speech physiology, and speech perception made it a prime choice for scientists who saw an opportunity to exploit the techniques and analytic frameworks of "speech science" in the study of speech disorders. Modifications of VOT in speech disorders have been used to draw specific inferences concerning phonological representations, glottal-supraglottal timing, and speech intelligibility. This presentation will provide a review of work on VOT in speech disorders, including (among others) stuttering, hearing impairment, and neurogenic disorders. An attempt will be made to collect published data in summary graphic form, and to discuss their implications. Emphasis will be placed on how VOT has been used to inform theories of disordered speech production. I will close with some personal comments about the influence (unbeknownst to them) these two outstanding scientists had on me in the 1970s, when under the spell of their work I first became aware that the world of speech research did not start and end with moving parts.
Durand, V M; Crimmins, D B
The psychotic speech of autistic and other developmentally disabled children can be defined as words or phrases that are intelligible, but appear out of context. In the present investigation we conducted an analysis of the psychotic speech of a 9-year-old autistic boy. Three experiments were constructed to determine the functional significance of this child's psychotic speech and a method of intervention. The first study involved an analysis of the role of adult attention and task demands in the maintenance of psychotic speech. When task demands were increased, the frequency of psychotic speech increased. Varying adult attention had no effect on psychotic speech. We then performed a second analysis in which the consequence for psychotic speech was a 10-second time-out. Psychotic speech increased, suggesting that it may have been maintained through escape from task demands. Finally, the third experiment involved teaching an appropriate escape response ("Help me"). Psychotic speech was greatly reduced by this intervention. Thus, teaching an appropriate equivalent phrase proved to be a viable alternative to interventions using aversive consequences. The present study represents the first observation that psychotic speech may serve to remove children from unpleasant situations and also introduces a nonaversive intervention for this behavior.
Van Lancker Sidtis, Diana; Cameron, Krista; Sidtis, John J.
In motor speech disorders, dysarthric features impacting intelligibility, articulation, fluency, and voice emerge more saliently in conversation than in repetition, reading, or singing. A role of the basal ganglia in these task discrepancies has been identified. Further, more recent studies of naturalistic speech in basal ganglia dysfunction have revealed that formulaic language is more impaired than novel language. This descriptive study extends these observations to a case of severely dysfluent dysarthria due to a parkinsonian syndrome. Dysfluencies were quantified and compared for conversation, two forms of repetition, reading, recited speech, and singing. Other measures examined phonetic inventories, word forms, and formulaic language. Phonetic, syllabic, and lexical dysfluencies were more abundant in conversation than in other task conditions. Formulaic expressions in conversation were reduced compared to normal speakers. A proposed explanation supports the notion that the basal ganglia contribute to formulation of internal models for execution of speech. PMID:22774929
Danish as a second language textbooks published over the last 15 years have presented the Danish cultural identity as a homogenous and purely national phenomenon. Research into teaching theory, on the other hand, has been more broad-minded, and is based on interactivity. The aim of this paper is to explain this divergence. (Contains 2 notes.)
Information Technology Quarterly, 1985
This issue of "Information Technology Quarterly" is devoted to the theme of "Artificial Intelligence." It contains two major articles: (1) "Artificial Intelligence and Law" (D. Peter O'Neill and George D. Wood); (2) "Artificial Intelligence: A Long and Winding Road" (John J. Simon, Jr.). In addition, it contains two sidebars: (1) "Calculating and…
Purpose: Seeks to explore the notion of organisational intelligence as a simple extension of the idea of collective intelligence. Design/methodology/approach: Discusses organisational intelligence using previous research, which includes the Purpose, Properties and Practice model of Dealtry, and the Viable Systems model. Findings: The…
Thornburg, David D.
Overview of the artificial intelligence (AI) field provides a definition; discusses past research and areas of future research; describes the design, functions, and capabilities of expert systems and the "Turing Test" for machine intelligence; and lists additional sources for information on artificial intelligence. Languages of AI are…
Bergeron, Pierrette; Hiller, Christine A.
Reviews the evolution of competitive intelligence since 1994, including terminology and definitions and analytical techniques. Addresses the issue of ethics; explores how information technology supports the competitive intelligence process; and discusses education and training opportunities for competitive intelligence, including core competencies…
Vorstman, Jacob AS; Kon, Moshe; Mink van der Molen, Aebele B
Background: Speech problems are a common clinical feature of the 22q11.2 deletion syndrome. The objectives of this study were to inventory the speech history and current self-reported speech rating of adolescents and young adults, and examine the possible variables influencing the current speech ratings, including cleft palate, surgery, speech and language therapy, intelligence quotient, and age at assessment. Methods: In this cross-sectional cohort study, 50 adolescents and young adults with the 22q11.2 deletion syndrome (ages, 12-26 years, 67% female) filled out questionnaires. A neuropsychologist administered an age-appropriate intelligence quotient test. The demographics, histories, and intelligence of patients with normal speech (speech rating=1) were compared to those of patients with different speech (speech rating>1). Results: Of the 50 patients, a minority (26%) had a cleft palate, nearly half (46%) underwent a pharyngoplasty, and all (100%) had speech and language therapy. Poorer speech ratings were correlated with more years of speech and language therapy (Spearman's correlation=0.418, P=0.004; 95% confidence interval, 0.145-0.632). Only 34% had normal speech ratings. The groups with normal and different speech were not significantly different with respect to the demographic variables; a history of cleft palate, surgery, or speech and language therapy; and the intelligence quotient. Conclusions: All adolescents and young adults with the 22q11.2 deletion syndrome had undergone speech and language therapy, and nearly half of them underwent pharyngoplasty. Only 34% attained normal speech ratings. Those with poorer speech ratings had speech and language therapy for more years. PMID:25276637
Background: Treacher Collins syndrome (TCS, OMIM 154500) is a rare congenital disorder of craniofacial development. Characteristic hypoplastic malformations of the ears, zygomatic arch, mandible and pharynx have been described in detail. However, reports on the impact of these malformations on speech are few. Exploring speech features and investigating if speech function is related to phenotypic severity are essential for optimizing follow-up and treatment. Methods: Articulation, nasal resonance, voice and intelligibility were examined in 19 individuals (5–74 years, median 34 years) divided into three groups comprising children 5–10 years (n = 4), adolescents 11–18 years (n = 4) and adults 29 years and older (n = 11). A speech composite score (0–6) was calculated to reflect the variability of speech deviations. TCS severity scores of phenotypic expression and total scores of Nordic Orofacial Test-Screening (NOT-S) measuring orofacial dysfunction were used in analyses of correlation with speech characteristics (speech composite scores). Results: Children and adolescents presented with significantly higher speech composite scores (median 4, range 1–6) than adults (median 1, range 0–5). Nearly all children and adolescents (6/8) displayed speech deviations of articulation, nasal resonance and voice, while only three adults were identified with multiple speech aberrations. The variability of speech dysfunction in TCS was exhibited by individual combinations of speech deviations in 13/19 participants. The speech composite scores correlated with TCS severity scores and NOT-S total scores. Speech composite scores higher than 4 were associated with cleft palate. The percent of intelligible words in connected speech was significantly lower in children and adolescents (median 77%, range 31–99) than in adults (98%, range 93–100). Intelligibility of speech among the children was markedly inconsistent and clearly affecting the understandability
Budiharto, Widodo; Santoso Gunawan, Alexander Agung
Nowadays, there are many developments in building intelligent humanoid robots, mainly for handling voice and images. In this research, we propose a blind speech separation system using FastICA for audio filtering and separation that can be used in education or entertainment. Our main problem is to separate multiple speech sources and to filter out irrelevant noise. After the speech separation step, the results will be integrated with our previous speech and face recognition system, which is based on a Bioloid GP robot with a Raspberry Pi 2 as controller. The experimental results show that the accuracy of our blind speech separation system is about 88% in command and query recognition cases.
A common opinion is that progress in speech synthesis should be easier to discern than in other areas of speech communication: you just have to listen to the speech! Unfortunately, things are more complicated. It can be said, however, that early speech synthesis efforts were primarily concerned with providing intelligible speech, while, more recently, "naturalness" has been the focus. The field had its "electronic" roots in Homer Dudley's 1939 "Voder," and it advanced in the 1950s and 1960s through progress in a number of labs including JSRU in England, Haskins Labs in the U.S., and Fant's Lab in Sweden. In the 1970s and 1980s significant progress came from efforts at Bell Labs (under Jim Flanagan's leadership) and at MIT (where Dennis Klatt created one of the first commercially viable systems). Finally, over the past 15 years, the methods of unit-selection synthesis were devised, primarily at ATR in Japan, and were advanced by work at AT&T Labs, Univ. of Edinburgh, and ATR. Today, TTS systems are able to "convince some of the listeners some of the time" that synthetic speech is as natural as live recordings. Ongoing efforts aim at replacing "some" with "most" for a wide range of real-world applications.
Begault, D. R.; Erbe, T.; Wenzel, E. M. (Principal Investigator)
A spatial auditory display for multiple speech communications was developed at NASA/Ames Research Center. Input is spatialized by the use of simplified head-related transfer functions, adapted for FIR filtering on Motorola 56001 digital signal processors. Hardware and firmware design implementations are overviewed for the initial prototype developed for NASA-Kennedy Space Center. An adaptive staircase method was used to determine intelligibility levels of four-letter call signs used by launch personnel at NASA against diotic speech babble. Spatial positions at 30 degrees azimuth increments were evaluated. The results from eight subjects showed a maximum intelligibility improvement of about 6-7 dB when the signal was spatialized to 60 or 90 degrees azimuth positions.
... COMMISSION 47 CFR Part 64 Speech-to-Speech and Internet Protocol (IP) Speech-to-Speech Telecommunications Relay Services; Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities AGENCY: Federal Communications Commission. ACTION: Proposed rule....
Pope, Diana S.; Miller-Klein, Erik T.
Hospitals have complex soundscapes that create challenges to patient care. Extraneous noise and high reverberation rates impair speech intelligibility, which leads to raised voices. In an unintended spiral, the increasing noise may result in diminished speech privacy, as people speak loudly to be heard over the din. The products available to improve hospital soundscapes include construction materials that absorb sound (acoustic ceiling tiles, carpet, wall insulation) and reduce reverberation rates. Enhanced privacy curtains are now available and offer potential for a relatively simple way to improve speech privacy and speech intelligibility by absorbing sound at the hospital patient's bedside. Acoustic assessments were performed over 2 days on two nursing units with a similar design in the same hospital. One unit was built with the 1970s’ standard hospital construction and the other was newly refurbished (2013) with sound-absorbing features. In addition, we determined the effect of an enhanced privacy curtain versus standard privacy curtains using acoustic measures of speech privacy and speech intelligibility indexes. Privacy curtains provided auditory protection for the patients. In general, that protection was increased by the use of enhanced privacy curtains. On average, the enhanced curtain improved sound absorption from 20% to 30%; however, there was considerable variability, depending on the configuration of the rooms tested. Enhanced privacy curtains provide measurable improvement to the acoustics of patient rooms but cannot overcome larger acoustic design issues. To shorten reverberation time, additional absorption, and compact and more fragmented nursing unit floor plate shapes should be considered. PMID:26780959
Shyammohan, A; Sreenivasulu, D
Rehabilitation of speech is tantamount to closure of defect in cases with velopharyngeal insufficiency. Often the importance of speech therapy is sidelined during the fabrication of obturators. Usually the speech part is taken up only at a later stage and is relegated entirely to a speech therapist without the active involvement of the prosthodontist. The article suggests a protocol for speech therapy in such cases to be done in unison with a prosthodontist.
Bunton, Kate; Leddy, Mark; Miller, Jon
The purpose of the study was to document speech intelligibility deficits for a group of five adult males with Down syndrome, and use listener based error profiles to identify phonetic dimensions underlying reduced intelligibility. Phonetic error profiles were constructed for each speaker using the Kent, Weismer, Kent, and Rosenbek (1989) word intelligibility test. The test was designed to allow for identification of reasons for the intelligibility deficit, quantitative analyses at varied levels, and sensitivity to potential speech deficits across populations. Listener generated profiles were calculated based on a multiple-choice task and a transcription task. The most disrupted phonetic features, across listening tasks, involved simplification of clusters in both the word initial and word final position, and contrasts involving tongue-posture, control, and timing (e.g., high-low vowel, front-back vowel, and place of articulation for stops and fricatives). Differences between speakers in the ranking of these phonetic features were found; however, the mean error proportion for the six most severely affected features correlated highly with the overall intelligibility score (0.88 for the multiple-choice task, 0.94 for the transcription task). The phonetic feature analyses are an index that may help clarify the suspected motor speech basis for the speech intelligibility deficits seen in adults with Down syndrome and may lead to improved speech management in these individuals. PMID:17692179
Whiteside, Sandra P; Dyson, Lucy; Cowell, Patricia E; Varley, Rosemary A
Acquired apraxia of speech (AOS) is a motor speech disorder that affects the implementation of articulatory gestures and the fluency and intelligibility of speech. Oral apraxia (OA) is an impairment of nonspeech volitional movement. Although many speakers with AOS also display difficulties with volitional nonspeech oral movements, the relationship between the 2 conditions is unclear. This study explored the relationship between speech and volitional nonspeech oral movement impairment in a sample of 50 participants with AOS. We examined levels of association and dissociation between speech and OA using a battery of nonspeech oromotor, speech, and auditory/aphasia tasks. There was evidence of a moderate positive association between the 2 impairments across participants. However, individual profiles revealed patterns of dissociation between the 2 in a few cases, with evidence of double dissociation of speech and oral apraxic impairment. We discuss the implications of these relationships for models of oral motor and speech control.
van Wijngaarden, Sander J.; Houtgast, Tammo
The Speech Transmission Index (STI) is routinely applied for predicting the intelligibility of messages (sentences) in noise and reverberation. Despite clear evidence that the STI is capable of doing so accurately, recent results indicate that the STI sometimes underestimates the effect of reverberation on sentence intelligibility. To investigate the influence of talker and speaking style, the Speech Reception Threshold in noise and reverberation was measured for three talkers, differing in clarity of articulation and speaking style. For very clear speech, the standard STI yields accurate results. For more conversational speech by an untrained talker, the effect of reverberation is underestimated. Measurements of the envelope spectrum reveal that conversational speech has relatively stronger contributions by higher (> 12.5 Hz) modulation frequencies. By modifying the STI calculation procedure to include modulations in the range 12.5-31.5 Hz, better results are obtained for conversational speech. A speaking-style-dependent choice for the STI modulation frequency range is proposed.
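The effect of widening the modulation frequency range can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a single frequency band, ideal exponential reverberation, stationary noise, and an unweighted mean over modulation frequencies, and it omits the octave-band weighting and masking corrections of the full STI. The function names and the nominal one-third-octave frequency lists are illustrative assumptions.

```python
import math

# Nominal one-third-octave modulation frequencies in Hz (assumed values).
STANDARD_MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]
# Extended range up to 31.5 Hz, as discussed for conversational speech.
EXTENDED_MOD_FREQS = STANDARD_MOD_FREQS + [16.0, 20.0, 25.0, 31.5]


def modulation_transfer(f_mod, rt60, snr_db):
    """Modulation reduction factor m for reverberation time rt60 (s) and SNR (dB)."""
    reverb = 1.0 / math.sqrt(1.0 + (2.0 * math.pi * f_mod * rt60 / 13.8) ** 2)
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    return reverb * noise


def transmission_index(m):
    """Map m to a transmission index in [0, 1] via the apparent SNR."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)            # keep the logarithm finite
    snr_apparent = 10.0 * math.log10(m / (1.0 - m))
    snr_apparent = min(max(snr_apparent, -15.0), 15.0)  # clip to +/-15 dB
    return (snr_apparent + 15.0) / 30.0


def simple_sti(rt60, snr_db, mod_freqs):
    """Unweighted mean transmission index over the given modulation frequencies."""
    indices = [transmission_index(modulation_transfer(f, rt60, snr_db))
               for f in mod_freqs]
    return sum(indices) / len(indices)
```

Under these assumptions, reverberation attenuates fast modulations more than slow ones, so `simple_sti(1.0, 100.0, EXTENDED_MOD_FREQS)` comes out lower than the same call with `STANDARD_MOD_FREQS`. That is the direction of the correction described for conversational speech.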
Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias
The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286
Lear, Emmaline L.
This study explores the effectiveness of guided reflective journals to improve intelligibility in a Japanese higher educational context. Based on qualitative and quantitative methods, the paper evaluates changes in speech over the duration of one semester. In particular, this study focuses on changes in prosodic features such as stress, intonation…
Waisbren, Susan E.; And Others
Intelligence and speech-language development of eight children (3.6 to 11.6 years old) with classic galactosemia were assessed by standardized tests. Each of the children had delays or early speech difficulties, and all but one had language disorders in at least one area. Available from: Journal of Pediatrics, C.V. Mosby Co., 11830 Westline…
Schwartz, Jean-Luc; Berthommier, Frederic; Savariaux, Christophe
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances "sensitivity" to acoustic information,…
Dubois, Cyril; Otzenberger, Helene; Gounot, Daniel; Sock, Rudolph; Metz-Lutz, Marie-Noelle
In a noisy environment, visual perception of articulatory movements improves natural speech intelligibility. Parallel to phonemic processing based on auditory signal, visemic processing constitutes a counterpart based on "visemes", the distinctive visual units of speech. Aiming at investigating the neural substrates of visemic processing in a…
Cleland, Joanne; Wood, Sara; Hardcastle, William; Wishart, Jennifer; Timmins, Claire
Background: Children and young people with Down's syndrome present with deficits in expressive speech and language, accompanied by strengths in vocabulary comprehension compared with non-verbal mental age. Intelligibility is particularly low, but whether speech is delayed or disordered is a controversial topic. Most studies suggest a delay, but no…
Fercho, Kelene; Baugh, Lee A.; Hanson, Elizabeth K.
Purpose: The purpose of this article was to examine the neural mechanisms associated with increases in speech intelligibility brought about through alphabet supplementation. Method: Neurotypical participants listened to dysarthric speech while watching an accompanying video of a hand pointing to the 1st letter spoken of each word on an alphabet…
Goberman, A.M.; Elmer, L.W.
A number of studies have been devoted to the examination of clear versus conversational speech in non-impaired speakers. The purpose of these previous studies has been primarily to help increase speech intelligibility for the benefit of hearing-impaired listeners. The goal of the present study was to examine differences between conversational and…
Farrell, Anna; Theodoros, Deborah; Ward, Elizabeth; Hall, Bruce; Silburn, Peter
The present study examined the effects of neurosurgical management of Parkinson's disease (PD), including the procedures of pallidotomy, thalamotomy, and deep-brain stimulation (DBS) on perceptual speech characteristics, speech intelligibility, and oromotor function in a group of 22 participants with PD. The surgical participant group was compared…
Wang, Yuxuan; Narayanan, Arun; Wang, DeLiang
Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results by using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform spectral magnitude and its corresponding mask (FFT-MASK), and the Gammatone frequency power spectrum. Our results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics. In addition, we find that masking based targets, in general, are significantly better than spectral envelope based targets. We also present comparisons with recent methods in non-negative matrix factorization and speech enhancement, which show clear performance advantages of supervised speech separation. PMID:25599083
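Two of the training targets named above, the ideal binary mask and the ideal ratio mask, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the premixed speech and noise energies are already available per time-frequency unit as nested lists, and the local criterion `lc_db` and exponent `beta` are assumed parameters with commonly used default values.

```python
def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the local criterion lc_db, else 0."""
    threshold = 10.0 ** (lc_db / 10.0)
    return [[1 if s > n * threshold else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_energy, noise_energy)]


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """IRM: (S / (S + N)) ** beta per time-frequency unit; 0 where both are 0."""
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0.0 else 0.0)
        mask.append(row)
    return mask
```

The binary mask makes a hard keep-or-discard decision for each unit, whereas the ratio mask gives a soft gain between 0 and 1. That difference is one intuition for why ratio targets can preserve more speech detail in units where speech and noise energies are comparable.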
Bocquelet, Florent; Hueber, Thomas; Girin, Laurent; Savariaux, Christophe; Yvert, Blaise
Restoring natural speech in paralyzed and aphasic people could be achieved using a Brain-Computer Interface (BCI) controlling a speech synthesizer in real-time. To reach this goal, a prerequisite is to develop a speech synthesizer producing intelligible speech in real-time with a reasonable number of control parameters. We present here an articulatory-based speech synthesizer that can be controlled in real-time for future BCI applications. This synthesizer converts movements of the main speech articulators (tongue, jaw, velum, and lips) into intelligible speech. The articulatory-to-acoustic mapping is performed using a deep neural network (DNN) trained on electromagnetic articulography (EMA) data recorded on a reference speaker synchronously with the produced speech signal. This DNN is then used in both offline and online modes to map the position of sensors glued on different speech articulators into acoustic parameters that are further converted into an audio signal using a vocoder. In offline mode, highly intelligible speech could be obtained as assessed by perceptual evaluation performed by 12 listeners. Then, to anticipate future BCI applications, we further assessed the real-time control of the synthesizer by both the reference speaker and new speakers, in a closed-loop paradigm using EMA data recorded in real time. A short calibration period was used to compensate for differences in sensor positions and articulatory differences between new speakers and the reference speaker. We found that real-time synthesis of vowels and consonants was possible with good intelligibility. In conclusion, these results open the way to future speech BCI applications using such an articulatory-based speech synthesizer. PMID:27880768
Cuenca, M H; Barrio, M M
Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, and their speech is as a result noisy and less intelligible than normal speech. This case study investigated whether one Spanish alaryngeal speaker proficient in both oesophageal and tracheoesophageal speech modes used the same acoustic cues for prosodic boundaries in both types of voicing. Pre-boundary lengthening, F0-excursions and pausing (number of pauses and position) were measured in spontaneous speech samples, using Praat. The acoustic analysis revealed that the subject relied on a different combination of cues in each type of voicing to convey the presence of prosodic boundaries.
Kabra, Shikha; Agarwal, Ritika
The goal of Text-to-Speech (TTS) synthesis in a particular language is to convert arbitrary input text to intelligible and natural sounding speech. However, for a language like Hindi, in which many words have very similar spellings, it is not easy to identify errors in the input text, and incorrect text degrades the quality of the output speech. This paper therefore contributes to the development of high-quality speech synthesis by involving a spellchecker that automatically generates spelling suggestions for misspelled words. Involving the spellchecker would increase the efficiency of speech synthesis by providing spelling suggestions for incorrect input text. Furthermore, we provide a comparative study evaluating the resulting effect on the phonetic text of adding the spellchecker to the input text.
Caballero Morales, Santiago Omar; Cox, Stephen J.
Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
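The idea of modelling a speaker's errors instead of adapting the acoustic models can be illustrated with a simple noisy-channel rescoring. This sketch is neither the metamodel nor the weighted finite-state transducer technique of the abstract. It assumes substitution-only errors (equal-length phoneme sequences), a small hypothetical lexicon, and a unigram word prior standing in for the language model. All names are illustrative.

```python
import math
from collections import Counter, defaultdict


def estimate_confusions(aligned_pairs, phonemes, smoothing=1.0):
    """Estimate P(recognized | intended) from (intended, recognized) phoneme pairs."""
    counts = defaultdict(Counter)
    for intended, recognized in aligned_pairs:
        counts[intended][recognized] += 1
    probs = {}
    for p in phonemes:
        # Additive smoothing keeps unseen confusions at a small nonzero probability.
        total = sum(counts[p].values()) + smoothing * len(phonemes)
        probs[p] = {q: (counts[p][q] + smoothing) / total for q in phonemes}
    return probs


def correct_word(observed, lexicon, word_prior, confusions):
    """Return the lexicon word maximizing prior times confusion likelihood."""
    best_word, best_score = None, float("-inf")
    for word, pronunciation in lexicon.items():
        if len(pronunciation) != len(observed):
            continue  # substitution-only assumption: lengths must match
        score = math.log(word_prior[word])
        for intended, recognized in zip(pronunciation, observed):
            score += math.log(confusions[intended][recognized])
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

For a speaker who often produces voiced stops in place of voiceless ones, the estimated matrix assigns a high probability to "b" given intended "p". A recognized sequence such as b-a-d can then be mapped back to a word like "pat" when the prior supports it.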
particular. O’Malley and Caisse (1987) point out the original MRT was never intended to be a measure of human speakers’ ability to produce intelligible...O’Malley and Caisse, 1987). Also, no study has systematically investigated the possible interaction of voice type and speech rate on intelligibility. This...the university community were provided monetary compensation for their participation. Average age was 19.9 years with a range from 18 to 27
Intelligent behavior is a complex adaptive phenomenon that has evolved to enable organisms to deal with variable environmental circumstances. Maximizing fitness requires skill in foraging for necessary resources (food) in competitive circumstances and is probably the activity in which intelligent behavior is most easily seen. Biologists suggest that intelligence encompasses the characteristics of detailed sensory perception, information processing, learning, memory, choice, optimisation of resource sequestration with minimal outlay, self-recognition, and foresight by predictive modeling. All these properties are concerned with a capacity for problem solving in recurrent and novel situations. Here I review the evidence that individual plant species exhibit all of these intelligent behavioral capabilities but do so through phenotypic plasticity, not movement. Furthermore it is in the competitive foraging for resources that most of these intelligent attributes have been detected. Plants should therefore be regarded as prototypical intelligent organisms, a concept that has considerable consequences for investigations of whole plant communication, computation and signal transduction.
Chait, Maria; Greenberg, Steven; Arai, Takayuki; Simon, Jonathan Z.; Poeppel, David
How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10–40 Hz modulation frequency) and syllable-sized (2–10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations—corresponding to ~250 and ~30 ms, the average duration of syllables and certain phonetic properties, respectively—were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low and S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility—a view compatible with recent insights from neuroscience implicating multi-timescale auditory
PATTERN RECOGNITION, *ARTIFICIAL INTELLIGENCE, *TEXTBOOKS, COMPUTER PROGRAMMING, MATHEMATICAL LOGIC, ROBOTS, PROBLEM SOLVING, STATISTICAL ANALYSIS, GAME THEORY, NATURAL LANGUAGE, SELF ORGANIZING SYSTEMS.
Özcan, Cengiz; Juel, Knud; Flensted Lassen, Jens; von Kappelgaard, Lene Mia; Mortensen, Poul Erik; Gislason, Gunnar
Aim The Danish Heart Registry (DHR) seeks to monitor nationwide activity and quality of invasive diagnostic and treatment strategies in patients with ischemic heart disease as well as valvular heart disease and to provide data for research. Study population All adult (≥15 years) patients undergoing coronary angiography (CAG), percutaneous coronary intervention (PCI), coronary artery bypass grafting, and heart valve surgery performed across all Danish hospitals were included. Main variables The DHR contains a subset of the data stored in the Eastern and Western Denmark Heart Registries (EDHR and WDHR). For each type of procedure, up to 70 variables are registered in the DHR. Since 2010, the data quality protocol encompasses fulfillment of web-based validation rules of daily-submitted records and yearly approval of the data by the EDHR and WDHR. Descriptive data The data collection on procedure has been complete for PCI and surgery since 2000, and for CAG as of 2006. From 2000 to 2014, the number of CAG, PCI, and surgical procedures changed by 231%, 193%, and 99%, respectively. Until the end of 2014, a total of 357,476 CAG, 131,309 PCI, and 60,831 surgical procedures had been performed, corresponding to 249,445, 100,609, and 55,539 first-time patients, respectively. The DHR generally has a high level of completeness (1–missing) of each procedure (>90%) when compared to the National Patient Registry. Variables important for assessing the quality of care have a high level of completeness for surgery since 2000, and for CAG and PCI since 2010. Conclusion The DHR contains valuable data on cardiac invasive procedures, which makes it an important national monitoring and quality system and at the same time serves as a platform for research projects in the cardiovascular field. PMID:27822091
Jørgensen, Peter Holmberg; Lausten, Gunnar Schwarz; Pedersen, Alma B
Aim The aim of the database is to gather information about sarcomas treated in Denmark in order to continuously monitor and improve the quality of sarcoma treatment in a local, a national, and an international perspective. Study population Patients in Denmark diagnosed with a sarcoma, both skeletal and extraskeletal, have been registered since 2009. Main variables The database contains information about appearance of symptoms; date of receiving referral to a sarcoma center; date of first visit; whether surgery has been performed elsewhere before referral, diagnosis, and treatment; tumor characteristics such as location, size, malignancy grade, and growth pattern; details on treatment (kind of surgery, amount of radiation therapy, type and duration of chemotherapy); complications of treatment; local recurrence and metastases; and comorbidity. In addition, several quality indicators are registered in order to measure the quality of care provided by the hospitals and make comparisons between hospitals and with international standards. Descriptive data Demographic patient-specific data such as age, sex, region of living, comorbidity, World Health Organization’s International Classification of Diseases – tenth edition codes and TNM Classification of Malignant Tumours, and date of death (after yearly coupling to the Danish Civil Registration System). Data quality and completeness are currently secured. Conclusion The Danish Sarcoma Database is population based and includes sarcomas occurring in Denmark since 2009. It is a valuable tool for monitoring sarcoma incidence and quality of treatment and its improvement, postoperative complications, and recurrence within 5 years follow-up. The database is also a valuable research tool to study the impact of technical and medical interventions on prognosis of sarcoma patients. PMID:27822116
To achieve robustness and efficiency for voice communication in noise, the noise suppression and bandwidth compression processes are combined to form a joint process using input from an array of microphones. An adaptive beamforming technique with a set of robust linear constraints and a single quadratic inequality constraint is used to preserve the desired signal and to cancel directional plus ambient noise in a small room environment. This robustly constrained array processor is found to be effective in limiting signal cancelation over a wide range of input SNRs (-10 dB to +10 dB). The resulting intelligibility gains (8-10 dB) provide significant improvement to subsequent CELP coding. In addition, the desired speech activity is detected by estimating Target-to-Jammer Ratios (TJR) using subband correlations between different microphone inputs or using signals within the Generalized Sidelobe Canceler directly. These two novel techniques of speech activity detection for coding are studied thoroughly in this dissertation. Each is subsequently incorporated with the adaptive array and a 4.8 kbps CELP coder to form a Variable Bit Rate (VBR) coder with noise canceling and Spatial Voice Activity Detection (SVAD) capabilities. This joint noise suppression and bandwidth compression system demonstrates large improvements in desired speech quality after coding, accurate desired speech activity detection in various types of interference, and a reduction in the information bits required to code the speech.
... is…Robbie, Pearl, and Mario. Definition There are many kinds of speech and language ... education available to school-aged children with disabilities. Definition of “Speech or Language Impairment” under IDEA The ...
Haderlein, Tino; Döllinger, Michael; Matoušek, Václav; Nöth, Elmar
Automatic voice assessment is often performed using sustained vowels. In contrast, speech analysis of read-out texts can be applied to voice and speech assessment. Automatic speech recognition and prosodic analysis were used to find regression formulae between automatic and perceptual assessment of four voice and four speech criteria. The regression was trained with 21 men and 62 women (average age 49.2 years) and tested with another set of 24 men and 49 women (48.3 years), all suffering from chronic hoarseness. They read the text 'Der Nordwind und die Sonne' ('The North Wind and the Sun'). Five voice and speech therapists evaluated the data on 5-point Likert scales. Ten prosodic and recognition accuracy measures (features) were identified which describe all the examined criteria. Inter-rater correlation within the expert group was between r = 0.63 for the criterion 'match of breath and sense units' and r = 0.87 for the overall voice quality. Human-machine correlation was between r = 0.40 for the match of breath and sense units and r = 0.82 for intelligibility. The perceptual ratings of different criteria were highly correlated with each other. Likewise, the feature sets modeling the criteria were very similar. The automatic method is suitable for assessing chronic hoarseness in general and for subgroups of functional and organic dysphonia. In its current version, it is almost as reliable as a randomly picked rater from a group of voice and speech therapists.
Krause, Jean C.; Braida, Louis D.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
Obleser, Jonas; Kotz, Sonja A
In speech comprehension, the processing of auditory information and linguistic context are mutually dependent. This functional magnetic resonance imaging study examines how semantic expectancy ("cloze probability") in variably intelligible sentences ("noise vocoding") modulates the brain bases of comprehension. First, intelligibility-modulated activation along the superior temporal sulci (STS) was extended anteriorly and posteriorly in low-cloze sentences (e.g., "she weighs the flour") but restricted to a mid-superior temporal gyrus/STS area in more predictable high-cloze sentences (e.g., "she sifts the flour"). Second, the degree of left inferior frontal gyrus (IFG) (Brodmann's area 44) involvement in processing low-cloze constructions was proportional to increasing intelligibility. Left inferior parietal cortex (IPC; angular gyrus) activation accompanied successful speech comprehension that derived either from increased signal quality or from semantic facilitation. The results show that successful decoding of speech in auditory cortex areas regulates language-specific computation (left IFG and IPC). In return, semantic expectancy can constrain these speech-decoding processes, with fewer neural resources being allocated to highly predictable sentences. These findings offer an important contribution toward the understanding of the functional neuroanatomy in speech comprehension.
Sohn, Junil; Kim, Dongwook; Ku, Yunseo; Lee, Kyungwon; Lee, Junghak
In this study, we proposed a new self-assessment of hearing loss on mobile phones and implemented a compensation function for hearing-impaired persons. Experiments on a mobile phone showed that the proposed hearing test is sufficient to check hearing loss and that compensation based on its result can improve speech intelligibility for hearing-impaired persons.
Peters, B. F.; And Others
Describes the development, implementation, and evaluation of a voice interface for the British Library Blaise Online Information Retrieval System. Results of the evaluation show that the use of currently available speech recognition and synthesis hardware, along with intelligent software, can provide an interface well suited to the needs of online…
Howard Gardner's theory of Multiple Intelligences has had a huge influence on school education. But its credentials lack justification, as the first section of this paper shows via a detailed philosophical analysis of how the intelligences are identified. If we want to make sense of the theory, we need to turn from a philosophical to a historical…
The perceptual boundaries between speech sounds are malleable and can shift after repeated exposure to contextual information. This shift is known as recalibration. To date, the known inducers of recalibration are lexical (including phonotactic) information, lip-read information and reading. The experiments reported here are a proof-of-effect demonstration that speech imagery can also induce recalibration.
Phifer, Gregg, Ed.
The 17 articles in this collection deal with theoretical and practical freedom of speech issues. The topics include: freedom of speech in Marquette Park, Illinois; Nazis in Skokie, Illinois; freedom of expression in the Confederate States of America; Robert M. LaFollette's arguments for free speech and the rights of Congress; the United States…
Berliss-Vincent, Jane; Whitford, Gigi
This article presents both the factors involved in successful speech input use and the potential barriers that may suggest that other access technologies could be more appropriate for a given individual. Speech input options that are available are reviewed and strategies for optimizing use of speech recognition technology are discussed. (Contains…
Tedford, Thomas L., Ed.
This book is a collection of syllabi, attitude surveys, and essays relating to free-speech issues, compiled by the Committee on Freedom of Speech of the Speech Communication Association. The collection begins with a rationale for the inclusion of a course on free speech in the college curriculum. Three syllabi with bibliographies present guides for…
Conrad, B; Schönle, P
This investigation deals with the temporal aspects of air volume changes during speech. Speech respiration differs fundamentally from resting respiration. In resting respiration the duration and velocity of inspiration (air flow or lung volume change) are in a range similar to that of expiration. In speech respiration the duration of inspiration decreases and its velocity increases; conversely, the duration of expiration increases and the volume of air flow decreases dramatically. The following questions arise: are these two respiration types different entities, or do they represent the end points of a continuum from resting to speech respiration? How does articulation without the generation of speech sound affect breathing? Does (verbalized?) thinking without articulation or speech modify the breathing pattern? The main test battery included four tasks (spontaneous speech, reading, serial speech, arithmetic) performed under three conditions (speaking aloud, articulating subvocally, quiet performance by trying to exclusively 'think' the tasks). Respiratory movements were measured with a chest pneumograph and evaluated in comparison with a phonogram and the identified spoken text. For quiet performance the resulting respiratory time ratio (relation of duration of inspiration versus expiration) showed a gradual shift in the direction of speech respiration--the least for reading, the most for arithmetic. This change was even more apparent for the subvocal tasks. It is concluded that (a) there is a gradual automatic change from resting to speech respiration and (b) the degree of internal verbalization (activation of motor speech areas) defines the degree of activation of the speech respiratory pattern.
Konst, Emmy M.; Weersink-Braks, Hanny; Rietveld, Toni; Peters, Herman
The influence of presurgical infant orthopedic treatment (PIO) on speech intelligibility was evaluated with 10 toddlers who used PIO during the first year of life and 10 who did not. Treated children were rated as exhibiting greater intelligibility; however, transcription data indicated there were no group differences in actual intelligibility.…
Oberteuffer, J A
Speech interface technology, which includes automatic speech recognition, synthetic speech, and natural language processing, is beginning to have a significant impact on business and personal computer use. Today, powerful and inexpensive microprocessors and improved algorithms are driving commercial applications in computer command, consumer, data entry, speech-to-text, telephone, and voice verification. Robust speaker-independent recognition systems for command and navigation in personal computers are now available; telephone-based transaction and database inquiry systems using both speech synthesis and recognition are coming into use. Large-vocabulary speech interface systems for document creation and read-aloud proofing are expanding beyond niche markets. Today's applications represent a small preview of a rich future for speech interface technology that will eventually replace keyboards with microphones and loud-speakers to give easy accessibility to increasingly intelligent machines. PMID:7479717
Schwartz, Jean-Luc; Berthommier, Frédéric; Savariaux, Christophe
Lip reading is the ability to partially understand speech by looking at the speaker's lips. It improves the intelligibility of speech in noise when audio-visual perception is compared with audio-only perception. A recent set of experiments showed that seeing the speaker's lips also enhances sensitivity to acoustic information, decreasing the auditory detection threshold of speech embedded in noise [J. Acoust. Soc. Am. 109 (2001) 2272; J. Acoust. Soc. Am. 108 (2000) 1197]. However, detection is different from comprehension, and it remains to be seen whether improved sensitivity also results in an intelligibility gain in audio-visual speech perception. In this work, we use an original paradigm to show that seeing the speaker's lips enables the listener to hear better and hence to understand better. The audio-visual stimuli used here could not be differentiated by lip reading per se since they contained exactly the same lip gesture matched with different compatible speech sounds. Nevertheless, the noise-masked stimuli were more intelligible in the audio-visual condition than in the audio-only condition due to the contribution of visual information to the extraction of acoustic cues. Replacing the lip gesture by a non-speech visual input with exactly the same time course, providing the same temporal cues for extraction, removed the intelligibility benefit. This early contribution to audio-visual speech identification is discussed in relation to recent neurophysiological data on audio-visual perception.
Oommen, Elizabeth R; McCarthy, John W
In childhood apraxia of speech (CAS), children exhibit varying levels of speech intelligibility depending on the nature of errors in articulation and prosody. Augmentative and alternative communication (AAC) strategies are beneficial, and commonly adopted with children with CAS. This study focused on the decision-making process and strategies adopted by speech-language pathologists (SLPs) when simultaneously implementing interventions that focused on natural speech and AAC. Eight SLPs, with significant clinical experience in CAS and AAC interventions, participated in an online focus group. Thematic analysis revealed eight themes: key decision-making factors; treatment history and rationale; benefits; challenges; therapy strategies and activities; collaboration with team members; recommendations; and other comments. Results are discussed along with clinical implications and directions for future research.
Bishop, Christopher W.; Miller, Lee M.
In noisy environments, listeners tend to hear a speaker’s voice yet struggle to understand what is said. The most effective way to improve intelligibility in such conditions is to watch the speaker’s mouth movements. Here we identify the neural networks that distinguish understanding from merely hearing speech, and determine how the brain applies visual information to improve intelligibility. Using functional magnetic resonance imaging, we show that understanding speech-in-noise is supported by a network of brain areas including the left superior parietal lobule, the motor/premotor cortex, and the left anterior superior temporal sulcus (STS), a likely apex of the acoustic processing hierarchy. Multisensory integration likely improves comprehension through improved communication between the left temporal–occipital boundary, the left medial-temporal lobe, and the left STS. This demonstrates how the brain uses information from multiple modalities to improve speech comprehension in naturalistic, acoustically adverse conditions. PMID:18823249
Hilkhuysen, Gaston; Gaubitch, Nikolay; Brookes, Mike; Huckvale, Mark
Using the data presented in the accompanying paper [Hilkhuysen et al., J. Acoust. Soc. Am. 131, 531-539 (2012)], the ability of six metrics to predict intelligibility of speech in noise before and after noise suppression was studied. The metrics considered were the Speech Intelligibility Index (SII), the fractional Articulation Index (fAI), the coherence intelligibility index based on the mid-levels in speech (CSIImid), an extension of the Normalized Coherence Metric (NCM+), a part of the speech-based envelope power model (pre-sEPSM), and the Short Term Objective Intelligibility measure (STOI). Three of the measures, SII, CSIImid, and NCM+, overpredicted intelligibility after noise reduction, whereas fAI underpredicted these intelligibilities. The pre-sEPSM metric worked well for speech in babble but failed with car noise. STOI gave the best predictions, but overall the size of the intelligibility prediction errors was greater than the change in intelligibility caused by noise suppression. Suggestions for improvements of the metrics are discussed.
Nuttall, Helen E; Kennedy-Higgins, Daniel; Devlin, Joseph T; Adank, Patti
Excitability of articulatory motor cortex is facilitated when listening to speech in challenging conditions. Beyond this, however, we have little knowledge of what listener-specific and speech-specific factors engage articulatory facilitation during speech perception. For example, it is unknown whether speech motor activity is independent or dependent on the form of distortion in the speech signal. It is also unknown if speech motor facilitation is moderated by hearing ability. We investigated these questions in two experiments. We applied transcranial magnetic stimulation (TMS) to the lip area of primary motor cortex (M1) in young, normally hearing participants to test if lip M1 is sensitive to the quality (Experiment 1) or quantity (Experiment 2) of distortion in the speech signal, and if lip M1 facilitation relates to the hearing ability of the listener. Experiment 1 found that lip motor evoked potentials (MEPs) were larger during perception of motor-distorted speech that had been produced using a tongue depressor, and during perception of speech presented in background noise, relative to natural speech in quiet. Experiment 2 did not find evidence of motor system facilitation when speech was presented in noise at signal-to-noise ratios where speech intelligibility was at 50% or 75%, which were significantly less severe noise levels than used in Experiment 1. However, there was a significant interaction between noise condition and hearing ability, which indicated that when speech stimuli were correctly classified at 50%, speech motor facilitation was observed in individuals with better hearing, whereas individuals with relatively worse but still normal hearing showed more activation during perception of clear speech. These findings indicate that the motor system may be sensitive to the quantity, but not quality, of degradation in the speech signal. Data support the notion that motor cortex complements auditory cortex during speech perception, and point to a role
This was an instrumentation grant to purchase equipment in support of research in neural networks, information science, artificial intelligence, and applied mathematics. Computer lab equipment, motor control and robotics lab equipment, speech analysis equipment and computational vision equipment were purchased.
Ho, Cheng-Yu; Li, Pei-Chun; Chiang, Yuan-Chuan; Young, Shuenn-Tsong; Chu, Woei-Chyn
Binaural hearing involves using information relating to the differences between the signals that arrive at the two ears, and it can make it easier to detect and recognize signals in a noisy environment. This phenomenon of binaural hearing is quantified in laboratory studies as the binaural masking-level difference (BMLD). Mandarin is one of the most commonly used languages, but there are no published values of the BMLD or the binaural intelligibility-level difference (BILD) based on Mandarin tones. Therefore, this study investigated the BMLD and BILD of Mandarin tones. The BMLDs of Mandarin tone detection were measured based on the detection threshold differences for the four tones of the voiced vowels /i/ (i.e., /i1/, /i2/, /i3/, and /i4/) and /u/ (i.e., /u1/, /u2/, /u3/, and /u4/) in the presence of speech-spectrum noise when presented interaurally in phase (S0N0) and interaurally in antiphase (SπN0). The BILDs of Mandarin tone recognition in speech-spectrum noise were determined as the differences in the target-to-masker ratio (TMR) required for 50% correct tone recognition between the S0N0 and SπN0 conditions. The detection thresholds for the four tones of /i/ and /u/ differed significantly (p<0.001) between the S0N0 and SπN0 conditions. The average detection thresholds of Mandarin tones were all lower in the SπN0 condition than in the S0N0 condition, and the BMLDs ranged from 7.3 to 11.5 dB. The TMR for 50% correct Mandarin tone recognition differed significantly (p<0.001) between the S0N0 and SπN0 conditions, at -13.4 and -18.0 dB, respectively, with a mean BILD of 4.6 dB. The study showed that the thresholds of Mandarin tone detection and recognition in the presence of speech-spectrum noise are improved when phase inversion is applied to the target speech. The average BILDs of Mandarin tones are smaller than the average BMLDs of Mandarin tones.
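The level-difference arithmetic in the abstract above can be checked with a short sketch. It assumes only the two target-to-masker ratios quoted in the abstract; the function name `level_difference_db` is a hypothetical helper introduced for illustration, not something from the study.

```python
def level_difference_db(in_phase_db: float, antiphase_db: float) -> float:
    """Binaural advantage in dB: the threshold (or TMR) measured in the
    in-phase condition (S0N0) minus the one measured in the antiphase
    condition (SpiN0). A positive value means antiphase listening is easier."""
    return in_phase_db - antiphase_db


# TMRs for 50% correct tone recognition, as quoted in the abstract.
tmr_s0n0_db = -13.4
tmr_spin0_db = -18.0

bild_db = level_difference_db(tmr_s0n0_db, tmr_spin0_db)
print(f"BILD = {bild_db:.1f} dB")
```

The same subtraction applied to detection thresholds instead of recognition TMRs gives a BMLD; the individual detection thresholds are not listed in the abstract, so they are not reproduced here.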
Peled, Yotam; Rafaely, Boaz
Reverberation and noise have a significant effect on the intelligibility of speech in rooms. The detection of clear speech in highly reverberant and noisy enclosures is an extremely difficult task. Recently, spherical microphone arrays have been studied for processing of sound fields in three-dimensions, with applications ranging from acoustic analysis to speech enhancement. This paper presents the derivation of a model that facilitates the prediction of spherical array configurations that guarantee an acceptable level of speech intelligibility in reverberant and noisy environments. A spherical microphone array is employed to generate a spatial filter that maximizes speech intelligibility according to an objective measure that combines the effects of both reverberation and noise. The spherical array beamformer is designed to enhance the speech signal while minimizing noise power and maintaining robustness over a wide frequency range. The paper includes simulation and experimental studies with a comparison to speech transmission index based analysis to provide initial validation of the model. Examples are presented in which the minimum number of microphones in a spherical array can be determined from environment conditions such as reverberation time, noise level, and distance of the array to the speech source.
Saravanan, Gomathi; Ranganathan, Venkatesan; Gandhi, Anitha; Jaya, V
Aim: The tongue plays a major role in articulation. Speech outcome depends on the site of lesion, extent of resection, and flexibility of the remaining structures. The aim of this study is to evaluate speech outcome measures such as misarticulated sounds and speech intelligibility, and their connection to tumor site before and after surgery. Methodology: In total, 24 patients (12 preoperative and 12 postoperative) who had buccal and tongue cancer underwent speech intelligibility rating and articulation screening. Result: The results show that the speech outcome is worse in postoperative patients when compared to preoperative patients. Tongue cancer patients produced more articulation errors than buccal cancer patients. The type of reconstruction also affects the speech outcome. Conclusion: The perceptual analysis of oral cancer patients showed specific articulation issues and reduced intelligibility of speech with regard to site of lesion and type of reconstruction surgery. To reduce the speech errors, effective rehabilitation is recommended. A comprehensive speech evaluation and analysis of error patterns would help in planning the rehabilitative measures of speech, which is the most important factor in re-establishing interpersonal communication and well-being of the individual. PMID:27803574
Bitner, Rachel M.; Begault, Durand R.
Humans may be exposed to whole-body vibration in environments where clear speech communications are crucial, particularly during the launch phases of space flight and in high-performance aircraft. Prior research has shown that high levels of vibration cause a decrease in speech intelligibility. However, the effects of whole-body vibration upon speech are not well understood, and no attempt has been made to restore speech distorted by whole-body vibration. In this paper, a model for speech under whole-body vibration is proposed and a method to remove its effect is described. The method described reduces the perceptual effects of vibration, yields higher ASR accuracy scores, and may significantly improve intelligibility. Possible applications include incorporation within communication systems to improve radio-communication systems in environments such as spaceflight, aviation, or off-road vehicle operations.
Chen, Fuming; Li, Sheng; Li, Chuantao; Liu, Miao; Li, Zhao; Xue, Huijun; Jing, Xijing; Wang, Jianqi
In order to improve the speech acquisition ability of a non-contact method, a 94 GHz millimeter wave (MMW) radar sensor was employed to detect speech signals. This novel non-contact speech acquisition method was shown to have high directional sensitivity, and to be immune to strong acoustical disturbance. However, MMW radar speech is often degraded by combined sources of noise, which mainly include harmonic, electrical circuit and channel noise. In this paper, an algorithm combining empirical mode decomposition (EMD) and mutual information entropy (MIE) was proposed for enhancing the perceptibility and intelligibility of radar speech. Firstly, the radar speech signal was adaptively decomposed into oscillatory components called intrinsic mode functions (IMFs) by EMD. Secondly, MIE was used to determine the number of reconstructive components, and then an adaptive threshold was employed to remove the noise from the radar speech. The experimental results show that human speech can be effectively acquired by a 94 GHz MMW radar sensor when the detection distance is 20 m. Moreover, the noise of the radar speech is greatly suppressed and the speech sounds become more pleasant to human listeners after being enhanced by the proposed algorithm, suggesting that this novel speech acquisition and enhancement method will provide a promising alternative for various applications associated with speech detection. PMID:26729126
Cooke, Martin; Lu, Youyi
Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.
Dagenais, Paul A; Stallworth, Jamequa A
The purpose of this study was to determine the influence of dialect upon the perception of dysarthric speech. Speakers and listeners self-identified as either Caucasian American or African American. Three speakers were Caucasian American, three were African American. Four speakers had experienced a CVA and were dysarthric. Listeners were age matched and were equally divided for gender. Readers recorded 14 word sentences from the Assessment of Intelligibility of Dysarthric Speech. Listeners provided ratings of intelligibility, comprehensibility, and acceptability. Own-race biases were found for all measures; however, the findings were significant only for intelligibility and comprehensibility, in that the Caucasian Americans provided significantly higher scores for Caucasian American speakers. Clinical implications are discussed.
Hansen, Ulla Darling; Gradel, Kim Oren; Larsen, Michael Due
The Danish Urogynaecological Database is established in order to ensure high quality of treatment for patients undergoing urogynecological surgery. The database contains details of all women in Denmark undergoing incontinence surgery or pelvic organ prolapse surgery amounting to ~5,200 procedures per year. The variables are collected along the course of treatment of the patient from the referral to a postoperative control. Main variables are prior obstetrical and gynecological history, symptoms, symptom-related quality of life, objective urogynecological findings, type of operation, complications if relevant, implants used if relevant, 3–6-month postoperative recording of symptoms, if any. A set of clinical quality indicators is being maintained by the steering committee for the database and is published in an annual report which also contains extensive descriptive statistics. The database has a completeness of over 90% of all urogynecological surgeries performed in Denmark. Some of the main variables have been validated using medical records as gold standard. The positive predictive value was above 90%. The data are used as a quality monitoring tool by the hospitals and in a number of scientific studies of specific urogynecological topics, broader epidemiological topics, and the use of patient reported outcome measures. PMID:27826217
Adaki, Raghavendra; Shigli, Kamal; Hormuzdi, Dinshaw M.; Gali, Sivaranjani
Treating diverse maxillofacial patients poses a challenge to the maxillofacial prosthodontist. Rehabilitation of hemimandibulectomy patients must aim at restoring mastication and other functions such as intelligible speech, swallowing, and esthetics. Prosthetic methods such as palatal ramp and mandibular guiding flange reposition the deviated mandible. Such prostheses can also be used to restore speech in patients with debilitating speech following surgical resection. This clinical report gives details of a hemimandibulectomy patient provided with an interim removable dental speech prosthesis with composite resin flange for mandibular guidance therapy. PMID:27041917
Kent, Ray D.; Vorperian, Houri K.
Purpose: This review summarizes research on disorders of speech production in Down Syndrome (DS) for the purposes of informing clinical services and guiding future research. Method: Review of the literature was based on searches using Medline, Google Scholar, Psychinfo, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency and intelligibility. Conclusions: The following conclusions pertain to four major areas of review: (a) Voice. Although a number of studies have been reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. (b) Speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. (c) Fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10 to 45%, compared to about 1% in the general population. Research also points to significant disturbances in prosody. (d) Intelligibility. Studies consistently show marked limitations in this area but it is only recently that research goes beyond simple rating scales. PMID:23275397
Hu, Yi; Loizou, Philipos C
The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, sub-space, statistical model based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms which were found in previous studies to perform the best in terms of overall quality were not the same algorithms that performed the best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
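The information transmission analysis mentioned above can be sketched as follows. This is a minimal illustration of the general mutual-information approach to confusion matrices, not the authors' actual procedure; the function names, the toy matrix, and the feature grouping are all assumptions made for the example. A consonant confusion matrix is pooled into feature classes (for instance, place of articulation), and the transmitted information is expressed as a fraction of the stimulus entropy.

```python
import math

def transmitted_information(confusions):
    """Return (T, H_x) in bits for a stimulus-by-response count matrix."""
    total = sum(sum(row) for row in confusions)
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(row[j] for row in confusions) / total
             for j in range(len(confusions[0]))]
    t = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:  # empty cells contribute nothing
                p = count / total
                t += p * math.log2(p / (row_p[i] * col_p[j]))
    h_x = -sum(p * math.log2(p) for p in row_p if p > 0)
    return t, h_x

def collapse(confusions, groups):
    """Pool a consonant matrix into feature classes; groups[i] is the class of item i."""
    k = max(groups) + 1
    pooled = [[0] * k for _ in range(k)]
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            pooled[groups[i]][groups[j]] += count
    return pooled

def relative_transmission(confusions):
    """Transmitted information as a fraction of stimulus entropy (0 to 1)."""
    t, h_x = transmitted_information(confusions)
    return t / h_x if h_x > 0 else 0.0

# Hypothetical 4-consonant matrix (/p t b d/): place is always heard
# correctly, voicing is at chance.
toy = [[5, 0, 5, 0],
       [0, 5, 0, 5],
       [5, 0, 5, 0],
       [0, 5, 0, 5]]
place_groups = [0, 1, 0, 1]  # assumed grouping: labial vs. alveolar
place_score = relative_transmission(collapse(toy, place_groups))
```

In this toy case the pooled place matrix is perfectly diagonal, so the place score is at ceiling even though the listener confuses half of the individual consonants.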
Gevarter, W. B.
An overview of artificial intelligence (AI), its core ingredients, and its applications is presented. The knowledge representation, logic, problem solving approaches, languages, and computers pertaining to AI are examined, and the state of the art in AI is reviewed. The use of AI in expert systems, computer vision, natural language processing, speech recognition and understanding, speech synthesis, problem solving, and planning is examined. Basic AI topics, including automation, search-oriented problem solving, knowledge representation, and computational logic, are discussed.
de Sopena, Luis
Speech recognition is one of five main areas in the field of speech processing. Difficulties in speech recognition include variability in sound within and across speakers, in channel, in background noise, and of speech production. Speech recognition can be used in a variety of situations: to perform query operations and phone call transfers; for…
Yoshinaga-Itano, Christine; Sedey, Allison
A study investigated the relationship between speech production and several demographic and developmental factors in 147 children (ages 14-60 months) with hearing impairments. Significant predictors of speech intelligibility and phonetic inventory included the child's age, expressive language ability, degree of hearing loss, mode of communication,…
Crowe, Kathryn; McLeod, Sharynne
The purpose of this study was to systematically review the factors affecting the language, speech intelligibility, speech production, and lexical tone development of children with hearing loss who use spoken languages other than English. Relevant studies of children with hearing loss published between 2000 and 2011 were reviewed with reference to…
Department of Industrial and Systems Engineering, North Carolina Agricultural & Technical State University, 1601 E. Market Street, Greensboro, North... Department of Management, North Carolina Agricultural & Technical State University, 1601 E. Market Street, Greensboro, North Carolina 27411, USA; e... to-noise ratio. 1. Introduction Speech intelligibility (SI) is defined as the percentage of speech units (i.e., phonemes, syllables, words, phrases
George, Erwin L. J.; Goverts, S. Theo; Festen, Joost M.; Houtgast, Tammo
Purpose: The Speech Transmission Index (STI; Houtgast, Steeneken, & Plomp, 1980; Steeneken & Houtgast, 1980) is commonly used to quantify the adverse effects of reverberation and stationary noise on speech intelligibility for normal-hearing listeners. Duquesnoy and Plomp (1980) showed that the STI can be applied for presbycusic listeners, relating…
Anand, Supraja; Stepp, Cara E.
Purpose: Given the potential significance of speech naturalness to functional and social rehabilitation outcomes, the objective of this study was to examine the effect of listener perceptions of monopitch on speech naturalness and intelligibility in individuals with Parkinson's disease (PD). Method: Two short utterances were extracted from…
Groenvold, Mogens; Adsersen, Mathilde; Hansen, Maiken Bang
Aims The aim of the Danish Palliative Care Database (DPD) is to monitor, evaluate, and improve the clinical quality of specialized palliative care (SPC) (ie, the activity of hospital-based palliative care teams/departments and hospices) in Denmark. Study population The study population is all patients in Denmark referred to and/or in contact with SPC after January 1, 2010. Main variables The main variables in DPD are data about referral for patients admitted and not admitted to SPC, type of the first SPC contact, clinical and sociodemographic factors, multidisciplinary conference, and the patient-reported European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Core-15-Palliative Care questionnaire, assessing health-related quality of life. The data support the estimation of currently five quality of care indicators, ie, the proportions of 1) referred and eligible patients who were actually admitted to SPC, 2) patients who waited <10 days before admission to SPC, 3) patients who died from cancer and who obtained contact with SPC, 4) patients who were screened with European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Core-15-Palliative Care at admission to SPC, and 5) patients who were discussed at a multidisciplinary conference. Descriptive data In 2014, all 43 SPC units in Denmark reported their data to DPD, and all 9,434 cancer patients (100%) referred to SPC were registered in DPD. In total, 41,104 unique cancer patients were registered in DPD during the 5 years 2010–2014. Of those registered, 96% had cancer. Conclusion DPD is a national clinical quality database for SPC having clinically relevant variables and high data and patient completeness. PMID:27822111
Drullman, Rob; Bronkhorst, Adelbert W.
Speech intelligibility was investigated by varying the number of interfering talkers, level, and mean pitch differences between target and interfering speech, and the presence of tactile support. In a first experiment the speech-reception threshold (SRT) for sentences was measured for a male talker against a background of one to eight interfering male talkers or speech noise. Speech was presented diotically and vibro-tactile support was given by presenting the low-pass-filtered signal (0-200 Hz) to the index finger. The benefit in the SRT resulting from tactile support ranged from 0 to 2.4 dB and was largest for one or two interfering talkers. A second experiment focused on masking effects of one interfering talker. The interference was the target talker's own voice with an increased mean pitch by 2, 4, 8, or 12 semitones. Level differences between target and interfering speech ranged from -16 to +4 dB. Results from measurements of correctly perceived words in sentences show an intelligibility increase of up to 27% due to tactile support. Performance gradually improves with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences. Differences in performance between noise and speech maskers and between speech maskers with various mean pitches are explained by the effect of informational masking.
Gover, Bradford N.; Bradley, John S.
The new method of estimating the speech security of meeting rooms first predicts transmitted speech levels at spot positions 0.25 m from the outside boundaries of a meeting room. The degree of speech security at each spot measurement position is related to this transmitted speech level and to the ambient noise level at the same location. This information can be used to calculate new signal/noise ratio measures that indicate the degree of speech security [J. Acoust. Soc. Am. 116(6), 3480-3490 (2004)]. The values of these indices will indicate whether speech will be audible or intelligible, and to what degree. This paper reports validation studies of the prediction of transmitted speech levels at points 0.25 m outside the meeting room, and the effects of varied room absorption on these predicted levels. It also reports on subjective evaluations of the transmitted speech sounds, in terms of the audibility and intelligibility of the transmitted speech, to validate expected judgments in realistic acoustical conditions.
Begault, Durand R.
A spatial auditory display was used to convolve speech stimuli, consisting of 130 different call signs used in the communications protocol of NASA's John F. Kennedy Space Center, to different virtual auditory positions. An adaptive staircase method was used to determine intelligibility levels of the signal against diotic speech babble, with spatial positions at 30 deg azimuth increments. Non-individualized, minimum-phase approximations of head-related transfer functions were used. The results showed a maximal intelligibility improvement of about 6 dB when the signal was spatialized to 60 deg or 90 deg azimuth positions.
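The adaptive staircase idea can be illustrated with a short sketch. The abstract does not give the tracking rule, step size, or stopping criterion, so the simple 1-up/1-down rule, the 2 dB step, the six-reversal stop, and the `respond` callback below are all assumptions chosen for illustration only. The level is lowered after each correct response and raised after each error, and the threshold is estimated as the mean level at the reversal points.

```python
def run_staircase(respond, start_db=0.0, step_db=2.0,
                  max_reversals=6, max_trials=200):
    """Estimate a threshold with a 1-up/1-down staircase.

    respond(level_db) must return True when the listener identifies the
    item correctly at that signal level (hypothetical callback).
    """
    level = start_db
    last_direction = 0
    reversals = []
    for _ in range(max_trials):
        # Correct response -> make the task harder (lower level), else easier.
        direction = -1 if respond(level) else 1
        if last_direction and direction != last_direction:
            reversals.append(level)  # the track changed direction here
            if len(reversals) == max_reversals:
                break
        last_direction = direction
        level += direction * step_db
    # Fall back to the last level if the track never reversed.
    return sum(reversals) / len(reversals) if reversals else level

# Idealized listener who is always correct at or above -7 dB.
estimate = run_staircase(lambda level_db: level_db >= -7.0)
```

With a real listener the responses are probabilistic, and a 1-up/1-down track converges on the level giving roughly 50% correct; other rules target other points on the psychometric function.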
Thompson, Elaine C; Woodruff Carr, Kali; White-Schwoch, Travis; Otto-Meyer, Sebastian; Kraus, Nina
From bustling classrooms to unruly lunchrooms, school settings are noisy. To learn effectively in the unwelcome company of numerous distractions, children must clearly perceive speech in noise. In older children and adults, speech-in-noise perception is supported by sensory and cognitive processes, but the correlates underlying this critical listening skill in young children (3-5 year olds) remain undetermined. Employing a longitudinal design (two evaluations separated by ∼12 months), we followed a cohort of 59 preschoolers, ages 3.0-4.9, assessing word-in-noise perception, cognitive abilities (intelligence, short-term memory, attention), and neural responses to speech. Results reveal changes in word-in-noise perception parallel changes in processing of the fundamental frequency (F0), an acoustic cue known for playing a role central to speaker identification and auditory scene analysis. Four unique developmental trajectories (speech-in-noise perception groups) confirm this relationship, in that improvements and declines in word-in-noise perception couple with enhancements and diminishments of F0 encoding, respectively. Improvements in word-in-noise perception also pair with gains in attention. Word-in-noise perception does not relate to strength of neural harmonic representation or short-term memory. These findings reinforce previously-reported roles of F0 and attention in hearing speech in noise in older children and adults, and extend this relationship to preschool children.
Remez, Robert E; Thomas, Emily F
Classic research on the perception of speech sought to identify minimal acoustic correlates of each consonant and vowel. In explaining perception, this view designated momentary components of an acoustic spectrum as cues to the recognition of elementary phonemes. This conceptualization of speech perception is untenable given the findings of phonetic sensitivity to modulation independent of the acoustic and auditory form of the carrier. The empirical key is provided by studies of the perceptual organization of speech, a low-level integrative function that finds and follows the sensory effects of speech amid concurrent events. These projects have shown that the perceptual organization of speech is keyed to modulation; fast; unlearned; nonsymbolic; indifferent to short-term auditory properties; and organization requires attention. The ineluctably multisensory nature of speech perception also imposes conditions that distinguish language among cognitive systems. WIREs Cogn Sci 2013, 4:213–223. doi: 10.1002/wcs.1213 PMID:23926454
Barbeau, Elise B; Soulières, Isabelle; Dawson, Michelle; Zeffiro, Thomas A; Mottron, Laurent
Across the autism spectrum, level of intelligence is highly dependent on the psychometric instrument used for assessment, and there are conflicting views concerning which measures best estimate autistic cognitive abilities. Inspection time is a processing speed measure associated with general intelligence in typical individuals. We therefore investigated autism spectrum performance on inspection time in relation to two different general intelligence tests. Autism spectrum individuals were divided into autistic and Asperger subgroups according to speech development history. Compared to a typical control group, mean inspection time for the autistic subgroup but not the Asperger subgroup was significantly shorter (by 31%). However, the shorter mean autistic inspection time was evident only when groups were matched on Wechsler IQ and disappeared when they were matched using Raven's Progressive Matrices. When autism spectrum abilities are compared to typical abilities, results may be influenced by speech development history as well as by the instrument used for intelligence matching.
Weil, Shawn A.
Non-normative speech (i.e., synthetic speech, pathological speech, foreign accented speech) is more difficult to process for native listeners than is normative speech. Does perceptual dissimilarity affect only intelligibility, or are there other costs to processing? The current series of experiments investigates both the intelligibility and time course of foreign accented speech (FAS) perception. Native English listeners heard single English words spoken by both native English speakers and non-native speakers (Mandarin or Russian). Words were chosen based on the similarity between the phonetic inventories of the respective languages. Three experimental designs were used: a cross-modal matching task, a word repetition (shadowing) task, and two subjective ratings tasks which measured impressions of accentedness and effortfulness. The results replicate previous investigations that have found that FAS significantly lowers word intelligibility. Furthermore, FAS increases perceptual effort: in the word repetition task, correct responses are slower to accented words than to nonaccented words. An analysis indicates that both intelligibility and reaction time are, in part, functions of the similarity between the talker's utterance and the listener's representation of the word.
Sandor, Aniko; Moses, Haifa
Speech alarms have been used extensively in aviation and included in International Building Codes (IBC) and National Fire Protection Association's (NFPA) Life Safety Code. However, they have not been implemented on space vehicles. Previous studies conducted at NASA JSC showed that speech alarms lead to faster identification and higher accuracy. This research evaluated updated speech and tone alerts in a laboratory environment and in the Human Exploration Research Analog (HERA) in a realistic setup.
Wang, Syu-Siang; Chern, Alan; Tsao, Yu; Hung, Jeih-weih; Lu, Xugang; Lai, Ying-Hui; Su, Borching
For most state-of-the-art speech enhancement techniques, a spectrogram is usually preferred to the respective time-domain raw data since it offers a more compact representation together with conspicuous temporal information over a long time span. However, the short-time Fourier transform (STFT) that creates the spectrogram in general distorts the original signal and thereby limits the capability of the associated speech enhancement techniques. In this study, we propose a novel speech enhancement method that adopts the algorithms of discrete wavelet packet transform (DWPT) and nonnegative matrix factorization (NMF) in order to overcome the aforementioned limitation. In brief, the DWPT is first applied to split a time-domain speech signal into a series of subband signals without introducing any distortion. Then we exploit NMF to highlight the speech component for each subband. Finally, the enhanced subband signals are joined together via the inverse DWPT to reconstruct a noise-reduced signal in time domain. We evaluate the proposed DWPT-NMF based speech enhancement method on the MHINT task. Experimental results show that this new method performs very well in promoting speech quality and intelligibility and that it outperforms the conventional STFT-NMF based method.
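The claim that a wavelet packet split and its inverse introduce no distortion can be checked with a small sketch. The abstract does not name the wavelet or the decomposition depth, so the Haar filter pair and the two-level tree below are assumptions, and the per-subband NMF step is deliberately left out. The signal length is assumed to be divisible by 2 raised to the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One orthonormal Haar analysis step: (low-pass half, high-pass half)."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(low, high):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out

def dwpt(x, levels):
    """Full wavelet packet tree: every band is split again at each level."""
    bands = [list(x)]
    for _ in range(levels):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def idwpt(bands):
    """Merge sibling bands pairwise until one time-domain signal remains."""
    bands = [list(b) for b in bands]
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1])
                 for i in range(0, len(bands), 2)]
    return bands[0]

signal = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]
subbands = dwpt(signal, 2)   # four subbands of two samples each
# An enhancement step (NMF in the abstract) would modify subbands here.
restored = idwpt(subbands)
```

Because each Haar step is orthonormal, the untouched subbands reconstruct the input up to floating-point rounding, which is the property the abstract contrasts with STFT-based processing.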
Van Engen, Kristin J; Chandrasekaran, Bharath; Smiljanic, Rajka
Extensive research shows that inter-talker variability (i.e., changing the talker) affects recognition memory for speech signals. However, relatively little is known about the consequences of intra-talker variability (i.e. changes in speaking style within a talker) on the encoding of speech signals in memory. It is well established that speakers can modulate the characteristics of their own speech and produce a listener-oriented, intelligibility-enhancing speaking style in response to communication demands (e.g., when speaking to listeners with hearing impairment or non-native speakers of the language). Here we conducted two experiments to examine the role of speaking style variation in spoken language processing. First, we examined the extent to which clear speech provided benefits in challenging listening environments (i.e. speech-in-noise). Second, we compared recognition memory for sentences produced in conversational and clear speaking styles. In both experiments, semantically normal and anomalous sentences were included to investigate the role of higher-level linguistic information in the processing of speaking style variability. The results show that acoustic-phonetic modifications implemented in listener-oriented speech lead to improved speech recognition in challenging listening conditions and, crucially, to a substantial enhancement in recognition memory for sentences.
Tufts, Jennifer B.; Frank, Tom
People working in noisy environments often complain of difficulty communicating when they wear hearing protection. It was hypothesized that part of the workers' communication difficulties stem from changes in speech production that occur when hearing protectors are worn. To address this possibility, overall and one-third-octave-band SPL measurements were obtained for 16 men and 16 women as they produced connected speech while wearing foam, flange, or no earplugs (open ears) in quiet and in pink noise at 60, 70, 80, 90, and 100 dB SPL. The attenuation and the occlusion effect produced by the earplugs were measured. The Speech Intelligibility Index (SII) was also calculated for each condition. The talkers produced lower overall speech levels, speech-to-noise ratios, and SII values, and less high-frequency speech energy, when they wore earplugs compared with the open-ear condition. Small differences in the speech measures between the talkers wearing foam and flange earplugs were observed. Overall, the results of the study indicate that talkers wearing earplugs (and consequently their listeners) are at a disadvantage when communicating in noise.
Wavelet denoising is commonly used for speech enhancement because of the simplicity of its implementation. However, conventional methods introduce musical residual noise when thresholding the background noise, and the unvoiced components of speech are often eliminated. In this paper, a novel wavelet coefficient threshold (WCT) algorithm based on time-frequency adaptation is proposed. In addition, an unvoiced speech enhancement algorithm is integrated into the system to improve the intelligibility of speech. The WCT of each subband is first temporally adjusted according to the value of the a posteriori signal-to-noise ratio (SNR). To prevent the degradation of unvoiced sounds during noise reduction, the algorithm utilizes a simple speech/noise detector (SND) and further divides the speech signal into unvoiced and voiced sounds. Then, appropriate wavelet thresholding is applied according to the voiced/unvoiced (V/U) decision. Based on the masking properties of the human auditory system, a perceptual gain factor is incorporated into wavelet thresholding to suppress musical residual noise. Simulation results show that the proposed method is capable of reducing noise with little speech degradation and that the overall performance is superior to several competitive methods.
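As background for the thresholding step, here is a minimal sketch of plain soft thresholding of wavelet coefficients. It shows only the generic operation that such methods build on; the SNR-dependent threshold adaptation, the voiced/unvoiced decision, and the perceptual gain factor described in the abstract are not reproduced. The universal threshold used for the example is the standard Donoho-Johnstone choice, included as an assumption rather than as the paper's rule.

```python
import math

def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`; small ones become zero."""
    return [math.copysign(max(abs(c) - threshold, 0.0), c) for c in coeffs]

def universal_threshold(noise_sigma, n):
    """Donoho-Johnstone universal threshold for n coefficients."""
    return noise_sigma * math.sqrt(2.0 * math.log(n))

# Hypothetical subband coefficients: two large (speech-like), two small (noise-like).
subband = [3.0, -0.5, -2.0, 0.2]
denoised = soft_threshold(subband, 1.0)
```

A fixed threshold like this removes low-energy unvoiced consonants along with the noise, which is the weakness the abstract's adaptive scheme is meant to address.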
Sell, D; Mildinhall, S; Albery, L; Wills, A K; Sandy, J R; Ness, A R
Structured Abstract Objectives To describe the perceptual speech outcomes from the Cleft Care UK (CCUK) study and compare them to the 1998 Clinical Standards Advisory Group (CSAG) audit. Setting and sample population A cross-sectional study of 248 children born with complete unilateral cleft lip and palate, between 1 April 2005 and 31 March 2007 who underwent speech assessment. Materials and methods Centre-based specialist speech and language therapists (SLT) took speech audio–video recordings according to nationally agreed guidelines. Two independent listeners undertook the perceptual analysis using the CAPS-A Audit tool. Intra- and inter-rater reliability were tested. Results For each speech parameter of intelligibility/distinctiveness, hypernasality, palatal/palatalization, backed to velar/uvular, glottal, weak and nasalized consonants, and nasal realizations, there was strong evidence that speech outcomes were better in the CCUK children compared to CSAG children. The parameters which did not show improvement were nasal emission, nasal turbulence, hyponasality and lateral/lateralization. Conclusion These results suggest that centralization of cleft care into high volume centres has resulted in improvements in UK speech outcomes in five-year-olds with unilateral cleft lip and palate. This may be associated with the development of a specialized workforce. Nevertheless, there still remains a group of children with significant difficulties at school entry. PMID:26567854
Megnin-Viggars, Odette; Goswami, Usha
Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and temporal modulations in the 2-7 Hz range are of particular importance. Dyslexic individuals have specific problems in perceiving speech envelope cues. In the current study, we used an audiovisual noise-vocoded speech task to investigate the contribution of low-frequency visual information to intelligibility of 4-channel and 16-channel noise vocoded speech in participants with and without dyslexia. For the 4-channel speech, noise vocoding preserves amplitude information that is entirely congruent with dynamic visual information. All participants were significantly more accurate with 4-channel speech when visual information was present, even when this information was purely spatio-temporal (pixelated stimuli changing in luminance). Possible underlying mechanisms are discussed.
insight into a low resource languages," Transactions on Machine Learning and Artificial Intelligence, 2(4), Aug. 2014, 115-126. Q. Zheng, G. Liu... by a person is a rich and valuable piece of information for several applications such as health monitoring, second language learning or language... ROBUST SPEECH PROCESSING & RECOGNITION: SPEAKER ID, LANGUAGE ID, SPEECH RECOGNITION/KEYWORD SPOTTING, DIARIZATION/CO-CHANNEL/ENVIRONMENTAL
Objective: To investigate the phonetic and phonological parameters of speech production associated with cleft palate in single words and in sentence repetition in order to explore the impact of connected speech processes, prosody, and word juncture on word production across contexts. Participants: Two boys (aged 9 years 5 months and 11 years 0 months) with persisting speech impairments related to a history of unilateral cleft lip and palate formed the main focus of the study; three typical adult male speakers provided control data. Method: Audio, video, and electropalatographic recordings were made of the participants producing single words and repeating two sets of sentences. The data were transcribed and the electropalatographic recordings were analyzed to explore lingual-palatal contact patterns across the different speech conditions. Acoustic analysis was used to further inform the perceptual analysis and to make specific durational measurements. Results: The two boys' speech production differed across the speech conditions. Both boys showed typical and atypical phonetic features in their connected speech production. One boy, although often unintelligible, resembled the adult speakers more closely prosodically and in his specific connected speech behaviors at word boundaries. The second boy produced developmentally atypical phonetic adjustments at word boundaries that appeared to promote intelligibility at the expense of naturalness. Conclusion: For older children with persisting speech impairments, it is particularly important to examine specific features of connected speech production, including word juncture and prosody. Sentence repetition data provide useful information to this end, but further investigations encompassing detailed perceptual and instrumental analysis of real conversational data are warranted.
Bistafa, Sylvio R.
Eight auditoriums with a proscenium type of stage, which usually operate as dramatic theaters in the city of Sao Paulo, were acoustically surveyed in terms of their adequacy for unassisted speech. Reverberation times, early decay times, and speech levels were measured in different positions, together with objective measures of speech intelligibility. The measurements revealed that reverberation time values were rather uniform throughout the rooms, whereas the values of the other acoustical measures varied significantly with position. The early decay time was found to be better correlated with the objective measures of speech intelligibility than the reverberation time. The results from the objective measurements of speech intelligibility revealed that the speech transmission index STI, and its simplified version RaSTI, are strongly correlated with the early-to-late sound ratio C50 (1 kHz). However, it was found that the criterion value of acceptability of the latter is more easily met than that of the former. The results from these measurements make it possible to understand how the characteristics of the architectural design determine the acoustical quality for speech. Measurements of ST1-Gade were made as an attempt to validate it as an objective measure of "support" for the actor. Preliminary diagnostic results from ray tracing simulations will also be presented.
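The early-to-late sound ratio C50 can be computed from a room impulse response as ten times the base-10 logarithm of the energy arriving in the first 50 ms divided by the energy arriving afterwards. The sketch below is a broadband illustration only: it assumes the impulse response starts at the arrival of the direct sound, it omits the 1 kHz octave-band filtering that the abstract's C50 (1 kHz) implies, and the function name and toy response are invented for the example.

```python
import math

def c50_db(impulse_response, sample_rate_hz):
    """Early-to-late energy ratio in dB with a 50 ms split point.

    Assumes sample 0 is the direct-sound arrival and that some energy
    arrives after 50 ms (otherwise the ratio is undefined).
    """
    split = int(round(0.050 * sample_rate_hz))
    early = sum(h * h for h in impulse_response[:split])
    late = sum(h * h for h in impulse_response[split:])
    return 10.0 * math.log10(early / late)

# Toy response at 1 kHz sampling: 50 ms at amplitude 1.0, then 50 ms at 0.1.
toy_ir = [1.0] * 50 + [0.1] * 50
clarity = c50_db(toy_ir, 1000)
```

In the toy response the early energy is 100 times the late energy, so the ratio works out to about 20 dB; measured rooms give far smaller values.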
Tran, Phuong K; Letowski, Tomasz R; McBride, Maranda E
Speech signals can be converted into electrical audio signals using either a conventional air conduction (AC) microphone or a contact bone conduction (BC) microphone. The goal of this study was to investigate the effects of the location of a BC microphone on the intensity and frequency spectrum of the recorded speech. Twelve locations, 11 on the talker's head and 1 on the collar bone, were investigated. The speech sounds were three vowels (/u/, /a/, /i/) and two consonants (/m/, /ʃ/). The sounds were produced by 12 talkers. Each sound was recorded simultaneously with two BC microphones and an AC microphone. Analyzed spectral data showed that the BC recordings made at the forehead of the talker were the most similar to the AC recordings, whereas the collar bone recordings were the most different. Comparison of the spectral data with speech intelligibility data collected in another study revealed a strong negative relationship between BC speech intelligibility and the degree of deviation of the BC speech spectrum from the AC spectrum. In addition, the head locations that resulted in the highest speech intelligibility were associated with the lowest output signals among all tested locations. Implications of these findings for BC communication are discussed.
Hornsby, Benjamin W. Y.; Ricketts, Todd A.
The speech understanding of persons with "flat" hearing loss (HI) was compared to a normal-hearing (NH) control group to examine how hearing loss affects the contribution of speech information in various frequency regions. Speech understanding in noise was assessed at multiple low- and high-pass filter cutoff frequencies. Noise levels were chosen to ensure that the noise, rather than quiet thresholds, determined audibility. The performance of HI subjects was compared to a NH group listening at the same signal-to-noise ratio and a comparable presentation level. Although absolute speech scores for the HI group were reduced, performance improvements as the speech and noise bandwidth increased were comparable between groups. These data suggest that the presence of hearing loss results in a uniform, rather than frequency-specific, deficit in the contribution of speech information. Measures of auditory thresholds in noise and speech intelligibility index (SII) calculations were also performed. These data suggest that differences in performance between the HI and NH groups are due primarily to audibility differences between groups. Measures of auditory thresholds in noise showed the "effective masking spectrum" of the noise was greater for the HI than the NH subjects.
Zoefel, Benedikt; VanRullen, Rufin
Phase entrainment of neural oscillations, the brain's adjustment to rhythmic stimulation, is a central component in recent theories of speech comprehension: the alignment between brain oscillations and speech sound improves speech intelligibility. However, phase entrainment to everyday speech sound could also be explained by oscillations passively following the low-level periodicities (e.g., in sound amplitude and spectral content) of auditory stimulation, and not by an adjustment to the speech rhythm per se. Recently, using novel speech/noise mixture stimuli, we have shown that behavioral performance can entrain to speech sound even when high-level features (including phonetic information) are not accompanied by fluctuations in sound amplitude and spectral content. In the present study, we report that neural phase entrainment might underlie our behavioral findings. We observed phase-locking between electroencephalogram (EEG) and speech sound in response not only to original (unprocessed) speech but also to our constructed "high-level" speech/noise mixture stimuli. Phase entrainment to original speech and speech/noise sound did not differ in the degree of entrainment, but rather in the actual phase difference between EEG signal and sound. Phase entrainment was not abolished when speech/noise stimuli were presented in reverse (which disrupts semantic processing), indicating that acoustic (rather than linguistic) high-level features play a major role in the observed neural entrainment. Our results provide further evidence for phase entrainment as a potential mechanism underlying speech processing and segmentation, and for the involvement of high-level processes in the adjustment to the rhythm of speech.
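The distinction drawn above between the degree of entrainment and the actual phase difference can be made concrete with a small sketch. The abstract does not specify how phase-locking was quantified, so the mean-resultant-vector measure below is a common generic choice offered as an assumption, and the phase series are synthetic. The length of the mean vector of phase differences gives the locking strength (0 to 1), and its angle gives the preferred phase lag.

```python
import cmath
import math

def phase_locking(eeg_phase, sound_phase):
    """Return (strength, mean_lag) for paired phase samples in radians.

    strength is the length of the mean unit vector of phase differences;
    mean_lag is its angle, i.e. the typical EEG-minus-sound phase offset.
    """
    vectors = [cmath.exp(1j * (e - s)) for e, s in zip(eeg_phase, sound_phase)]
    mean_vector = sum(vectors) / len(vectors)
    return abs(mean_vector), cmath.phase(mean_vector)

# Synthetic example: EEG phase always leads the sound phase by a quarter cycle.
sound = [0.0, 1.0, 2.0, 3.0]
eeg = [s + math.pi / 2 for s in sound]
strength, lag = phase_locking(eeg, sound)
```

Two conditions can therefore share the same strength while differing in lag, which matches the pattern the abstract reports for original speech versus the speech/noise mixtures.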
Park, Hyojin; Ince, Robin A A; Schyns, Philippe G; Thut, Gregor; Gross, Joachim
Humans show a remarkable ability to understand continuous speech even under adverse listening conditions. This ability critically relies on dynamically updated predictions of incoming sensory information, but exactly how top-down predictions improve speech processing is still unclear. Brain oscillations are a likely mechanism for these top-down predictions [1, 2]. Quasi-rhythmic components in speech are known to entrain low-frequency oscillations in auditory areas [3, 4], and this entrainment increases with intelligibility. We hypothesize that top-down signals from frontal brain areas causally modulate the phase of brain oscillations in auditory cortex. We use magnetoencephalography (MEG) to monitor brain oscillations in 22 participants during continuous speech perception. We characterize prominent spectral components of speech-brain coupling in auditory cortex and use causal connectivity analysis (transfer entropy) to identify the top-down signals driving this coupling more strongly during intelligible speech than during unintelligible speech. We report three main findings. First, frontal and motor cortices significantly modulate the phase of speech-coupled low-frequency oscillations in auditory cortex, and this effect depends on intelligibility of speech. Second, top-down signals are significantly stronger for left auditory cortex than for right auditory cortex. Third, speech-auditory cortex coupling is enhanced as a function of stronger top-down signals. Together, our results suggest that low-frequency brain oscillations play a role in implementing predictive top-down control during continuous speech perception and that top-down control is largely directed at left auditory cortex. This suggests a close relationship between (left-lateralized) speech production areas and the implementation of top-down control in continuous speech perception.
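Transfer entropy, the causal connectivity measure named in this abstract, quantifies how much the past of one signal improves prediction of another signal beyond that signal's own past. The sketch below is a toy plug-in estimator for binary sequences with a history length of one sample; the simulated driver/follower pair, the flip probability, and the function names are assumptions for illustration, and the study itself worked with phase signals from MEG rather than binary data.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    y_next, y_now, x_now = target[1:], target[:-1], source[:-1]
    triples = Counter(zip(y_next, y_now, x_now))
    pairs_next_now = Counter(zip(y_next, y_now))
    pairs_now_src = Counter(zip(y_now, x_now))
    singles_now = Counter(y_now)
    n = len(y_next)
    te = 0.0
    for (yn, yc, xc), count in triples.items():
        p_joint = count / n
        p_full = count / pairs_now_src[(yc, xc)]            # p(y_next | y_now, x_now)
        p_self = pairs_next_now[(yn, yc)] / singles_now[yc]  # p(y_next | y_now)
        te += p_joint * math.log2(p_full / p_self)
    return te


def simulate(n_samples=5000, flip_prob=0.1, seed=0):
    """x is random bits; y copies x with a one-sample delay, flipped with flip_prob."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_samples)]
    y = [0]
    for t in range(n_samples - 1):
        flipped = rng.random() < flip_prob
        y.append(x[t] ^ 1 if flipped else x[t])
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    print("TE x -> y:", transfer_entropy(x, y))
    print("TE y -> x:", transfer_entropy(y, x))
```

The asymmetry between the two directions is what makes the measure usable for statements such as "frontal areas drive auditory cortex": information flows from the driver to the follower, and the reverse estimate stays near zero.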
Ferguson, Melanie A.; Hall, Rebecca L.; Riley, Alison; Moore, David R.
Purpose: Parental reports of communication, listening, and behavior in children receiving a clinical diagnosis of specific language impairment (SLI) or auditory processing disorder (APD) were compared with direct tests of intelligence, memory, language, phonology, literacy, and speech intelligibility. The primary aim was to identify whether there…
In this paper, I shall discuss Danish perspectives on nature, showing the interdependence of conceptions of "nature" and "nationhood" in the formations of a particular cultural community. Nature, thus construed, is never innocent of culture and cannot therefore simply be "restored" to some pristine, pre-lapsarian…
... Speech Sound Disorders: Articulation and Phonological Processes What are speech ... individuals with speech sound disorders? What are speech sound disorders? Most children make some mistakes as they ...
There is a lack of agreement on the features used to differentiate Childhood Apraxia of Speech (CAS) from Phonological Disorders (PD). One criterion which has gained consensus is lexical inconsistency of speech (ASHA, 2007); however, no accepted measure of this feature has been defined. Although lexical assessment provides information about consistency of an item across repeated trials, it may not capture the magnitude of inconsistency within an item. In contrast, segmental analysis provides more extensive information about consistency of phoneme usage across multiple contexts and word-positions. The current research compared segmental and lexical inconsistency metrics in preschool-aged children with PD, CAS, and typical development (TD) to determine how inconsistency varies with age in typical and disordered speakers, and whether CAS and PD were differentiated equally well by both assessment levels. Whereas lexical and segmental analyses may be influenced by listener characteristics or speaker intelligibility, the acoustic signal is less vulnerable to these factors. In addition, the acoustic signal may reveal information which is not evident in the perceptual signal. A second focus of the current research was motivated by Blumstein et al.'s (1980) classic study on voice onset time (VOT) in adults with acquired apraxia of speech (AOS) which demonstrated a motor impairment underlying AOS. In the current study, VOT analyses were conducted to determine the relationship between age and group with the voicing distribution for bilabial and alveolar plosives. Findings revealed that 3-year-olds evidenced significantly higher inconsistency than 5-year-olds; segmental inconsistency approached 0% in 5-year-olds with TD, whereas it persisted in children with PD and CAS, suggesting that for children in this age range, inconsistency is a feature of speech disorder rather than typical development (Holm et al., 2007). Likewise, whereas segmental and lexical inconsistency were
Drawing on the fields of computer science, electrical engineering, linguistics, mathematics, philosophy, psychology, and physiology, this one-volume encyclopedia brings together the core of knowledge on artificial intelligence. It provides an overview of how to program computers to emulate human behavior, offering a wide range of techniques for speech and visual generation, problem-solving and more. Over 250 entries are organized alphabetically, cross-referenced and indexed.
Under a Small Business Innovation Research contract from Marshall Space Flight Center, Ultrafast, Inc. developed the world's first, high-temperature resistant, "intelligent" fastener. NASA needed a critical-fastening appraisal and validation of spacecraft segments that are coupled together in space. The intelligent-bolt technology deletes the self-defeating procedure of having to untighten the fastener, and thus upset the joint, during inspection and maintenance. The Ultrafast solution yielded an innovation that is likely to revolutionize manufacturing assembly, particularly the automobile industry. Other areas of application range from aircraft, computers and fork-lifts to offshore platforms, buildings, and bridges.
Indian orators have been saying good-bye for more than three hundred years. John Eliot's "Dying Speeches of Several Indians" (1685), as David Murray notes, inaugurates a long textual history in which "Indians... are most useful dying," or, as in a number of speeches, bidding the world farewell as they embrace an undesired but…
Shearer, William M.
Written for students in the fields of speech correction and audiology, the text deals with the following: structures involved in respiration; the skeleton and the processes of inhalation and exhalation; phonation and pitch, the larynx, and esophageal speech; muscles involved in articulation; muscles involved in resonance; and the anatomy of the…
Authoritarian teaching practices in ballet inhibit the use of private speech. This paper highlights the critical importance of private speech in the cognitive development of young ballet students, within what is largely a non-verbal art form. It draws upon research by Russian psychologist Lev Vygotsky and contemporary socioculturalists, to…
Hyman, Allen, Ed.; Johnson, M. Bruce, Ed.
The articles collected in this book originated at a conference at which legal and economic scholars discussed the issue of First Amendment protection for commercial speech. The first article, in arguing for freedom for commercial speech, finds inconsistent and untenable the arguments of those who advocate freedom from regulation for political…
Powell, Thomas W.
This article describes a procedure to aid in the clinical appraisal of child speech. The approach, based on the work by Dinnsen, Chin, Elbert, and Powell (1990; Some constraints on functionally disordered phonologies: Phonetic inventories and phonotactics. "Journal of Speech and Hearing Research", 33, 28-37), uses a railway idiom to track gains in…
Kane, Peter E., Ed.
The nine articles in this collection deal with theoretical and practical freedom of speech issues. Topics discussed include the following: (1) freedom of expression in Thailand and India; (2) metaphors and analogues in several landmark free speech cases; (3) Supreme Court Justice William O. Douglas's views of the First Amendment; (4) the San…
Barbour, Alton, Ed.
This issue of the "Free Speech Yearbook" contains the following: "Between Rhetoric and Disloyalty: Free Speech Standards for the Sunshine Soldier" by Richard A. Parker; "William A. Rehnquist: Ideologist on the Bench" by Peter E. Kane; "The First Amendment's Weakest Link: Government Regulation of Controversial…
Braunwald, Susan R.
A "range of language use" model is proposed as an alternative conceptual framework to a stage model of egocentric speech. The model is intended to clarify the meaning of the term egocentric speech, to examine the validity of stage assumptions, and to explain the existence of contextual variation in the form of children's…
Phifer, Gregg, Ed.
The articles collected in this annual address several aspects of First Amendment Law. The following titles are included: "Freedom of Speech As an Academic Discipline" (Franklyn S. Haiman), "Free Speech and Foreign-Policy Decision Making" (Douglas N. Freeman), "The Supreme Court and the First Amendment: 1975-1976"…
studies of the dynamics of speech production through cineradiographic techniques and through acoustic analysis of formant motions in vowels in various...particular, the activity of the vocal cords and the dynamics of tongue motion. Research on speech perception has included experiments on vowel
Lee, HweeLing; Noppeney, Uta
Face-to-face communication challenges the human brain to integrate information from auditory and visual senses with linguistic representations. Yet the role of bottom-up physical (spectrotemporal structure) input and top-down linguistic constraints in shaping the neural mechanisms specialized for integrating audiovisual speech signals are currently unknown. Participants were presented with speech and sinewave speech analogs in visual, auditory, and audiovisual modalities. Before the fMRI study, they were trained to perceive physically identical sinewave speech analogs as speech (SWS-S) or nonspeech (SWS-N). Comparing audiovisual integration (interactions) of speech, SWS-S, and SWS-N revealed a posterior-anterior processing gradient within the left superior temporal sulcus/gyrus (STS/STG): Bilateral posterior STS/STG integrated audiovisual inputs regardless of spectrotemporal structure or speech percept; in left mid-STS, the integration profile was primarily determined by the spectrotemporal structure of the signals; more anterior STS regions discarded spectrotemporal structure and integrated audiovisual signals constrained by stimulus intelligibility and the availability of linguistic representations. In addition to this "ventral" processing stream, a "dorsal" circuitry encompassing posterior STS/STG and left inferior frontal gyrus differentially integrated audiovisual speech and SWS signals. Indeed, dynamic causal modeling and Bayesian model comparison provided strong evidence for a parallel processing structure encompassing a ventral and a dorsal stream with speech intelligibility training enhancing the connectivity between posterior and anterior STS/STG. In conclusion, audiovisual speech comprehension emerges in an interactive process with the integration of auditory and visual signals being progressively constrained by stimulus intelligibility along the STS and spectrotemporal structure in a dorsal fronto-temporal circuitry.
Great strides have been made in the development of automatic speech recognition (ASR) technology over the past thirty years. Most of this effort has been centered around the extension and improvement of Hidden Markov Model (HMM) approaches to ASR. Current commercially-available and industry systems based on HMMs can perform well for certain situational tasks that restrict variability such as phone dialing or limited voice commands. However, the holy grail of ASR systems is performance comparable to humans: in other words, the ability to automatically transcribe unrestricted conversational speech spoken by an infinite number of speakers under varying acoustic environments. This goal is far from being reached. Key to the success of ASR is effective modeling of variability in the speech signal. This tutorial will review the basics of ASR and the various ways in which our current knowledge of speech production, speech perception and prosody can be exploited to improve robustness at every level of the system.
identification task, in which an intelligence analyst identifies triples of "who," "where" and "when" for each event described in transcribed broadcast...news. Both of these resemble typical activities of intelligence analysts in OSINT processing and production applications. We assessed two task...a specific named-entity identification task in broadcast news speech, in which an intelligence analyst identifies triples of "who," "where" and
Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Ihlefeld, Antje; Litovsky, Ruth Y
A cochlear implant (CI) presents band-pass-filtered acoustic envelope information by modulating current pulse train levels. Similarly, a vocoder presents envelope information by modulating an acoustic carrier. By studying how normal hearing (NH) listeners are able to understand degraded speech signals with a vocoder, the parameters that best simulate electric hearing and factors that might contribute to the NH-CI performance difference may be better understood. A vocoder with harmonic complex carriers (fundamental frequency, f0 = 100 Hz) was used to study the effect of carrier phase dispersion on speech envelopes and intelligibility. The starting phases of the harmonic components were randomly dispersed to varying degrees prior to carrier filtering and modulation. NH listeners were tested on recognition of a closed set of vocoded words in background noise. Two sets of synthesis filters simulated different amounts of current spread in CIs. Results showed that the speech vocoded with carriers whose starting phases were maximally dispersed was the most intelligible. Superior speech understanding may have been a result of the flattening of the dispersed-phase carrier's intrinsic temporal envelopes produced by the large number of interacting components in the high-frequency channels. Cross-correlogram analyses of auditory nerve model simulations confirmed that randomly dispersing the carrier's component starting phases resulted in better neural envelope representation. However, neural metrics extracted from these analyses were not found to accurately predict speech recognition scores for all vocoded speech conditions. It is possible that central speech understanding mechanisms are insensitive to the envelope-fine structure dichotomy exploited by vocoders.
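The carrier manipulation described in this abstract can be sketched in a few lines. The example below builds a harmonic complex on f0 = 100 Hz and scales a random starting phase for each harmonic by a dispersion factor between 0 (all components in cosine phase) and 1 (phases spread over the full circle), then uses the crest factor as a rough measure of how peaky the carrier's own envelope is. The number of harmonics, the sampling rate, the dispersion parameterization, and the function names are assumptions, and the sketch looks at the wideband carrier only, without the channel filtering and envelope modulation that the study applied afterwards.

```python
import math
import random


def harmonic_carrier(f0_hz=100.0, n_harmonics=40, dispersion=0.0,
                     fs_hz=16000.0, duration_s=0.05, seed=0):
    """Sum of equal-amplitude harmonics of f0 with randomly dispersed start phases.

    dispersion = 0 gives cosine-phase components (a pulse-like waveform);
    dispersion = 1 draws each start phase uniformly from [-pi, pi].
    """
    rng = random.Random(seed)
    phases = [rng.uniform(-math.pi, math.pi) * dispersion for _ in range(n_harmonics)]
    n_samples = int(round(fs_hz * duration_s))
    return [
        sum(math.cos(2 * math.pi * f0_hz * (h + 1) * n / fs_hz + phases[h])
            for h in range(n_harmonics))
        for n in range(n_samples)
    ]


def crest_factor(signal):
    """Peak magnitude divided by RMS; lower values mean a flatter envelope."""
    peak = max(abs(x) for x in signal)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return peak / rms


if __name__ == "__main__":
    for d in (0.0, 0.5, 1.0):
        print(d, round(crest_factor(harmonic_carrier(dispersion=d)), 2))
```

With no dispersion all 40 components peak together once per period, so the carrier carries a strong 100 Hz envelope of its own; dispersing the phases lowers the crest factor, which is consistent with the abstract's suggestion that a flatter intrinsic carrier envelope leaves the speech envelope better represented.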
Lipavská, Helena; Žárský, Viktor
The concept of plant intelligence, as proposed by Anthony Trewavas, has raised considerable discussion. However, plant intelligence remains loosely defined; often it is either perceived as practically synonymous to Darwinian fitness, or reduced to a mere decorative metaphor. A more strict view can be taken, emphasizing necessary prerequisites such as memory and learning, which requires clarifying the definition of memory itself. To qualify as memories, traces of past events have to be not only stored, but also actively accessed. We propose a criterion for eliminating false candidates of possible plant intelligence phenomena in this stricter sense: an “intelligent” behavior must involve a component that can be approximated by a plausible algorithmic model involving recourse to stored information about past states of the individual or its environment. Re-evaluation of previously presented examples of plant intelligence shows that only some of them pass our test. “You were hurt?” Kumiko said, looking at the scar. Sally looked down. “Yeah.” “Why didn't you have it removed?” “Sometimes it's good to remember.” “Being hurt?” “Being stupid.”—(W. Gibson: Mona Lisa Overdrive) PMID:19816094
Details the characteristics of Howard Gardner's seven multiple intelligences (MI): linguistic, logical-mathematical, bodily-kinesthetic, spatial, musical, interpersonal, and intrapersonal. Discusses the implications of MI for instruction. Explores how students can study using their preferred learning style - visual, auditory, and physical study…
To make an academic study of matters inherently secret and potentially explosive seems a tall task. But a growing number of scholars are drawn to understanding spycraft. The interdisciplinary field of intelligence studies is mushrooming, as scholars trained in history, international studies, and political science examine such subjects as the…
Research has shown that differences among ordinary people in intelligence and personality depend equally on individual genetic variability and on differences in the environments that siblings experience within the same family, not differences in the neighborhood, school, and community environments. As of yet, there are no adequate theories to…