Sample records for frequency speech rate

  1. The effect of speech rate on stuttering frequency, phonated intervals, speech effort, and speech naturalness during chorus reading.

    PubMed

    Davidow, Jason H; Ingham, Roger J

    2013-01-01

    This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also examined. An added purpose was to determine if chorus reading could be further refined so as to provide a perceptual guide for gauging the level of physical effort exerted during speech production. A repeated-measures design was used to compare data obtained during control reading conditions and during several chorus reading conditions produced at different speech rates. Participants included 8 persons who stutter (PWS) between the ages of 16 and 32 years. There were significant reductions in the frequency of short PIs from the habitual reading condition during slower chorus conditions, no change when speech rates were matched between habitual reading and chorus conditions, and an increase in the frequency of short PIs during chorus reading produced at a faster rate than the habitual condition. Speech rate did not have an effect on stuttering frequency during chorus reading. In general, speech effort ratings improved and naturalness ratings worsened as speech rate decreased. These results provide evidence that (a) a reduction in the frequency of short PIs is not necessary for fluency improvement during chorus reading, and (b) speech rate may be altered to provide PWS with a more appropriate reference for how physically effortful normally fluent speech production should be. Future investigations should examine the necessity of changes in the activation of neural regions during chorus reading, the possibility of defining individualized units on a 9-point effort scale, and whether there are upper and lower speech rate boundaries for receiving ratings of "highly natural sounding" speech during chorus reading. The reader will be able to: (1) describe the effect of changes in speech rate on the frequency of short phonated intervals during chorus reading, (2) describe changes to speaker-judged speech effort as speech rate changes during chorus reading, and (3) describe the effect of changes in speech rate on listener-judged naturalness ratings during chorus reading. Copyright © 2012 Elsevier Inc. All rights reserved.
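
    A minimal sketch of how PIs might be extracted from a recording, assuming a simple short-time energy detector. The frame size, energy threshold, and the 30-150 ms "short PI" bin below are illustrative placeholders, not the instrumentation used in the study.

    ```python
    import numpy as np

    def phonated_intervals(x, fs, frame_ms=5.0, energy_thresh=0.01):
        """Durations (s) of contiguous stretches whose short-time RMS exceeds a threshold."""
        frame = int(fs * frame_ms / 1000)
        n_frames = len(x) // frame
        rms = np.array([np.sqrt(np.mean(x[i * frame:(i + 1) * frame] ** 2))
                        for i in range(n_frames)])
        on = rms > energy_thresh                  # crude phonation detector
        durations, run = [], 0
        for v in on:
            if v:
                run += 1
            elif run:
                durations.append(run * frame / fs)
                run = 0
        if run:
            durations.append(run * frame / fs)
        return np.array(durations)

    # Count "short" PIs in an assumed 30-150 ms bin:
    # pis = phonated_intervals(signal, 16000)
    # n_short = np.sum((pis >= 0.030) & (pis <= 0.150))
    ```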

  2. The Effect of Speech Rate on Stuttering Frequency, Phonated Intervals, Speech Effort, and Speech Naturalness during Chorus Reading

    ERIC Educational Resources Information Center

    Davidow, Jason H.; Ingham, Roger J.

    2013-01-01

    Purpose: This study examined the effect of speech rate on phonated intervals (PIs), in order to test whether a reduction in the frequency of short PIs is an important part of the fluency-inducing mechanism of chorus reading. The influence of speech rate on stuttering frequency, speaker-judged speech effort, and listener-judged naturalness was also…

  3. Stuttering Frequency, Speech Rate, Speech Naturalness, and Speech Effort During the Production of Voluntary Stuttering.

    PubMed

    Davidow, Jason H; Grossman, Heather L; Edge, Robin L

    2018-05-01

    Voluntary stuttering techniques involve persons who stutter purposefully interjecting disfluencies into their speech. Little research has been conducted on the impact of these techniques on the speech pattern of persons who stutter. The present study examined whether changes in the frequency of voluntary stuttering accompanied changes in stuttering frequency, articulation rate, speech naturalness, and speech effort. In total, 12 persons who stutter aged 16-34 years participated. Participants read four 300-syllable passages during a control condition and during three voluntary stuttering conditions that involved attempting to produce purposeful, tension-free repetitions of initial sounds or syllables of a word for two or more repetitions (i.e., bouncing). The three voluntary stuttering conditions included bouncing on 5%, 10%, and 15% of syllables read. Friedman tests and follow-up Wilcoxon signed ranks tests were conducted for the statistical analyses. Stuttering frequency, articulation rate, and speech naturalness were significantly different between the voluntary stuttering conditions. Speech effort did not differ between the voluntary stuttering conditions. Stuttering frequency was significantly lower during the three voluntary stuttering conditions compared to the control condition, and speech effort was significantly lower during two of the three voluntary stuttering conditions compared to the control condition. Due to changes in articulation rate across the voluntary stuttering conditions, it is difficult to conclude, as has been suggested previously, that voluntary stuttering is the reason for stuttering reductions found when using voluntary stuttering techniques. Additionally, future investigations should examine different types of voluntary stuttering over an extended period of time to determine their impact on stuttering frequency, speech rate, speech naturalness, and speech effort.

  4. Formant-frequency variation and its effects on across-formant grouping in speech perception.

    PubMed

    Roberts, Brian; Summers, Robert J; Bailey, Peter J

    2013-01-01

    How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. Effects of competitors whose depth of formant-frequency variation was scaled by a range of factors were explored using competitors derived either by inverting the frequency contour of F2 about its geometric mean (plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not depth relative to that for the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts. Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints.

  5. [Effects of fundamental frequency and speech rate on impression formation].

    PubMed

    Uchida, Teruhisa; Nakaune, Naoko

    2004-12-01

    This study investigated the systematic relationship between nonverbal features of speech and personality trait ratings of the speaker. In Study 1, fundamental frequency (F0) in original speech was converted into five levels from 64% to 156.25%. Then 132 undergraduates rated each of the converted speech samples in terms of personality traits. In Study 2, 134 undergraduates similarly rated the speech stimuli, which had five speech rate levels as well as two F0 levels. Results showed that listener ratings along Big Five dimensions were mostly independent. Each dimension had a slightly different change profile over the five levels of F0 and speech rate. A quadratic regression equation provided a good approximation for each rating as a function of F0 or speech rate. Taken together, the quadratic regression equations provide a rough estimate of personality trait impressions as a function of prosodic features. The functional relationship among F0, speech rate, and trait ratings was shown as a curved surface in three-dimensional space.
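
    The quadratic approximation described above can be reproduced with an ordinary least-squares polynomial fit. A minimal sketch; the data points below are hypothetical, not values from the study.

    ```python
    import numpy as np

    f0_scale = np.array([0.64, 0.80, 1.00, 1.25, 1.5625])  # the five F0 conversion levels
    rating = np.array([3.1, 3.8, 4.2, 4.0, 3.4])           # hypothetical mean trait ratings

    a, b, c = np.polyfit(f0_scale, rating, deg=2)          # rating ~ a*x**2 + b*x + c
    predict = np.poly1d([a, b, c])
    print(predict(1.10))                                   # estimated rating at a new F0 level
    ```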

  6. A Comparative Analysis of Fluent and Cerebral Palsied Speech.

    NASA Astrophysics Data System (ADS)

    van Doorn, Janis Lee

    Several features of the acoustic waveforms of fluent and cerebral palsied speech were compared, using six fluent and seven cerebral palsied subjects, with a major emphasis being placed on an investigation of the trajectories of the first three formants (vocal tract resonances). To provide an overall picture which included other acoustic features, fundamental frequency, intensity, speech timing (speech rate and syllable duration), and prevocalization (vocalization prior to initial stop consonants found in cerebral palsied speech) were also investigated. Measurements were made using repetitions of a test sentence which was chosen because it required large excursions of the speech articulators (lips, tongue and jaw), so that differences in the formant trajectories for the fluent and cerebral palsied speakers would be emphasized. The acoustic features were all extracted from the digitized speech waveform (10 kHz sampling rate): the fundamental frequency contours were derived manually, the intensity contours were measured using the signal covariance, speech rate and syllable durations were measured manually, as were the prevocalization durations, while the formant trajectories were derived from short time spectra which were calculated for each 10 ms of speech using linear prediction analysis. Differences which were found in the acoustic features can be summarized as follows. For cerebral palsied speakers, the fundamental frequency contours generally showed inappropriate exaggerated fluctuations, as did some of the intensity contours; the mean fundamental frequencies were either higher or the same as for the fluent subjects; speech rates were reduced, and syllable durations were longer; prevocalization was consistently present at the beginning of the test sentence; formant trajectories were found to have overall reduced frequency ranges, and to contain anomalous transitional features, but it is noteworthy that for any one cerebral palsied subject, the inappropriate trajectory pattern was generally reproducible. The anomalous transitional features took the form of (a) inappropriate transition patterns, (b) reduced frequency excursions, (c) increased transition durations, and (d) decreased maximum rates of frequency change.
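
    The formant-trajectory analysis lends itself to a short sketch: autocorrelation-method linear prediction over successive 10-ms frames, with formants read off the roots of the LPC polynomial. The order, window, and root-filtering threshold are generic textbook choices, not the dissertation's exact settings.

    ```python
    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_coefficients(frame, order=12):
        """Autocorrelation-method LPC predictor coefficients a[1..order]."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

    def formants(frame, fs):
        """First three formant frequencies (Hz) from one frame."""
        a = lpc_coefficients(frame * np.hamming(len(frame)))
        poly = np.concatenate(([1.0], -a))            # A(z) = 1 - sum a_k z^-k
        roots = [z for z in np.roots(poly) if z.imag > 0]
        freqs = sorted(np.angle(z) * fs / (2 * np.pi) for z in roots)
        return [f for f in freqs if f > 90][:3]       # drop near-DC roots, keep F1-F3

    # For a 10 kHz signal, analyze successive 10-ms (100-sample) frames:
    # f1, f2, f3 = formants(x[i:i + 100], 10000)
    ```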

  7. Effects of the rate of formant-frequency variation on the grouping of formants in speech perception.

    PubMed

    Summers, Robert J; Bailey, Peter J; Roberts, Brian

    2012-04-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.

  8. Perceptual weighting of the envelope and fine structure across frequency bands for sentence intelligibility: Effect of interruption at the syllabic-rate and periodic-rate of speech

    PubMed Central

    Fogerty, Daniel

    2011-01-01

    Listeners often have only fragments of speech available to understand the intended message due to competing background noise. In order to maximize successful speech recognition, listeners must allocate their perceptual resources to the most informative acoustic properties. The speech signal contains temporally-varying acoustics in the envelope and fine structure that are present across the frequency spectrum. Understanding how listeners perceptually weigh these acoustic properties in different frequency regions during interrupted speech is essential for the design of assistive listening devices. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for interrupted sentence materials. Perceptual weights were obtained during interruption at the syllabic rate (i.e., 4 Hz) and the periodic rate (i.e., 128 Hz) of speech. Potential interruption interactions with fundamental frequency information were investigated by shifting the natural pitch contour higher relative to the interruption rate. The availability of each acoustic property was varied independently by adding noise at different levels. Perceptual weights were determined by correlating a listener’s performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated similar relative weights across the interruption conditions, with emphasis on the envelope in the high frequencies. PMID:21786914
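
    The trial-by-trial weighting analysis can be sketched as a correlation between correctness and the availability of each property: a point-biserial correlation per property serves as the perceptual weight. The array names here are hypothetical.

    ```python
    import numpy as np

    def perceptual_weights(correct, availability):
        """correct: (n_trials,) array of 0/1; availability: (n_trials, n_properties) SNRs."""
        c = correct - correct.mean()
        a = availability - availability.mean(axis=0)
        return (a * c[:, None]).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (c ** 2).sum())
    ```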

  9. Measuring the rate of change of voice fundamental frequency in fluent speech during mental depression.

    PubMed

    Nilsonne, A; Sundberg, J; Ternström, S; Askenfelt, A

    1988-02-01

    A method of measuring the rate of change of fundamental frequency has been developed in an effort to find acoustic voice parameters that could be useful in psychiatric research. A minicomputer program was used to extract seven parameters from the fundamental frequency contour of tape-recorded speech samples: (1) the average rate of change of the fundamental frequency and (2) its standard deviation, (3) the absolute rate of fundamental frequency change, (4) the total reading time, (5) the percent pause time of the total reading time, (6) the mean, and (7) the standard deviation of the fundamental frequency distribution. The method is demonstrated on (a) a material consisting of synthetic speech and (b) voice recordings of depressed patients who were examined during depression and after improvement.
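
    A minimal sketch of the seven parameters, assuming a frame-by-frame F0 track in which zeros mark pauses. The frame step is an assumption, and differences spanning pause boundaries are not excluded here as a fuller implementation would do.

    ```python
    import numpy as np

    def f0_parameters(f0, step=0.01):
        """f0: per-frame F0 values in Hz (0 = pause); step: frame step in seconds."""
        voiced = f0 > 0
        rate = np.diff(f0[voiced]) / step                    # Hz per second
        return {
            "mean_rate_of_change": rate.mean(),              # (1)
            "sd_rate_of_change": rate.std(),                 # (2)
            "mean_abs_rate_of_change": np.abs(rate).mean(),  # (3)
            "total_reading_time": len(f0) * step,            # (4)
            "percent_pause_time": 100 * (~voiced).mean(),    # (5)
            "mean_f0": f0[voiced].mean(),                    # (6)
            "sd_f0": f0[voiced].std(),                       # (7)
        }
    ```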

  10. Nonlinear frequency compression: effects on sound quality ratings of speech and music.

    PubMed

    Parsa, Vijay; Scollie, Susan; Glista, Danielle; Seelisch, Andreas

    2013-03-01

    Frequency lowering technologies offer an alternative amplification solution for severe to profound high frequency hearing losses. While frequency lowering technologies may improve audibility of high frequency sounds, the very nature of this processing can affect the perceived sound quality. This article reports the results from two studies that investigated the impact of a nonlinear frequency compression (NFC) algorithm on perceived sound quality. In the first study, the cutoff frequency and compression ratio parameters of the NFC algorithm were varied, and their effect on the speech quality was measured subjectively with 12 normal hearing adults, 12 normal hearing children, 13 hearing impaired adults, and 9 hearing impaired children. In the second study, 12 normal hearing and 8 hearing impaired adult listeners rated the quality of speech in quiet, speech in noise, and music after processing with a different set of NFC parameters. Results showed that the cutoff frequency parameter had more impact on sound quality ratings than the compression ratio, and that the hearing impaired adults were more tolerant of increased frequency compression than normal hearing adults. No statistically significant differences were found in the sound quality ratings of speech-in-noise and music stimuli processed through various NFC settings by hearing impaired listeners. These findings suggest that there may be an acceptable range of NFC settings for hearing impaired individuals where sound quality is not adversely affected. These results may assist an audiologist in clinical NFC hearing aid fittings for achieving a balance between high frequency audibility and sound quality.
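
    A common textbook form of the NFC mapping the study varies: frequencies above the cutoff are compressed by the compression ratio in the log-frequency domain. This is a generic sketch, not the exact rule used by the algorithm under test.

    ```python
    import numpy as np

    def nfc_map(f, cutoff=2000.0, ratio=2.0):
        """Map input frequencies f (Hz) to output frequencies under NFC."""
        f = np.asarray(f, dtype=float)
        out = f.copy()
        above = f > cutoff
        out[above] = cutoff * (f[above] / cutoff) ** (1.0 / ratio)  # log-domain compression
        return out

    # nfc_map([1000, 4000, 8000]) -> 1000 Hz unchanged; 4000 and 8000 Hz pulled toward the cutoff
    ```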

  11. Amplitude modulation detection with concurrent frequency modulation.

    PubMed

    Nagaraj, Naveen K

    2016-09-01

    Human speech consists of concomitant temporal modulations in amplitude and frequency that are crucial for speech perception. In this study, amplitude modulation (AM) detection thresholds were measured for 550 and 5000 Hz carriers with and without concurrent frequency modulation (FM), at AM rates crucial for speech perception. Results indicate that adding 40 Hz FM interferes with AM detection, more so for the 5000 Hz carrier and for frequency deviations exceeding the critical bandwidth of the carrier frequency. These findings suggest that future cochlear implant processors encoding speech fine structure may consider limiting FM to a narrow bandwidth and to low frequencies.
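
    A sketch of such a stimulus: a carrier with concurrent sinusoidal FM and AM. The depths and rates below are placeholders in the spirit of the conditions described, not the study's exact parameters.

    ```python
    import numpy as np

    fs, dur = 44100, 1.0
    t = np.arange(int(fs * dur)) / fs
    fc, fm_rate, df = 5000.0, 40.0, 200.0   # carrier (Hz), FM rate (Hz), FM frequency deviation (Hz)
    am_rate, m = 10.0, 0.5                  # AM rate (Hz) and AM depth

    # Instantaneous frequency fc + df*cos(2*pi*fm_rate*t), via the integrated phase:
    carrier = np.sin(2 * np.pi * fc * t + (df / fm_rate) * np.sin(2 * np.pi * fm_rate * t))
    stimulus = (1 + m * np.sin(2 * np.pi * am_rate * t)) * carrier
    ```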

  12. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality.

    PubMed

    Kates, James M; Arehart, Kathryn H

    2015-10-01

    This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships.
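
    The envelope-fidelity step can be sketched as a normalized cross-covariance between envelope signals. A Hilbert envelope stands in here for the auditory-periphery model used in the paper.

    ```python
    import numpy as np
    from scipy.signal import hilbert

    def envelope(x):
        return np.abs(hilbert(x))

    def normalized_cross_covariance(degraded, reference):
        d, r = envelope(degraded), envelope(reference)
        d, r = d - d.mean(), r - r.mean()
        return np.dot(d, r) / np.sqrt(np.dot(d, d) * np.dot(r, r))  # 1.0 = perfect fidelity
    ```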

  13. Comparing the information conveyed by envelope modulation for speech intelligibility, speech quality, and music quality

    PubMed Central

    Kates, James M.; Arehart, Kathryn H.

    2015-01-01

    This paper uses mutual information to quantify the relationship between envelope modulation fidelity and perceptual responses. Data from several previous experiments that measured speech intelligibility, speech quality, and music quality are evaluated for normal-hearing and hearing-impaired listeners. A model of the auditory periphery is used to generate envelope signals, and envelope modulation fidelity is calculated using the normalized cross-covariance of the degraded signal envelope with that of a reference signal. Two procedures are used to describe the envelope modulation: (1) modulation within each auditory frequency band and (2) spectro-temporal processing that analyzes the modulation of spectral ripple components fit to successive short-time spectra. The results indicate that low modulation rates provide the highest information for intelligibility, while high modulation rates provide the highest information for speech and music quality. The low-to-mid auditory frequencies are most important for intelligibility, while mid frequencies are most important for speech quality and high frequencies are most important for music quality. Differences between the spectral ripple components used for the spectro-temporal analysis were not significant in five of the six experimental conditions evaluated. The results indicate that different modulation-rate and auditory-frequency weights may be appropriate for indices designed to predict different types of perceptual relationships. PMID:26520329

  14. The effect of intensive speech rate and intonation therapy on intelligibility in Parkinson's disease.

    PubMed

    Martens, Heidi; Van Nuffelen, Gwen; Dekens, Tomas; Hernández-Díaz Huici, Maria; Kairuz Hernández-Díaz, Hector Arturo; De Letter, Miet; De Bodt, Marc

    2015-01-01

    Most studies on treatment of prosody in individuals with dysarthria due to Parkinson's disease are based on intensive treatment of loudness. The present study investigates the effect of intensive treatment of speech rate and intonation on the intelligibility of individuals with dysarthria due to Parkinson's disease. A one-group pretest-posttest design was used to compare intelligibility, speech rate, and intonation before and after treatment. Participants included eleven Dutch-speaking individuals with predominantly moderate dysarthria due to Parkinson's disease, who received five one-hour treatment sessions per week for three weeks. Treatment focused on lowering speech rate and magnifying the phrase final intonation contrast between statements and questions. Intelligibility was perceptually assessed using a standardized sentence intelligibility test. Speech rate was automatically assessed during the sentence intelligibility test as well as during a passage reading task and a storytelling task. Intonation was perceptually assessed using a sentence reading task and a sentence repetition task, and also acoustically analyzed in terms of maximum fundamental frequency. After treatment, there was a significant improvement of sentence intelligibility (effect size .83), a significant increase of pause frequency during the passage reading task, a significant improvement of correct listener identification of statements and questions, and a significant increase of the maximum fundamental frequency in the final syllable of questions during both intonation tasks. The findings suggest that participants were more intelligible and more able to manipulate pause frequency and statement-question intonation after treatment. However, the relationship between the change in intelligibility on the one hand and the changes in speech rate and intonation on the other hand is not yet fully understood. Results are nuanced in light of the research design employed. The reader will be able to: (1) describe the effect of intensive speech rate and intonation treatment on intelligibility of speakers with dysarthria due to PD, (2) describe the effect of intensive speech rate treatment on rate manipulation by speakers with dysarthria due to PD, and (3) describe the effect of intensive intonation treatment on manipulation of the phrase final intonation contrast between statements and questions by speakers with dysarthria due to PD. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Voice stress analysis

    NASA Technical Reports Server (NTRS)

    Brenner, Malcolm; Shipp, Thomas

    1988-01-01

    In a study of the validity of eight candidate voice measures (fundamental frequency, amplitude, speech rate, frequency jitter, amplitude shimmer, Psychological Stress Evaluator scores, energy distribution, and a measure derived from the above measures) for determining psychological stress, 17 males aged 21 to 35 were subjected to a tracking task on a microcomputer CRT while parameters of vocal production as well as heart rate were measured. Findings confirm those of earlier studies that increases in fundamental frequency, amplitude, and speech rate are found in speakers involved in extreme levels of stress. In addition, the same changes appear to occur in a regular fashion at a more subtle level of stress that may be characteristic, for example, of routine flying situations. None of the individual speech measures performed as robustly as did heart rate.

  16. Binaural sluggishness in the perception of tone sequences and speech in noise.

    PubMed

    Culling, J F; Colburn, H S

    2000-01-01

    The binaural system is well-known for its sluggish response to changes in the interaural parameters to which it is sensitive. Theories of binaural unmasking have suggested that detection of signals in noise is mediated by detection of differences in interaural correlation. If these theories are correct, improvements in the intelligibility of speech in favorable binaural conditions are most likely mediated by spectro-temporal variations in interaural correlation of the stimulus which mirror the spectro-temporal amplitude modulations of the speech. However, binaural sluggishness should limit the temporal resolution of the representation of speech recovered by this means. The present study tested this prediction in two ways. First, listeners' masked discrimination thresholds for ascending vs descending pure-tone arpeggios were measured as a function of rate of frequency change in the NoSo and NoSpi binaural configurations. Three-tone arpeggios were presented repeatedly and continuously for 1.6 s, masked by a 1.6-s burst of noise. In a two-interval task, listeners determined the interval in which the arpeggios were ascending. The results showed a binaural advantage of 12-14 dB for NoSpi at 3.3 arpeggios per s (arp/s), which reduced to 3-5 dB at 10.4 arp/s. This outcome confirmed that the discrimination of spectro-temporal patterns in noise is susceptible to the effects of binaural sluggishness. Second, listeners' masked speech-reception thresholds were measured in speech-shaped noise using speech which was 1, 1.5, and 2 times the original articulation rate. The articulation rate was increased using a phase-vocoder technique which increased all the modulation frequencies in the speech without altering its pitch. Speech-reception thresholds were, on average, 5.2 dB lower for the NoSpi than for the NoSo configuration, at the original articulation rate. This binaural masking release was reduced to 2.8 dB when the articulation rate was doubled, but the most notable effect was a 6-8 dB increase in thresholds with articulation rate for both configurations. These results suggest that higher modulation frequencies in masked signals cannot be temporally resolved by the binaural system, but that the useful modulation frequencies in speech are sufficiently low (<5 Hz) that they are invulnerable to the effects of binaural sluggishness, even at elevated articulation rates.
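
    The rate manipulation can be approximated with any phase-vocoder time stretch, which scales all modulation frequencies without altering pitch. librosa's implementation is used here only as an example; "speech.wav" is a placeholder file.

    ```python
    import librosa

    y, sr = librosa.load("speech.wav", sr=None)
    y_x15 = librosa.effects.time_stretch(y, rate=1.5)  # 1.5x articulation rate, same pitch
    y_x20 = librosa.effects.time_stretch(y, rate=2.0)  # 2x articulation rate, same pitch
    ```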

  17. Acoustic properties of naturally produced clear speech at normal speaking rates

    NASA Astrophysics Data System (ADS)

    Krause, Jean C.; Braida, Louis D.

    2004-01-01

    Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
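
    The first global-level property can be sketched as the fraction of long-term spectral energy falling in the 1000-3000-Hz band, here estimated with Welch's method; only the band edges come from the abstract.

    ```python
    import numpy as np
    from scipy.signal import welch

    def band_energy_ratio(x, fs, lo=1000.0, hi=3000.0):
        f, psd = welch(x, fs, nperseg=1024)
        band = (f >= lo) & (f <= hi)
        return psd[band].sum() / psd.sum()
    ```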

  18. Revisiting speech rate and utterance length manipulations in stuttering speakers.

    PubMed

    Blomgren, Michael; Goberman, Alexander M

    2008-01-01

    The goal of this study was to evaluate stuttering frequency across a multidimensional (2x2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22 non-stuttering speakers). Participants were audio and video recorded while producing a spontaneous speech task and four different experimental speaking tasks. The four experimental speaking tasks involved reading a list of 45 words and a list 45 phrases two times each. One reading of each list involved speaking at a steady habitual rate (habitual rate tasks) and another reading involved producing each list at a variable speaking rate (variable rate tasks). For the variable rate tasks, participants were directed to produce words or phrases at randomly ordered slow, habitual, and fast rates. The stuttering speakers exhibited significantly more stuttering on the variable rate tasks than on the habitual rate tasks. In addition, the stuttering speakers exhibited significantly more stuttering on the first word of the phrase length tasks compared to the single word tasks. Overall, the results indicated that varying levels of both utterance length and temporal complexity function to modulate stuttering frequency in adult stuttering speakers. Discussion focuses on issues of speech performance according to stuttering severity and possible clinical implications. The reader will learn about and be able to: (1) describe the mediating effects of length of utterance and speech rate on the frequency of stuttering in stuttering speakers; (2) understand the rationale behind multidimensional skill performance matrices; and (3) describe possible applications of motor skill performance matrices to stuttering therapy.

  19. Speech waveform perturbation analysis: a perceptual-acoustical comparison of seven measures.

    PubMed

    Askenfelt, A G; Hammarberg, B

    1986-03-01

    The performance of seven acoustic measures of cycle-to-cycle variations (perturbations) in the speech waveform was compared. All measures were calculated automatically and applied on running speech. Three of the measures refer to the frequency of occurrence and severity of waveform perturbations in special selected parts of the speech, identified by means of the rate of change in the fundamental frequency. Three other measures refer to statistical properties of the distribution of the relative frequency differences between adjacent pitch periods. One perturbation measure refers to the percentage of consecutive pitch period differences with alternating signs. The acoustic measures were tested on tape recorded speech samples from 41 voice patients, before and after successful therapy. Scattergrams of acoustic waveform perturbation data versus an average of perceived deviant voice qualities, as rated by voice clinicians, are presented. The perturbation measures were compared with regard to the acoustic-perceptual correlation and their ability to discriminate between normal and pathological voice status. The standard deviation of the distribution of the relative frequency differences was suggested as the most useful acoustic measure of waveform perturbations for clinical applications.
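
    The distribution-based measures can be sketched directly from a sequence of adjacent pitch-period durations, which the input is assumed to provide; period extraction itself is omitted.

    ```python
    import numpy as np

    def relative_f0_differences(periods):
        """periods: successive pitch-period durations in seconds."""
        f = 1.0 / np.asarray(periods)   # per-period fundamental frequency
        return np.diff(f) / f[:-1]      # relative change between adjacent periods

    # rel = relative_f0_differences(periods)
    # sd_measure = rel.std()            # the measure suggested as most useful above
    # pct_alternating = 100 * np.mean(np.sign(rel[1:]) != np.sign(rel[:-1]))
    ```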

  20. Perceptual consequences of changes in vocoded speech parameters in various reverberation conditions.

    PubMed

    Drgas, Szymon; Blaszak, Magdalena A

    2009-08-01

    This study examined the perceptual consequences of changes in the parameters of vocoded speech under various reverberation conditions. The three controlled variables were the number of vocoder bands, the instantaneous frequency change rate, and the reverberation conditions. The effects were quantified in terms of (a) nonsense-word recognition scores for young normal-hearing listeners, (b) ease of listening based on the time of response (response delay), and (c) a subjective measure of difficulty (10-degree scale). The results showed that the fine structure of a signal is a relevant cue for speech perception in reverberation. The results obtained for different numbers of bands, frequency-modulation cutoff frequencies, and reverberation conditions showed that all these parameters are important for speech perception in reverberation. Only slow variations in the instantaneous frequency (<50 Hz) seem to play a critical role in speech intelligibility in anechoic conditions. In reverberant enclosures, however, fast fluctuations of instantaneous frequency are also significant.
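
    A minimal noise-vocoder sketch in the spirit of the stimuli above: band-split the speech, low-pass each band's envelope, and use it to modulate band-limited noise. The band edges and envelope cutoff are placeholders, not the study's parameters.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def vocode(x, fs, edges=(100, 500, 1500, 4000), env_cutoff=50.0):
        rng = np.random.default_rng(0)
        out = np.zeros(len(x))
        b_env, a_env = butter(2, env_cutoff / (fs / 2), "lowpass")
        for lo, hi in zip(edges[:-1], edges[1:]):
            b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], "bandpass")
            env = filtfilt(b_env, a_env, np.abs(hilbert(filtfilt(b, a, x))))
            noise = filtfilt(b, a, rng.standard_normal(len(x)))
            out += noise * np.clip(env, 0, None)
        return out
    ```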

  21. Can a model of overlapping gestures account for scanning speech patterns?

    PubMed

    Tjaden, K

    1999-06-01

    A simple acoustic model of overlapping, sliding gestures was used to evaluate whether coproduction was reduced for neurologic speakers with scanning speech patterns. F2 onset frequency was used as an acoustic measure of coproduction or gesture overlap. The effects of speaking rate (habitual versus fast) and utterance position (initial versus medial) on F2 frequency, and presumably gesture overlap, were examined. Regression analyses also were used to evaluate the extent to which across-repetition temporal variability in F2 trajectories could be explained as variation in coproduction for consonants and vowels. The lower F2 onset frequencies for disordered speakers suggested that gesture overlap was reduced for neurologic individuals with scanning speech. Speaking rate change did not influence F2 onset frequencies, and presumably gesture overlap, for healthy or disordered speakers. F2 onset frequency differences for utterance-initial and -medial repetitions were interpreted to suggest reduced coproduction for the utterance-initial position. The utterance-position effects on F2 onset frequency, however, likely were complicated by position-related differences in articulatory scaling. The results of the regression analysis indicated that gesture sliding accounts, in part, for temporal variability in F2 trajectories. Taken together, the results of this study provide support for the idea that speech production theory for healthy talkers helps to account for disordered speech production.

  22. Speech analyzer

    NASA Technical Reports Server (NTRS)

    Lokerson, D. C. (Inventor)

    1977-01-01

    A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train is derived whose pulse rate approximately represents the average frequency of the first formant; second and third pulse trains are derived whose pulse rates respectively represent zero crossings of the second and third formants. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
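
    The zero-crossing pulse-train idea translates to a short sketch: in a band-limited formant signal, the zero-crossing rate approximates twice the dominant frequency. The band-pass filtering that isolates each formant is assumed to have been done already.

    ```python
    import numpy as np

    def zero_crossing_freq(band_signal, fs):
        """Estimate the dominant frequency (Hz) of a band-limited signal."""
        crossings = np.sum(np.signbit(band_signal[:-1]) != np.signbit(band_signal[1:]))
        return crossings * fs / (2 * len(band_signal))
    ```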

  23. Revisiting Speech Rate and Utterance Length Manipulations in Stuttering Speakers

    ERIC Educational Resources Information Center

    Blomgren, Michael; Goberman, Alexander M.

    2008-01-01

    The goal of this study was to evaluate stuttering frequency across a multidimensional (2 x 2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22…

  24. Evaluating signal-to-noise ratios, loudness, and related measures as indicators of airborne sound insulation.

    PubMed

    Park, H K; Bradley, J S

    2009-09-01

    Subjective ratings of the audibility, annoyance, and loudness of music and speech sounds transmitted through 20 different simulated walls were used to identify better single number ratings of airborne sound insulation. The first part of this research considered standard measures such as the sound transmission class, the weighted sound reduction index (R(w)), and variations of these measures [H. K. Park and J. S. Bradley, J. Acoust. Soc. Am. 126, 208-219 (2009)]. This paper considers a number of other measures including signal-to-noise ratios related to the intelligibility of speech and measures related to the loudness of sounds. An exploration of the importance of the included frequencies showed that the optimum ranges of included frequencies were different for speech and music sounds. Measures related to speech intelligibility were useful indicators of responses to speech sounds but were not as successful for music sounds. A-weighted level differences, signal-to-noise ratios and an A-weighted sound transmission loss measure were good predictors of responses when the included frequencies were optimized for each type of sound. The addition of new spectrum adaptation terms to R(w) values was found to be the most practical approach for achieving more accurate predictions of subjective ratings of transmitted speech and music sounds.

  25. Cortical activity during cued picture naming predicts individual differences in stuttering frequency

    PubMed Central

    Mock, Jeffrey R.; Foundas, Anne L.; Golob, Edward J.

    2016-01-01

    Objective: Developmental stuttering is characterized by fluent speech punctuated by stuttering events, the frequency of which varies among individuals and contexts. Most stuttering events occur at the beginning of an utterance, suggesting neural dynamics associated with stuttering may be evident during speech preparation. Methods: This study used EEG to measure cortical activity during speech preparation in men who stutter, and compared the EEG measures to individual differences in stuttering rate as well as to a fluent control group. Each trial contained a cue followed by an acoustic probe at one of two onset times (early or late), and then a picture. There were two conditions: a speech condition where cues induced speech preparation of the picture’s name and a control condition that minimized speech preparation. Results: Across conditions stuttering frequency correlated to cue-related EEG beta power and auditory ERP slow waves from early onset acoustic probes. Conclusions: The findings reveal two new cortical markers of stuttering frequency that were present in both conditions, manifest at different times, are elicited by different stimuli (visual cue, auditory probe), and have different EEG responses (beta power, ERP slow wave). Significance: The cue-target paradigm evoked brain responses that correlated to pre-experimental stuttering rate. PMID:27472545

  26. Cortical activity during cued picture naming predicts individual differences in stuttering frequency.

    PubMed

    Mock, Jeffrey R; Foundas, Anne L; Golob, Edward J

    2016-09-01

    Developmental stuttering is characterized by fluent speech punctuated by stuttering events, the frequency of which varies among individuals and contexts. Most stuttering events occur at the beginning of an utterance, suggesting neural dynamics associated with stuttering may be evident during speech preparation. This study used EEG to measure cortical activity during speech preparation in men who stutter, and compared the EEG measures to individual differences in stuttering rate as well as to a fluent control group. Each trial contained a cue followed by an acoustic probe at one of two onset times (early or late), and then a picture. There were two conditions: a speech condition where cues induced speech preparation of the picture's name and a control condition that minimized speech preparation. Across conditions stuttering frequency correlated to cue-related EEG beta power and auditory ERP slow waves from early onset acoustic probes. The findings reveal two new cortical markers of stuttering frequency that were present in both conditions, manifest at different times, are elicited by different stimuli (visual cue, auditory probe), and have different EEG responses (beta power, ERP slow wave). The cue-target paradigm evoked brain responses that correlated to pre-experimental stuttering rate. Copyright © 2016 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  27. Hemispheric asymmetry of auditory steady-state responses to monaural and diotic stimulation.

    PubMed

    Poelmans, Hanne; Luts, Heleen; Vandermosten, Maaike; Ghesquière, Pol; Wouters, Jan

    2012-12-01

    Amplitude modulations in the speech envelope are crucial elements for speech perception. These modulations comprise the processing rate at which syllabic (~3-7 Hz), and phonemic transitions occur in speech. Theories about speech perception hypothesize that each hemisphere in the auditory cortex is specialized in analyzing modulations at different timescales, and that phonemic-rate modulations of the speech envelope lateralize to the left hemisphere, whereas right lateralization occurs for slow, syllabic-rate modulations. In the present study, neural processing of phonemic- and syllabic-rate modulations was investigated with auditory steady-state responses (ASSRs). ASSRs to speech-weighted noise stimuli, amplitude modulated at 4, 20, and 80 Hz, were recorded in 30 normal-hearing adults. The 80 Hz ASSR is primarily generated by the brainstem, whereas 20 and 4 Hz ASSRs are mainly cortically evoked and relate to speech perception. Stimuli were presented diotically (same signal to both ears) and monaurally (one signal to the left or right ear). For 80 Hz, diotic ASSRs were larger than monaural responses. This binaural advantage decreased with decreasing modulation frequency. For 20 Hz, diotic ASSRs were equal to monaural responses, while for 4 Hz, diotic responses were smaller than monaural responses. Comparison of left and right ear stimulation demonstrated that, with decreasing modulation rate, a gradual change from ipsilateral to right lateralization occurred. Together, these results (1) suggest that ASSR enhancement to binaural stimulation decreases in the ascending auditory system and (2) indicate that right lateralization is more prominent for low-frequency ASSRs. These findings may have important consequences for electrode placement in clinical settings, as well as for the understanding of low-frequency ASSR generation.

  28. Systematic studies of modified vocalization: effects of speech rate and instatement style during metronome stimulation.

    PubMed

    Davidow, Jason H; Bothe, Anne K; Richardson, Jessica D; Andreatta, Richard D

    2010-12-01

    This study introduces a series of systematic investigations intended to clarify the parameters of the fluency-inducing conditions (FICs) in stuttering. Participants included 11 adults, aged 20-63 years, with typical speech-production skills. A repeated measures design was used to examine the relationships between several speech production variables (vowel duration, voice onset time, fundamental frequency, intraoral pressure, pressure rise time, transglottal airflow, and phonated intervals) and speech rate and instatement style during metronome-entrained rhythmic speech. Measures of duration (vowel duration, voice onset time, and pressure rise time) differed across different metronome conditions. When speech rates were matched between the control condition and metronome condition, voice onset time was the only variable that changed. Results confirm that speech rate and instatement style can influence speech production variables during the production of fluency-inducing conditions. Future studies of normally fluent speech and of stuttered speech must control both features and should further explore the importance of voice onset time, which may be influenced by rate during metronome stimulation in a way that the other variables are not.

  29. Spectrotemporal modulation sensitivity for hearing-impaired listeners: dependence on carrier center frequency and the relationship to speech intelligibility.

    PubMed

    Mehraei, Golbarg; Gallun, Frederick J; Leek, Marjorie R; Bernstein, Joshua G W

    2014-07-01

    Poor speech understanding in noise by hearing-impaired (HI) listeners is only partly explained by elevated audiometric thresholds. Suprathreshold-processing impairments such as reduced temporal or spectral resolution or temporal fine-structure (TFS) processing ability might also contribute. Although speech contains dynamic combinations of temporal and spectral modulation and TFS content, these capabilities are often treated separately. Modulation-depth detection thresholds for spectrotemporal modulation (STM) applied to octave-band noise were measured for normal-hearing and HI listeners as a function of temporal modulation rate (4-32 Hz), spectral ripple density [0.5-4 cycles/octave (c/o)] and carrier center frequency (500-4000 Hz). STM sensitivity was worse than normal for HI listeners only for a low-frequency carrier (1000 Hz) at low temporal modulation rates (4-12 Hz) and a spectral ripple density of 2 c/o, and for a high-frequency carrier (4000 Hz) at a high spectral ripple density (4 c/o). STM sensitivity for the 4-Hz, 4-c/o condition for a 4000-Hz carrier and for the 4-Hz, 2-c/o condition for a 1000-Hz carrier were correlated with speech-recognition performance in noise after partialling out the audiogram-based speech-intelligibility index. Poor speech-reception and STM-detection performance for HI listeners may be related to a combination of reduced frequency selectivity and a TFS-processing deficit limiting the ability to track spectral-peak movements.
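
    The STM stimulus can be sketched as a ripple gain drifting in time and across log-frequency, applied (e.g., in the short-time spectrum) to octave-band noise. Only the rate and density parameters come from the abstract; the form below is generic.

    ```python
    import numpy as np

    def stm_gain(t, log2_f, rate_hz=4.0, density_c_per_oct=2.0, depth=1.0):
        """Ripple gain at time t (s) and log2-frequency position log2_f (octaves re: band edge)."""
        phase = 2 * np.pi * (rate_hz * t + density_c_per_oct * log2_f)
        return 1.0 + depth * np.sin(phase)

    # Applying stm_gain across an STFT of octave-band noise yields the modulated stimulus;
    # depth is the modulation index whose detection threshold is being measured.
    ```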

  30. Speech effort measurement and stuttering: investigating the chorus reading effect.

    PubMed

    Ingham, Roger J; Warner, Allison; Byrd, Anne; Cotton, John

    2006-06-01

    The purpose of this study was to investigate chorus reading's (CR's) effect on speech effort during oral reading by adult stuttering speakers and control participants. The effect of a speech effort measurement highlighting strategy was also investigated. Twelve persistent stuttering (PS) adults and 12 normally fluent control participants completed 1-min base rate readings (BR-nonchorus) and CRs within a BR/CR/BR/CR/BR experimental design. Participants self-rated speech effort using a 9-point scale after each reading trial. Stuttering frequency, speech rate, and speech naturalness measures were also obtained. Instructions highlighting speech effort ratings during BR and CR phases were introduced after the first CR. CR improved speech effort ratings for the PS group, but the control group showed a reverse trend. The two groups' effort ratings did not differ significantly during CR phases, but both were significantly poorer than the control group's effort ratings during BR phases. The highlighting strategy did not significantly change effort ratings. The findings show that CR will produce not only stutter-free and natural sounding speech but also reliable reductions in speech effort. However, these reductions do not reach effort levels equivalent to those achieved by normally fluent speakers, thereby qualifying its use as a gold standard of achievable normal fluency by PS speakers.

  31. Neural Oscillations Carry Speech Rhythm through to Comprehension

    PubMed Central

    Peelle, Jonathan E.; Davis, Matthew H.

    2012-01-01

    A key feature of speech is the quasi-regular rhythmic information contained in its slow amplitude modulations. In this article we review the information conveyed by speech rhythm, and the role of ongoing brain oscillations in listeners’ processing of this content. Our starting point is the fact that speech is inherently temporal, and that rhythmic information conveyed by the amplitude envelope contains important markers for place and manner of articulation, segmental information, and speech rate. Behavioral studies demonstrate that amplitude envelope information is relied upon by listeners and plays a key role in speech intelligibility. Extending behavioral findings, data from neuroimaging – particularly electroencephalography (EEG) and magnetoencephalography (MEG) – point to phase locking by ongoing cortical oscillations to low-frequency information (~4–8 Hz) in the speech envelope. This phase modulation effectively encodes a prediction of when important events (such as stressed syllables) are likely to occur, and acts to increase sensitivity to these relevant acoustic cues. We suggest a framework through which such neural entrainment to speech rhythm can explain effects of speech rate on word and segment perception (i.e., that the perception of phonemes and words in connected speech is influenced by preceding speech rate). Neuroanatomically, acoustic amplitude modulations are processed largely bilaterally in auditory cortex, with intelligible speech resulting in differential recruitment of left-hemisphere regions. Notable among these is lateral anterior temporal cortex, which we propose functions in a domain-general fashion to support ongoing memory and integration of meaningful input. Together, the reviewed evidence suggests that low-frequency oscillations in the acoustic speech signal form the foundation of a rhythmic hierarchy supporting spoken language, mirrored by phase-locked oscillations in the human brain. PMID:22973251

  32. Immediate Effect of Alcohol on Voice Tremor Parameters and Speech Motor Control

    ERIC Educational Resources Information Center

    Krishnan, Gayathri; Ghosh, Vipin

    2017-01-01

    The complex neuro-muscular interplay of speech subsystems is susceptible to alcohol intoxication. Published reports have studied language formulation and fundamental frequency measures pre- and post-intoxication. This study aimed at tapping the speech motor control measure using rate, consistency, and accuracy measures of diadochokinesis and…

  33. Frequency modulation detection in cochlear implant subjects

    NASA Astrophysics Data System (ADS)

    Chen, Hongbin; Zeng, Fan-Gang

    2004-10-01

    Frequency modulation (FM) detection was investigated in acoustic and electric hearing to characterize cochlear-implant subjects' ability to detect dynamic frequency changes and to assess the relative contributions of temporal and spectral cues to frequency processing. Difference limens were measured for frequency upward sweeps, downward sweeps, and sinusoidal FM as a function of standard frequency and modulation rate. In electric hearing, factors including electrode position and stimulation level were also studied. Electric hearing data showed that the difference limen increased monotonically as a function of standard frequency regardless of the modulation type, the modulation rate, the electrode position, and the stimulation level. In contrast, acoustic hearing data showed that the difference limen was nearly constant as a function of standard frequency. This difference was interpreted to mean that temporal cues are used only at low standard frequencies and at low modulation rates. At higher standard frequencies and modulation rates, the reliance on the place cue is increased, accounting for the better performance in acoustic hearing than for electric hearing with single-electrode stimulation. The present data suggest a speech processing strategy that encodes slow frequency changes using lower stimulation rates than those typically employed by contemporary cochlear-implant speech processors.

  34. Spectrotemporal modulation sensitivity for hearing-impaired listeners: Dependence on carrier center frequency and the relationship to speech intelligibility

    PubMed Central

    Mehraei, Golbarg; Gallun, Frederick J.; Leek, Marjorie R.; Bernstein, Joshua G. W.

    2014-01-01

    Poor speech understanding in noise by hearing-impaired (HI) listeners is only partly explained by elevated audiometric thresholds. Suprathreshold-processing impairments such as reduced temporal or spectral resolution or temporal fine-structure (TFS) processing ability might also contribute. Although speech contains dynamic combinations of temporal and spectral modulation and TFS content, these capabilities are often treated separately. Modulation-depth detection thresholds for spectrotemporal modulation (STM) applied to octave-band noise were measured for normal-hearing and HI listeners as a function of temporal modulation rate (4–32 Hz), spectral ripple density [0.5–4 cycles/octave (c/o)] and carrier center frequency (500–4000 Hz). STM sensitivity was worse than normal for HI listeners only for a low-frequency carrier (1000 Hz) at low temporal modulation rates (4–12 Hz) and a spectral ripple density of 2 c/o, and for a high-frequency carrier (4000 Hz) at a high spectral ripple density (4 c/o). STM sensitivity for the 4-Hz, 4-c/o condition for a 4000-Hz carrier and for the 4-Hz, 2-c/o condition for a 1000-Hz carrier were correlated with speech-recognition performance in noise after partialling out the audiogram-based speech-intelligibility index. Poor speech-reception and STM-detection performance for HI listeners may be related to a combination of reduced frequency selectivity and a TFS-processing deficit limiting the ability to track spectral-peak movements. PMID:24993215

  35. Quality ratings of frequency-compressed speech by participants with extensive high-frequency dead regions in the cochlea

    PubMed Central

    Salorio-Corbetto, Marina; Baer, Thomas; Moore, Brian C. J.

    2017-01-01

    Objective: The objective was to assess the degradation of speech sound quality produced by frequency compression for listeners with extensive high-frequency dead regions (DRs). Design: Quality ratings were obtained using values of the starting frequency (Sf) of the frequency compression both below and above the estimated edge frequency, fe, of each DR. Thus, the value of Sf often fell below the lowest value currently used in clinical practice. Several compression ratios were used for each value of Sf. Stimuli were sentences processed via a prototype hearing aid based on Phonak Exélia Art P. Study sample: Five participants (eight ears) with extensive high-frequency DRs were tested. Results: Reductions of sound-quality produced by frequency compression were small to moderate. Ratings decreased significantly with decreasing Sf and increasing CR. The mean ratings were lowest for the lowest Sf and highest CR. Ratings varied across participants, with one participant rating frequency compression lower than no frequency compression even when Sf was above fe. Conclusions: Frequency compression degraded sound quality somewhat for this small group of participants with extensive high-frequency DRs. The degradation was greater for lower values of Sf relative to fe, and for greater values of CR. Results varied across participants. PMID:27724057

  16. Fluency variation in adolescents.

    PubMed

    Furquim de Andrade, Claudia Regina; de Oliveira Martins, Vanessa

    2007-10-01

    The speech fluency profile of fluent adolescent speakers of Brazilian Portuguese was examined with respect to gender and neurolinguistic variations. Speech samples of 130 male and female adolescents, aged between 12;0 and 17;11 years, were gathered. They were analysed according to type of speech disruption, speech rate, and frequency of speech disruptions. Statistical analysis did not find significant differences between genders for the variables studied. However, regarding the phases of adolescence (early: 12;0-14;11 years; late: 15;0-17;11 years), statistical differences were observed for all of the variables. As for neurolinguistic maturation, a decrease in the number of speech disruptions and an increase in speech rate occurred during the final phase of adolescence, indicating that the maturation of motor and linguistic processes exerted an influence over the fluency profile of speech.

  17. Acoustic changes in the speech of children with cerebral palsy following an intensive program of dysarthria therapy.

    PubMed

    Pennington, Lindsay; Lombardo, Eftychia; Steen, Nick; Miller, Nick

    2018-01-01

    The speech intelligibility of children with dysarthria and cerebral palsy has been observed to increase following therapy focusing on respiration and phonation. The aim was to determine if speech intelligibility change following intervention is associated with change in acoustic measures of voice. We recorded 16 young people with cerebral palsy and dysarthria (nine girls; mean age 14 years, SD = 2; nine spastic type, two dyskinetic, four mixed; one Worster-Drought) producing speech in two conditions (single words, connected speech) twice before and twice after therapy focusing on respiration, phonation and rate. In both single-word and connected speech we measured vocal intensity (root mean square, RMS), period-to-period variability (Shimmer APQ, Jitter RAP and PPQ) and harmonics-to-noise ratio (HNR). In connected speech we also measured mean fundamental frequency, utterance duration in seconds and speech and articulation rate (syllables/s with and without pauses respectively). All acoustic measures were made using Praat. Intelligibility was calculated in previous research. In single words, statistically significant but very small reductions were observed in period-to-period variability following therapy: Shimmer APQ -0.15 (95% CI = -0.21 to -0.09); Jitter RAP -0.08 (95% CI = -0.14 to -0.01); Jitter PPQ -0.08 (95% CI = -0.15 to -0.01). No changes in period-to-period perturbation across phrases in connected speech were detected. However, changes in connected speech were observed in phrase length, rate and intensity. Following therapy, mean utterance duration increased by 1.11 s (95% CI = 0.37-1.86) when measured with pauses and by 1.13 s (95% CI = 0.40-1.85) when measured without pauses. Articulation rate increased by 0.07 syllables/s (95% CI = 0.02-0.13); speech rate increased by 0.06 syllables/s (95% CI = < 0.01-0.12); and intensity increased by 0.03 Pascals (95% CI = 0.02-0.04). There was a gradual reduction in mean fundamental frequency across all time points (-11.85 Hz, 95% CI = -19.84 to -3.86). Only increases in the intensity of single words (0.37 Pascals, 95% CI = 0.10-0.65) and reductions in fundamental frequency (-0.11 Hz, 95% CI = -0.21 to -0.02) in connected speech were associated with gains in intelligibility. The mean reductions in vocal-function impairment observed following therapy were small, and most are unlikely to be clinically significant. Changes in vocal control did not explain improved intelligibility. © 2017 Royal College of Speech and Language Therapists.
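
    The perturbation and noise measures named above are one-liners in Praat. A minimal sketch using the parselmouth Python interface to Praat, with the file name hypothetical and the standard Praat default arguments assumed:

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("child_speech.wav")  # hypothetical recording

# Period-to-period perturbation, computed from detected glottal pulses
pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)
jitter_rap = call(pp, "Get jitter (rap)", 0, 0, 0.0001, 0.02, 1.3)
jitter_ppq5 = call(pp, "Get jitter (ppq5)", 0, 0, 0.0001, 0.02, 1.3)
shimmer_apq = call([snd, pp], "Get shimmer (apq11)",
                   0, 0, 0.0001, 0.02, 1.3, 1.6)

# Harmonics-to-noise ratio
harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
hnr = call(harmonicity, "Get mean", 0, 0)

# Mean fundamental frequency
pitch = snd.to_pitch()
f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")
```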

  18. Immediate effects of AAF devices on the characteristics of stuttering: a clinical analysis.

    PubMed

    Unger, Julia P; Glück, Christian W; Cholewa, Jürgen

    2012-06-01

    The present study investigated the immediate effects of altered auditory feedback (AAF) and one Inactive Condition (AAF parameters set to 0) on clinical attributes of stuttering during scripted and spontaneous speech. Two commercially available, portable AAF devices were used to create the combined delayed auditory feedback (DAF) and frequency altered feedback (FAF) effects. Thirty adults who stutter, aged 18-68 years (M=36.5; SD=15.2), participated in this investigation. Each subject produced four sets of 5-min oral readings, three sets of 5-min monologs, as well as 10-min dialogs. These speech samples were analyzed to detect changes in descriptive features of stuttering (frequency, duration, speech/articulatory rate, core behaviors) across the various speech samples and within two SSI-4 (Riley, 2009) based severity ratings. A statistically significant difference was found in the frequency of stuttered syllables (%SS) during both Active Device conditions (p < .001) for all speech samples. The most sizable reductions in %SS occurred within scripted speech. In the analysis of stuttering type, it was found that blocks were reduced significantly (Device A: p=.017; Device B: p=.049). To evaluate the impact on severe and mild stuttering, participants were grouped into two SSI-4 based categories: mild and moderate-severe. During the Inactive Condition, those participants within the moderate-severe group (p=.024) showed a statistically significant reduction in overall disfluencies. This result indicates that active AAF parameters alone may not be the sole cause of fluency enhancement when using a technical speech aid. The reader will learn and be able to describe: (1) currently available scientific evidence on the use of altered auditory feedback (AAF) during scripted and spontaneous speech, (2) which characteristics of stuttering are impacted by an AAF device (frequency, duration, core behaviors, speech & articulatory rate, stuttering severity), (3) the effects of an Inactive Condition on people who stutter (PWS) falling into two severity groups, and (4) how the examined participants perceived the use of AAF devices. Copyright © 2012 Elsevier Inc. All rights reserved.
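
    Of the two feedback alterations combined in these devices, delayed auditory feedback is the simpler to sketch: the talker's microphone signal is played back to the ear after a short fixed delay (FAF additionally pitch-shifts the signal). A minimal illustration, with an assumed delay value:

```python
import numpy as np

def delayed_auditory_feedback(mic_signal, fs, delay_ms=60):
    """Return the signal the talker hears under DAF: their own voice
    offset by a fixed delay (clinical settings are commonly tens to a
    few hundred milliseconds)."""
    d = int(fs * delay_ms / 1000)
    out = np.zeros(len(mic_signal) + d)
    out[d:] = mic_signal
    return out
```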

  19. Is There a Relationship Between Tic Frequency and Physiological Arousal? Examination in a Sample of Children with Co-Occurring Tic and Anxiety Disorders

    PubMed Central

    Conelea, Christine A.; Ramanujam, Krishnapriya; Walther, Michael R.; Freeman, Jennifer B.; Garcia, Abbe M.

    2014-01-01

    Stress is the contextual variable most commonly implicated in tic exacerbations. However, research examining associations between tics, stressors, and the biological stress response has yielded mixed results. This study examined whether tics occur at a greater frequency during discrete periods of heightened physiological arousal. Children with co-occurring tic and anxiety disorders (n = 8) completed two stress induction tasks (discussion of family conflict, public speech). Observational (tic frequencies) and physiological (heart rate) data were synchronized using The Observer XT, and tic frequencies were compared across periods of high and low heart rate. Tic frequencies across the entire experiment did not increase during periods of higher heart rate. During the speech task, tic frequencies were significantly lower during periods of higher heart rate. Results suggest that tic exacerbations may not be associated with heightened physiological arousal and highlight the need for further tic research using integrated measurement of behavioral and biological processes. PMID:24662238

  20. Speech transformations based on a sinusoidal representation

    NASA Astrophysics Data System (ADS)

    Quatieri, T. E.; McAulay, R. J.

    1986-05-01

    A new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism that has been shown to produce synthetic speech that preserves the waveform shape and is essentially perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is equally capable of recovering and modifying nonspeech signals such as music, multiple speakers, marine biologic sounds, and speech in the presence of interference such as noise and musical backgrounds.
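
    A greatly simplified sketch of the idea: each analysis frame is reduced to its strongest spectral peaks, and synthesis is a sum of sinusoids whose frequencies and durations can be scaled independently. The published system additionally tracks peaks across frames and interpolates phase, which is omitted here.

```python
import numpy as np

def sine_model_frame(frame, fs, n_peaks=40):
    """Pick the strongest spectral peaks of one frame: (freq, amp) pairs."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1 / fs)
    # indices of local maxima of the magnitude spectrum
    peaks = np.where((spec[1:-1] > spec[:-2]) & (spec[1:-1] > spec[2:]))[0] + 1
    top = peaks[np.argsort(spec[peaks])][-n_peaks:]
    return freqs[top], spec[top]

def synthesize(freqs, amps, dur, fs, rate_factor=1.0, pitch_factor=1.0):
    """Resynthesize one frame's worth of sound as a sum of sinusoids.
    Stretching the duration changes the time scale; scaling the
    frequencies changes the pitch."""
    t = np.arange(int(dur * rate_factor * fs)) / fs
    sig = np.zeros_like(t)
    for f, a in zip(freqs, amps):
        sig += a * np.cos(2 * np.pi * f * pitch_factor * t)
    return sig
```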

  1. Speech and pause characteristics associated with voluntary rate reduction in Parkinson's disease and Multiple Sclerosis.

    PubMed

    Tjaden, Kris; Wilding, Greg

    2011-01-01

    The primary purpose of this study was to investigate how speakers with Parkinson's disease (PD) and Multiple Sclerosis (MS) accomplish voluntary reductions in speech rate. A group of talkers with no history of neurological disease was included for comparison. This study was motivated by the idea that knowledge of how speakers with dysarthria voluntarily accomplish a reduced speech rate would contribute toward a descriptive model of speaking rate change in dysarthria. Such a model has the potential to assist in identifying rate control strategies that should receive focus in clinical treatment programs and also would advance understanding of global speech timing in dysarthria. All speakers read a passage in Habitual and Slow conditions. Speech rate, articulation rate, pause duration, and pause frequency were measured. All speaker groups adjusted articulation time as well as pause time to reduce overall speech rate. Group differences in how voluntary rate reduction was accomplished were primarily a matter of quantity or degree. Overall, a slower-than-normal rate was associated with a reduced articulation rate, shorter speech runs that included fewer syllables, and longer more frequent pauses. Taken together, these results suggest that existing skills or strategies used by patients should be emphasized in dysarthria training programs focusing on rate reduction. Results further suggest that a model of voluntary speech rate reduction based on neurologically normal speech shows promise as being applicable for mild to moderate dysarthria. The reader will be able to: (1) describe the importance of studying voluntary adjustments in speech rate in dysarthria, (2) discuss how speakers with Parkinson's disease and Multiple Sclerosis adjust articulation time and pause time to slow speech rate. Copyright © 2011 Elsevier Inc. All rights reserved.
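
    The distinction between the two rate measures is simple arithmetic: speech rate includes pause time in the denominator, while articulation rate excludes it. A worked sketch with illustrative numbers:

```python
def speaking_rates(n_syllables, total_dur_s, pause_dur_s):
    """Speech rate counts pauses in the denominator;
    articulation rate excludes them."""
    speech_rate = n_syllables / total_dur_s
    articulation_rate = n_syllables / (total_dur_s - pause_dur_s)
    return speech_rate, articulation_rate

# e.g. 300 syllables read in 120 s, 30 s of which is pausing:
# speech rate = 2.5 syll/s, articulation rate ~ 3.33 syll/s
print(speaking_rates(300, 120.0, 30.0))
```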

  2. Effects of Speech Output on Maintenance of Requesting and Frequency of Vocalizations in Three Children with Developmental Disabilities.

    PubMed

    Sigafoos, Jeff; Didden, Robert; O'Reilly, Mark

    2003-01-01

    We evaluated the role of digitized speech output on the maintenance of requesting and frequency of vocalizations in three children with developmental disabilities. The children were taught to request access to preferred objects using an augmentative communication speech-generating device (SGD). Following acquisition, rates of requesting and vocalizations were compared across two conditions (speech output on versus speech output off) that were alternated on a session-by-session basis. There were no major or consistent differences across the two conditions for the three children, suggesting that access to preferred objects was the critical variable maintaining use of the SGDs. The results also suggest that feedback in the form of digitized speech from the SGD did not inhibit vocalizations. One child began to speak single words during the latter part of the study, suggesting that in some cases AAC intervention involving SGDs may facilitate speech.

  3. Formant-Frequency Variation and Informational Masking of Speech by Extraneous Formants: Evidence Against Dynamic and Speech-Specific Acoustical Constraints

    PubMed Central

    2014-01-01

    How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1–F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints. PMID:24842068
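
    The two contour manipulations described above have compact log-domain forms: inversion reflects the track about its geometric mean, and depth scaling shrinks excursions toward it. A small numpy sketch, illustrative rather than the study's code, working on a log-frequency axis:

```python
import numpy as np

def invert_contour(f2_track_hz):
    """Reflect a formant-frequency contour about its geometric mean:
    equal upward and downward excursions on a log-frequency axis."""
    f2 = np.asarray(f2_track_hz, dtype=float)
    gm = np.exp(np.mean(np.log(f2)))   # geometric mean
    return gm ** 2 / f2                # log-domain mirror image

def scale_depth(track_hz, depth=0.5):
    """Scale a contour's excursions about its geometric mean
    (depth=0.5 gives 50% of the natural depth)."""
    f = np.asarray(track_hz, dtype=float)
    gm = np.exp(np.mean(np.log(f)))
    return gm * (f / gm) ** depth
```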

  4. Clear Speech Variants: An Acoustic Study in Parkinson's Disease.

    PubMed

    Lam, Jennifer; Tjaden, Kris

    2016-08-01

    The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different sentences selected from the Sentence Intelligibility Test (Yorkston & Beukelman, 1996). All speakers produced stimuli in 4 speaking conditions (habitual, clear, overenunciate, and hearing impaired). Segmental acoustic measures included vowel space area and first moment (M1) coefficient difference measures for consonant pairs. Second formant slope of diphthongs and measures of vowel and fricative durations were also obtained. Suprasegmental measures included fundamental frequency, sound pressure level, and articulation rate. For the majority of adjustments, all variants of clear speech instruction differed from the habitual condition. The overenunciate condition elicited the greatest magnitude of change for segmental measures (vowel space area, vowel durations) and the slowest articulation rates. The hearing impaired condition elicited the greatest fricative durations and suprasegmental adjustments (fundamental frequency, sound pressure level). Findings have implications for a model of speech production for healthy speakers as well as for speakers with dysarthria. Findings also suggest that particular clear speech instructions may target distinct speech subsystems.
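
    Vowel space area, one of the segmental measures above, is typically computed as the area of the polygon formed by the corner vowels' mean (F1, F2) values. A minimal shoelace-formula sketch with purely illustrative formant values:

```python
def vowel_space_area(corner_vowels):
    """Shoelace area of the polygon traced by corner vowels in the
    F1-F2 plane (Hz^2). Vowels must be given in polygon order."""
    pts = list(corner_vowels)
    area = 0.0
    for (f1a, f2a), (f1b, f2b) in zip(pts, pts[1:] + pts[:1]):
        area += f1a * f2b - f1b * f2a
    return abs(area) / 2

# e.g. an /i a u/ triangle from (F1, F2) means; values are illustrative
# and yield 290,000 Hz^2
print(vowel_space_area([(300, 2300), (750, 1300), (350, 900)]))
```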

  5. Pilot Workload and Speech Analysis: A Preliminary Investigation

    NASA Technical Reports Server (NTRS)

    Bittner, Rachel M.; Begault, Durand R.; Christopher, Bonny R.

    2013-01-01

    Prior research has questioned the effectiveness of speech analysis to measure the stress, workload, truthfulness, or emotional state of a talker. The question remains regarding the utility of speech analysis for restricted vocabularies such as those used in aviation communications. A part-task experiment was conducted in which participants performed Air Traffic Control read-backs in different workload environments. Participants' subjective workload and the speech qualities of fundamental frequency (F0) and articulation rate were evaluated. A significant increase in subjective workload rating was found for high workload segments. F0 was found to be significantly higher during high workload, while articulation rates were found to be significantly slower. No correlation was found between subjective workload and F0 or articulation rate.

  6. Studies in automatic speech recognition and its application in aerospace

    NASA Astrophysics Data System (ADS)

    Taylor, Michael Robinson

    Human communication is characterized in terms of the spectral and temporal dimensions of speech waveforms. Electronic speech recognition strategies based on Dynamic Time Warping and Markov Model algorithms are described and typical digit recognition error rates are tabulated. The application of Direct Voice Input (DVI) as an interface between man and machine is explored within the context of civil and military aerospace programmes. Sources of physical and emotional stress affecting speech production within military high performance aircraft are identified. Experimental results are reported which quantify fundamental frequency and coarse temporal dimensions of male speech as a function of the vibration, linear acceleration and noise levels typical of aerospace environments; preliminary indications of acoustic phonetic variability reported by other researchers are summarized. Connected whole-word pattern recognition error rates are presented for digits spoken under controlled Gz sinusoidal whole-body vibration. Correlations are made between significant increases in recognition error rate and resonance of the abdomen-thorax and head subsystems of the body. The phenomenon of vibrato style speech produced under low frequency whole-body Gz vibration is also examined. Interactive DVI system architectures and avionic data bus integration concepts are outlined together with design procedures for the efficient development of pilot-vehicle command and control protocols.
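
    For readers unfamiliar with the Dynamic Time Warping strategy mentioned above, a minimal sketch: the distance between a spoken word and a stored template is the cost of the best nonlinear time alignment, and recognition picks the template with the smallest distance. Inputs are assumed to be feature matrices (rows = frames).

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two feature sequences,
    allowing nonlinear timing differences between a spoken word and
    a stored template."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# recognition: evaluate dtw_distance against every stored template
# and pick the template with the smallest distance
```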

  7. Speech fluency profile in Williams-Beuren syndrome: a preliminary study.

    PubMed

    Rossi, Natalia Freitas; Souza, Deise Helena de; Moretti-Ferreira, Danilo; Giacheti, Célia Maria

    2009-01-01

    The speech fluency pattern attributed to individuals with Williams-Beuren syndrome (WBS) is supported by the effectiveness of the phonological loop. Some studies have reported the occurrence of speech disruptions caused by lexical and semantic deficits. However, the type and frequency of such speech disruptions have not been well elucidated. The aims were to determine the speech fluency profile of individuals with WBS and to compare the speech performance of these individuals to a control group matched by gender and mental age. Twelve subjects with Williams-Beuren syndrome, chronologically aged between 6.6 and 23.6 years and with mental ages ranging from 4.8 to 14.3 years, were evaluated. They were compared with another group consisting of 12 subjects with similar mental age and no speech or learning difficulties. Speech fluency parameters were assessed according to the ABFW Language Test: type and frequency of speech disruptions and speech rate. The obtained results were compared between the groups. In comparison with individuals of similar mental age and typical speech and language development, the group with Williams-Beuren syndrome showed a greater percentage of speech discontinuity and an increased frequency of common hesitations and word repetitions. The speech fluency profile presented by individuals with WBS in this study suggests that the presence of disfluencies can be caused by deficits in the lexical, semantic, and syntactic processing of verbal information. The authors stress that further systematic investigations on the subject are warranted.

  8. Comparisons of stuttering frequency during and after speech initiation in unaltered feedback, altered auditory feedback and choral speech conditions.

    PubMed

    Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew

    2009-01-01

    Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of altered auditory feedback (AAF) has been found to produce robust decreases in stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate speech before it can go into effect and, therefore, it might not be as helpful as true choral speech during speech initiation. The aim was to examine how AAF and choral speech differentially enhance fluency during speech initiation and in subsequent portions of utterances. Ten participants who stuttered read passages without altered feedback (NAF), under four AAF conditions, and under a true choral speech condition. Each condition was blocked into ten 10 s trials separated by 5 s intervals so each trial required 'cold' speech initiation. In the first analysis, comparisons of stuttering frequencies were made across conditions. A second, finer grain analysis involved examining stuttering frequencies on the initial syllable, the subsequent four syllables produced and the five syllables produced immediately after the midpoint of each trial. On average, AAF reduced stuttering by approximately 68% relative to the NAF condition. Stuttering frequencies on the initial syllables were considerably higher than on the other syllables analysed (0.45 and 0.34 for NAF and AAF conditions, respectively). After the first syllable was produced, stuttering frequencies dropped precipitously and remained stable. However, this drop in stuttering frequency was significantly greater (approximately 84%) in the AAF conditions than in the NAF condition (approximately 66%), with frequencies on the last nine syllables analysed averaging 0.15 and 0.05 for NAF and AAF conditions, respectively. In the true choral speech condition, stuttering was virtually (approximately 98%) eliminated across all utterances and all syllable positions. Altered auditory feedback effectively inhibits stuttering immediately after speech has been initiated. However, unlike a true choral signal, which is exogenously initiated and offers the most complete fluency enhancement, AAF requires speech to be initiated by the user and 'fed back' before it can directly inhibit stuttering. It is suggested that AAF can be a viable clinical option for those who stutter and should often be used in combination with therapeutic techniques, particularly those that aid speech initiation. The substantially higher rate of stuttering occurring on initiation supports a hypothesis that overt stuttering events help 'release' and 'inhibit' central stuttering blocks. This perspective is examined in the context of internal models and mirror neurons.

  9. Articulation rate and its relationship to disfluency type, duration, and temperament in preschool children who stutter.

    PubMed

    Tumanova, Victoria; Zebrowski, Patricia M; Throneburg, Rebecca N; Kulak Kayikci, Mavis E

    2011-01-01

    The purpose of this study was to examine the relationship between articulation rate, frequency and duration of disfluencies of different types, and temperament in preschool children who stutter (CWS). In spontaneous speech samples from 19 CWS (mean age=3:9; years:months), we measured articulation rate, the frequency and duration of (a) sound prolongations; (b) sound-syllable repetitions; (c) single syllable whole word repetitions; and (d) clusters. Temperament was assessed with the Children's Behavior Questionnaire (Rothbart et al., 2001). There was a significant negative correlation between articulation rate and average duration of sound prolongations (p<0.01), and between articulation rate and frequency of stuttering-like disfluencies (SLDs) (p<0.05). No other relationships proved statistically significant. Results do not support models of stuttering development that implicate particular characteristics of temperament as proximal contributors to stuttering; however, this is likely due to the fact that current methods, including the ones used in the present study, do not allow for the identification of a functional relationship between temperament and speech production. Findings do indicate that for some CWS, relatively longer sound prolongations co-occur with relatively slower speech rate, which suggests that sound prolongations, across a range of durations, may represent a distinct type of SLD, not just in their obvious perceptual characteristics, but in their potential influence on overall speech production at multiple levels. Readers will be able to describe the relationship between stuttering-like disfluencies, articulation rate and temperament in children who stutter, and discuss different measurements of articulation rate. Copyright © 2010 Elsevier Inc. All rights reserved.

  10. A novel speech processing algorithm based on harmonicity cues in cochlear implant

    NASA Astrophysics Data System (ADS)

    Wang, Jian; Chen, Yousheng; Zhang, Zongping; Chen, Yan; Zhang, Weifeng

    2017-08-01

    This paper proposed a novel speech processing algorithm for cochlear implants, which uses harmonicity cues to enhance tonal information in Mandarin Chinese speech recognition. The input speech was filtered by a 4-channel band-pass filter bank. The frequency ranges for the four bands were: 300-621, 621-1285, 1285-2657, and 2657-5499 Hz. In each pass band, temporal envelope and periodicity cues (TEPCs) below 400 Hz were extracted by full-wave rectification and low-pass filtering. The TEPCs were then used to modulate a sinusoidal carrier whose frequency was the fundamental frequency (F0) or the F0 harmonic closest to the center frequency of each band. Signals from each band were combined to obtain the output speech. Mandarin tone, word, and sentence recognition in quiet listening conditions were tested for the extensively used continuous interleaved sampling (CIS) strategy and the novel F0-harmonic algorithm. The results showed that the F0-harmonic algorithm performed consistently better than the CIS strategy in Mandarin tone, word, and sentence recognition. In addition, sentence recognition rate was higher than word recognition rate, as a result of contextual information in the sentence. Moreover, tones 3 and 4 performed better than tones 1 and 2, due to the easily identified features of the former. In conclusion, the F0-harmonic algorithm could enhance tonal information in cochlear implant speech processing due to the use of harmonicity cues, thereby improving Mandarin tone, word, and sentence recognition. Further work will focus on testing the F0-harmonic algorithm in noisy listening conditions.
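
    A compact sketch of the processing chain described above, assuming a fixed F0 for simplicity (the actual algorithm would follow the time-varying F0 of the input):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def f0_harmonic_process(x, fs, f0=200.0):
    """Per band: band-pass filter, extract the temporal envelope
    (<400 Hz) by full-wave rectification and low-pass filtering, then
    remodulate the envelope with a sinusoid at the F0 harmonic closest
    to the band's center frequency."""
    edges = [300, 621, 1285, 2657, 5499]          # band edges from the paper
    env_lp = butter(4, 400, btype="low", fs=fs, output="sos")
    t = np.arange(len(x)) / fs
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfilt(band_sos, x)
        env = sosfilt(env_lp, np.abs(band))       # rectify + low-pass
        fc = np.sqrt(lo * hi)                     # geometric band center
        carrier_f = max(f0, round(fc / f0) * f0)  # nearest F0 harmonic
        out += env * np.sin(2 * np.pi * carrier_f * t)
    return out
```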

  11. [Speech fluency developmental profile in Brazilian Portuguese speakers].

    PubMed

    Martins, Vanessa de Oliveira; Andrade, Claudia Regina Furquim de

    2008-01-01

    Speech fluency varies from one individual to the next, whether fluent or stuttering, depending on several factors. Studies that investigate the influence of age on fluency patterns have been identified; however, these differences were investigated in isolated age groups. Studies about life-span fluency variations were not found. The aim was to verify the speech fluency developmental profile. Speech samples of 594 fluent participants of both genders, with ages between 2:0 and 99:11 years, all speakers of Brazilian Portuguese, were analyzed. Participants were grouped as follows: preschool children, school-age children, early adolescents, late adolescents, adults, and elderly adults. Speech samples were analyzed according to the Speech Fluency Profile variables and were compared regarding typology of speech disruptions (typical and less typical), speech rate (words and syllables per minute), and frequency of speech disruptions (percentage of speech discontinuity). Although isolated variations were identified, overall there was no significant difference between the age groups for the speech disruption indexes (typical and less typical speech disruptions and percentage of speech discontinuity). Significant differences were observed between the groups when considering speech rate. The development of the neurolinguistic system for speech fluency, in terms of speech disruptions, seems to stabilize during the first years of life, presenting no alterations over the life span. Indexes of speech rate present variations across the age groups, indicating patterns of acquisition, development, stabilization, and degeneration.

  12. Improving Understanding of Emotional Speech Acoustic Content

    NASA Astrophysics Data System (ADS)

    Tinnemore, Anna

    Children with cochlear implants show deficits in identifying the emotional intent of utterances without facial or body language cues. A known limitation of cochlear implants is the inability to accurately portray the fundamental frequency contour of speech, which carries the majority of the information needed to identify emotional intent. Without reliable access to the fundamental frequency, other acoustic cues to vocal emotion, if identifiable, could be used to guide therapies for training children with cochlear implants to better identify vocal emotion. The current study analyzed recordings of adults speaking neutral sentences with a set array of emotions in a child-directed and an adult-directed manner. The goal was to identify acoustic cues that contribute to emotion identification that may be enhanced in child-directed speech but are also present in adult-directed speech. Results of this study showed that there were significant differences in the variation of the fundamental frequency, the variation of intensity, and the rate of speech among emotions and between intended audiences.

  13. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects.

    PubMed

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age related vocal characteristics (f0 and speech rate) in attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age related changes in f0 and speech rate (Study 1). Further aims of the paper are to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20-25, 40-45, and 60-65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers' age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can be achieved by naïve speakers even though the perceived effect was smaller than intended.

  14. Sinusoidal transform coding

    NASA Technical Reports Server (NTRS)

    Mcaulay, Robert J.; Quatieri, Thomas F.

    1988-01-01

    It has been shown that an analysis/synthesis system based on a sinusoidal representation of speech leads to synthetic speech that is essentially perceptually indistinguishable from the original. Strategies for coding the amplitudes, frequencies and phases of the sine waves have been developed that have led to a multirate coder operating at rates from 2400 to 9600 bps. The encoded speech is highly intelligible at all rates with a uniformly improving quality as the data rate is increased. A real-time fixed-point implementation has been developed using two ADSP2100 DSP chips. The methods used for coding and quantizing the sine-wave parameters for operation at the various frame rates are described.

  15. Temporal processing of speech in a time-feature space

    NASA Astrophysics Data System (ADS)

    Avendano, Carlos

    1997-09-01

    The performance of speech communication systems often degrades under realistic environmental conditions. Adverse environmental factors include additive noise sources, room reverberation, and transmission channel distortions. This work studies the processing of speech in the temporal-feature or modulation spectrum domain, aiming to alleviate the effects of such disturbances. Speech reflects the geometry of the vocal organs, and the linguistically dominant component is in the shape of the vocal tract. At any given point in time, the shape of the vocal tract is reflected in the short-time spectral envelope of the speech signal. The rate of change of the vocal tract shape appears to be important for the identification of linguistic components. This rate of change, or the rate of change of the short-time spectral envelope, can be described by the modulation spectrum, i.e. the spectrum of the time trajectories described by the short-time spectral envelope. For a wide range of frequency bands, the modulation spectrum of speech exhibits a maximum at about 4 Hz, the average syllabic rate. Disturbances often have modulation frequency components outside the speech range, and could in principle be attenuated without significantly affecting the range with relevant linguistic information. Early efforts to exploit the modulation spectrum domain (temporal processing), such as the dynamic cepstrum or RASTA processing, used ad hoc processing designs and appear to be suboptimal. As a major contribution, in this dissertation we aim for a systematic, data-driven design of temporal processing. First we analytically derive and discuss some properties and merits of temporal processing for speech signals. We attempt to formalize the concept and provide a theoretical background which has been lacking in the field. In the experimental part we apply temporal processing to a number of problems including adaptive noise reduction in cellular telephone environments, reduction of reverberation for speech enhancement, and improvements on automatic recognition of speech degraded by linear distortions and reverberation.
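
    A minimal sketch of computing a modulation spectrum for one frequency band: band-pass filter, extract the envelope, then take the spectrum of the log-envelope trajectory; for running speech the result typically peaks near 4 Hz. The band edges and frame length here are illustrative, not taken from the dissertation.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def modulation_spectrum(x, fs, band=(1000, 2000), frame_ms=10):
    """Spectrum of one band's log-envelope trajectory; for speech it
    typically peaks near 4 Hz, the average syllabic rate."""
    sos = butter(4, band, btype="band", fs=fs, output="sos")
    env = np.abs(sosfilt(sos, x))                  # crude envelope
    hop = int(fs * frame_ms / 1000)
    traj = np.array([env[i:i + hop].mean()
                     for i in range(0, len(env) - hop, hop)])
    traj = np.log(traj + 1e-10)                    # log-envelope trajectory
    traj -= traj.mean()
    mod_fs = 1000.0 / frame_ms                     # trajectory sample rate, Hz
    spec = np.abs(np.fft.rfft(traj * np.hanning(len(traj))))
    return np.fft.rfftfreq(len(traj), 1 / mod_fs), spec
```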

  16. Effects of Age and Hearing Loss on the Relationship between Discrimination of Stochastic Frequency Modulation and Speech Perception

    PubMed Central

    Sheft, Stanley; Shafiro, Valeriy; Lorenzi, Christian; McMullen, Rachel; Farrell, Caitlin

    2012-01-01

    Objective The frequency modulation (FM) of speech can convey linguistic information and also enhance speech-stream coherence and segmentation. Using a clinically oriented approach, the purpose of the present study was to examine the effects of age and hearing loss on the ability to discriminate between stochastic patterns of low-rate FM and determine whether difficulties in speech perception experienced by older listeners relate to a deficit in this ability. Design Data were collected from 18 normal-hearing young adults and 18 participants who were at least 60 years old, nine normal-hearing and nine with a mild-to-moderate sensorineural hearing loss. Using stochastic frequency modulators derived from 5-Hz lowpass noise applied to a 1-kHz carrier, discrimination thresholds were measured in terms of frequency excursion (ΔF), both in quiet and with a speech-babble masker present; stimulus duration; and signal-to-noise ratio (SNRFM) in the presence of a speech-babble masker. Speech perception ability was evaluated using Quick Speech-in-Noise (QuickSIN) sentences in four-talker babble. Results A significant effect of age, but not of hearing loss among the older listeners, was found for FM discrimination conditions with masking present (ΔF and SNRFM). The effect of age was not significant for the FM measures based on stimulus duration. ΔF and SNRFM were also the two conditions for which performance was significantly correlated with listener age when controlling for effect of hearing loss as measured by pure-tone average. With respect to speech-in-noise ability, results from the SNRFM condition were significantly correlated with QuickSIN performance. Conclusions Results indicate that aging is associated with reduced ability to discriminate moderate-duration patterns of low-rate stochastic FM. Furthermore, the relationship between QuickSIN performance and the SNRFM thresholds suggests that the difficulty experienced by older listeners with speech-in-noise processing may in part relate to diminished ability to process slower fine-structure modulation at low sensation levels. Results thus suggest that clinical consideration of stochastic FM discrimination measures may offer a fuller picture of auditory processing abilities. PMID:22790319
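
    The stochastic modulators described above can be sketched directly: lowpass-filtered noise scaled to a peak frequency excursion and used as the instantaneous-frequency deviation of a 1-kHz carrier. Parameter values follow the abstract; the scaling convention is an assumption.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def stochastic_fm(dur=1.0, fs=44100, carrier=1000.0, excursion_hz=50.0,
                  mod_cutoff=5.0, seed=0):
    """1-kHz tone frequency-modulated by 5-Hz lowpass noise, scaled to a
    peak excursion (delta-F); discrimination compares different draws."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(dur * fs))
    sos = butter(4, mod_cutoff, btype="low", fs=fs, output="sos")
    mod = sosfilt(sos, noise)
    mod = mod / np.max(np.abs(mod)) * excursion_hz   # Hz deviation
    phase = 2 * np.pi * np.cumsum(carrier + mod) / fs
    return np.sin(phase)
```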

  17. Speech Enhancement Using Gaussian Scale Mixture Models

    PubMed Central

    Hao, Jiucang; Lee, Te-Won; Sejnowski, Terrence J.

    2011-01-01

    This paper presents a novel probabilistic approach to speech enhancement. Instead of a deterministic logarithmic relationship, we assume a probabilistic relationship between the frequency coefficients and the log-spectra. The speech model in the log-spectral domain is a Gaussian mixture model (GMM). The frequency coefficients obey a zero-mean Gaussian whose covariance equals the exponential of the log-spectra. This results in a Gaussian scale mixture model (GSMM) for the speech signal in the frequency domain, since the log-spectra can be regarded as scaling factors. The probabilistic relation between frequency coefficients and log-spectra allows these to be treated as two random variables, both to be estimated from the noisy signals. Expectation-maximization (EM) was used to train the GSMM, and Bayesian inference was used to compute the posterior signal distribution. Because exact inference of this full probabilistic model is computationally intractable, we developed two approaches to enhance the efficiency: the Laplace method and a variational approximation. The proposed methods were applied to enhance speech corrupted by Gaussian noise and speech-shaped noise (SSN). For both approximations, signals reconstructed from the estimated frequency coefficients provided a higher signal-to-noise ratio (SNR), and those reconstructed from the estimated log-spectra produced a lower word recognition error rate because the log-spectra fit the inputs to the recognizer better. Our algorithms effectively reduced the SSN, which algorithms based on spectral analysis were not able to suppress. PMID:21359139

  18. Acoustic analysis of speech variables during depression and after improvement.

    PubMed

    Nilsonne, A

    1987-09-01

    Speech recordings were made of 16 depressed patients during depression and after clinical improvement. The recordings were analyzed using a computer program which extracts acoustic parameters from the fundamental frequency contour of the voice. The percent pause time, the standard deviation of the voice fundamental frequency distribution, the standard deviation of the rate of change of the voice fundamental frequency, and the average speed of voice change were found to correlate with the clinical state of the patient. The mean fundamental frequency, the total reading time, and the average rate of change of the voice fundamental frequency did not differ between the depressed and the improved group. The acoustic measures were more strongly correlated with the clinical state of the patient, as measured by global depression scores, than with single depressive symptoms such as retardation or agitation.
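
    Given an F0 track with unvoiced and pause frames marked as NaN, contour measures of this kind reduce to a few lines of numpy. This sketch is illustrative (the frame rate is an assumed parameter) and, for brevity, ignores the handling of voicing gaps when differentiating, which a careful implementation would treat explicitly.

```python
import numpy as np

def f0_contour_stats(f0_hz, frame_rate=100.0):
    """Summary measures of an F0 track sampled at frame_rate frames/s;
    unvoiced and pause frames are marked with NaN."""
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = ~np.isnan(f0)
    percent_pause = 100.0 * (1 - voiced.mean())
    # crude frame-to-frame rate of F0 change (Hz/s); note this naively
    # concatenates voiced frames across gaps
    df = np.diff(f0[voiced]) * frame_rate
    return {
        "percent_pause": percent_pause,
        "f0_sd_hz": np.nanstd(f0),              # spread of F0 distribution
        "f0_change_sd": np.std(df),             # variability of change rate
        "mean_abs_speed": np.mean(np.abs(df)),  # average speed of change
    }
```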

  19. The cerebral control of speech tempo: opposite relationship between speaking rate and BOLD signal changes at striatal and cerebellar structures.

    PubMed

    Riecker, Axel; Kassubek, Jan; Gröschel, Klaus; Grodd, Wolfgang; Ackermann, Hermann

    2006-01-01

    So far, only sparse data on the cerebral organization of speech motor control are available. In order to further delineate the neural basis of articulatory functions, fMRI measurements were performed during self-paced syllable repetitions at six different frequencies (2-6 Hz). Bilateral hemodynamic main effects, calculated across all syllable rates considered, emerged within sensorimotor cortex, putamen, thalamus and cerebellum. At the level of the caudatum and the anterior insula, activation was found restricted to the left side. The computation of rate-to-response functions of the BOLD signal revealed a negative linear relationship between syllable frequency and response magnitude within the striatum whereas cortical areas and cerebellar hemispheres exhibited an opposite activation pattern. Dysarthric patients with basal ganglia disorders show unimpaired or even accelerated speaking rate whereas, in contrast, cerebellar dysfunctions give rise to slowed speech tempo which does not fall below a rate of about 3 Hz. The observed rate-to-response profiles of the BOLD signal thus might help to elucidate the pathophysiological mechanisms of dysarthric deficits in central motor disorders.

  20. Tuning of Human Modulation Filters Is Carrier-Frequency Dependent

    PubMed Central

    Simpson, Andrew J. R.; Reiss, Joshua D.; McAlpine, David

    2013-01-01

    Recent studies employing speech stimuli to investigate ‘cocktail-party’ listening have focused on entrainment of cortical activity to modulations at syllabic (5 Hz) and phonemic (20 Hz) rates. The data suggest that cortical modulation filters (CMFs) are dependent on the sound-frequency channel in which modulations are conveyed, potentially underpinning a strategy for separating speech from background noise. Here, we characterize modulation filters in human listeners using a novel behavioral method. Within an ‘inverted’ adaptive forced-choice increment detection task, listening level was varied whilst contrast was held constant for ramped increments with effective modulation rates between 0.5 and 33 Hz. Our data suggest that modulation filters are tonotopically organized (i.e., vary along the primary, frequency-organized, dimension). This suggests that the human auditory system is optimized to track rapid (phonemic) modulations at high sound-frequencies and slow (prosodic/syllabic) modulations at low frequencies. PMID:24009759

  1. Gender Identification Using High-Frequency Speech Energy: Effects of Increasing the Low-Frequency Limit.

    PubMed

    Donai, Jeremy J; Halbritter, Rachel M

    The purpose of this study was to investigate the ability of normal-hearing listeners to use high-frequency energy for gender identification from naturally produced speech signals. Two experiments were conducted using a repeated-measures design. Experiment 1 investigated the effects of increasing high-pass filter cutoff (i.e., increasing the low-frequency spectral limit) on gender identification from naturally produced vowel segments. Experiment 2 studied the effects of increasing high-pass filter cutoff on gender identification from naturally produced sentences. Confidence ratings for the gender identification task were also obtained for both experiments. Listeners in experiment 1 were capable of extracting talker gender information at levels significantly above chance from vowel segments high-pass filtered up to 8.5 kHz. Listeners in experiment 2 also performed above chance on the gender identification task from sentences high-pass filtered up to 12 kHz. Cumulatively, the results of both experiments provide evidence that normal-hearing listeners can utilize information from the very high-frequency region (above 4 to 5 kHz) of the speech signal for talker gender identification. These findings are at variance with current assumptions about the talker-gender information available within this frequency region. The current results also corroborate and extend previous studies of the use of high-frequency speech energy for perceptual tasks. These findings have potential implications for the study of information contained within the high-frequency region of the speech spectrum and the role this region may play in navigating the auditory scene, particularly when the low-frequency portion of the spectrum is masked by environmental noise sources or for listeners with substantial hearing loss in the low-frequency region and better hearing sensitivity in the high-frequency region (i.e., reverse slope hearing loss).
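
    The filtering manipulation itself is straightforward to reproduce; a minimal scipy sketch with the experiment-1 cutoff (the sampling rate must comfortably exceed twice the cutoff, and the filter order is an assumption):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def highpass_speech(x, fs, cutoff_hz=8500, order=8):
    """Remove energy below the cutoff, leaving only the high-frequency
    region of the speech signal."""
    sos = butter(order, cutoff_hz, btype="high", fs=fs, output="sos")
    return sosfilt(sos, x)
```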

  2. Speech Enhancement, Gain, and Noise Spectrum Adaptation Using Approximate Bayesian Estimation

    PubMed Central

    Hao, Jiucang; Attias, Hagai; Nagarajan, Srikantan; Lee, Te-Won; Sejnowski, Terrence J.

    2010-01-01

    This paper presents a new approximate Bayesian estimator for enhancing a noisy speech signal. The speech model is assumed to be a Gaussian mixture model (GMM) in the log-spectral domain. This is in contrast to most current models in the frequency domain. Exact signal estimation is a computationally intractable problem. We derive three approximations to enhance the efficiency of signal estimation. The Gaussian approximation transforms the log-spectral domain GMM into the frequency domain using a minimal Kullback–Leibler (KL) divergence criterion. The frequency domain Laplace method computes the maximum a posteriori (MAP) estimator for the spectral amplitude. Correspondingly, the log-spectral domain Laplace method computes the MAP estimator for the log-spectral amplitude. Further, the gain and noise spectrum adaptation are implemented using the expectation–maximization (EM) algorithm within the GMM under the Gaussian approximation. The proposed algorithms are evaluated by applying them to enhance speech corrupted by speech-shaped noise (SSN). The experimental results demonstrate that the proposed algorithms offer improved signal-to-noise ratio, lower word recognition error rate, and less spectral distortion. PMID:20428253

  3. A speech processing study using an acoustic model of a multiple-channel cochlear implant

    NASA Astrophysics Data System (ADS)

    Xu, Ying

    1998-10-01

    A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that the current speech-processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented to normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model-processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech can be accomplished by higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and channel numbers were not beneficial. Manipulations of stimulation rate and number of activated channels did not appreciably affect consonant recognition. These results suggest that overall speech performance may improve by appropriately increasing stimulation rate and number of activated channels. Future revision of this acoustic model is necessary to provide more accurate amplitude representation of speech.

  4. Cognitive Load in Voice Therapy Carry-Over Exercises.

    PubMed

    Iwarsson, Jenny; Morris, David Jackson; Balling, Laura Winther

    2017-01-01

    The cognitive load generated by online speech production may vary with the nature of the speech task. This article examines 3 speech tasks used in voice therapy carry-over exercises, in which a patient is required to adopt and automatize new voice behaviors, ultimately in daily spontaneous communication. Twelve subjects produced speech in 3 conditions: rote speech (weekdays), sentences in a set form, and semispontaneous speech. Subjects simultaneously performed a secondary visual discrimination task for which response times were measured. On completion of each speech task, subjects rated their experience on a questionnaire. Response times from the secondary, visual task were found to be shortest for the rote speech, longer for the semispontaneous speech, and longest for the sentences within the set framework. Principal components derived from the subjective ratings were found to be linked to response times on the secondary visual task. Acoustic measures reflecting fundamental frequency distribution and vocal fold compression varied across the speech tasks. The results indicate that consideration should be given to the selection of speech tasks during the process leading to automation of revised speech behavior and that self-reports may be a reliable index of cognitive load.

  5. Neural Entrainment to Rhythmically Presented Auditory, Visual, and Audio-Visual Speech in Children

    PubMed Central

    Power, Alan James; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2012-01-01

    Auditory cortical oscillations have been proposed to play an important role in speech perception. It is suggested that the brain may take temporal “samples” of information from the speech stream at different rates, phase resetting ongoing oscillations so that they are aligned with similar frequency bands in the input (“phase locking”). Information from these frequency bands is then bound together for speech perception. To date, there are no explorations of neural phase locking and entrainment to speech input in children. However, it is clear from studies of language acquisition that infants use both visual speech information and auditory speech information in learning. In order to study neural entrainment to speech in typically developing children, we use a rhythmic entrainment paradigm (underlying 2 Hz or delta rate) based on repetition of the syllable “ba,” presented in either the auditory modality alone, the visual modality alone, or as auditory-visual speech (via a “talking head”). To ensure attention to the task, children aged 13 years were asked to press a button as fast as possible when the “ba” stimulus violated the rhythm for each stream type. Rhythmic violation depended on delaying the occurrence of a “ba” in the isochronous stream. Neural entrainment was demonstrated for all stream types, and individual differences in standardized measures of language processing were related to auditory entrainment at the theta rate. Further, there was significant modulation of the preferred phase of auditory entrainment in the theta band when visual speech cues were present, indicating cross-modal phase resetting. The rhythmic entrainment paradigm developed here offers a method for exploring individual differences in oscillatory phase locking during development. In particular, a method for assessing neural entrainment and cross-modal phase resetting would be useful for exploring developmental learning difficulties thought to involve temporal sampling, such as dyslexia. PMID:22833726

  6. Vocal Age Disguise: The Role of Fundamental Frequency and Speech Rate and Its Perceived Effects

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten

    2016-01-01

    The relationship between vocal characteristics and perceived age is of interest in various contexts, as is the possibility to affect age perception through vocal manipulation. A few examples of such situations are when age is staged by actors, when ear witnesses make age assessments based on vocal cues only or when offenders (e.g., online groomers) disguise their voice to appear younger or older. This paper investigates how speakers spontaneously manipulate two age related vocal characteristics (f0 and speech rate) in attempt to sound younger versus older than their true age, and if the manipulations correspond to actual age related changes in f0 and speech rate (Study 1). Further aims of the paper is to determine how successful vocal age disguise is by asking listeners to estimate the age of generated speech samples (Study 2) and to examine whether or not listeners use f0 and speech rate as cues to perceived age. In Study 1, participants from three age groups (20–25, 40–45, and 60–65 years) agreed to read a short text under three voice conditions. There were 12 speakers in each age group (six women and six men). They used their natural voice in one condition, attempted to sound 20 years younger in another and 20 years older in a third condition. In Study 2, 60 participants (listeners) listened to speech samples from the three voice conditions in Study 1 and estimated the speakers’ age. Each listener was exposed to all three voice conditions. The results from Study 1 indicated that the speakers increased fundamental frequency (f0) and speech rate when attempting to sound younger and decreased f0 and speech rate when attempting to sound older. Study 2 showed that the voice manipulations had an effect in the sought-after direction, although the achieved mean effect was only 3 years, which is far less than the intended effect of 20 years. Moreover, listeners used speech rate, but not f0, as a cue to speaker age. It was concluded that age disguise by voice can be achieved by naïve speakers even though the perceived effect was smaller than intended. PMID:27917144

  7. Lexical frequency and acoustic reduction in spoken Dutch

    NASA Astrophysics Data System (ADS)

    Pluymaekers, Mark; Ernestus, Mirjam; Baayen, R. Harald

    2005-10-01

    This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To accommodate for these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.

  8. The influence of speaking rate on nasality in the speech of hearing-impaired individuals.

    PubMed

    Dwyer, Claire H; Robb, Michael P; O'Beirne, Greg A; Gilbert, Harvey R

    2009-10-01

    The purpose of this study was to determine whether deliberate increases in speaking rate would serve to decrease the amount of nasality in the speech of severely hearing-impaired individuals. The participants were 11 severely to profoundly hearing-impaired students, ranging in age from 12 to 19 years (M = 16 years). Each participant provided a baseline speech sample (R1) followed by 3 training sessions during which participants were trained to increase their speaking rate. Following the training sessions, a second speech sample was obtained (R2). Acoustic and perceptual analyses of the speech samples obtained at R1 and R2 were undertaken. The acoustic analysis focused on changes in first (F(1)) and second (F(2)) formant frequency and formant bandwidths. The perceptual analysis involved listener ratings of the speech samples (at R1 and R2) for perceived nasality. Findings indicated a significant increase in speaking rate at R2. In addition, significantly narrower F(2) bandwidth and lower perceptual rating scores of nasality were obtained at R2 across all participants, suggesting a decrease in nasality as speaking rate increases. The nasality demonstrated by hearing-impaired individuals is amenable to change when speaking rate is increased. The influences of speaking rate changes on the perception and production of nasality in hearing-impaired individuals are discussed.

  9. Articulation Rate and Its Relationship to Disfluency Type, Duration, and Temperament in Preschool Children Who Stutter

    ERIC Educational Resources Information Center

    Tumanova, Victoria; Zebrowski, Patricia M.; Throneburg, Rebecca N.; Kayikci, Mavis E. Kulak

    2011-01-01

    The purpose of this study was to examine the relationship between articulation rate, frequency and duration of disfluencies of different types, and temperament in preschool children who stutter (CWS). In spontaneous speech samples from 19 CWS (mean age = 3:9; years:months), we measured articulation rate, the frequency and duration of (a) sound…

  10. Using on-line altered auditory feedback treating Parkinsonian speech

    NASA Astrophysics Data System (ADS)

    Wang, Emily; Verhagen, Leo; de Vries, Meinou H.

    2005-09-01

    Patients with advanced Parkinson's disease tend to have dysarthric speech that is hesitant, accelerated, and repetitive, and that is often resistant to behavioral speech therapy. In this pilot study, the speech disturbances were treated using on-line altered auditory feedback (AF) provided by SpeechEasy (SE), an in-the-ear device registered with the FDA for use in humans to treat chronic stuttering. Eight PD patients participated in the study. All had moderate to severe speech disturbances. In addition, two patients had moderate recurring stuttering at the onset of PD after long remission since adolescence, two had bilateral STN DBS, and two had bilateral pallidal DBS. An effective combination of delayed auditory feedback and frequency-altered feedback was selected for each subject and provided via SE worn in one ear. All subjects produced speech samples (structured monologue and reading) under three conditions: baseline, wearing SE without feedback, and wearing SE with feedback. The speech samples were randomly presented and rated for speech intelligibility using UPDRS-III item 18, along with speaking rate. The results indicated that SpeechEasy is well tolerated and that AF can improve speech intelligibility in spontaneous speech. Further investigational use of this device for treating speech disorders in PD is warranted [Work partially supported by Janus Dev. Group, Inc.].
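
    Of the two feedback alterations combined in the device, delayed auditory feedback is the simpler to illustrate. The sketch below is an offline toy version (a real device operates in real time at low latency), not the proprietary SpeechEasy algorithm; frequency-altered feedback would additionally pitch-shift the delayed copy.

```python
# Offline sketch of delayed auditory feedback (DAF): mix the input with a
# delayed copy of itself. Delay and mix values are illustrative.
import numpy as np

def delayed_feedback(x: np.ndarray, sr: int, delay_ms: float = 75.0,
                     mix: float = 0.5) -> np.ndarray:
    """Return the signal mixed with a copy delayed by delay_ms."""
    d = int(sr * delay_ms / 1000.0)
    delayed = np.zeros_like(x)
    delayed[d:] = x[:-d]
    return (1.0 - mix) * x + mix * delayed

# Example: apply a 75 ms delay to one second of a synthetic 150 Hz tone.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 150.0 * t)
daf_tone = delayed_feedback(tone, sr, delay_ms=75.0)
```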

  11. Task Repetition and Second Language Speech Processing

    ERIC Educational Resources Information Center

    Lambert, Craig; Kormos, Judit; Minn, Danny

    2017-01-01

    This study examines the relationship between the repetition of oral monologue tasks and immediate gains in L2 fluency. It considers the effect of aural-oral task repetition on speech rate, frequency of clause-final and midclause filled pauses, and overt self-repairs across different task types and proficiency levels and relates these findings to…

  12. Perceptual Consequences of Changes in Vocoded Speech Parameters in Various Reverberation Conditions

    ERIC Educational Resources Information Center

    Drgas, Szymon; Blaszak, Magdalena A.

    2009-01-01

    Purpose: To study the perceptual consequences of changes in parameters of vocoded speech in various reverberation conditions. Method: The 3 controlled variables were number of vocoder bands, instantaneous frequency change rate, and reverberation conditions. The effects were quantified in terms of (a) nonsense words' recognition scores for young…

  13. Tuning time-frequency methods for the detection of metered HF speech

    NASA Astrophysics Data System (ADS)

    Nelson, Douglas J.; Smith, Lawrence H.

    2002-12-01

    Speech is metered if the stresses occur at a nearly regular rate. Metered speech is common in poetry, and it can occur naturally in speech if the speaker is spelling a word or reciting words or numbers from a list. In radio communications, the CQ request, call sign, and other codes are frequently metered. In tactical communications and air traffic control, location, heading, and identification codes may be metered. Moreover, metering may be expected to survive even in HF communications, which are corrupted by noise, interference, and mistuning. For this environment, speech recognition and conventional machine-based methods are not effective. We describe time-frequency methods that have been successfully adapted to the problems of mitigating HF signal conditions and detecting metered speech. These methods are based on modeled time and frequency correlation properties of nearly harmonic functions. We derive these properties and demonstrate a performance gain over conventional correlation and spectral methods. Finally, HF single sideband (SSB) communications raise the additional problems of carrier mistuning, interfering signals such as manual Morse, and fast automatic gain control (AGC). We demonstrate simple methods that may be used to blindly mitigate mistuning and narrowband interference, and to effectively invert the fast automatic gain function.
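
    As rough intuition for what such a detector looks for, the sketch below estimates the dominant stress rate from the modulation spectrum of the amplitude envelope; a strongly metered utterance should show a clear peak at a near-constant rate of a few Hz. This is a generic envelope method for illustration, not the correlation-based detector developed in the paper.

```python
# Sketch: find the dominant low-frequency modulation rate of a signal's
# amplitude envelope, a crude indicator of metering.
import numpy as np
from scipy.signal import hilbert, resample_poly

def stress_rate_hz(x: np.ndarray, sr: int, env_sr: int = 100) -> float:
    """Return the dominant 0.5-8 Hz modulation rate of the envelope."""
    env = np.abs(hilbert(x))                 # amplitude envelope
    env = resample_poly(env, env_sr, sr)     # slow envelope at env_sr Hz
    env = env - env.mean()
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), 1.0 / env_sr)
    band = (freqs > 0.5) & (freqs < 8.0)     # plausible stress rates
    return float(freqs[band][np.argmax(spec[band])])
```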

  14. Speech acoustic markers of early stage and prodromal Huntington's disease: a marker of disease onset?

    PubMed

    Vogel, Adam P; Shirbin, Christopher; Churchyard, Andrew J; Stout, Julie C

    2012-12-01

    Speech disturbances (e.g., altered prosody) have been described in symptomatic Huntington's Disease (HD) individuals; however, the extent to which speech changes in gene-positive pre-manifest (PreHD) individuals is largely unknown. The speech of individuals carrying the mutant HTT gene is a candidate behavioural/motor/cognitive marker with potential as an objective indicator of early HD onset and disease progression. Speech samples were acquired from 30 individuals carrying the mutant HTT gene (13 PreHD, 17 early stage HD) and 15 matched controls. Participants read a passage, produced a monologue and said the days of the week. Data were analysed acoustically for measures of timing, frequency and intensity. There was a clear effect of group across most acoustic measures, such that speech performance differed in line with disease progression. Comparisons across groups revealed significant differences between the control and the early stage HD group on measures of timing (e.g., speech rate). Participants carrying the mutant HTT gene presented with slower rates of speech, took longer to say words and produced greater silences between and within words compared to healthy controls. Importantly, speech rate showed a significant correlation with burden of disease scores. The speech of early stage HD differed significantly from controls. The speech of PreHD, although not reaching significance, tended to lie between the performance of controls and early stage HD. This suggests that changes in speech production may develop prior to diagnosis. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. [Impact of incorporating different high-frequency hearing threshold weighted values on the diagnosis of occupational noise-induced deafness].

    PubMed

    Xue, L J; Yang, A C; Chen, H; Huang, W X; Guo, J J; Liang, X Y; Chen, Z Q; Zheng, Q L

    2017-11-20

    Objective: To study how incorporating different high-frequency hearing threshold weighted values affects the results and grading of occupational noise-induced deafness diagnosis, in order to provide a theoretical basis for revising the diagnostic criteria for occupational noise-induced deafness. Methods: A retrospective study was conducted of cases diagnosed with occupational noise-induced deafness at the Guangdong province hospital for occupational disease prevention and treatment from January 2016 to January 2017. Based on the results of three hearing tests per case, each separated by more than 3 days, the best threshold at each frequency was obtained; diagnoses followed the 2007 edition of the diagnostic criteria for occupational noise-induced deafness. Chi-square tests, t tests and analysis of variance were performed in SPSS 21.0 to compare the mean speech-frequency thresholds and the high-frequency weighted values across age groups, noise-exposure groups and diagnostic classifications. Results: 1. A total of 168 cases met the study criteria: 154 male and 14 female, with an average age of 41.18±6.07 years. 2. The diagnosis rate increased when different high-frequency weighted values were incorporated relative to the mean of the pure speech frequencies: weighting 4 kHz increased it by 13.69% (χ(2)=9.880, P=0.002), 6 kHz by 15.47% (χ(2)=9.985, P=0.002) and 4 kHz+6 kHz by 15.47% (χ(2)=9.985, P=0.002); these differences were statistically significant. The diagnostic rates with different high-frequency weightings did not differ significantly between genders. 3. Participants were divided into a ≤40 years group (group A) and a 40-50 years group (group B). Within each group, the diagnostic rates with high-frequency weighting at 4 kHz (group A χ(2)=3.380, P=0.050; group B χ(2)=4.054, P=0.032), 6 kHz (group A χ(2)=6.362, P=0.012; group B χ(2)=4.054, P=0.032) and 4 kHz+6 kHz (group A χ(2)=6.362, P=0.012; group B χ(2)=4.054, P=0.032) were higher than the rate based on the speech-frequency average, and the differences were statistically significant. There was no significant difference between age groups (χ(2)=2.265, P=0.944). 4. Comparing the better ear's mean pure speech-frequency threshold and the different high-frequency weighted values across working-years groups, the group with more than 10 working years had significantly higher average thresholds in each frequency band than the 3-5 years group (F=2.271, P=0.001) and the 6-10 years group (F=1.563, P=0.046); these differences were statistically significant. The high-frequency weighted values were all higher than the mean pure speech-frequency value, with the 4 kHz+6 kHz weighting showing the largest difference, an average increase of 2.83 dB. 5. The diagnostic rate with high-frequency weighting was higher than with pure speech frequencies for the mild, moderate and severe grades. For the diagnosis of mild occupational noise-induced deafness, apart from the 3 kHz weighting (χ(2)=3.117, P=0.077), which showed no significant difference, the weightings at 4 kHz (χ(2)=10.835, P=0.001), 6 kHz (χ(2)=9.985, P=0.002), 3 kHz+4 kHz (χ(2)=6.315, P=0.012), 3 kHz+6 kHz (χ(2)=6.315, P=0.012), 4 kHz+6 kHz (χ(2)=9.985, P=0.002) and 3 kHz+4 kHz+6 kHz (χ(2)=7.667, P=0.002) all yielded significantly higher diagnostic rates than the mean of the pure speech frequencies. There were no significant differences for the moderate and severe grades (P>0.05). Conclusion: Incorporating different high-frequency hearing threshold weighted values increases the diagnostic rate of occupational noise-induced deafness; the 4 kHz, 6 kHz and 4 kHz+6 kHz weightings affect the result most strongly, and the 4 kHz+6 kHz weighted high-frequency hearing threshold has the greatest effect on the diagnosis of occupational noise-induced deafness.
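
    For intuition, the sketch below shows the general form of the quantity being compared: a speech-frequency pure-tone average, optionally combined with a weighted high-frequency threshold. The 0.9/0.1 split and the threshold values are hypothetical illustrations, not the weighting defined in the diagnostic standard discussed above.

```python
# Sketch: pure-tone average over the speech frequencies (0.5, 1, 2 kHz),
# optionally blending in a weighted high-frequency average. The weighting
# shown is a hypothetical example, not the standard's formula.
def weighted_threshold(thresholds_db, hf_khz=None, hf_weight=0.1):
    speech_avg = sum(thresholds_db[f] for f in (0.5, 1, 2)) / 3.0
    if hf_khz is None:
        return speech_avg
    hf_avg = sum(thresholds_db[f] for f in hf_khz) / len(hf_khz)
    return (1.0 - hf_weight) * speech_avg + hf_weight * hf_avg

ear = {0.5: 30, 1: 35, 2: 40, 4: 65, 6: 70}     # hypothetical dB HL values
print(weighted_threshold(ear))                   # speech-frequency average
print(weighted_threshold(ear, hf_khz=(4, 6)))    # with 4 + 6 kHz weighted in
```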

  16. Speech Characteristics and Intelligibility in Adults with Mild and Moderate Intellectual Disabilities

    PubMed Central

    Coppens-Hofman, Marjolein C.; Terband, Hayo; Snik, Ad F.M.; Maassen, Ben A.M.

    2017-01-01

    Purpose Adults with intellectual disabilities (ID) often show reduced speech intelligibility, which affects their social interaction skills. This study aims to establish the main predictors of this reduced intelligibility in order to ultimately optimise management. Method Spontaneous speech and picture naming tasks were recorded in 36 adults with mild or moderate ID. Twenty-five naïve listeners rated the intelligibility of the spontaneous speech samples. Performance on the picture-naming task was analysed by means of a phonological error analysis based on expert transcriptions. Results The transcription analyses showed that the phonemic and syllabic inventories of the speakers were complete. However, multiple errors at the phonemic and syllabic level were found. The frequencies of specific types of errors were related to intelligibility and quality ratings. Conclusions The development of the phonemic and syllabic repertoire appears to be completed in adults with mild-to-moderate ID. The charted speech difficulties can be interpreted to indicate speech motor control and planning difficulties. These findings may aid the development of diagnostic tests and speech therapies aimed at improving speech intelligibility in this specific group. PMID:28118637

  17. Expressed parental concern regarding childhood stuttering and the Test of Childhood Stuttering.

    PubMed

    Tumanova, Victoria; Choi, Dahye; Conture, Edward G; Walden, Tedra A

    The purpose of the present study was to determine whether scores on the Test of Childhood Stuttering observational rating scales (TOCS; Gillam et al., 2009) (1) differed between parents who did versus did not express concern (independent from the TOCS) about their child's speech fluency; (2) correlated with children's frequency of stuttering measured during a child-examiner conversation; and (3) correlated with the length and complexity of children's utterances, as indexed by mean length of utterance (MLU). Participants were 183 young children ages 3:0-5:11. Ninety-one had parents who reported concern about their child's stuttering (65 boys, 26 girls) and 92 had parents who reported no such concern (50 boys, 42 girls). Participants' conversational speech during a child-examiner conversation was analyzed for (a) frequency of occurrence of stuttered and non-stuttered disfluencies, and (b) MLU. Besides expressing concern or lack thereof about their child's speech fluency, parents completed the TOCS observational rating scales documenting how often they observe different disfluency types in the speech of their children, as well as disfluency-related consequences. There were three main findings. First, parents who expressed concern (independently from the TOCS) about their child's stuttering reported significantly higher scores on the TOCS Speech Fluency and Disfluency-Related Consequences rating scales. Second, children whose parents rated them higher on the TOCS Speech Fluency rating scale produced more stuttered disfluencies during a child-examiner conversation. Third, children with higher scores on the TOCS Disfluency-Related Consequences rating scale had shorter MLU during child-examiner conversation, across age and level of language ability. Findings support the use of the TOCS observational rating scales as one documentable, objective means to determine parental perception of and concern about their child's stuttering. Findings also support the notion that parents are reasonably accurate, if not reliable, judges of the quantity and quality (i.e., stuttered vs. non-stuttered) of their child's speech disfluencies. Lastly, findings that some children may decrease their verbal output in attempts to minimize instances of stuttering - as indexed by relatively low MLU and high TOCS Disfluency-Related Consequences scores - provide strong support for sampling young children's speech and language across various situations to obtain the most representative index possible of the child's MLU and associated instances of stuttering. Copyright © 2018 Elsevier Inc. All rights reserved.

  18. How do we use language? Shared patterns in the frequency of word use across 17 world languages

    PubMed Central

    Calude, Andreea S.; Pagel, Mark

    2011-01-01

    We present data from 17 languages on the frequency with which a common set of words is used in everyday language. The languages are drawn from six language families representing 65 per cent of the world's 7000 languages. Our data were collected from linguistic corpora that record frequencies of use for the 200 meanings in the widely used Swadesh fundamental vocabulary. Our interest is to assess evidence for shared patterns of language use around the world, and for the relationship of language use to rates of lexical replacement, defined as the replacement of a word by a new unrelated or non-cognate word. Frequencies of use for words in the Swadesh list range from just a few per million words of speech to 191 000 or more. The average inter-correlation among languages in the frequency of use across the 200 words is 0.73 (p < 0.0001). The first principal component of these data accounts for 70 per cent of the variance in frequency of use. Elsewhere, we have shown that frequently used words in the Indo-European languages tend to be more conserved, and that this relationship holds separately for different parts of speech. A regression model combining the principal factor loadings derived from the worldwide sample along with their part of speech predicts 46 per cent of the variance in the rates of lexical replacement in the Indo-European languages. This suggests that Indo-European lexical replacement rates might be broadly representative of worldwide rates of change. Evidence for this speculation comes from using the same factor loadings and part-of-speech categories to predict a word's position in a list of 110 words ranked from slowest to most rapidly evolving among 14 of the world's language families. This regression model accounts for 30 per cent of the variance. Our results point to a remarkable regularity in the way that human speakers use language, and hint that the words for a shared set of meanings have been evolving, some slowly and others more rapidly, throughout human history. PMID:21357232

  19. Effect of high-frequency spectral components in computer recognition of dysarthric speech based on a Mel-cepstral stochastic model.

    PubMed

    Polur, Prasad D; Miller, Gerald E

    2005-01-01

    Computer speech recognition of individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, a hidden Markov model (HMM) was constructed and conditions investigated that would provide improved performance for a dysarthric speech (isolated word) recognition system intended to act as an assistive/control tool. In particular, we investigated the effect of high-frequency spectral components on the recognition rate of the system to determine if they contributed useful additional information to the system. A small-size vocabulary spoken by three cerebral palsy subjects was chosen. Mel-frequency cepstral coefficients extracted with the use of 15 ms frames served as training input to an ergodic HMM setup. Subsequent results demonstrated that no significant useful information was available to the system for enhancing its ability to discriminate dysarthric speech above 5.5 kHz in the current set of dysarthric data. The level of variability in input dysarthric speech patterns limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor-impaired individuals such as cerebral palsy subjects holds sufficient promise.
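
    A minimal sketch of this kind of setup follows, assuming librosa and hmmlearn; the file names, vocabulary, and model sizes are hypothetical. The sampling rate is chosen so that the analysis bandwidth stays near the 5.5 kHz limit identified above.

```python
# Sketch: isolated-word recognition with one ergodic Gaussian HMM per word,
# trained on MFCCs from 15 ms frames. Paths and vocabulary are hypothetical.
import librosa
import numpy as np
from hmmlearn.hmm import GaussianHMM

def mfcc_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=11025)      # Nyquist ~5.5 kHz
    n_fft = int(0.015 * sr)                   # 15 ms frames, as in the study
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                n_fft=n_fft, hop_length=n_fft)
    return mfcc.T                             # shape: (frames, coefficients)

# Train one fully connected (ergodic) HMM per vocabulary word.
training = {"yes": ["yes_01.wav", "yes_02.wav"], "no": ["no_01.wav"]}
models = {}
for word, paths in training.items():
    X = np.vstack([mfcc_features(p) for p in paths])
    models[word] = GaussianHMM(n_components=5, n_iter=20).fit(X)

def recognize(path: str) -> str:
    """Pick the word whose model assigns the highest log-likelihood."""
    feats = mfcc_features(path)
    return max(models, key=lambda w: models[w].score(feats))
```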

  20. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech.

    PubMed

    Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A

    2013-02-01

    As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.

  1. Impaired extraction of speech rhythm from temporal modulation patterns in speech in developmental dyslexia

    PubMed Central

    Leong, Victoria; Goswami, Usha

    2014-01-01

    Dyslexia is associated with impaired neural representation of the sound structure of words (phonology). The “phonological deficit” in dyslexia may arise in part from impaired speech rhythm perception, thought to depend on neural oscillatory phase-locking to slow amplitude modulation (AM) patterns in the speech envelope. Speech contains AM patterns at multiple temporal rates, and these different AM rates are associated with phonological units of different grain sizes, e.g., related to stress, syllables or phonemes. Here, we assess the ability of adults with dyslexia to use speech AMs to identify rhythm patterns (RPs). We study 3 important temporal rates: “Stress” (~2 Hz), “Syllable” (~4 Hz) and “Sub-beat” (reduced syllables, ~14 Hz). 21 dyslexics and 21 controls listened to nursery rhyme sentences that had been tone-vocoded using either single AM rates from the speech envelope (Stress only, Syllable only, Sub-beat only) or pairs of AM rates (Stress + Syllable, Syllable + Sub-beat). They were asked to use the acoustic rhythm of the stimulus to identify the original nursery rhyme sentence. The data showed that dyslexics were significantly poorer at detecting rhythm compared to controls when they had to utilize multi-rate temporal information from pairs of AMs (Stress + Syllable or Syllable + Sub-beat). These data suggest that dyslexia is associated with a reduced ability to utilize AMs <20 Hz for rhythm recognition. This perceptual deficit in utilizing AM patterns in speech could be underpinned by less efficient neuronal phase alignment and cross-frequency neuronal oscillatory synchronization in dyslexia. Dyslexics' perceptual difficulties in capturing the full spectro-temporal complexity of speech over multiple timescales could contribute to the development of impaired phonological representations for words, the cognitive hallmark of dyslexia across languages. PMID:24605099
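
    For illustration, the sketch below extracts envelope modulations in a ~2 Hz "stress" band and a ~4 Hz "syllable" band. The band edges and envelope rate are illustrative choices; the paper derives its AM hierarchy from the speech envelope with a more elaborate decomposition.

```python
# Sketch: band-limited amplitude-modulation (AM) components of a speech
# envelope at stress-rate (~2 Hz) and syllable-rate (~4 Hz) bands.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert, resample_poly

def am_band(x: np.ndarray, sr: int, lo: float, hi: float,
            env_sr: int = 100) -> np.ndarray:
    """Return the amplitude envelope of x bandpassed between lo and hi Hz."""
    env = np.abs(hilbert(x))                    # wideband amplitude envelope
    env = resample_poly(env, env_sr, sr)        # envelope sampled at env_sr Hz
    b, a = butter(2, [lo / (env_sr / 2), hi / (env_sr / 2)], btype="band")
    return filtfilt(b, a, env)

# Usage (speech is a 1-D array at integer sample rate sr):
# stress_am = am_band(speech, sr, 0.9, 2.5)     # ~2 Hz stress-rate AM
# syllable_am = am_band(speech, sr, 2.5, 7.0)   # ~4 Hz syllable-rate AM
```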

  2. Lip Movement Exaggerations During Infant-Directed Speech

    PubMed Central

    Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana

    2011-01-01

    Purpose Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants. Method Lip movements were recorded from 25 mothers as they spoke to their infants and other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequency of each vowel also were measured. Results Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech. Conclusion The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech. PMID:20699342

  3. Spectrotemporal Modulation Sensitivity as a Predictor of Speech Intelligibility for Hearing-Impaired Listeners

    PubMed Central

    Bernstein, Joshua G.W.; Mehraei, Golbarg; Shamma, Shihab; Gallun, Frederick J.; Theodoroff, Sarah M.; Leek, Marjorie R.

    2014-01-01

    Background A model that can accurately predict speech intelligibility for a given hearing-impaired (HI) listener would be an important tool for hearing-aid fitting or hearing-aid algorithm development. Existing speech-intelligibility models do not incorporate variability in suprathreshold deficits that are not well predicted by classical audiometric measures. One possible approach to the incorporation of such deficits is to base intelligibility predictions on sensitivity to simultaneously spectrally and temporally modulated signals. Purpose The likelihood of success of this approach was evaluated by comparing estimates of spectrotemporal modulation (STM) sensitivity to speech intelligibility and to psychoacoustic estimates of frequency selectivity and temporal fine-structure (TFS) sensitivity across a group of HI listeners. Research Design The minimum modulation depth required to detect STM applied to an 86 dB SPL four-octave noise carrier was measured for combinations of temporal modulation rate (4, 12, or 32 Hz) and spectral modulation density (0.5, 1, 2, or 4 cycles/octave). STM sensitivity estimates for individual HI listeners were compared to estimates of frequency selectivity (measured using the notched-noise method at 500, 1000, 2000, and 4000 Hz), TFS processing ability (2 Hz frequency-modulation detection thresholds for 500, 1000, 2000, and 4000 Hz carriers) and sentence intelligibility in noise (at a 0 dB signal-to-noise ratio) that were measured for the same listeners in a separate study. Study Sample Eight normal-hearing (NH) listeners and 12 listeners with a diagnosis of bilateral sensorineural hearing loss participated. Data Collection and Analysis STM sensitivity was compared between NH and HI listener groups using a repeated-measures analysis of variance. A stepwise regression analysis compared STM sensitivity for individual HI listeners to audiometric thresholds, age, and measures of frequency selectivity and TFS processing ability. A second stepwise regression analysis compared speech intelligibility to STM sensitivity and the audiogram-based Speech Intelligibility Index. Results STM detection thresholds were elevated for the HI listeners, but only for low rates and high densities. STM sensitivity for individual HI listeners was well predicted by a combination of estimates of frequency selectivity at 4000 Hz and TFS sensitivity at 500 Hz but was unrelated to audiometric thresholds. STM sensitivity accounted for an additional 40% of the variance in speech intelligibility beyond the 40% accounted for by the audibility-based Speech Intelligibility Index. Conclusions Impaired STM sensitivity likely results from a combination of a reduced ability to resolve spectral peaks and a reduced ability to use TFS information to follow spectral-peak movements. Combining STM sensitivity estimates with audiometric threshold measures for individual HI listeners provided a more accurate prediction of speech intelligibility than audiometric measures alone. These results suggest a significant likelihood of success for an STM-based model of speech intelligibility for HI listeners. PMID:23636210
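
    A moving-ripple stimulus of the kind used in such STM tasks can be synthesized as a dense bank of tones whose levels follow a sinusoidal spectral profile drifting in time. The sketch below is illustrative: the tone density, random phases, and scaling are implementation choices, and the rate/density values match only one of the conditions above.

```python
# Sketch: spectrotemporally modulated ("moving ripple") noise built from a
# bank of log-spaced tones with a drifting sinusoidal spectral envelope.
import numpy as np

def moving_ripple(sr=44100, dur=1.0, f_low=250.0, n_oct=4, n_tones=200,
                  rate_hz=4.0, density_cyc_oct=1.0, depth=1.0, seed=0):
    t = np.arange(int(sr * dur)) / sr
    octs = np.linspace(0.0, n_oct, n_tones)   # tone positions in octaves
    rng = np.random.default_rng(seed)
    x = np.zeros_like(t)
    for o in octs:
        phase = rng.uniform(0.0, 2.0 * np.pi)
        carrier = np.sin(2.0 * np.pi * f_low * 2.0**o * t + phase)
        # Sinusoidal level modulation: density cycles/octave, drifting at
        # rate_hz; depth is the modulation depth detected in the task.
        level = 1.0 + depth * np.sin(
            2.0 * np.pi * (rate_hz * t + density_cyc_oct * o))
        x += level * carrier
    return x / np.max(np.abs(x))

stim = moving_ripple(rate_hz=4.0, density_cyc_oct=1.0)  # one study condition
```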

  4. The influence of cochlear spectral processing on the timing and amplitude of the speech-evoked auditory brain stem response

    PubMed Central

    Nuttall, Helen E.; Moore, David R.; Barry, Johanna G.; Krumbholz, Katrin

    2015-01-01

    The speech-evoked auditory brain stem response (speech ABR) is widely considered to provide an index of the quality of neural temporal encoding in the central auditory pathway. The aim of the present study was to evaluate the extent to which the speech ABR is shaped by spectral processing in the cochlea. High-pass noise masking was used to record speech ABRs from delimited octave-wide frequency bands between 0.5 and 8 kHz in normal-hearing young adults. The latency of the frequency-delimited responses decreased from the lowest to the highest frequency band by up to 3.6 ms. The observed frequency-latency function was compatible with model predictions based on wave V of the click ABR. The frequency-delimited speech ABR amplitude was largest in the 2- to 4-kHz frequency band and decreased toward both higher and lower frequency bands despite the predominance of low-frequency energy in the speech stimulus. We argue that the frequency dependence of speech ABR latency and amplitude results from the decrease in cochlear filter width with decreasing frequency. The results suggest that the amplitude and latency of the speech ABR may reflect interindividual differences in cochlear, as well as central, processing. The high-pass noise-masking technique provides a useful tool for differentiating between peripheral and central effects on the speech ABR. It can be used for further elucidating the neural basis of the perceptual speech deficits that have been associated with individual differences in speech ABR characteristics. PMID:25787954

  5. Encoding of frequency-modulation (FM) rates in human auditory cortex.

    PubMed

    Okamoto, Hidehiko; Kakigi, Ryusuke

    2015-12-14

    Frequency-modulated sounds play an important role in our daily social life. However, it currently remains unclear whether frequency modulation rates affect neural activity in the human auditory cortex. In the present study, using magnetoencephalography, we investigated the auditory evoked N1m and sustained field responses elicited by temporally repeated and superimposed frequency-modulated sweeps that were matched in the spectral domain but differed in frequency modulation rate (1, 4, 16, and 64 octaves per second). The results demonstrated that higher-rate frequency-modulated sweeps elicited smaller N1m and larger sustained field responses. Frequency modulation rate had a significant impact on the human brain responses, thereby providing a key for disentangling a series of natural frequency-modulated sounds such as speech and music.

  6. Predictions of Speech Chimaera Intelligibility Using Auditory Nerve Mean-Rate and Spike-Timing Neural Cues.

    PubMed

    Wirtzfeld, Michael R; Ibrahim, Rasha A; Bruce, Ian C

    2017-10-01

    Perceptual studies of speech intelligibility have shown that slow variations of acoustic envelope (ENV) in a small set of frequency bands provides adequate information for good perceptual performance in quiet, whereas acoustic temporal fine-structure (TFS) cues play a supporting role in background noise. However, the implications for neural coding are prone to misinterpretation because the mean-rate neural representation can contain recovered ENV cues from cochlear filtering of TFS. We investigated ENV recovery and spike-time TFS coding using objective measures of simulated mean-rate and spike-timing neural representations of chimaeric speech, in which either the ENV or the TFS is replaced by another signal. We (a) evaluated the levels of mean-rate and spike-timing neural information for two categories of chimaeric speech, one retaining ENV cues and the other TFS; (b) examined the level of recovered ENV from cochlear filtering of TFS speech; (c) examined and quantified the contribution to recovered ENV from spike-timing cues using a lateral inhibition network (LIN); and (d) constructed linear regression models with objective measures of mean-rate and spike-timing neural cues and subjective phoneme perception scores from normal-hearing listeners. The mean-rate neural cues from the original ENV and recovered ENV partially accounted for perceptual score variability, with additional variability explained by the recovered ENV from the LIN-processed TFS speech. The best model predictions of chimaeric speech intelligibility were found when both the mean-rate and spike-timing neural cues were included, providing further evidence that spike-time coding of TFS cues is important for intelligibility when the speech envelope is degraded.
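
    The ENV/TFS decomposition behind chimaeric speech is commonly done with the Hilbert transform. The sketch below builds a single-band chimaera, imposing the envelope of one signal on the fine structure of another; the published chimaeras apply this within each channel of a multi-band filterbank and then sum the channels.

```python
# Sketch: single-band "auditory chimaera" -- Hilbert envelope (ENV) of one
# signal carried on the temporal fine structure (TFS) of another.
import numpy as np
from scipy.signal import hilbert

def chimaera(env_source: np.ndarray, tfs_source: np.ndarray) -> np.ndarray:
    n = min(len(env_source), len(tfs_source))
    env = np.abs(hilbert(env_source[:n]))            # envelope of signal A
    tfs = np.cos(np.angle(hilbert(tfs_source[:n])))  # fine structure of B
    return env * tfs
```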

  7. Effects of degree and configuration of hearing loss on the contribution of high- and low-frequency speech information to bilateral speech understanding

    PubMed Central

    Hornsby, Benjamin W. Y.; Johnson, Earl E.; Picou, Erin

    2011-01-01

    Objectives The purpose of this study was to examine the effects of degree and configuration of hearing loss on the use of, and benefit from, information in amplified high- and low-frequency speech presented in background noise. Design Sixty-two adults with a wide range of high- and low-frequency sensorineural hearing loss (5–115+ dB HL) participated. To examine the contribution of speech information in different frequency regions, speech understanding in noise was assessed in multiple low- and high-pass filter conditions, as well as a band-pass (713–3534 Hz) and wideband (143–8976 Hz) condition. To increase audibility over a wide frequency range, speech and noise were amplified based on each individual’s hearing loss. A stepwise multiple linear regression approach was used to examine the contribution of several factors to 1) absolute performance in each filter condition and 2) the change in performance with the addition of amplified high- and low-frequency speech components. Results Results from the regression analysis showed that degree of hearing loss was the strongest predictor of absolute performance for low- and high-pass filtered speech materials. In addition, configuration of hearing loss affected both absolute performance for severely low-pass filtered speech and benefit from extending high-frequency (3534–8976 Hz) bandwidth. Specifically, individuals with steeply sloping high-frequency losses made better use of low-pass filtered speech information than individuals with similar low-frequency thresholds but less high-frequency loss. In contrast, given similar high-frequency thresholds, individuals with flat hearing losses received more benefit from extending high-frequency bandwidth than individuals with more sloping losses. Conclusions Consistent with previous work, benefit from speech information in a given frequency region generally decreases as degree of hearing loss in that frequency region increases. However, given a similar degree of loss, the configuration of hearing loss also affects the ability to use speech information in different frequency regions. Except for individuals with steeply sloping high-frequency losses, providing high-frequency amplification (3534–8976 Hz) had either a beneficial effect on, or did not significantly degrade, speech understanding. These findings highlight the importance of extended high-frequency amplification for listeners with a wide range of high-frequency hearing losses, when seeking to maximize intelligibility. PMID:21336138

  8. Time-frequency feature representation using multi-resolution texture analysis and acoustic activity detector for real-life speech emotion recognition.

    PubMed

    Wang, Kun-Ching

    2015-01-14

    The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). In this paper, the purpose is to present a novel feature extraction based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions have different intensity values in different frequency bands. In terms of human visual perception, the multi-resolution texture properties of the emotional speech spectrogram should form a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can discriminate between emotions more clearly than uniform-resolution texture analysis. In order to provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotion recognition in speech.

  9. Speaker-independent factors affecting the perception of foreign accent in a second language

    PubMed Central

    Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.

    2012-01-01

    Previous research on foreign accent perception has largely focused on speaker-dependent factors such as age of learning and length of residence. Factors that are independent of a speaker’s language learning history have also been shown to affect perception of second language speech. The present study examined the effects of two such factors—listening context and lexical frequency—on the perception of foreign-accented speech. Listeners rated foreign accent in two listening contexts: auditory-only, where listeners only heard the target stimuli, and auditory+orthography, where listeners were presented with both an auditory signal and an orthographic display of the target word. Results revealed that higher frequency words were consistently rated as less accented than lower frequency words. The effect of the listening context emerged in two interactions: the auditory+orthography context reduced the effects of lexical frequency, but increased the perceived differences between native and non-native speakers. Acoustic measurements revealed some production differences for words of different levels of lexical frequency, though these differences could not account for all of the observed interactions from the perceptual experiment. These results suggest that factors independent of the speakers’ actual speech articulations can influence the perception of degree of foreign accent. PMID:17471745

  10. Auditory Masking Effects on Speech Fluency in Apraxia of Speech and Aphasia: Comparison to Altered Auditory Feedback

    PubMed Central

    Haley, Katarina L.

    2015-01-01

    Purpose To study the effects of masked auditory feedback (MAF) on speech fluency in adults with aphasia and/or apraxia of speech (APH/AOS). We hypothesized that adults with AOS would increase speech fluency when speaking with noise. Altered auditory feedback (AAF; i.e., delayed/frequency-shifted feedback) was included as a control condition not expected to improve speech fluency. Method Ten participants with APH/AOS and 10 neurologically healthy (NH) participants were studied under both feedback conditions. To allow examination of individual responses, we used an ABACA design. Effects were examined on syllable rate, disfluency duration, and vocal intensity. Results Seven of 10 APH/AOS participants increased fluency with masking by increasing rate, decreasing disfluency duration, or both. In contrast, none of the NH participants increased speaking rate with MAF. In the AAF condition, only 1 APH/AOS participant increased fluency. Four APH/AOS participants and 8 NH participants slowed their rate with AAF. Conclusions Speaking with MAF appears to increase fluency in a subset of individuals with APH/AOS, indicating that overreliance on auditory feedback monitoring may contribute to their disorder presentation. The distinction between responders and nonresponders was not linked to AOS diagnosis, so additional work is needed to develop hypotheses for candidacy and underlying control mechanisms. PMID:26363508

  11. An Analysis of The Parameters Used In Speech ABR Assessment Protocols.

    PubMed

    Sanfins, Milaine D; Hatzopoulos, Stavros; Donadon, Caroline; Diniz, Thais A; Borges, Leticia R; Skarzynski, Piotr H; Colella-Santos, Maria Francisca

    2018-04-01

    The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.

  12. Development of a good-quality speech coder for transmission over noisy channels at 2.4 kb/s

    NASA Astrophysics Data System (ADS)

    Viswanathan, V. R.; Berouti, M.; Higgins, A.; Russell, W.

    1982-03-01

    This report describes the development, study, and experimental results of a 2.4 kb/s speech coder called the harmonic deviations (HDV) vocoder, which transmits good-quality speech over noisy channels with bit-error rates of up to 1%. The HDV coder is based on the linear predictive coding (LPC) vocoder, and it transmits additional information over and above the data transmitted by the LPC vocoder, in the form of deviations between the speech spectrum and the LPC all-pole model spectrum at a selected set of frequencies. At the receiver, the spectral deviations are used to generate the excitation signal for the all-pole synthesis filter. The report describes and compares several methods for extracting the spectral deviations from the speech signal and for encoding them. To limit the bit rate of the HDV coder to 2.4 kb/s, the report discusses several methods, including orthogonal transformation and minimum-mean-square-error scalar quantization of log area ratios, two-stage vector-scalar quantization, and variable frame rate transmission. The report also presents the results of speech-quality optimization of the HDV coder at 2.4 kb/s.

  13. Predicting fundamental frequency from mel-frequency cepstral coefficients to enable speech reconstruction.

    PubMed

    Shao, Xu; Milner, Ben

    2005-08-01

    This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.
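
    The first scheme lends itself to a compact sketch: fit a full-covariance GMM to joint [MFCC, f0] vectors, then predict f0 for a new MFCC vector as the posterior-weighted sum of per-component conditional means. The sketch below assumes scikit-learn and uses random stand-in training data; voicing classification and the HMM-based second scheme are omitted, and the log-f0 parameterization is an assumed choice.

```python
# Sketch: GMM-based prediction of (log-)f0 from an MFCC vector via the
# conditional expectation under a joint full-covariance mixture model.
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

D = 13                                    # MFCC dimensionality
train = np.random.randn(5000, D + 1)      # stand-in rows: [mfcc | log-f0]
gmm = GaussianMixture(n_components=8, covariance_type="full").fit(train)

def predict_logf0(mfcc: np.ndarray) -> float:
    """E[log-f0 | mfcc]: responsibility-weighted mix of conditional means."""
    logw = np.log(gmm.weights_)
    cond = np.empty(gmm.n_components)
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :D], gmm.means_[k, D]
        Sxx = gmm.covariances_[k, :D, :D]
        Sxy = gmm.covariances_[k, :D, D]
        # Component responsibility from the MFCC marginal ...
        logw = logw.copy()
        logw[k] += multivariate_normal.logpdf(mfcc, mu_x, Sxx)
        # ... and the Gaussian conditional mean of log-f0 given the MFCCs.
        cond[k] = mu_y + Sxy @ np.linalg.solve(Sxx, mfcc - mu_x)
    w = np.exp(logw - logw.max())
    return float(np.dot(w / w.sum(), cond))

print(predict_logf0(np.random.randn(D)))
```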

  14. Effect of Cigarette Smoking and Passive Smoking on Hearing Impairment: Data from a Population–Based Study

    PubMed Central

    Chang, Jiwon; Ryou, Namhyung; Jun, Hyung Jin; Hwang, Soon Young; Song, Jae-Jun; Chae, Sung Won

    2016-01-01

    Objectives In the present study, we aimed to determine the effect of both active and passive smoking on the prevalence of hearing impairment and on hearing thresholds in different age groups, through analysis of data collected from the Korea National Health and Nutrition Examination Survey (KNHANES). Study Design Cross-sectional epidemiological study. Methods The KNHANES is an ongoing population study that started in 1998. We included a total of 12,935 participants aged ≥19 years in the KNHANES, from 2010 to 2012, in the present study. Pure-tone audiometric (PTA) testing was conducted and the frequencies tested were 0.5, 1, 2, 3, 4, and 6 kHz. Smoking status was categorized into three groups: current smoking, passive smoking, and non-smoking. Results In the current smoking group, the prevalence of speech-frequency bilateral hearing impairment was increased at ages 40−69, and the rate of high-frequency bilateral hearing impairment was elevated at ages 30−79. When we investigated the impact of smoking on hearing thresholds, we found that the current smoking group had significantly higher hearing thresholds than the passive smoking and non-smoking groups, across all ages, in both speech-relevant and high frequencies. The passive smoking group did not have an elevated prevalence of either speech-frequency or high-frequency bilateral hearing impairment, except in the 40s age group. However, the passive smoking group had higher hearing thresholds than the non-smoking group in the 30s and 40s age groups. Conclusion Current smoking was associated with hearing impairment in both speech-relevant and high frequencies across all ages. However, except in the 40s age group, passive smoking was not related to hearing impairment in either speech-relevant or high frequencies. PMID:26756932

  15. Age-related changes to spectral voice characteristics affect judgments of prosodic, segmental, and talker attributes for child and adult speech

    PubMed Central

    Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.

    2013-01-01

    Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414

  16. Examining explanations for fundamental frequency's contribution to speech intelligibility in noise

    NASA Astrophysics Data System (ADS)

    Schlauch, Robert S.; Miller, Sharon E.; Watson, Peter J.

    2005-09-01

    Laures and Weismer [JSLHR, 42, 1148 (1999)] reported that speech with natural variation in fundamental frequency (F0) is more intelligible in noise than speech with a flattened F0 contour. Cognitive-linguistic based explanations have been offered to account for this drop in intelligibility for the flattened condition, but a lower-level mechanism related to auditory streaming may be responsible. Numerous psychoacoustic studies have demonstrated that modulating a tone enables a listener to segregate it from background sounds. To test these rival hypotheses, speech recognition in noise was measured for sentences with six different F0 contours: unmodified, flattened at the mean, natural but exaggerated, reversed, and frequency modulated (rates of 2.5 and 5.0 Hz). The 180 stimulus sentences were produced by five talkers (30 sentences per condition). Speech recognition for fifteen listeners replicated earlier findings showing that flattening the F0 contour results in a roughly 10% reduction in recognition of key words compared with the natural condition. Although the exaggerated condition produced results comparable to those of the flattened condition, the other conditions with unnatural F0 contours all yielded significantly poorer performance than the flattened condition. These results support the cognitive, linguistic-based explanations for the reduction in performance.

  17. Improving Speech Perception in Noise with Current Focusing in Cochlear Implant Users

    PubMed Central

    Srinivasan, Arthi G.; Padilla, Monica; Shannon, Robert V.; Landsberger, David M.

    2013-01-01

    Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons, which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation and, therefore, reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits-in-noise task and a mean improvement of 3 dB in a sentences-in-noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical strategy, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical strategy. PMID:23467170

  18. Children's Acoustic and Linguistic Adaptations to Peers With Hearing Impairment.

    PubMed

    Granlund, Sonia; Hazan, Valerie; Mahon, Merle

    2018-05-17

    This study aims to examine the clear speaking strategies used by older children when interacting with a peer with hearing loss, focusing on both acoustic and linguistic adaptations in speech. The Grid task, a problem-solving task developed to elicit spontaneous interactive speech, was used to obtain a range of global acoustic and linguistic measures. Eighteen 9- to 14-year-old children with normal hearing (NH) performed the task in pairs, once with a friend with NH and once with a friend with a hearing impairment (HI). In HI-directed speech, children increased their fundamental frequency range and midfrequency intensity, decreased the number of words per phrase, and expanded their vowel space area by increasing F1 and F2 range, relative to NH-directed speech. However, participants did not appear to make changes to their articulation rate, the lexical frequency of content words, or lexical diversity when talking to their friend with HI compared with their friend with NH. Older children show evidence of listener-oriented adaptations to their speech production; although their speech production systems are still developing, they are able to make speech adaptations to benefit the needs of a peer with HI, even without being given a specific instruction to do so. https://doi.org/10.23641/asha.6118817.

  19. High-frequency energy in singing and speech

    NASA Astrophysics Data System (ADS)

    Monson, Brian Bruce

    While human speech and the human voice generate acoustical energy up to (and beyond) 20 kHz, the energy above approximately 5 kHz has been largely neglected. Evidence is accruing that this high-frequency energy contains perceptual information relevant to speech and voice, including percepts of quality, localization, and intelligibility. The present research was an initial step in the long-range goal of characterizing high-frequency energy in singing voice and speech, with particular regard for its perceptual role and its potential for modification during voice and speech production. In this study, a database of high-fidelity recordings of talkers was created and used for a broad acoustical analysis and general characterization of high-frequency energy, as well as specific characterization of phoneme category, voice and speech intensity level, and mode of production (speech versus singing) by high-frequency energy content. Directionality of radiation of high-frequency energy from the mouth was also examined. The recordings were used for perceptual experiments wherein listeners were asked to discriminate between speech and voice samples that differed only in high-frequency energy content. Listeners were also subjected to gender discrimination tasks, mode-of-production discrimination tasks, and transcription tasks with samples of speech and singing that contained only high-frequency content. The combination of these experiments has revealed that (1) human listeners are able to detect very subtle level changes in high-frequency energy, and (2) human listeners are able to extract significant perceptual information from high-frequency energy.
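
    One basic characterization in this line of work is simply how much of a recording's energy lies above 5 kHz. A minimal sketch, assuming a wideband signal already loaded as a NumPy array, is shown below.

```python
# Sketch: proportion of spectral energy above a cutoff (default 5 kHz).
import numpy as np

def high_freq_energy_ratio(x: np.ndarray, sr: int,
                           cutoff_hz: float = 5000.0) -> float:
    spec = np.abs(np.fft.rfft(x)) ** 2               # power spectrum
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    return float(spec[freqs >= cutoff_hz].sum() / spec.sum())

# Example with white noise (expected ratio ~ 1 - cutoff/Nyquist):
sr = 44100
noise = np.random.default_rng(0).standard_normal(sr)
print(high_freq_energy_ratio(noise, sr))             # roughly 0.77
```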

  20. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition

    PubMed Central

    Wang, Kun-Ching

    2015-01-01

    The classification of emotional speech is mostly considered in speech-related research on human-computer interaction (HCI). This paper presents a novel feature extraction method based on multi-resolution texture image information (MRTII). The MRTII feature set is derived from multi-resolution texture analysis for the characterization and classification of different emotions in a speech signal. The motivation is that emotions carry different intensity values in different frequency bands. In terms of human visual perception, the texture properties of an emotional speech spectrogram at multiple resolutions should form a good feature set for emotion classification in speech. Furthermore, multi-resolution texture analysis can give clearer discrimination between emotions than uniform-resolution analysis. To provide high accuracy of emotional discrimination, especially in real life, an acoustic activity detection (AAD) algorithm is applied within the MRTII-based feature extraction. Considering the presence of many blended emotions in real life, this paper makes use of two corpora of naturally-occurring dialogs recorded in real-life call centers. Compared with the traditional Mel-scale Frequency Cepstral Coefficients (MFCC) and state-of-the-art features, the MRTII features also improve the correct classification rates of the proposed systems across different language databases. Experimental results show that the proposed MRTII-based feature information, inspired by human visual perception of the spectrogram image, can provide significant classification performance for real-life emotional recognition in speech. PMID:25594590
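
    As a rough illustration of the spectrogram-as-texture idea, the sketch below computes a log-spectrogram, rescales it to several resolutions, and summarizes each with a local binary pattern (LBP) histogram. LBP is used here only as a stand-in texture descriptor; the actual MRTII features, parameters, and AAD stage are the paper's own and are not reproduced.

      import numpy as np
      from scipy.signal import spectrogram
      from skimage.feature import local_binary_pattern
      from skimage.transform import rescale

      def texture_features(x, fs, levels=3):
          """Concatenated LBP histograms of the log-spectrogram at
          several resolutions (a simplified, MRTII-like feature)."""
          f, t, sxx = spectrogram(x, fs=fs, nperseg=512)
          img = np.log(sxx + 1e-10)
          img = (img - img.min()) / (img.max() - img.min())  # scale to [0, 1]
          feats = []
          for level in range(levels):
              scaled = rescale(img, 0.5 ** level, anti_aliasing=True)
              u8 = (scaled * 255).astype(np.uint8)
              lbp = local_binary_pattern(u8, P=8, R=1, method="uniform")
              hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
              feats.append(hist)
          return np.concatenate(feats)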

  1. A comparative analysis of whispered and normally phonated speech using an LPC-10 vocoder

    NASA Astrophysics Data System (ADS)

    Wilson, J. B.; Mosko, J. D.

    1985-12-01

    This study focused on determining the performance of an LPC-10 vocoder in processing adult male and female whispered and normally phonated connected speech. The LPC-10 vocoder's analysis of whispered speech compared quite favorably with similar studies which used sound spectrographic processing techniques. Shifting from phonated speech to whispered speech caused a substantial increase in the phonemic formant frequencies and formant bandwidths for both male and female speakers. The data from this study showed no evidence that the LPC-10 vocoder's ability to process voices with pitch extremes and quality extremes was limited in any significant manner. A comparison of the unprocessed natural vowel waveforms and qualities with the synthesized vowel waveforms and qualities revealed almost imperceptible differences. An LPC-10 vocoder's ability to process linguistic and dialectical suprasegmental features such as intonation, rate and stress at low bit rates should be a critical issue of concern for future research.

  2. An acoustical assessment of pitch-matching accuracy in relation to speech frequency, speech frequency range, age and gender in preschool children

    NASA Astrophysics Data System (ADS)

    Trollinger, Valerie L.

    This study investigated the relationship between acoustical measurements of singing accuracy and speech fundamental frequency, speech fundamental frequency range, age and gender in preschool-aged children. Seventy subjects from Southeastern Pennsylvania; the San Francisco Bay Area, California; and Terre Haute, Indiana, participated in the study. Speech frequency was measured by having the subjects participate in spontaneous and guided speech activities with the researcher, with 18 diverse samples extracted from each subject's recording for acoustical analysis of fundamental frequency in Hz with the CSpeech computer program. The fundamental frequencies were averaged together to derive a mean speech frequency score for each subject. Speech range was calculated by subtracting the lowest fundamental frequency produced from the highest fundamental frequency produced, resulting in a speech range measured in increments of Hz. Singing accuracy was measured by having the subjects each echo-sing six randomized patterns using the pitches Middle C, D, E, F♯, G and A (440), using the solfege syllables of Do and Re, which were recorded by a 5-year-old female model. For each subject, 18 samples of singing were recorded. All samples were analyzed with CSpeech for fundamental frequency. For each subject, deviation scores in Hz were derived by calculating the difference between what the model sang in Hz and what the subject sang in response in Hz. Individual scores for each child consisted of an overall mean total deviation frequency, mean frequency deviations for each pattern, and mean frequency deviation for each pitch. Pearson correlations, MANOVA and ANOVA analyses, Multiple Regressions and Discriminant Analysis revealed the following findings: (1) moderate but significant (p < .001) relationships emerged between mean speech frequency and the ability to sing the pitches E, F♯, G and A in the study; (2) mean speech frequency also emerged as the strongest predictor of subjects' ability to sing the notes E and F♯; (3) mean speech frequency correlated moderately and significantly (p < .001) with sharpness and flatness of singing response accuracy in Hz; (4) speech range was the strongest predictor of singing accuracy for the pitches G and A in the study (p < .001); (5) gender emerged as a significant, but not the strongest, predictor of the ability to sing the pitches in the study above C and D; (6) gender did not correlate with mean speech frequency and speech range; (7) age in months emerged as a low but significant predictor of the ability to sing the lower notes (C and D) in the study; (8) age correlated significantly but negatively low (r = -.23, p < .05, two-tailed) with mean speech frequency; and (9) age did not emerge as a significant predictor of overall singing accuracy. Ancillary findings indicated that there were significant differences in singing accuracy based on geographic location by gender, and that siblings and fraternal twins in the study generally performed similarly. In addition, reliability of using CSpeech for acoustical analysis revealed test/retest correlations of .99, with one exception at .94. Based on these results, suggestions were made concerning future research on the use of voice in speech and how it may affect singing development, overall use in singing, and pitch-matching accuracy.

  3. Correlational Analysis of Speech Intelligibility Tests and Metrics for Speech Transmission

    DTIC Science & Technology

    2017-12-04

    [Search-result fragment; the full abstract was not captured.] Consonants contain mostly high frequency (above 1500 Hz) speech energy, but this energy is relatively small in comparison to that of the whole … (Letowski et al. 1993). The mid-frequency spectral region contains mostly vowel energy, while consonants are high frequency sounds.

  4. Spatial Release From Masking in Simulated Cochlear Implant Users With and Without Access to Low-Frequency Acoustic Hearing

    PubMed Central

    Dietz, Mathias; Hohmann, Volker; Jürgens, Tim

    2015-01-01

    For normal-hearing listeners, speech intelligibility improves if speech and noise are spatially separated. While this spatial release from masking has already been quantified in normal-hearing listeners in many studies, it is less clear how spatial release from masking changes in cochlear implant listeners with and without access to low-frequency acoustic hearing. Spatial release from masking depends on differences in access to speech cues due to hearing status and hearing device. To investigate the influence of these factors on speech intelligibility, the present study measured speech reception thresholds in spatially separated speech and noise for 10 different listener types. A vocoder was used to simulate cochlear implant processing and low-frequency filtering was used to simulate residual low-frequency hearing. These forms of processing were combined to simulate cochlear implant listening, listening based on low-frequency residual hearing, and combinations thereof. Simulated cochlear implant users with additional low-frequency acoustic hearing showed better speech intelligibility in noise than simulated cochlear implant users without acoustic hearing and had access to more spatial speech cues (e.g., higher binaural squelch). Cochlear implant listener types showed higher spatial release from masking with bilateral access to low-frequency acoustic hearing than without. A binaural speech intelligibility model with normal binaural processing showed overall good agreement with measured speech reception thresholds, spatial release from masking, and spatial speech cues. This indicates that differences in speech cues available to listener types are sufficient to explain the changes of spatial release from masking across these simulated listener types. PMID:26721918

  5. Quadcopter Control Using Speech Recognition

    NASA Astrophysics Data System (ADS)

    Malik, H.; Darma, S.; Soekirno, S.

    2018-04-01

    This research reports a comparison of the success rates of speech recognition systems using two types of database, an existing database and a newly created database, implemented on a quadcopter for motion control. The speech recognition system used the Mel frequency cepstral coefficient (MFCC) method for feature extraction and was trained using a recursive neural network (RNN). MFCC is one of the most widely used feature extraction methods for speech recognition, with reported success rates of 80%-95%. The existing database was used to measure the success rate of the RNN method. The new database was created in Indonesian, and its success rate was compared with results from the existing database. Sound input from the microphone was processed on a DSP module with the MFCC method to obtain characteristic values. These characteristic values were then classified by the RNN into a command. The command became a control input to a single board computer (SBC), which produced the movement of the quadcopter. On the SBC, we used the robot operating system (ROS) as the kernel (operating system).
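
    A minimal sketch of the MFCC front end this record describes, assuming librosa; the file name, sampling rate, 13-coefficient setting, and the mean-pooled utterance vector are illustrative choices, not the paper's exact configuration (which feeds frame-level features to the network).

      import numpy as np
      import librosa

      y, sr = librosa.load("command.wav", sr=16000)       # hypothetical recording
      mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, n_frames)

      # One fixed-length vector per utterance for a downstream classifier;
      # mean/std pooling is a simplification of the paper's RNN training.
      utterance_vec = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
      print(utterance_vec.shape)  # (26,)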

  6. Perception of speech in reverberant conditions using AM-FM cochlear implant simulation.

    PubMed

    Drgas, Szymon; Blaszak, Magdalena A

    2010-10-01

    This study assessed the effects of speech misidentification and cognitive processing errors in normal-hearing adults listening to degraded auditory input signals simulating cochlear implants in reverberation conditions. Three variables were controlled: number of vocoder channels (six and twelve), instantaneous frequency change rate (none, 50, 400 Hz), and enclosures (different reverberation conditions). The analyses were made on the basis of: (a) nonsense word recognition scores for eight young normal-hearing listeners, (b) 'ease of listening' based on the time of response, and (c) the subjective measure of difficulty. The maximum score of speech intelligibility in cochlear implant simulation was 70% for non-reverberant conditions with a 12-channel vocoder and changes of instantaneous frequency limited to 400 Hz. In the presence of reflections, word misidentification was about 10-20 percentage points higher. There was little difference between the 50 and 400 Hz frequency modulation cut-off for the 12-channel vocoder; however, in the case of six channels this difference was more significant. The results of the experiment suggest that the information other than F0 that is carried by FM can be sufficient to improve speech intelligibility in real-world conditions.
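
    For readers unfamiliar with vocoder-based cochlear implant simulation, the sketch below implements a generic n-channel noise vocoder: band-limit the speech, extract each band's envelope, and reimpose it on band-limited noise. The band edges, filter orders, and envelope cutoff are assumptions for illustration, not this study's parameters (which additionally manipulated the instantaneous frequency change rate).

      import numpy as np
      from scipy.signal import butter, filtfilt

      def noise_vocoder(x, fs, n_channels=6, f_lo=100.0, f_hi=7000.0, env_cut=50.0):
          """Noise-excited channel vocoder; assumes fs >= 16 kHz audio."""
          edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced bands
          noise = np.random.default_rng(0).standard_normal(len(x))
          b_env, a_env = butter(2, env_cut / (fs / 2), btype="low")
          out = np.zeros(len(x))
          for lo, hi in zip(edges[:-1], edges[1:]):
              b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
              band = filtfilt(b, a, x)
              env = filtfilt(b_env, a_env, np.abs(band))  # rectified + lowpassed
              carrier = filtfilt(b, a, noise)             # band-limited noise
              out += np.clip(env, 0.0, None) * carrier
          return out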

  7. Systematic Studies of Modified Vocalization: Speech Production Changes During a Variation of Metronomic Speech in Persons Who Do and Do Not Stutter

    PubMed Central

    Davidow, Jason H.; Bothe, Anne K.; Ye, Jun

    2011-01-01

    The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 s). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1 s of reading with 1 s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately 7 on a 1–9 scale (1 = highly natural; 9 = highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. Educational Objectives The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1 s of reading and 1 s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4) describe which fluency-inducing conditions have been shown to involve a reduction in short phonated intervals. PMID:21664528
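
    Phonated intervals in this line of research are measured from an accelerometer on the throat; purely to illustrate the quantity itself, the sketch below derives voiced-run durations from frame energy in the audio signal and computes the percentage of short intervals. The energy threshold and the short-interval band are assumed values, and studies typically individualize them.

      import numpy as np

      def phonated_intervals(x, fs, frame_ms=10, thresh_ratio=0.05):
          """Durations (ms) of runs of frames whose RMS exceeds a threshold."""
          frame = int(fs * frame_ms / 1000)
          n = len(x) // frame
          rms = np.sqrt(np.mean(x[:n * frame].reshape(n, frame) ** 2, axis=1))
          voiced = rms > thresh_ratio * rms.max()
          intervals, run = [], 0
          for v in voiced:
              if v:
                  run += 1
              elif run:
                  intervals.append(run * frame_ms)
                  run = 0
          if run:
              intervals.append(run * frame_ms)
          return np.array(intervals)

      def percent_short_pis(intervals, lo_ms=30, hi_ms=150):
          """Share of intervals in an assumed 30-150 ms 'short PI' band."""
          short = (intervals >= lo_ms) & (intervals <= hi_ms)
          return 100.0 * short.sum() / max(len(intervals), 1)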

  8. High-frequency neural activity predicts word parsing in ambiguous speech streams.

    PubMed

    Kösem, Anne; Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie

    2016-12-01

    During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. Copyright © 2016 the American Physiological Society.

  9. High-frequency neural activity predicts word parsing in ambiguous speech streams

    PubMed Central

    Basirat, Anahita; Azizi, Leila; van Wassenhove, Virginie

    2016-01-01

    During speech listening, the brain parses a continuous acoustic stream of information into computational units (e.g., syllables or words) necessary for speech comprehension. Recent neuroscientific hypotheses have proposed that neural oscillations contribute to speech parsing, but whether they do so on the basis of acoustic cues (bottom-up acoustic parsing) or as a function of available linguistic representations (top-down linguistic parsing) is unknown. In this magnetoencephalography study, we contrasted acoustic and linguistic parsing using bistable speech sequences. While listening to the speech sequences, participants were asked to maintain one of the two possible speech percepts through volitional control. We predicted that the tracking of speech dynamics by neural oscillations would not only follow the acoustic properties but also shift in time according to the participant's conscious speech percept. Our results show that the latency of high-frequency activity (specifically, beta and gamma bands) varied as a function of the perceptual report. In contrast, the phase of low-frequency oscillations was not strongly affected by top-down control. Whereas changes in low-frequency neural oscillations were compatible with the encoding of prelexical segmentation cues, high-frequency activity specifically informed on an individual's conscious speech percept. PMID:27605528

  10. Spectral and temporal changes to speech produced in the presence of energetic and informational maskers.

    PubMed

    Cooke, Martin; Lu, Youyi

    2010-10-01

    Talkers change the way they speak in noisy conditions. For energetic maskers, speech production changes are relatively well-understood, but less is known about how informational maskers such as competing speech affect speech production. The current study examines the effect of energetic and informational maskers on speech production by talkers speaking alone or in pairs. Talkers produced speech in quiet and in backgrounds of speech-shaped noise, speech-modulated noise, and competing speech. Relative to quiet, speech output level and fundamental frequency increased and spectral tilt flattened in proportion to the energetic masking capacity of the background. In response to modulated backgrounds, talkers were able to reduce substantially the degree of temporal overlap with the noise, with greater reduction for the competing speech background. Reduction in foreground-background overlap can be expected to lead to a release from both energetic and informational masking for listeners. Passive changes in speech rate, mean pause length or pause distribution cannot explain the overlap reduction, which appears instead to result from a purposeful process of listening while speaking. Talkers appear to monitor the background and exploit upcoming pauses, a strategy which is particularly effective for backgrounds containing intelligible speech.

  11. Improving speech perception in noise with current focusing in cochlear implant users.

    PubMed

    Srinivasan, Arthi G; Padilla, Monica; Shannon, Robert V; Landsberger, David M

    2013-05-01

    Cochlear implant (CI) users typically have excellent speech recognition in quiet but struggle with understanding speech in noise. It is thought that broad current spread from stimulating electrodes causes adjacent electrodes to activate overlapping populations of neurons which results in interactions across adjacent channels. Current focusing has been studied as a way to reduce spread of excitation, and therefore, reduce channel interactions. In particular, partial tripolar stimulation has been shown to reduce spread of excitation relative to monopolar stimulation. However, the crucial question is whether this benefit translates to improvements in speech perception. In this study, we compared speech perception in noise with experimental monopolar and partial tripolar speech processing strategies. The two strategies were matched in terms of number of active electrodes, microphone, filterbanks, stimulation rate and loudness (although both strategies used a lower stimulation rate than typical clinical strategies). The results of this study showed a significant improvement in speech perception in noise with partial tripolar stimulation. All subjects benefited from the current focused speech processing strategy. There was a mean improvement in speech recognition threshold of 2.7 dB in a digits in noise task and a mean improvement of 3 dB in a sentences in noise task with partial tripolar stimulation relative to monopolar stimulation. Although the experimental monopolar strategy was worse than the clinical, presumably due to different microphones, frequency allocations and stimulation rates, the experimental partial-tripolar strategy, which had the same changes, showed no acute deficit relative to the clinical. Copyright © 2013 Elsevier B.V. All rights reserved.

  12. [Simulation of speech perception with cochlear implants: Influence of frequency and level of fundamental frequency components with electric acoustic stimulation].

    PubMed

    Rader, T; Fastl, H; Baumann, U

    2017-03-01

    After implantation of cochlear implants with hearing preservation for combined electric acoustic stimulation (EAS), the residual acoustic hearing ability relays fundamental speech frequency information in the low frequency range. With the help of acoustic simulation of EAS hearing perception the impact of frequency and level fine structure of speech signals can be systematically examined. The aim of this study was to measure the speech reception threshold (SRT) under various noise conditions with acoustic EAS simulation by variation of the frequency and level information of the fundamental frequency f0 of speech. The study was carried out to determine to what extent the SRT is impaired by modification of the f0 fine structure. Using partial tone time pattern analysis an acoustic EAS simulation of the speech material from the Oldenburg sentence test (OLSA) was generated. In addition, determination of the f0 curve of the speech material was conducted. Subsequently, either the parameter frequency or level of f0 was fixed in order to remove one of the two fine contour information of the speech signal. The processed OLSA sentences were used to determine the SRT in background noise under various test conditions. The conditions "f0 fixed frequency" and "f0 fixed level" were tested under two different situations, under "amplitude modulated background noise" and "continuous background noise" conditions. A total of 24 subjects with normal hearing participated in the study. The SRT in background noise for the condition "f0 fixed frequency" was more favorable in continuous noise with 2.7 dB and in modulated noise with 0.8 dB compared to the condition "f0 fixed level" with 3.7 dB and 2.9 dB, respectively. In the simulation of speech perception with cochlear implants and acoustic components, the level information of the fundamental frequency had a stronger impact on speech intelligibility than the frequency information. This method of simulating cochlear implant transmission allows investigation of how various parameters influence speech intelligibility in subjects with normal hearing.

  13. Rise time and formant transition duration in the discrimination of speech sounds: the Ba-Wa distinction in developmental dyslexia.

    PubMed

    Goswami, Usha; Fosker, Tim; Huss, Martina; Mead, Natasha; Szucs, Dénes

    2011-01-01

    Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory processing of brief, rapidly successive acoustic changes is compromised in dyslexia, thereby affecting phonetic discrimination (e.g. discriminating /b/ from /d/) via impaired discrimination of formant transitions (rapid acoustic changes in frequency and intensity). However, an alternative auditory temporal hypothesis is that the basic auditory processing of the slower amplitude modulation cues in speech is compromised (Goswami et al., 2002). Here, we contrast children's perception of a synthetic speech contrast (ba/wa) when it is based on the speed of the rate of change of frequency information (formant transition duration) versus the speed of the rate of change of amplitude modulation (rise time). We show that children with dyslexia have excellent phonetic discrimination based on formant transition duration, but poor phonetic discrimination based on envelope cues. The results explain why phonetic discrimination may be allophonic in developmental dyslexia (Serniclaes et al., 2004), and suggest new avenues for the remediation of developmental dyslexia. © 2010 Blackwell Publishing Ltd.

  14. A novel radar sensor for the non-contact detection of speech signals.

    PubMed

    Jiao, Mingke; Lu, Guohua; Jing, Xijing; Li, Sheng; Li, Yanfeng; Wang, Jianqi

    2010-01-01

    Different speech detection sensors have been developed over the years but they are limited by the loss of high frequency speech energy, and have restricted non-contact detection due to the lack of penetrability. This paper proposes a novel millimeter microwave radar sensor to detect speech signals. The utilization of a high operating frequency and a superheterodyne receiver contributes to the high sensitivity of the radar sensor for small sound vibrations. In addition, the penetrability of microwaves allows the novel sensor to detect speech signals through nonmetal barriers. Results show that the novel sensor can detect high frequency speech energies and that the speech quality is comparable to traditional microphone speech. Moreover, the novel sensor can detect speech signals through a nonmetal material of a certain thickness between the sensor and the subject. Thus, the novel speech sensor expands traditional speech detection techniques and provides an exciting alternative for broader application prospects.

  15. A Novel Radar Sensor for the Non-Contact Detection of Speech Signals

    PubMed Central

    Jiao, Mingke; Lu, Guohua; Jing, Xijing; Li, Sheng; Li, Yanfeng; Wang, Jianqi

    2010-01-01

    Different speech detection sensors have been developed over the years but they are limited by the loss of high frequency speech energy, and have restricted non-contact detection due to the lack of penetrability. This paper proposes a novel millimeter microwave radar sensor to detect speech signals. The utilization of a high operating frequency and a superheterodyne receiver contributes to the high sensitivity of the radar sensor for small sound vibrations. In addition, the penetrability of microwaves allows the novel sensor to detect speech signals through nonmetal barriers. Results show that the novel sensor can detect high frequency speech energies and that the speech quality is comparable to traditional microphone speech. Moreover, the novel sensor can detect speech signals through a nonmetal material of a certain thickness between the sensor and the subject. Thus, the novel speech sensor expands traditional speech detection techniques and provides an exciting alternative for broader application prospects. PMID:22399895

  16. Suprasegmental Characteristics of Spontaneous Speech Produced in Good and Challenging Communicative Conditions by Talkers Aged 9-14 Years.

    PubMed

    Hazan, Valerie; Tuomainen, Outi; Pettinato, Michèle

    2016-12-01

    This study investigated the acoustic characteristics of spontaneous speech by talkers aged 9-14 years and their ability to adapt these characteristics to maintain effective communication when intelligibility was artificially degraded for their interlocutor. Recordings were made for 96 children (50 female participants, 46 male participants) engaged in a problem-solving task with a same-sex friend; recordings for 20 adults were used as reference. The task was carried out in good listening conditions (normal transmission) and in degraded transmission conditions. Articulation rate, median fundamental frequency (f0), f0 range, and relative energy in the 1- to 3-kHz range were analyzed. With increasing age, children significantly reduced their median f0 and f0 range, became faster talkers, and reduced their mid-frequency energy in spontaneous speech. Children produced similar clear speech adaptations (in degraded transmission conditions) as adults, but only children aged 11-14 years increased their f0 range, an unhelpful strategy not transmitted via the vocoder. Changes made by children were consistent with a general increase in vocal effort. Further developments in speech production take place during later childhood. Children use clear speech strategies to benefit an interlocutor facing intelligibility problems but may not be able to attune these strategies to the same degree as adults.

  17. Variability in Word Duration as a Function of Probability, Speech Style, and Prosody

    PubMed Central

    Baker, Rachel E.; Bradlow, Ann R.

    2010-01-01

    This article examines how probability (lexical frequency and previous mention), speech style, and prosody affect word duration, and how these factors interact. Participants read controlled materials in clear and plain speech styles. As expected, more probable words (higher frequencies and second mentions) were significantly shorter than less probable words, and words in plain speech were significantly shorter than those in clear speech. Interestingly, we found second mention reduction effects in both clear and plain speech, indicating that while clear speech is hyper-articulated, this hyper-articulation does not override probabilistic effects on duration. We also found an interaction between mention and frequency, but only in plain speech. High frequency words allowed more second mention reduction than low frequency words in plain speech, revealing a tendency to hypo-articulate as much as possible when all factors support it. Finally, we found that first mentions were more likely to be accented than second mentions. However, when these differences in accent likelihood were controlled, a significant second mention reduction effect remained. This supports the concept of a direct link between probability and duration, rather than a relationship solely mediated by prosodic prominence. PMID:20121039

  18. Spectrotemporal Modulation Sensitivity as a Predictor of Speech-Reception Performance in Noise With Hearing Aids

    PubMed Central

    Danielsson, Henrik; Hällgren, Mathias; Stenfelt, Stefan; Rönnberg, Jerker; Lunner, Thomas

    2016-01-01

    The audiogram predicts <30% of the variance in speech-reception thresholds (SRTs) for hearing-impaired (HI) listeners fitted with individualized frequency-dependent gain. The remaining variance could reflect suprathreshold distortion in the auditory pathways or nonauditory factors such as cognitive processing. The relationship between a measure of suprathreshold auditory function—spectrotemporal modulation (STM) sensitivity—and SRTs in noise was examined for 154 HI listeners fitted with individualized frequency-specific gain. SRTs were measured for 65-dB SPL sentences presented in speech-weighted noise or four-talker babble to an individually programmed master hearing aid, with the output of an ear-simulating coupler played through insert earphones. Modulation-depth detection thresholds were measured over headphones for STM (2 cycles/octave density, 4-Hz rate) applied to an 85-dB SPL, 2-kHz lowpass-filtered pink-noise carrier. SRTs were correlated with both the high-frequency (2–6 kHz) pure-tone average (HFA; R2 = .31) and STM sensitivity (R2 = .28). Combined with the HFA, STM sensitivity significantly improved the SRT prediction (ΔR2 = .13; total R2 = .44). The remaining unaccounted variance might be attributable to variability in cognitive function and other dimensions of suprathreshold distortion. STM sensitivity was most critical in predicting SRTs for listeners < 65 years old or with HFA <53 dB HL. Results are discussed in the context of previous work suggesting that STM sensitivity for low rates and low-frequency carriers is impaired by a reduced ability to use temporal fine-structure information to detect dynamic spectra. STM detection is a fast test of suprathreshold auditory function for frequencies <2 kHz that complements the HFA to predict variability in hearing-aid outcomes for speech perception in noise. PMID:27815546
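
    The hierarchical step reported here (enter the HFA first, then add STM sensitivity and read off the R2 increment) can be reproduced structurally on simulated data. A sketch assuming statsmodels; the data-generating numbers are invented, and only the analysis layout mirrors the abstract.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(1)
      n = 154
      hfa = rng.normal(50, 10, n)    # high-frequency pure-tone average, dB HL
      stm = rng.normal(0, 1, n)      # STM modulation-depth threshold (z-scored)
      srt = 0.1 * hfa + 1.5 * stm + rng.normal(0, 2, n)  # simulated SRT, dB SNR

      r2_hfa = sm.OLS(srt, sm.add_constant(hfa)).fit().rsquared
      X = sm.add_constant(np.column_stack([hfa, stm]))
      r2_both = sm.OLS(srt, X).fit().rsquared
      print(f"R2(HFA) = {r2_hfa:.2f}, delta R2 for STM = {r2_both - r2_hfa:.2f}")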

  19. Relationships among psychoacoustic judgments, speech understanding ability and self-perceived handicap in tinnitus subjects.

    PubMed

    Newman, C W; Wharton, J A; Shivapuja, B G; Jacobson, G P

    1994-01-01

    Tinnitus is often a disturbing symptom which affects 6-20% of the population. Relationships among tinnitus pitch and loudness judgments, audiometric speech understanding measures and self-perceived handicap were evaluated in a sample of subjects with tinnitus and hearing loss (THL). Data obtained from the THL sample on the audiometric speech measures were compared to the performance of an age-matched hearing loss only (HL) group. Both groups had normal hearing through 1 kHz with a sloping configuration of < or = 20 dB/octave between 2-12 kHz. The THL subjects performed more poorly on the low predictability items of the Speech Perception in Noise Test, suggesting that tinnitus may interfere with the perception of speech signals having reduced linguistic redundancy. The THL subjects rated their tinnitus as annoying at relatively low sensation levels using the pitch-match frequency as the reference tone. Further, significant relationships were found between loudness judgment measures and self-rated annoyance. No predictable relationships were observed between the audiometric speech measures and perceived handicap using the Tinnitus Handicap Questionnaire. These findings support the use of self-report measures in tinnitus patients in that audiometric speech tests alone may be insufficient in describing an individual's reaction to his/her communication breakdowns.

  20. Is There a Relationship Between Tic Frequency and Physiological Arousal? Examination in a Sample of Children With Co-Occurring Tic and Anxiety Disorders.

    PubMed

    Conelea, Christine A; Ramanujam, Krishnapriya; Walther, Michael R; Freeman, Jennifer B; Garcia, Abbe M

    2014-03-01

    Stress is the contextual variable most commonly implicated in tic exacerbations. However, research examining associations between tics, stressors, and the biological stress response has yielded mixed results. This study examined whether tics occur at a greater frequency during discrete periods of heightened physiological arousal. Children with co-occurring tic and anxiety disorders (n = 8) completed two stress-induction tasks (discussion of family conflict, public speech). Observational (tic frequencies) and physiological (heart rate [HR]) data were synchronized using The Observer XT, and tic frequencies were compared across periods of high and low HR. Tic frequencies across the entire experiment did not increase during periods of higher HR. During the speech task, tic frequencies were significantly lower during periods of higher HR. Results suggest that tic exacerbations may not be associated with heightened physiological arousal and highlight the need for further tic research using integrated measurement of behavioral and biological processes. © The Author(s) 2014.

  1. Audiovisual Perception of Noise Vocoded Speech in Dyslexic and Non-Dyslexic Adults: The Role of Low-Frequency Visual Modulations

    ERIC Educational Resources Information Center

    Megnin-Viggars, Odette; Goswami, Usha

    2013-01-01

    Visual speech inputs can enhance auditory speech information, particularly in noisy or degraded conditions. The natural statistics of audiovisual speech highlight the temporal correspondence between visual and auditory prosody, with lip, jaw, cheek and head movements conveying information about the speech envelope. Low-frequency spatial and…

  2. Acoustic and Perceptual Analyses of Adductor Spasmodic Dysphonia in Mandarin-speaking Chinese.

    PubMed

    Chen, Zhipeng; Li, Jingyuan; Ren, Qingyi; Ge, Pingjiang

    2018-02-12

    The objective of this study was to examine the perceptual structure and acoustic characteristics of speech of patients with adductor spasmodic dysphonia (ADSD) in Mandarin. In this case-control study, perceptual and acoustic analyses were used to estimate dysphonia level in patients with ADSD (N = 20) and a control group (N = 20) of Mandarin-Chinese speakers. For both subgroups, a sustained vowel and connected speech samples were obtained, and the differences in perceptual and acoustic parameters between the two subgroups were assessed. On acoustic assessment, the percentage of phonatory breaks (PBs) in connected reading and the percentages of aperiodic segments and frequency shifts (FS) in the vowel and reading samples were significantly worse in patients with ADSD than in controls, as were the mean harmonics-to-noise ratio and the fundamental frequency standard deviation of the vowel. On perceptual evaluation, the ratings of speech and vowel productions in patients with ADSD were significantly higher than in controls. The percentage of aberrant acoustic events (PB, frequency shift, and aperiodic segment), the fundamental frequency standard deviation, and the mean harmonics-to-noise ratio were significantly correlated with the perceptual ratings of the vowel and reading productions. The perceptual and acoustic parameters of connected vowel and reading in patients with ADSD are worse than those in normal controls, and can validly and reliably estimate the dysphonia of ADSD in Mandarin-speaking Chinese. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  3. Formant transitions in the fluent speech of Farsi-speaking people who stutter.

    PubMed

    Dehqan, Ali; Yadegari, Fariba; Blomgren, Michael; Scherer, Ronald C

    2016-06-01

    Second formant (F2) transitions can be used to infer attributes of articulatory transitions. This study compared formant transitions during fluent speech segments of Farsi (Persian) speaking people who stutter and normally fluent Farsi speakers. Ten Iranian males who stutter and 10 normally fluent Iranian males participated. Sixteen different "CVt" tokens were embedded within the phrase "Begu CVt an". Measures included overall F2 transition frequency extents, durations, and derived overall slopes, initial F2 transition slopes at 30ms and 60ms, and speaking rate. (1) Mean overall formant frequency extent was significantly greater in 14 of the 16 CVt tokens for the group of stuttering speakers. (2) Stuttering speakers exhibited significantly longer overall F2 transitions for all 16 tokens compared to the nonstuttering speakers. (3) The overall F2 slopes were similar between the two groups. (4) The stuttering speakers exhibited significantly greater initial F2 transition slopes (positive or negative) for five of the 16 tokens at 30ms and six of the 16 tokens at 60ms. (5) The stuttering group produced a slower syllable rate than the non-stuttering group. During perceptually fluent utterances, the stuttering speakers had greater F2 frequency extents during transitions, took longer to reach vowel steady state, exhibited some evidence of steeper slopes at the beginning of transitions, had overall similar F2 formant slopes, and had slower speaking rates compared to nonstuttering speakers. Findings support the notion of different speech motor timing strategies in stuttering speakers. Findings are likely to be independent of the language spoken. Educational objectives This study compares aspects of F2 formant transitions between 10 stuttering and 10 nonstuttering speakers. Readers will be able to describe: (a) characteristics of formant frequency as a specific acoustic feature used to infer speech movements in stuttering and nonstuttering speakers, (b) two methods of measuring second formant (F2) transitions: the visual criteria method and fixed time criteria method, (c) characteristics of F2 transitions in the fluent speech of stuttering speakers and how those characteristics appear to differ from normally fluent speakers, and (d) possible cross-linguistic effects on acoustic analyses of stuttering. Copyright © 2016 Elsevier Inc. All rights reserved.
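
    The two transition measures contrasted in this study reduce to simple arithmetic on a sampled F2 track: the overall slope is frequency extent divided by transition duration, and the fixed-time criterion takes the slope over the first 30 or 60 ms. A sketch on a fabricated track:

      import numpy as np

      t_ms = np.arange(0, 100, 5)                      # F2 sampled every 5 ms
      f2_hz = 1200 + 600 * (1 - np.exp(-t_ms / 40.0))  # toy rising transition

      def overall_slope(t_ms, f2_hz):
          """Frequency extent divided by transition duration, in Hz/ms."""
          return (f2_hz[-1] - f2_hz[0]) / (t_ms[-1] - t_ms[0])

      def initial_slope(t_ms, f2_hz, window_ms=30):
          """Fixed-time criterion: slope over the first window_ms ms."""
          f2_at = np.interp(window_ms, t_ms, f2_hz)
          return (f2_at - f2_hz[0]) / window_ms

      print(overall_slope(t_ms, f2_hz), initial_slope(t_ms, f2_hz, 30))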

  4. Relationship Among Signal Fidelity, Hearing Loss, and Working Memory for Digital Noise Suppression.

    PubMed

    Arehart, Kathryn; Souza, Pamela; Kates, James; Lunner, Thomas; Pedersen, Michael Syskind

    2015-01-01

    This study considered speech modified by additive babble combined with noise-suppression processing. The purpose was to determine the relative importance of the signal modifications, individual peripheral hearing loss, and individual cognitive capacity on speech intelligibility and speech quality. The participant group consisted of 31 individuals with moderate high-frequency hearing loss ranging in age from 51 to 89 years (mean = 69.6 years). Speech intelligibility and speech quality were measured using low-context sentences presented in babble at several signal-to-noise ratios. Speech stimuli were processed with a binary mask noise-suppression strategy with systematic manipulations of two parameters (error rate and attenuation values). The cumulative effects of signal modification produced by babble and signal processing were quantified using an envelope-distortion metric. Working memory capacity was assessed with a reading span test. Analysis of variance was used to determine the effects of signal processing parameters on perceptual scores. Hierarchical linear modeling was used to determine the role of degree of hearing loss and working memory capacity in individual listener response to the processed noisy speech. The model also considered improvements in envelope fidelity caused by the binary mask and the degradations to envelope caused by error and noise. The participants showed significant benefits in terms of intelligibility scores and quality ratings for noisy speech processed by the ideal binary mask noise-suppression strategy. This benefit was observed across a range of signal-to-noise ratios and persisted when up to a 30% error rate was introduced into the processing. Average intelligibility scores and average quality ratings were well predicted by an objective metric of envelope fidelity. Degree of hearing loss and working memory capacity were significant factors in explaining individual listener's intelligibility scores for binary mask processing applied to speech in babble. Degree of hearing loss and working memory capacity did not predict listeners' quality ratings. The results indicate that envelope fidelity is a primary factor in determining the combined effects of noise and binary mask processing for intelligibility and quality of speech presented in babble noise. Degree of hearing loss and working memory capacity are significant factors in explaining variability in listeners' speech intelligibility scores but not in quality ratings.

  5. Recognition of speech in noise after application of time-frequency masks: Dependence on frequency and threshold parameters

    PubMed Central

    Sinex, Donal G.

    2013-01-01

    Binary time-frequency (TF) masks can be applied to separate speech from noise. Previous studies have shown that with appropriate parameters, ideal TF masks can extract highly intelligible speech even at very low speech-to-noise ratios (SNRs). Two psychophysical experiments provided additional information about the dependence of intelligibility on the frequency resolution and threshold criteria that define the ideal TF mask. Listeners identified AzBio Sentences in noise, before and after application of TF masks. Masks generated with 8 or 16 frequency bands per octave supported nearly-perfect identification. Word recognition accuracy was slightly lower and more variable with 4 bands per octave. When TF masks were generated with a local threshold criterion of 0 dB SNR, the mean speech reception threshold was −9.5 dB SNR, compared to −5.7 dB for unprocessed sentences in noise. Speech reception thresholds decreased by about 1 dB per dB of additional decrease in the local threshold criterion. Information reported here about the dependence of speech intelligibility on frequency and level parameters has relevance for the development of non-ideal TF masks for clinical applications such as speech processing for hearing aids. PMID:23556604
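
    The ideal binary TF mask used here keeps a time-frequency cell when its local SNR exceeds a criterion (0 dB in one of the conditions above) and discards it otherwise. A minimal STFT-based sketch assuming scipy; it uses linear frequency bands rather than the study's bands-per-octave resolution, and assumes speech and noise signals of equal length.

      import numpy as np
      from scipy.signal import stft, istft

      def apply_ideal_binary_mask(speech, noise, fs, criterion_db=0.0, nperseg=512):
          """Apply an ideal binary mask (local SNR > criterion) to the mixture."""
          _, _, S = stft(speech, fs=fs, nperseg=nperseg)
          _, _, N = stft(noise, fs=fs, nperseg=nperseg)
          local_snr_db = 20 * np.log10((np.abs(S) + 1e-12) / (np.abs(N) + 1e-12))
          mask = (local_snr_db > criterion_db).astype(float)
          _, _, M = stft(speech + noise, fs=fs, nperseg=nperseg)
          _, out = istft(mask * M, fs=fs, nperseg=nperseg)
          return out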

  6. Speech training alters tone frequency tuning in rat primary auditory cortex

    PubMed Central

    Engineer, Crystal T.; Perez, Claudia A.; Carraway, Ryan S.; Chang, Kevin Q.; Roland, Jarod L.; Kilgard, Michael P.

    2013-01-01

    Previous studies in both humans and animals have documented improved performance following discrimination training. This enhanced performance is often associated with cortical response changes. In this study, we tested the hypothesis that long-term speech training on multiple tasks can improve primary auditory cortex (A1) responses compared to rats trained on a single speech discrimination task or experimentally naïve rats. Specifically, we compared the percent of A1 responding to trained sounds, the responses to both trained and untrained sounds, receptive field properties of A1 neurons, and the neural discrimination of pairs of speech sounds in speech trained and naïve rats. Speech training led to accurate discrimination of consonant and vowel sounds, but did not enhance A1 response strength or the neural discrimination of these sounds. Speech training altered tone responses in rats trained on six speech discrimination tasks but not in rats trained on a single speech discrimination task. Extensive speech training resulted in broader frequency tuning, shorter onset latencies, a decreased driven response to tones, and caused a shift in the frequency map to favor tones in the range where speech sounds are the loudest. Both the number of trained tasks and the number of days of training strongly predict the percent of A1 responding to a low frequency tone. Rats trained on a single speech discrimination task performed less accurately than rats trained on multiple tasks and did not exhibit A1 response changes. Our results indicate that extensive speech training can reorganize the A1 frequency map, which may have downstream consequences on speech sound processing. PMID:24344364

  7. Techniques for the Enhancement of Linear Predictive Speech Coding in Adverse Conditions

    NASA Astrophysics Data System (ADS)

    Wrench, Alan A.

    Available from UMI in association with The British Library. Requires signed TDF. The Linear Prediction model was first applied to speech two and a half decades ago. Since then it has been the subject of intense research and continues to be one of the principal tools in the analysis of speech. Its mathematical tractability makes it a suitable subject for study and its proven success in practical applications makes the study worthwhile. The model is known to be unsuited to speech corrupted by background noise. This has led many researchers to investigate ways of enhancing the speech signal prior to Linear Predictive analysis. In this thesis this body of work is extended. The chosen application is low bit-rate (2.4 kbits/sec) speech coding. For this task the performance of the Linear Prediction algorithm is crucial because there is insufficient bandwidth to encode the error between the modelled speech and the original input. A review of the fundamentals of Linear Prediction and an independent assessment of the relative performance of methods of Linear Prediction modelling are presented. A new method is proposed which is fast and facilitates stability checking, however, its stability is shown to be unacceptably poorer than existing methods. A novel supposition governing the positioning of the analysis frame relative to a voiced speech signal is proposed and supported by observation. The problem of coding noisy speech is examined. Four frequency domain speech processing techniques are developed and tested. These are: (i) Combined Order Linear Prediction Spectral Estimation; (ii) Frequency Scaling According to an Aural Model; (iii) Amplitude Weighting Based on Perceived Loudness; (iv) Power Spectrum Squaring. These methods are compared with the Recursive Linearised Maximum a Posteriori method. Following on from work done in the frequency domain, a time domain implementation of spectrum squaring is developed. In addition, a new method of power spectrum estimation is developed based on the Minimum Variance approach. This new algorithm is shown to be closely related to Linear Prediction but produces slightly broader spectral peaks. Spectrum squaring is applied to both the new algorithm and standard Linear Prediction and their relative performance is assessed. (Abstract shortened by UMI.).
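
    The autocorrelation formulation of Linear Prediction on which this thesis builds is conventionally solved with the Levinson-Durbin recursion. A self-contained sketch (not the thesis code), checked on a synthetic second-order autoregressive signal:

      import numpy as np

      def lpc_autocorr(x, order):
          """LPC coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k]."""
          r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
          a = np.zeros(order)
          err = r[0]
          for i in range(order):
              k = (r[i + 1] - a[:i] @ r[i:0:-1]) / err  # reflection coefficient
              a_prev = a[:i].copy()
              a[i] = k
              a[:i] = a_prev - k * a_prev[::-1]
              err *= 1.0 - k * k
          return a

      # Quick check on a synthetic AR(2) process: expect roughly [1.3, -0.6]
      rng = np.random.default_rng(0)
      e = rng.standard_normal(4000)
      x = np.zeros(4000)
      for n in range(2, 4000):
          x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + e[n]
      print(lpc_autocorr(x[500:], 2))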

  8. Fundamental frequency discrimination and speech perception in noise in cochlear implant simulations

    PubMed Central

    Carroll, Jeff; Zeng, Fan-Gang

    2007-01-01

    Increasing the number of channels at low frequencies improves discrimination of fundamental frequency (F0) in cochlear implants [Geurts and Wouters 2004]. We conducted three experiments to test whether improved F0 discrimination can be translated into increased speech intelligibility in noise in a cochlear implant simulation. The first experiment measured F0 discrimination and speech intelligibility in quiet as a function of channel density over different frequency regions. The results from this experiment showed a tradeoff in performance between F0 discrimination and speech intelligibility with a limited number of channels. The second experiment tested whether improved F0 discrimination and optimizing this tradeoff could improve speech performance with a competing talker. However, improved F0 discrimination did not improve speech intelligibility in noise. The third experiment identified the critical number of channels needed at low frequencies to improve speech intelligibility in noise. The result showed that, while 16 channels below 500 Hz were needed to observe any improvement in speech intelligibility in noise, even 32 channels did not achieve normal performance. Theoretically, these results suggest that without accurate spectral coding, F0 discrimination and speech perception in noise are two independent processes. Practically, the present results illustrate the need to increase the number of independent channels in cochlear implants. PMID:17604581

  9. Cochlear Dead Regions in Typical Hearing Aid Candidates: Prevalence and Implications for Use of High-Frequency Speech Cues

    PubMed Central

    Cox, Robyn M; Alexander, Genevieve C; Johnson, Jani; Rivera, Izel

    2011-01-01

    We investigated the prevalence of cochlear dead regions in listeners with hearing losses similar to those of many hearing aid wearers, and explored the impact of these dead regions on speech perception. Prevalence of dead regions was assessed using the Threshold Equalizing Noise test (TEN(HL)). Speech recognition was measured using high-frequency emphasis (HFE) Quick Speech In Noise (QSIN) test stimuli and low-pass filtered HFE QSIN stimuli. About one third of subjects tested positive for a dead region at one or more frequencies. Also, groups without and with dead regions both benefited from additional high-frequency speech cues. PMID:21522068

  10. Recognizing Whispered Speech Produced by an Individual with Surgically Reconstructed Larynx Using Articulatory Movement Data

    PubMed Central

    Cao, Beiming; Kim, Myungjong; Mau, Ted; Wang, Jun

    2017-01-01

    Individuals with an impaired larynx (vocal folds) have problems controlling their glottal vibration, producing whispered speech with extreme hoarseness. Standard automatic speech recognition using only acoustic cues is typically ineffective for whispered speech because the corresponding spectral characteristics are distorted. Articulatory cues such as tongue and lip motion may help in recognizing whispered speech since articulatory motion patterns are generally not affected. In this paper, we investigated whispered speech recognition for patients with a reconstructed larynx using articulatory movement data. A data set with both acoustic and articulatory motion data was collected from a patient with a surgically reconstructed larynx using an electromagnetic articulograph. Two speech recognition systems, Gaussian mixture model-hidden Markov model (GMM-HMM) and deep neural network-HMM (DNN-HMM), were used in the experiments. Experimental results showed that adding either tongue or lip motion data to acoustic features such as mel-frequency cepstral coefficients (MFCC) significantly reduced the phone error rates on both speech recognition systems. Adding both tongue and lip data achieved the best performance. PMID:29423453

  11. Preservation of propositional speech in a pure anomic: the importance of an abstract vocabulary.

    PubMed

    Crutch, Sebastian J; Warrington, Elizabeth K

    2003-12-01

    We describe a detailed quantitative analysis of the propositional speech of a patient, FAV, who became severely anomic following a left occipito-temporal infarction. FAV showed a selective noun retrieval deficit in naming to confrontation and from verbal description. Nonetheless, his propositional speech was fluent and content-rich. To quantify this observation, three picture description-based tasks were designed to elicit spontaneous speech. These were pictures of professional occupations, real world scenes and stylised object scenes. FAV's performance was compared and contrasted with that of 5 age- and sex-matched control subjects on a number of variables including speech production rate, volume of output, pause frequency and duration, word frequency, word concreteness and diversity of vocabulary used. FAV's propositional speech fell within the range of normal control performance on the majority of measurements of quality, quantity and fluency. Only in the narrative tasks which relied more heavily upon a concrete vocabulary did FAV become less voluble and resort to summarising the scenes. This dissociation between virtually intact propositional speech and a severe naming deficit represents the purest case of anomia currently on record. We attribute this dissociation in part to the preservation of his ability to retrieve his abstract word vocabulary. Our account demonstrates that poor performance on standard naming tasks may be indicative of only a narrowly defined word retrieval deficit. However, we also propose the existence of a feedback circuit which guides sentence construction by providing information regarding lexical availability.

  12. English speech sound development in preschool-aged children from bilingual English-Spanish environments.

    PubMed

    Gildersleeve-Neumann, Christina E; Kester, Ellen S; Davis, Barbara L; Peña, Elizabeth D

    2008-07-01

    English speech acquisition by typically developing 3- to 4-year-old children with monolingual English was compared to English speech acquisition by typically developing 3- to 4-year-old children with bilingual English-Spanish backgrounds. We predicted that exposure to Spanish would not affect the English phonetic inventory but would increase error frequency and type in bilingual children. Single-word speech samples were collected from 33 children. Phonetically transcribed samples for the 3 groups (monolingual English children, English-Spanish bilingual children who were predominantly exposed to English, and English-Spanish bilingual children with relatively equal exposure to English and Spanish) were compared at 2 time points and for change over time for phonetic inventory, phoneme accuracy, and error pattern frequencies. Children demonstrated similar phonetic inventories. Some bilingual children produced Spanish phonemes in their English and produced few consonant cluster sequences. Bilingual children with relatively equal exposure to English and Spanish averaged more errors than did bilingual children who were predominantly exposed to English. Both bilingual groups showed higher error rates than English-only children overall, particularly for syllable-level error patterns. All language groups decreased in some error patterns, although the ones that decreased were not always the same across language groups. Some group differences of error patterns and accuracy were significant. Vowel error rates did not differ by language group. Exposure to English and Spanish may result in a higher English error rate in typically developing bilinguals, including the application of Spanish phonological properties to English. Slightly higher error rates are likely typical for bilingual preschool-aged children. Change over time at these time points for all 3 groups was similar, suggesting that all will reach an adult-like system in English with exposure and practice.

  13. Systematic studies of modified vocalization: speech production changes during a variation of metronomic speech in persons who do and do not stutter.

    PubMed

    Davidow, Jason H; Bothe, Anne K; Ye, Jun

    2011-06-01

    The most common way to induce fluency using rhythm requires persons who stutter to speak one syllable or one word to each beat of a metronome, but stuttering can also be eliminated when the stimulus is of a particular duration (e.g., 1 second [s]). The present study examined stuttering frequency, speech production changes, and speech naturalness during rhythmic speech that alternated 1s of reading with 1s of silence. A repeated-measures design was used to compare data obtained during a control reading condition and during rhythmic reading in 10 persons who stutter (PWS) and 10 normally fluent controls. Ratings for speech naturalness were also gathered from naïve listeners. Results showed that mean vowel duration increased significantly, and the percentage of short phonated intervals decreased significantly, for both groups from the control to the experimental condition. Mean phonated interval length increased significantly for the fluent controls. Mean speech naturalness ratings during the experimental condition were approximately "7" on a 1-9 scale (1=highly natural; 9=highly unnatural), and these ratings were significantly correlated with vowel duration and phonated intervals for PWS. The findings indicate that PWS may be altering vocal fold vibration duration to obtain fluency during this rhythmic speech style, and that vocal fold vibration duration may have an impact on speech naturalness during rhythmic speech. Future investigations should examine speech production changes and speech naturalness during variations of this rhythmic condition. The reader will be able to: (1) describe changes (from a control reading condition) in speech production variables when alternating between 1s of reading and 1s of silence, (2) describe which rhythmic conditions have been found to sound and feel the most natural, (3) describe methodological issues for studies about alterations in speech production variables during fluency-inducing conditions, and (4) describe which fluency-inducing conditions have been shown to involve a reduction in short phonated intervals. Copyright © 2011 Elsevier Inc. All rights reserved.
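
    Phonated intervals recur throughout these records as the fluency-related measure of interest. As a rough illustration of how they can be measured, the sketch below treats contiguous above-threshold short-time energy as phonation and tallies the percentage of "short" intervals; the 10 ms frame, the 10% energy threshold, the 30-150 ms short-PI range, and the toy on/off signal are all illustrative assumptions rather than the (typically accelerometer-based) instrumentation used in these studies.

    ```python
    import numpy as np

    def phonated_intervals(x, fs, frame_ms=10.0, thresh_ratio=0.1):
        """Durations (ms) of contiguous frames whose RMS energy exceeds threshold."""
        frame = int(fs * frame_ms / 1000)
        n = len(x) // frame
        rms = np.array([np.sqrt(np.mean(x[i * frame:(i + 1) * frame] ** 2))
                        for i in range(n)])
        voiced = rms > thresh_ratio * rms.max()
        durations, run = [], 0
        for v in voiced:
            if v:
                run += 1
            elif run:
                durations.append(run * frame_ms)
                run = 0
        if run:
            durations.append(run * frame_ms)
        return np.array(durations)

    # Toy signal: a 120 Hz "voice" gated on and off at 4 Hz (125 ms bursts).
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 120 * t) * (np.sin(2 * np.pi * 4 * t) > 0)

    pis = phonated_intervals(x, fs)
    short_pct = np.mean((pis >= 30) & (pis <= 150)) * 100  # assumed "short" range
    print(f"{len(pis)} PIs, {short_pct:.0f}% short")
    ```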

  14. Effect of the loss of auditory feedback on segmental parameters of vowels of postlingually deafened speakers.

    PubMed

    Schenk, Barbara S; Baumgartner, Wolf Dieter; Hamzavi, Jafar Sasan

    2003-12-01

    The most obvious and best documented changes in the speech of postlingually deafened speakers are in rate, fundamental frequency, and volume (energy). These changes are due to the lack of auditory feedback. But auditory feedback affects not only the suprasegmental parameters of speech. The aim of this study was to determine the change at the segmental level of speech in terms of vowel formants. Twenty-three postlingually deafened and 18 normally hearing speakers were recorded reading a German text. The frequencies of the first and second formants and the vowel spaces of selected vowels in a word-in-context condition were compared. All first formant frequencies (F1) of the postlingually deafened speakers were significantly different from those of the normally hearing speakers. The values of F1 were higher for the vowels /e/ (418±61 Hz compared with 359±52 Hz, P=0.006) and /o/ (459±58 Hz compared with 390±45 Hz, P=0.0003) and lower for /a/ (765±115 Hz compared with 851±146 Hz, P=0.038). The second formant frequency (F2) showed a significant decrease only for the vowel /e/ (2016±347 Hz compared with 2279±250 Hz, P=0.012). The postlingually deafened speakers were divided into two subgroups according to duration of deafness (shorter/longer than 10 years of deafness). There was no significant difference in formant changes between the two groups. Our report demonstrates that auditory feedback also affects segmental features of the speech of postlingually deafened people.

  15. Effects of low harmonics on tone identification in natural and vocoded speech.

    PubMed

    Liu, Chang; Azimi, Behnam; Tahmina, Qudsia; Hu, Yi

    2012-11-01

    This study investigated the contribution of low-frequency harmonics to identifying Mandarin tones in natural and vocoded speech in quiet and noisy conditions. Results showed that low-frequency harmonics of natural speech led to highly accurate tone identification; however, for vocoded speech, low-frequency harmonics yielded lower tone identification than stimuli with full harmonics, except for tone 4. Analysis of the correlation between tone accuracy and the amplitude-F0 correlation index suggested that "more" speech contents (i.e., more harmonics) did not necessarily yield better tone recognition for vocoded speech, especially when the amplitude contour of the signals did not co-vary with the F0 contour.
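
    The amplitude-F0 correlation index invoked above can be read, in its simplest form, as a frame-by-frame Pearson correlation between a token's amplitude contour and its F0 contour. The sketch below uses synthetic contours as placeholders; a real analysis would extract both from recordings with a pitch tracker, and the paper's exact index definition may differ.

    ```python
    import numpy as np

    frames = 100
    f0 = np.linspace(220, 110, frames)            # falling F0 contour (tone-4-like)
    amp = 0.8 * (f0 / f0.max()) + 0.05 * np.random.randn(frames)  # co-varying amplitude

    index = np.corrcoef(amp, f0)[0, 1]            # near 1 when amplitude tracks F0
    print(f"amplitude-F0 correlation index: {index:.2f}")
    ```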

  16. Comparison of different speech tasks among adults who stutter and adults who do not stutter

    PubMed Central

    Ritto, Ana Paula; Costa, Julia Biancalana; Juste, Fabiola Staróbole; de Andrade, Claudia Regina Furquim

    2016-01-01

    OBJECTIVES: In this study, we compared the performance of both fluent speakers and people who stutter in three different speaking situations: monologue speech, oral reading and choral reading. This study follows the assumption that the neuromotor control of speech can be influenced by external auditory stimuli in both speakers who stutter and speakers who do not stutter. METHOD: Seventeen adults who stutter and seventeen adults who do not stutter were assessed in three speaking tasks: monologue, oral reading (solo reading aloud) and choral reading (reading in unison with the evaluator). Speech fluency and rate were measured for each task. RESULTS: The participants who stuttered had a lower frequency of stuttering during choral reading than during monologue and oral reading. CONCLUSIONS: According to the dual premotor system model, choral speech enhanced fluency by providing external cues for the timing of each syllable, compensating for deficient internal cues. PMID:27074176

  17. Effect of Fundamental Frequency on Judgments of Electrolaryngeal Speech

    ERIC Educational Resources Information Center

    Nagle, Kathy F.; Eadie, Tanya L.; Wright, Derek R.; Sumida, Yumi A.

    2012-01-01

    Purpose: To determine (a) the effect of fundamental frequency (f0) on speech intelligibility, acceptability, and perceived gender in electrolaryngeal (EL) speakers, and (b) the effect of known gender on speech acceptability in EL speakers. Method: A 2-part study was conducted. In Part 1, 34 healthy adults provided speech recordings using…

  18. Suppressed Alpha Oscillations Predict Intelligibility of Speech and its Acoustic Details

    PubMed Central

    Weisz, Nathan

    2012-01-01

    Modulations of human alpha oscillations (8–13 Hz) accompany many cognitive processes, but their functional role in auditory perception has proven elusive: Do oscillatory dynamics of alpha reflect acoustic details of the speech signal, and are they indicative of comprehension success? Acoustically presented words were degraded in acoustic envelope and spectrum in an orthogonal design, and electroencephalogram responses in the frequency domain were analyzed in 24 participants, who rated word comprehensibility after each trial. First, the alpha power suppression during and after a degraded word depended monotonically on spectral and, to a lesser extent, envelope detail. The magnitude of this alpha suppression exhibited an additional and independent influence on later comprehension ratings. Second, source localization of alpha suppression yielded superior parietal, prefrontal, as well as anterior temporal brain areas. Third, multivariate classification of the time–frequency pattern across participants showed that patterns of late posterior alpha power best supported above-chance classification of word intelligibility. Results suggest that both the magnitude and topography of late alpha suppression in response to single words can indicate a listener's sensitivity to acoustic features and the ability to comprehend speech under adverse listening conditions. PMID:22100354

  19. Audibility-based predictions of speech recognition for children and adults with normal hearing.

    PubMed

    McCreery, Ryan W; Stelmachowicz, Patricia G

    2011-12-01

    This study investigated the relationship between audibility and predictions of speech recognition for children and adults with normal hearing. The Speech Intelligibility Index (SII) is used to quantify the audibility of speech signals and can be applied to transfer functions to predict speech recognition scores. Although the SII is used clinically with children, relatively few studies have evaluated SII predictions of children's speech recognition directly. Children have required more audibility than adults to reach maximum levels of speech understanding in previous studies. Furthermore, children may require greater bandwidth than adults for optimal speech understanding, which could influence frequency-importance functions used to calculate the SII. Speech recognition was measured for 116 children and 19 adults with normal hearing. Stimulus bandwidth and background noise level were varied systematically in order to evaluate speech recognition as predicted by the SII and derive frequency-importance functions for children and adults. Results suggested that children required greater audibility to reach the same level of speech understanding as adults. However, differences in performance between adults and children did not vary across frequency bands. © 2011 Acoustical Society of America
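
    For orientation, the SII reduces to a weighted sum of band audibilities (the sum over frequency bands of band importance times band audibility). The sketch below uses the common (SNR + 15)/30 approximation for band audibility, clipped to [0, 1]; the band levels and importance weights are made-up placeholders, not the ANSI S3.5 tables or the children's frequency-importance functions derived in this study.

    ```python
    import numpy as np

    importance = np.array([0.10, 0.15, 0.25, 0.25, 0.15, 0.10])  # band weights, sum to 1
    speech_db = np.array([55, 58, 60, 57, 50, 45])               # band speech levels (dB)
    noise_db = np.array([50, 48, 40, 35, 38, 42])                # band noise levels (dB)

    audibility = np.clip((speech_db - noise_db + 15) / 30, 0, 1) # per-band audibility
    sii = float(np.sum(importance * audibility))
    print(f"SII = {sii:.2f}")  # ranges from 0 (inaudible) to 1 (fully audible)
    ```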

  20. The Effect of Syllable Repetition Rate on Vocal Characteristics

    ERIC Educational Resources Information Center

    Topbas, Oya; Orlikoff, Robert F.; St. Louis, Kenneth O.

    2012-01-01

    This study examined whether mean vocal fundamental frequency (F0) or speech sound pressure level (SPL) varies with changes in syllable repetition rate. Twenty-four young adults (12 M and 12 F) repeated the syllables /pʌ/, /pʌtə/, and /pʌtəkə/ at a modeled "slow" rate of approximately one…

  1. Discrimination of static and dynamic spectral patterns by children and young adults in relationship to speech perception in noise.

    PubMed

    Rayes, Hanin; Sheft, Stanley; Shafiro, Valeriy

    2014-01-01

    Past work has shown a relationship between the ability to discriminate spectral patterns and measures of speech intelligibility. The purpose of this study was to investigate the ability of both children and young adults to discriminate static and dynamic spectral patterns, comparing performance between the two groups and evaluating within-group results in terms of relationship to speech-in-noise perception. Data were collected from normal-hearing children (age range: 5.4-12.8 yrs) and young adults (mean age: 22.8 yrs) on two spectral discrimination tasks and speech-in-noise perception. The first discrimination task, involving static spectral profiles, measured the ability to detect a change in the phase of a low-density sinusoidal spectral ripple of wideband noise. Using dynamic spectral patterns, the second task determined the signal-to-noise ratio needed to discriminate the temporal pattern of frequency fluctuation imposed by stochastic low-rate frequency modulation (FM). Children performed significantly more poorly than young adults on both discrimination tasks. For children, a significant correlation between speech-in-noise perception and spectral-pattern discrimination was obtained only with the dynamic patterns of the FM condition, with partial correlation suggesting that factors related to the children's age mediated the relationship.
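
    The static stimulus described above can be approximated by imposing a sinusoidal ripple, in dB on a log-frequency axis, onto the spectrum of wideband noise; the discrimination cue is then a shift in ripple phase. In the sketch below the ripple density, depth, and passband are illustrative choices, not the study's exact stimulus parameters.

    ```python
    import numpy as np

    fs, dur = 44100, 0.5
    n = int(fs * dur)
    freqs = np.fft.rfftfreq(n, 1 / fs)

    def rippled_noise(ripple_per_octave=1.0, phase=0.0, f_lo=100, f_hi=10000):
        rng = np.random.default_rng(0)
        spec = rng.standard_normal(len(freqs)) + 1j * rng.standard_normal(len(freqs))
        band = (freqs >= f_lo) & (freqs <= f_hi)
        octaves = np.where(band, np.log2(np.maximum(freqs, f_lo) / f_lo), 0.0)
        ripple_db = 10 * np.sin(2 * np.pi * ripple_per_octave * octaves + phase)
        spec = spec * band * 10 ** (ripple_db / 20)   # shape and band-limit the spectrum
        return np.fft.irfft(spec, n)

    standard = rippled_noise(phase=0.0)
    inverted = rippled_noise(phase=np.pi)  # phase-shifted ripple the listener must detect
    ```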

  2. Speech disorders in Israeli Arab children.

    PubMed

    Jaber, L; Nahmani, A; Shohat, M

    1997-10-01

    The aim of this work was to study the frequency of speech disorders in Israeli Arab children and its association with parental consanguinity. A questionnaire was sent to the parents of 1,495 Arab children attending kindergarten and the first two grades of the seven primary schools in the town of Taibe. Eighty-six percent (1,282 parents) responded. The answers to the questionnaire revealed that 25% of the children reportedly had a speech and language disorder. Of the children identified by their parents as having a speech disorder, 44 were selected randomly for examination by a speech specialist. The disorders noted in this subgroup included errors in articulation (48.0%), poor language (18%), poor voice quality (15.9%), stuttering (13.6%), and other problems (4.5%). Rates of affected children of consanguineous and non-consanguineous marriages were 31% and 22.4%, respectively (p < 0.01). We conclude that speech disorders are an important problem among Israeli Arab schoolchildren. More comprehensive programs are needed to facilitate diagnosis and treatment.

  3. Optimizing the Combination of Acoustic and Electric Hearing in the Implanted Ear

    PubMed Central

    Karsten, Sue A.; Turner, Christopher W.; Brown, Carolyn J.; Jeon, Eun Kyung; Abbas, Paul J.; Gantz, Bruce J.

    2016-01-01

    Objectives The aim of this study was to determine an optimal approach to program combined acoustic plus electric (A+E) hearing devices in the same ear to maximize speech-recognition performance. Design Ten participants with at least 1 year of experience using Nucleus Hybrid (short electrode) A+E devices were evaluated across three different fitting conditions that varied in the frequency ranges assigned to the acoustically and electrically presented portions of the spectrum. Real-ear measurements were used to optimize the acoustic component for each participant, and the acoustic stimulation was then held constant across conditions. The lower boundary of the electric frequency range was systematically varied to create three conditions with respect to the upper boundary of the acoustic spectrum: Meet, Overlap, and Gap programming. Consonant recognition in quiet and speech recognition in competing-talker babble were evaluated after participants were given the opportunity to adapt by using the experimental programs in their typical everyday listening situations. Participants provided subjective ratings and evaluations for each fitting condition. Results There were no significant differences in performance between conditions (Meet, Overlap, Gap) for consonant recognition in quiet. A significant decrement in performance was measured for the Overlap fitting condition for speech recognition in babble. Subjective ratings indicated a significant preference for the Meet fitting regimen. Conclusions Participants using the Hybrid ipsilateral A+E device generally performed better when the acoustic and electric spectra were programmed to meet at a single frequency region, as opposed to a gap or overlap. Although there is no particular advantage for the Meet fitting strategy for recognition of consonants in quiet, the advantage becomes evident for speech recognition in competing-talker babble and in patient preferences. PMID:23059851

  4. The Dynamic Range for Korean Standard Sentence Material: A Gender Comparison between a Male and a Female Speaker.

    PubMed

    Park, Kyeong-Yeon; Jin, In-Ki

    2015-09-01

    The purpose of this study was to identify differences between the dynamic ranges (DRs) of male and female speakers using Korean standard sentence material. Consideration was especially given to effects within predefined frequency bands. We used Korean standard sentence lists for adults as stimuli. Each sentence was normalized to a root-mean-square level of 65 dB sound pressure level. The sentences were then modified to ensure there were no pauses, and the modified sentences were passed through a filter bank in order to perform the frequency analysis. Finally, the DR was quantified using a histogram that showed the cumulative envelope distribution levels of the speech in each frequency band. In DRs that were averaged across all frequency bands, there were no significant differences between the male and the female speakers. However, when considering effects within the predefined frequency bands, there were significant differences in several bands between the DRs of male speech and those of female speech. This study shows that the DR of speech for the male speaker differed from that of the female speaker in nine of the 21 frequency bands. These observed differences suggest that a standardized DR of male speech in the band-audibility function of the speech intelligibility index may differ from that of female speech derived in the same way. Further studies are required to derive standardized DRs for Korean speakers.
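
    A rough sketch of the band-wise dynamic-range estimate described above: filter the speech into a band, take its envelope, and read the DR off the envelope-level distribution. The 1st/99th percentile cut-offs, the filter order, and the noise stand-in for a sentence recording are assumptions; published DR definitions differ in these details.

    ```python
    import numpy as np
    from scipy.signal import butter, hilbert, sosfilt

    def band_dynamic_range(x, fs, f_lo, f_hi, lo_pct=1, hi_pct=99):
        sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
        env = np.abs(hilbert(sosfilt(sos, x)))          # band envelope
        level_db = 20 * np.log10(env + 1e-12)           # envelope level in dB
        return np.percentile(level_db, hi_pct) - np.percentile(level_db, lo_pct)

    fs = 16000
    x = np.random.randn(3 * fs)  # stand-in for a level-normalized sentence
    print(f"DR in 1-2 kHz band: {band_dynamic_range(x, fs, 1000, 2000):.1f} dB")
    ```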

  5. The Importance of Production Frequency in Therapy for Childhood Apraxia of Speech

    ERIC Educational Resources Information Center

    Edeal, Denice Michelle; Gildersleeve-Neumann, Christina Elke

    2011-01-01

    Purpose: This study explores the importance of production frequency during speech therapy to determine whether more practice of speech targets leads to increased performance within a treatment session, as well as to motor learning, in the form of generalization to untrained words. Method: Two children with childhood apraxia of speech were treated…

  6. The Effectiveness of SpeechEasy during Situations of Daily Living

    ERIC Educational Resources Information Center

    O'Donnell, Jennifer J.; Armson, Joy; Kiefte, Michael

    2008-01-01

    A multiple single-subject design was used to examine the effects of SpeechEasy on stuttering frequency in the laboratory and in longitudinal samples of speech produced in situations of daily living (SDL). Seven adults who stutter participated, all of whom had exhibited at least 30% reduction in stuttering frequency while using SpeechEasy during…

  7. fMRI reveals two distinct cerebral networks subserving speech motor control.

    PubMed

    Riecker, A; Mathiak, K; Wildgruber, D; Erb, M; Hertrich, I; Grodd, W; Ackermann, H

    2005-02-22

    There are few data on the cerebral organization of motor aspects of speech production and the pathomechanisms of dysarthric deficits subsequent to brain lesions and diseases. The authors used fMRI to further examine the neural basis of speech motor control. In eight healthy volunteers, fMRI was performed during syllable repetitions synchronized to click trains (2 to 6 Hz; vs a passive listening task). Bilateral hemodynamic responses emerged at the level of the mesiofrontal and sensorimotor cortex, putamen/pallidum, thalamus, and cerebellum (two distinct activation spots at either side). In contrast, dorsolateral premotor cortex and anterior insula showed left-sided activation. Calculation of rate/response functions revealed a negative linear relationship between repetition frequency and blood oxygen level-dependent (BOLD) signal change within the striatum, whereas both cerebellar hemispheres exhibited a step-wise increase of activation at approximately 3 Hz. Analysis of the temporal dynamics of the BOLD effect found the various cortical and subcortical brain regions engaged in speech motor control to be organized into two separate networks (medial and dorsolateral premotor cortex, anterior insula, and superior cerebellum vs sensorimotor cortex, basal ganglia, and inferior cerebellum). These data provide evidence for two levels of speech motor control bound, most presumably, to motor preparation and execution processes. They also help to explain clinical observations such as an unimpaired or even accelerated speaking rate in Parkinson disease and slowed speech tempo, which does not fall below a rate of 3 Hz, in cerebellar disorders.

  8. [Influence of human personal features on acoustic correlates of speech emotional intonation characteristics].

    PubMed

    Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M

    2009-01-01

    A comparative study of acoustic correlates of emotional intonation was conducted on two types of speech material: sensible speech utterances and short meaningless words. The corpus of speech signals of different emotional intonations (happy, angry, frightened, sad and neutral) was created using the actor's method of simulation of emotions. Native Russian 20-70-year-old speakers (both professional actors and non-actors) participated in the study. In the corpus, the following characteristics were analyzed: mean values and standard deviations of the power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparison of each emotional intonation with "neutral" utterances showed the greatest deviations in the fundamental frequency and the frequencies of the first formant. The direction of these deviations was independent of the semantic content of the speech utterance, its duration, and the speaker's age, gender, and actor or non-actor status, though the personal features of the speakers affected the absolute values of these frequencies.

  9. Design of a robust baseband LPC coder for speech transmission over 9.6 kbit/s noisy channels

    NASA Astrophysics Data System (ADS)

    Viswanathan, V. R.; Russell, W. H.; Higgins, A. L.

    1982-04-01

    This paper describes the design of a baseband Linear Predictive Coder (LPC) which transmits speech over 9.6 kbit/sec synchronous channels with random bit errors of up to 1%. Presented are the results of our investigation of a number of aspects of the baseband LPC coder with the goal of maximizing the quality of the transmitted speech. Important among these aspects are: bandwidth of the baseband, coding of the baseband residual, high-frequency regeneration, and error protection of important transmission parameters. The paper discusses these and other issues, presents the results of speech-quality tests conducted during the various stages of optimization, and describes the details of the optimized speech coder. This optimized speech coding algorithm has been implemented as a real-time full-duplex system on an array processor. Informal listening tests of the real-time coder have shown that the coder produces good speech quality in the absence of channel bit errors and introduces only a slight degradation in quality for channel bit error rates of up to 1%.
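
    The analysis step at the core of any LPC coder is estimating predictor coefficients per frame, classically via the autocorrelation method and the Levinson-Durbin recursion. The sketch below shows only that step, with an illustrative order-10 model and a synthetic frame; residual coding, high-frequency regeneration, and error protection, the aspects this paper actually optimizes, sit on top of it.

    ```python
    import numpy as np

    def lpc(frame, order=10):
        """Autocorrelation-method LPC via Levinson-Durbin; returns (coeffs, error)."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        a = np.zeros(order + 1)
        a[0] = 1.0
        err = r[0]
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
            a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1][:i]  # symmetric coefficient update
            err *= 1 - k * k                                # prediction error shrinks
        return a, err

    fs = 8000
    t = np.arange(240) / fs                                  # one 30 ms frame
    frame = np.sin(2 * np.pi * 500 * t) + 0.01 * np.random.randn(len(t))
    coeffs, residual_energy = lpc(frame)
    ```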

  10. The effect of instantaneous input dynamic range setting on the speech perception of children with the nucleus 24 implant.

    PubMed

    Davidson, Lisa S; Skinner, Margaret W; Holstad, Beth A; Fears, Beverly T; Richter, Marie K; Matusofsky, Margaret; Brenner, Christine; Holden, Timothy; Birath, Amy; Kettel, Jerrica L; Scollie, Susan

    2009-06-01

    The purpose of this study was to examine the effects of a wider instantaneous input dynamic range (IIDR) setting on speech perception and comfort in quiet and noise for children wearing the Nucleus 24 implant system and the Freedom speech processor. In addition, children's ability to understand soft and conversational level speech in relation to aided sound-field thresholds was examined. Thirty children (ages 7 to 17 years) with the Nucleus 24 cochlear implant system and the Freedom speech processor with two different IIDR settings (30 versus 40 dB) were tested on the Consonant Nucleus Consonant (CNC) word test at 50 and 60 dB SPL, the Bamford-Kowal-Bench Speech in Noise Test, and a loudness rating task for four-talker speech noise. Aided thresholds for frequency-modulated tones, narrowband noise, and recorded Ling sounds were obtained with the two IIDRs and examined in relation to CNC scores at 50 dB SPL. Speech Intelligibility Indices were calculated using the long-term average speech spectrum of the CNC words at 50 dB SPL measured at each test site and aided thresholds. Group mean CNC scores at 50 dB SPL with the 40-dB IIDR were significantly higher (p < 0.001) than with the 30-dB IIDR. Group mean CNC scores at 60 dB SPL, loudness ratings, and the signal-to-noise ratios for 50% correct performance (SNR-50) on the Bamford-Kowal-Bench Speech in Noise Test were not significantly different for the two IIDRs. Significantly improved aided thresholds at 250 to 6000 Hz as well as higher Speech Intelligibility Indices afforded improved audibility for speech presented at soft levels (50 dB SPL). These results indicate that an increased IIDR provides improved word recognition for soft levels of speech without compromising comfort at higher levels of speech sounds or sentence recognition in noise.

  11. Impact of cognitive function and dysarthria on spoken language and perceived speech severity in multiple sclerosis

    NASA Astrophysics Data System (ADS)

    Feenaughty, Lynda

    Purpose: The current study sought to investigate the separate effects of dysarthria and cognitive status on global speech timing, speech hesitation, and linguistic complexity characteristics, and how these speech behaviors influence listener impressions, for three connected speech tasks presumed to differ in cognitive-linguistic demand for four carefully defined speaker groups: 1) MS with cognitive deficits (MSCI), 2) MS with clinically diagnosed dysarthria and intact cognition (MSDYS), 3) MS without dysarthria or cognitive deficits (MS), and 4) healthy talkers (CON). The relationship between neuropsychological test scores and speech-language production and perceptual variables for speakers with cognitive deficits was also explored. Methods: 48 speakers participated, including 36 individuals reporting a neurological diagnosis of MS and 12 healthy talkers. The three MS groups and the control group each contained 12 speakers (8 women and 4 men). Cognitive function was quantified using standard clinical tests of memory, information processing speed, and executive function. A standard z-score of ≤ -1.50 indicated deficits in a given cognitive domain. Three certified speech-language pathologists determined the clinical diagnosis of dysarthria for speakers with MS. Experimental speech tasks of interest included audio-recordings of an oral reading of the Grandfather passage and two spontaneous speech samples in the form of Familiar and Unfamiliar descriptive discourse. Various measures of spoken language were of interest. Suprasegmental acoustic measures included speech and articulatory rate. Linguistic speech hesitation measures included pause frequency (i.e., silent and filled pauses), mean silent pause duration, grammatical appropriateness of pauses, and interjection frequency. For the two discourse samples, three standard measures of language complexity were obtained: subordination index, inter-sentence cohesion adequacy, and lexical diversity. Ten listeners judged each speech sample on the perceptual construct of Speech Severity using a visual analog scale. Additional measures obtained to describe participants included the Sentence Intelligibility Test (SIT), the 10-item Communication Participation Item Bank (CPIB), and standard biopsychosocial measures of depression (Beck Depression Inventory-Fast Screen; BDI-FS), fatigue (Fatigue Severity Scale; FSS), and overall disease severity (Expanded Disability Status Scale; EDSS). Healthy controls completed all measures, with the exception of the CPIB and EDSS. All data were analyzed using standard descriptive and parametric statistics. For the MSCI group, the relationship between neuropsychological test scores and speech-language variables was explored for each speech task using Pearson correlations. The relationship between neuropsychological test scores and Speech Severity also was explored. Results and Discussion: Topic familiarity for descriptive discourse did not strongly influence speech production or perceptual variables; however, results indicated predicted task-related differences for some spoken language measures. With the exception of the MSCI group, all speaker groups produced the same or slower global speech timing (i.e., speech and articulatory rates), more silent and filled pauses, more grammatically appropriate pauses, and longer silent pause durations in spontaneous discourse compared to reading aloud. Results revealed no appreciable task differences for linguistic complexity measures. Results indicated group differences for speech rate. 
The MSCI group produced significantly faster speech rates compared to the MSDYS group. Both the MSDYS and the MSCI groups were judged to have significantly poorer perceived Speech Severity compared to typically aging adults. The Task x Group interaction was only significant for the number of silent pauses. The MSDYS group produced fewer silent pauses in spontaneous speech and more silent pauses in the reading task compared to other groups. Finally, correlation analysis revealed moderate relationships between neuropsychological test scores and speech hesitation measures within the MSCI group. Slower information processing and poorer memory were significantly correlated with more silent pauses, and poorer executive function was associated with fewer filled pauses in the Unfamiliar discourse task. Results have both clinical and theoretical implications. Overall, clinicians should demonstrate caution when interpreting global measures of speech timing and perceptual measures in the absence of information about cognitive ability. Results also have implications for a comprehensive model of spoken language incorporating cognitive, linguistic, and motor variables.

  12. Electroacoustic verification of frequency modulation systems in cochlear implant users.

    PubMed

    Fidêncio, Vanessa Luisa Destro; Jacob, Regina Tangerino de Souza; Tanamati, Liége Franzini; Bucuvic, Érika Cristina; Moret, Adriane Lima Mortari

    2017-12-26

    The frequency modulation system is a device that helps to improve speech perception in noise and is considered the most beneficial approach to improving speech recognition in noise for cochlear implant users. According to guidelines, a check must be performed before fitting the frequency modulation system. Although there are recommendations regarding the behavioral tests that should be performed when fitting the frequency modulation system to cochlear implant users, there are no published recommendations regarding the electroacoustic test that should be performed. To perform and determine the validity of an electroacoustic verification test for frequency modulation systems coupled to different cochlear implant speech processors. The sample included 40 participants between 5 and 18 years of age, users of four different models of speech processors. For the electroacoustic evaluation, we used the Audioscan Verifit device with the HA-1 coupler and the listening check devices corresponding to each speech processor model. In cases where transparency was not achieved, the frequency modulation gain adjustment was modified, and the Brazilian version of the "Phrases in Noise Test" was used to evaluate speech perception in competing noise. Transparency between the frequency modulation system and the cochlear implant was observed in 85% of the participants evaluated. After adjusting the gain of the frequency modulation receiver in the remaining participants, the devices showed transparency when the electroacoustic verification test was repeated. Patients also demonstrated better performance in speech perception in noise after the new adjustment; that is, in these cases, electroacoustic transparency produced behavioral transparency. The suggested electroacoustic evaluation protocol was effective in evaluating transparency between the frequency modulation system and the cochlear implant. Adjusting the speech processor and the frequency modulation system gain is essential when fitting this device. Copyright © 2017 Associação Brasileira de Otorrinolaringologia e Cirurgia Cérvico-Facial. Published by Elsevier Editora Ltda. All rights reserved.

  13. Effects of stress on heart rate complexity—A comparison between short-term and chronic stress

    PubMed Central

    Schubert, C.; Lambertz, M.; Nelesen, R.A.; Bardwell, W.; Choi, J.-B.; Dimsdale, J.E.

    2009-01-01

    This study examined chronic and short-term stress effects on heart rate variability (HRV), comparing time, frequency and phase domain (complexity) measures in 50 healthy adults. The hassles frequency subscale of the combined hassles and uplifts scale (CHUS) was used to measure chronic stress. Short-term stressor reactivity was assessed with a speech task. HRV measures were determined via surface electrocardiogram (ECG). Because respiration rate decreased during the speech task (p < .001), this study assessed the influence of respiration rate changes on the effects of interest. A series of repeated-measures analyses of covariance (ANCOVA) with Bonferroni adjustment revealed that short-term stress decreased HR D2 (calculated via the pointwise correlation dimension PD2) (p < .001), but increased HR mean (p < .001), standard deviation of R–R (SDRR) intervals (p < .001), low (LF) (p < .001) and high frequency band power (HF) (p = .009). Respiratory sinus arrhythmia (RSA) and LF/HF ratio did not change under short-term stress. Partial correlation adjusting for respiration rate showed that HR D2 was associated with chronic stress (r = −.35, p = .019). Differential effects of chronic and short-term stress were observed on several HRV measures. HR D2 decreased under both stress conditions reflecting lowered functionality of the cardiac pacemaker. The results confirm the importance of complexity metrics in modern stress research on HRV. PMID:19100813

  14. Effects of stress on heart rate complexity--a comparison between short-term and chronic stress.

    PubMed

    Schubert, C; Lambertz, M; Nelesen, R A; Bardwell, W; Choi, J-B; Dimsdale, J E

    2009-03-01

    This study examined chronic and short-term stress effects on heart rate variability (HRV), comparing time, frequency and phase domain (complexity) measures in 50 healthy adults. The hassles frequency subscale of the combined hassles and uplifts scale (CHUS) was used to measure chronic stress. Short-term stressor reactivity was assessed with a speech task. HRV measures were determined via surface electrocardiogram (ECG). Because respiration rate decreased during the speech task (p<.001), this study assessed the influence of respiration rate changes on the effects of interest. A series of repeated-measures analyses of covariance (ANCOVA) with Bonferroni adjustment revealed that short-term stress decreased HR D2 (calculated via the pointwise correlation dimension PD2) (p<.001), but increased HR mean (p<.001), standard deviation of R-R (SDRR) intervals (p<.001), low (LF) (p<.001) and high frequency band power (HF) (p=.009). Respiratory sinus arrhythmia (RSA) and LF/HF ratio did not change under short-term stress. Partial correlation adjusting for respiration rate showed that HR D2 was associated with chronic stress (r=-.35, p=.019). Differential effects of chronic and short-term stress were observed on several HRV measures. HR D2 decreased under both stress conditions reflecting lowered functionality of the cardiac pacemaker. The results confirm the importance of complexity metrics in modern stress research on HRV.
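
    For reference, the conventional measures used in both versions of this record can be sketched compactly: SDRR in the time domain and LF/HF band power from a Welch spectrum of the evenly resampled R-R series. The synthetic R-R intervals and the 4 Hz resampling rate below are illustrative; the complexity measure PD2 requires dedicated nonlinear-dynamics tooling and is not shown.

    ```python
    import numpy as np
    from scipy.interpolate import interp1d
    from scipy.signal import welch

    rr = 0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * np.arange(300)) \
             + 0.02 * np.random.randn(300)              # R-R intervals (s), simulated
    sdrr_ms = np.std(rr) * 1000                         # time-domain HRV

    t = np.cumsum(rr)                                   # beat times
    fs = 4.0                                            # even resampling rate (Hz)
    ti = np.arange(t[0], t[-1], 1 / fs)
    rr_even = interp1d(t, rr)(ti)
    f, pxx = welch(rr_even - rr_even.mean(), fs=fs, nperseg=256)

    lf_band = (f >= 0.04) & (f < 0.15)
    hf_band = (f >= 0.15) & (f < 0.40)
    lf, hf = np.trapz(pxx[lf_band], f[lf_band]), np.trapz(pxx[hf_band], f[hf_band])
    print(f"SDRR = {sdrr_ms:.1f} ms, LF/HF = {lf / hf:.2f}")
    ```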

  15. Thermal welding vs. cold knife tonsillectomy: a comparison of voice and speech.

    PubMed

    Celebi, Saban; Yelken, Kursat; Celik, Oner; Taskin, Umit; Topak, Murat

    2011-01-01

    To compare acoustic, aerodynamic and perceptual voice and speech parameters in thermal welding system tonsillectomy and cold knife tonsillectomy patients in order to determine the impact of operation technique on voice and speech. Thirty tonsillectomy patients (22 children, 8 adults) participated in this study. The preferred technique was cold knife tonsillectomy in 15 patients and thermal welding system tonsillectomy in the remaining 15 patients. One week before and 1 month after surgery the following parameters were estimated: average fundamental frequency, jitter, shimmer, harmonics-to-noise ratio, and formant frequency analyses of sustained vowels. Perceptual speech analysis and aerodynamic measurements (maximum phonation time and s/z ratio) were also conducted. There was no significant difference in any of the parameters between the cold knife tonsillectomy and thermal welding system tonsillectomy groups (p>0.05). When preoperative and postoperative values were compared within each group, fundamental frequency was found to be significantly decreased after tonsillectomy in both groups (p<0.001). The first formant for the vowel /a/ in the cold knife tonsillectomy group and for the vowel /i/ in the thermal welding system tonsillectomy group, the second formant for the vowel /u/ in the thermal welding system tonsillectomy group, and the third formant for the vowel /u/ in the cold knife tonsillectomy group were found to be significantly decreased (p<0.05). The surgical technique, whether cold knife or thermal welding system, does not appear to affect voice and speech in tonsillectomy patients. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

  16. Frontal top-down signals increase coupling of auditory low-frequency oscillations to continuous speech in human listeners.

    PubMed

    Park, Hyojin; Ince, Robin A A; Schyns, Philippe G; Thut, Gregor; Gross, Joachim

    2015-06-15

    Humans show a remarkable ability to understand continuous speech even under adverse listening conditions. This ability critically relies on dynamically updated predictions of incoming sensory information, but exactly how top-down predictions improve speech processing is still unclear. Brain oscillations are a likely mechanism for these top-down predictions [1, 2]. Quasi-rhythmic components in speech are known to entrain low-frequency oscillations in auditory areas [3, 4], and this entrainment increases with intelligibility [5]. We hypothesize that top-down signals from frontal brain areas causally modulate the phase of brain oscillations in auditory cortex. We use magnetoencephalography (MEG) to monitor brain oscillations in 22 participants during continuous speech perception. We characterize prominent spectral components of speech-brain coupling in auditory cortex and use causal connectivity analysis (transfer entropy) to identify the top-down signals driving this coupling more strongly during intelligible speech than during unintelligible speech. We report three main findings. First, frontal and motor cortices significantly modulate the phase of speech-coupled low-frequency oscillations in auditory cortex, and this effect depends on intelligibility of speech. Second, top-down signals are significantly stronger for left auditory cortex than for right auditory cortex. Third, speech-auditory cortex coupling is enhanced as a function of stronger top-down signals. Together, our results suggest that low-frequency brain oscillations play a role in implementing predictive top-down control during continuous speech perception and that top-down control is largely directed at left auditory cortex. This suggests a close relationship between (left-lateralized) speech production areas and the implementation of top-down control in continuous speech perception. Copyright © 2015 The Authors. Published by Elsevier Ltd.. All rights reserved.
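
    As a minimal illustration of "speech-brain coupling," the sketch below computes magnitude-squared coherence between a speech amplitude envelope and one simulated cortical channel. The paper's analysis is far richer (source-localized MEG, phase measures, and directed transfer entropy); plain coherence is shown only as the simplest coupling estimate, and all signals here are synthetic.

    ```python
    import numpy as np
    from scipy.signal import coherence

    fs = 200                                     # assumed downsampled rate (Hz)
    t = np.arange(60 * fs) / fs
    envelope = 1 + np.sin(2 * np.pi * 4 * t)     # 4 Hz quasi-syllabic speech rhythm
    brain = 0.5 * np.sin(2 * np.pi * 4 * t + 0.8) + np.random.randn(len(t))

    f, cxy = coherence(envelope, brain, fs=fs, nperseg=4 * fs)
    band = (f >= 1) & (f <= 8)                   # delta/theta range
    print(f"peak low-frequency coherence: {cxy[band].max():.2f} "
          f"at {f[band][np.argmax(cxy[band])]:.2f} Hz")
    ```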

  17. Frontal Top-Down Signals Increase Coupling of Auditory Low-Frequency Oscillations to Continuous Speech in Human Listeners

    PubMed Central

    Park, Hyojin; Ince, Robin A.A.; Schyns, Philippe G.; Thut, Gregor; Gross, Joachim

    2015-01-01

    Summary Humans show a remarkable ability to understand continuous speech even under adverse listening conditions. This ability critically relies on dynamically updated predictions of incoming sensory information, but exactly how top-down predictions improve speech processing is still unclear. Brain oscillations are a likely mechanism for these top-down predictions [1, 2]. Quasi-rhythmic components in speech are known to entrain low-frequency oscillations in auditory areas [3, 4], and this entrainment increases with intelligibility [5]. We hypothesize that top-down signals from frontal brain areas causally modulate the phase of brain oscillations in auditory cortex. We use magnetoencephalography (MEG) to monitor brain oscillations in 22 participants during continuous speech perception. We characterize prominent spectral components of speech-brain coupling in auditory cortex and use causal connectivity analysis (transfer entropy) to identify the top-down signals driving this coupling more strongly during intelligible speech than during unintelligible speech. We report three main findings. First, frontal and motor cortices significantly modulate the phase of speech-coupled low-frequency oscillations in auditory cortex, and this effect depends on intelligibility of speech. Second, top-down signals are significantly stronger for left auditory cortex than for right auditory cortex. Third, speech-auditory cortex coupling is enhanced as a function of stronger top-down signals. Together, our results suggest that low-frequency brain oscillations play a role in implementing predictive top-down control during continuous speech perception and that top-down control is largely directed at left auditory cortex. This suggests a close relationship between (left-lateralized) speech production areas and the implementation of top-down control in continuous speech perception. PMID:26028433

  18. Does experimental pain affect auditory processing of speech-relevant signals? A study in healthy young adults.

    PubMed

    Sapir, Shimon; Pud, Dorit

    2008-01-01

    To assess the effect of tonic pain stimulation on auditory processing of speech-relevant acoustic signals in healthy pain-free volunteers. Sixty university students, randomly assigned to either a thermal pain stimulation (46 degrees C/6 min) group (PS) or no pain stimulation group (NPS), performed a rate change detection task (RCDT) involving sinusoidally frequency-modulated vowel-like signals. Task difficulty was manipulated by changing the rate of the modulated signals (henceforth rate). Perceived pain intensity was evaluated using a visual analog scale (VAS) (0-100). Mean pain rating was approximately 33 in the PS group and approximately 3 in the NPS group. Pain stimulation was associated with poorer performance on the RCDT, but this trend was not statistically significant. Performance worsened with increasing rate of signal modulation in both groups (p < 0.0001), with no pain by rate interaction. The present findings indicate a trend whereby mild or moderate pain appears to affect auditory processing of speech-relevant acoustic signals. This trend, however, was not statistically significant. It is possible that more intense pain would yield more pronounced (deleterious) effects on auditory processing, but this needs to be verified empirically.

  19. Effects of Feedback Frequency and Timing on Acquisition, Retention, and Transfer of Speech Skills in Acquired Apraxia of Speech

    ERIC Educational Resources Information Center

    Hula, Shannon N. Austermann; Robin, Donald A.; Maas, Edwin; Ballard, Kirrie J.; Schmidt, Richard A.

    2008-01-01

    Purpose: Two studies examined speech skill learning in persons with apraxia of speech (AOS). Motor-learning research shows that delaying or reducing the frequency of feedback promotes retention and transfer of skills. By contrast, immediate or frequent feedback promotes temporary performance enhancement but interferes with retention and transfer.…

  20. Objective measurement of motor speech characteristics in the healthy pediatric population.

    PubMed

    Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P

    2011-12-01

    To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were indentified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (Fo) during speech did not change significantly with age for both males and females. To our knowledge, this is the first pediatric normative database for the MSP progam. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders and to assess the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.

  1. A generalized time-frequency subtraction method for robust speech enhancement based on wavelet filter banks modeling of human auditory system.

    PubMed

    Shao, Yu; Chang, Chip-Hong

    2007-08-01

    We present a new speech enhancement scheme for a single-microphone system to meet the demand for quality noise reduction algorithms capable of operating at a very low signal-to-noise ratio. A psychoacoustic model is incorporated into the generalized perceptual wavelet denoising method to reduce the residual noise and improve the intelligibility of speech. The proposed method is a generalized time-frequency subtraction algorithm, which advantageously exploits the wavelet multirate signal representation to preserve the critical transient information. Simultaneous masking and temporal masking of the human auditory system are modeled by the perceptual wavelet packet transform via the frequency and temporal localization of speech components. The wavelet coefficients are used to calculate the Bark spreading energy and temporal spreading energy, from which a time-frequency masking threshold is deduced to adaptively adjust the subtraction parameters of the proposed method. An unvoiced speech enhancement algorithm is also integrated into the system to improve the intelligibility of speech. Through rigorous objective and subjective evaluations, it is shown that the proposed speech enhancement system is capable of reducing noise with little speech degradation in adverse noise environments and the overall performance is superior to several competitive methods.
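
    For contrast with the wavelet-domain method above, the sketch below shows the bare STFT-domain spectral subtraction it generalizes: estimate the noise magnitude from a leading noise-only stretch, subtract it per frame, and floor the result. The signals, frame size, and flooring factor are illustrative; the paper's psychoacoustic masking thresholds and adaptive parameters are not modeled.

    ```python
    import numpy as np
    from scipy.signal import istft, stft

    fs = 16000
    noise = 0.1 * np.random.randn(fs)                        # 1 s of noise only
    tone = np.sin(2 * np.pi * 300 * np.arange(2 * fs) / fs)  # stand-in for speech
    noisy = np.concatenate([noise, tone + 0.1 * np.random.randn(2 * fs)])

    f, t, Z = stft(noisy, fs=fs, nperseg=512)
    noise_mag = np.abs(Z[:, t < 1.0]).mean(axis=1, keepdims=True)  # noise estimate
    mag = np.maximum(np.abs(Z) - noise_mag, 0.05 * noise_mag)      # subtract + floor
    _, enhanced = istft(mag * np.exp(1j * np.angle(Z)), fs=fs, nperseg=512)
    ```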

  2. The Disfluent Speech of Bilingual Spanish–English Children: Considerations for Differential Diagnosis of Stuttering

    PubMed Central

    Bedore, Lisa M.; Ramos, Daniel

    2015-01-01

    Purpose The primary purpose of this study was to describe the frequency and types of speech disfluencies that are produced by bilingual Spanish–English (SE) speaking children who do not stutter. The secondary purpose was to determine whether their disfluent speech is mediated by language dominance and/or language produced. Method Spanish and English narratives (a retell and a tell in each language) were elicited and analyzed relative to the frequency and types of speech disfluencies produced. These data were compared with the monolingual English-speaking guidelines for differential diagnosis of stuttering. Results The mean frequency of stuttering-like speech behaviors in the bilingual SE participants ranged from 3% to 22%, exceeding the monolingual English standard of 3 per 100 words. There was no significant frequency difference in stuttering-like or non-stuttering-like speech disfluency produced relative to the child's language dominance. There was a significant difference relative to the language the child was speaking; all children produced significantly more stuttering-like speech disfluencies in Spanish than in English. Conclusion Results demonstrate that the disfluent speech of bilingual SE children should be carefully considered relative to the complex nature of bilingualism. PMID:25215876

  3. Investigations in mechanisms and strategies to enhance hearing with cochlear implants

    NASA Astrophysics Data System (ADS)

    Churchill, Tyler H.

    Cochlear implants (CIs) produce hearing sensations by stimulating the auditory nerve (AN) with current pulses whose amplitudes are modulated by filtered acoustic temporal envelopes. While this technology has provided hearing for multitudinous CI recipients, even bilaterally-implanted listeners have more difficulty understanding speech in noise and localizing sounds than normal hearing (NH) listeners. Three studies reported here have explored ways to improve electric hearing abilities. Vocoders are often used to simulate CIs for NH listeners. Study 1 was a psychoacoustic vocoder study examining the effects of harmonic carrier phase dispersion and simulated CI current spread on speech intelligibility in noise. Results showed that simulated current spread was detrimental to speech understanding and that speech vocoded with carriers whose components' starting phases were equal was the least intelligible. Cross-correlogram analyses of AN model simulations confirmed that carrier component phase dispersion resulted in better neural envelope representation. Localization abilities rely on binaural processing mechanisms in the brainstem and mid-brain that are not fully understood. In Study 2, several potential mechanisms were evaluated based on the ability of metrics extracted from stereo AN simulations to predict azimuthal locations. Results suggest that unique across-frequency patterns of binaural cross-correlation may provide a strong cue set for lateralization and that interaural level differences alone cannot explain NH sensitivity to lateral position. While it is known that many bilateral CI users are sensitive to interaural time differences (ITDs) in low-rate pulsatile stimulation, most contemporary CI processing strategies use high-rate, constant-rate pulse trains. In Study 3, we examined the effects of pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition by bilateral CI listeners. Results showed that listeners were able to use low-rate pulse timing cues presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli even when mixed with high rates on other electrodes. These results have contributed to a better understanding of those aspects of the auditory system that support speech understanding and binaural hearing, suggested vocoder parameters that may simulate aspects of electric hearing, and shown that redundant, low-rate pulse timing supports improved spatial hearing for bilateral CI listeners.
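
    Study 1's simulations rest on channel vocoding, which can be sketched generically: split the signal into bands, extract each band's envelope, and reimpose it on band-limited noise carriers. The channel count and edge frequencies below are arbitrary illustrative values, and the current-spread and carrier-phase manipulations the study examined are not implemented.

    ```python
    import numpy as np
    from scipy.signal import butter, hilbert, sosfilt

    def noise_vocode(x, fs, edges=(100, 400, 1000, 2200, 4500, 7900)):
        out = np.zeros_like(x)
        for lo, hi in zip(edges[:-1], edges[1:]):
            sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
            env = np.abs(hilbert(sosfilt(sos, x)))           # band envelope
            carrier = sosfilt(sos, np.random.randn(len(x)))  # band-limited noise
            out += env * carrier
        return out / (np.max(np.abs(out)) + 1e-12)

    fs = 16000
    x = np.sin(2 * np.pi * 150 * np.arange(fs) / fs)  # stand-in for a speech signal
    y = noise_vocode(x, fs)
    ```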

  4. Effect of cognitive load on articulation rate and formant frequencies during simulator flights.

    PubMed

    Huttunen, Kerttu H; Keränen, Heikki I; Pääkkönen, Rauno J; Päivikki Eskelinen-Rönkä, R; Leino, Tuomo K

    2011-03-01

    It was explored how three types of intensive cognitive load typical of military aviation (load on situation awareness, information processing, or decision-making) affect speech. The utterances of 13 male military pilots were recorded during simulated combat flights. Articulation rate was calculated from the speech samples, and the first formant (F1) and second formant (F2) were tracked from first-syllable short vowels in pre-defined phoneme environments. Articulation rate was found to correlate negatively (albeit with low coefficients) with loads on situation awareness and decision-making but not with changes in F1 or F2. Changes were seen in the spectrum of the vowels: mean F1 of front vowels usually increased and their mean F2 decreased as a function of cognitive load, and both F1 and F2 of back vowels increased. The strongest associations were seen between the three types of cognitive load and F1 and F2 changes in back vowels. Because fluent and clear radio speech communication is vital to safety in aviation and temporal and spectral changes may affect speech intelligibility, careful use of standard aviation phraseology and training in the production of clear speech during a high level of cognitive load are important measures that diminish the probability of possible misunderstandings. © 2011 Acoustical Society of America
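
    The abstract does not specify its formant tracker, but a standard route to F1/F2 values like those analyzed is LPC-root formant estimation: fit an LPC model to a windowed frame and keep polynomial roots with small bandwidth as formant candidates. The sketch below assumes librosa for the LPC fit; the model order, bandwidth cut-off, and two-resonance test frame are illustrative.

    ```python
    import numpy as np
    import librosa

    fs = 10000
    t = np.arange(int(0.03 * fs)) / fs                 # 30 ms frame
    frame = (np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
             + 0.01 * np.random.randn(len(t)))

    a = librosa.lpc(frame * np.hamming(len(t)), order=8)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                  # one of each conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -fs / np.pi * np.log(np.abs(roots))          # resonance bandwidths
    formants = sorted(f for f, b in zip(freqs, bws) if f > 90 and b < 400)
    print(formants[:2])                                # F1, F2 candidates (~700, ~1200)
    ```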

  5. Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-Sampling

    NASA Astrophysics Data System (ADS)

    Mousa, Allam

    2010-01-01

    Voice changing has many applications in industry and commerce. This paper emphasizes voice conversion using a pitch shifting method that depends on detecting the pitch of the signal (fundamental frequency) using Simplified Inverse Filter Tracking (SIFT), changing it according to the target pitch period using time stretching with the Pitch Synchronous Overlap Add (PSOLA) algorithm, and then resampling the signal in order to restore the original play rate. The same study was performed to see the effect of voice conversion when Arabic speech signals are considered. Treatment of certain Arabic voiced vowels and conversion between male and female speech showed some expansion or compression in the resulting speech. A comparison in terms of pitch shifting is presented here. Analysis was performed for a single frame and for a full segmentation of speech.
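
    The stretch-then-resample recipe in this record can be sketched with off-the-shelf tools. Note the substitution: the paper detects pitch with SIFT and stretches with PSOLA, whereas librosa's time_stretch is a phase vocoder, so the sketch reproduces the pipeline's structure rather than its exact algorithm; the 220 Hz test tone and ratio r are illustrative.

    ```python
    import numpy as np
    import librosa

    sr = 22050
    y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)  # 220 Hz tone
    r = 1.5  # pitch ratio: r > 1 raises pitch (e.g., male toward female register)

    stretched = librosa.effects.time_stretch(y, rate=1 / r)  # duration x r, pitch kept
    shifted = librosa.resample(stretched, orig_sr=int(sr * r), target_sr=sr)
    # 'shifted' now has roughly the original duration with all frequencies scaled by r
    ```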

  6. The effect of filtered speech feedback on the frequency of stuttering

    NASA Astrophysics Data System (ADS)

    Rami, Manish Krishnakant

    2000-10-01

    This study investigated the effects of filtered components of speech and whispered speech on the frequency of stuttering. It is known that choral speech, shadowing, and altered auditory feedback are the only conditions which induce fluency without any effort beyond that normally required to speak on the part of people who stutter. All these conditions use speech as a second signal. This experiment examined the role of components of the speech signal as delineated by the source-filter theory of speech production. Three filtered speech signals, a whispered speech signal, and a choral speech signal formed the stimuli. It was postulated that if the speech signal as a whole was necessary for producing fluency in people who stutter, then all conditions except choral speech should fail to produce fluency enhancement. If the glottal source alone was adequate in restoring fluency, then only the conditions of NAF and whispered speech should fail in promoting fluency. In the event that full filter characteristics are necessary for the fluency-creating effects, then all conditions except the choral speech and whispered speech should fail to produce fluency. If any part of the filter characteristics is sufficient in yielding fluency, then only the NAF and the approximate glottal source should fail to demonstrate an increase in the amount of fluency. Twelve adults who stuttered read passages while receiving auditory feedback consisting of one of the six experimental conditions: (a) NAF; (b) approximate glottal source; (c) glottal source and first formant; (d) glottal source and first two formants; (e) whispered speech; and (f) choral speech. Frequencies of stuttering were obtained for each condition and submitted to descriptive and inferential statistical analysis. Statistically significant differences in means were found among the feedback conditions. Specifically, the choral speech, the source and first formant, the source and first two formants, and the whispered speech conditions all decreased the frequency of stuttering, while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of vocal tract origin, afford effective cues and induce fluent speech in people who stutter.

  7. The predictive roles of neural oscillations in speech motor adaptability.

    PubMed

    Sengupta, Ranit; Nasir, Sazzad M

    2016-06-01

    The human speech system exhibits remarkable flexibility by adapting to alterations in speaking environments. While speech motor adaptation under altered sensory feedback is believed to involve rapid reorganization of speech motor networks, the mechanisms by which different brain regions communicate and coordinate their activity to mediate adaptation remain unknown, and explanations of individual differences in adaptation outcomes remain largely elusive. In this study, under the paradigm of altered auditory feedback with continuous EEG recordings, the differential roles of oscillatory neural processes in speech motor adaptability were investigated. The predictive capacities of different EEG frequency bands were assessed, and it was found that theta-, beta-, and gamma-band activities during speech planning and production contained significant and reliable information about speech motor adaptability. It was further observed that these bands do not work independently but interact with each other, suggesting an underlying brain network operating across hierarchically organized frequency bands to support speech motor adaptation. These results provide novel insights into both learning and disorders of speech using time-frequency analysis of neural oscillations. Copyright © 2016 the American Physiological Society.
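
    The band-wise predictive features described here are typically just mean spectral power per canonical band. A minimal sketch for one EEG channel, with conventional band edges rather than the paper's exact definitions:

    ```python
    import numpy as np
    from scipy.signal import welch

    BANDS = {"theta": (4, 8), "beta": (13, 30), "gamma": (30, 80)}  # conventional edges

    def band_powers(eeg, fs):
        """Mean PSD per band for one channel during speech planning/production."""
        f, psd = welch(eeg, fs=fs, nperseg=int(2 * fs))
        return {name: psd[(f >= lo) & (f < hi)].mean()
                for name, (lo, hi) in BANDS.items()}
    ```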

  8. Spatiotemporal dynamics of auditory attention synchronize with speech

    PubMed Central

    Wöstmann, Malte; Herrmann, Björn; Maess, Burkhard

    2016-01-01

    Attention plays a fundamental role in selectively processing stimuli in our environment despite distraction. Spatial attention induces increasing and decreasing power of neural alpha oscillations (8–12 Hz) in brain regions ipsilateral and contralateral to the locus of attention, respectively. This study tested whether the hemispheric lateralization of alpha power codes not just the spatial location but also the temporal structure of the stimulus. Participants attended to spoken digits presented to one ear and ignored tightly synchronized distracting digits presented to the other ear. In the magnetoencephalogram, spatial attention induced lateralization of alpha power in parietal, but notably also in auditory cortical regions. This alpha power lateralization was not maintained steadily but fluctuated in synchrony with the speech rate and lagged the time course of low-frequency (1–5 Hz) sensory synchronization. Higher amplitude of alpha power modulation at the speech rate was predictive of a listener’s enhanced performance of stream-specific speech comprehension. Our findings demonstrate that alpha power lateralization is modulated in tune with the sensory input and acts as a spatiotemporal filter controlling the read-out of sensory content. PMID:27001861
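
    The lateralization measure at the core of this design is usually a normalized ipsi-minus-contra power contrast. A minimal sketch (one common convention, not necessarily the paper's exact metric); tracking this index over time at the digit presentation rate corresponds to the "modulation at the speech rate" analysed here:

    ```python
    import numpy as np

    def alpha_lateralization_index(power_ipsi, power_contra):
        """Normalized difference of alpha power ipsi- vs. contralateral
        to the attended ear, per time point."""
        p_i, p_c = np.asarray(power_ipsi), np.asarray(power_contra)
        return (p_i - p_c) / (p_i + p_c)
    ```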

  9. Sex and family history of cardiovascular disease influence heart rate variability during stress among healthy adults.

    PubMed

    Emery, Charles F; Stoney, Catherine M; Thayer, Julian F; Williams, DeWayne; Bodine, Andrew

    2018-07-01

    Studies of sex differences in heart rate variability (HRV) typically have not accounted for the influence of family history (FH) of cardiovascular disease (CVD). This study evaluated sex differences in HRV response to speech stress among men and women (age range 30-49 years) with and without a documented FH of CVD. Participants were 77 adults (mean age = 39.8 ± 6.2 years; range: 30-49 years; 52% female) with positive FH (FH+, n = 32) and negative FH (FH-, n = 45) of CVD, verified with relatives of participants. Cardiac activity for all participants was recorded via electrocardiogram during a standardized speech stress task with three phases: 5-minute rest, 5-minute speech, and 5-minute recovery. Outcomes included time-domain and frequency-domain indicators of HRV and heart rate (HR) at rest and during stress. Data were analyzed with repeated measures analysis of variance, with sex and FH as between-subjects variables and time/phase as a within-subjects variable. Women exhibited higher HR than did men and greater HR reactivity in response to the speech stress. However, women also exhibited greater HRV in both the time and frequency domains. FH+ women generally exhibited elevated HRV, despite the elevated risk of CVD associated with FH+. Although women participants exhibited higher HR at rest and during stress, women (both FH+ and FH-) also exhibited elevated HRV reactivity, reflecting greater autonomic control. Thus, enhanced autonomic function observed in prior studies of HRV among women is also evident among FH+ women during a standardized stress task. Copyright © 2018 Elsevier Inc. All rights reserved.
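
    Since several records in this set lean on time- and frequency-domain HRV, a compact sketch of how such indicators are typically derived from an RR-interval series may help. Band edges follow the usual LF/HF convention and the 4 Hz resampling rate is a common choice, not this study's documented pipeline.

    ```python
    import numpy as np
    from scipy.interpolate import interp1d
    from scipy.signal import welch

    def hrv_metrics(rr_ms):
        """Time- and frequency-domain HRV from RR intervals in milliseconds."""
        rr = np.asarray(rr_ms, dtype=float)
        sdnn = rr.std(ddof=1)                       # time domain
        rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))  # time domain
        # Resample the irregularly spaced tachogram to 4 Hz for spectral analysis.
        t = np.cumsum(rr) / 1000.0
        grid = np.arange(t[0], t[-1], 0.25)
        rr_even = interp1d(t, rr)(grid)
        f, psd = welch(rr_even - rr_even.mean(), fs=4.0, nperseg=min(256, len(grid)))
        lf_band, hf_band = (f >= 0.04) & (f < 0.15), (f >= 0.15) & (f < 0.40)
        lf = np.trapz(psd[lf_band], f[lf_band])
        hf = np.trapz(psd[hf_band], f[hf_band])
        return {"SDNN": sdnn, "RMSSD": rmssd, "LF": lf, "HF": hf, "LF/HF": lf / hf}
    ```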

  10. Comparisons of Stuttering Frequency during and after Speech Initiation in Unaltered Feedback, Altered Auditory Feedback and Choral Speech Conditions

    ERIC Educational Resources Information Center

    Saltuklaroglu, Tim; Kalinowski, Joseph; Robbins, Mary; Crawcour, Stephen; Bowers, Andrew

    2009-01-01

    Background: Stuttering is prone to strike during speech initiation more so than at any other point in an utterance. The use of altered auditory feedback (AAF) has been found to produce robust decreases in stuttering frequency by creating an electronic rendition of choral speech (i.e., speaking in unison). However, AAF requires users to self-initiate…

  11. The Effects of Different Noise Types on Heart Rate Variability in Men

    PubMed Central

    Sim, Chang Sun; Sung, Joo Hyun; Cheon, Sang Hyeon; Lee, Jang Myung; Lee, Jae Won

    2015-01-01

    Purpose: To determine the impact of noise on heart rate variability (HRV) in men, with a focus on the noise type rather than on noise intensity. Materials and Methods: Forty college-going male volunteers were enrolled in this study and were randomly divided into four groups according to the type of noise they were exposed to: background, traffic, speech, or mixed (traffic and speech) noise. All groups except the background group (35 dB) were exposed to 45 dB sound pressure levels. We collected data on age, smoking status, alcohol consumption, and disease status from responses to self-reported questionnaires and medical examinations. We also measured HRV parameters and blood pressure levels before and after exposure to noise. The HRV parameters were evaluated while participants remained seated for 5 minutes, and frequency- and time-domain analyses were then performed. Results: After noise exposure, only the speech noise group showed a reduced low frequency (LF) value, reflecting the activity of both the sympathetic and parasympathetic nervous systems. The low-to-high frequency (LF/HF) ratio, which reflects the activity of the autonomic nervous system (ANS), became more stable, decreasing from 5.21 to 1.37; however, this change was not statistically significant. Conclusion: These results indicate that 45 dB(A) of noise, 10 dB(A) higher than background noise, affects the ANS. Additionally, the impact on HRV activity might differ according to the noise quality. Further studies will be required to ascertain the role of noise type. PMID:25510770

  12. The effects of different noise types on heart rate variability in men.

    PubMed

    Sim, Chang Sun; Sung, Joo Hyun; Cheon, Sang Hyeon; Lee, Jang Myung; Lee, Jae Won; Lee, Jiho

    2015-01-01

    To determine the impact of noise on heart rate variability (HRV) in men, with a focus on the noise type rather than on noise intensity. Forty college-going male volunteers were enrolled in this study and were randomly divided into four groups according to the type of noise they were exposed to: background, traffic, speech, or mixed (traffic and speech) noise. All groups except the background group (35 dB) were exposed to 45 dB sound pressure levels. We collected data on age, smoking status, alcohol consumption, and disease status from responses to self-reported questionnaires and medical examinations. We also measured HRV parameters and blood pressure levels before and after exposure to noise. The HRV parameters were evaluated while participants remained seated for 5 minutes, and frequency- and time-domain analyses were then performed. After noise exposure, only the speech noise group showed a reduced low frequency (LF) value, reflecting the activity of both the sympathetic and parasympathetic nervous systems. The low-to-high frequency (LF/HF) ratio, which reflects the activity of the autonomic nervous system (ANS), became more stable, decreasing from 5.21 to 1.37; however, this change was not statistically significant. These results indicate that 45 dB(A) of noise, 10 dB(A) higher than background noise, affects the ANS. Additionally, the impact on HRV activity might differ according to the noise quality. Further studies will be required to ascertain the role of noise type.

  13. Children's Use of the Prosodic Characteristics of Infant-Directed Speech.

    ERIC Educational Resources Information Center

    Weppelman, Tammy L.; Bostow, Angela; Schiffer, Ryan; Elbert-Perez, Evelyn; Newman, Rochelle S.

    2003-01-01

    Examined whether young children (4 years of age) show prosodic changes when speaking to infants. Measured children's word duration in infant-directed speech compared to adult-directed speech, examined amplitude variability, and examined both average fundamental frequency and fundamental frequency standard deviation. Results indicate that…

  14. Dissecting choral speech: properties of the accompanist critical to stuttering reduction.

    PubMed

    Kiefte, Michael; Armson, Joy

    2008-01-01

    The effects of choral speech and altered auditory feedback (AAF) on stuttering frequency were compared to identify those properties of choral speech that make it a more effective condition for stuttering reduction. Seventeen adults who stutter (AWS) participated in an experiment consisting of special choral speech conditions that were manipulated to selectively eliminate specific differences between choral speech and AAF. Consistent with previous findings, results showed that both choral speech and AAF reduced stuttering compared to solo reading. Although reductions under AAF were substantial, they were less dramatic than those for choral speech. Stuttering reduction for choral speech was highly robust even when the accompanist's voice temporally lagged that of the AWS, when there was no opportunity for dynamic interplay between the AWS and accompanist, and when the accompanist was replaced by the AWS's own voice, all of which approximate specific features of AAF. Choral speech was also highly effective in reducing stuttering across changes in speech rate and for both familiar and unfamiliar passages. We concluded that differences in properties between choral speech and AAF other than those that were manipulated in this experiment must account for differences in stuttering reduction. The reader will be able to (1) describe differences in stuttering reduction associated with altered auditory feedback compared to choral speech conditions and (2) describe differences between delivery of a second voice signal as an altered rendition of the speaker's own voice (altered auditory feedback) and alterations in the voice of an accompanist (choral speech).

  15. Predicting Speech Intelligibility Decline in Amyotrophic Lateral Sclerosis Based on the Deterioration of Individual Speech Subsystems

    PubMed Central

    Rong, Panying; Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L.; Berry, James D.; Perry, Bridget; Green, Jordan R.

    2016-01-01

    Purpose: To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Method: Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Results: Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Conclusion: Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred. PMID:27148967

  16. Predicting Speech Intelligibility Decline in Amyotrophic Lateral Sclerosis Based on the Deterioration of Individual Speech Subsystems.

    PubMed

    Rong, Panying; Yunusova, Yana; Wang, Jun; Zinman, Lorne; Pattee, Gary L; Berry, James D; Perry, Bridget; Green, Jordan R

    2016-01-01

    To determine the mechanisms of speech intelligibility impairment due to neurologic impairments, intelligibility decline was modeled as a function of co-occurring changes in the articulatory, resonatory, phonatory, and respiratory subsystems. Sixty-six individuals diagnosed with amyotrophic lateral sclerosis (ALS) were studied longitudinally. The disease-related changes in articulatory, resonatory, phonatory, and respiratory subsystems were quantified using multiple instrumental measures, which were subjected to a principal component analysis and mixed effects models to derive a set of speech subsystem predictors. A stepwise approach was used to select the best set of subsystem predictors to model the overall decline in intelligibility. Intelligibility was modeled as a function of five predictors that corresponded to velocities of lip and jaw movements (articulatory), number of syllable repetitions in the alternating motion rate task (articulatory), nasal airflow (resonatory), maximum fundamental frequency (phonatory), and speech pauses (respiratory). The model accounted for 95.6% of the variance in intelligibility, among which the articulatory predictors showed the most substantial independent contribution (57.7%). Articulatory impairments characterized by reduced velocities of lip and jaw movements and resonatory impairments characterized by increased nasal airflow served as the subsystem predictors of the longitudinal decline of speech intelligibility in ALS. Declines in maximum performance tasks such as the alternating motion rate preceded declines in intelligibility, thus serving as early predictors of bulbar dysfunction. Following the rapid decline in speech intelligibility, a precipitous decline in maximum performance tasks subsequently occurred.
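
    The statistical pipeline named here (instrumental measures reduced to principal components, which then serve as subsystem predictors of intelligibility) can be sketched in a few lines. The study used mixed-effects models and stepwise selection over longitudinal data, so the cross-sectional scikit-learn version below, on synthetic placeholder data, is only a structural analogue.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 12))   # instrumental measures across the four subsystems
    y = rng.normal(size=200)         # intelligibility scores (synthetic placeholders)

    components = PCA(n_components=5).fit_transform(X)   # derived subsystem predictors
    model = LinearRegression().fit(components, y)
    print("R^2 on training data:", model.score(components, y))
    ```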

  17. Study of accent-based music speech protocol development for improving voice problems in stroke patients with mixed dysarthria.

    PubMed

    Kim, Soo Ji; Jo, Uiri

    2013-01-01

    Based on the anatomical and functional commonality between singing and speech, various types of musical elements have been employed in music therapy research for speech rehabilitation. This study developed an accent-based music speech protocol to address voice problems of stroke patients with mixed dysarthria. Subjects were 6 stroke patients with mixed dysarthria who received individual music therapy sessions. Each session lasted 30 minutes, and 12 sessions including pre- and post-tests were administered to each patient. To examine the protocol's efficacy, the measures of maximum phonation time (MPT), fundamental frequency (F0), average intensity (dB), jitter, shimmer, noise-to-harmonics ratio (NHR), and diadochokinesis (DDK) were compared between pre- and post-test and analyzed with a paired-sample t-test. The results showed that the measures of MPT, F0, dB, and sequential motion rates (SMR) were significantly increased after administering the protocol. Also, there were statistically significant differences in the measures of shimmer and alternating motion rates (AMR) of the syllable /kə/ between pre- and post-test. The results indicated that the accent-based music speech protocol may improve speech motor coordination including respiration, phonation, articulation, resonance, and prosody of patients with dysarthria. This suggests the possibility of utilizing the music speech protocol to maximize immediate treatment effects in the course of a long-term treatment for patients with dysarthria.
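
    Most of the outcome measures listed (F0, jitter, shimmer, and harmonicity, the inverse view of NHR) can be extracted with Praat. A minimal sketch using the parselmouth Python bindings, with a hypothetical file name and generic pitch-range settings rather than the study's configuration:

    ```python
    import parselmouth
    from parselmouth.praat import call

    snd = parselmouth.Sound("patient_post.wav")   # hypothetical recording
    floor, ceil = 75, 500                         # generic adult pitch range, Hz
    pp = call(snd, "To PointProcess (periodic, cc)", floor, ceil)
    jitter = call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
    shimmer = call([snd, pp], "Get shimmer (local)", 0, 0, 0.0001, 0.02, 1.3, 1.6)
    harmonicity = call(snd, "To Harmonicity (cc)", 0.01, floor, 0.1, 1.0)
    hnr = call(harmonicity, "Get mean", 0, 0)     # HNR in dB
    print(f"jitter={jitter:.4f} shimmer={shimmer:.4f} HNR={hnr:.1f} dB")
    ```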

  18. Coding strategies for cochlear implants under adverse environments

    NASA Astrophysics Data System (ADS)

    Tahmina, Qudsia

    Cochlear implants are electronic prosthetic devices that restore partial hearing in patients with severe to profound hearing loss. Although most coding strategies have significantly improved the perception of speech in quiet listening conditions, limitations remain on speech perception under adverse environments such as background noise, reverberation, and band-limited channels. We propose strategies that improve the intelligibility of speech transmitted over telephone networks, reverberated speech, and speech in the presence of background noise. For telephone-processed speech, we examine the effects of adding low-frequency and high-frequency information to the band-limited telephone speech. Four listening conditions were designed to simulate the receiving frequency characteristics of telephone handsets. Results indicated improvement in cochlear implant and bimodal listening when telephone speech was augmented with high-frequency information; this study therefore supports the design of algorithms that extend the bandwidth towards higher frequencies. The results also indicated added benefit from hearing aids for bimodal listeners in all four listening conditions. Speech understanding in acoustically reverberant environments is always a difficult task for hearing-impaired listeners. Reverberated sound consists of the direct sound, early reflections, and late reflections; late reflections are known to be detrimental to speech intelligibility. We propose a reverberation suppression strategy based on spectral subtraction to suppress the reverberant energy from late reflections. Results from listening tests for two reverberant conditions (RT60 = 0.3 s and 1.0 s) indicated significant improvement when stimuli were processed with the spectral subtraction strategy. The proposed strategy operates with little to no prior information on the signal and the room characteristics and can therefore potentially be implemented in real-time CI speech processors. For speech in background noise, we propose a mechanism underlying the contribution of harmonics to the benefit of electroacoustic stimulation in cochlear implants. The proposed strategy is based on harmonic modeling and uses a synthesis-driven approach to synthesize the harmonics in voiced segments of speech. Based on objective measures, results indicated improvement in speech quality. This study warrants further work into the development of algorithms to regenerate harmonics of voiced segments in the presence of noise.
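
    To make the spectral-subtraction idea concrete, here is a crude sketch: late-reverberant power is estimated as a delayed, attenuated copy of the short-time power spectrum and removed with a floored gain. All constants are illustrative tuning values, not the dissertation's estimator.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def suppress_late_reverb(x, fs, alpha=0.9, floor=0.05, delay_frames=4):
        """Crude spectral-subtraction dereverberation (late reflections only)."""
        f, t, X = stft(x, fs=fs, nperseg=512)
        power = np.abs(X) ** 2
        # Model late-reverberant energy as a delayed, scaled copy of the
        # spectrogram; the delay approximates the early/late reflection split.
        late = np.zeros_like(power)
        late[:, delay_frames:] = alpha * power[:, :-delay_frames]
        gain = np.sqrt(np.maximum(1.0 - late / np.maximum(power, 1e-12), floor))
        _, y = istft(gain * X, fs=fs, nperseg=512)
        return y
    ```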

  19. An investigation of the effects of a speech-restructuring treatment for stuttering on the distribution of intervals of phonation.

    PubMed

    Brown, Lisa; Wilson, Linda; Packman, Ann; Halaki, Mark; Onslow, Mark; Menzies, Ross

    2016-12-01

    The purpose of this study was to investigate whether stuttering reductions following the instatement phase of a speech-restructuring treatment for adults were accompanied by reductions in the frequency of short intervals of phonation (PIs). The study was prompted by the possibility that reductions in the frequency of short PIs are the mechanism underlying such reductions in stuttering. The distribution of PIs was determined for seven adults who stutter, before and immediately after the intensive phase of a speech-restructuring treatment program. Audiovisual recordings of conversational speech were made on both assessment occasions, with PIs recorded with an accelerometer. All seven participants had much lower levels of stuttering after treatment, but these were associated with reductions in the frequency of short PIs for only four of them. Of the other three participants, two showed no change in the frequency of short PIs, while the third actually showed an increase. Stuttering reduction with speech-restructuring treatment can co-occur with reduction in the frequency of short PIs. However, the latter does not appear necessary for this reduction in stuttering to occur. Thus, speech-restructuring treatment must have other, or additional, treatment agents for stuttering to reduce. Copyright © 2016 Elsevier Inc. All rights reserved.
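
    Phonated-interval distributions of this kind are typically computed by thresholding short-frame energy and timing the voiced runs. A minimal sketch, with illustrative frame length, threshold, and short-PI cutoff rather than the study's settings:

    ```python
    import numpy as np

    def phonated_intervals(signal, fs, frame_ms=10, threshold=0.02):
        """Durations (ms) of runs of frames whose RMS exceeds a voicing threshold."""
        frame = int(fs * frame_ms / 1000)
        n = len(signal) // frame
        rms = np.sqrt(np.mean(signal[: n * frame].reshape(n, frame) ** 2, axis=1))
        voiced = rms > threshold
        intervals, start = [], None
        for i, v in enumerate(voiced):
            if v and start is None:
                start = i
            elif not v and start is not None:
                intervals.append((i - start) * frame_ms)
                start = None
        if start is not None:
            intervals.append((len(voiced) - start) * frame_ms)
        return intervals  # e.g. tally those under ~150 ms as "short PIs"
    ```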

  20. Speech Rhythms and Multiplexed Oscillatory Sensory Coding in the Human Brain

    PubMed Central

    Gross, Joachim; Hoogenboom, Nienke; Thut, Gregor; Schyns, Philippe; Panzeri, Stefano; Belin, Pascal; Garrod, Simon

    2013-01-01

    Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase reset auditory cortex oscillations thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech relies on a nested hierarchy of entrained cortical oscillations. PMID:24391472
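
    Entrainment of this kind is often quantified as phase locking between the band-limited cortical signal and the speech envelope. The sketch below computes a standard phase-locking value in the theta band; the paper itself used mutual-information measures, so this is a related but simpler statistic.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def phase_locking(brain, envelope, fs, band=(4, 8)):
        """Phase-locking value between a cortical channel and the speech
        envelope in one frequency band (theta shown)."""
        sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
        ph_brain = np.angle(hilbert(sosfiltfilt(sos, brain)))
        ph_env = np.angle(hilbert(sosfiltfilt(sos, envelope)))
        return np.abs(np.mean(np.exp(1j * (ph_brain - ph_env))))
    ```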

  1. Characterizing resonant component in speech: A different view of tracking fundamental frequency

    NASA Astrophysics Data System (ADS)

    Dong, Bin

    2017-05-01

    Motivated by the nonlinearity, nonstationarity, and modulations of speech, the Hilbert-Huang Transform and cyclostationarity analysis are employed in sequence to investigate speech resonance in vowels. Cyclostationarity analysis is applied not to the target vowel directly, but to its intrinsic mode functions one by one. Thanks to the equivalence between the fundamental frequency in speech and the cyclic frequency in cyclostationarity analysis, the modulation intensity distributions of the intrinsic mode functions provide much information for estimating the fundamental frequency. To highlight the relationship between frequency and time, the pseudo-Hilbert spectrum is proposed here in place of the Hilbert spectrum. Contrasting the pseudo-Hilbert spectra of the intrinsic mode functions with their modulation intensity distributions shows that there is usually one intrinsic mode function that acts as the fundamental component of the vowel. Furthermore, the fundamental frequency of the vowel can be determined by tracing the pseudo-Hilbert spectrum of its fundamental component along the time axis. The latter method is more robust for estimating the fundamental frequency in the presence of nonlinear components. Two vowels, [a] and [i], drawn from the FAU Aibo Emotion Corpus speech database, are used to validate these findings.
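
    A rough Python rendering of this pipeline, assuming the PyEMD package (pip install EMD-signal): decompose the vowel into intrinsic mode functions, identify the one whose instantaneous frequency sits in a plausible F0 range, and trace it over time. The pseudo-Hilbert spectrum itself is not reproduced, so this is a simplified stand-in, not the author's method.

    ```python
    import numpy as np
    from scipy.signal import hilbert
    from PyEMD import EMD  # assumed third-party dependency

    def f0_from_fundamental_imf(x, fs, f0_range=(75, 400)):
        """Return the instantaneous-frequency track of the IMF that behaves
        as the fundamental component, or None if no IMF qualifies."""
        for imf in EMD().emd(x):
            phase = np.unwrap(np.angle(hilbert(imf)))
            inst_f = np.diff(phase) * fs / (2 * np.pi)
            if f0_range[0] <= np.median(inst_f) <= f0_range[1]:
                return inst_f  # F0 estimate over time
        return None
    ```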

  2. Effects of various electrode configurations on music perception, intonation and speaker gender identification.

    PubMed

    Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut

    2014-01-01

    Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a short electrode depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues other than F0, and was poorest when a shallow insertion was simulated.

  3. Speech intelligibility and speech quality of modified loudspeaker announcements examined in a simulated aircraft cabin.

    PubMed

    Pennig, Sibylle; Quehl, Julia; Wittkowski, Martin

    2014-01-01

    Acoustic modifications of loudspeaker announcements were investigated in a simulated aircraft cabin to improve passengers' speech intelligibility and quality of communication in this specific setting. Four experiments with 278 participants in total were conducted in an acoustic laboratory using a standardised speech test and subjective rating scales. In experiments 1 and 2 the sound pressure level (SPL) of the announcements was varied (ranging from 70 to 85 dB(A)). Experiments 3 and 4 focused on frequency modification (octave bands) of the announcements. All studies used a background noise with the same SPL (74 dB(A)), but recorded at different seat positions in the aircraft cabin (front, rear). The results quantify speech intelligibility improvements with increasing signal-to-noise ratio and amplification of particular octave bands, especially the 2 kHz and the 4 kHz band. Thus, loudspeaker power in an aircraft cabin can be reduced by using appropriate filter settings in the loudspeaker system.

  4. On the Use of Evolutionary Algorithms to Improve the Robustness of Continuous Speech Recognition Systems in Adverse Conditions

    NASA Astrophysics Data System (ADS)

    Selouani, Sid-Ahmed; O'Shaughnessy, Douglas

    2003-12-01

    Limiting the decrease in performance due to acoustic environment changes remains a major challenge for continuous speech recognition (CSR) systems. We propose a novel approach which combines the Karhunen-Loève transform (KLT) in the mel-frequency domain with a genetic algorithm (GA) to enhance the data representing corrupted speech. The idea consists of projecting noisy speech parameters onto the space generated by the genetically optimized principal axes derived from the KLT. The enhanced parameters increase the recognition rate for highly interfering noise environments. The proposed hybrid technique, when included in the front-end of an HTK-based CSR system, outperforms the conventional recognition process in severe interfering car noise environments for a wide range of signal-to-noise ratios (SNRs) varying from 16 dB to … dB. We also show the effectiveness of the KLT-GA method in recognizing speech subject to telephone channel degradations.
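
    The KLT projection at the heart of this method is ordinary eigen-decomposition of the feature covariance. The sketch below keeps the leading axes and reconstructs the features; the genetic optimization of the axes, which is the paper's actual contribution, is omitted here.

    ```python
    import numpy as np

    def klt_enhance(noisy_features, n_keep):
        """Project noisy feature vectors (rows) onto the leading KLT axes
        and reconstruct, discarding low-variance directions."""
        mean = noisy_features.mean(axis=0)
        X = noisy_features - mean
        eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
        basis = eigvecs[:, np.argsort(eigvals)[::-1][:n_keep]]  # leading axes
        return X @ basis @ basis.T + mean
    ```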

  5. Calculation of selective filters of a device for primary analysis of speech signals

    NASA Astrophysics Data System (ADS)

    Chudnovskii, L. S.; Ageev, V. M.

    2014-07-01

    The amplitude-frequency responses of filters for primary analysis of speech signals, which have a low quality factor and a high rolloff factor in the high-frequency range, are calculated using the linear theory of speech production and psychoacoustic measurement data. The frequency resolution of the filter system for a sinusoidal signal is 40-200 Hz. The modulation-frequency resolution of amplitude- and frequency-modulated signals is 3-6 Hz. These features of the calculated filters are close to the amplitude-frequency responses of biological auditory systems at the level of the eighth nerve.

  6. Tolerable hearing-aid delays: IV. effects on subjective disturbance during speech production by hearing-impaired subjects.

    PubMed

    Stone, Michael A; Moore, Brian C J

    2005-04-01

    We assessed the effects of time delay in a hearing aid on subjective disturbance and reading rates while the user of the aid was speaking, using hearing-impaired subjects and real-time processing. The time delay was constant across frequency. A digital signal processor was programmed as a four-channel, fast-acting, wide-dynamic-range compression hearing aid. One of four delays could be selected on the aid to produce a total delay of 13, 21, 30, or 40 msec between microphone and receiver. Twenty-five subjects, mostly with near-symmetric hearing impairment of cochlear origin, were fitted bilaterally with behind-the-ear aids connected to the processor. The aids were programmed with insertion gains prescribed by the CAMEQ loudness equalization procedure for each subject and ear. Subjects were asked to read aloud from scripts; speech production rates were measured and subjective ratings of the disturbance of the delay were obtained. Subjects required some training to recognize the effects of the delay and to rate it consistently. Subjective disturbance increased progressively with increasing delay and was a nonmonotonic function of low-frequency hearing loss. Subjects with mild or severe low-frequency hearing loss were generally less disturbed by the delay than those with moderate loss. Disturbance ratings tended to decrease over successive tests. Word production rates were not significantly affected by delay over the range of delays tested. The results follow a pattern similar to those presented in an earlier study in this series, obtained using a simulation of hearing loss and normally hearing subjects, except for the nonmonotonic variation of disturbance with low-frequency hearing loss. We hypothesize that disturbance is maximal when the levels in the ear canal of the low-frequency components are similar for the unaided and aided sounds. A rating of 3, which is probably just acceptable, was obtained for delays ranging from 14 to 30 msec, depending on the hearing loss. Some acclimatization to the subjective disturbance occurred over a time scale of about 1 hour.

  7. The effects of decreased audibility produced by high-pass noise masking on cortical event-related potentials to speech sounds /ba/ and /da/.

    PubMed

    Martin, B A; Sigal, A; Kurtzberg, D; Stapells, D R

    1997-03-01

    This study investigated the effects of decreased audibility produced by high-pass noise masking on cortical event-related potentials (ERPs) N1, N2, and P3 to the speech sounds /ba/ and /da/ presented at 65 and 80 dB SPL. Normal-hearing subjects pressed a button in response to the deviant sound in an oddball paradigm. Broadband masking noise was presented at an intensity sufficient to completely mask the response to the 65-dB SPL speech sounds, and subsequently high-pass filtered at 4000, 2000, 1000, 500, and 250 Hz. With high-pass masking noise, pure-tone behavioral thresholds increased by an average of 38 dB at the high-pass cutoff and by 50 dB one octave above the cutoff frequency. Results show that as the cutoff frequency of the high-pass masker was lowered, ERP latencies to speech sounds increased and amplitudes decreased. The cutoff frequency where these changes first occurred and the rate of the change differed for N1 compared to N2, P3, and the behavioral measures. N1 showed gradual changes as the masker cutoff frequency was lowered. N2, P3, and behavioral measures showed marked changes below a masker cutoff of 2000 Hz. These results indicate that the decreased audibility resulting from the noise masking affects the various ERP components in a differential manner. N1 is related to the presence of audible stimulus energy, being present whether audible stimuli are discriminable or not. In contrast, N2 and P3 were absent when the stimuli were audible but not discriminable (i.e., when the second formant transitions were masked), reflecting stimulus discrimination. These data have implications regarding the effects of decreased audibility on cortical processing of speech sounds and for the study of cortical ERPs in populations with hearing impairment.

  8. Singing in groups for Parkinson's disease (SING-PD): a pilot study of group singing therapy for PD-related voice/speech disorders.

    PubMed

    Shih, Ludy C; Piel, Jordan; Warren, Amanda; Kraics, Lauren; Silver, Althea; Vanderhorst, Veronique; Simon, David K; Tarsy, Daniel

    2012-06-01

    Parkinson's disease-related speech and voice impairment has a significant impact on quality of life measures. LSVT®LOUD voice and speech therapy (Lee Silverman Voice Therapy) has demonstrated scientific efficacy and clinical effectiveness, but musically based voice and speech therapy has been underexplored as a potentially useful method of rehabilitation. We undertook a pilot, open-label study of a group-based singing intervention, consisting of twelve 90-min weekly sessions led by a voice and speech therapist/singing instructor. The primary outcome measure of vocal loudness, as measured by sound pressure level (SPL) at 50 cm during connected speech, was not significantly different one week after the intervention or at 13 weeks after the intervention. A number of secondary measures reflecting pitch range, phonation time, and maximum loudness also were unchanged, as were voice-related quality of life (VRQOL) and the voice handicap index (VHI). This study suggests that a group singing therapy intervention at this intensity and frequency does not result in significant improvement in objective and subject-rated measures of voice and speech impairment. Copyright © 2012 Elsevier Ltd. All rights reserved.

  9. Multitalker Speech Perception with Ideal Time-Frequency Segregation: Effects of Voice Characteristics and Number of Talkers

    DTIC Science & Technology

    2009-03-23

    Brungart, Douglas S. Speech perception in multitalker listening environments is limited by two very different types of masking. The first is energetic…

  10. Comparing different models of the development of verb inflection in early child Spanish.

    PubMed

    Aguado-Orea, Javier; Pine, Julian M

    2015-01-01

    How children acquire knowledge of verb inflection is a long-standing question in language acquisition research. In the present study, we test the predictions of some current constructivist and generativist accounts of the development of verb inflection by focusing on data from two Spanish-speaking children between the ages of 2;0 and 2;6. The constructivist claim that children's early knowledge of verb inflection is only partially productive is tested by comparing the average number of different inflections per verb in matched samples of child and adult speech. The generativist claim that children's early use of verb inflection is essentially error-free is tested by investigating the rate at which the children made subject-verb agreement errors in different parts of the present-tense paradigm. Our results show: (1) that, although even adults' use of verb inflection in Spanish tends to look somewhat lexically restricted, both children's use of verb inflection was significantly less flexible than that of their caregivers; and (2) that, although the rate at which the two children produced subject-verb agreement errors in their speech was very low, this overall error rate hid a consistent pattern in which error rates were substantially higher in low-frequency than in high-frequency contexts, and substantially higher for low-frequency than for high-frequency verbs. These results undermine the claim that children's use of verb inflection is fully productive from the earliest observable stages, and are consistent with the constructivist claim that knowledge of verb inflection develops only gradually.

  11. Acoustic diagnosis of pulmonary hypertension: automated speech-recognition-inspired classification algorithm outperforms physicians

    NASA Astrophysics Data System (ADS)

    Kaddoura, Tarek; Vadlamudi, Karunakar; Kumar, Shine; Bobhate, Prashant; Guo, Long; Jain, Shreepal; Elgendi, Mohamed; Coe, James Y.; Kim, Daniel; Taylor, Dylan; Tymchak, Wayne; Schuurmans, Dale; Zemp, Roger J.; Adatia, Ian

    2016-09-01

    We hypothesized that an automated speech-recognition-inspired classification algorithm could differentiate between the heart sounds in subjects with and without pulmonary hypertension (PH) and outperform physicians. Heart sounds, electrocardiograms, and mean pulmonary artery pressures (mPAp) were recorded simultaneously. Heart sound recordings were digitized to train and test speech-recognition-inspired classification algorithms. We used mel-frequency cepstral coefficients to extract features from the heart sounds. Gaussian-mixture models classified the features as PH (mPAp ≥ 25 mmHg) or normal (mPAp < 25 mmHg). Physicians blinded to patient data listened to the same heart sound recordings and attempted a diagnosis. We studied 164 subjects: 86 with mPAp ≥ 25 mmHg (mPAp 41 ± 12 mmHg) and 78 with mPAp < 25 mmHg (mPAp 17 ± 5 mmHg) (p < 0.005). The correct diagnostic rate of the automated speech-recognition-inspired algorithm was 74%, compared to 56% by physicians (p = 0.005). The false positive rate for the algorithm was 34% versus 50% for clinicians (p = 0.04). The false negative rate for the algorithm was 23% versus 68% for physicians (p = 0.0002). We developed an automated speech-recognition-inspired classification algorithm for the acoustic diagnosis of PH that outperforms physicians and that could be used to screen for PH and encourage earlier specialist referral.
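
    The recognition pipeline named in the abstract (MFCC features, one Gaussian-mixture model per class, maximum-likelihood decision) can be sketched with librosa and scikit-learn. Sampling rate, coefficient count, and mixture size below are plausible defaults, not the study's settings, and train_ph / train_normal are hypothetical lists of file paths.

    ```python
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_features(path, sr=4000, n_mfcc=13):
        y, sr = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # frames x coeffs

    # train_ph, train_normal: hypothetical lists of labelled heart-sound files.
    gmm_ph = GaussianMixture(8, covariance_type="diag").fit(
        np.vstack([mfcc_features(p) for p in train_ph]))
    gmm_nl = GaussianMixture(8, covariance_type="diag").fit(
        np.vstack([mfcc_features(p) for p in train_normal]))

    def diagnose(path):
        feats = mfcc_features(path)
        # score() is the mean log-likelihood; pick the better-fitting class model.
        return "PH" if gmm_ph.score(feats) > gmm_nl.score(feats) else "normal"
    ```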

  12. The influence of audibility on speech recognition with nonlinear frequency compression for children and adults with hearing loss

    PubMed Central

    McCreery, Ryan W.; Alexander, Joshua; Brennan, Marc A.; Hoover, Brenda; Kopun, Judy; Stelmachowicz, Patricia G.

    2014-01-01

    Objective: The primary goal of nonlinear frequency compression (NFC) and other frequency lowering strategies is to increase the audibility of high-frequency sounds that are not otherwise audible with conventional hearing-aid processing due to the degree of hearing loss, limited hearing aid bandwidth, or a combination of both factors. The aim of the current study was to compare estimates of speech audibility processed by NFC to improvements in speech recognition for a group of children and adults with high-frequency hearing loss. Design: Monosyllabic word recognition was measured in noise for twenty-four adults and twelve children with mild to severe sensorineural hearing loss. Stimuli were amplified based on each listener's audiogram with conventional processing (CP) with amplitude compression or with NFC and presented under headphones using a software-based hearing aid simulator. A modification of the speech intelligibility index (SII) was used to estimate audibility of information in frequency-lowered bands. The mean improvement in SII was compared to the mean improvement in speech recognition. Results: All but two listeners experienced improvements in speech recognition with NFC compared to CP, consistent with the small increase in audibility that was estimated using the modification of the SII. Children and adults had similar improvements in speech recognition with NFC. Conclusion: Word recognition with NFC was higher than CP for children and adults with mild to severe hearing loss. The average improvement in speech recognition with NFC (7%) was consistent with the modified SII, which indicated that listeners experienced an increase in audibility with NFC compared to CP. Further studies are necessary to determine if changes in audibility with NFC are related to speech recognition with NFC for listeners with greater degrees of hearing loss, with a greater variety of compression settings, and using auditory training. PMID:24535558
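
    The audibility logic behind the SII can be illustrated in miniature: per frequency band, audibility is the sensation level clipped to a 30 dB range, weighted by a band-importance function, and summed. This toy version ignores the masking and level-distortion terms of ANSI S3.5 and is not the authors' modified SII.

    ```python
    import numpy as np

    def toy_sii(speech_spectrum_db, threshold_db, band_importance):
        """All inputs are per-band arrays; band_importance sums to 1."""
        sensation = np.asarray(speech_spectrum_db) - np.asarray(threshold_db)
        audibility = np.clip(sensation / 30.0, 0.0, 1.0)  # fraction of 30 dB range
        return float(np.sum(np.asarray(band_importance) * audibility))
    ```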

  13. THE COMPREHENSION OF RAPID SPEECH BY THE BLIND, PART III.

    ERIC Educational Resources Information Center

    FOULKE, EMERSON

    A review of the research on the comprehension of rapid speech by the blind identifies five methods of speech compression: speech changing, electromechanical sampling, computer sampling, speech synthesis, and frequency dividing with the harmonic compressor. The speech changing and electromechanical sampling methods and the necessary apparatus have…

  14. Evaluating standard airborne sound insulation measures in terms of annoyance, loudness, and audibility ratings.

    PubMed

    Park, H K; Bradley, J S

    2009-07-01

    This paper reports the results of an evaluation of the merits of standard airborne sound insulation measures with respect to subjective ratings of the annoyance and loudness of transmitted sounds. Subjects listened to speech and music sounds modified to represent transmission through 20 different walls with sound transmission class (STC) ratings from 34 to 58. A number of variations in the standard measures were also considered. These included variations in the 8-dB rule for the maximum allowed deficiency in the STC measure as well as variations in the standard 32-dB total allowed deficiency. Several spectrum adaptation terms were considered in combination with weighted sound reduction index (R(w)) values as well as modifications to the range of included frequencies in the standard rating contour. An STC measure without an 8-dB rule and an R(w) rating with a new spectrum adaptation term were better predictors of annoyance and loudness ratings of speech sounds. R(w) ratings with one of two modified C(tr) spectrum adaptation terms were better predictors of annoyance and loudness ratings of transmitted music sounds. Although some measures were much better predictors of responses to one type of sound than were the standard STC and R(w) values, no measure was remarkably improved for predicting annoyance and loudness ratings of both music and speech sounds.
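
    The two deficiency rules discussed here are easiest to see in code. The sketch below rates a 16-band transmission-loss curve against the standard STC contour of ASTM E413; passing max_single=None mimics the "no 8-dB rule" variant examined in the paper.

    ```python
    import numpy as np

    # ASTM E413 reference contour, relative to its 500 Hz value, for the 16
    # third-octave bands from 125 Hz to 4 kHz.
    CONTOUR = np.array([-16, -13, -10, -7, -4, -1, 0, 1, 2, 3, 4, 4, 4, 4, 4, 4])

    def stc(tl, max_single=8, max_total=32):
        """Highest contour position whose deficiencies (contour minus measured
        TL, where positive) satisfy the single-band and total-sum rules."""
        tl = np.asarray(tl, dtype=float)
        for rating in range(150, 0, -1):  # descend until both rules pass
            deficiency = np.clip(rating + CONTOUR - tl, 0, None)
            if deficiency.sum() <= max_total and (
                    max_single is None or deficiency.max() <= max_single):
                return rating
        return 0
    ```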

  15. Speech Prosody Across Stimulus Types for Individuals with Parkinson's Disease.

    PubMed

    K-Y Ma, Joan; Schneider, Christine B; Hoffmann, Rüdiger; Storch, Alexander

    2015-01-01

    Up to 89% of individuals with Parkinson's disease (PD) experience speech problems over the course of the disease. Speech prosody and intelligibility are two of the most affected areas in hypokinetic dysarthria. However, assessment of these areas can be problematic, as speech prosody and intelligibility may be affected by the type of speech materials employed. This study comparatively explored the effects of different types of speech stimuli on speech prosody and intelligibility in PD speakers. Speech prosody and intelligibility of two groups of individuals with varying degrees of dysarthria resulting from PD were compared to those of a group of control speakers using sentence reading, passage reading, and monologue. Acoustic analysis, including measures of fundamental frequency (F0), intensity, and speech rate, was used to form a prosodic profile for each individual. Speech intelligibility was measured for the speakers with dysarthria using direct magnitude estimation. A difference in F0 variability between the speakers with dysarthria and control speakers was observed only in the sentence reading task. A difference in average intensity level was observed between speakers with mild dysarthria and the control speakers. Additionally, there were stimulus effects on both intelligibility and the prosodic profile: the prosodic profile of PD speakers differed from that of the control speakers in the more structured tasks, and lower intelligibility was found in the less structured task. This highlights the value of both structured and natural stimuli for evaluating speech production in PD speakers.

  16. Mommy is only happy! Dutch mothers' realisation of speech sounds in infant-directed speech expresses emotion, not didactic intent.

    PubMed

    Benders, Titia

    2013-12-01

    Exaggeration of the vowel space in infant-directed speech (IDS) is well documented for English, but not consistently replicated in other languages or for other speech-sound contrasts. A second attested, but less discussed, pattern of change in IDS is an overall rise of the formant frequencies, which may reflect an affective speaking style. The present study investigates longitudinally how Dutch mothers change their corner vowels, voiceless fricatives, and pitch when speaking to their infant at 11 and 15 months of age. In comparison to adult-directed speech (ADS), Dutch IDS has a smaller vowel space, higher second and third formant frequencies in the vowels, and a higher spectral frequency in the fricatives. The formants of the vowels and spectral frequency of the fricatives are raised more strongly for infants at 11 than at 15 months, while the pitch is more extreme in IDS to 15-month-olds. These results show that enhanced positive affect is the main factor influencing Dutch mothers' realisation of speech sounds in IDS, especially to younger infants. This study provides evidence that mothers' expression of emotion in IDS can influence the realisation of speech sounds, and that the loss or gain of speech clarity may be secondary effects of affect. Copyright © 2013 Elsevier Inc. All rights reserved.

  17. Effect of input compression and input frequency response on music perception in cochlear implant users.

    PubMed

    Halliwell, Emily R; Jones, Linor L; Fraser, Matthew; Lockley, Morag; Hill-Feltham, Penelope; McKay, Colette M

    2015-06-01

    A study was conducted to determine whether modifications to input compression and input frequency response characteristics can improve music-listening satisfaction in cochlear implant users. Experiment 1 compared three pre-processed versions of music and speech stimuli in a laboratory setting: original, compressed, and flattened frequency response. Music excerpts comprised three music genres (classical, country, and jazz), and a running speech excerpt was compared. Experiment 2 implemented a flattened input frequency response in the speech processor program. In a take-home trial, participants compared unaltered and flattened frequency responses. Ten and twelve adult Nucleus Freedom cochlear implant users participated in Experiments 1 and 2, respectively. Experiment 1 revealed a significant preference for music stimuli with a flattened frequency response compared to both original and compressed stimuli, whereas there was a significant preference for the original (rising) frequency response for speech stimuli. Experiment 2 revealed no significant mean preference for the flattened frequency response, with 9 of 11 subjects preferring the rising frequency response. Input compression did not alter music enjoyment. Comparison of the two experiments indicated that individual frequency response preferences may depend on the genre or familiarity, and particularly whether the music contained lyrics.

  18. Relative Salience of Speech Rhythm and Speech Rate on Perceived Foreign Accent in a Second Language.

    PubMed

    Polyanskaya, Leona; Ordin, Mikhail; Busa, Maria Grazia

    2017-09-01

    We investigated the independent contribution of speech rate and speech rhythm to perceived foreign accent. To address this issue we used a resynthesis technique that allows neutralizing segmental and tonal idiosyncrasies between identical sentences produced by French learners of English at different proficiency levels and maintaining the idiosyncrasies pertaining to prosodic timing patterns. We created stimuli that (1) preserved the idiosyncrasies in speech rhythm while controlling for the differences in speech rate between the utterances; (2) preserved the idiosyncrasies in speech rate while controlling for the differences in speech rhythm between the utterances; and (3) preserved the idiosyncrasies both in speech rate and speech rhythm. All the stimuli were created in intoned (with imposed intonational contour) and flat (with monotonized, constant F0) conditions. The original and the resynthesized sentences were rated by native speakers of English for degree of foreign accent. We found that both speech rate and speech rhythm influence the degree of perceived foreign accent, but the effect of speech rhythm is larger than that of speech rate. We also found that intonation enhances the perception of fine differences in rhythmic patterns but reduces the perceptual salience of fine differences in speech rate.

  19. Integrating cognitive and peripheral factors in predicting hearing-aid processing effectiveness

    PubMed Central

    Kates, James M.; Arehart, Kathryn H.; Souza, Pamela E.

    2013-01-01

    Individual factors beyond the audiogram, such as age and cognitive abilities, can influence speech intelligibility and speech quality judgments. This paper develops a neural network framework for combining multiple subject factors into a single model that predicts speech intelligibility and quality for a nonlinear hearing-aid processing strategy. The nonlinear processing approach used in the paper is frequency compression, which is intended to improve the audibility of high-frequency speech sounds by shifting them to lower frequency regions where listeners with high-frequency loss have better hearing thresholds. An ensemble averaging approach is used for the neural network to avoid the problems associated with overfitting. Models are developed for two subject groups, one having nearly normal hearing and the other mild-to-moderate sloping losses. PMID:25669257
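
    A minimal sketch of the ensemble-averaging idea with scikit-learn: several small networks are trained from different random initializations and their predictions averaged, which is the overfitting control the abstract describes. The architecture and ensemble size are illustrative, not the paper's.

    ```python
    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def ensemble_predict(X_train, y_train, X_test, n_nets=10):
        """Average the outputs of independently initialized regressors."""
        preds = []
        for seed in range(n_nets):
            net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000,
                               random_state=seed).fit(X_train, y_train)
            preds.append(net.predict(X_test))
        return np.mean(preds, axis=0)
    ```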

  20. Phonation interval modification and speech performance quality during fluency-inducing conditions by adults who stutter

    PubMed Central

    Ingham, Roger J.; Bothe, Anne K.; Wang, Yuedong; Purkhiser, Krystal; New, Anneliese

    2012-01-01

    Purpose: To relate changes in four variables previously defined as characteristic of normally fluent speech to changes in phonatory behavior during oral reading by persons who stutter (PWS) and normally fluent controls under multiple fluency-inducing (FI) conditions. Method: Twelve PWS and 12 controls each completed 4 ABA experiments. During A phases, participants read normally. B phases were 4 different FI conditions: auditory masking, chorus reading, whispering, and rhythmic stimulation. Dependent variables were the durations of accelerometer-recorded phonated intervals; self-judged speech effort; and observer-judged stuttering frequency, speech rate, and speech naturalness. The method enabled a systematic replication of Ingham et al. (2009). Results: All FI conditions resulted in decreased stuttering and decreases in the number of short phonated intervals, as compared with baseline conditions, but the only FI condition that satisfied all four characteristics of normally fluent speech was chorus reading. Increases in longer phonated intervals were associated with decreased stuttering but also with poorer naturalness and/or increased speech effort. Previous findings concerning the effects of FI conditions on speech naturalness and effort were replicated. Conclusions: Measuring all relevant characteristics of normally fluent speech, in the context of treatments that aim to reduce the occurrence of short-duration PIs, may aid the search for an explanation of the nature of stuttering and may also maximize treatment outcomes for adults who stutter. PMID:22365886

  1. The effect of speaking rate on serial-order sound-level errors in normal healthy controls and persons with aphasia.

    PubMed

    Fossett, Tepanta R D; McNeil, Malcolm R; Pratt, Sheila R; Tompkins, Connie A; Shuster, Linda I

    Although many speech errors can be generated at either a linguistic or motoric level of production, phonetically well-formed sound-level serial-order errors are generally assumed to result from disruption of phonologic encoding (PE) processes. An influential model of PE (Dell, 1986; Dell, Burger & Svec, 1997) predicts that speaking rate should affect the relative proportion of these serial-order sound errors (anticipations, perseverations, exchanges). These predictions have been extended to, and have special relevance for, persons with aphasia (PWA), because of the increased frequency with which speech errors occur and because their localization within the functional linguistic architecture may help in diagnosis and treatment. Supporting evidence regarding the effect of speaking rate on phonologic encoding has been provided by studies using young normal language (NL) speakers and computer simulations. Limited data exist for older NL users, and no group data exist for PWA. This study tested the phonologic encoding properties of Dell's model of speech production (Dell, 1986; Dell et al., 1997), which predicts that increasing speaking rate affects the relative proportion of serial-order sound errors (i.e., anticipations, perseverations, and exchanges). The effects of speech rate on the anticipation/exchange (AE) and anticipation/perseveration (AP) error ratios and on vocal reaction time (VRT) were examined in 16 normal healthy controls (NHC) and 16 PWA without concomitant motor speech disorders. The participants were recorded performing a phonologically challenging (tongue twister) speech production task at their typical rate and at two faster speaking rates. A significant effect of increased rate was obtained for the AP but not the AE ratio. Significant effects of group and rate were obtained for VRT. Although the significant effect of rate for the AP ratio provided evidence that changes in speaking rate did affect PE, the results failed to support the model-derived predictions regarding the direction of change for error type proportions. The current findings argue for an alternative concept of the role of activation and decay in influencing types of serial-order sound errors. Rather than a slow activation decay rate (Dell, 1986), the results of the current study were more compatible with an alternative explanation of rapid activation decay or slow build-up of residual activation.

  2. The Sound of Dominance: Vocal Precursors of Perceived Dominance during Interpersonal Influence.

    ERIC Educational Resources Information Center

    Tusing, Kyle James; Dillard, James Price

    2000-01-01

    Determines the effects of vocal cues on judgments of dominance in an interpersonal influence context. Indicates that mean amplitude and amplitude standard deviation were positively associated with dominance judgments, whereas speech rate was negatively associated with dominance judgments. Finds that mean fundamental frequency was positively…

  3. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259

  4. Auditory detection of non-speech and speech stimuli in noise: Effects of listeners' native language background.

    PubMed

    Liu, Chang; Jin, Su-Hyun

    2015-11-01

    This study investigated whether native listeners processed speech differently from non-native listeners in a speech detection task. Detection thresholds of Mandarin Chinese and Korean vowels and non-speech sounds in noise, frequency selectivity, and the nativeness of Mandarin Chinese and Korean vowels were measured for Mandarin Chinese- and Korean-native listeners. The two groups of listeners exhibited similar non-speech sound detection and frequency selectivity; however, the Korean listeners had better detection thresholds of Korean vowels than Chinese listeners, while the Chinese listeners performed no better at Chinese vowel detection than the Korean listeners. Moreover, thresholds predicted from an auditory model highly correlated with behavioral thresholds of the two groups of listeners, suggesting that detection of speech sounds not only depended on listeners' frequency selectivity, but also might be affected by their native language experience. Listeners evaluated their native vowels with higher nativeness scores than non-native listeners. Native listeners may have advantages over non-native listeners when processing speech sounds in noise, even without the required phonetic processing; however, such native speech advantages might be offset by Chinese listeners' lower sensitivity to vowel sounds, a characteristic possibly resulting from their sparse vowel system and their greater cognitive and attentional demands for vowel processing.

  5. Distinct ERP Signatures of Word Frequency, Phrase Frequency, and Prototypicality in Speech Production

    ERIC Educational Resources Information Center

    Hendrix, Peter; Bolger, Patrick; Baayen, Harald

    2017-01-01

    Recent studies have documented frequency effects for word n-grams, independently of word unigram frequency. Further studies have revealed constructional prototype effects, both at the word level as well as for phrases. The present speech production study investigates the time course of these effects for the production of prepositional phrases in…

  6. Cortical oscillations and entrainment in speech processing during working memory load.

    PubMed

    Hjortkjaer, Jens; Märcher-Rørsted, Jonatan; Fuglsang, Søren A; Dau, Torsten

    2018-02-02

    Neuronal oscillations are thought to play an important role in working memory (WM) and speech processing. Listening to speech in real-life situations is often cognitively demanding but it is unknown whether WM load influences how auditory cortical activity synchronizes to speech features. Here, we developed an auditory n-back paradigm to investigate cortical entrainment to speech envelope fluctuations under different degrees of WM load. We measured the electroencephalogram, pupil dilations and behavioural performance from 22 subjects listening to continuous speech with an embedded n-back task. The speech stimuli consisted of long spoken number sequences created to match natural speech in terms of sentence intonation, syllabic rate and phonetic content. To burden different WM functions during speech processing, listeners performed an n-back task on the speech sequences in different levels of background noise. Increasing WM load at higher n-back levels was associated with a decrease in posterior alpha power as well as increased pupil dilations. Frontal theta power increased at the start of the trial and increased additionally with higher n-back level. The observed alpha-theta power changes are consistent with visual n-back paradigms suggesting general oscillatory correlates of WM processing load. Speech entrainment was measured as a linear mapping between the envelope of the speech signal and low-frequency cortical activity (< 13 Hz). We found that increases in both types of WM load (background noise and n-back level) decreased cortical speech envelope entrainment. Although entrainment persisted under high load, our results suggest a top-down influence of WM processing on cortical speech entrainment. © 2018 The Authors. European Journal of Neuroscience published by Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
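
    A minimal sketch of the kind of linear envelope-to-EEG mapping described above, here as a lagged ridge regression (a temporal-response-function-style model) on synthetic data; the sampling rate, lag range, regularization constant, and mapping direction are illustrative assumptions, not the authors' analysis pipeline.

    ```python
    import numpy as np

    def lagged_design(env, max_lag):
        """Stack time-lagged copies of the envelope as regression features."""
        X = np.zeros((len(env), max_lag))
        for k in range(max_lag):
            X[k:, k] = env[:len(env) - k]
        return X

    fs = 64                                  # assumed analysis rate (Hz)
    rng = np.random.default_rng(0)
    env = rng.standard_normal(fs * 60)       # stand-in for a 60 s speech envelope
    true_trf = np.exp(-np.arange(16) / 4.0)  # toy neural response kernel
    eeg = np.convolve(env, true_trf)[:len(env)] + rng.standard_normal(len(env))

    X = lagged_design(env, max_lag=16)
    lam = 1.0                                # ridge regularization (assumption)
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ eeg)
    pred = X @ w
    entrainment = np.corrcoef(pred, eeg)[0, 1]   # envelope-tracking strength
    print(f"envelope-EEG correlation: {entrainment:.2f}")
    ```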

  7. Selective spatial attention modulates bottom-up informational masking of speech

    PubMed Central

    Carlile, Simon; Corkhill, Caitlin

    2015-01-01

    To hear out a conversation against other talkers, listeners must overcome energetic and informational masking. Although largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold; 40% of this was attributed to a release from informational masking. When the across-frequency temporal modulations in the masker talkers are decorrelated, the speech becomes unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to engage top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations, and show that this presumably low-level process can be modulated by selective spatial attention. PMID:25727100

  8. Selective spatial attention modulates bottom-up informational masking of speech.

    PubMed

    Carlile, Simon; Corkhill, Caitlin

    2015-03-02

    To hear out a conversation against other talkers, listeners must overcome energetic and informational masking. Although largely attributed to top-down processes, informational masking has also been demonstrated using unintelligible speech and amplitude-modulated maskers, suggesting bottom-up processes. We examined the role of speech-like amplitude modulations in informational masking using a spatial masking release paradigm. Separating a target talker from two masker talkers produced a 20 dB improvement in speech reception threshold; 40% of this was attributed to a release from informational masking. When the across-frequency temporal modulations in the masker talkers are decorrelated, the speech becomes unintelligible, although the within-frequency modulation characteristics remain identical. Used as a masker as above, informational masking accounted for 37% of the spatial unmasking seen with this masker. This unintelligible and highly differentiable masker is unlikely to engage top-down processes. These data provide strong evidence of bottom-up masking involving speech-like, within-frequency modulations, and show that this presumably low-level process can be modulated by selective spatial attention.

  9. Tone classification of syllable-segmented Thai speech based on multilayer perceptron

    NASA Astrophysics Data System (ADS)

    Satravaha, Nuttavudh; Klinkhachorn, Powsiri; Lass, Norman

    2002-05-01

    Thai is a monosyllabic tonal language that uses tone to convey lexical information about the meaning of a syllable. Thus to completely recognize a spoken Thai syllable, a speech recognition system not only has to recognize a base syllable but also must correctly identify a tone. Hence, tone classification of Thai speech is an essential part of a Thai speech recognition system. Thai has five distinctive tones (``mid,'' ``low,'' ``falling,'' ``high,'' and ``rising'') and each tone is represented by a single fundamental frequency (F0) pattern. However, several factors, including tonal coarticulation, stress, intonation, and speaker variability, affect the F0 pattern of a syllable in continuous Thai speech. In this study, an efficient method for tone classification of syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress, and intonation, as well as a method to perform automatic syllable segmentation, were developed. Acoustic parameters were used as the main discriminating parameters. The F0 contour of a segmented syllable was normalized by using a z-score transformation before being presented to a tone classifier. The proposed system was evaluated on 920 test utterances spoken by 8 speakers. A recognition rate of 91.36% was achieved by the proposed system.
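
    A minimal sketch of the z-score normalization step described above: each syllable's F0 contour is resampled to a fixed length and standardized, removing speaker-specific pitch level and range before tone classification. The contour length is an illustrative assumption.

    ```python
    import numpy as np

    def normalize_f0_contour(f0, n_points=20):
        """Resample an F0 contour to fixed length and z-score it."""
        f0 = np.asarray(f0, dtype=float)
        resampled = np.interp(np.linspace(0, len(f0) - 1, n_points),
                              np.arange(len(f0)), f0)
        return (resampled - resampled.mean()) / resampled.std()

    # A falling and a rising tone now differ only in contour shape,
    # not in the speaker's absolute pitch:
    falling = normalize_f0_contour([220, 215, 200, 180, 160])
    rising = normalize_f0_contour([120, 125, 135, 150, 170])
    ```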

  10. Strength of German accent under altered auditory feedback

    PubMed Central

    HOWELL, PETER; DWORZYNSKI, KATHARINA

    2007-01-01

    Borden’s (1979, 1980) hypothesis that speakers with vulnerable speech systems rely more heavily on feedback monitoring than do speakers with less vulnerable systems was investigated. The second language (L2) of a speaker is vulnerable, in comparison with the native language, so alteration to feedback should have a detrimental effect on it, according to this hypothesis. Here, we specifically examined whether altered auditory feedback has an effect on accent strength when speakers speak L2. There were three stages in the experiment. First, 6 German speakers who were fluent in English (their L2) were recorded under six conditions—normal listening, amplified voice level, voice shifted in frequency, delayed auditory feedback, and slowed and accelerated speech rate conditions. Second, judges were trained to rate accent strength. Training was assessed by whether it was successful in separating German speakers speaking English from native English speakers, also speaking English. In the final stage, the judges ranked recordings of each speaker from the first stage as to increasing strength of German accent. The results show that accents were more pronounced under frequency-shifted and delayed auditory feedback conditions than under normal or amplified feedback conditions. Control tests were done to ensure that listeners were judging accent, rather than fluency changes caused by altered auditory feedback. The findings are discussed in terms of Borden’s hypothesis and other accounts about why altered auditory feedback disrupts speech control. PMID:11414137

  11. Enhancement of temporal periodicity cues in cochlear implants: Effects on prosodic perception and vowel identification

    NASA Astrophysics Data System (ADS)

    Green, Tim; Faulkner, Andrew; Rosen, Stuart; Macherey, Olivier

    2005-07-01

    Standard continuous interleaved sampling processing and a modified processing strategy designed to enhance temporal cues to voice pitch were compared on tests of intonation perception and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400 Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey the dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude modulation with a sawtooth-like waveform whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as question or statement was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.
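
    A minimal sketch of the modified channel processing described above: a 32 Hz low-pass envelope carries the slow spectral variation, a sawtooth-like waveform at the input F0 carries periodicity, and the channel level is their product. The filter order and the fixed F0 are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.signal import butter, filtfilt, sawtooth

    fs = 16000
    t = np.arange(fs) / fs
    channel = np.random.default_rng(1).standard_normal(fs)  # stand-in band signal

    # Slow-rate envelope: 32 Hz low-pass of the rectified band signal
    b, a = butter(4, 32 / (fs / 2))
    slow_env = np.clip(filtfilt(b, a, np.abs(channel)), 0, None)

    # Higher-rate periodicity: full-depth modulation at the (assumed) F0
    f0 = 120.0
    periodicity = 0.5 * (1 + sawtooth(2 * np.pi * f0 * t))  # 0..1 at F0 rate

    channel_level = slow_env * periodicity  # product of the two components
    ```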

  12. Top-down Processes in Simulated Electric-Acoustic Hearing: The Effect of Linguistic Context on Bimodal Benefit for Temporally Interrupted Speech

    PubMed Central

    Oh, Soo Hee; Donaldson, Gail S.; Kong, Ying-Yee

    2016-01-01

    Objectives Previous studies have documented the benefits of bimodal hearing as compared with a CI alone, but most have focused on the importance of bottom-up, low-frequency cues. The purpose of the present study was to evaluate the role of top-down processing in bimodal hearing by measuring the effect of sentence context on bimodal benefit for temporally interrupted sentences. It was hypothesized that low-frequency acoustic cues would facilitate the use of contextual information in the interrupted sentences, resulting in greater bimodal benefit for the higher context (CUNY) sentences than for the lower context (IEEE) sentences. Design Young normal-hearing listeners were tested in simulated bimodal listening conditions in which noise band vocoded sentences were presented to one ear with or without low-pass (LP) filtered speech or LP harmonic complexes (LPHCs) presented to the contralateral ear. Speech recognition scores were measured in three listening conditions: vocoder-alone, vocoder combined with LP speech, and vocoder combined with LPHCs. Temporally interrupted versions of the CUNY and IEEE sentences were used to assess listeners’ ability to fill in missing segments of speech by using top-down linguistic processing. Sentences were square-wave gated at a rate of 5 Hz with a 50 percent duty cycle. Three vocoder channel conditions were tested for each type of sentence (8, 12, and 16 channels for CUNY; 12, 16, and 32 channels for IEEE) and bimodal benefit was compared for similar amounts of spectral degradation (matched-channel comparisons) and similar ranges of baseline performance. Two gain measures, percentage-point gain and normalized gain, were examined. Results Significant effects of context on bimodal benefit were observed when LP speech was presented to the residual-hearing ear. For the matched-channel comparisons, CUNY sentences showed significantly higher normalized gains than IEEE sentences for both the 12-channel (20 points higher) and 16-channel (18 points higher) conditions. For the individual gain comparisons that used a similar range of baseline performance, CUNY sentences showed bimodal benefits that were significantly higher (7 percentage points, or 15 points normalized gain) than those for IEEE sentences. The bimodal benefits observed here for temporally interrupted speech were considerably smaller than those observed in an earlier study that used continuous speech (Kong et al., 2015). Further, unlike previous findings for continuous speech, no bimodal benefit was observed when LPHCs were presented to the LP ear. Conclusions Findings indicate that linguistic context has a significant influence on bimodal benefit for temporally interrupted speech and support the hypothesis that low-frequency acoustic information presented to the residual-hearing ear facilitates the use of top-down linguistic processing in bimodal hearing. However, bimodal benefit is reduced for temporally interrupted speech as compared to continuous speech, suggesting that listeners’ ability to restore missing speech information depends not only on top-down linguistic knowledge, but also on the quality of the bottom-up sensory input. PMID:27007220
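
    A minimal sketch of the 5 Hz, 50% duty-cycle square-wave gating used above to create the temporally interrupted sentences; the sampling rate and stand-in waveform are illustrative assumptions.

    ```python
    import numpy as np

    def interrupt(speech, fs, rate_hz=5.0, duty=0.5):
        """Gate a waveform on/off with a square wave."""
        t = np.arange(len(speech)) / fs
        gate = ((t * rate_hz) % 1.0) < duty   # True during "on" half-cycles
        return speech * gate

    fs = 16000
    speech = np.random.default_rng(2).standard_normal(fs * 2)  # stand-in sentence
    gated = interrupt(speech, fs)   # 100 ms on / 100 ms off at 5 Hz
    ```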

  13. Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals.

    PubMed

    Polur, Prasad D; Miller, Gerald E

    2006-10-01

    Computer speech recognition for individuals with dysarthria, such as cerebral palsy patients, requires a robust technique that can handle conditions of very high variability and limited training data. In this study, the application of a 10-state ergodic hidden Markov model (HMM)/artificial neural network (ANN) hybrid structure to a dysarthric speech (isolated word) recognition system, intended to act as an assistive tool, was investigated. A small vocabulary spoken by three cerebral palsy subjects was chosen. The effect of such a structure on the recognition rate of the system was investigated by comparing it with an ergodic hidden Markov model as a control. This was done in order to determine whether this modified technique contributed to enhanced recognition of dysarthric speech. The speech was sampled at 11 kHz. Mel-frequency cepstral coefficients were extracted using 15 ms frames and served as training input to the hybrid model. The subsequent results demonstrated that the hybrid model structure was quite robust in its ability to handle the large variability and non-conformity of dysarthric speech. The level of variability in input dysarthric speech patterns sometimes limits the reliability of the system. However, its application as a rehabilitation/control tool to assist dysarthric motor-impaired individuals holds sufficient promise.
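
    A minimal sketch of the front end described above: mel-frequency cepstral coefficients computed from 15 ms frames of speech sampled at 11 kHz. The hop size, FFT length, and number of coefficients are illustrative assumptions, and librosa is used here purely for convenience, not because the authors did.

    ```python
    import librosa

    def mfcc_frames(wav_path, sr=11000, frame_ms=15, n_mfcc=13):
        """Return a (frames, n_mfcc) matrix of MFCC feature vectors."""
        y, sr = librosa.load(wav_path, sr=sr)
        win = int(sr * frame_ms / 1000)              # 15 ms -> 165 samples
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc,
                                    n_fft=256, win_length=win,
                                    hop_length=win // 2).T

    # Each row is one frame's feature vector, ready for an HMM/ANN model.
    ```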

  14. Automatic evaluation of hypernasality based on a cleft palate speech database.

    PubMed

    He, Ling; Zhang, Jing; Liu, Qi; Yin, Heng; Lech, Margaret; Huang, Yunzhi

    2015-05-01

    Hypernasality is one of the most typical characteristics of cleft palate (CP) speech. The outcome of hypernasality grading decides the necessity of follow-up surgery. Currently, the evaluation of CP speech is carried out by experienced speech therapists; however, the result depends strongly on their clinical experience and subjective judgment. This work proposes an automatic evaluation system for hypernasality grading in CP speech. The database tested in this work was collected by the Hospital of Stomatology, Sichuan University, which has the largest number of CP patients in China. Based on the production process of hypernasality, source sound pulse and vocal tract filter features are presented. These features include pitch, the first and second energy-amplified frequency bands, cepstrum-based features, MFCCs, and short-time energy in sub-band features. These features, combined with a KNN classifier, are applied to automatically classify four grades of hypernasality: normal, mild, moderate, and severe. The experimental results show that the proposed system achieves good performance: the classification rate for the four hypernasality grades reaches 80.4%. The sensitivity of the proposed features to gender is also discussed.
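
    A minimal sketch of the final stage described above: per-utterance feature vectors classified into four hypernasality grades with a KNN classifier. The feature dimension, k, and the random stand-in data are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.standard_normal((200, 24))        # stand-in feature vectors
    y = rng.integers(0, 4, 200)               # 0=normal .. 3=severe

    knn = KNeighborsClassifier(n_neighbors=5)
    scores = cross_val_score(knn, X, y, cv=5)  # chance level here is ~25%
    print(f"mean CV accuracy: {scores.mean():.2f}")
    ```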

  15. Use of amplitude modulation cues recovered from frequency modulation for cochlear implant users when original speech cues are severely degraded.

    PubMed

    Won, Jong Ho; Shim, Hyun Joon; Lorenzi, Christian; Rubinstein, Jay T

    2014-06-01

    Won et al. (J Acoust Soc Am 132:1113-1119, 2012) reported that cochlear implant (CI) speech processors generate amplitude-modulation (AM) cues recovered from broadband speech frequency modulation (FM) and that CI users can use these cues for speech identification in quiet. The present study was designed to extend this finding for a wide range of listening conditions, where the original speech cues were severely degraded by manipulating either the acoustic signals or the speech processor. The manipulation of the acoustic signals included the presentation of background noise, simulation of reverberation, and amplitude compression. The manipulation of the speech processor included changing the input dynamic range and the number of channels. For each of these conditions, multiple levels of speech degradation were tested. Speech identification was measured for CI users and compared for stimuli having both AM and FM information (intact condition) or FM information only (FM condition). Each manipulation degraded speech identification performance for both intact and FM conditions. Performance for the intact and FM conditions became similar for stimuli having the most severe degradations. Identification performance generally overlapped for the intact and FM conditions. Moreover, identification performance for the FM condition was better than chance performance even at the maximum level of distortion. Finally, significant correlations were found between speech identification scores for the intact and FM conditions. Altogether, these results suggest that despite poor frequency selectivity, CI users can make efficient use of AM cues recovered from speech FM in difficult listening situations.

  16. Speech transformation system (spectrum and/or excitation) without pitch extraction

    NASA Astrophysics Data System (ADS)

    Seneff, S.

    1980-07-01

    A speech analysis-synthesis system was developed that is capable of independent manipulation of the fundamental frequency and spectral envelope of a speech waveform. The system deconvolved the original speech with the spectral envelope estimate to obtain a model for the excitation; explicit pitch extraction was not required, and as a consequence the transformed speech was more natural sounding than would be the case if the excitation were modeled as a sequence of pulses. It is shown that the system has applications in the areas of voice modification, baseband-excited vocoders, time-scale modification, and frequency compression as an aid to the partially deaf.

  17. Perceptual sensitivity to spectral properties of earlier sounds during speech categorization.

    PubMed

    Stilp, Christian E; Assgari, Ashley A

    2018-02-28

    Speech perception is heavily influenced by surrounding sounds. When spectral properties differ between earlier (context) and later (target) sounds, this can produce spectral contrast effects (SCEs) that bias perception of later sounds. For example, when context sounds have more energy in low-F1 frequency regions, listeners report more high-F1 responses to a target vowel, and vice versa. SCEs have been reported using various approaches for a wide range of stimuli, but most often, large spectral peaks were added to the context to bias speech categorization. This obscures the lower limit of perceptual sensitivity to spectral properties of earlier sounds, i.e., when SCEs begin to bias speech categorization. Listeners categorized vowels (/ɪ/-/ɛ/, Experiment 1) or consonants (/d/-/g/, Experiment 2) following a context sentence with little spectral amplification (+1 to +4 dB) in frequency regions known to produce SCEs. In both experiments, +3 and +4 dB amplification in key frequency regions of the context produced SCEs, but lesser amplification was insufficient to bias performance. This establishes a lower limit of perceptual sensitivity where spectral differences across sounds can bias subsequent speech categorization. These results are consistent with proposed adaptation-based mechanisms that potentially underlie SCEs in auditory perception. Recent sounds can change what speech sounds we hear later. This can occur when the average frequency composition of earlier sounds differs from that of later sounds, biasing how they are perceived. These "spectral contrast effects" are widely observed when sounds' frequency compositions differ substantially. We reveal the lower limit of these effects, as +3 dB amplification of key frequency regions in earlier sounds was enough to bias categorization of the following vowel or consonant sound. Speech categorization being biased by very small spectral differences across sounds suggests that spectral contrast effects occur frequently in everyday speech perception.
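
    A minimal sketch of the context manipulation described above: a small, fixed boost (+3 dB) applied to one frequency region of a signal via the FFT. The band edges are illustrative assumptions, not the study's filter settings.

    ```python
    import numpy as np

    def boost_band(x, fs, f_lo, f_hi, gain_db=3.0):
        """Amplify [f_lo, f_hi] Hz by gain_db, leaving other frequencies intact."""
        X = np.fft.rfft(x)
        freqs = np.fft.rfftfreq(len(x), 1 / fs)
        band = (freqs >= f_lo) & (freqs <= f_hi)
        X[band] *= 10 ** (gain_db / 20)   # +3 dB is ~1.41x in amplitude
        return np.fft.irfft(X, n=len(x))

    fs = 16000
    context = np.random.default_rng(4).standard_normal(fs)  # stand-in sentence
    biased = boost_band(context, fs, f_lo=100, f_hi=400)    # assumed low-F1 region
    ```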

  18. The spatial unmasking of speech: evidence for within-channel processing of interaural time delay.

    PubMed

    Edmonds, Barrie A; Culling, John F

    2005-05-01

    Across-frequency processing by common interaural time delay (ITD) in spatial unmasking was investigated by measuring speech reception thresholds (SRTs) for high- and low-frequency bands of target speech presented against concurrent speech or a noise masker. Experiment 1 indicated that presenting one of these target bands with an ITD of +500 μs and the other with zero ITD (like the masker) provided some release from masking, but full binaural advantage was only measured when both target bands were given an ITD of +500 μs. Experiment 2 showed that full binaural advantage could also be achieved when the high- and low-frequency bands were presented with ITDs of equal but opposite magnitude (±500 μs). In experiment 3, the masker was also split into high- and low-frequency bands with ITDs of equal but opposite magnitude (±500 μs). The ITD of the low-frequency target band matched that of the high-frequency masking band and vice versa. SRTs indicated that, as long as the target and masker differed in ITD within each frequency band, full binaural advantage could be achieved. These results suggest that the mechanism underlying spatial unmasking exploits differences in ITD independently within each frequency channel.

  19. Nouns slow down speech across structurally and culturally diverse languages

    PubMed Central

    Danielsen, Swintha; Hartmann, Iren; Pakendorf, Brigitte; Witzlack-Makarevich, Alena; de Jong, Nivja H.

    2018-01-01

    By force of nature, every bit of spoken language is produced at a particular speed. However, this speed is not constant—speakers regularly speed up and slow down. Variation in speech rate is influenced by a complex combination of factors, including the frequency and predictability of words, their information status, and their position within an utterance. Here, we use speech rate as an index of word-planning effort and focus on the time window during which speakers prepare the production of words from the two major lexical classes, nouns and verbs. We show that, when naturalistic speech is sampled from languages all over the world, there is a robust cross-linguistic tendency for slower speech before nouns compared with verbs, both in terms of slower articulation and more pauses. We attribute this slowdown effect to the increased amount of planning that nouns require compared with verbs. Unlike verbs, nouns can typically only be used when they represent new or unexpected information; otherwise, they have to be replaced by pronouns or be omitted. These conditions on noun use appear to outweigh potential advantages stemming from differences in internal complexity between nouns and verbs. Our findings suggest that, beneath the staggering diversity of grammatical structures and cultural settings, there are robust universals of language processing that are intimately tied to how speakers manage referential information when they communicate with one another. PMID:29760059

  20. Integrating speech in time depends on temporal expectancies and attention.

    PubMed

    Scharinger, Mathias; Steinberg, Johanna; Tavano, Alessandro

    2017-08-01

    Sensory information that unfolds in time, such as in speech perception, relies on efficient chunking mechanisms in order to yield optimally-sized units for further processing. Whether or not two successive acoustic events receive a one-unit or a two-unit interpretation seems to depend on the fit between their temporal extent and a stipulated temporal window of integration. However, there is ongoing debate on how flexible this temporal window of integration should be, especially for the processing of speech sounds. Furthermore, there is no direct evidence of whether attention may modulate the temporal constraints on the integration window. For this reason, we here examine how different word durations, which lead to different temporal separations of sound onsets, interact with attention. In an Electroencephalography (EEG) study, participants actively and passively listened to words where word-final consonants were occasionally omitted. Words had either a natural duration or were artificially prolonged in order to increase the separation of speech sound onsets. Omission responses to incomplete speech input, originating in left temporal cortex, decreased when the critical speech sound was separated from previous sounds by more than 250 msec, i.e., when the separation was larger than the stipulated temporal window of integration (125-150 msec). Attention, on the other hand, only increased omission responses for stimuli with natural durations. We complemented the event-related potential (ERP) analyses with a frequency-domain analysis of the stimulus presentation rate. Notably, the power at the stimulation frequency showed the same duration and attention effects as the omission responses. We interpret these findings on the background of existing research on temporal integration windows and further suggest that our findings may be accounted for within the framework of predictive coding. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. Emotion Recognition from Chinese Speech for Smart Affective Services Using a Combination of SVM and DBN

    PubMed Central

    Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai

    2017-01-01

    Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
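
    A minimal sketch of the classification stage: utterance-level acoustic feature vectors fed to an SVM. A plain RBF-kernel SVC on random stand-in data is shown here in place of the paper's DBN+SVM combination; all dimensions, labels, and hyperparameters are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    X = rng.standard_normal((300, 40))   # stand-in per-utterance features
    y = rng.integers(0, 6, 300)          # six emotion classes (assumed)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
    ```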

  2. Time-Frequency Masking for Speech Separation and Its Potential for Hearing Aid Design

    PubMed Central

    Wang, DeLiang

    2008-01-01

    A new approach to the separation of speech from speech-in-noise mixtures is the use of time-frequency (T-F) masking. Originating in the field of computational auditory scene analysis, T-F masking performs separation in the time-frequency domain. This article introduces the T-F masking concept and reviews T-F masking algorithms that separate target speech from either monaural or binaural mixtures, as well as microphone-array recordings. The review emphasizes techniques that are promising for hearing aid design. This article also surveys recent studies that evaluate the perceptual effects of T-F masking techniques, particularly their effectiveness in improving human speech recognition in noise. An assessment is made of the potential benefits of T-F masking methods for the hearing impaired in light of the processing constraints of hearing aids. Finally, several issues pertinent to T-F masking are discussed. PMID:18974204
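
    A minimal sketch of T-F masking in its simplest, ideal-binary-mask form: a time-frequency cell is kept when the target dominates the interference there and discarded otherwise. Having access to the clean signals, as below, is an illustrative simplification; real systems must estimate the mask.

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    fs = 16000
    rng = np.random.default_rng(6)
    target = rng.standard_normal(fs)   # stand-in speech
    noise = rng.standard_normal(fs)    # stand-in interference

    f, t, T = stft(target, fs, nperseg=512)
    _, _, N = stft(noise, fs, nperseg=512)

    mask = (np.abs(T) > np.abs(N)).astype(float)  # binary mask, 0 dB criterion
    _, separated = istft(mask * (T + N), fs, nperseg=512)  # masked mixture
    ```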

  3. Acoustic voice analysis of prelingually deaf adults before and after cochlear implantation.

    PubMed

    Evans, Maegan K; Deliyski, Dimitar D

    2007-11-01

    It is widely accepted that many severely to profoundly deaf adults have benefited from cochlear implants (CIs). However, limited research has been conducted to investigate changes in voice and speech of prelingually deaf adults who receive CIs, a population well known for presenting with a variety of voice and speech abnormalities. The purpose of this study was to use acoustic analysis to explore changes in voice and speech for three prelingually deaf males pre- and postimplantation over 6 months. The following measurements, some measured in varying contexts, were obtained: fundamental frequency (F0), jitter, shimmer, noise-to-harmonic ratio, voice turbulence index, soft phonation index, amplitude- and F0-variation, F0-range, speech rate, nasalance, and vowel production. Characteristics of vowel production were measured by determining the first formant (F1) and second formant (F2) of vowels in various contexts, magnitude of F2-variation, and rate of F2-variation. Perceptual measurements of pitch, pitch variability, loudness variability, speech rate, and intonation were obtained for comparison. Results are reported using descriptive statistics. The results showed patterns of change for some of the parameters, while there was considerable variation across the subjects. All participants demonstrated a decrease in F0 in at least one context and demonstrated a change in nasalance toward the norm as compared to their normal-hearing control. The two participants who were oral-language communicators were judged to produce vowels with an average of 97.2% accuracy, and the sign-language user demonstrated low percent accuracy for vowel production.
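
    A minimal sketch of two of the measures listed above, local jitter and shimmer, computed from per-cycle pitch periods and peak amplitudes (the extraction of which is the hard part and is assumed already done).

    ```python
    import numpy as np

    def local_jitter(periods_s):
        """Mean absolute cycle-to-cycle period difference / mean period."""
        p = np.asarray(periods_s, dtype=float)
        return np.mean(np.abs(np.diff(p))) / p.mean()   # fraction; x100 for %

    def local_shimmer(amplitudes):
        """Mean absolute cycle-to-cycle amplitude difference / mean amplitude."""
        a = np.asarray(amplitudes, dtype=float)
        return np.mean(np.abs(np.diff(a))) / a.mean()

    periods = [0.0080, 0.0081, 0.0079, 0.0082, 0.0080]  # ~125 Hz voice
    print(f"jitter: {local_jitter(periods) * 100:.2f}%")
    ```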

  4. Population responses in primary auditory cortex simultaneously represent the temporal envelope and periodicity features in natural speech.

    PubMed

    Abrams, Daniel A; Nicol, Trent; White-Schwoch, Travis; Zecker, Steven; Kraus, Nina

    2017-05-01

    Speech perception relies on a listener's ability to simultaneously resolve multiple temporal features in the speech signal. Little is known regarding neural mechanisms that enable the simultaneous coding of concurrent temporal features in speech. Here we show that two categories of temporal features in speech, the low-frequency speech envelope and periodicity cues, are processed by distinct neural mechanisms within the same population of cortical neurons. We measured population activity in primary auditory cortex of anesthetized guinea pig in response to three variants of a naturally produced sentence. Results show that the envelope of population responses closely tracks the speech envelope, and this cortical activity more closely reflects wider bandwidths of the speech envelope compared to narrow bands. Additionally, neuronal populations represent the fundamental frequency of speech robustly with phase-locked responses. Importantly, these two temporal features of speech are simultaneously observed within neuronal ensembles in auditory cortex in response to clear, conversation, and compressed speech exemplars. Results show that auditory cortical neurons are adept at simultaneously resolving multiple temporal features in extended speech sentences using discrete coding mechanisms. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Double Fourier analysis for Emotion Identification in Voiced Speech

    NASA Astrophysics Data System (ADS)

    Sierra-Sosa, D.; Bastidas, M.; Ortiz P., D.; Quintero, O. L.

    2016-04-01

    We propose a novel analysis alternative, based on two Fourier transforms, for emotion recognition from speech. Fourier analysis allows different signals to be displayed and synthesized in terms of power spectral density distributions. A spectrogram of the voice signal is obtained by performing a short-time Fourier transform with Gaussian windows; this spectrogram portrays frequency-related features, such as vocal tract resonances and quasi-periodic excitations during voiced sounds. Emotions induce such characteristics in speech, which become apparent in the spectrogram's time-frequency distribution. The signal's time-frequency representation from the spectrogram is then treated as an image and processed through a 2-dimensional Fourier transform in order to perform spatial Fourier analysis on it. Finally, features related to emotions in voiced speech are extracted and presented.
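
    A minimal sketch of the two-stage analysis described above: a Gaussian-window STFT yields the spectrogram, and a 2-D FFT of that spectrogram, treated as an image, yields the second, spatial Fourier representation. Window and segment sizes are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.signal import stft, get_window

    fs = 16000
    x = np.random.default_rng(7).standard_normal(fs)  # stand-in voiced speech

    win = get_window(("gaussian", 64), 512)           # Gaussian analysis window
    f, t, Z = stft(x, fs, window=win, nperseg=512)
    spectrogram = np.abs(Z) ** 2                      # time-frequency "image"

    spatial = np.fft.fftshift(np.fft.fft2(spectrogram))  # 2-D FFT of the image
    features = np.abs(spatial)                        # magnitudes for features
    ```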

  6. Planning and Articulation in Incremental Word Production: Syllable-Frequency Effects in English

    ERIC Educational Resources Information Center

    Cholin, Joana; Dell, Gary S.; Levelt, Willem J. M.

    2011-01-01

    We investigated the role of syllables during speech planning in English by measuring syllable-frequency effects. So far, syllable-frequency effects in English have not been reported. English has poorly defined syllable boundaries, and thus the syllable might not function as a prominent unit in English speech production. Speakers produced either…

  7. Gender classification in children based on speech characteristics: using fundamental and formant frequencies of Malay vowels.

    PubMed

    Zourmand, Alireza; Ting, Hua-Nong; Mirhassani, Seyed Mostafa

    2013-03-01

    Speech is one of the prevalent communication mediums for humans. Identifying the gender of a child speaker based on his/her speech is crucial in telecommunication and speech therapy. This article investigates the use of fundamental and formant frequencies from sustained vowel phonation to distinguish the gender of Malay children aged between 7 and 12 years. The Euclidean minimum distance and multilayer perceptron were used to classify the gender of 360 Malay children based on different combinations of fundamental and formant frequencies (F0, F1, F2, and F3). The Euclidean minimum distance with normalized frequency data achieved a classification accuracy of 79.44%, which was higher than that of the nonnormalized frequency data. Age-dependent modeling was used to improve the accuracy of gender classification; with it, the Euclidean distance method achieved an optimal classification accuracy of 84.17% across all age groups. The accuracy was further increased to 99.81% using a multilayer perceptron based on mel-frequency cepstral coefficients. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
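
    A minimal sketch of a Euclidean minimum-distance classifier of the kind described above: each gender is represented by the centroid of its normalized frequency vectors, and a child is assigned to the nearer centroid. The random stand-in data and labels are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.neighbors import NearestCentroid
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(8)
    X = rng.standard_normal((360, 4))   # stand-in (F0, F1, F2, F3) per child
    y = rng.integers(0, 2, 360)         # 0 = boy, 1 = girl (stand-in labels)

    clf = make_pipeline(StandardScaler(), NearestCentroid(metric="euclidean"))
    clf.fit(X, y)
    pred = clf.predict(X[:5])           # assign each child to nearer centroid
    ```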

  8. Analysis of error type and frequency in apraxia of speech among Portuguese speakers.

    PubMed

    Cera, Maysa Luchesi; Minett, Thaís Soares Cianciarullo; Ortiz, Karin Zazo

    2010-01-01

    Most studies characterizing errors in the speech of patients with apraxia involve the English language. To analyze the types and frequency of errors produced by patients with apraxia of speech whose mother tongue was Brazilian Portuguese, 20 adults with apraxia of speech caused by stroke were assessed. The types of error committed by patients were analyzed both quantitatively and qualitatively, and their frequencies compared. We observed the presence of substitution, omission, trial-and-error, repetition, self-correction, anticipation, addition, reiteration, and metathesis, in descending order of frequency. Omission errors were among the most common, whereas addition errors were infrequent. These findings differed from those reported for English-speaking patients, probably owing to differences in the methodologies used for classifying error types; the inclusion of speakers with apraxia secondary to aphasia; and the structural differences between Portuguese and English in terms of syllable-onset complexity and its effect on motor control. The frequencies of omission and addition errors observed differed from those reported for speakers of English.

  9. The use of ultrasound in the study of articulatory properties of vowels in clear speech.

    PubMed

    Song, Jae Yung

    2017-01-01

    Although the acoustic properties of clear speech have been extensively studied, its underlying articulatory details have not been well understood. The purpose of the present study is twofold: To examine the specific articulatory processes of clear speech using ultrasound and to investigate whether and how the type of listener (hard of hearing, normal hearing) and the lexical property of words (frequency) interact in the production of clear speech. To this end, we examined productions of /ɑ/, /æ/ and /u/ from 16 speakers of US English. Overall, our ultrasound results suggested that the tongue's highest point moved in a direction that exaggerated the three vowels' phonological features, resulting in an expanded articulatory vowel space for the hard-of-hearing listener and low-frequency words. No interaction was found between the listener and word frequency, suggesting that the effects of word frequency hold constant across the two types of listeners.

  10. The effects of complementary and alternative medicine on the speech of patients with depression

    NASA Astrophysics Data System (ADS)

    Fraas, Michael; Solloway, Michele

    2004-05-01

    It is well documented that patients suffering from depression exhibit articulatory timing deficits and speech that is monotonous and lacking in pitch variation. Traditional remediation of depression has left many patients with adverse side effects and ineffective outcomes. Recent studies indicate that many Americans are seeking complementary and alternative forms of medicine to supplement traditional therapy approaches. The current investigation aims to determine the efficacy of complementary and alternative medicine (CAM) in the remediation of speech deficits associated with depression. Subjects with depression and normal controls will participate in an 8-week treatment session using polarity therapy, a form of CAM. Subjects will be recorded producing a series of spontaneous and narrative speech samples. Acoustic analysis of mean fundamental frequency (F0), variation in F0 (standard deviation of F0), average rate of F0 change, and pause and utterance durations will be conducted. Differences pre- and post-CAM therapy between subjects with depression and normal controls will be discussed.

  11. The impact of rate reduction and increased vocal intensity on coarticulation in dysarthria

    NASA Astrophysics Data System (ADS)

    Tjaden, Kris

    2003-04-01

    The dysarthrias are a group of speech disorders resulting from impairment to nervous system structures important for the motor execution of speech. Although numerous studies have examined how dysarthria impacts articulatory movements or changes in vocal tract shape, few studies of dysarthria consider that articulatory events and their acoustic consequences overlap or are coarticulated in connected speech. The impact of rate, loudness, and clarity on coarticulatory patterns in dysarthria also is poorly understood, although these prosodic manipulations frequently are employed as therapy strategies to improve intelligibility in dysarthria and also are known to affect coarticulatory patterns for at least some neurologically healthy speakers. The current study examined the effects of slowed rate and increased vocal intensity on anticipatory coarticulation for speakers with dysarthria secondary to Multiple Sclerosis (MS), as inferred from the acoustic signal. Healthy speakers were studied for comparison purposes. Three repetitions of twelve target words embedded in the carrier phrase ``It's a -- again'' were produced in habitual, loud, and slow speaking conditions. F2 frequencies and first moment coefficients were used to infer coarticulation. Both group and individual speaker trends will be examined in the data analyses.
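
    A minimal sketch of a first-moment (spectral centroid) computation of the kind used above to infer coarticulation: the amplitude-weighted mean frequency of one windowed analysis frame. The frame size and stand-in data are illustrative assumptions.

    ```python
    import numpy as np

    def first_moment(frame, fs):
        """Amplitude-weighted mean frequency (Hz) of one windowed frame."""
        spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1 / fs)
        return np.sum(freqs * spec) / np.sum(spec)

    fs = 16000
    frame = np.random.default_rng(9).standard_normal(512)  # stand-in frame
    print(f"M1 = {first_moment(frame, fs):.0f} Hz")
    ```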

  12. Long-term results of hearing preservation cochlear implant surgery in patients with residual low frequency hearing.

    PubMed

    Moteki, Hideaki; Nishio, Shin-Ya; Miyagawa, Maiko; Tsukada, Keita; Iwasaki, Satoshi; Usami, Shin-Ichi

    2017-05-01

    Differences were found between patients with stable hearing and those with progressive low-frequency hearing loss with respect to the rate of progression in the contralateral ear. It is suggested that electric acoustic stimulation (EAS) can provide long-term improvement in hearing ability even if some residual hearing is eventually lost. The aims were to evaluate the long-term threshold changes in the low-frequency hearing of the implanted ear as compared with the non-implanted ear, and hearing abilities with EAS in relation to the extent of residual hearing. Seventeen individuals were enrolled and received the EAS implant with a 24-mm FLEXeas electrode array. Hearing thresholds and speech perception were measured pre- and post-operatively for 1-5 years. Post-operative hearing preservation (HP) rates were calculated using the preservation numerical scale. The average linear regression coefficient for the decline in hearing preservation score was -6.9 for the implanted ear, and the patients were subsequently categorized into two groups: those with better-than-average, stable hearing, and those with worse-than-average, progressive hearing loss. EAS showed better results than electric stimulation alone, in spite of an absence of speech perception with acoustic stimulation.

  13. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus)

    PubMed Central

    Flaherty, Mary; Dent, Micheal L.; Sawusch, James R.

    2017-01-01

    The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with “d” or “t” and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal. PMID:28562597

  14. Experience with speech sounds is not necessary for cue trading by budgerigars (Melopsittacus undulatus).

    PubMed

    Flaherty, Mary; Dent, Micheal L; Sawusch, James R

    2017-01-01

    The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with "d" or "t" and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal.

  15. Gender and vocal production mode discrimination using the high frequencies for speech and singing

    PubMed Central

    Monson, Brian B.; Lotto, Andrew J.; Story, Brad H.

    2014-01-01

    Humans routinely produce acoustical energy at frequencies above 6 kHz during vocalization, but this frequency range is often not represented in communication devices and speech perception research. Recent advancements toward high-definition (HD) voice and extended bandwidth hearing aids have increased the interest in the high frequencies. The potential perceptual information provided by high-frequency energy (HFE) is not well characterized. We found that humans can accomplish tasks of gender discrimination and vocal production mode discrimination (speech vs. singing) when presented with acoustic stimuli containing only HFE at both amplified and normal levels. Performance in these tasks was robust in the presence of low-frequency masking noise. No substantial learning effect was observed. Listeners also were able to identify the sung and spoken text (excerpts from “The Star-Spangled Banner”) with very few exposures. These results add to the increasing evidence that the high frequencies provide at least redundant information about the vocal signal, suggesting that its representation in communication devices (e.g., cell phones, hearing aids, and cochlear implants) and speech/voice synthesizers could improve these devices and benefit normal-hearing and hearing-impaired listeners. PMID:25400613

  16. Combined Electric and Acoustic Stimulation With Hearing Preservation: Effect of Cochlear Implant Low-Frequency Cutoff on Speech Understanding and Perceived Listening Difficulty.

    PubMed

    Gifford, René H; Davis, Timothy J; Sunderhaus, Linsey W; Menapace, Christine; Buck, Barbara; Crosson, Jillian; O'Neill, Lori; Beiter, Anne; Segel, Phil

    The primary objective of this study was to assess the effect of electric and acoustic overlap for speech understanding in typical listening conditions using semidiffuse noise. This study used a within-subjects, repeated measures design including 11 experienced adult implant recipients (13 ears) with functional residual hearing in the implanted and nonimplanted ear. The aided acoustic bandwidth was fixed and the low-frequency cutoff for the cochlear implant (CI) was varied systematically. Assessments were completed in the R-SPACE sound-simulation system which includes a semidiffuse restaurant noise originating from eight loudspeakers placed circumferentially about the subject's head. AzBio sentences were presented at 67 dBA with signal to noise ratio varying between +10 and 0 dB determined individually to yield approximately 50 to 60% correct for the CI-alone condition with full CI bandwidth. Listening conditions for all subjects included CI alone, bimodal (CI + contralateral hearing aid), and bilateral-aided electric and acoustic stimulation (EAS; CI + bilateral hearing aid). Low-frequency cutoffs both below and above the original "clinical software recommendation" frequency were tested for all patients, in all conditions. Subjects estimated listening difficulty for all conditions using listener ratings based on a visual analog scale. Three primary findings were that (1) there was statistically significant benefit of preserved acoustic hearing in the implanted ear for most overlap conditions, (2) the default clinical software recommendation rarely yielded the highest level of speech recognition (1 of 13 ears), and (3) greater EAS overlap than that provided by the clinical recommendation yielded significant improvements in speech understanding. For standard-electrode CI recipients with preserved hearing, spectral overlap of acoustic and electric stimuli yielded significantly better speech understanding and less listening effort in a laboratory-based, restaurant-noise simulation. In conclusion, EAS patients may derive more benefit from greater acoustic and electric overlap than given in current software fitting recommendations, which are based solely on audiometric threshold. These data have larger scientific implications, as previous studies may not have assessed outcomes with optimized EAS parameters, thereby underestimating the benefit afforded by hearing preservation.

  17. [On the improvement of discrimination of vibration stimuli by training in late acquired deafness (author's transl)].

    PubMed

    Schultz-Coulon, H J; Borghorst, U

    1982-03-01

    Acoustic signals of low frequencies can be perceived by the tactile sense as vibrations, to a limited extent. In educating deaf children, efforts are made to combine tactile and visual speech perception in order to improve speech discrimination. The question of this study was whether an improvement of tactile discrimination can be achieved even by patients with late-acquired deafness. In a 45-year-old female patient who had become deaf after adolescence, the tactile discrimination of instrumental sounds (electric organ) within the frequency range c3-c4 (131-262 cps), as well as of speech sounds (30 mono- and multisyllabic words), was trained by means of the SIEMENS-Fonator. After two training courses of ten sessions each (45 min per session), the patient was not only able to recognize pitch differences of two half steps and more, as well as the tones of the scale, with few errors, but could also identify the words with high accuracy: she reached an identification rate of 75.6% for monosyllables and 85% for three-syllable words. Additionally, a marked improvement of speech discrimination by lip reading was observed when using the Fonator. Accordingly, even in patients with late-acquired deafness it appears worthwhile to train the tactile discrimination of vibration stimuli in order to support lip reading.

  18. Duration, Pitch, and Loudness in Kunqu Opera Stage Speech.

    PubMed

    Han, Qichao; Sundberg, Johan

    2017-03-01

    Kunqu is a special type of opera within the Chinese tradition with 600 years of history. In it, stage speech is used for the spoken dialogue. It is performed in the Ming Dynasty's mandarin language and is a much more dominant part of the play than singing. Stage speech deviates considerably from normal conversational speech with respect to duration, loudness, and pitch. This paper compares these properties in stage speech and conversational speech. A famous, highly experienced female singer performed stage speech and read the same lyrics in a conversational speech mode. Clear differences were found. As compared with conversational speech, stage speech had longer word and sentence durations, and word duration was less variable. Average sound level was 16 dB higher. Mean fundamental frequency was also considerably higher and more varied. Within sentences, both loudness and fundamental frequency tended to vary according to a low-high-low pattern. Some of the findings fail to support current opinions regarding the characteristics of stage speech, and in this sense the study demonstrates the relevance of objective measurements in descriptions of vocal styles. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  19. An Analysis of Individual Differences in Recognizing Monosyllabic Words Under the Speech Intelligibility Index Framework

    PubMed Central

    Shen, Yi; Kern, Allison B.

    2018-01-01

    Individual differences in the recognition of monosyllabic words, either in isolation (NU6 test) or in sentence context (SPIN test), were investigated under the theoretical framework of the speech intelligibility index (SII). An adaptive psychophysical procedure, namely the quick-band-importance-function procedure, was developed to enable the fitting of the SII model to individual listeners. Using this procedure, the band importance function (i.e., the relative weights of speech information across the spectrum) and the link function relating the SII to recognition scores can be simultaneously estimated while requiring only 200 to 300 trials of testing. Octave-frequency band importance functions and link functions were estimated separately for NU6 and SPIN materials from 30 normal-hearing listeners who were naïve to speech recognition experiments. For each type of speech material, considerable individual differences in the spectral weights were observed in some but not all frequency regions. At frequencies where the greatest intersubject variability was found, the spectral weights were correlated between the two speech materials, suggesting that the variability in spectral weights reflected listener-originated factors. PMID:29532711
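
    At its core, the SII computation that this procedure fits is an importance-weighted sum of band audibilities passed through a link function relating the index to a recognition score. The sketch below illustrates that structure only; the octave bands, weights, audibility rule, and logistic link parameters are assumptions, not values estimated in the study.

    ```python
    # SII-style prediction: importance-weighted band audibility + link function.
    # All band values and parameters below are illustrative placeholders.
    import numpy as np

    def band_audibility(snr_db, dynamic_range_db=30.0):
        """Map band SNRs onto [0, 1] audibility over an assumed 30-dB range."""
        return np.clip((np.asarray(snr_db, float) + dynamic_range_db / 2)
                       / dynamic_range_db, 0.0, 1.0)

    def sii(snr_db, importance):
        """Speech intelligibility index: weighted sum of band audibilities."""
        w = np.asarray(importance, float)
        w = w / w.sum()                        # band importance weights sum to 1
        return float(np.sum(w * band_audibility(snr_db)))

    def link_function(s, slope=8.0, midpoint=0.4):
        """Assumed logistic link from SII to proportion of words recognized."""
        return 1.0 / (1.0 + np.exp(-slope * (s - midpoint)))

    # Hypothetical listener: octave bands 0.25-8 kHz, per-band SNRs and weights.
    snr_per_band = [12, 8, 5, 0, -4, -10]      # dB
    weights = [0.05, 0.15, 0.25, 0.30, 0.15, 0.10]
    s = sii(snr_per_band, weights)
    print(f"SII = {s:.2f}, predicted score = {link_function(s):.2f}")
    ```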

  20. The Levels of Speech Usage Rating Scale: Comparison of Client Self-Ratings with Speech Pathologist Ratings

    ERIC Educational Resources Information Center

    Gray, Christina; Baylor, Carolyn; Eadie, Tanya; Kendall, Diane; Yorkston, Kathryn

    2012-01-01

    Background: The term "speech usage" refers to what people want or need to do with their speech to fulfil the communication demands in their life roles. Speech-language pathologists (SLPs) need to know about clients' speech usage to plan appropriate interventions to meet their life participation goals. The Levels of Speech Usage is a…

  1. Reliability, stability, and sensitivity to change and impairment in acoustic measures of timing and frequency.

    PubMed

    Vogel, Adam P; Fletcher, Janet; Snyder, Peter J; Fredrickson, Amy; Maruff, Paul

    2011-03-01

    Assessment of the voice for supporting classifications of central nervous system (CNS) impairment requires a different practical, methodological, and statistical framework compared with assessment of the voice to guide decisions about change in the CNS. In experimental terms, an understanding of the stability and sensitivity to change of an assessment protocol is required to guide decisions about CNS change. Five experiments (N = 70) were conducted using a set of commonly used stimuli (eg, sustained vowel, reading, extemporaneous speech) and easily acquired measures (eg, f₀-f₄, percent pause). Stability of these measures was examined through their repeated application in healthy adults over brief and intermediate retest intervals (ie, 30 seconds, 2 hours, and 1 week). Those measures found to be stable were then challenged using an experimental model that reliably changes voice acoustic properties (ie, the Lombard effect). Finally, adults with an established CNS-related motor speech disorder (dysarthria) were compared with healthy controls. Of the 61 acoustic variables studied, 36 showed good stability over all three stability experiments (eg, number of pauses, total speech time, speech rate, f₀-f₄). Of the measures with good stability, a number of frequency measures showed a change in response to increased vocal effort resulting from the Lombard effect challenge. Furthermore, several timing measures significantly separated the control and motor speech impairment groups. Measures with high levels of stability within healthy adults, and those that show sensitivity to change and impairment, may prove effective for monitoring changes in CNS functioning. Copyright © 2011 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  2. Frequency-Shift Hearing Aid

    NASA Technical Reports Server (NTRS)

    Weinstein, Leonard M.

    1994-01-01

    Proposed hearing aid maps spectrum of speech into band of lower frequencies at which ear remains sensitive. By redirecting normal speech frequencies into frequency band from 100 to 1,500 Hz, hearing aid allows people to understand normal conversation, including telephone calls. Principle of operation of hearing aid can be adapted to other uses, such as clearing up noisy telephone or radio communication. In addition, loudspeakers are more easily understood in presence of high background noise.
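
    The brief does not specify the mapping, but the idea can be sketched as a downward compression of short-time spectrum bins into the 100-1,500 Hz band. The linear compression, source bandwidth, and STFT settings below are assumptions for illustration.

    ```python
    # Sketch of downward spectral remapping into a 100-1,500 Hz band.
    import numpy as np
    from scipy.signal import stft, istft

    def shift_to_low_band(x, fs, lo=100.0, hi=1500.0, src_hi=8000.0):
        f, t, X = stft(x, fs=fs, nperseg=512)
        Y = np.zeros_like(X)
        df = f[1] - f[0]                          # STFT bin width in Hz
        for k, fk in enumerate(f):
            if fk > src_hi:
                continue
            f_new = lo + (hi - lo) * fk / src_hi  # compress [0, src_hi] linearly
            k_new = int(round(f_new / df))
            if k_new < len(f):
                Y[k_new] += X[k]                  # accumulate remapped energy
        _, y = istft(Y, fs=fs, nperseg=512)
        return y

    # Example: a 3-kHz tone is remapped to about 625 Hz.
    fs = 16000
    t = np.arange(fs) / fs
    y = shift_to_low_band(np.sin(2 * np.pi * 3000 * t), fs)
    ```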

  3. A novel speech-processing strategy incorporating tonal information for cochlear implants.

    PubMed

    Lan, N; Nie, K B; Gao, S K; Zeng, F G

    2004-05-01

    Good performance in cochlear implant users depends in large part on the ability of a speech processor to effectively decompose speech signals into multiple channels of narrow-band electrical pulses for stimulation of the auditory nerve. Speech processors that extract only envelopes of the narrow-band signals (e.g., the continuous interleaved sampling (CIS) processor) may not provide sufficient information to encode the tonal cues in languages such as Chinese. To improve the performance in cochlear implant users who speak a tonal language, we proposed and developed a novel speech-processing strategy, which extracted both the envelopes of the narrow-band signals and the fundamental frequency (F0) of the speech signal, and used them to modulate both the amplitude and the frequency of the electrical pulses delivered to stimulation electrodes. We developed an algorithm to extract the fundamental frequency and identified the general patterns of pitch variations of four typical tones in Chinese speech. The effectiveness of the extraction algorithm was verified with an artificial neural network that recognized the tonal patterns from the extracted F0 information. We then compared the novel strategy with the envelope-extraction CIS strategy in human subjects with normal hearing. The novel strategy produced significant improvement in perception of Chinese tones, phrases, and sentences. This novel processor with dynamic modulation of both frequency and amplitude is encouraging for the design of a cochlear implant device for sensorineurally deaf patients who speak tonal languages.
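
    The strategy's two ingredients, envelope-modulated amplitude and F0-modulated pulse timing, can be sketched for a single channel as below. The simple autocorrelation F0 estimator and the band edges are stand-ins, not the paper's extraction algorithm.

    ```python
    # One-channel sketch: band envelope sets pulse amplitude, F0 sets pulse rate.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    def f0_autocorr(x, fs, fmin=60.0, fmax=400.0):
        """Coarse F0 estimate from the autocorrelation peak (illustrative)."""
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        lo, hi = int(fs / fmax), int(fs / fmin)
        return fs / (lo + np.argmax(ac[lo:hi]))

    def f0_modulated_pulses(x, fs, band=(300.0, 600.0)):
        """Return pulse times spaced at the F0 period and envelope amplitudes."""
        b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
        env = np.abs(hilbert(filtfilt(b, a, x)))      # channel envelope
        f0 = f0_autocorr(x, fs)
        pulse_times = np.arange(0, len(x) / fs, 1.0 / f0)
        amplitudes = env[(pulse_times * fs).astype(int)]
        return pulse_times, amplitudes, f0
    ```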

  4. EFFECT OF DELAYED AUDITORY FEEDBACK, SPEECH RATE, AND SEX ON SPEECH PRODUCTION.

    PubMed

    Stuart, Andrew; Kalinowski, Joseph

    2015-06-01

    Perturbations in delayed auditory feedback (DAF) and speech rate were examined as sources of disruption in the speech of men and women. Fluent adult men (n = 16) and women (n = 16) spoke at a normal and an imposed fast rate of speech with 0, 25, 50, 100, and 200 msec. DAF. The syllable rate significantly increased when participants were instructed to speak at a fast rate, and the syllable rate decreased with increasing DAF delays. Men's speech rate was significantly faster during the fast speech rate condition with a 200 msec. DAF. Disfluencies increased with increasing DAF delay. Significantly more disfluency occurred at delays of 25 and 50 msec. in the fast rate condition, while more disfluency occurred at 100 and 200 msec. in normal rate conditions. Men and women did not display differences in the number of disfluencies. These findings demonstrate sex differences in susceptibility to perturbations in DAF and speech rate, suggesting that the feedforward/feedback subsystems that monitor vocalizations may differ between the sexes.

  5. Acoustic Analysis of Speech of Cochlear Implantees and Its Implications

    PubMed Central

    Patadia, Rajesh; Govale, Prajakta; Rangasayee, R.; Kirtane, Milind

    2012-01-01

    Objectives Cochlear implantees have improved speech production skills compared with those using hearing aids, as reflected in their acoustic measures. When compared to normal hearing controls, implanted children had fronted vowel space and their /s/ and /∫/ noise frequencies overlapped. Acoustic analysis of speech provides an objective index of perceived differences in speech production, which can be precursory in planning therapy. The objective of this study was to compare acoustic characteristics of speech in cochlear implantees with those of normal hearing age-matched peers to understand the implications. Methods Group 1 consisted of 15 children with prelingual bilateral severe-profound hearing loss (age, 5-11 years; implanted between 4-10 years). Prior to implantation, behind-the-ear hearing aids were used; both before and after implantation, subjects received at least 1 year of aural intervention. Group 2 consisted of 15 normal hearing age-matched peers. Sustained productions of vowels and words with selected consonants were recorded. Using Praat software for acoustic analysis, digitized speech tokens were measured for F1, F2, and F3 of vowels; centre frequency (Hz) and energy concentration (dB) of the burst; voice onset time (VOT in ms) for stops; centre frequency (Hz) of noise in /s/; and rise time (ms) for affricates. A t-test was used to find significant differences between groups. Results Significant differences were found in VOT for /b/, F1 and F2 of /e/, and F3 of /u/. No significant differences were found for centre frequency of burst, energy concentration for stops, centre frequency of noise in /s/, or rise time for affricates. These findings suggest that the auditory feedback provided by cochlear implants enables subjects to monitor the production of speech sounds. Conclusion Acoustic analysis of speech is an essential method for discerning characteristics which have or have not been improved by cochlear implantation and thus for planning intervention. PMID:22701768
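
    The measurements described can be reproduced in outline with the parselmouth Python bindings to Praat (an assumption; the study reports using Praat itself). The file name and formant ceiling below are placeholders.

    ```python
    # Formant measurement at a vowel midpoint via Praat's Burg method.
    import parselmouth

    snd = parselmouth.Sound("vowel.wav")                  # hypothetical recording
    formants = snd.to_formant_burg(maximum_formant=5500)  # ceiling for child voices
    t_mid = snd.duration / 2
    for n in (1, 2, 3):
        print(f"F{n} = {formants.get_value_at_time(n, t_mid):.0f} Hz")
    ```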

  6. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels

    PubMed Central

    2014-01-01

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters against the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. As the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production. PMID:25060583

  7. A magnetic resonance imaging study on the articulatory and acoustic speech parameters of Malay vowels.

    PubMed

    Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin

    2014-07-25

    The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters against the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. As the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.

  8. A possible role for a paralemniscal auditory pathway in the coding of slow temporal information

    PubMed Central

    Abrams, Daniel A.; Nicol, Trent; Zecker, Steven; Kraus, Nina

    2010-01-01

    Low-frequency temporal information present in speech is critical for normal perception; however, the neural mechanism underlying the differentiation of slow rates in acoustic signals is not known. Data from the rat trigeminal system suggest that the paralemniscal pathway may be specifically tuned to code low-frequency temporal information. We tested whether this phenomenon occurs in the auditory system by measuring the representation of temporal rate in lemniscal and paralemniscal auditory thalamus and cortex in guinea pig. Similar to the trigeminal system, responses measured in auditory thalamus indicate that slow rates are differentially represented in a paralemniscal pathway. In cortex, both lemniscal and paralemniscal neurons indicated sensitivity to slow rates. We speculate that a paralemniscal pathway in the auditory system may be specifically tuned to code low-frequency temporal information present in acoustic signals. These data suggest that the somatosensory and auditory modalities have parallel sub-cortical pathways that separately process slow rates and the spatial representation of the sensory periphery. PMID:21094680

  9. Fundamental frequency estimation of singing voice

    NASA Astrophysics Data System (ADS)

    de Cheveigné, Alain; Henrich, Nathalie

    2002-05-01

    A method of fundamental frequency (F0) estimation recently developed for speech [de Cheveigné and Kawahara, J. Acoust. Soc. Am. (to be published)] was applied to the singing voice. An electroglottograph signal recorded together with the microphone provided a reference by which estimates could be validated. Using the standard parameter settings as for speech, error rates were low despite the wide range of F0s (about 100 to 1600 Hz). Most "errors" were due to irregular vibration of the vocal folds, a sharp formant resonance that reduced the waveform to a single harmonic, or fast F0 changes such as in high-amplitude vibrato. Our database (18 singers from baritone to soprano) included examples of diphonic singing, for which melody is carried by variations of the frequency of a narrow formant rather than F0. By varying a parameter (the ratio of inharmonic to total power) the algorithm could be tuned to follow either frequency. Although the method has not been formally tested on a wide range of instruments, it seems appropriate for musical applications because it is accurate, accepts a wide range of F0s, and can be implemented with low latency for interactive applications. [Work supported by the Cognitique programme of the French Ministry of Research and Technology.]
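
    The cited method is, presumably, the algorithm later published as YIN, whose core replaces raw autocorrelation with a cumulative mean normalized difference function and takes the first dip below a threshold as the period. A minimal single-frame sketch, with illustrative parameter values:

    ```python
    # Single-frame YIN-style F0 estimate (difference function + CMNDF).
    import numpy as np

    def yin_f0(x, fs, fmin=100.0, fmax=1600.0, threshold=0.1):
        tau_max = int(fs / fmin)
        d = np.zeros(tau_max)
        for tau in range(1, tau_max):
            diff = x[:-tau] - x[tau:]
            d[tau] = np.dot(diff, diff)              # squared difference function
        cmndf = np.ones(tau_max)                     # cumulative mean normalized
        cumsum = np.cumsum(d[1:])
        cmndf[1:] = d[1:] * np.arange(1, tau_max) / np.maximum(cumsum, 1e-12)
        tau_min = int(fs / fmax)
        for tau in range(tau_min, tau_max):
            if cmndf[tau] < threshold:               # first dip below threshold
                return fs / tau
        return fs / (tau_min + np.argmin(cmndf[tau_min:]))  # fallback: global min

    fs = 44100
    t = np.arange(2048) / fs
    print(yin_f0(np.sin(2 * np.pi * 440 * t), fs))   # ~440 Hz
    ```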

  10. A speech pronunciation practice system for speech-impaired children: A study to measure its success.

    PubMed

    Salim, Siti Salwah; Mustafa, Mumtaz Begum Binti Peer; Asemi, Adeleh; Ahmad, Azila; Mohamed, Noraini; Ghazali, Kamila Binti

    2016-09-01

    The speech pronunciation practice (SPP) system enables children with speech impairments to practise and improve their speech pronunciation. However, little is known about the surrogate measures of the SPP system. This research aims to measure the success and effectiveness of the SPP system using three surrogate measures: usage (frequency of use), performance (recognition accuracy) and satisfaction (children's subjective reactions), and how these measures are aligned with the success of the SPP system, as well as with each other. We measured the absolute change in the word error rate (WER) between pre- and post-training using the ANOVA test. Correlation coefficient (CC) analysis was conducted to test the relation between the surrogate measures, while a structural equation model (SEM) was used to investigate the causal relations between the measures. The CC test results indicate a positive correlation between the surrogate measures. The SEM supports all the proposed hypotheses. The ANOVA results indicate that SPP is effective in reducing the WER of impaired speech. The SPP system is an effective assistive tool, especially for high levels of severity. We found that performance is a mediator of the relation between "usage" and "satisfaction". Copyright © 2016 Elsevier Ltd. All rights reserved.
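
    The WER underlying the pre/post comparison is the word-level edit distance between a reference transcript and the recognizer output, normalized by reference length; a minimal sketch:

    ```python
    # Word error rate via dynamic-programming edit distance over words.
    def wer(reference, hypothesis):
        r, h = reference.split(), hypothesis.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i                               # deletions
        for j in range(len(h) + 1):
            d[0][j] = j                               # insertions
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(r)][len(h)] / len(r)

    # Absolute pre/post change, as in the study's effectiveness test.
    delta = wer("the cat sat", "the cat sat") - wer("the cat sat", "the bat sat")
    print(f"WER change: {delta:+.2f}")                # negative = improvement
    ```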

  11. How our own speech rate influences our perception of others.

    PubMed

    Bosker, Hans Rutger

    2017-08-01

    In conversation, our own speech and that of others follow each other in rapid succession. Effects of the surrounding context on speech perception are well documented but, despite the ubiquity of the sound of our own voice, it is unknown whether our own speech also influences our perception of other talkers. This study investigated context effects induced by our own speech through 6 experiments, specifically targeting rate normalization (i.e., perceiving phonetic segments relative to surrounding speech rate). Experiment 1 revealed that hearing prerecorded fast or slow context sentences altered the perception of ambiguous vowels, replicating earlier work. Experiment 2 demonstrated that talking at a fast or slow rate prior to target presentation also altered target perception, though the effect of preceding speech rate was reduced. Experiment 3 showed that silent talking (i.e., inner speech) at fast or slow rates did not modulate the perception of others, suggesting that the effect of self-produced speech rate in Experiment 2 arose through monitoring of the external speech signal. Experiment 4 demonstrated that, when participants were played back their own (fast/slow) speech, no reduction of the effect of preceding speech rate was observed, suggesting that the additional task of speech production may be responsible for the reduced effect in Experiment 2. Finally, Experiments 5 and 6 replicate Experiments 2 and 3 with new participant samples. Taken together, these results suggest that variation in speech production may induce variation in speech perception, thus carrying implications for our understanding of spoken communication in dialogue settings. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  12. Influence of Analogy Instruction for Pitch Variation on Perceptual Ratings of Other Speech Parameters

    ERIC Educational Resources Information Center

    Tse, Andy C. Y.; Wong, Andus W-K.; Ma, Estella P-M.; Whitehill, Tara L.; Masters, Rich S. W.

    2013-01-01

    Purpose: "Analogy" is the similarity of different concepts on which a comparison can be based. Recently, an analogy of "waves at sea" was shown to be effective in modulating fundamental frequency (F[subscript 0]) variation. Perceptions of intonation were not examined, as the primary aim of the work was to determine whether…

  13. The analysis and detection of hypernasality based on a formant extraction algorithm

    NASA Astrophysics Data System (ADS)

    Qian, Jiahui; Fu, Fanglin; Liu, Xinyi; He, Ling; Yin, Heng; Zhang, Han

    2017-08-01

    In clinical practice, effective assessment of cleft palate speech disorders is important. In hypernasal speech, resonance between the nasal and oral cavities introduces an additional nasal formant, so formant frequency is a crucial cue for judging hypernasality in cleft palate speech. Because of this nasal formant, spectral peak merging occurs more often in nasal speech, and merged peaks cannot be resolved by the classical linear prediction coefficient (LPC) root-extraction method. In this paper, a method is proposed to detect the additional nasal formant in the low-frequency region and to obtain its frequency. Experimental results show that the proposed method locates the nasal formant reliably. The extracted formants are then used as features for the detection of hypernasality. The experiment uses 436 phonemes collected from the Hospital of Stomatology, and the detection accuracy for hypernasality in cleft palate speech is 95.2%.
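
    The classical baseline that the paper improves on can be sketched directly: fit an all-pole (LPC) model, take the angles of complex roots near the unit circle as formant candidates, and screen them by bandwidth. This sketch shows the baseline only, not the paper's merged-peak resolution method.

    ```python
    # Classical LPC root-extraction formant estimation (the paper's baseline).
    import numpy as np
    import librosa

    def lpc_formants(x, fs, order=12):
        a = librosa.lpc(x.astype(float), order=order)  # all-pole coefficients
        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]              # one root per conjugate pair
        freqs = np.angle(roots) * fs / (2 * np.pi)
        bw = -fs / np.pi * np.log(np.abs(roots))       # 3-dB bandwidth estimate
        keep = (freqs > 90) & (bw < 400)               # plausible formants only
        return np.sort(freqs[keep])
    ```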

  14. Perceived gender in clear and conversational speech

    NASA Astrophysics Data System (ADS)

    Booz, Jaime A.

    Although many studies have examined acoustic and sociolinguistic differences between male and female speech, the relationship between talker speaking style and perceived gender has not yet been explored. The present study attempts to determine whether clear speech, a style adopted by talkers who perceive some barrier to effective communication, shifts perceptions of femininity for male and female talkers. Much of our understanding of gender perception in voice and speech is based on sustained vowels or single words, eliminating temporal, prosodic, and articulatory cues available in more naturalistic, connected speech. Thus, clear and conversational sentence stimuli, selected from the 41 talkers of the Ferguson Clear Speech Database (Ferguson, 2004) were presented to 17 normal-hearing listeners, aged 18 to 30. They rated the talkers' gender using a visual analog scale with "masculine" and "feminine" endpoints. This response method was chosen to account for within-category shifts of gender perception by allowing nonbinary responses. Mixed-effects regression analysis of listener responses revealed a small but significant effect of speaking style, and this effect was larger for male talkers than female talkers. Because of the high degree of talker variability observed for talker gender, acoustic analyses of these sentences were undertaken to determine the relationship between acoustic changes in clear and conversational speech and perceived femininity. Results of these analyses showed that mean fundamental frequency (fo) and fo standard deviation were significantly correlated to perceived gender for both male and female talkers, and vowel space was significantly correlated only for male talkers. Speaking rate and breathiness measures (CPPS) were not significantly related for either group. Outcomes of this study indicate that adopting a clear speaking style is correlated with increases in perceived femininity. Although the increase was small, some changes associated with making adjustments to improve speech clarity have a larger impact on perceived femininity than others. Using a clear speech strategy alone may not be sufficient for a male speaker to be perceived as female, but could be used as one of many tools to help speakers achieve more "feminine" speech, in conjunction with more specific strategies targeting the acoustic parameters outlined in this study.

  15. Spectro-temporal characteristics of speech at high frequencies, and the potential for restoration of audibility to people with mild-to-moderate hearing loss.

    PubMed

    Moore, Brian C J; Stone, Michael A; Füllgrabe, Christian; Glasberg, Brian R; Puria, Sunil

    2008-12-01

    It is possible for auditory prostheses to provide amplification for frequencies above 6 kHz. However, most current hearing-aid fitting procedures do not give recommended gains for such high frequencies. This study was intended to provide information that could be useful in quantifying appropriate high-frequency gains, and in establishing the population of hearing-impaired people who might benefit from such amplification. The study had two parts. In the first part, wide-bandwidth recordings of normal conversational speech were obtained from a sample of male and female talkers. The recordings were used to determine the mean spectral shape over a wide frequency range, and to determine the distribution of levels (the speech dynamic range) as a function of center frequency. In the second part, audiometric thresholds were measured for frequencies of 0.125, 0.25, 0.5, 1, 2, 3, 4, 6, 8, 10, and 12.5 kHz for both ears of 31 people selected to have mild or moderate cochlear hearing loss. The hearing loss was never greater than 70 dB for any frequency up to 4 kHz. The mean spectrum level of the speech fell progressively with increasing center frequency above about 0.5 kHz. For speech with an overall level of 65 dB SPL, the mean 1/3-octave level was 49 and 37 dB SPL for center frequencies of 1 and 10 kHz, respectively. The dynamic range of the speech was similar for center frequencies of 1 and 10 kHz. The part of the dynamic range below the root-mean-square level was larger than reported in previous studies. The mean audiometric thresholds at high frequencies (10 and 12.5 kHz) were relatively high (69 and 77 dB HL, respectively), even though the mean thresholds for frequencies below 4 kHz were 41 dB HL or better. To partially restore audibility for a hearing loss of 65 dB at 10 kHz would require an effective insertion gain of about 36 dB at 10 kHz. With this gain, audibility could be (partly) restored for 25 of the 62 ears assessed.
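
    The band-level analysis reported can be sketched as 1/3-octave band-pass filtering with band RMS converted to dB under an assumed overall calibration; the center frequencies and 65-dB calibration below mirror the text, while the filter order is an arbitrary choice.

    ```python
    # 1/3-octave band levels of a signal, calibrated to an assumed overall SPL.
    import numpy as np
    from scipy.signal import butter, sosfilt

    def third_octave_levels(x, fs, centers=(1000.0, 10000.0), overall_spl=65.0):
        cal = overall_spl - 20 * np.log10(np.sqrt(np.mean(x ** 2)))  # dB offset
        levels = {}
        for fc in centers:
            lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)            # band edges
            sos = butter(4, [lo / (fs / 2), hi / (fs / 2)],
                         btype="band", output="sos")
            y = sosfilt(sos, x)
            levels[fc] = 20 * np.log10(np.sqrt(np.mean(y ** 2)) + 1e-12) + cal
        return levels

    fs = 44100
    x = np.random.randn(fs)                    # stand-in for a speech recording
    print(third_octave_levels(x, fs))
    ```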

  16. Systematic studies of modified vocalization: the effect of speech rate on speech production measures during metronome-paced speech in persons who stutter.

    PubMed

    Davidow, Jason H

    2014-01-01

    Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech in order to determine changes that may be important for fluency during this fluency-inducing condition. Thirteen persons who stutter (PWS), aged 18-62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Vowel duration, voice onset time, pressure rise time and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30-100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. © 2013 Royal College of Speech and Language Therapists.

  17. Cortical Tracking of Global and Local Variations of Speech Rhythm during Connected Natural Speech Perception.

    PubMed

    Alexandrou, Anna Maria; Saarinen, Timo; Kujala, Jan; Salmelin, Riitta

    2018-06-19

    During natural speech perception, listeners must track the global speaking rate, that is, the overall rate of incoming linguistic information, as well as transient, local speaking rate variations occurring within the global speaking rate. Here, we address the hypothesis that this tracking mechanism is achieved through coupling of cortical signals to the amplitude envelope of the perceived acoustic speech signals. Cortical signals were recorded with magnetoencephalography (MEG) while participants perceived spontaneously produced speech stimuli at three global speaking rates (slow, normal/habitual, and fast). As is inherent to spontaneously produced speech, these stimuli also featured local variations in speaking rate. The coupling between cortical and acoustic speech signals was evaluated using audio-MEG coherence. Modulations in audio-MEG coherence spatially differentiated between tracking of global speaking rate, highlighting the temporal cortex bilaterally and the right parietal cortex, and sensitivity to local speaking rate variations, emphasizing the left parietal cortex. Cortical tuning to the temporal structure of natural connected speech thus seems to require the joint contribution of both auditory and parietal regions. These findings suggest that cortical tuning to speech rhythm operates on two functionally distinct levels: one encoding the global rhythmic structure of speech and the other associated with online, rapidly evolving temporal predictions. Thus, it may be proposed that speech perception is shaped by evolutionary tuning, a preference for certain speaking rates, and predictive tuning, associated with cortical tracking of the constantly changing rate of linguistic information in a speech stream.
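
    Audio-MEG coherence of the kind described reduces to magnitude-squared coherence between the stimulus amplitude envelope and a neural channel, evaluated at the low frequencies where speech rhythm lives. The signals, sampling rate, and band below are synthetic stand-ins; for real audio the envelope would be np.abs(hilbert(audio)) resampled to the MEG rate.

    ```python
    # Coherence between a speech-envelope stand-in and a noisy "cortical" channel.
    import numpy as np
    from scipy.signal import coherence

    fs = 200.0                                      # assumed MEG analysis rate
    t = np.arange(0, 60, 1 / fs)
    envelope = 1 + np.sin(2 * np.pi * 4 * t)        # stand-in 4-Hz speech envelope
    meg = 0.5 * envelope + np.random.randn(len(t))  # stand-in cortical signal

    f, coh = coherence(envelope, meg, fs=fs, nperseg=int(4 * fs))
    rhythm = (f >= 2) & (f <= 8)                    # syllabic-rate band
    print(f"peak audio-MEG coherence, 2-8 Hz: {coh[rhythm].max():.2f}")
    ```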

  18. Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy

    NASA Astrophysics Data System (ADS)

    Kayasith, Prakasith; Theeramunkong, Thanaruk

    Measuring the severity of dysarthria by manually evaluating a speaker's speech with the available standard assessment methods based on human perception is a tedious and subjective task. This paper presents an automated approach to assess the speech quality of a dysarthric speaker with cerebral palsy. With consideration of two complementary factors, speech consistency and speech distinction, a speech quality indicator called the speech clarity index (Ψ) is proposed as a measure of the speaker's ability to produce consistent speech signals for a given word and distinguishable speech signals for different words. As an application, it can be used to assess speech quality and to forecast the speech recognition rate for an individual dysarthric speaker before the exhaustive implementation of an automatic speech recognition system for that speaker. The effectiveness of Ψ as a speech recognition rate predictor is evaluated by rank-order inconsistency, correlation coefficient, and root-mean-square difference. The evaluations were done by comparing its predicted recognition rates with those predicted by the standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). The results show that Ψ is a promising indicator for predicting the recognition rate of dysarthric speech. All experiments were done on a speech corpus composed of speech data from eight normal speakers and eight dysarthric speakers.
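
    The abstract does not give the formula for Ψ, but a distance-based indicator in its spirit can be sketched: consistency as the spread of repeated productions of a word around their centroid, distinction as the separation between word centroids, combined into a ratio near 1 for clear speakers. Everything below, including the feature representation, is an illustrative assumption.

    ```python
    # Illustrative consistency/distinction ratio on per-token feature vectors.
    import numpy as np

    def clarity_index(tokens):
        """tokens: dict word -> array of shape (n_repetitions, n_features)."""
        within, between = [], []
        words = list(tokens)
        for w in words:
            reps = tokens[w]
            mu = reps.mean(axis=0)
            within.extend(np.linalg.norm(reps - mu, axis=1))   # consistency
        centroids = np.array([tokens[w].mean(axis=0) for w in words])
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                between.append(np.linalg.norm(centroids[i] - centroids[j]))
        return np.mean(between) / (np.mean(between) + np.mean(within))

    rng = np.random.default_rng(0)
    tokens = {w: rng.normal(loc=3 * k, size=(5, 12))           # toy "words"
              for k, w in enumerate(["ba", "da"])}
    print(f"Psi-like index = {clarity_index(tokens):.2f}")     # near 1 = clearer
    ```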

  19. Electromagnetic versus electrical coupling of personal frequency modulation (FM) receivers to cochlear implant sound processors.

    PubMed

    Schafer, Erin C; Romine, Denise; Musgrave, Elizabeth; Momin, Sadaf; Huynh, Christy

    2013-01-01

    Previous research has suggested that electrically coupled frequency modulation (FM) systems substantially improved speech-recognition performance in noise in individuals with cochlear implants (CIs). However, there is limited evidence to support the use of electromagnetically coupled (neck loop) FM receivers with contemporary CI sound processors containing telecoils. The primary goal of this study was to compare speech-recognition performance in noise and subjective ratings of adolescents and adults using one of three contemporary CI sound processors coupled to electromagnetically and electrically coupled FM receivers from Oticon. A repeated-measures design was used to compare speech-recognition performance in noise and subjective ratings without and with the FM systems across three test sessions (Experiment 1) and to compare performance at different FM-gain settings (Experiment 2). Descriptive statistics were used in Experiment 3 to describe output differences measured through a CI sound processor. Experiment 1 included nine adolescents or adults with unilateral or bilateral Advanced Bionics Harmony (n = 3), Cochlear Nucleus 5 (n = 3), and MED-EL OPUS 2 (n = 3) CI sound processors. In Experiment 2, seven of the original nine participants were tested. In Experiment 3, electroacoustic output was measured from a Nucleus 5 sound processor when coupled to the electromagnetically coupled Oticon Arc neck loop and electrically coupled Oticon R2. In Experiment 1, participants completed a field trial with each FM receiver and three test sessions that included speech-recognition performance in noise and a subjective rating scale. In Experiment 2, participants were tested in three receiver-gain conditions. Results in both experiments were analyzed using repeated-measures analysis of variance. Experiment 3 involved electroacoustic-test measures to determine the monitor-earphone output of the CI alone and CI coupled to the two FM receivers. The results in Experiment 1 suggested that both FM receivers provided significantly better speech-recognition performance in noise than the CI alone; however, the electromagnetically coupled receiver provided significantly better speech-recognition performance in noise and better ratings in some situations than the electrically coupled receiver when set to the same gain. In Experiment 2, the primary analysis suggested significantly better speech-recognition performance in noise for the neck-loop versus electrically coupled receiver, but a second analysis, using the best performance across gain settings for each device, revealed no significant differences between the two FM receivers. Experiment 3 revealed monitor-earphone output differences in the Nucleus 5 sound processor for the two FM receivers when set to the +8 setting used in Experiment 1 but equal output when the electrically coupled device was set to a +16 gain setting and the electromagnetically coupled device was set to the +8 gain setting. Individuals with contemporary sound processors may show more favorable speech-recognition performance in noise with electromagnetically coupled FM systems (i.e., Oticon Arc), which is most likely related to the input processing and signal processing pathway within the CI sound processor for direct input versus telecoil input. Further research is warranted to replicate these findings with a larger sample size and to develop and validate a more objective approach to fitting FM systems to CI sound processors. American Academy of Audiology.

  20. Speech Rate Entrainment in Children and Adults With and Without Autism Spectrum Disorder.

    PubMed

    Wynn, Camille J; Borrie, Stephanie A; Sellers, Tyra P

    2018-05-03

    Conversational entrainment, a phenomenon whereby people modify their behaviors to match their communication partner, has been evidenced as critical to successful conversation. It is plausible that deficits in entrainment contribute to the conversational breakdowns and social difficulties exhibited by people with autism spectrum disorder (ASD). This study examined speech rate entrainment in children and adult populations with and without ASD. Sixty participants including typically developing children, children with ASD, typically developed adults, and adults with ASD participated in a quasi-conversational paradigm with a pseudoconfederate. The confederate's speech rate was digitally manipulated to create slow and fast speech rate conditions. Typically developed adults entrained their speech rate in the quasi-conversational paradigm, using a faster rate during the fast speech rate conditions and a slower rate during the slow speech rate conditions. This entrainment pattern was not evident in adults with ASD or in children populations. Findings suggest that speech rate entrainment is a developmentally acquired skill and offers preliminary evidence of speech rate entrainment deficits in adults with ASD. Impairments in this area may contribute to the conversational breakdowns and social difficulties experienced by this population. Future work is needed to advance this area of inquiry.

  1. Investigation of Extended Bandwidth Hearing Aid Amplification on Speech Intelligibility and Sound Quality in Adults with Mild-to-Moderate Hearing Loss.

    PubMed

    Seeto, Angeline; Searchfield, Grant D

    2018-03-01

    Advances in digital signal processing have made it possible to provide a wide-band frequency response with smooth, precise spectral shaping. Several manufacturers have introduced hearing aids that are claimed to provide gain for frequencies up to 10-12 kHz. However, there is currently limited evidence and very few independent studies evaluating the performance of the extended bandwidth hearing aids that have recently become available. This study investigated an extended bandwidth hearing aid using measures of speech intelligibility and sound quality to find out whether there was a significant benefit of extended bandwidth amplification over standard amplification. Repeated measures study designed to examine the efficacy of extended bandwidth amplification compared to standard bandwidth amplification. Sixteen adult participants with mild-to-moderate sensorineural hearing loss. Participants were bilaterally fit with a pair of Widex Mind 440 behind-the-ear hearing aids programmed with a standard bandwidth fitting and an extended bandwidth fitting; the latter provided gain up to 10 kHz. For each fitting, and an unaided condition, participants completed two speech measures of aided benefit, the Quick Speech-in-Noise test (QuickSIN™) and the Phonak Phoneme Perception Test (PPT; high-frequency perception in quiet), and a measure of sound quality rating. There were no significant differences found between unaided and aided conditions for QuickSIN™ scores. For the PPT, there were statistically significantly lower (improved) detection thresholds at high frequencies (6 and 9 kHz) with the extended bandwidth fitting. Although the difference was not statistically significant, participants were able to distinguish between 6- and 9-kHz stimuli 50% better with the extended bandwidth fitting. No significant difference was found in ability to recognize phonemes in quiet between the unaided and aided conditions when phonemes only contained frequency content <6 kHz. However, significant benefit was found with the extended bandwidth fitting for recognition of 9-kHz phonemes. No significant difference in sound quality preference was found between the standard bandwidth and extended bandwidth fittings. This study demonstrated that a pair of currently available extended bandwidth hearing aids was technically capable of delivering high-frequency amplification that was both audible and useable to listeners with mild-to-moderate hearing loss. This amplification was of acceptable sound quality. Further research, particularly field trials, is required to ascertain the real-world benefit of high-frequency amplification. American Academy of Audiology.

  2. Glyph guessing for ‘oo’ and ‘ee’: spatial frequency information in sound symbolic matching for ancient and unfamiliar scripts

    PubMed Central

    2017-01-01

    In three experiments, we asked whether diverse scripts contain interpretable information about the speech sounds they represent. When presented with a pair of unfamiliar letters, adult readers correctly guess which is /i/ (the ‘ee’ sound in ‘feet’), and which is /u/ (the ‘oo’ sound in ‘shoe’) at rates higher than expected by chance, as shown in a large sample of Singaporean university students (Experiment 1) and replicated in a larger sample of international Internet users (Experiment 2). To uncover what properties of the letters contribute to different scripts' ‘guessability,’ we analysed the visual spatial frequencies in each letter (Experiment 3). We predicted that the lower spectral frequencies in the formants of the vowel /u/ would pattern with lower spatial frequencies in the corresponding letters. Instead, we found that across all spatial frequencies, the letter with more black/white cycles (i.e. more ink) was more likely to be guessed as /u/, and the larger the difference between the glyphs in a pair, the higher the script's guessability. We propose that diverse groups of humans across historical time and geographical space tend to employ similar iconic strategies for representing speech in visual form, and provide norms for letter pairs from 56 diverse scripts. PMID:28989784

  3. Stuttering adults' lack of pre-speech auditory modulation normalizes when speaking with delayed auditory feedback.

    PubMed

    Daliri, Ayoub; Max, Ludo

    2018-02-01

    Auditory modulation during speech movement planning is limited in adults who stutter (AWS), but the functional relevance of the phenomenon itself remains unknown. We investigated for AWS and adults who do not stutter (AWNS) (a) a potential relationship between pre-speech auditory modulation and auditory feedback contributions to speech motor learning and (b) the effect on pre-speech auditory modulation of real-time versus delayed auditory feedback. Experiment I used a sensorimotor adaptation paradigm to estimate auditory-motor speech learning. Using acoustic speech recordings, we quantified subjects' formant frequency adjustments across trials when continually exposed to formant-shifted auditory feedback. In Experiment II, we used electroencephalography to determine the same subjects' extent of pre-speech auditory modulation (reductions in auditory evoked potential N1 amplitude) when probe tones were delivered prior to speaking versus not speaking. To manipulate subjects' ability to monitor real-time feedback, we included speaking conditions with non-altered auditory feedback (NAF) and delayed auditory feedback (DAF). Experiment I showed that auditory-motor learning was limited for AWS versus AWNS, and the extent of learning was negatively correlated with stuttering frequency. Experiment II yielded several key findings: (a) our prior finding of limited pre-speech auditory modulation in AWS was replicated; (b) DAF caused a decrease in auditory modulation for most AWNS but an increase for most AWS; and (c) for AWS, the amount of auditory modulation when speaking with DAF was positively correlated with stuttering frequency. Lastly, AWNS showed no correlation between pre-speech auditory modulation (Experiment II) and extent of auditory-motor learning (Experiment I) whereas AWS showed a negative correlation between these measures. Thus, findings suggest that AWS show deficits in both pre-speech auditory modulation and auditory-motor learning; however, limited pre-speech modulation is not directly related to limited auditory-motor adaptation; and in AWS, DAF paradoxically tends to normalize their otherwise limited pre-speech auditory modulation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Speech training alters consonant and vowel responses in multiple auditory cortex fields

    PubMed Central

    Engineer, Crystal T.; Rahebi, Kimiya C.; Buell, Elizabeth P.; Fink, Melyssa K.; Kilgard, Michael P.

    2015-01-01

    Speech sounds evoke unique neural activity patterns in primary auditory cortex (A1). Extensive speech sound discrimination training alters A1 responses. While the neighboring auditory cortical fields each contain information about speech sound identity, each field processes speech sounds differently. We hypothesized that while all fields would exhibit training-induced plasticity following speech training, there would be unique differences in how each field changes. In this study, rats were trained to discriminate speech sounds by consonant or vowel in quiet and in varying levels of background speech-shaped noise. Local field potential and multiunit responses were recorded from four auditory cortex fields in rats that had received 10 weeks of speech discrimination training. Our results reveal that training alters speech evoked responses in each of the auditory fields tested. The neural response to consonants was significantly stronger in anterior auditory field (AAF) and A1 following speech training. The neural response to vowels following speech training was significantly weaker in ventral auditory field (VAF) and posterior auditory field (PAF). This differential plasticity of consonant and vowel sound responses may result from the greater paired pulse depression, expanded low frequency tuning, reduced frequency selectivity, and lower tone thresholds, which occurred across the four auditory fields. These findings suggest that alterations in the distributed processing of behaviorally relevant sounds may contribute to robust speech discrimination. PMID:25827927

  5. Objective Quality and Intelligibility Prediction for Users of Assistive Listening Devices

    PubMed Central

    Falk, Tiago H.; Parsa, Vijay; Santos, João F.; Arehart, Kathryn; Hazrati, Oldooz; Huber, Rainer; Kates, James M.; Scollie, Susan

    2015-01-01

    This article presents an overview of twelve existing objective speech quality and intelligibility prediction tools. Two classes of algorithms are presented, namely intrusive and non-intrusive, with the former requiring the use of a reference signal, while the latter does not. Investigated metrics include both those developed for normal hearing listeners, as well as those tailored particularly for hearing impaired (HI) listeners who are users of assistive listening devices (i.e., hearing aids, HAs, and cochlear implants, CIs). Representative examples of those optimized for HI listeners include the speech-to-reverberation modulation energy ratio, tailored to hearing aids (SRMR-HA) and to cochlear implants (SRMR-CI); the modulation spectrum area (ModA); the hearing aid speech quality (HASQI) and perception indices (HASPI); and the PErception MOdel - hearing impairment quality (PEMO-Q-HI). The objective metrics are tested on three subjectively-rated speech datasets covering reverberation-alone, noise-alone, and reverberation-plus-noise degradation conditions, as well as degradations resultant from nonlinear frequency compression and different speech enhancement strategies. The advantages and limitations of each measure are highlighted and recommendations are given for suggested uses of the different tools under specific environmental and processing conditions. PMID:26052190

  6. Isolating the Energetic Component of Speech-on-Speech Masking With Ideal Time-Frequency Segregation

    DTIC Science & Technology

    2006-12-01

  7. [Effects of acoustic adaptation of classrooms on the quality of verbal communication].

    PubMed

    Mikulski, Witold

    2013-01-01

    Voice organ disorders among teachers are caused by excessive voice strain. One measure to reduce this strain is to decrease background noise during teaching, and increasing the acoustic absorption of the room is a technical means of achieving this aim. Greater absorption also improves speech intelligibility, as rated by the following parameters: room reverberation time and the speech transmission index (STI). This article presents the effects of acoustic adaptation of classrooms on the quality of verbal communication, aimed at achieving good or excellent speech intelligibility. The article lists the criteria for evaluating classrooms in terms of the quality of verbal communication. The parameters were measured using the methods of PN-EN ISO 3382-2:2010 and PN-EN 60268-16:2011. Acoustic adaptations were completed in two classrooms. After the adaptations, the reverberation time at 1 kHz was reduced from 1.45 s to 0.44 s in room no. 1 and from 1.03 s to 0.37 s in room no. 2 (permissible maximum: 0.65 s). At the same time, the speech transmission index increased in room no. 1 from 0.55 (satisfactory speech intelligibility) to 0.75 (speech intelligibility close to excellent), and in room no. 2 from 0.63 (good speech intelligibility) to 0.80 (excellent speech intelligibility). It can therefore be stated that, prior to the acoustic adaptations, room no. 1 did not comply and room no. 2 barely complied with the criterion (speech transmission index of 0.62); after the adaptations, both rooms met the requirements.
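
    Reverberation times of the kind reported are conventionally estimated per PN-EN ISO 3382-2 by Schroeder backward integration of a measured room impulse response, with T60 extrapolated from a portion of the decay. The sketch below uses the -5 to -25 dB range (T20) and a synthetic decay in place of a measurement.

    ```python
    # T60 via Schroeder backward integration of an impulse response.
    import numpy as np

    def rt60_schroeder(h, fs):
        edc = np.cumsum(h[::-1] ** 2)[::-1]                  # energy decay curve
        edc_db = 10 * np.log10(edc / edc[0])
        t = np.arange(len(h)) / fs
        i5, i25 = np.argmax(edc_db <= -5), np.argmax(edc_db <= -25)
        slope = (edc_db[i25] - edc_db[i5]) / (t[i25] - t[i5])  # dB per second
        return -60.0 / slope

    fs = 8000
    t = np.arange(0, 2, 1 / fs)
    h = np.exp(-t / 0.15) * np.random.randn(len(t))          # ~1.0-s T60 decay
    print(f"T60 ~ {rt60_schroeder(h, fs):.2f} s")
    ```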

  8. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing.

    PubMed

    Jørgensen, Søren; Dau, Torsten

    2011-09-01

    A model for predicting the intelligibility of processed noisy speech is proposed. The speech-based envelope power spectrum model has a similar structure as the model of Ewert and Dau [(2000). J. Acoust. Soc. Am. 108, 1181-1196], developed to account for modulation detection and masking data. The model estimates the speech-to-noise envelope power ratio, SNRenv, at the output of a modulation filterbank and relates this metric to speech intelligibility using the concept of an ideal observer. Predictions were compared to data on the intelligibility of speech presented in stationary speech-shaped noise. The model was further tested in conditions with noisy speech subjected to reverberation and spectral subtraction. Good agreement between predictions and data was found in all cases. For spectral subtraction, an analysis of the model's internal representation of the stimuli revealed that the predicted decrease of intelligibility was caused by the estimated noise envelope power exceeding that of the speech. The classical concept of the speech transmission index fails in this condition. The results strongly suggest that the signal-to-noise ratio at the output of a modulation frequency selective process provides a key measure of speech intelligibility. © 2011 Acoustical Society of America
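
    Stripped of the modulation filterbank and the ideal-observer back end, the SNRenv idea can be sketched as the excess envelope power of noisy speech over that of the noise alone, within one modulation band. The envelope rate, band, and filter settings below are assumptions; fs is assumed to be a multiple of 100 Hz.

    ```python
    # Single-band SNRenv sketch: envelope power of noisy speech vs. noise alone.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert, resample_poly

    def envelope_power(x, fs, mod_band=(1.0, 8.0), env_fs=100):
        env = np.abs(hilbert(x))
        env = resample_poly(env, 1, fs // env_fs)     # envelope at 100 Hz
        env = env - env.mean()                        # AC-coupled envelope
        b, a = butter(2, [mod_band[0] / (env_fs / 2),
                          mod_band[1] / (env_fs / 2)], btype="band")
        return np.mean(filtfilt(b, a, env) ** 2)

    def snr_env_db(noisy_speech, noise, fs):
        p_sn = envelope_power(noisy_speech, fs)
        p_n = envelope_power(noise, fs)
        return 10 * np.log10(max(p_sn - p_n, 1e-12) / p_n)

    fs = 16000
    t = np.arange(0, 4, 1 / fs)
    noise = np.random.randn(len(t))
    speech = (1 + 0.8 * np.sin(2 * np.pi * 4 * t)) * np.random.randn(len(t))
    print(f"SNRenv = {snr_env_db(speech + noise, noise, fs):.1f} dB")
    ```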

  9. Systematic Studies of Modified Vocalization: The Effect of Speech Rate on Speech Production Measures During Metronome-Paced Speech in Persons who Stutter

    PubMed Central

    Davidow, Jason H.

    2013-01-01

    Background Metronome-paced speech results in the elimination, or substantial reduction, of stuttering moments. The cause of fluency during this fluency-inducing condition is unknown. Several investigations have reported changes in speech pattern characteristics from a control condition to a metronome-paced speech condition, but failure to control speech rate between conditions limits our ability to determine if the changes were necessary for fluency. Aims This study examined the effect of speech rate on several speech production variables during one-syllable-per-beat metronomic speech, in order to determine changes that may be important for fluency during this fluency-inducing condition. Methods and Procedures Thirteen persons who stutter (PWS), aged 18–62 years, completed a series of speaking tasks. Several speech production variables were compared between conditions produced at different metronome beat rates, and between a control condition and a metronome-paced speech condition produced at a rate equal to the control condition. Outcomes & Results Vowel duration, voice onset time, pressure rise time, and phonated intervals were significantly impacted by metronome beat rate. Voice onset time and the percentage of short (30–100 ms) phonated intervals significantly decreased from the control condition to the equivalent rate metronome-paced speech condition. Conclusions & Implications A reduction in the percentage of short phonated intervals may be important for fluency during syllable-based metronome-paced speech for PWS. Future studies should continue examining the necessity of this reduction. In addition, speech rate must be controlled in future fluency-inducing condition studies, including neuroimaging investigations, in order for this research to make a substantial contribution to finding the fluency-inducing mechanism of fluency-inducing conditions. PMID:24372888

  10. Mechanisms Underlying Selective Neuronal Tracking of Attended Speech at a ‘Cocktail Party’

    PubMed Central

    Zion Golumbic, Elana M.; Ding, Nai; Bickel, Stephan; Lakatos, Peter; Schevon, Catherine A.; McKhann, Guy M.; Goodman, Robert R.; Emerson, Ronald; Mehta, Ashesh D.; Simon, Jonathan Z.; Poeppel, David; Schroeder, Charles E.

    2013-01-01

    Summary The ability to focus on and understand one talker in a noisy social environment is a critical social-cognitive capacity, whose underlying neuronal mechanisms are unclear. We investigated the manner in which speech streams are represented in brain activity and the way that selective attention governs the brain’s representation of speech using a ‘Cocktail Party’ Paradigm, coupled with direct recordings from the cortical surface in surgical epilepsy patients. We find that brain activity dynamically tracks speech streams using both low frequency phase and high frequency amplitude fluctuations, and that optimal encoding likely combines the two. In and near low level auditory cortices, attention ‘modulates’ the representation by enhancing cortical tracking of attended speech streams, but ignored speech remains represented. In higher order regions, the representation appears to become more ‘selective,’ in that there is no detectable tracking of ignored speech. This selectivity itself seems to sharpen as a sentence unfolds. PMID:23473326

  11. Joint Spatial-Spectral Feature Space Clustering for Speech Activity Detection from ECoG Signals

    PubMed Central

    Kanas, Vasileios G.; Mporas, Iosif; Benz, Heather L.; Sgarbas, Kyriakos N.; Bezerianos, Anastasios; Crone, Nathan E.

    2014-01-01

    Brain machine interfaces for speech restoration have been extensively studied for more than two decades. The success of such a system will depend in part on selecting the best brain recording sites and signal features corresponding to speech production. The purpose of this study was to detect speech activity automatically from electrocorticographic signals based on joint spatial-frequency clustering of the ECoG feature space. For this study, the ECoG signals were recorded while a subject performed two different syllable repetition tasks. We found that the optimal frequency resolution to detect speech activity from ECoG signals was 8 Hz, achieving 98.8% accuracy by employing support vector machines (SVM) as a classifier. We also defined the cortical areas that held the most information about the discrimination of speech and non-speech time intervals. Additionally, the results shed light on the distinct cortical areas associated with the two syllable repetition tasks and may contribute to the development of portable ECoG-based communication. PMID:24658248
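
    The detection stage described amounts to a classifier over per-window spectral features; a minimal sklearn sketch is below. The feature matrix is a random stand-in for band-power features at the 8-Hz resolution the study found optimal, with an injected class effect so the example is non-trivial.

    ```python
    # Speech vs. non-speech window classification with an SVM (synthetic data).
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(1)
    n_windows, n_features = 200, 32        # windows x (channels * 8-Hz bands)
    X = rng.normal(size=(n_windows, n_features))
    y = rng.integers(0, 2, n_windows)      # 1 = speech interval
    X[y == 1, :8] += 1.0                   # inject a detectable band-power effect

    scores = cross_val_score(SVC(kernel="rbf", C=1.0), X, y, cv=5)
    print(f"speech-detection accuracy: {scores.mean():.2f}")
    ```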

  12. When Infants Talk, Infants Listen: Pre-Babbling Infants Prefer Listening to Speech with Infant Vocal Properties

    ERIC Educational Resources Information Center

    Masapollo, Matthew; Polka, Linda; Ménard, Lucie

    2016-01-01

    To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to…

  13. The design of a device for hearer and feeler differentiation, part A. [speech modulated hearing device

    NASA Technical Reports Server (NTRS)

    Creecy, R.

    1974-01-01

    A speech-modulated white-noise device is reported that conveys the rhythmic characteristics of a speech signal for intelligible reception by deaf persons. The signal is composed of random amplitudes and frequencies modulated by the rhythm and stress characteristics of the speech envelope. The time-intensity parameters of speech are conveyed through vibro-tactile stimulation.

  14. HUMAN SPEECH: A RESTRICTED USE OF THE MAMMALIAN LARYNX

    PubMed Central

    Titze, Ingo R.

    2016-01-01

    Purpose Speech has been hailed as unique to human evolution. While the inventory of distinct sounds producible with vocal tract articulators is a great advantage in human oral communication, it is argued here that the larynx as a sound source in speech is limited in its range and capability because a low fundamental frequency is ideal for phonemic intelligibility and source-filter independence. Method Four existing data sets were combined to make an argument regarding exclusive use of the larynx for speech: (1) range of fundamental frequency, (2) laryngeal muscle activation, (3) vocal fold length in relation to sarcomere length of the major laryngeal muscles, and (4) vocal fold morphological development. Results Limited data support the notion that speech tends to produce a contracture of the larynx. The morphological design of the human vocal folds, like that of primates and other mammals, is optimized for vocal communication over distances for which higher fundamental frequency, higher intensity, and fewer unvoiced segments are utilized than in conversational speech. Conclusion The positive message is that raising one’s voice to call, shout, or sing, or executing pitch glides to stretch the vocal folds, can counteract this trend toward a contracted state. PMID:27397113

  15. Voice recognition through phonetic features with Punjabi utterances

    NASA Astrophysics Data System (ADS)

    Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.

    2017-07-01

    This paper deals with perception and disorders of speech in view of the Punjabi language. Given the importance of voice identification, various parameters of speaker identification have been studied. Speech material was recorded with a tape recorder in both normal and disguised modes of utterance. From the recorded material, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, and amplitude ratio (A1:A2) were compared in normal and disguised speech. The formant frequencies of normal and disguised speech were found to remain almost the same only when compared at positions of the same vowel quality and quantity. If the vowel is more closed or more open in the disguised utterance, the formant frequency changes relative to the normal utterance. The amplitude ratio (A1:A2) was found to be speaker dependent and remained unchanged in the disguised utterance; however, this value may shift in a disguised utterance if cross-sectioning is not done at the same location.

  16. Effects of Long-Term Speech-in-Noise Training in Air Traffic Controllers and High Frequency Suppression. A Control Group Study.

    PubMed

    Pérez Zaballos, María Teresa; Ramos de Miguel, Ángel; Pérez Plasencia, Daniel; Zaballos González, María Luisa; Ramos Macías, Ángel

    2015-12-01

    To evaluate 1) if air traffic controllers (ATCs) perform better than non-air traffic controllers in an open-set speech-in-noise test because of their experience with radio communications, and 2) if high-frequency information (>8000 Hz) substantially improves speech-in-noise perception across populations. The control group comprised 28 normal-hearing subjects, and the target group comprised 48 ATCs aged between 19 and 55 years who were native Spanish speakers. The hearing-in-noise abilities of the two groups were characterized under two signal conditions: 1) speech tokens and white noise sampled at 44.1 kHz (unfiltered condition) and 2) speech tokens plus white noise, each passed through a 4th-order Butterworth filter with 70 and 8000 Hz low and high cutoffs (filtered condition). These tests were performed at signal-to-noise ratios (SNRs) of +5, 0, and -5 dB. The ATCs outperformed the control group in all conditions. The differences were statistically significant in all cases, and the largest difference was observed under the most difficult condition (-5 dB SNR). Overall, scores were higher for both groups when high-frequency components were not suppressed, although the difference was not statistically significant for the control group at 0 dB SNR. The results indicate that ATCs are more capable of identifying speech in noise, which may be an effect of their training. Performance seems to decrease when the high-frequency components of speech are removed, regardless of training.
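
    The filtered condition is simple to reproduce in outline. The sketch below mixes speech and white noise at a target SNR and band-limits the result with a 4th-order Butterworth bandpass (70–8000 Hz), per the abstract; the sampling rate matches the 44.1 kHz stated there, but the mixing details are assumptions.

        import numpy as np
        from scipy.signal import butter, sosfilt

        FS = 44100  # sampling rate stated in the abstract

        def make_filtered_condition(speech, noise, snr_db, lo=70.0, hi=8000.0):
            """Mix speech with white noise at a target SNR, then band-limit
            with a 4th-order Butterworth bandpass, as in the filtered condition."""
            # Scale the noise so the speech-to-noise power ratio equals snr_db.
            gain = np.sqrt(np.mean(speech ** 2) /
                           (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
            mix = speech + gain * noise
            sos = butter(4, [lo, hi], btype="bandpass", fs=FS, output="sos")
            return sosfilt(sos, mix)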

  17. The Effects of Self-Generated Synchronous and Asynchronous Visual Speech Feedback on Overt Stuttering Frequency

    ERIC Educational Resources Information Center

    Snyder, Gregory J.; Hough, Monica Strauss; Blanchet, Paul; Ivy, Lennette J.; Waddell, Dwight

    2009-01-01

    Purpose: Relatively recent research documents that visual choral speech, which represents an externally generated form of synchronous visual speech feedback, significantly enhanced fluency in those who stutter. As a consequence, it was hypothesized that self-generated synchronous and asynchronous visual speech feedback would likewise enhance…

  18. The Use of Interpreters by Speech-Language Pathologists Conducting Bilingual Speech-Language Assessments

    ERIC Educational Resources Information Center

    Palfrey, Carol Lynn

    2013-01-01

    The purpose of this non-experimental quantitative study was to explore the practices of speech-language pathologists in conducting bilingual assessments with interpreters. Data were obtained regarding the assessment tools and practices used by speech-language pathologists, the frequency with which they work with interpreters, and the procedures…

  19. Dissecting Choral Speech: Properties of the Accompanist Critical to Stuttering Reduction

    ERIC Educational Resources Information Center

    Kiefte, Michael; Armson, Joy

    2008-01-01

    The effects of choral speech and altered auditory feedback (AAF) on stuttering frequency were compared to identify those properties of choral speech that make it a more effective condition for stuttering reduction. Seventeen adults who stutter (AWS) participated in an experiment consisting of special choral speech conditions that were manipulated…

  20. Robust Frequency Invariant Beamforming with Low Sidelobe for Speech Enhancement

    NASA Astrophysics Data System (ADS)

    Zhu, Yiting; Pan, Xiang

    2018-01-01

    Frequency invariant beamformers (FIBs) are widely used in speech enhancement and source localization. There are two traditional optimization methods for FIB design. The first is convex optimization, which is simple, but the frequency invariant characteristic of the resulting beam pattern is poor over a frequency band of five octaves. The second is the least squares (LS) approach with a spatial response variation (SRV) constraint. Although it provides a good frequency invariant property, it typically cannot be used in speech enhancement because it lacks a weight-norm constraint, which governs the robustness of a beamformer. In this paper, a robust wideband beamforming method with a constant beamwidth is proposed. The frequency invariant beam pattern is achieved by solving an optimization problem with an SRV constraint covering the speech frequency band. By controlling the sidelobe level, the frequency invariant beamformer (FIB) can prevent distortion from interference arriving from undesired directions. The approach is implemented in the time domain by placing a tapped delay line (TDL) and finite impulse response (FIR) filter at the output of each sensor, which is more convenient than the Frost processor. By invoking a weight-norm constraint, the robustness of the beamformer against random errors is further improved. Experimental results show that the proposed method has a constant beamwidth and almost the same white noise gain as a traditional delay-and-sum (DAS) beamformer.
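
    A full TDL/FIR design with explicit SRV and sidelobe constraints is involved, but the core idea can be sketched per frequency: least-squares match a single desired beampattern at every frequency (which enforces approximate frequency invariance) and add diagonal loading as a stand-in for the weight-norm constraint. The array geometry and desired pattern below are illustrative assumptions, not the paper's design.

        import numpy as np

        C = 343.0  # speed of sound (m/s)

        def steering_matrix(freq, mic_pos, angles):
            """Far-field steering vectors for a linear array along x
            (n_mics x n_angles)."""
            tau = np.outer(mic_pos, np.sin(angles)) / C   # relative delays
            return np.exp(-2j * np.pi * freq * tau)

        def ls_fib_weights(freqs, mic_pos, angles, desired, loading=1e-2):
            """Per-frequency LS weights matching one common desired pattern.

            Matching the same `desired` response at every frequency yields an
            approximately frequency-invariant beam; the diagonal loading term
            bounds the weight norm, improving robustness to mismatch.
            """
            n = len(mic_pos)
            weights = []
            for f in freqs:
                A = steering_matrix(f, mic_pos, angles)   # n_mics x n_angles
                R = A @ A.conj().T + loading * np.eye(n)
                weights.append(np.linalg.solve(R, A @ desired))
            return np.array(weights)                      # n_freqs x n_mics

        # Example: 8-mic array, 4 cm spacing, narrow mainlobe at broadside.
        mics = np.arange(8) * 0.04
        angles = np.linspace(-np.pi / 2, np.pi / 2, 181)
        desired = np.exp(-(np.degrees(angles) / 15.0) ** 2)
        W = ls_fib_weights(np.linspace(300, 4000, 30), mics, angles, desired)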

  1. Lexical effects on speech production and intelligibility in Parkinson's disease

    NASA Astrophysics Data System (ADS)

    Chiu, Yi-Fang

    Individuals with Parkinson's disease (PD) often have speech deficits that lead to reduced speech intelligibility. Previous research provides a rich database regarding the articulatory deficits associated with PD including restricted vowel space (Skodda, Visser, & Schlegel, 2011) and flatter formant transitions (Tjaden & Wilding, 2004; Walsh & Smith, 2012). However, few studies consider the effect of higher level structural variables of word usage frequency and the number of similar sounding words (i.e. neighborhood density) on lower level articulation or on listeners' perception of dysarthric speech. The purpose of the study is to examine the interaction of lexical properties and speech articulation as measured acoustically in speakers with PD and healthy controls (HC) and the effect of lexical properties on the perception of their speech. Individuals diagnosed with PD and age-matched healthy controls read sentences with words that varied in word frequency and neighborhood density. Acoustic analysis was performed to compare second formant transitions in diphthongs, an indicator of the dynamics of tongue movement during speech production, across different lexical characteristics. Young listeners transcribed the spoken sentences and the transcription accuracy was compared across lexical conditions. The acoustic results indicate that both PD and HC speakers adjusted their articulation based on lexical properties but the PD group had significant reductions in second formant transitions compared to HC. Both groups of speakers increased second formant transitions for words with low frequency and low density, but the lexical effect is diphthong dependent. The change in second formant slope was limited in the PD group when the required formant movement for the diphthong is small. The data from listeners' perception of the speech by PD and HC show that listeners identified high frequency words with greater accuracy suggesting the use of lexical knowledge during the recognition process. The relationship between acoustic results and perceptual accuracy is limited in this study suggesting that listeners incorporate acoustic and non-acoustic information to maximize speech intelligibility.

  2. Combined electric and acoustic hearing performance with Zebra® speech processor: speech reception, place, and temporal coding evaluation.

    PubMed

    Vaerenberg, Bart; Péan, Vincent; Lesbros, Guillaume; De Ceulaer, Geert; Schauwers, Karen; Daemers, Kristin; Gnansia, Dan; Govaerts, Paul J

    2013-06-01

    To assess the auditory performance of Digisonic® cochlear implant users with electric stimulation (ES) and electro-acoustic stimulation (EAS), with special attention to the processing of low-frequency temporal fine structure. Six patients implanted with a Digisonic® SP implant and showing low-frequency residual hearing were fitted with the Zebra® speech processor providing both electric and acoustic stimulation. Assessment consisted of monosyllabic speech identification tests in quiet and in noise at different presentation levels, and a pitch discrimination task using harmonic and disharmonic intonating complex sounds (Vaerenberg et al., 2011). These tests investigate place and time coding through pitch discrimination. All tasks were performed with ES only and with EAS. Speech results in noise showed significant improvement with EAS when compared to ES. Whereas EAS did not yield better results in the harmonic intonation test, the improvements in the disharmonic intonation test were remarkable, suggesting better coding of pitch cues requiring phase locking. These results suggest that patients with residual hearing in the low-frequency range still have good phase-locking capacities, allowing them to process fine temporal information. ES relies mainly on place coding but provides poor low-frequency temporal coding, whereas EAS also provides temporal coding in the low-frequency range. Patients with residual phase-locking capacities can make use of these cues.

  3. Age-Related Differences in Speech Rate Perception Do Not Necessarily Entail Age-Related Differences in Speech Rate Use

    ERIC Educational Resources Information Center

    Heffner, Christopher C.; Newman, Rochelle S.; Dilley, Laura C.; Idsardi, William J.

    2015-01-01

    Purpose: A new literature has suggested that speech rate can influence the parsing of words quite strongly in speech. The purpose of this study was to investigate differences between younger adults and older adults in the use of context speech rate in word segmentation, given that older adults perceive timing information differently from younger…

  4. Production frequency effects in perception of phonological variation

    NASA Astrophysics Data System (ADS)

    Connine, Cynthia M.; Ranbom, Larissa J.

    2004-05-01

    Two experiments were conducted that investigated the relationship between phonological variant occurrence frequency (based on a corpus analysis of conversational speech) and auditory word recognition. The variant investigated was an alternation between the presence of [nt] and a nasal flap (e.g., center, cen'er). The corpus analysis showed that 80% of productions are nasal flaps, with wide variability across words (from 0% for "enter" to 100% for "twenty"). In a production goodness rating experiment, listeners rated [nt] productions as better than their nasal flap counterparts. For individual items, a strong positive correlation was found between nasal flap frequency and goodness ratings: words typically produced with nasal flaps were rated as better productions. A lexical decision experiment showed that nasal flap variants were recognized more slowly and less accurately than [nt] versions. The rated quality of the nasal-flapped production was strongly correlated with the results of the lexical decision task: nasal-flapped words considered highly acceptable were recognized more quickly and accurately than words rated as poor nasal flap productions. The results demonstrate a strong relationship between experienced variant frequency and auditory word recognition and suggest that phonological variation is explicitly represented in the mental lexicon.

  5. Evaluation of the importance of time-frequency contributions to speech intelligibility in noise

    PubMed Central

    Yu, Chengzhu; Wójcicki, Kamil K.; Loizou, Philipos C.; Hansen, John H. L.; Johnson, Michael T.

    2014-01-01

    Recent studies on binary masking techniques make the assumption that each time-frequency (T-F) unit contributes an equal amount to the overall intelligibility of speech. The present study demonstrated that the importance of each T-F unit to speech intelligibility varies in accordance with speech content. Specifically, T-F units are categorized into two classes, speech-present T-F units and speech-absent T-F units. Results indicate that the importance of each speech-present T-F unit to speech intelligibility is highly related to the loudness of its target component, while the importance of each speech-absent T-F unit varies according to the loudness of its masker component. Two types of mask errors are also considered, which include miss and false alarm errors. Consistent with previous work, false alarm errors are shown to be more harmful to speech intelligibility than miss errors when the mixture signal-to-noise ratio (SNR) is below 0 dB. However, the relative importance between the two types of error is conditioned on the SNR level of the input speech signal. Based on these observations, a mask-based objective measure, the loudness weighted hit-false, is proposed for predicting speech intelligibility. The proposed objective measure shows significantly higher correlation with intelligibility compared to two existing mask-based objective measures. PMID:24815280
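
    The two mask-error types, and a weighted hit-minus-false score in the spirit of the proposed measure, can be written down compactly. The sketch below assumes boolean masks over a time-frequency grid and a caller-supplied per-unit weight array standing in for the paper's loudness weighting; it is an illustration, not the paper's exact metric.

        import numpy as np

        def mask_error_rates(ideal_mask, est_mask):
            """Miss and false-alarm rates between ideal and estimated masks.

            ideal_mask, est_mask : boolean arrays (n_freq, n_frames);
            True marks a speech-present (target-dominated) T-F unit.
            """
            speech = ideal_mask
            miss = np.mean(~est_mask[speech])   # speech units wrongly removed
            fa = np.mean(est_mask[~speech])     # noise units wrongly retained
            return miss, fa

        def weighted_hit_minus_false(ideal_mask, est_mask, weights):
            """HIT - FA with per-unit weights (e.g., loudness of the target
            or masker component in each T-F unit, per the abstract)."""
            hit_w = (np.sum(weights[ideal_mask & est_mask]) /
                     np.sum(weights[ideal_mask]))
            fa_w = (np.sum(weights[~ideal_mask & est_mask]) /
                    np.sum(weights[~ideal_mask]))
            return hit_w - fa_w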

  6. A Deep Ensemble Learning Method for Monaural Speech Separation.

    PubMed

    Zhang, Xiao-Lei; Wang, DeLiang

    2016-03-01

    Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN)-based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences between the two optimization objectives are not well understood. In this paper, we propose a deep ensemble method, named multicontext networks, to address monaural speech separation. The first multicontext network averages the outputs of multiple DNNs whose inputs employ different window lengths. The second multicontext network is a stack of multiple DNNs. Each DNN in a module of the stack takes the concatenation of original acoustic features and expansion of the soft output of the lower module as its input, and predicts the ratio mask of the target speaker; the DNNs in the same module employ different contexts. We have conducted extensive experiments with three speech corpora. The results demonstrate the effectiveness of the proposed method. We have also compared the two optimization objectives systematically and found that predicting the ideal time-frequency mask is more efficient in utilizing clean training speech, while predicting clean speech is less sensitive to SNR variations.
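
    A toy version of the first (averaging) multicontext network is sketched below, with small sklearn MLPs standing in for the DNNs and symmetric frame-context stacking standing in for the different window lengths; all sizes and settings are illustrative assumptions, not the paper's architecture.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def add_context(frames, width):
            """Stack +/- width neighboring frames onto each frame (edge-padded)."""
            padded = np.pad(frames, ((width, width), (0, 0)), mode="edge")
            return np.hstack([padded[i:i + len(frames)]
                              for i in range(2 * width + 1)])

        def train_multicontext(noisy, ideal_mask, widths=(1, 2, 4)):
            """One regressor per context width; outputs are later averaged.
            noisy: (n_frames, n_feats); ideal_mask: (n_frames, n_freq) in [0, 1]."""
            return [MLPRegressor(hidden_layer_sizes=(128,), max_iter=200)
                    .fit(add_context(noisy, w), ideal_mask) for w in widths]

        def predict_multicontext(models, noisy, widths=(1, 2, 4)):
            preds = [m.predict(add_context(noisy, w))
                     for m, w in zip(models, widths)]
            return np.clip(np.mean(preds, axis=0), 0.0, 1.0)  # averaged mask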

  7. Effects of linear and nonlinear speech rate changes on speech intelligibility in stationary and fluctuating maskers

    PubMed Central

    Cooke, Martin; Aubanel, Vincent

    2017-01-01

    Algorithmic modifications to the durational structure of speech designed to avoid intervals of intense masking lead to increases in intelligibility, but the basis for such gains is not clear. The current study addressed the possibility that the reduced information load produced by speech rate slowing might explain some or all of the benefits of durational modifications. The study also investigated the influence of masker stationarity on the effectiveness of durational changes. Listeners identified keywords in sentences that had undergone linear and nonlinear speech rate changes resulting in overall temporal lengthening in the presence of stationary and fluctuating maskers. Relative to unmodified speech, a slower speech rate produced no intelligibility gains for the stationary masker, suggesting that a reduction in information rate does not underlie intelligibility benefits of durationally modified speech. However, both linear and nonlinear modifications led to substantial intelligibility increases in fluctuating noise. One possibility is that overall increases in speech duration provide no new phonetic information in stationary masking conditions, but that temporal fluctuations in the background increase the likelihood of glimpsing additional salient speech cues. Alternatively, listeners may have benefitted from an increase in the difference in speech rates between the target and background. PMID:28618803

  8. Methods of analysis of speech rate: a pilot study.

    PubMed

    Costa, Luanna Maria Oliveira; Martins-Reis, Vanessa de Oliveira; Celeste, Letícia Côrrea

    2016-01-01

    To describe the performance of fluent adults in different measures of speech rate. The study included 24 fluent adults of both genders, speakers of Brazilian Portuguese, who were born and still live in the metropolitan region of Belo Horizonte, state of Minas Gerais, aged between 18 and 59 years. Participants were grouped by age: G1 (18-29 years), G2 (30-39 years), G3 (40-49 years), and G4 (50-59 years). The speech samples were obtained following the methodology of the Speech Fluency Assessment Protocol. In addition to the measures of speech rate proposed by the protocol (speech rate in words and syllables per minute), the speech rate in phonemes per second and the articulation rate with and without disfluencies were calculated. We used the nonparametric Friedman test and the Wilcoxon test for multiple comparisons. Groups were compared using the nonparametric Kruskal-Wallis test. The significance level was 5%. There were significant differences between measures of speech rate involving syllables; multiple comparisons showed that all three measures differed. There was no effect of age on the studied measures. These findings corroborate previous studies. The inclusion of temporal acoustic measures such as speech rate in phonemes per second and articulation rate with and without disfluencies can be a complementary approach in the evaluation of speech rate.
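
    The rate measures reduce to simple arithmetic once the counts and timings are in hand. The sketch below is one illustrative reading of those definitions, with the articulation rates excluding disfluent time from the denominator; the protocol's exact operationalization may differ.

        def speech_rate_measures(n_words, n_syllables, n_phonemes,
                                 total_s, disfluency_s, n_disfluent_syll=0):
            """Illustrative speech-rate measures from raw counts and timings.

            total_s      : total sample duration in seconds
            disfluency_s : time spent in disfluencies, in seconds
            """
            fluent_s = total_s - disfluency_s
            return {
                "speech_rate_words_per_min": 60.0 * n_words / total_s,
                "speech_rate_syll_per_min": 60.0 * n_syllables / total_s,
                "speech_rate_phon_per_sec": n_phonemes / total_s,
                # articulation rate "with" disfluencies: all syllables counted
                "artic_rate_with_disfl": n_syllables / fluent_s,
                # articulation rate "without": disfluent syllables excluded
                "artic_rate_without_disfl":
                    (n_syllables - n_disfluent_syll) / fluent_s,
            }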

  9. Intentional changes in sound pressure level and rate: their impact on measures of respiration, phonation, and articulation.

    PubMed

    Dromey, C; Ramig, L O

    1998-10-01

    The purpose of the study was to compare the effects of changing sound pressure level (SPL) and rate on respiratory, phonatory, and articulatory behavior during sentence production. Ten subjects, 5 men and 5 women, repeated the sentence, "I sell a sapapple again," under 5 SPL and 5 rate conditions. From a multi-channel recording, measures were made of lung volume (LV), SPL, fundamental frequency (F0), semitone standard deviation (STSD), and upper and lower lip displacements and peak velocities. Loud speech led to increases in LV initiation, LV termination, F0, STSD, and articulatory displacements and peak velocities for both lips. Token-to-token variability in these articulatory measures generally decreased as SPL increased, whereas rate increases were associated with increased lip movement variability. LV excursion decreased as rate increased. F0 for the men and STSD for both genders increased with rate. Lower lip displacements became smaller for faster speech. The interspeaker differences in velocity change as a function of rate contrasted with the more consistent velocity performance across speakers for changes in SPL. Because SPL and rate change are targeted in therapy for dysarthria, the present data suggest directions for future research with disordered speakers.

  10. Neural entrainment to rhythmic speech in children with developmental dyslexia

    PubMed Central

    Power, Alan J.; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2013-01-01

    A rhythmic paradigm based on repetition of the syllable “ba” was used to study auditory, visual, and audio-visual oscillatory entrainment to speech in children with and without dyslexia using EEG. Children pressed a button whenever they identified a delay in the isochronous stimulus delivery (500 ms; 2 Hz delta band rate). Response power, strength of entrainment and preferred phase of entrainment in the delta and theta frequency bands were compared between groups. The quality of stimulus representation was also measured using cross-correlation of the stimulus envelope with the neural response. The data showed a significant group difference in the preferred phase of entrainment in the delta band in response to the auditory and audio-visual stimulus streams. A different preferred phase has significant implications for the quality of speech information that is encoded neurally, as it implies enhanced neuronal processing (phase alignment) at less informative temporal points in the incoming signal. Consistent with this possibility, the cross-correlogram analysis revealed superior stimulus representation by the control children, who showed a trend for larger peak r-values and significantly later lags in peak r-values compared to participants with dyslexia. Significant relationships between both peak r-values and peak lags were found with behavioral measures of reading. The data indicate that the auditory temporal reference frame for speech processing is atypical in developmental dyslexia, with low frequency (delta) oscillations entraining to a different phase of the rhythmic syllabic input. This would affect the quality of encoding of speech, and could underlie the cognitive impairments in phonological representation that are the behavioral hallmark of this developmental disorder across languages. PMID:24376407
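
    The cross-correlogram analysis reduces to correlating the stimulus envelope with the neural response over a range of lags and reading off the peak r-value and its lag. A minimal sketch follows, assuming both signals are equal-length and share one sampling rate; the lag window is an assumption.

        import numpy as np

        def envelope_response_xcorr(envelope, response, fs, max_lag_ms=500):
            """Normalized cross-correlation of stimulus envelope with neural
            response; returns peak r and its lag in ms (response re stimulus).
            Both inputs must be the same length and sampling rate."""
            env = (envelope - envelope.mean()) / envelope.std()
            resp = (response - response.mean()) / response.std()
            max_lag = int(fs * max_lag_ms / 1000)
            lags = np.arange(-max_lag, max_lag + 1)
            r = np.array([
                np.mean(env[: len(env) - l] * resp[l:]) if l >= 0
                else np.mean(env[-l:] * resp[: len(resp) + l])
                for l in lags])
            peak = np.argmax(r)
            return r[peak], 1000.0 * lags[peak] / fs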

  11. Spectrotemporal Modulation Detection and Speech Perception by Cochlear Implant Users

    PubMed Central

    Won, Jong Ho; Moon, Il Joon; Jin, Sunhwa; Park, Heesung; Woo, Jihwan; Cho, Yang-Sun; Chung, Won-Ho; Hong, Sung Hwa

    2015-01-01

    Spectrotemporal modulation (STM) detection performance was examined for cochlear implant (CI) users. The test involved discriminating between an unmodulated steady noise and a modulated stimulus. The modulated stimulus presents frequency modulation patterns that change in frequency over time. In order to examine STM detection performance for different modulation conditions, two different temporal modulation rates (5 and 10 Hz) and three different spectral modulation densities (0.5, 1.0, and 2.0 cycles/octave) were employed, producing a total of 6 different STM stimulus conditions. In order to explore how electric hearing constrains STM sensitivity for CI users differently from acoustic hearing, normal-hearing (NH) and hearing-impaired (HI) listeners were also tested on the same tasks. STM detection performance was best in NH subjects, followed by HI subjects. On average, CI subjects showed the poorest performance, but some CI subjects showed high levels of STM detection performance comparable to acoustic hearing. Significant correlations were found between STM detection performance and speech identification performance in quiet and in noise. In order to understand the relative contributions of spectral and temporal modulation cues to speech perception abilities for CI users, spectral and temporal modulation detection was performed separately and related to STM detection and speech perception performance. The results suggest that slow spectral modulation, rather than slow temporal modulation, may be important for determining speech perception capabilities for CI users. Lastly, test-retest reliability for STM detection was good, with no evidence of learning. The present study demonstrates that STM detection may be a useful tool to evaluate the ability of CI sound processing strategies to deliver clinically pertinent acoustic modulation information. PMID:26485715
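
    An STM stimulus of this kind is conventionally built as a bank of log-spaced tones whose amplitudes follow a sinusoid drifting across log-frequency. The sketch below generates such a moving ripple for the rates and densities listed in the abstract; the carrier range, component count, and five-octave span are illustrative assumptions.

        import numpy as np

        def make_stm_stimulus(dur=1.0, fs=44100, rate_hz=5.0, density_cpo=1.0,
                              f_lo=400.0, n_comp=200, depth=1.0):
            """Moving spectrotemporal ripple: log-spaced tones whose
            amplitudes follow a sinusoid drifting across log-frequency at
            rate_hz, with density_cpo cycles per octave."""
            t = np.arange(int(dur * fs)) / fs
            octaves = np.linspace(0, 5, n_comp)           # assumed 5-octave span
            freqs = f_lo * 2.0 ** octaves
            phases = 2 * np.pi * np.random.rand(n_comp)   # random start phases
            sig = np.zeros_like(t)
            for f, x, ph in zip(freqs, octaves, phases):
                env = 1.0 + depth * np.sin(
                    2 * np.pi * (rate_hz * t + density_cpo * x))
                sig += env * np.sin(2 * np.pi * f * t + ph)
            return sig / np.max(np.abs(sig))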

  12. Perceptually relevant speech tracking in auditory and motor cortex reflects distinct linguistic features

    PubMed Central

    Gross, Joachim; Kayser, Christoph

    2018-01-01

    During online speech processing, our brain tracks the acoustic fluctuations in speech at different timescales. Previous research has focused on generic timescales (for example, delta or theta bands) that are assumed to map onto linguistic features such as prosody or syllables. However, given the high intersubject variability in speaking patterns, such a generic association between the timescales of brain activity and speech properties can be ambiguous. Here, we analyse speech tracking in source-localised magnetoencephalographic data by directly focusing on timescales extracted from statistical regularities in our speech material. This revealed widespread significant tracking at the timescales of phrases (0.6–1.3 Hz), words (1.8–3 Hz), syllables (2.8–4.8 Hz), and phonemes (8–12.4 Hz). Importantly, when examining its perceptual relevance, we found stronger tracking for correctly comprehended trials in the left premotor (PM) cortex at the phrasal scale as well as in left middle temporal cortex at the word scale. Control analyses using generic bands confirmed that these effects were specific to the speech regularities in our stimuli. Furthermore, we found that the phase at the phrasal timescale coupled to power at beta frequency (13–30 Hz) in motor areas. This cross-frequency coupling presumably reflects top-down temporal prediction in ongoing speech perception. Together, our results reveal specific functional and perceptually relevant roles of distinct tracking and cross-frequency processes along the auditory–motor pathway. PMID:29529019

  13. Cross-Linguistic Differences in Bilinguals' Fundamental Frequency Ranges

    ERIC Educational Resources Information Center

    Ordin, Mikhail; Mennen, Ineke

    2017-01-01

    Purpose: We investigated cross-linguistic differences in fundamental frequency range (FFR) in Welsh-English bilingual speech. This is the first study that reports gender-specific behavior in switching FFRs across languages in bilingual speech. Method: FFR was conceptualized as a behavioral pattern using measures of span (range of fundamental…

  14. Assessing the role of spectral and intensity cues in spectral ripple detection and discrimination in cochlear-implant users.

    PubMed

    Anderson, Elizabeth S; Oxenham, Andrew J; Nelson, Peggy B; Nelson, David A

    2012-12-01

    Measures of spectral ripple resolution have become widely used psychophysical tools for assessing spectral resolution in cochlear-implant (CI) listeners. The objective of this study was to compare spectral ripple discrimination and detection in the same group of CI listeners. Ripple detection thresholds were measured over a range of ripple frequencies and were compared to spectral ripple discrimination thresholds previously obtained from the same CI listeners. The data showed that performance on the two measures was correlated, but that individual subjects' thresholds (at a constant spectral modulation depth) for the two tasks were not equivalent. In addition, spectral ripple detection was often found to be possible at higher rates than expected based on the available spectral cues, making it likely that temporal-envelope cues played a role at higher ripple rates. Finally, spectral ripple detection thresholds were compared to previously obtained speech-perception measures. Results confirmed earlier reports of a robust relationship between detection of widely spaced ripples and measures of speech recognition. In contrast, intensity difference limens for broadband noise did not correlate with spectral ripple detection measures, suggesting a dissociation between the ability to detect small changes in intensity across frequency and across time.

  15. Effect of cognitive load on speech prosody in aviation: Evidence from military simulator flights.

    PubMed

    Huttunen, Kerttu; Keränen, Heikki; Väyrynen, Eero; Pääkkönen, Rauno; Leino, Tuomo

    2011-01-01

    Mental overload directly affects safety in aviation and needs to be alleviated. Speech recordings are obtained non-invasively and as such are feasible for monitoring cognitive load. We recorded speech of 13 military pilots while they were performing a simulator task. Three types of cognitive load (load on situation awareness, information processing and decision making) were rated by a flight instructor separately for each flight phase and participant. As a function of increased cognitive load, the mean utterance-level fundamental frequency (F0) increased, on average, by 7 Hz and the mean vocal intensity increased by 1 dB. In the most intensive simulator flight phases, mean F0 increased by 12 Hz and mean intensity, by 1.5 dB. At the same time, the mean F0 range decreased by 5 Hz, on average. Our results showed that prosodic features of speech can be used to monitor speaker state and support pilot training in a simulator environment. Copyright © 2010 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  16. Sensitivity of human auditory cortex to rapid frequency modulation revealed by multivariate representational similarity analysis.

    PubMed

    Joanisse, Marc F; DeSouza, Diedre D

    2014-01-01

    Functional magnetic resonance imaging (fMRI) was used to investigate the extent, magnitude, and pattern of brain activity in response to rapid frequency-modulated sounds. We examined this by manipulating the direction (rise vs. fall) and the rate (fast vs. slow) of the apparent pitch of iterated rippled noise (IRN) bursts. Acoustic parameters were selected to capture features used in phoneme contrasts; however, the stimuli themselves were not perceived as speech per se. Participants were scanned as they passively listened to sounds in an event-related paradigm. Univariate analyses revealed a greater level and extent of activation in bilateral auditory cortex in response to frequency-modulated sweeps compared to steady-state sounds. This effect was stronger in the left hemisphere. However, no regions showed selectivity for either rate or direction of frequency modulation. In contrast, multivoxel pattern analysis (MVPA) revealed feature-specific encoding for direction of modulation in auditory cortex bilaterally. Moreover, this effect was strongest when analyses were restricted to anatomical regions lying outside Heschl's gyrus. We found no support for feature-specific encoding of frequency modulation rate. The differential findings for modulation rate and direction of modulation are discussed with respect to their relevance to phonetic discrimination.

  17. Cortical activation patterns correlate with speech understanding after cochlear implantation

    PubMed Central

    Olds, Cristen; Pollonini, Luca; Abaya, Homer; Larky, Jannine; Loy, Megan; Bortfeld, Heather; Beauchamp, Michael S.; Oghalai, John S.

    2015-01-01

    Objectives Cochlear implants are a standard therapy for deafness, yet the ability of implanted patients to understand speech varies widely. To better understand this variability in outcomes, we used functional near-infrared spectroscopy (fNIRS) to image activity within regions of the auditory cortex and compare the results to behavioral measures of speech perception. Design We studied 32 deaf adults hearing through cochlear implants and 35 normal-hearing controls. We used fNIRS to measure responses within the lateral temporal lobe and the superior temporal gyrus to speech stimuli of varying intelligibility. The speech stimuli included normal speech, channelized speech (vocoded into 20 frequency bands), and scrambled speech (the 20 frequency bands were shuffled in random order). We also used environmental sounds as a control stimulus. Behavioral measures consisted of the Speech Reception Threshold, CNC words, and AzBio Sentence tests measured in quiet. Results Both control and implanted participants with good speech perception exhibited greater cortical activations to natural speech than to unintelligible speech. In contrast, implanted participants with poor speech perception had large, indistinguishable cortical activations to all stimuli. The ratio of cortical activation to normal speech to that of scrambled speech directly correlated with the CNC Words and AzBio Sentences scores. This pattern of cortical activation was not correlated with auditory threshold, age, side of implantation, or time after implantation. Turning off the implant reduced cortical activations in all implanted participants. Conclusions Together, these data indicate that the responses we measured within the lateral temporal lobe and the superior temporal gyrus correlate with behavioral measures of speech perception, demonstrating a neural basis for the variability in speech understanding outcomes after cochlear implantation. PMID:26709749

  18. Perceiving speech in context: Compensation for contextual variability during acoustic cue encoding and categorization

    NASA Astrophysics Data System (ADS)

    Toscano, Joseph Christopher

    Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech processing, listeners encode continuous differences in acoustic cues, independent of phonological categories; (2) at post-perceptual stages, fine-grained acoustic information is preserved; and (3) there is preliminary evidence that listeners encode cues relative to context via feedback from categories. These results are discussed in relation to proposed models of speech perception and sources of contextual variability.

  19. Binaural enhancement for bilateral cochlear implant users.

    PubMed

    Brown, Christopher A

    2014-01-01

    Bilateral cochlear implant (BCI) users receive limited binaural cues and, thus, show little improvement to speech intelligibility from spatial cues. The feasibility of a method for enhancing the binaural cues available to BCI users is investigated. This involved extending interaural differences of levels, which typically are restricted to high frequencies, into the low-frequency region. Speech intelligibility was measured in BCI users listening over headphones and with direct stimulation, with a target talker presented to one side of the head in the presence of a masker talker on the other side. Spatial separation was achieved by applying either naturally occurring binaural cues or enhanced cues. In this listening configuration, BCI patients showed greater speech intelligibility with the enhanced binaural cues than with naturally occurring binaural cues. In some situations, it is possible for BCI users to achieve greater speech intelligibility when binaural cues are enhanced by applying interaural differences of levels in the low-frequency region.
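
    One crude way to realize the enhancement idea is to estimate the high-frequency interaural level difference frame by frame and impose it on the low-frequency band as opposite gains at the two ears. The sketch below does that under assumed crossover and frame parameters; it is only an illustration of the principle, not the paper's algorithm.

        import numpy as np
        from scipy.signal import butter, sosfilt

        def extend_ild(left, right, fs, split_hz=1500.0, frame=0.02):
            """Impose the frame-wise high-frequency ILD onto the low band."""
            sos_lo = butter(4, split_hz, btype="low", fs=fs, output="sos")
            sos_hi = butter(4, split_hz, btype="high", fs=fs, output="sos")
            out_l, out_r = sosfilt(sos_lo, left), sosfilt(sos_lo, right)
            hi_l, hi_r = sosfilt(sos_hi, left), sosfilt(sos_hi, right)
            n = int(frame * fs)
            for i in range(0, len(left) - n, n):
                s = slice(i, i + n)
                ild = 10 * np.log10((np.mean(hi_l[s] ** 2) + 1e-12) /
                                    (np.mean(hi_r[s] ** 2) + 1e-12))
                g = 10 ** (ild / 40)   # split the dB difference across ears
                out_l[s] *= g
                out_r[s] /= g
            return out_l + hi_l, out_r + hi_r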

  20. Basic Technical Data on Transmission Systems and Equipment Using Communications Lines. Part 1.

    DTIC Science & Technology

    1978-08-01

    [Abstract garbled in OCR; only fragments are recoverable.] The report describes transmission systems and equipment using communications lines, including KNK-6s and KNK-6t carrier equipment, maximum numbers of repeater sections, low-frequency amplifiers for each transmission direction, an initial slope network (KNN), and voice-frequency ringing at 3,800 Hz at a level 0.4-0.8 Np below the speech channel level, carried without noticeable degradation of speech quality.

  1. Auditory Long Latency Responses to Tonal and Speech Stimuli

    ERIC Educational Resources Information Center

    Swink, Shannon; Stuart, Andrew

    2012-01-01

    Purpose: The effects of type of stimuli (i.e., nonspeech vs. speech), speech (i.e., natural vs. synthetic), gender of speaker and listener, speaker (i.e., self vs. other), and frequency alteration in self-produced speech on the late auditory cortical evoked potential were examined. Method: Young adult men (n = 15) and women (n = 15), all with…

  2. The perceptual significance of high-frequency energy in the human voice.

    PubMed

    Monson, Brian B; Hunter, Eric J; Lotto, Andrew J; Story, Brad H

    2014-01-01

    While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the main causes for focus on low-frequency energy appear to be because the low-frequency portion of the speech spectrum was seen to be sufficient (from a perceptual standpoint), or the difficulty of HFE research was too great to be justifiable (from a technological standpoint). The advancement of technology continues to overcome concerns stemming from the latter reason. Likewise, advances in our understanding of the perceptual effects of HFE now cast doubt on the first cause. Emerging evidence indicates that HFE plays a more significant role than previously believed, and should thus be considered in speech and voice perception research, especially in research involving children and the hearing impaired.

  3. The perceptual significance of high-frequency energy in the human voice

    PubMed Central

    Monson, Brian B.; Hunter, Eric J.; Lotto, Andrew J.; Story, Brad H.

    2014-01-01

    While human vocalizations generate acoustical energy at frequencies up to (and beyond) 20 kHz, the energy at frequencies above about 5 kHz has traditionally been neglected in speech perception research. The intent of this paper is to review (1) the historical reasons for this research trend and (2) the work that continues to elucidate the perceptual significance of high-frequency energy (HFE) in speech and singing. The historical and physical factors reveal that, while HFE was believed to be unnecessary and/or impractical for applications of interest, it was never shown to be perceptually insignificant. Rather, the main causes for focus on low-frequency energy appear to be because the low-frequency portion of the speech spectrum was seen to be sufficient (from a perceptual standpoint), or the difficulty of HFE research was too great to be justifiable (from a technological standpoint). The advancement of technology continues to overcome concerns stemming from the latter reason. Likewise, advances in our understanding of the perceptual effects of HFE now cast doubt on the first cause. Emerging evidence indicates that HFE plays a more significant role than previously believed, and should thus be considered in speech and voice perception research, especially in research involving children and the hearing impaired. PMID:24982643

  4. Frequency-specific hearing outcomes in pediatric type I tympanoplasty.

    PubMed

    Kent, David T; Kitsko, Dennis J; Wine, Todd; Chi, David H

    2014-02-01

    Middle ear disease is the primary cause of hearing loss in children and has a significant impact on language development and academic performance. Multiple prognostic factors have previously been examined, but there is little published data regarding frequency-specific hearing outcomes. To examine the relationship between type I tympanoplasty in a pediatric population and frequency-specific hearing changes, as well as the relationship between several prognostic factors and graft retention. Retrospective medical chart review (February 2006 to October 2011) of 492 consecutive pediatric otolaryngology patients undergoing type I tympanoplasty for tympanic membrane (TM) perforation of any etiology at a tertiary-care pediatric otolaryngology practice. Type I tympanoplasty. Preoperative and postoperative audiometric data were collected for patients undergoing successful TM repair. It was hypothesized before data collection that conductive hearing would improve at all frequencies with no significant change in sensorineural hearing. Data collected included air conduction at 250 to 8000 Hz, speech reception thresholds, bone conduction at 500 to 4000 Hz, and air-bone gap at 500 to 4000 Hz. Demographic data obtained included sex, age, size, mechanism, location of perforation, and operative repair technique. Of 492 patients, 320 were excluded; results were thus examined for 172 patients. Surgery was successful for 73.8% of patients. Perforation size was significantly associated with repair success (mean [SD] surgical success rate of 38.6% [15.3%] vs surgical failure rate of 31.4% [15.0%]; P < .01); however, mean (SD) age (9.02 [3.89] years [surgical success] vs 8.52 [3.43] years [surgical failure]; P > .05) and repair technique (medial [73.08%] vs lateral [76.47%] graft success; P > .99) were not. Air conduction significantly improved from 250 to 2000 Hz (P < .001), did not significantly improve at 4000 Hz (P = .08), and there was a nonsignificant decline at 8000 Hz (P = .12). Speech reception threshold significantly improved (20 vs 15 dB; P < .001). This large review found an association of TM perforation size with surgical success and an improvement in speech reception threshold, air conduction at 250 to 2000 Hz, air-bone gap at 500 to 2000 Hz, and worsening bone conduction at 4000 Hz. Patients with high-frequency hearing loss due to TM perforation should not anticipate significant recovery from type I tympanoplasty. Hearing loss at higher frequencies may require postoperative hearing rehabilitation.

  5. Measuring the critical band for speech.

    PubMed

    Healy, Eric W; Bacon, Sid P

    2006-02-01

    The current experiments were designed to measure the frequency resolution employed by listeners during the perception of everyday sentences. Speech bands having nearly vertical filter slopes and narrow bandwidths were sharply partitioned into various numbers of equal log- or ERBN-width subbands. The temporal envelope from each partition was used to amplitude modulate a corresponding band of low-noise noise, and the modulated carriers were combined and presented to normal-hearing listeners. Intelligibility increased and reached asymptote as the number of partitions increased. In the mid- and high-frequency regions of the speech spectrum, the partition bandwidth corresponding to asymptotic performance matched current estimates of psychophysical tuning across a number of conditions. These results indicate that, in these regions, the critical band for speech matches the critical band measured using traditional psychoacoustic methods and nonspeech stimuli. However, in the low-frequency region, partition bandwidths at asymptote were somewhat narrower than would be predicted based upon psychophysical tuning. It is concluded that, overall, current estimates of psychophysical tuning represent reasonably well the ability of listeners to extract spectral detail from running speech.
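
    The processing scheme the experiment rests on is essentially an envelope vocoder. The sketch below partitions a speech band into log-width subbands, extracts each temporal envelope via the Hilbert transform, and re-imposes it on a noise carrier filtered to the same band; Gaussian noise stands in for the study's low-noise noise, and the band edges and counts are assumptions.

        import numpy as np
        from scipy.signal import butter, sosfilt, hilbert

        def envelope_vocoder(speech, fs, f_lo=300.0, f_hi=5000.0, n_bands=8):
            """Envelope vocoder: log-width subband partitions, envelope
            extraction, noise carriers filtered to the same bands.
            Assumes fs is comfortably above 2 * f_hi."""
            edges = np.geomspace(f_lo, f_hi, n_bands + 1)  # log-spaced partitions
            rng = np.random.default_rng(0)
            out = np.zeros_like(speech)
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                band = sosfilt(sos, speech)
                env = np.abs(hilbert(band))                # temporal envelope
                carrier = sosfilt(sos, rng.standard_normal(len(speech)))
                out += env * carrier
            return out / np.max(np.abs(out))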

  6. Determination of the Potential Benefit of Time-Frequency Gain Manipulation

    PubMed Central

    Anzalone, Michael C.; Calandruccio, Lauren; Doherty, Karen A.; Carney, Laurel H.

    2008-01-01

    Objective The purpose of this study was to determine the maximum benefit provided by a time-frequency gain-manipulation algorithm for noise-reduction (NR) based on an ideal detector of speech energy. The amount of detected energy necessary to show benefit using this type of NR algorithm was examined, as well as the necessary speed and frequency resolution of the gain manipulation. Design NR was performed using time-frequency gain manipulation, wherein the gains of individual frequency bands depended on the absence or presence of speech energy within each band. Three different experiments were performed: (1) NR using ideal detectors, (2) NR with nonideal detectors, and (3) NR with ideal detectors and different processing speeds and frequency resolutions. All experiments were performed using the Hearing-in-Noise test (HINT). A total of 6 listeners with normal hearing and 14 listeners with hearing loss were tested. Results HINT thresholds improved for all listeners with NR based on the ideal detectors used in Experiment I. The nonideal detectors of Experiment II required detection of at least 90% of the speech energy before an improvement was seen in HINT thresholds. The results of Experiment III demonstrated that relatively high temporal resolution (<100 msec) was required by the NR algorithm to improve HINT thresholds. Conclusions The results indicated that a single-microphone NR system based on time-frequency gain manipulation improved the HINT thresholds of listeners. However, to obtain benefit in speech intelligibility, the detectors used in such a strategy were required to detect an unrealistically high percentage of the speech energy and to perform the gain manipulations on a fast temporal basis. PMID:16957499
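
    With an oracle (ideal) detector, the gain manipulation can be sketched directly on an STFT grid: units carrying clean-speech energy above a floor are passed, and all others are attenuated. The attenuation depth, threshold, and STFT settings below are illustrative assumptions, not the study's parameters.

        import numpy as np
        from scipy.signal import stft, istft

        def ideal_tf_gain_nr(mixture, clean, fs, atten_db=-15.0, thresh_db=-40.0):
            """Time-frequency gain manipulation with an ideal detector: each
            T-F unit is attenuated unless the (oracle) clean speech carries
            energy above a floor relative to its maximum."""
            f, t, X = stft(mixture, fs=fs, nperseg=256)
            _, _, S = stft(clean, fs=fs, nperseg=256)
            speech_present = (20 * np.log10(np.abs(S) + 1e-12)
                              > 20 * np.log10(np.abs(S).max()) + thresh_db)
            gain = np.where(speech_present, 1.0, 10 ** (atten_db / 20))
            _, y = istft(X * gain, fs=fs, nperseg=256)
            return y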

  7. JND measurements of the speech formants parameters and its implication in the LPC pole quantization

    NASA Astrophysics Data System (ADS)

    Orgad, Yaakov

    1988-08-01

    The inherent sensitivity of auditory perception is explicitly used with the objective of designing an efficient speech encoder. Speech can be modelled by a filter representing the vocal tract shape, driven by an excitation signal representing glottal air flow. This work concentrates on the filter encoding problem, assuming that excitation signal encoding is optimal. Linear predictive coding (LPC) techniques were used to model a short speech segment by an all-pole filter; each pole is directly related to a speech formant. Measurements were made of the auditory just noticeable difference (JND) corresponding to the natural speech formants, with the LPC filter poles as the best candidates to represent the speech spectral envelope. The JND is the maximum precision required in speech quantization; it was defined as the shift in one pole parameter of a single frame of a speech segment necessary to induce subjective perception of the distortion with 0.75 probability. The average JND in LPC filter poles in natural speech was found to increase with increasing pole bandwidth and, to a lesser extent, frequency. The JND measurements showed a large spread of the residuals around the average values, indicating that inter-formant coupling and perhaps other, not yet fully understood, factors were not taken into account at this stage of the research; a future treatment should consider these factors. The average JNDs obtained in this work were used to design pole quantization tables for speech coding and provided a better bit rate than the standard reflection-coefficient quantizer: a 30-bits-per-frame pole quantizer yielded speech quality similar to that obtained with a standard 41-bits-per-frame reflection-coefficient quantizer. Owing to the complexity of the numerical root extraction system, the practical implementation of the pole quantization approach remains to be proved.
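
    The pole-to-formant bookkeeping behind this scheme is compact: a complex LPC pole at radius r and angle theta maps to a frequency of theta*fs/(2*pi) Hz and a bandwidth of -ln(r)*fs/pi Hz. A minimal sketch using librosa's LPC fit follows; the model order is an assumption.

        import numpy as np
        import librosa

        def lpc_pole_formants(frame, fs, order=12):
            """Fit an all-pole (LPC) model to one speech frame and convert
            each complex pole to a formant-style (frequency, bandwidth) pair."""
            a = librosa.lpc(frame.astype(float), order=order)
            poles = np.roots(a)
            poles = poles[np.imag(poles) > 0]   # one of each conjugate pair
            freqs = np.angle(poles) * fs / (2 * np.pi)       # Hz
            bws = -np.log(np.abs(poles)) * fs / np.pi        # Hz
            idx = np.argsort(freqs)
            return freqs[idx], bws[idx]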

  8. Noise reduction technologies implemented in head-worn preprocessors for improving cochlear implant performance in reverberant noise fields.

    PubMed

    Chung, King; Nelson, Lance; Teske, Melissa

    2012-09-01

    The purpose of this study was to investigate whether a multichannel adaptive directional microphone and a modulation-based noise reduction algorithm could enhance cochlear implant performance in reverberant noise fields. A hearing aid was modified to output electrical signals (ePreprocessor) and a cochlear implant speech processor was modified to receive electrical signals (eProcessor). The ePreprocessor was programmed to flat frequency response and linear amplification. Cochlear implant listeners wore the ePreprocessor-eProcessor system in three reverberant noise fields: 1) one noise source with variable locations; 2) three noise sources with variable locations; and 3) eight evenly spaced noise sources from 0° to 360°. Listeners' speech recognition scores were tested when the ePreprocessor was programmed to omnidirectional microphone (OMNI), omnidirectional microphone plus noise reduction algorithm (OMNI + NR), and adaptive directional microphone plus noise reduction algorithm (ADM + NR). They were also tested with their own cochlear implant speech processor (CI_OMNI) in the three noise fields. Additionally, listeners rated overall sound quality preferences on recordings made in the noise fields. Results indicated that ADM+NR produced the highest speech recognition scores and the most preferable rating in all noise fields. Factors requiring attention in the hearing aid-cochlear implant integration process are discussed. Copyright © 2012 Elsevier B.V. All rights reserved.

  9. Listening Effort and Speech Recognition with Frequency Compression Amplification for Children and Adults with Hearing Loss.

    PubMed

    Brennan, Marc A; Lewis, Dawna; McCreery, Ryan; Kopun, Judy; Alexander, Joshua M

    2017-10-01

    Nonlinear frequency compression (NFC) can improve the audibility of high-frequency sounds by lowering them to a frequency where audibility is better; however, this lowering results in spectral distortion. Consequently, performance is a combination of the effects of increased access to high-frequency sounds and the detrimental effects of spectral distortion. Previous work has demonstrated positive benefits of NFC on speech recognition when NFC is set to improve audibility while minimizing distortion. However, the extent to which NFC impacts listening effort is not well understood, especially for children with sensorineural hearing loss (SNHL). To examine the impact of NFC on recognition and listening effort for speech in adults and children with SNHL. Within-subject, quasi-experimental study. Participants listened to amplified nonsense words that were (1) frequency-lowered using NFC, (2) low-pass filtered at 5 kHz to simulate the restricted bandwidth (RBW) of conventional hearing aid processing, or (3) low-pass filtered at 10 kHz to simulate extended bandwidth (EBW) amplification. Fourteen children (8-16 yr) and 14 adults (19-65 yr) with mild-to-severe SNHL. Participants listened to speech processed by a hearing aid simulator that amplified input signals to fit a prescriptive target fitting procedure. Participants were blinded to the type of processing. Participants' responses to each nonsense word were analyzed for accuracy and verbal-response time (VRT; listening effort). A multivariate analysis of variance and linear mixed model were used to determine the effect of hearing-aid signal processing on nonsense word recognition and VRT. Both children and adults identified the nonsense words and initial consonants better with EBW and NFC than with RBW. The type of processing did not affect the identification of the vowels or final consonants. There was no effect of age on recognition of the nonsense words, initial consonants, medial vowels, or final consonants. VRT did not change significantly with the type of processing or age. Both adults and children demonstrated improved speech recognition with access to the high-frequency sounds in speech. Listening effort as measured by VRT was not affected by access to high-frequency sounds. American Academy of Audiology

  10. Functional connectivity changes in adults with developmental stuttering: a preliminary study using quantitative electro-encephalography

    PubMed Central

    Joos, Kathleen; De Ridder, Dirk; Boey, Ronny A.; Vanneste, Sven

    2014-01-01

    Introduction: Stuttering is defined as speech characterized by verbal dysfluencies; however, it should not be seen as an isolated speech disorder but rather as a generalized sensorimotor timing deficit due to impaired communication between speech-related brain areas. Therefore we focused on resting state brain activity and functional connectivity. Method: We included 11 patients with developmental stuttering and 11 age-matched controls. To objectify stuttering severity and the impact on quality of life (QoL), we used the Dutch validated Test for Stuttering Severity-Readers (TSS-R) and the Overall Assessment of the Speaker’s Experience of Stuttering (OASES), respectively. Furthermore, we used standardized low resolution brain electromagnetic tomography (sLORETA) analyses to look at resting state activity and functional connectivity differences and their correlations with the TSS-R and OASES. Results: No significant results could be obtained when looking at neural activity, however significant alterations in resting state functional connectivity could be demonstrated between persons who stutter (PWS) and fluently speaking controls, predominantly interhemispheric, i.e., a decreased functional connectivity for high frequency oscillations (beta and gamma) between motor speech areas (BA44 and 45) and the contralateral premotor (BA6) and motor (BA4) areas. Moreover, a positive correlation was found between functional connectivity at low frequency oscillations (theta and alpha) and stuttering severity, while a mixed increased and decreased functional connectivity at low and high frequency oscillations correlated with QoL. Discussion: PWS are characterized by decreased high frequency interhemispheric functional connectivity between motor speech, premotor and motor areas in the resting state, while higher functional connectivity in the low frequency bands indicates more severe speech disturbances, suggesting that increased interhemispheric and right-sided functional connectivity is maladaptive. PMID:25352797

  11. Human neuromagnetic steady-state responses to amplitude-modulated tones, speech, and music.

    PubMed

    Lamminmäki, Satu; Parkkonen, Lauri; Hari, Riitta

    2014-01-01

    Auditory steady-state responses that can be elicited by various periodic sounds inform about subcortical and early cortical auditory processing. Steady-state responses to amplitude-modulated pure tones have been used to scrutinize binaural interaction by frequency-tagging the two ears' inputs at different frequencies. Unlike pure tones, speech and music are physically very complex, as they include many frequency components, pauses, and large temporal variations. To examine the utility of magnetoencephalographic (MEG) steady-state fields (SSFs) in the study of early cortical processing of complex natural sounds, the authors tested the extent to which amplitude-modulated speech and music can elicit reliable SSFs. MEG responses were recorded to 90-s-long binaural tones, speech, and music, amplitude-modulated at 41.1 Hz at four different depths (25, 50, 75, and 100%). The subjects were 11 healthy, normal-hearing adults. MEG signals were averaged in phase with the modulation frequency, and the sources of the resulting SSFs were modeled by current dipoles. After the MEG recording, intelligibility of the speech, musical quality of the music stimuli, naturalness of music and speech stimuli, and the perceived deterioration caused by the modulation were evaluated on visual analog scales. The perceived quality of the stimuli decreased as a function of increasing modulation depth, more strongly for music than speech; yet, all subjects considered the speech intelligible even at the 100% modulation. SSFs were strongest for tones and weakest for speech stimuli; the amplitudes increased with increasing modulation depth for all stimuli. SSFs to tones were reliably detectable at all modulation depths (in all subjects in the right hemisphere, in 9 subjects in the left hemisphere) and to music stimuli at 50 to 100% depths, whereas speech usually elicited clear SSFs only at 100% depth. The hemispheric balance of SSFs was toward the right hemisphere for tones and speech, whereas SSFs to music showed no lateralization. In addition, the right lateralization of SSFs to the speech stimuli decreased with decreasing modulation depth. The results showed that SSFs can be reliably measured in response to amplitude-modulated natural sounds, with slightly different hemispheric lateralization for different carrier sounds. With speech stimuli, modulation at 100% depth is required, whereas for music the 75% or even 50% modulation depths provide a reasonable compromise between the signal-to-noise ratio of SSFs and sound quality or perceptual requirements. SSF recordings thus seem feasible for assessing the early cortical processing of natural sounds.
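
    The stimulus construction described above reduces to multiplying a carrier by a sinusoidal envelope; a minimal sketch (Python), with an assumed sample rate and a pure-tone carrier standing in for the speech and music recordings:

        import numpy as np

        fs = 44100                 # sample rate (assumed)
        fm, depth = 41.1, 0.75     # modulation frequency from the study; 75% depth
        fc = 1000.0                # carrier frequency (illustrative assumption)
        t = np.arange(0, 2.0, 1.0 / fs)   # 2-s excerpt (the study used 90 s)
        envelope = 1.0 + depth * np.sin(2 * np.pi * fm * t)
        am_tone = envelope / (1.0 + depth) * np.sin(2 * np.pi * fc * t)
        # For the speech and music conditions, the same envelope would
        # multiply the recorded waveform instead of a sinusoidal carrier.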

  12. Cepstral domain modification of audio signals for data embedding: preliminary results

    NASA Astrophysics Data System (ADS)

    Gopalan, Kaliappan

    2004-06-01

    A method of embedding data in an audio signal using cepstral domain modification is described. Based on successful embedding in the spectral points of perceptually masked regions in each frame of speech, first the technique was extended to embedding in the log spectral domain. This extension resulted in approximately 62 bits/s of embedding with less than 2 percent of bit error rate (BER) for a clean cover speech (from the TIMIT database), and about 2.5 percent for a noisy speech (from an air traffic controller database), when all frames - including silence and transition between voiced and unvoiced segments - were used. Bit error rate increased significantly when the log spectrum in the vicinity of a formant was modified. In the next procedure, embedding by altering the mean cepstral values of two ranges of indices was studied. Tests on both a noisy utterance and a clean utterance indicated barely noticeable perceptual change in speech quality when the lower range of cepstral indices - corresponding to the vocal tract region - was modified in accordance with data. With an embedding capacity of approximately 62 bits/s - using one bit per each frame regardless of frame energy or type of speech - initial results showed a BER of less than 1.5 percent for a payload capacity of 208 embedded bits using the clean cover speech. BER of less than 1.3 percent resulted for the noisy host with a capacity of 316 bits. When the cepstrum was modified in the region of excitation, BER increased to over 10 percent. With quantization causing no significant problem, the technique warrants further studies with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in regions of speech that are perceptually masked - analogous to embedding in frequency masked regions - may yield imperceptible stego audio with low BER.
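
    A minimal per-frame sketch of the cepstral embedding idea (Python); the index range and step size are hypothetical, not the paper's parameters. One bit is encoded by shifting the mean of a low-quefrency cepstral range, and the frame is resynthesized from the modified log magnitude with the original phase.

        import numpy as np

        def embed_bit(frame, bit, lo=10, hi=30, delta=0.05):
            """Embed one bit by raising/lowering the mean of real-cepstrum
            coefficients in the (hypothetical) index range [lo, hi)."""
            spec = np.fft.fft(frame)
            log_mag = np.log(np.abs(spec) + 1e-12)
            cep = np.fft.ifft(log_mag).real            # real cepstrum
            cep[lo:hi] += delta if bit else -delta     # push the mean up or down
            new_log_mag = np.fft.fft(cep).real         # back to log magnitude
            new_spec = np.exp(new_log_mag) * np.exp(1j * np.angle(spec))
            return np.fft.ifft(new_spec).real          # frame carrying the bit

    A decoder would recompute the cepstrum of each received frame and compare the mean over the same index range against its expected unmodified value to recover the bit.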

  13. Masking release with changing fundamental frequency: Electric acoustic stimulation resembles normal hearing subjects.

    PubMed

    Auinger, Alice Barbara; Riss, Dominik; Liepins, Rudolfs; Rader, Tobias; Keck, Tilman; Keintzel, Thomas; Kaider, Alexandra; Baumgartner, Wolf-Dieter; Gstoettner, Wolfgang; Arnoldner, Christoph

    2017-07-01

    It has been shown that patients with electric acoustic stimulation (EAS) perform better in noisy environments than patients with a cochlear implant (CI). One reason for this could be the preserved access to acoustic low-frequency cues including the fundamental frequency (F0). Therefore, our primary aim was to investigate whether users of EAS experience a release from masking with increasing F0 difference between target talker and masking talker. The study comprised 29 subjects in three groups: EAS users, CI users, and normal-hearing (NH) listeners. All CI and EAS users were implanted with a MED-EL cochlear implant and had at least 12 months of experience with the implant. Speech perception was assessed with the Oldenburg sentence test (OlSa) using one sentence from the test corpus as speech masker. The F0 in this masking sentence was shifted upwards by 4, 8, or 12 semitones. For each of these masker conditions the speech reception threshold (SRT) was assessed by adaptively varying the masker level while presenting the target sentences at a fixed level. A statistically significant improvement in speech perception was found for increasing difference in F0 between target sentence and masker sentence in EAS users (p = 0.038) and in NH listeners (p = 0.003). In CI users (classic CI or EAS users with electrical stimulation only) speech perception was independent from differences in F0 between target and masker. A release from masking with increasing difference in F0 between target and masking speech was only observed in listeners and configurations in which the low-frequency region was presented acoustically. Thus, the speech information contained in the low frequencies seems to be crucial for allowing listeners to separate multiple sources. By combining acoustic and electric information, EAS users even manage tasks as complicated as segregating the audio streams from multiple talkers. Preserving the natural code, like fine-structure cues in the low-frequency region, seems to be crucial to provide CI users with the best benefit. Copyright © 2017 Elsevier B.V. All rights reserved.
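
    The semitone shifts used here correspond to multiplying the masker F0 by 2^(n/12); a tiny sketch (Python), with 120 Hz as an assumed baseline F0:

        def shift_semitones(f0_hz, n):
            """Return f0 shifted upward by n semitones (equal temperament)."""
            return f0_hz * 2.0 ** (n / 12.0)

        for n in (4, 8, 12):
            print(n, "semitones:", round(shift_semitones(120.0, n), 1), "Hz")
        # An assumed 120-Hz masker F0 becomes ~151.2, ~190.5, and 240.0 Hz.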

  14. Measuring Speech Comprehensibility in Students with Down Syndrome

    PubMed Central

    Woynaroski, Tiffany; Camarata, Stephen

    2016-01-01

    Purpose There is an ongoing need to develop assessments of spontaneous speech that focus on whether the child's utterances are comprehensible to listeners. This study sought to identify the attributes of a stable ratings-based measure of speech comprehensibility and then to examine the criterion-related validity of an orthography-based measure of the comprehensibility of conversational speech in students with Down syndrome. Method Participants were 10 elementary school students with Down syndrome and 4 unfamiliar adult raters. Likert ratings of speech comprehensibility, averaged across observers, constituted the ratings-based measure of speech comprehensibility. The proportion of utterance attempts fully glossed constituted an orthography-based measure of speech comprehensibility. Results Averaging across 4 raters on four 5-min segments produced a reliable (G = .83) ratings-based measure of speech comprehensibility. The ratings-based measure was strongly (r > .80) correlated with the orthography-based measure for both the same and different conversational samples. Conclusion Reliable and valid measures of speech comprehensibility are achievable with the resources available to many researchers and some clinicians. PMID:27299989

  15. Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception

    ERIC Educational Resources Information Center

    Wilson, Amanda H.; Alsius, Agnès; Parè, Martin; Munhall, Kevin G.

    2016-01-01

    Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…

  16. Training Parents to Use the Natural Language Paradigm to Increase Their Autistic Children's Speech.

    ERIC Educational Resources Information Center

    Laski, Karen E.; And Others

    1988-01-01

    Parents of four nonverbal and four echolalic autistic children, aged five-nine, were trained to increase their children's speech by using the Natural Language Paradigm. Following training, parents increased the frequency with which they required their children to speak, and children increased the frequency of their verbalizations in three…

  17. Perceptual Grouping Affects Pitch Judgments across Time and Frequency

    ERIC Educational Resources Information Center

    Borchert, Elizabeth M. O.; Micheyl, Christophe; Oxenham, Andrew J.

    2011-01-01

    Pitch, the perceptual correlate of fundamental frequency (F0), plays an important role in speech, music, and animal vocalizations. Changes in F0 over time help define musical melodies and speech prosody, while comparisons of simultaneous F0 are important for musical harmony, and for segregating competing sound sources. This study compared…

  18. On the Locus of the Syllable Frequency Effect in Speech Production

    ERIC Educational Resources Information Center

    Laganaro, Marina; Alario, F. -Xavier

    2006-01-01

    The observation of a syllable frequency effect in naming latencies has been an argument in favor of a functional role of stored syllables in speech production. Accordingly, various theoretical models postulate that a repository of syllable representations is accessed during phonetic encoding. However, the direct empirical evidence for locating the…

  19. Audio-vocal responses of vocal fundamental frequency and formant during sustained vowel vocalizations in different noises.

    PubMed

    Lee, Shao-Hsuan; Hsiao, Tzu-Yu; Lee, Guo-She

    2015-06-01

    Sustained vocalizations of the vowels [a] and [i] and the syllable [mə] were collected from twenty normal-hearing individuals. During vocalization, five audio-vocal feedback conditions were introduced separately: no masking, wearing supra-aural headphones only, speech-noise masking, high-pass noise masking, and broad-band-noise masking. Power spectral analysis of vocal fundamental frequency (F0) was used to evaluate the modulations of F0, and linear predictive coding was used to estimate the first two formants. The results showed that while the formant frequencies were not significantly shifted, low-frequency modulations (<3 Hz) of F0 significantly increased with reduced audio-vocal feedback across speech sounds and were significantly correlated with auditory awareness of speakers' own voices. For sustained speech production, the motor speech controls on F0 may depend on a feedback mechanism while articulation should rely more on a feedforward mechanism. Power spectral analysis of F0 might be applied to evaluate audio-vocal control for various hearing and neurological disorders in the future. Copyright © 2015 Elsevier B.V. All rights reserved.
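
    A hedged sketch (Python) of how such a low-frequency F0 modulation measure could be computed: given an F0 contour, estimate a Welch power spectrum and integrate it below 3 Hz. The 100 frames/s contour rate and the function itself are illustrative assumptions, not the authors' code.

        import numpy as np
        from scipy.signal import welch

        def low_freq_f0_power(f0_track, frame_rate=100.0, f_max=3.0):
            """Power of F0 modulations below f_max Hz; f0_track is an F0
            contour in Hz sampled at frame_rate frames/s (assumed)."""
            f0 = np.asarray(f0_track, dtype=float)
            f0 = f0 - f0.mean()                 # remove the mean F0
            freqs, psd = welch(f0, fs=frame_rate, nperseg=min(512, len(f0)))
            band = freqs < f_max
            return np.trapz(psd[band], freqs[band])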

  20. Psychosocial stress based on public speech in humans: is there a real life/laboratory setting cross-adaptation?

    PubMed

    Jezova, D; Hlavacova, N; Dicko, I; Solarikova, P; Brezina, I

    2016-07-01

    Repeated or chronic exposure to stressors is associated with changes in neuroendocrine responses depending on the type, intensity, number and frequency of stress exposure as well as previous stress experience. The aim of the study was to test the hypothesis that salivary cortisol and cardiovascular responses to real-life psychosocial stressors related to public performance can cross-adapt with responses to psychosocial stress induced by public speech under a laboratory setting. The sample consisted of 22 healthy male volunteers, who were either actors (more precisely, students of dramatic arts) or non-actors (students of other fields). The stress task consisted of a 15-min anticipatory preparation phase and a 15-min public speech on an emotionally charged topic. The actors, who were accustomed to public speaking, responded with a rise in salivary cortisol as well as blood pressure to laboratory public speech. The values of salivary cortisol, systolic blood pressure and state anxiety were lower in actors compared to non-actors. Unlike non-actors, subjects with experience in public speaking did not show a stress-induced rise in the heart rate. Evaluation of personality traits revealed that actors scored significantly higher in extraversion than the subjects in the non-actor group. In conclusion, neuroendocrine responses to real-life stressors in actors can partially cross-adapt with responses to psychosocial stress under a laboratory setting. The most evident adaptation was at the level of heart rate responses. The public speech tasks may be of help in evaluation of the ability to cope with stress in real life in artists by simple laboratory testing.

  1. Speech impairment in Down syndrome: a review.

    PubMed

    Kent, Ray D; Vorperian, Houri K

    2013-02-01

    This review summarizes research on disorders of speech production in Down syndrome (DS) for the purposes of informing clinical services and guiding future research. Review of the literature was based on searches using MEDLINE, Google Scholar, PsycINFO, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency, and intelligibility. The following conclusions pertain to four major areas of review: voice, speech sounds, fluency and prosody, and intelligibility. The first major area is voice. Although a number of studies have reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. The second major area is speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. The third major area is fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10%-45%, compared with about 1% in the general population. Research also points to significant disturbances in prosody. The fourth major area is intelligibility. Studies consistently show marked limitations in this area, but only recently has the research gone beyond simple rating scales.

  2. Evidence of degraded representation of speech in noise, in the aging midbrain and cortex

    PubMed Central

    Simon, Jonathan Z.; Anderson, Samira

    2016-01-01

    Humans have a remarkable ability to track and understand speech in unfavorable conditions, such as in background noise, but speech understanding in noise does deteriorate with age. Results from several studies have shown that in younger adults, low-frequency auditory cortical activity reliably synchronizes to the speech envelope, even when the background noise is considerably louder than the speech signal. However, cortical speech processing may be limited by age-related decreases in the precision of neural synchronization in the midbrain. To understand better the neural mechanisms contributing to impaired speech perception in older adults, we investigated how aging affects midbrain and cortical encoding of speech when presented in quiet and in the presence of a single-competing talker. Our results suggest that central auditory temporal processing deficits in older adults manifest in both the midbrain and in the cortex. Specifically, midbrain frequency following responses to a speech syllable are more degraded in noise in older adults than in younger adults. This suggests a failure of the midbrain auditory mechanisms needed to compensate for the presence of a competing talker. Similarly, in cortical responses, older adults show larger reductions than younger adults in their ability to encode the speech envelope when a competing talker is added. Interestingly, older adults showed an exaggerated cortical representation of speech in both quiet and noise conditions, suggesting a possible imbalance between inhibitory and excitatory processes, or diminished network connectivity that may impair their ability to encode speech efficiently. PMID:27535374

  3. Frequency-Limiting Effects on Speech and Environmental Sound Identification for Cochlear Implant and Normal Hearing Listeners

    PubMed Central

    Chang, Son-A; Won, Jong Ho; Kim, HyangHee; Oh, Seung-Ha; Tyler, Richard S.; Cho, Chang Hyun

    2018-01-01

    Background and Objectives It is important to understand the frequency region of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CIs and individuals with normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass filtering or high-pass filtering cutoff frequencies. Subjects and Methods Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants, produced by a male and a female speaker, and of environmental sounds was measured. Crossover frequencies were determined for each identification test, where the LPF and HPF conditions show identical identification scores. Results CI and NH subjects showed changes in identification performance in a similar manner as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may similarly constrain identification performance for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification, due to the inefficient coding of acoustic cues through the CI sound processors. Conclusions These findings provide vital information, for Korean, on how the frequency information received for speech and environmental sounds through a CI processor differs from that received with normal hearing. PMID:29325391

  4. Frequency-Limiting Effects on Speech and Environmental Sound Identification for Cochlear Implant and Normal Hearing Listeners.

    PubMed

    Chang, Son-A; Won, Jong Ho; Kim, HyangHee; Oh, Seung-Ha; Tyler, Richard S; Cho, Chang Hyun

    2017-12-01

    It is important to understand the frequency region of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CIs and individuals with normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass filtering or high-pass filtering cutoff frequencies. Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants, produced by a male and a female speaker, and of environmental sounds was measured. Crossover frequencies were determined for each identification test, where the LPF and HPF conditions show identical identification scores. CI and NH subjects showed changes in identification performance in a similar manner as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may similarly constrain identification performance for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification, due to the inefficient coding of acoustic cues through the CI sound processors. These findings provide vital information, for Korean, on how the frequency information received for speech and environmental sounds through a CI processor differs from that received with normal hearing.
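
    A minimal sketch (Python) of the frequency-limiting manipulation used in both reports: Butterworth low-pass and high-pass filtering at a series of cutoff frequencies. The filter order and the seven cutoffs below are assumptions for illustration, not the study's exact values.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def filter_speech(x, fs, cutoff_hz, kind):
            """Low- or high-pass filter a waveform (kind: 'lowpass' or
            'highpass'); a 6th-order Butterworth filter is assumed."""
            sos = butter(6, cutoff_hz, btype=kind, fs=fs, output='sos')
            return sosfiltfilt(sos, x)

        fs = 16000
        x = np.random.randn(fs)            # stand-in for a recorded token
        cutoffs = [250, 500, 1000, 2000, 3000, 4000, 6000]   # assumed set
        lp = [filter_speech(x, fs, fc, 'lowpass') for fc in cutoffs]
        hp = [filter_speech(x, fs, fc, 'highpass') for fc in cutoffs]
        # Identification scores for the two series are then compared to find
        # the crossover frequency where LPF and HPF scores are equal.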

  5. Speech-feature discrimination in children with Asperger syndrome as determined with the multi-feature mismatch negativity paradigm.

    PubMed

    Kujala, T; Kuuluvainen, S; Saalasti, S; Jansson-Verkasalo, E; von Wendt, L; Lepistö, T

    2010-09-01

    Asperger syndrome, belonging to the autistic spectrum of disorders, involves deficits in social interaction and prosodic use of language but normal development of formal language abilities. Auditory processing involves both hyper- and hypoactive reactivity to acoustic changes. Responses composed of mismatch negativity (MMN) and obligatory components were recorded for five types of deviations in syllables (vowel, vowel duration, consonant, syllable frequency, syllable intensity) with the multi-feature paradigm from 8- to 12-year-old children with Asperger syndrome. Children with Asperger syndrome had larger MMNs for intensity and smaller MMNs for frequency changes than typically developing children, whereas no MMN group differences were found for the other deviant stimuli. Furthermore, children with Asperger syndrome performed more poorly than controls in the Comprehension of Instructions subtest of a language test battery. Cortical speech-sound discrimination is aberrant in children with Asperger syndrome. This is evident both as hypersensitive and depressed neural reactions to speech-sound changes, and is associated with features (frequency, intensity) which are relevant for prosodic processing. The multi-feature MMN paradigm, which includes variation and thereby resembles natural speech hearing circumstances, suggests an abnormal pattern of speech discrimination in Asperger syndrome, including both hypo- and hypersensitive responses for speech features. 2010 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  6. The Relationship between Speech Rate and Memory Span in Children.

    ERIC Educational Resources Information Center

    Henry, Lucy A.

    1994-01-01

    Examined whether speech rate is related to the amount recalled and if developmental increases in speech rate allow faster rehearsal with age, and hence, greater recall. Found that the group relationship was clear and replicable but that speech rates of individual children were not good predictors of those children's memory spans; age was found to…

  7. Common and distinct neural substrates for the perception of speech rhythm and intonation.

    PubMed

    Zhang, Linjun; Shu, Hua; Zhou, Fengying; Wang, Xiaoyi; Li, Ping

    2010-07-01

    The present study examines the neural substrates for the perception of speech rhythm and intonation. Subjects listened passively to synthesized speech stimuli that contained no semantic and phonological information, in three conditions: (1) continuous speech stimuli with fixed syllable duration and fundamental frequency in the standard condition, (2) stimuli with varying vocalic durations of syllables in the speech rhythm condition, and (3) stimuli with varying fundamental frequency in the intonation condition. Compared to the standard condition, speech rhythm activated the right middle superior temporal gyrus (mSTG), whereas intonation activated the bilateral superior temporal gyrus and sulcus (STG/STS) and the right posterior STS. Conjunction analysis further revealed that rhythm and intonation activated a common area in the right mSTG but compared to speech rhythm, intonation elicited additional activations in the right anterior STS. Findings from the current study reveal that the right mSTG plays an important role in prosodic processing. Implications of our findings are discussed with respect to neurocognitive theories of auditory processing. (c) 2009 Wiley-Liss, Inc.

  8. Perceptual learning for speech in noise after application of binary time-frequency masks

    PubMed Central

    Ahmadi, Mahnaz; Gross, Vauna L.; Sinex, Donal G.

    2013-01-01

    Ideal time-frequency (TF) masks can reject noise and improve the recognition of speech-noise mixtures. An ideal TF mask is constructed with prior knowledge of the target speech signal. The intelligibility of a processed speech-noise mixture depends upon the threshold criterion used to define the TF mask. The study reported here assessed the effect of training on the recognition of speech in noise after processing by ideal TF masks that did not restore perfect speech intelligibility. Two groups of listeners with normal hearing listened to speech-noise mixtures processed by TF masks calculated with different threshold criteria. For each group, a threshold criterion that initially produced word recognition scores between 0.56 and 0.69 was chosen for training. Listeners practiced with one set of TF-masked sentences until their word recognition performance approached asymptote. Perceptual learning was quantified by comparing word-recognition scores in the first and last training sessions. Word recognition scores improved with practice for all listeners, with the greatest improvement observed for the same materials used in training. PMID:23464038
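
    A minimal sketch (Python) of ideal binary TF masking under an assumed STFT front end: the mask retains time-frequency units whose local SNR exceeds the threshold criterion, which is the parameter the study varied to set initial intelligibility below ceiling.

        import numpy as np
        from scipy.signal import stft, istft

        def ibm_enhance(speech, noise, fs, lc_db=-6.0):
            """Ideal binary mask: keep TF units whose local SNR exceeds the
            local criterion lc_db (the value here is illustrative)."""
            _, _, S = stft(speech, fs=fs, nperseg=512)
            _, _, N = stft(noise, fs=fs, nperseg=512)
            local_snr = 20 * np.log10((np.abs(S) + 1e-12) / (np.abs(N) + 1e-12))
            mask = (local_snr > lc_db).astype(float)
            _, y = istft((S + N) * mask, fs=fs, nperseg=512)
            return y    # mixture with noise-dominated units zeroed out

    Raising lc_db rejects more of the mixture, so the criterion directly trades off noise removal against speech distortion.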

  9. Speech task effects on acoustic measure of fundamental frequency in Cantonese-speaking children.

    PubMed

    Ma, Estella P-M; Lam, Nina L-N

    2015-12-01

    Speaking fundamental frequency (F0) is a voice measure frequently used to document changes in vocal performance over time. Knowing the intra-subject variability of speaking F0 has implications for its clinical usefulness. The present study examined the speaking F0 elicited from three speech tasks in Cantonese-speaking children. The study also compared the variability of speaking F0 elicited from different speech tasks. Fifty-six vocally healthy Cantonese-speaking children (31 boys and 25 girls) aged between 7;0 and 10;11 (years;months) participated. For each child, speaking F0 was elicited using speech tasks at three linguistic levels (sustained prolongation of the vowel /a/, reading aloud a sentence, and reading aloud a passage). Two types of variability, within-session (trial-to-trial) and across-session (test-retest) variability, were compared across speech tasks. Significant differences in mean speaking F0 values were found between speech tasks. Mean speaking F0 value elicited from sustained vowel phonations was significantly higher than those elicited from the connected speech tasks. The variability of speaking F0 was higher in sustained vowel prolongation than that in connected speech. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  10. Speech enhancement based on modified phase-opponency detectors

    NASA Astrophysics Data System (ADS)

    Deshmukh, Om D.; Espy-Wilson, Carol Y.

    2005-09-01

    A speech enhancement algorithm based on a neural model was presented by Deshmukh et al. [149th Meeting of the Acoustical Society of America, 2005]. The algorithm consists of a bank of Modified Phase Opponency (MPO) filter pairs tuned to different center frequencies. This algorithm is able to enhance salient spectral features in speech signals even at low signal-to-noise ratios. However, the algorithm introduces musical noise and sometimes misses a spectral peak that is close in frequency to a stronger spectral peak. A refinement in the design of the MPO filters was recently made that takes advantage of the falling spectrum of the speech signal in sonorant regions. The modified set of filters leads to better separation of the noise and speech signals, and more accurate enhancement of spectral peaks. The improvements also lead to a significant reduction in musical noise. Continuity algorithms based on the properties of speech signals are used to further reduce the musical noise effect. The efficiency of the proposed method in enhancing the speech signal when the level of the background noise is fluctuating will be demonstrated. The performance of the improved speech enhancement method will be compared with various spectral subtraction-based methods. [Work supported by NSF BCS0236707.]

  11. Individual Monitoring of Vocal Effort With Relative Fundamental Frequency: Relationships With Aerodynamics and Listener Perception.

    PubMed

    Lien, Yu-An S; Michener, Carolyn M; Eadie, Tanya L; Stepp, Cara E

    2015-06-01

    The acoustic measure relative fundamental frequency (RFF) was investigated as a potential objective measure to track variations in vocal effort within and across individuals. Twelve speakers with healthy voices created purposeful modulations in their vocal effort during speech tasks. RFF and an aerodynamic measure of vocal effort, the ratio of sound pressure level to subglottal pressure level, were estimated from the aerodynamic and acoustic signals. Twelve listeners also judged the speech samples for vocal effort using the visual sort and rate method. Relationships between RFF and both the aerodynamic and perceptual measures of vocal effort were weak across speakers (R2 = .06-.26). Within speakers, relationships were variable but much stronger on average (R2 = .45-.56). RFF showed stronger relationships between both the aerodynamic and perceptual measures of vocal effort when examined within individuals versus across individuals. Future work is necessary to establish these relationships in individuals with voice disorders across the therapeutic process.

  12. Individual Monitoring of Vocal Effort With Relative Fundamental Frequency: Relationships With Aerodynamics and Listener Perception

    PubMed Central

    Michener, Carolyn M.; Eadie, Tanya L.; Stepp, Cara E.

    2015-01-01

    Purpose The acoustic measure relative fundamental frequency (RFF) was investigated as a potential objective measure to track variations in vocal effort within and across individuals. Method Twelve speakers with healthy voices created purposeful modulations in their vocal effort during speech tasks. RFF and an aerodynamic measure of vocal effort, the ratio of sound pressure level to subglottal pressure level, were estimated from the aerodynamic and acoustic signals. Twelve listeners also judged the speech samples for vocal effort using the visual sort and rate method. Results Relationships between RFF and both the aerodynamic and perceptual measures of vocal effort were weak across speakers (R2 = .06–.26). Within speakers, relationships were variable but much stronger on average (R2 = .45–.56). Conclusions RFF showed stronger relationships between both the aerodynamic and perceptual measures of vocal effort when examined within individuals versus across individuals. Future work is necessary to establish these relationships in individuals with voice disorders across the therapeutic process. PMID:25675090

  13. Rapid change in articulatory lip movement induced by preceding auditory feedback during production of bilabial plosives.

    PubMed

    Mochida, Takemi; Gomi, Hiroaki; Kashino, Makio

    2010-11-08

    There has been plentiful evidence of kinesthetically induced rapid compensation for unanticipated perturbation in speech articulatory movements. However, the role of auditory information in stabilizing articulation has been little studied except for the control of voice fundamental frequency, voice amplitude and vowel formant frequencies. Although the influence of auditory information on the articulatory control process is evident in unintended speech errors caused by delayed auditory feedback, the direct and immediate effect of auditory alteration on the movements of articulators has not been clarified. This work examined whether temporal changes in the auditory feedback of bilabial plosives immediately affect the subsequent lip movement. We conducted experiments with an auditory feedback alteration system that enabled us to replace or block speech sounds in real time. Participants were asked to produce the syllable /pa/ repeatedly at a constant rate. During the repetition, normal auditory feedback was interrupted, and one of three pre-recorded syllables /pa/, /Φa/, or /pi/, spoken by the same participant, was presented once at a different timing from the anticipated production onset, while no feedback was presented for subsequent repetitions. Comparisons of the labial distance trajectories under altered and normal feedback conditions indicated that the movement quickened during the short period immediately after the alteration onset, when /pa/ was presented 50 ms before the expected timing. Such change was not significant under the other feedback conditions we tested. The earlier articulation rapidly induced by the progressive auditory input suggests that a compensatory mechanism helps to maintain a constant speech rate by detecting errors between the internally predicted and actually provided auditory information associated with self movement. The timing- and context-dependent effects of feedback alteration suggest that the sensory error detection works in a temporally asymmetric window where acoustic features of the syllable to be produced may be coded.

  14. The impact of impaired semantic knowledge on spontaneous iconic gesture production

    PubMed Central

    Cocks, Naomi; Dipper, Lucy; Pritchard, Madeleine; Morgan, Gary

    2013-01-01

    Background Previous research has found that people with aphasia produce more spontaneous iconic gesture than control participants, especially during word-finding difficulties. There is some evidence that impaired semantic knowledge impacts on the diversity of gestural handshapes, as well as the frequency of gesture production. However, no previous research has explored how impaired semantic knowledge impacts on the frequency and type of iconic gestures produced during fluent speech compared with those produced during word-finding difficulties. Aims To explore the impact of impaired semantic knowledge on the frequency and type of iconic gestures produced during fluent speech and those produced during word-finding difficulties. Methods & Procedures A group of 29 participants with aphasia and 29 control participants were video recorded describing a cartoon they had just watched. All iconic gestures were tagged and coded as either “manner,” “path only,” “shape outline” or “other”. These gestures were then separated into either those occurring during fluent speech or those occurring during a word-finding difficulty. The relationships between semantic knowledge and gesture frequency and form were then investigated in the two different conditions. Outcomes & Results As expected, the participants with aphasia produced a higher frequency of iconic gestures than the control participants, but when the iconic gestures produced during word-finding difficulties were removed from the analysis, the frequency of iconic gesture was not significantly different between the groups. While there was not a significant relationship between the frequency of iconic gestures produced during fluent speech and semantic knowledge, there was a significant positive correlation between semantic knowledge and the proportion of word-finding difficulties that contained gesture. There was also a significant positive correlation between the speakers' semantic knowledge and the proportion of gestures that were produced during fluent speech that were classified as “manner”. Finally while not significant, there was a positive trend between semantic knowledge of objects and the production of “shape outline” gestures during word-finding difficulties for objects. Conclusions The results indicate that impaired semantic knowledge in aphasia impacts on both the iconic gestures produced during fluent speech and those produced during word-finding difficulties but in different ways. These results shed new light on the relationship between impaired language and iconic co-speech gesture production and also suggest that analysis of iconic gesture may be a useful addition to clinical assessment. PMID:24058228

  15. Sound frequency affects speech emotion perception: results from congenital amusia

    PubMed Central

    Lolli, Sydney L.; Lewenstein, Ari D.; Basurto, Julian; Winnik, Sean; Loui, Psyche

    2015-01-01

    Congenital amusics, or “tone-deaf” individuals, show difficulty in perceiving and producing small pitch differences. While amusia has marked effects on music perception, its impact on speech perception is less clear. Here we test the hypothesis that individual differences in pitch perception affect judgment of emotion in speech, by applying low-pass filters to spoken statements of emotional speech. A norming study was first conducted on Mechanical Turk to ensure that the intended emotions from the Macquarie Battery for Evaluation of Prosody were reliably identifiable by US English speakers. The most reliably identified emotional speech samples were used in Experiment 1, in which subjects performed a psychophysical pitch discrimination task, and an emotion identification task under low-pass and unfiltered speech conditions. Results showed a significant correlation between pitch-discrimination threshold and emotion identification accuracy for low-pass filtered speech, with amusics (defined here as those with a pitch discrimination threshold >16 Hz) performing worse than controls. This relationship with pitch discrimination was not seen in unfiltered speech conditions. Given the dissociation between low-pass filtered and unfiltered speech conditions, we inferred that amusics may be compensating for poorer pitch perception by using speech cues that are filtered out in this manipulation. To assess this potential compensation, Experiment 2 was conducted using high-pass filtered speech samples intended to isolate non-pitch cues. No significant correlation was found between pitch discrimination and emotion identification accuracy for high-pass filtered speech. Results from these experiments suggest an influence of low frequency information in identifying emotional content of speech. PMID:26441718

  16. Collaborative Signaling of Informational Structures by Dynamic Speech Rate.

    ERIC Educational Resources Information Center

    Koiso, Hanae; Shimojima, Atsushi; Katagiri, Yasuhiro

    1998-01-01

    Investigated the functions of dynamic speech rates as contextualization cues in conversational Japanese, examining five spontaneous task-oriented dialogs and analyzing the potential of speech-rate changes in signaling the structure of the information being exchanged. Results found a correlation between speech decelerations and the openings of new…

  17. The benefit of combining a deep neural network architecture with ideal ratio mask estimation in computational speech segregation to improve speech intelligibility.

    PubMed

    Bentsen, Thomas; May, Tobias; Kressner, Abigail A; Dau, Torsten

    2018-01-01

    Computational speech segregation attempts to automatically separate speech from noise. This is challenging in conditions with interfering talkers and low signal-to-noise ratios. Recent approaches have adopted deep neural networks and successfully demonstrated speech intelligibility improvements. A selection of components may be responsible for the success with these state-of-the-art approaches: the system architecture, a time frame concatenation technique and the learning objective. The aim of this study was to explore the roles and the relative contributions of these components by measuring speech intelligibility in normal-hearing listeners. A substantial improvement of 25.4 percentage points in speech intelligibility scores was found going from a subband-based architecture, in which a Gaussian Mixture Model-based classifier predicts the distributions of speech and noise for each frequency channel, to a state-of-the-art deep neural network-based architecture. Another improvement of 13.9 percentage points was obtained by changing the learning objective from the ideal binary mask, in which individual time-frequency units are labeled as either speech- or noise-dominated, to the ideal ratio mask, where the units are assigned a continuous value between zero and one. Therefore, both components play significant roles and by combining them, speech intelligibility improvements were obtained in a six-talker condition at a low signal-to-noise ratio.
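
    The two learning objectives contrasted above can be stated compactly. A sketch (Python) under assumed access to the separated speech and noise STFT magnitudes; in the study itself, a deep neural network is trained to predict the mask from noisy features rather than computing it directly:

        import numpy as np

        def ideal_masks(S, N, beta=0.5, lc_db=0.0):
            """Given speech and noise STFT magnitudes S and N, return the
            ideal binary mask (0/1 per TF unit) and the ideal ratio mask
            (continuous 0..1); beta = 0.5 is a common IRM exponent."""
            s2, n2 = S ** 2, N ** 2
            snr_db = 10 * np.log10((s2 + 1e-12) / (n2 + 1e-12))
            ibm = (snr_db > lc_db).astype(float)
            irm = (s2 / (s2 + n2 + 1e-12)) ** beta
            return ibm, irm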

  18. Neural correlates of behavioral amplitude modulation sensitivity in the budgerigar midbrain

    PubMed Central

    Neilans, Erikson G.; Abrams, Kristina S.; Idrobo, Fabio; Carney, Laurel H.

    2016-01-01

    Amplitude modulation (AM) is a crucial feature of many communication signals, including speech. Whereas average discharge rates in the auditory midbrain correlate with behavioral AM sensitivity in rabbits, the neural bases of AM sensitivity in species with human-like behavioral acuity are unexplored. Here, we used parallel behavioral and neurophysiological experiments to explore the neural (midbrain) bases of AM perception in an avian speech mimic, the budgerigar (Melopsittacus undulatus). Behavioral AM sensitivity was quantified using operant conditioning procedures. Neural AM sensitivity was studied using chronically implanted microelectrodes in awake, unrestrained birds. Average discharge rates of multiunit recording sites in the budgerigar midbrain were insufficient to explain behavioral sensitivity to modulation frequencies <100 Hz for both tone- and noise-carrier stimuli, even with optimal pooling of information across recording sites. Neural envelope synchrony, in contrast, could explain behavioral performance for both carrier types across the full range of modulation frequencies studied (16–512 Hz). The results suggest that envelope synchrony in the budgerigar midbrain may underlie behavioral sensitivity to AM. Behavioral AM sensitivity based on synchrony in the budgerigar, which contrasts with rate-correlated behavioral performance in rabbits, raises the possibility that envelope synchrony, rather than average discharge rate, might also underlie AM perception in other species with sensitive AM detection abilities, including humans. These results highlight the importance of synchrony coding of envelope structure in the inferior colliculus. Furthermore, they underscore potential benefits of devices (e.g., midbrain implants) that evoke robust neural synchrony. PMID:26843608

  19. Effects of Removing Low-Frequency Electric Information on Speech Perception with Bimodal Hearing

    ERIC Educational Resources Information Center

    Fowler, Jennifer R.; Eggleston, Jessica L.; Reavis, Kelly M.; McMillan, Garnett P.; Reiss, Lina A. J.

    2016-01-01

    Purpose: The objective was to determine whether speech perception could be improved for bimodal listeners (those using a cochlear implant [CI] in one ear and hearing aid in the contralateral ear) by removing low-frequency information provided by the CI, thereby reducing acoustic-electric overlap. Method: Subjects were adult CI subjects with at…

  20. An experimental version of the MZT (speech-from-text) system with external F0 control

    NASA Astrophysics Data System (ADS)

    Nowak, Ignacy

    1994-12-01

    This article describes an experimental version of the MZT, a Polish speech-from-text system. The new version has additional functions which make it possible to enter commands in edited orthographic text to control the phrase component and accentuation parameters. This makes it possible to generate a series of modified intonation contours in the texts spoken by the system. The effects obtained are made easier to control by a graphic illustration of the fundamental frequency (F0) pattern in the phrases last 'spoken' by the system. This version of the system was designed as a test prototype which will help us expand and refine our set of rules for automatic generation of intonation contours, which in turn will enable the fully automated speech-from-text system to generate speech with a more varied and precisely formed fundamental frequency pattern.

  1. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users.

    PubMed

    Jaekel, Brittany N; Newman, Rochelle S; Goupell, Matthew J

    2017-05-24

    Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown if adults who use auditory prostheses called cochlear implants (CI) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate information could explain some of the variability in this population's speech perception outcomes. Phonemes with manipulated voice-onset-time (VOT) durations were embedded in sentences with different speech rates. Twenty-three CI and 29 NH participants performed a phoneme identification task. NH participants heard the same unprocessed stimuli as the CI participants or stimuli degraded by a sine vocoder, simulating aspects of CI processing. CI participants showed larger rate normalization effects (6.6 ms) than the NH participants (3.7 ms) and had shallower (less reliable) category boundary slopes. NH participants showed similarly shallow slopes when presented acoustically degraded vocoded signals, but an equal or smaller rate effect in response to reductions in available spectral and temporal information. CI participants can rate normalize, despite their degraded speech input, and show a larger rate effect compared to NH participants. CI participants may particularly rely on rate normalization to better maintain perceptual constancy of the speech signal.
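
    A sketch (Python) of how a phonemic category boundary and slope are typically estimated from identification data of this kind: fit a logistic function to the proportion of long-VOT responses at each VOT step, separately per rate condition, and compare the fitted boundaries. The response proportions below are hypothetical.

        import numpy as np
        from scipy.optimize import curve_fit

        def logistic(vot, boundary, slope):
            return 1.0 / (1.0 + np.exp(-slope * (vot - boundary)))

        vot_ms = np.array([10, 20, 30, 40, 50, 60, 70], dtype=float)
        p_long = np.array([0.02, 0.05, 0.20, 0.55, 0.85, 0.95, 0.99])
        (boundary, slope), _ = curve_fit(logistic, vot_ms, p_long, p0=[40.0, 0.2])
        print(f"boundary = {boundary:.1f} ms, slope = {slope:.2f}")
        # A rate-normalization effect appears as a boundary shift between
        # rate conditions; a shallower fitted slope indicates a less
        # reliable category boundary, as reported for the CI group.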

  2. 76 FR 44326 - Telecommunications Relay Services and Speech-to-Speech Services for Individuals With Hearing and...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-07-25

    ... Relay Services and Speech-to-Speech Services for Individuals With Hearing and Speech Disabilities; Structure and Practices of the Video Relay Service Program AGENCY: Federal Communications Commission. ACTION...-minute video relay service (``VRS'') compensation rates, and adopts per-minute compensation rates for the...

  3. Automated Speech Rate Measurement in Dysarthria

    ERIC Educational Resources Information Center

    Martens, Heidi; Dekens, Tomas; Van Nuffelen, Gwen; Latacz, Lukas; Verhelst, Werner; De Bodt, Marc

    2015-01-01

    Purpose: In this study, a new algorithm for automated determination of speech rate (SR) in dysarthric speech is evaluated. We investigated how reliably the algorithm calculates the SR of dysarthric speech samples when compared with calculation performed by speech-language pathologists. Method: The new algorithm was trained and tested using Dutch…

  4. Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences

    PubMed Central

    Brand, Thomas

    2018-01-01

    In studies investigating binaural processing in human listeners, relatively long and task-dependent time constants of a binaural window ranging from 10 ms to 250 ms have been observed. Such time constants are often thought to reflect “binaural sluggishness.” In this study, the effect of binaural sluggishness on binaural unmasking of speech in stationary speech-shaped noise is investigated in 10 listeners with normal hearing. In order to design a masking signal with temporally varying binaural cues, the interaural phase difference of the noise was modulated sinusoidally with frequencies ranging from 0.25 Hz to 64 Hz. The lowest, that is the best, speech reception thresholds (SRTs) were observed for the lowest modulation frequency. SRTs increased with increasing modulation frequency up to 4 Hz. For higher modulation frequencies, SRTs remained constant in the range of 1 dB to 1.5 dB below the SRT determined in the diotic situation. The outcome of the experiment was simulated using a short-term binaural speech intelligibility model, which combines an equalization–cancellation (EC) model with the speech intelligibility index. This model segments the incoming signal into 23.2-ms time frames in order to predict release from masking in modulated noises. In order to predict the results from this study, the model required a further time constant applied to the EC mechanism representing binaural sluggishness. The best agreement with perceptual data was achieved using a temporal window of 200 ms in the EC mechanism. PMID:29338577

  5. Modeling Sluggishness in Binaural Unmasking of Speech for Maskers With Time-Varying Interaural Phase Differences.

    PubMed

    Hauth, Christopher F; Brand, Thomas

    2018-01-01

    In studies investigating binaural processing in human listeners, relatively long and task-dependent time constants of a binaural window ranging from 10 ms to 250 ms have been observed. Such time constants are often thought to reflect "binaural sluggishness." In this study, the effect of binaural sluggishness on binaural unmasking of speech in stationary speech-shaped noise is investigated in 10 listeners with normal hearing. In order to design a masking signal with temporally varying binaural cues, the interaural phase difference of the noise was modulated sinusoidally with frequencies ranging from 0.25 Hz to 64 Hz. The lowest, that is the best, speech reception thresholds (SRTs) were observed for the lowest modulation frequency. SRTs increased with increasing modulation frequency up to 4 Hz. For higher modulation frequencies, SRTs remained constant in the range of 1 dB to 1.5 dB below the SRT determined in the diotic situation. The outcome of the experiment was simulated using a short-term binaural speech intelligibility model, which combines an equalization-cancellation (EC) model with the speech intelligibility index. This model segments the incoming signal into 23.2-ms time frames in order to predict release from masking in modulated noises. In order to predict the results from this study, the model required a further time constant applied to the EC mechanism representing binaural sluggishness. The best agreement with perceptual data was achieved using a temporal window of 200 ms in the EC mechanism.
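
    One way to impose a sinusoidally modulated interaural phase difference on a noise masker is to phase-rotate its analytic signal; a hedged sketch (Python), noting that the study's actual stimulus generation may have differed:

        import numpy as np
        from scipy.signal import hilbert

        fs = 44100
        fm = 4.0                 # IPD modulation frequency (0.25-64 Hz in the study)
        t = np.arange(0, 3.0, 1.0 / fs)
        noise = np.random.randn(t.size)      # stand-in for speech-shaped noise
        analytic = hilbert(noise)            # complex analytic signal
        ipd = np.pi * np.sin(2 * np.pi * fm * t)   # instantaneous IPD in radians
        left = noise
        right = np.real(analytic * np.exp(1j * ipd))
        # Rotating the analytic signal applies the same phase offset to every
        # frequency component, so the interaural phase difference sweeps
        # sinusoidally between diotic (0) and antiphasic (pi) configurations.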

  6. The Levels of Speech Usage rating scale: comparison of client self-ratings with speech pathologist ratings.

    PubMed

    Gray, Christina; Baylor, Carolyn; Eadie, Tanya; Kendall, Diane; Yorkston, Kathryn

    2012-01-01

    The term 'speech usage' refers to what people want or need to do with their speech to fulfil the communication demands in their life roles. Speech-language pathologists (SLPs) need to know about clients' speech usage to plan appropriate interventions to meet their life participation goals. The Levels of Speech Usage is a categorical scale intended for client self-report of speech usage, but SLPs may want the option to use it as a proxy-report tool. The relationship between self-report and clinician ratings should be examined before the instrument is used in a proxy format. The primary purpose of this study was to compare client self-ratings with SLP ratings on the Levels of Speech Usage scale. The secondary purpose was to determine if the SLP ratings differed depending on whether or not the SLPs knew about the clients' medical condition. Self-ratings of adults with communication disorders on the Levels of Speech Usage scale were available from prior research. Vignettes about these individuals were created from existing data. Two sets of vignettes were created. One set contained information about demographic information, living situation, occupational status and hobbies or social activities. The second set was identical to the first with the addition of information about the clients' medical conditions and communication disorders. Various communication disorders were represented including dysarthria, voice disorders, laryngectomy, and mild cognitive and language disorders. Sixty SLPs were randomly divided into two groups with each group rating one set of vignettes. The task was completed online. While this does not replicate typical in-person clinical interactions, it was a feasible method for this study. For data analysis, the client self-ratings were considered fixed points and the percentage of SLP ratings in agreement with the self-ratings was calculated. The percentage of SLP ratings in exact agreement with client self-ratings was 44.9%. Agreement was lowest for the less-demanding speech usage categories and highest for the most demanding usage category. There was no significant difference between the two groups of SLPs based on knowledge of medical condition. SLPs often need to document the speech usage levels of clients. This study suggests the potential for SLPs to misjudge how clients see their own speech demands. Further research is needed to determine if similar results would be found in actual clinical interactions. Until then, SLPs should seek the input of their clients when using this instrument. © 2012 Royal College of Speech and Language Therapists.

  7. Long-term temporal tracking of speech rate affects spoken-word recognition.

    PubMed

    Baese-Berk, Melissa M; Heffner, Christopher C; Dilley, Laura C; Pitt, Mark A; Morrill, Tuuli H; McAuley, J Devin

    2014-08-01

    Humans unconsciously track a wide array of distributional characteristics in their sensory environment. Recent research in spoken-language processing has demonstrated that the speech rate surrounding a target region within an utterance influences which words, and how many words, listeners hear later in that utterance. On the basis of hypotheses that listeners track timing information in speech over long timescales, we investigated the possibility that the perception of words is sensitive to speech rate over such a timescale (e.g., an extended conversation). Results demonstrated that listeners tracked variation in the overall pace of speech over an extended duration (analogous to that of a conversation that listeners might have outside the lab) and that this global speech rate influenced which words listeners reported hearing. The effects of speech rate became stronger over time. Our findings are consistent with the hypothesis that neural entrainment by speech occurs on multiple timescales, some lasting more than an hour. © The Author(s) 2014.

  8. [Acoustic voice analysis using the Praat program: comparative study with the Dr. Speech program].

    PubMed

    Núñez Batalla, Faustino; González Márquez, Rocío; Peláez González, M Belén; González Laborda, Irene; Fernández Fernández, María; Morato Galán, Marta

    2014-01-01

    The European Laryngological Society (ELS) basic protocol for functional assessment of voice pathology includes 5 different approaches: perception, videostroboscopy, acoustics, aerodynamics and subjective rating by the patient. In this study we focused on acoustic voice analysis. The purpose of the present study was to correlate the results obtained by the commercial software Dr. Speech and the free software Praat in 2 fields: 1. Narrow-band spectrogram (the presence of noise according to Yanagihara, and the presence of subharmonics) (semi-quantitative). 2. Voice acoustic parameters (jitter, shimmer, harmonics-to-noise ratio, fundamental frequency) (quantitative). We studied a total of 99 voice samples from individuals with Reinke's oedema diagnosed using videostroboscopy. One independent observer used Dr. Speech 3.0 and a second one used the Praat program (Phonetic Sciences, University of Amsterdam). The spectrographic analysis consisted of obtaining a narrow-band spectrogram from the previous digitalised voice samples by the 2 independent observers. They then determined the presence of noise in the spectrogram, using the Yanagihara grades, as well as the presence of subharmonics. As a final result, the acoustic parameters of jitter, shimmer, harmonics-to-noise ratio and fundamental frequency were obtained from the 2 acoustic analysis programs. The results indicated that the sound spectrogram and the numerical values obtained for shimmer and jitter were similar for both computer programs, even though types 1, 2 and 3 voice samples were analysed. The Praat and Dr. Speech programs provide similar results in the acoustic analysis of pathological voices. Copyright © 2013 Elsevier España, S.L. All rights reserved.
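
    The quantitative parameters compared above (jitter, shimmer, harmonics-to-noise ratio, fundamental frequency) can be computed from Praat via the praat-parselmouth Python wrapper. The sketch below uses standard Praat commands with their commonly used default parameter values; the file name is hypothetical.

        import parselmouth
        from parselmouth.praat import call

        snd = parselmouth.Sound("voice_sample.wav")   # hypothetical recording

        pitch = snd.to_pitch()
        f0_mean = call(pitch, "Get mean", 0, 0, "Hertz")

        # Glottal pulse marks, from which period perturbations are measured
        pulses = call(snd, "To PointProcess (periodic, cc)", 75, 600)
        jitter = call(pulses, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)
        shimmer = call([snd, pulses], "Get shimmer (local)",
                       0, 0, 0.0001, 0.02, 1.3, 1.6)

        harmonicity = call(snd, "To Harmonicity (cc)", 0.01, 75, 0.1, 1.0)
        hnr = call(harmonicity, "Get mean", 0, 0)

        print(f"F0 = {f0_mean:.1f} Hz, jitter = {jitter:.4%}, "
              f"shimmer = {shimmer:.4%}, HNR = {hnr:.1f} dB")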

  9. Near-Term Fetuses Process Temporal Features of Speech

    ERIC Educational Resources Information Center

    Granier-Deferre, Carolyn; Ribeiro, Aurelie; Jacquet, Anne-Yvonne; Bassereau, Sophie

    2011-01-01

    The perception of speech and music requires processing of variations in spectra and amplitude over different time intervals. Near-term fetuses can discriminate acoustic features, such as frequencies and spectra, but whether they can process complex auditory streams, such as speech sequences and more specifically their temporal variations, fast or…

  10. Acoustic analysis of trill sounds.

    PubMed

    Dhananjaya, N; Yegnanarayana, B; Bhaskararao, Peri

    2012-04-01

    In this paper, the acoustic-phonetic characteristics of steady apical trills--trill sounds produced by the periodic vibration of the apex of the tongue--are studied. Signal processing methods, namely, zero-frequency filtering and zero-time liftering of speech signals, are used to analyze the excitation source and the resonance characteristics of the vocal tract system, respectively. Although it is natural to expect the effect of trilling on the resonances of the vocal tract system, it is interesting to note that trilling influences the glottal source of excitation as well. The excitation characteristics derived using zero-frequency filtering of speech signals are glottal epochs, strength of impulses at the glottal epochs, and instantaneous fundamental frequency of the glottal vibration. Analysis based on zero-time liftering of speech signals is used to study the dynamic resonance characteristics of the vocal tract system during the production of trill sounds. Qualitative analysis of trill sounds in different vowel contexts, and acoustic cues that may help in spotting trills in continuous speech, are discussed.
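
    Zero-frequency filtering, as used above for excitation analysis, passes the differenced signal through a cascade of two resonators centred at 0 Hz and then removes the resulting polynomial trend; glottal epochs then appear as positive-going zero crossings. A compact sketch under assumed settings (the trend-removal window should be close to the speaker's average pitch period):

        import numpy as np
        from scipy.signal import lfilter

        def zero_frequency_filter(s, fs, win_ms=10.0):
            x = np.diff(s, prepend=s[0])          # remove slowly varying bias
            # Two ideal zero-frequency resonators: y[n] = x[n] + 2y[n-1] - y[n-2]
            y = lfilter([1.0], [1.0, -2.0, 1.0], x)
            y = lfilter([1.0], [1.0, -2.0, 1.0], y)
            # Repeated local-mean subtraction removes the resonators'
            # polynomial growth, leaving fluctuations at the pitch rate.
            w = max(3, int(fs * win_ms / 1000) | 1)
            kernel = np.ones(w) / w
            for _ in range(3):
                y = y - np.convolve(y, kernel, mode="same")
            return y

        def glottal_epochs(zff):
            # Positive-going zero crossings mark the epochs; the local slope
            # indexes the strength of excitation at each epoch.
            return np.where((zff[:-1] < 0) & (zff[1:] >= 0))[0]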

  11. An examination of relations between participation, communication and age in children with complex communication needs.

    PubMed

    Clarke, Michael; Newton, Caroline; Petrides, Konstantinos; Griffiths, Tom; Lysley, Andrew; Price, Katie

    2012-03-01

    The aim of this study was to examine variation in the frequency of children's participation in out-of-school activities as a function of speech intelligibility, perceived effectiveness of the child's communication aid, and age. Sixty-nine caregivers of children with complex communication needs provided with communication aids completed a questionnaire survey. Rate of participation was higher for younger than for older children, particularly in recreational activities. Younger children with partial intelligibility participated more frequently in recreational and social activities than both younger children without speech and older children. Results and limitations are discussed within the context of participation research in childhood disability, highlighting the impact of communicative resources and maturation on everyday participation.

  12. Acoustic analysis of speech under stress.

    PubMed

    Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish

    2015-01-01

    When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified, non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both neutral and stressed states. Results suggest that F0 increases with stress; however, formant frequency decreases with stress. Comparison of Fourier and chirp spectra of a short vowel segment shows that for relaxed speech, the two spectra are similar; however, for stressed speech, they differ in the high frequency range due to increased pitch modulation.
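
    Mean F0 and formant tracks of the kind analysed above can be extracted with Praat through the parselmouth wrapper. This sketch probes a single time point of a hypothetical clip, whereas the study averaged over neutral and stressed segments.

        import parselmouth

        snd = parselmouth.Sound("utterance.wav")     # hypothetical clip
        pitch = snd.to_pitch()
        formants = snd.to_formant_burg()             # Burg LPC formant tracker

        t_mid = snd.duration / 2                     # single probe time
        f0 = pitch.get_value_at_time(t_mid)
        f1, f2, f3, f4 = (formants.get_value_at_time(i, t_mid)
                          for i in range(1, 5))
        print(f"F0={f0:.0f} Hz, F1={f1:.0f}, F2={f2:.0f}, "
              f"F3={f3:.0f}, F4={f4:.0f}")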

  13. Rate and rhythm control strategies for apraxia of speech in nonfluent primary progressive aphasia.

    PubMed

    Beber, Bárbara Costa; Berbert, Monalise Costa Batista; Grawer, Ruth Siqueira; Cardoso, Maria Cristina de Almeida Freitas

    2018-01-01

    The nonfluent/agrammatic variant of primary progressive aphasia is characterized by apraxia of speech and agrammatism. Apraxia of speech limits patients' communication due to slow speaking rate, sound substitutions, articulatory groping, false starts and restarts, segmentation of syllables, and increased difficulty with increasing utterance length. Speech and language therapy is known to benefit individuals with apraxia of speech due to stroke, but little is known about its effects in primary progressive aphasia. This is a case report of a 72-year-old, illiterate housewife, who was diagnosed with nonfluent primary progressive aphasia and received speech and language therapy for apraxia of speech. Rate and rhythm control strategies for apraxia of speech were trained to improve initiation of speech. We discuss the importance of these strategies to alleviate apraxia of speech in this condition and the future perspectives in the area.

  14. Compressed Speech Technology: Implications for Learning and Instruction.

    ERIC Educational Resources Information Center

    Sullivan, LeRoy L.

    This paper first traces the historical development of speech compression technology, which has made it possible to alter the spoken rate of a pre-recorded message without excessive distortion. Terms used to describe techniques employed as the technology evolved are discussed, including rapid speech, rate altered speech, cut-and-spliced speech, and…

  15. Multilingual vocal emotion recognition and classification using back propagation neural network

    NASA Astrophysics Data System (ADS)

    Kayal, Apoorva J.; Nirmal, Jagannath

    2016-03-01

    This work implements classification of different emotions in different languages using Artificial Neural Networks (ANN). Mel Frequency Cepstral Coefficients (MFCC) and Short Term Energy (STE) have been considered for creation of the feature set. An emotional speech corpus consisting of 30 acted utterances per emotion has been developed. The emotions portrayed in this work are Anger, Joy and Neutral in each of the English, Marathi and Hindi languages. Different configurations of Artificial Neural Networks have been employed for classification purposes. The performance of the classifiers has been evaluated by False Negative Rate (FNR), False Positive Rate (FPR), True Positive Rate (TPR) and True Negative Rate (TNR).
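
    A sketch of the feature pipeline described above (13 MFCCs plus short-term energy, summarized per utterance) feeding a small backpropagation network. Library choices, frame sizes, and layer sizes are assumptions, not the paper's exact configuration.

        import numpy as np
        import librosa
        from sklearn.neural_network import MLPClassifier   # backprop MLP

        def utterance_features(path):
            y, sr = librosa.load(path, sr=16000)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
            frames = librosa.util.frame(y, frame_length=400, hop_length=160)
            ste = np.log(np.sum(frames ** 2, axis=0) + 1e-10)  # short-term energy
            # Collapse variable-length tracks to one fixed-size vector
            return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                                   [ste.mean(), ste.std()]])

        # X: one feature row per utterance; labels: "anger", "joy", "neutral"
        # clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)
        # clf.fit(X, labels); confusion counts then yield TPR/FPR/TNR/FNR.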

  16. Neural Spike-Train Analyses of the Speech-Based Envelope Power Spectrum Model

    PubMed Central

    Rallapalli, Varsha H.

    2016-01-01

    Diagnosing and treating hearing impairment is challenging because people with similar degrees of sensorineural hearing loss (SNHL) often have different speech-recognition abilities. The speech-based envelope power spectrum model (sEPSM) has demonstrated that the signal-to-noise ratio (SNRENV) from a modulation filter bank provides a robust speech-intelligibility measure across a wider range of degraded conditions than many long-standing models. In the sEPSM, noise (N) is assumed to: (a) reduce S + N envelope power by filling in dips within clean speech (S) and (b) introduce an envelope noise floor from intrinsic fluctuations in the noise itself. While the promise of SNRENV has been demonstrated for normal-hearing listeners, it has not been thoroughly extended to hearing-impaired listeners because of limited physiological knowledge of how SNHL affects speech-in-noise envelope coding relative to noise alone. Here, envelope coding to speech-in-noise stimuli was quantified from auditory-nerve model spike trains using shuffled correlograms, which were analyzed in the modulation-frequency domain to compute modulation-band estimates of neural SNRENV. Preliminary spike-train analyses show strong similarities to the sEPSM, demonstrating feasibility of neural SNRENV computations. Results suggest that individual differences can occur based on differential degrees of outer- and inner-hair-cell dysfunction in listeners currently diagnosed into the single audiological SNHL category. The predicted acoustic-SNR dependence in individual differences suggests that the SNR-dependent rate of susceptibility could be an important metric in diagnosing individual differences. Future measurements of the neural SNRENV in animal studies with various forms of SNHL will provide valuable insight for understanding individual differences in speech-in-noise intelligibility.
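
    The core SNRenv idea can be illustrated acoustically: compare the envelope power of speech-plus-noise against noise alone within each modulation band. The sketch below is a single-channel simplification (no peripheral filterbank, assumed band edges), not the sEPSM itself and certainly not the spike-train version described above.

        import numpy as np
        from scipy.signal import hilbert, butter, sosfiltfilt, resample_poly

        def band_env_power(x, fs, lo, hi, fs_env=1000):
            env = np.abs(hilbert(x))                    # Hilbert envelope
            down = max(1, int(fs // fs_env))
            env = resample_poly(env, 1, down)           # decimate the envelope
            fs2 = fs / down
            sos = butter(2, [lo, hi], btype="bandpass", fs=fs2, output="sos")
            band = sosfiltfilt(sos, env - env.mean())
            return np.mean(band ** 2) / env.mean() ** 2   # AC power / DC power

        def snr_env(mix, noise, fs,
                    bands=((1, 2), (2, 4), (4, 8), (8, 16), (16, 32))):
            vals = []
            for lo, hi in bands:
                p_mix = band_env_power(mix, fs, lo, hi)
                p_n = band_env_power(noise, fs, lo, hi)
                vals.append(max(p_mix - p_n, 0.0) / max(p_n, 1e-12))
            return np.sqrt(vals)        # per-band envelope SNR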

  17. Segmental and Suprasegmental Perception in Children Using Hearing Aids.

    PubMed

    Wenrich, Kaitlyn A; Davidson, Lisa S; Uchanski, Rosalie M

    Suprasegmental perception (perception of stress, intonation, "how something is said" and "who says it") and segmental speech perception (perception of individual phonemes or perception of "what is said") are perceptual abilities that provide the foundation for the development of spoken language and effective communication. While there are numerous studies examining segmental perception in children with hearing aids (HAs), there are far fewer studies examining suprasegmental perception, especially for children with greater degrees of residual hearing. Examining the relation between acoustic hearing thresholds, and both segmental and suprasegmental perception for children with HAs, may ultimately enable better device recommendations (bilateral HAs, bimodal devices [one CI and one HA in opposite ears], bilateral CIs) for a particular degree of residual hearing. Examining both types of speech perception is important because segmental and suprasegmental cues are affected differentially by the type of hearing device(s) used (i.e., cochlear implant [CI] and/or HA). Additionally, suprathreshold measures, such as frequency resolution ability, may partially predict benefit from amplification and may assist audiologists in making hearing device recommendations. The purpose of this study is to explore the relationship between audibility (via hearing thresholds and speech intelligibility indices), and segmental and suprasegmental speech perception for children with HAs. A secondary goal is to explore the relationships among frequency resolution ability (via spectral modulation detection [SMD] measures), segmental and suprasegmental speech perception, and receptive language in these same children. A prospective cross-sectional design. Twenty-three children, ages 4 yr 11 mo to 11 yr 11 mo, participated in the study. Participants were recruited from pediatric clinic populations, oral schools for the deaf, and mainstream schools. Audiological history and hearing device information were collected from participants and their families. Segmental and suprasegmental speech perception, SMD, and receptive vocabulary skills were assessed. Correlations were calculated to examine the significance (p < 0.05) of relations between audibility and outcome measures. Measures of audibility and segmental speech perception are not significantly correlated, while low-frequency pure-tone average (unaided) is significantly correlated with suprasegmental speech perception. SMD is significantly correlated with all measures (measures of audibility, segmental and suprasegmental perception and vocabulary). Lastly, although age is not significantly correlated with measures of audibility, it is significantly correlated with all other outcome measures. The absence of a significant correlation between audibility and segmental speech perception might be attributed to overall audibility being maximized through well-fit HAs. The significant correlation between low-frequency unaided audibility and suprasegmental measures is likely due to the strong, predominantly low-frequency nature of suprasegmental acoustic properties. Frequency resolution ability, via SMD performance, is significantly correlated with all outcomes and requires further investigation; its significant correlation with vocabulary suggests that linguistic ability may be partially related to frequency resolution ability. Last, all of the outcome measures are significantly correlated with age, suggestive of developmental effects. American Academy of Audiology

  18. Syntactic and semantic errors in radiology reports associated with speech recognition software.

    PubMed

    Ringler, Michael D; Goss, Brian C; Bartholmai, Brian J

    2017-03-01

    Speech recognition software can increase the frequency of errors in radiology reports, which may affect patient care. We retrieved 213,977 speech recognition software-generated reports from 147 different radiologists and proofread them for errors. Errors were classified as "material" if they were believed to alter interpretation of the report. "Immaterial" errors were subclassified as intrusion/omission or spelling errors. The proportion of errors and error type were compared among individual radiologists, imaging subspecialty, and time periods. In all, 20,759 reports (9.7%) contained errors, of which 3992 (1.9%) were material errors. Among immaterial errors, spelling errors were more common than intrusion/omission errors (p < .001). Proportion of errors and fraction of material errors varied significantly among radiologists and between imaging subspecialties (p < .001). Errors were more common in cross-sectional reports, reports reinterpreting results of outside examinations, and procedural studies (all p < .001). Error rate decreased over time (p < .001), which suggests that a quality control program with regular feedback may reduce errors.

  19. The SpeechEasy device in stuttering and nonstuttering adults: fluency effects while speaking and reading.

    PubMed

    Foundas, Anne L; Mock, Jeffrey R; Corey, David M; Golob, Edward J; Conture, Edward G

    2013-08-01

    The SpeechEasy is an electronic device designed to alleviate stuttering by manipulating auditory feedback via time delays and frequency shifts. Device settings (control, default, custom), ear-placement (left, right), speaking task, and cognitive variables were examined in people who stutter (PWS) (n=14) compared to controls (n=10). Among the PWS there was a significantly greater reduction in stuttering (compared to baseline) with custom device settings compared to the non-altered feedback (control) condition. Stuttering was reduced the most during reading, followed by narrative and conversation. For the conversation task, stuttering was reduced more when the device was worn in the left ear. Those individuals with a more severe stuttering rate at baseline had a greater benefit from the use of the device compared to individuals with less severe stuttering. Our results support the view that overt stuttering is associated with defective speech-language monitoring that can be influenced by manipulating auditory feedback. Copyright © 2013 Elsevier Inc. All rights reserved.
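
    Offline, the two feedback manipulations the device applies can be approximated in a few lines. The delay and shift sizes below are assumptions (altered-feedback devices typically use delays of tens of milliseconds), and librosa's pitch shifting stands in here for the device's frequency shifting, which is not identical.

        import numpy as np
        import librosa

        y, sr = librosa.load("speech.wav", sr=None)   # hypothetical recording

        delay_ms = 60                                  # assumed delay
        pad = np.zeros(int(sr * delay_ms / 1000))
        delayed = np.concatenate([pad, y])             # delayed auditory feedback

        # Frequency-altered feedback, approximated as a +4 semitone pitch
        # shift (a true spectral shift would move all components by a
        # fixed number of Hz instead).
        altered = librosa.effects.pitch_shift(y=delayed, sr=sr, n_steps=4)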

  20. Multi-time resolution analysis of speech: evidence from psychophysics

    PubMed Central

    Chait, Maria; Greenberg, Steven; Arai, Takayuki; Simon, Jonathan Z.; Poeppel, David

    2015-01-01

    How speech signals are analyzed and represented remains a foundational challenge both for cognitive science and neuroscience. A growing body of research, employing various behavioral and neurobiological experimental techniques, now points to the perceptual relevance of both phoneme-sized (10–40 Hz modulation frequency) and syllable-sized (2–10 Hz modulation frequency) units in speech processing. However, it is not clear how information associated with such different time scales interacts in a manner relevant for speech perception. We report behavioral experiments on speech intelligibility employing a stimulus that allows us to investigate how distinct temporal modulations in speech are treated separately and whether they are combined. We created sentences in which the slow (~4 Hz; S_low) and rapid (~33 Hz; S_high) modulations—corresponding to ~250 and ~30 ms, the average duration of syllables and certain phonetic properties, respectively—were selectively extracted. Although S_low and S_high have low intelligibility when presented separately, dichotic presentation of S_high with S_low results in supra-additive performance, suggesting a synergistic relationship between low- and high-modulation frequencies. A second experiment desynchronized presentation of the S_low and S_high signals. Desynchronizing signals relative to one another had no impact on intelligibility when delays were less than ~45 ms. Longer delays resulted in a steep intelligibility decline, providing further evidence of integration or binding of information within restricted temporal windows. Our data suggest that human speech perception uses multi-time resolution processing. Signals are concurrently analyzed on at least two separate time scales, the intermediate representations of these analyses are integrated, and the resulting bound percept has significant consequences for speech intelligibility—a view compatible with recent insights from neuroscience implicating multi-timescale auditory processing. PMID:26136650

  1. Converted and upgraded maps programmed in the newer speech processor for the first generation of multichannel cochlear implant.

    PubMed

    Magalhães, Ana Tereza de Matos; Goffi-Gomez, M Valéria Schmidt; Hoshino, Ana Cristina; Tsuji, Robinson Koji; Bento, Ricardo Ferreira; Brito, Rubens

    2013-09-01

    To identify the technological contributions of the newer version of speech processor to the first generation of multichannel cochlear implant and the satisfaction of users of the new technology. Among the new features available, we focused on the effect of the frequency allocation table, the T-SPL and C-SPL, and the preprocessing gain adjustments (adaptive dynamic range optimization). Prospective exploratory study. Cochlear implant center at hospital. Cochlear implant users of the Spectra processor with speech recognition in closed set. Seventeen patients between the ages of 15 and 82, all implanted for more than 8 years, were selected. The technology update of the speech processor for the Nucleus 22. To determine Freedom's contribution, thresholds and speech perception tests were performed with the last map used with the Spectra and the maps created for Freedom. To identify the effect of the frequency allocation table, both upgraded and converted maps were programmed. One map was programmed with 25 dB T-SPL and 65 dB C-SPL and the other map with adaptive dynamic range optimization. To assess satisfaction, SADL and APHAB were used. All speech perception tests and all sound field thresholds were statistically better with the new speech processor; 64.7% of patients preferred maintaining the same frequency table that was suggested for the older processor. The sound field threshold was statistically significant at 500, 1,000, 1,500, and 2,000 Hz with 25 dB T-SPL/65 dB C-SPL. Regarding patients' satisfaction, there was a statistically significant improvement only in the subscales of speech in noise abilities and phone use. The new technology improved the performance of patients with the first generation of multichannel cochlear implant.

  2. Cochlear implantation with hearing preservation yields significant benefit for speech recognition in complex listening environments.

    PubMed

    Gifford, René H; Dorman, Michael F; Skarzynski, Henryk; Lorens, Artur; Polak, Marek; Driscoll, Colin L W; Roland, Peter; Buchman, Craig A

    2013-01-01

    The aim of this study was to assess the benefit of having preserved acoustic hearing in the implanted ear for speech recognition in complex listening environments. The present study included a within-subjects, repeated-measures design including 21 English-speaking and 17 Polish-speaking cochlear implant (CI) recipients with preserved acoustic hearing in the implanted ear. The patients were implanted with electrodes that varied in insertion depth from 10 to 31 mm. Mean preoperative low-frequency thresholds (average of 125, 250, and 500 Hz) in the implanted ear were 39.3 and 23.4 dB HL for the English- and Polish-speaking participants, respectively. In one condition, speech perception was assessed in an eight-loudspeaker environment in which the speech signals were presented from one loudspeaker and restaurant noise was presented from all loudspeakers. In another condition, the signals were presented in a simulation of a reverberant environment with a reverberation time of 0.6 sec. The response measures included speech reception thresholds (SRTs) and percent correct sentence understanding for two test conditions: CI plus low-frequency hearing in the contralateral ear (bimodal condition) and CI plus low-frequency hearing in both ears (best-aided condition). A subset of six English-speaking listeners was also assessed on measures of interaural time difference thresholds for a 250-Hz signal. Small, but significant, improvements in performance (1.7-2.1 dB and 6-10 percentage points) were found for the best-aided condition versus the bimodal condition. Postoperative thresholds in the implanted ear were correlated with the degree of electric and acoustic stimulation (EAS) benefit for speech recognition in diffuse noise. There was no reliable relationship between audiometric thresholds in the implanted ear, or threshold elevation after surgery, and improvement in speech understanding in reverberation. There was a significant correlation between interaural time difference threshold at 250 Hz and EAS-related benefit for the adaptive speech reception threshold. The findings of this study suggest that (1) preserved low-frequency hearing improves speech understanding for CI recipients, (2) testing in complex listening environments, in which binaural timing cues differ for signal and noise, may best demonstrate the value of having two ears with low-frequency acoustic hearing, and (3) preservation of binaural timing cues, although poorer than observed for individuals with normal hearing, is possible after unilateral cochlear implantation with hearing preservation and is associated with EAS benefit. The results of this study demonstrate significant communicative benefit for hearing preservation in the implanted ear and provide support for the expansion of CI criteria to include individuals with low-frequency thresholds in even the normal to near-normal range.

  3. The Interrelationships between Ratings of Speech and Facial Acceptability in Persons with Cleft Palate.

    ERIC Educational Resources Information Center

    Sinko, Garnet R.; Hedrick, Dona L.

    1982-01-01

    Thirty untrained young adult observers rated the speech and facial acceptability of 20 speakers with cleft palate. The observers were reliable in rating both speech and facial acceptability. Judgments of facial acceptability were generally more positive, suggesting that speech is generally judged more negatively in speakers with cleft palate.…

  4. Untrained listeners' ratings of speech disorders in a group with cleft palate: a comparison with speech and language pathologists' ratings.

    PubMed

    Brunnegård, Karin; Lohmander, Anette; van Doorn, Jan

    2009-01-01

    Hypernasal resonance, audible nasal air emission and/or nasal turbulence, and articulation errors are typical speech disorders associated with the speech of children with cleft lip and palate. Several studies indicate that hypernasal resonance tends to be perceived negatively by listeners. Most perceptual studies of speech disorders related to cleft palate are carried out with speech and language pathologists as listeners, whereas only a few studies have been conducted to explore how judgements by untrained listeners compare with expert assessments. These types of studies can be used to determine whether children for whom speech and language pathologists recommend intervention have a significant speech deviance that is also detected by untrained listeners. To compare ratings by untrained listeners with ratings by speech and language pathologists for cleft palate speech. An assessment form for untrained listeners was developed using statements and a five-point scale. The assessment form was tailored to facilitate comparison with expert judgements. Twenty-eight untrained listeners assessed the speech of 26 speakers with cleft palate and ten speakers without cleft in a comparison group. This assessment was compared with the joint assessment of two expert speech and language pathologists. Listener groups generally agreed on which speakers were nasal. The untrained listeners detected hyper- and hyponasality when it was present in speech and considered moderate to severe hypernasality to be serious enough to call for intervention. The expert listeners assessed audible nasal air emission and/or nasal turbulence to be present in twice as many speakers as the untrained listeners who were much less sensitive to audible nasal air emission and/or nasal turbulence. The results of untrained listeners' ratings in this study in the main confirm the ratings of speech and language pathologists and show that cleft palate speech disorders may have an impact in the everyday life of the speaker.

  5. Real-Time Speech/Music Classification With a Hierarchical Oblique Decision Tree

    DTIC Science & Technology

    2008-04-01

    REAL-TIME SPEECH/MUSIC CLASSIFICATION WITH A HIERARCHICAL OBLIQUE DECISION TREE. Jun Wang, Qiong Wu, Haojiang Deng, Qin Yan, Institute of Acoustics... time speech/music classification with a hierarchical oblique decision tree. A set of discrimination features in the frequency domain is selected... handle signals without discrimination and cannot work properly in the presence of multimedia signals. This paper proposes a real-time speech/music...
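
    As a rough illustration of this kind of classifier, the sketch below trains a decision tree on frequency-domain features often used for speech/music discrimination. Two stand-ins to note: scikit-learn's tree splits on single features (axis-aligned), whereas an oblique tree, as in the paper, splits on linear combinations of features; the feature set here is also only an assumption.

        import numpy as np
        import librosa
        from sklearn.tree import DecisionTreeClassifier

        def clip_features(path):
            y, sr = librosa.load(path, sr=22050)
            feats = np.vstack([
                librosa.feature.spectral_centroid(y=y, sr=sr),
                librosa.feature.spectral_rolloff(y=y, sr=sr),
                librosa.feature.spectral_flatness(y=y),
                librosa.feature.zero_crossing_rate(y),
            ])
            return feats.mean(axis=1)        # one vector per clip

        # X = np.array([clip_features(f) for f in files])
        # clf = DecisionTreeClassifier(max_depth=5).fit(X, labels)  # "speech"/"music"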

  6. Speech Perception With Combined Electric-Acoustic Stimulation: A Simulation and Model Comparison.

    PubMed

    Rader, Tobias; Adel, Youssef; Fastl, Hugo; Baumann, Uwe

    2015-01-01

    The aim of this study is to simulate speech perception with combined electric-acoustic stimulation (EAS), verify the advantage of combined stimulation in normal-hearing (NH) subjects, and then compare it with cochlear implant (CI) and EAS user results from the authors' previous study. Furthermore, an automatic speech recognition (ASR) system was built to examine the impact of low-frequency information and is proposed as an applied model to study different hypotheses of the combined-stimulation advantage. Signal-detection-theory (SDT) models were applied to assess predictions of subject performance without the need to assume any synergistic effects. Speech perception was tested using a closed-set matrix test (Oldenburg sentence test), and its speech material was processed to simulate CI and EAS hearing. A total of 43 NH subjects and a customized ASR system were tested. CI hearing was simulated by an aurally adequate signal spectrum analysis and representation, the part-tone-time-pattern, which was vocoded at 12 center frequencies according to the MED-EL DUET speech processor. Residual acoustic hearing was simulated by low-pass (LP)-filtered speech with cutoff frequencies 200 and 500 Hz for NH subjects and in the range from 100 to 500 Hz for the ASR system. Speech reception thresholds were determined in amplitude-modulated noise and in pseudocontinuous noise. Previously proposed SDT models were lastly applied to predict NH subject performance with EAS simulations. NH subjects tested with EAS simulations demonstrated the combined-stimulation advantage. Increasing the LP cutoff frequency from 200 to 500 Hz significantly improved speech reception thresholds in both noise conditions. In continuous noise, CI and EAS users showed generally better performance than NH subjects tested with simulations. In modulated noise, performance was comparable except for the EAS at cutoff frequency 500 Hz where NH subject performance was superior. The ASR system showed similar behavior to NH subjects despite a positive signal-to-noise ratio shift for both noise conditions, while demonstrating the synergistic effect for cutoff frequencies ≥300 Hz. One SDT model largely predicted the combined-stimulation results in continuous noise, while falling short of predicting performance observed in modulated noise. The presented simulation was able to demonstrate the combined-stimulation advantage for NH subjects as observed in EAS users. Only NH subjects tested with EAS simulations were able to take advantage of the gap listening effect, while CI and EAS user performance was consistently degraded in modulated noise compared with performance in continuous noise. The application of ASR systems seems feasible to assess the impact of different signal processing strategies on speech perception with CI and EAS simulations. In continuous noise, SDT models were largely able to predict the performance gain without assuming any synergistic effects, but model amendments are required to explain the gap listening effect in modulated noise.
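
    The two simulated listening modes combine naturally in code: a channel vocoder for the electric part plus low-pass filtered speech for the residual acoustic part. This sketch uses a generic noise-excited vocoder with log-spaced bands rather than the study's part-tone-time-pattern analysis and DUET frequency mapping, so all parameters are assumptions.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocoder(x, fs, n_channels=12, f_lo=200.0, f_hi=7000.0):
            edges = np.geomspace(f_lo, f_hi, n_channels + 1)
            out = np.zeros_like(x)
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                env = np.abs(hilbert(sosfiltfilt(sos, x)))     # channel envelope
                carrier = sosfiltfilt(sos, np.random.randn(x.size))
                out += env * carrier                           # noise-excited band
            return out

        def eas_simulation(x, fs, lp_cutoff=500.0):            # 200 or 500 Hz in study
            sos = butter(6, lp_cutoff, btype="lowpass", fs=fs, output="sos")
            return noise_vocoder(x, fs) + sosfiltfilt(sos, x)  # electric + acoustic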

  7. Role of working memory and lexical knowledge in perceptual restoration of interrupted speech.

    PubMed

    Nagaraj, Naveen K; Magimairaj, Beula M

    2017-12-01

    The role of working memory (WM) capacity and lexical knowledge in perceptual restoration (PR) of missing speech was investigated using the interrupted speech perception paradigm. Speech identification ability, which indexed PR, was measured using low-context sentences periodically interrupted at 1.5 Hz. PR was measured for silent gated, low-frequency speech noise filled, and low-frequency fine-structure and envelope filled interrupted conditions. WM capacity was measured using verbal and visuospatial span tasks. Lexical knowledge was assessed using both receptive vocabulary and meaning from context tests. Results showed that PR was better for speech noise filled condition than other conditions tested. Both receptive vocabulary and verbal WM capacity explained unique variance in PR for the speech noise filled condition, but were unrelated to performance in the silent gated condition. It was only receptive vocabulary that uniquely predicted PR for fine-structure and envelope filled conditions. These findings suggest that the contribution of lexical knowledge and verbal WM during PR depends crucially on the information content that replaced the silent intervals. When perceptual continuity was partially restored by filler speech noise, both lexical knowledge and verbal WM capacity facilitated PR. Importantly, for fine-structure and envelope filled interrupted conditions, lexical knowledge was crucial for PR.
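
    The interruption paradigm itself is simple to reproduce: gate the speech with a square wave at the interruption rate and, for the filled conditions, replace the silent intervals with a filler signal. The sketch fills the gaps with low-passed noise at matched RMS; the cutoff and duty cycle are assumptions.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def interrupt(x, fs, rate_hz=1.5, duty=0.5, fill_noise=False):
            t = np.arange(x.size) / fs
            gate = (np.mod(t * rate_hz, 1.0) < duty).astype(float)  # on/off
            out = x * gate
            if fill_noise:
                sos = butter(4, 1000.0, btype="lowpass", fs=fs, output="sos")
                noise = sosfiltfilt(sos, np.random.randn(x.size))
                noise *= np.sqrt(np.mean(x ** 2) / np.mean(noise ** 2))
                out += noise * (1.0 - gate)       # fill only the silent gaps
            return out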

  8. Lexical influences on competing speech perception in younger, middle-aged, and older adults

    PubMed Central

    Helfer, Karen S.; Jesse, Alexandra

    2015-01-01

    The influence of lexical characteristics of words in to-be-attended and to-be-ignored speech streams was examined in a competing speech task. Older, middle-aged, and younger adults heard pairs of low-cloze probability sentences in which the frequency or neighborhood density of words was manipulated in either the target speech stream or the masking speech stream. All participants also completed a battery of cognitive measures. As expected, for all groups, target words that occur frequently or that are from sparse lexical neighborhoods were easier to recognize than words that are infrequent or from dense neighborhoods. Compared to other groups, these neighborhood density effects were largest for older adults; the frequency effect was largest for middle-aged adults. Lexical characteristics of words in the to-be-ignored speech stream also affected recognition of to-be-attended words, but only when overall performance was relatively good (that is, when younger participants listened to the speech streams at a more advantageous signal-to-noise ratio). For these listeners, to-be-ignored masker words from sparse neighborhoods interfered with recognition of target speech more than masker words from dense neighborhoods. Amount of hearing loss and cognitive abilities relating to attentional control modulated overall performance as well as the strength of lexical influences. PMID:26233036

  9. The Effect of Frequency Transposition on Speech Perception in Adolescents and Young Adults with Profound Hearing Loss

    ERIC Educational Resources Information Center

    Gou, J.; Smith, J.; Valero, J.; Rubio, I.

    2011-01-01

    This paper reports on a clinical trial evaluating outcomes of a frequency-lowering technique for adolescents and young adults with severe to profound hearing impairment. Outcomes were defined by changes in aided thresholds, speech perception, and acceptance. The participants comprised seven young people aged between 13 and 25 years. They were…

  10. Processing F0 with cochlear implants: Modulation frequency discrimination and speech intonation recognition.

    PubMed

    Chatterjee, Monita; Peng, Shu-Chen

    2008-01-01

    Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners' Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners' everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial-F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising). Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners' sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners' performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution.

  11. Processing F0 with Cochlear Implants: Modulation Frequency Discrimination and Speech Intonation Recognition

    PubMed Central

    Chatterjee, Monita; Peng, Shu-Chen

    2008-01-01

    Fundamental frequency (F0) processing by cochlear implant (CI) listeners was measured using a psychophysical task and a speech intonation recognition task. Listeners’ Weber fractions for modulation frequency discrimination were measured using an adaptive, 3-interval, forced-choice paradigm: stimuli were presented through a custom research interface. In the speech intonation recognition task, listeners were asked to indicate whether resynthesized bisyllabic words, when presented in the free field through the listeners’ everyday speech processor, were question-like or statement-like. The resynthesized tokens were systematically manipulated to have different initial F0s to represent male vs. female voices, and different F0 contours (i.e., falling, flat, and rising). Although the CI listeners showed considerable variation in performance on both tasks, significant correlations were observed between the CI listeners’ sensitivity to modulation frequency in the psychophysical task and their performance in intonation recognition. Consistent with their greater reliance on temporal cues, the CI listeners’ performance in the intonation recognition task was significantly poorer with the higher initial-F0 stimuli than with the lower initial-F0 stimuli. Similar results were obtained with normal hearing listeners attending to noiseband-vocoded CI simulations with reduced spectral resolution. PMID:18093766
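
    Both versions of this record specify only "an adaptive, 3-interval, forced-choice paradigm", so the tracking rule below, Levitt's 2-down/1-up staircase converging near 70.7% correct, is one common concrete choice rather than the authors' documented procedure.

        import numpy as np

        def two_down_one_up(run_trial, level, step, n_reversals=8):
            # run_trial(level) -> True when the 3IFC response is correct
            n_correct, last_dir, reversals = 0, 0, []
            while len(reversals) < n_reversals:
                if run_trial(level):
                    n_correct += 1
                    if n_correct < 2:
                        continue
                    n_correct, direction = 0, -1     # two correct: make harder
                else:
                    n_correct, direction = 0, +1     # one wrong: make easier
                if last_dir and direction != last_dir:
                    reversals.append(level)          # track direction changes
                last_dir = direction
                level += direction * step
            return np.mean(reversals[n_reversals // 2:])   # threshold estimate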

  12. Study of acoustic correlates associate with emotional speech

    NASA Astrophysics Data System (ADS)

    Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth

    2004-10-01

    This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the differences in formant pattern between [happiness/anger] and [neutral/sadness] are better reflected in back vowels such as /a/ (as in "father") than in front vowels. Detailed results on intra- and interspeaker variability will be reported.

  13. Glove-talk II - a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

    PubMed

    Fels, S S; Hinton, G E

    1997-01-01

    Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
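
    The gating arrangement described above has a compact functional form: a gate derived from the hand data blends the control vectors of the vowel and consonant experts. In this sketch the three networks are stand-in callables, not Glove-Talk II's trained models.

        import numpy as np

        def gated_controls(h, gate_net, vowel_net, consonant_net):
            # h: hand-input feature vector; each *_net maps h to synthesizer
            # controls (or, for gate_net, to a single gating logit).
            g = 1.0 / (1.0 + np.exp(-gate_net(h)))          # sigmoid gate in (0, 1)
            return g * vowel_net(h) + (1.0 - g) * consonant_net(h)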

  14. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users

    PubMed Central

    Newman, Rochelle S.; Goupell, Matthew J.

    2017-01-01

    Purpose Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown if adults who use auditory prostheses called cochlear implants (CI) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate information could explain some of the variability in this population's speech perception outcomes. Method Phonemes with manipulated voice-onset-time (VOT) durations were embedded in sentences with different speech rates. Twenty-three CI and 29 NH participants performed a phoneme identification task. NH participants heard the same unprocessed stimuli as the CI participants or stimuli degraded by a sine vocoder, simulating aspects of CI processing. Results CI participants showed larger rate normalization effects (6.6 ms) than the NH participants (3.7 ms) and had shallower (less reliable) category boundary slopes. NH participants showed similarly shallow slopes when presented acoustically degraded vocoded signals, but an equal or smaller rate effect in response to reductions in available spectral and temporal information. Conclusion CI participants can rate normalize, despite their degraded speech input, and show a larger rate effect compared to NH participants. CI participants may particularly rely on rate normalization to better maintain perceptual constancy of the speech signal. PMID:28395319

  15. Don’t speak too fast! Processing of fast rate speech in children with specific language impairment

    PubMed Central

    Bedoin, Nathalie; Krifi-Papoz, Sonia; Herbillon, Vania; Caillot-Bascoul, Aurélia; Gonzalez-Monge, Sibylle; Boulenger, Véronique

    2018-01-01

    Background Perception of speech rhythm requires the auditory system to track temporal envelope fluctuations, which carry syllabic and stress information. Reduced sensitivity to rhythmic acoustic cues has been evidenced in children with Specific Language Impairment (SLI), impeding syllabic parsing and speech decoding. Our study investigated whether these children experience specific difficulties processing fast rate speech as compared with typically developing (TD) children. Method Sixteen French children with SLI (8–13 years old) with mainly expressive phonological disorders and with preserved comprehension and 16 age-matched TD children performed a judgment task on sentences produced 1) at normal rate, 2) at fast rate or 3) time-compressed. Sensitivity index (d′) to semantically incongruent sentence-final words was measured. Results Overall children with SLI perform significantly worse than TD children. Importantly, as revealed by the significant Group × Speech Rate interaction, children with SLI find it more challenging than TD children to process both naturally or artificially accelerated speech. The two groups do not significantly differ in normal rate speech processing. Conclusion In agreement with rhythm-processing deficits in atypical language development, our results suggest that children with SLI face difficulties adjusting to rapid speech rate. These findings are interpreted in light of temporal sampling and prosodic phrasing frameworks and of oscillatory mechanisms underlying speech perception. PMID:29373610
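
    Artificial acceleration of the kind used in the third condition is typically done by time-scale modification, which shortens the signal while leaving pitch and spectral shape intact; a one-line sketch with an assumed compression factor (naturally fast speech, by contrast, also changes articulation):

        import librosa

        y, sr = librosa.load("sentence.wav", sr=None)   # hypothetical stimulus
        # Compress to 60% of the original duration without changing pitch
        fast = librosa.effects.time_stretch(y, rate=1.0 / 0.6)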

  16. Speech Impairment in Down Syndrome: A Review

    PubMed Central

    Kent, Ray D.; Vorperian, Houri K.

    2012-01-01

    Purpose This review summarizes research on disorders of speech production in Down Syndrome (DS) for the purposes of informing clinical services and guiding future research. Method Review of the literature was based on searches using Medline, Google Scholar, Psychinfo, and HighWire Press, as well as consideration of reference lists in retrieved documents (including online sources). Search terms emphasized functions related to voice, articulation, phonology, prosody, fluency and intelligibility. Conclusions The following conclusions pertain to four major areas of review: (a) Voice. Although a number of studies have been reported on vocal abnormalities in DS, major questions remain about the nature and frequency of the phonatory disorder. Results of perceptual and acoustic studies have been mixed, making it difficult to draw firm conclusions or even to identify sensitive measures for future study. (b) Speech sounds. Articulatory and phonological studies show that speech patterns in DS are a combination of delayed development and errors not seen in typical development. Delayed (i.e., developmental) and disordered (i.e., nondevelopmental) patterns are evident by the age of about 3 years, although DS-related abnormalities possibly appear earlier, even in infant babbling. (c) Fluency and prosody. Stuttering and/or cluttering occur in DS at rates of 10 to 45%, compared to about 1% in the general population. Research also points to significant disturbances in prosody. (d) Intelligibility. Studies consistently show marked limitations in this area but it is only recently that research goes beyond simple rating scales. PMID:23275397

  17. Anxiety and speaking in people who stutter: an investigation using the emotional Stroop task.

    PubMed

    Hennessey, Neville W; Dourado, Esther; Beilby, Janet M

    2014-06-01

    People with anxiety disorders show an attentional bias towards threat or negative emotion words. This exploratory study examined whether people who stutter (PWS), who can be anxious when speaking, show similar bias and whether reactions to threat words also influence speech motor planning and execution. Comparisons were made between 31 PWS and 31 fluent controls in a modified emotional Stroop task where, depending on a visual cue, participants named the colour of threat and neutral words at either a normal or fast articulation rate. In a manual version of the same task participants pressed the corresponding colour button with either a long or short duration. PWS but not controls were slower to respond to threat words than neutral words; however, this emotionality effect was only evident for verbal responding. Emotionality did not interact with speech rate, but the size of the emotionality effect among PWS did correlate with frequency of stuttering. Results suggest PWS show an attentional bias to threat words similar to that found in people with anxiety disorder. In addition, this bias appears to be contingent on engaging the speech production system as a response modality. No evidence was found to indicate that emotional reactivity during the Stroop task constrains or destabilises, perhaps via arousal mechanisms, speech motor adjustment or execution for PWS. The reader will be able to: (1) explain the importance of cognitive aspects of anxiety, such as attentional biases, in the possible cause and/or maintenance of anxiety in people who stutter, (2) explain how the emotional Stroop task can be used as a measure of attentional bias to threat information, and (3) evaluate the findings with respect to the relationship between attentional bias to threat information and speech production in people who stutter. Copyright © 2013 Elsevier Inc. All rights reserved.

  18. A Rating Scale for the Functional Assessment of Patients with Familial Dysautonomia (Riley Day Syndrome)

    PubMed Central

    Axelrod, Felicia B.; Rolnitzky, Linda; von Simson, Gabrielle Gold; Berlin, Dena; Kaufmann, Horacio

    2012-01-01

    Objective To develop a reliable rating scale to assess functional capacity in children with familial dysautonomia, evaluate changes over time and determine whether severity within a particular functional category at a young age affected survival. Study design Ten functional categories were retrospectively assessed in 123 patients with familial dysautonomia at age 7 years ± 6 months. Each of the ten Functional Severity Scale (FuSS) categories (motor development, cognitive ability, psychological status, expressive speech, balance, oral coordination, frequency of dysautonomic crisis, respiratory, cardiovascular and nutritional status) was scored from 1 (worst or severely affected) to 5 (best or no impairment). Changes over time were analyzed further in 22 of the 123 patients who were also available at ages 17 and 27 years. Results Severely impaired cardiovascular function and high frequency of dysautonomic crisis negatively affected survival (p < 0.005 and p < 0.001, respectively). In the 22 individuals followed up to age 27 years, psychological status significantly worsened (p = 0.01), and expressive speech improved (p = 0.045). From age 17 to 27 years, balance worsened markedly (p = 0.048). Conclusion The FuSS scale is a reliable tool to measure functional capacity in patients with familial dysautonomia. The scale may prove useful in providing prognosis and as a complementary endpoint in clinical trials. PMID:22727867

  19. Assessing the role of spectral and intensity cues in spectral ripple detection and discrimination in cochlear-implant users

    PubMed Central

    Anderson, Elizabeth S.; Oxenham, Andrew J.; Nelson, Peggy B.; Nelson, David A.

    2012-01-01

    Measures of spectral ripple resolution have become widely used psychophysical tools for assessing spectral resolution in cochlear-implant (CI) listeners. The objective of this study was to compare spectral ripple discrimination and detection in the same group of CI listeners. Ripple detection thresholds were measured over a range of ripple frequencies and were compared to spectral ripple discrimination thresholds previously obtained from the same CI listeners. The data showed that performance on the two measures was correlated, but that individual subjects’ thresholds (at a constant spectral modulation depth) for the two tasks were not equivalent. In addition, spectral ripple detection was often found to be possible at higher rates than expected based on the available spectral cues, making it likely that temporal-envelope cues played a role at higher ripple rates. Finally, spectral ripple detection thresholds were compared to previously obtained speech-perception measures. Results confirmed earlier reports of a robust relationship between detection of widely spaced ripples and measures of speech recognition. In contrast, intensity difference limens for broadband noise did not correlate with spectral ripple detection measures, suggesting a dissociation between the ability to detect small changes in intensity across frequency and across time. PMID:23231122
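
    Spectrally rippled noise of the kind used in these tasks can be synthesized by summing random-phase tones whose levels follow a sinusoid (in dB) along a log-frequency axis; the ripple density, depth, and band edges below are assumed values. Detection trials compare a rippled against a flat spectrum; discrimination trials compare two ripple phases.

        import numpy as np

        def ripple_noise(fs=44100, dur=0.5, ripples_per_oct=2.0,
                         depth_db=20.0, f_lo=100.0, f_hi=5000.0, phase=0.0):
            n_tones = 800
            freqs = np.geomspace(f_lo, f_hi, n_tones)
            octs = np.log2(freqs / f_lo)
            # Sinusoidal level ripple (in dB) across log frequency
            level_db = 0.5 * depth_db * np.sin(
                2 * np.pi * ripples_per_oct * octs + phase)
            amps = 10.0 ** (level_db / 20.0)
            t = np.arange(int(fs * dur)) / fs
            phases = 2 * np.pi * np.random.rand(n_tones)
            tones = amps[:, None] * np.sin(
                2 * np.pi * freqs[:, None] * t + phases[:, None])
            return tones.sum(axis=0)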

  20. Training parents to use the natural language paradigm to increase their autistic children's speech.

    PubMed Central

    Laski, K E; Charlop, M H; Schreibman, L

    1988-01-01

    Parents of four nonverbal and four echolalic autistic children were trained to increase their children's speech by using the Natural Language Paradigm (NLP), a loosely structured procedure conducted in a play environment with a variety of toys. Parents were initially trained to use the NLP in a clinic setting, with subsequent parent-child speech sessions occurring at home. The results indicated that following training, parents increased the frequency with which they required their children to speak (i.e., modeled words and phrases, prompted answers to questions). Correspondingly, all children increased the frequency of their verbalizations in three nontraining settings. Thus, the NLP appears to be an efficacious program for parents to learn and use in the home to increase their children's speech. PMID:3225256

  1. Helium Speech: An Application of Standing Waves

    NASA Astrophysics Data System (ADS)

    Wentworth, Christopher D.

    2011-04-01

    Taking a breath of helium gas and then speaking or singing to the class is a favorite demonstration for an introductory physics course, as it usually elicits appreciative laughter, which serves to energize the class session. Students will usually report that the helium speech "raises the frequency" of the voice. A more accurate description of the phenomenon requires that we distinguish between the frequencies of sound produced by the larynx and the filtering of those frequencies by the vocal tract. We will describe here an experiment done by introductory physics students that uses helium speech as a context for learning about the human vocal system and as an application of the standing sound-wave concept. Modern acoustic analysis software easily obtained by instructors for student use allows data to be obtained and analyzed quickly.
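
    The standing-wave account can be made quantitative with the closed-at-one-end tube formula f_n = (2n - 1)c / (4L): raising the speed of sound c while the tract length L is unchanged scales every resonance (formant) upward by the same factor, while the larynx's vibration rate, and hence the voice pitch, is essentially unchanged. A short worked sketch with textbook values (a breathed helium/air mixture would give a smaller factor than pure helium):

        # Resonances of a tube closed at one end: f_n = (2n - 1) * c / (4 * L)
        c_air, c_he = 343.0, 1007.0   # sound speed (m/s) near 20 C; pure helium
        L = 0.17                      # rough adult vocal-tract length (m)

        for n in (1, 2, 3):
            f_air = (2 * n - 1) * c_air / (4 * L)
            f_he = (2 * n - 1) * c_he / (4 * L)
            print(f"Resonance {n}: {f_air:.0f} Hz in air -> "
                  f"{f_he:.0f} Hz in helium (x{c_he / c_air:.2f})")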

  2. Effects of Within-Talker Variability on Speech Intelligibility in Mandarin-Speaking Adult and Pediatric Cochlear Implant Patients

    PubMed Central

    Su, Qiaotong; Galvin, John J.; Zhang, Guoping; Li, Yongxin

    2016-01-01

    Cochlear implant (CI) speech performance is typically evaluated using well-enunciated speech produced at a normal rate by a single talker. CI users often have greater difficulty with variations in speech production encountered in everyday listening. Within a single talker, speaking rate, amplitude, duration, and voice pitch information may be quite variable, depending on the production context. The coarse spectral resolution afforded by the CI limits perception of voice pitch, which is an important cue for speech prosody and for tonal languages such as Mandarin Chinese. In this study, sentence recognition from the Mandarin speech perception database was measured in adult and pediatric Mandarin-speaking CI listeners for a variety of speaking styles: voiced speech produced at slow, normal, and fast speaking rates; whispered speech; voiced emotional speech; and voiced shouted speech. Recognition of Mandarin Hearing in Noise Test sentences was also measured. Results showed that performance was significantly poorer with whispered speech relative to the other speaking styles and that performance was significantly better with slow speech than with fast or emotional speech. Results also showed that adult and pediatric performance was significantly poorer with Mandarin Hearing in Noise Test than with Mandarin speech perception sentences at the normal rate. The results suggest that adult and pediatric Mandarin-speaking CI patients are highly susceptible to whispered speech, due to the lack of lexically important voice pitch cues and perhaps other qualities associated with whispered speech. The results also suggest that test materials may contribute to differences in performance observed between adult and pediatric CI users. PMID:27363714

  3. Speech rate in Parkinson's disease: A controlled study.

    PubMed

    Martínez-Sánchez, F; Meilán, J J G; Carro, J; Gómez Íñiguez, C; Millian-Morell, L; Pujante Valverde, I M; López-Alburquerque, T; López, D E

    2016-09-01

    Speech disturbances will affect most patients with Parkinson's disease (PD) over the course of the disease. The origin and severity of these symptoms are of clinical and diagnostic interest. To evaluate the clinical pattern of speech impairment in PD patients and identify significant differences in speech rate and articulation compared to control subjects. Speech rate and articulation in a reading task were measured using an automatic analytical method. A total of 39 PD patients in the 'on' state and 45 age- and sex-matched asymptomatic controls participated in the study. None of the patients experienced dyskinesias or motor fluctuations during the test. The patients with PD displayed a significant reduction in speech and articulation rates; there were no significant correlations between the studied speech parameters and patient characteristics such as L-dopa dose, disease duration, age, UPDRS III scores, and Hoehn & Yahr stage. Patients with PD show a characteristic pattern of declining speech rate. These results suggest that in PD, disfluencies are the result of the movement disorder affecting the physiology of speech production systems. Copyright © 2014 Sociedad Española de Neurología. Publicado por Elsevier España, S.L.U. All rights reserved.

  4. The speech naturalness of people who stutter speaking under delayed auditory feedback as perceived by different groups of listeners.

    PubMed

    Van Borsel, John; Eeckhout, Hannelore

    2008-09-01

    This study investigated listeners' perception of the speech naturalness of people who stutter (PWS) speaking under delayed auditory feedback (DAF) with particular attention to possible listener differences. Three panels of judges consisting of 14 stuttering individuals, 14 speech language pathologists, and 14 naive listeners rated the naturalness of speech samples of stuttering and non-stuttering individuals using a 9-point interval scale. Results clearly indicate that these three groups evaluate naturalness differently. Naive listeners appear to be more severe in their judgements than speech language pathologists and stuttering listeners, and speech language pathologists are apparently more severe than PWS. The three listener groups showed similar trends with respect to the relationship between speech naturalness and speech rate. Results of all three indicated that for PWS, the slower a speaker's rate was, the less natural speech was judged to sound. The three listener groups also showed similar trends with regard to naturalness of the stuttering versus the non-stuttering individuals. All three panels considered the speech of the non-stuttering participants more natural. The reader will be able to: (1) discuss the speech naturalness of people who stutter speaking under delayed auditory feedback, (2) discuss listener differences regarding the naturalness of people who stutter speaking under delayed auditory feedback, and (3) discuss the importance of speech rate for the naturalness of speech.

  5. Deficits of congenital amusia beyond pitch: Evidence from impaired categorical perception of vowels in Cantonese-speaking congenital amusics

    PubMed Central

    Shao, Jing; Huang, Xunan

    2017-01-01

    Congenital amusia is a lifelong disorder of fine-grained pitch processing in music and speech. However, it remains unclear whether amusia is a pitch-specific deficit, or whether it affects frequency/spectral processing more broadly, such as the perception of formant frequency in vowels, apart from pitch. In this study, in order to illuminate the scope of the deficits, we compared the performance of 15 Cantonese-speaking amusics and 15 matched controls on the categorical perception of sound continua in four stimulus contexts: lexical tone, pure tone, vowel, and voice onset time (VOT). Whereas lexical tone, pure tone and vowel continua rely on frequency/spectral processing, the VOT continuum depends on duration/temporal processing. We found that the amusic participants performed similarly to controls in all stimulus contexts in the identification, in terms of the across-category boundary location and boundary width. However, the amusic participants performed systematically worse than controls in discriminating stimuli in those three contexts that depended on frequency/spectral processing (lexical tone, pure tone and vowel), whereas they performed normally when discriminating duration differences (VOT). These findings suggest that the deficit of amusia is probably not pitch specific, but affects frequency/spectral processing more broadly. Furthermore, there appeared to be differences in the impairment of frequency/spectral discrimination in speech and nonspeech contexts. The amusic participants exhibited less benefit in between-category discriminations than controls in speech contexts (lexical tone and vowel), suggesting reduced categorical perception; on the other hand, they performed inferiorly compared to controls across the board regardless of between- and within-category discriminations in nonspeech contexts (pure tone), suggesting impaired general auditory processing. These differences imply that the frequency/spectral-processing deficit might be manifested differentially in speech and nonspeech contexts in amusics—it is manifested as a deficit of higher-level phonological processing in speech sounds, and as a deficit of lower-level auditory processing in nonspeech sounds. PMID:28829808

  6. The Perception of "Sine-Wave Speech" by Adults with Developmental Dyslexia.

    ERIC Educational Resources Information Center

    Rosner, Burton S.; Talcott, Joel B.; Witton, Caroline; Hogg, James D.; Richardson, Alexandra J.; Hansen, Peter C.; Stein, John F.

    2003-01-01

    "Sine-wave speech" sentences contain only four frequency-modulated sine waves, lacking many acoustic cues present in natural speech. Adults with (n=19) and without (n=14) dyslexia were asked to reproduce orally sine-wave utterances in successive trials. Results suggest comprehension of sine-wave sentences is impaired in some adults with…

  7. Cross-Frequency Integration for Consonant and Vowel Identification in Bimodal Hearing

    ERIC Educational Resources Information Center

    Kong, Ying-Yee; Braida, Louis D.

    2011-01-01

    Purpose: Improved speech recognition in binaurally combined acoustic-electric stimulation (otherwise known as "bimodal hearing") could arise when listeners integrate speech cues from the acoustic and electric hearing. The aims of this study were (a) to identify speech cues extracted in electric hearing and residual acoustic hearing in the…

  8. Speech Spectrum's Correlation with Speakers' Eysenck Personality Traits

    PubMed Central

    Hu, Chao; Wang, Qiandong; Short, Lindsey A.; Fu, Genyue

    2012-01-01

    The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed by Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than that in truthful responses, whereas no difference existed on the vowel /i/ speech spectrum. The second formant bandwidth of the consonant /sh/ speech spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ speech spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown. PMID:22439014

  9. Gender disparity in subcortical encoding of binaurally presented speech stimuli: an auditory evoked potentials study.

    PubMed

    Ahadi, Mohsen; Pourbakht, Akram; Jafari, Amir Homayoun; Shirjian, Zahra; Jafarpisheh, Amir Salar

    2014-06-01

    To investigate the influence of gender on the subcortical representation of speech acoustic parameters when stimuli are presented simultaneously to both ears. Two-channel speech-evoked auditory brainstem responses were obtained in 25 female and 23 male normal-hearing young adults using binaural presentation of the 40 ms synthetic consonant-vowel /da/, and the encoding of the fast and slow elements of the speech stimulus at the subcortical level was compared between the sexes in the temporal and spectral domains using independent-sample, two-tailed t-tests. Highly detectable responses were established in both groups. Analysis in the time domain revealed earlier and larger fast onset responses in females, but there was no gender-related difference in the sustained segment and offset of the response. Interpeak intervals between Frequency Following Response peaks were also invariant to sex. Based on the shorter onset responses in females, composite onset measures were also sex dependent. Analysis in the spectral domain showed more robust and better representation of the fundamental frequency, as well as of the first formant and the high-frequency components of the first formant, in females than in males. Anatomical, biological, and biochemical distinctions between females and males could alter the neural encoding of the acoustic cues of speech stimuli at the subcortical level. Females have an advantage in binaural processing of the slow and fast elements of speech. This could be physiological evidence for better identification of speaker and emotional tone of voice, as well as better perception of the phonetic information of speech, in women. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.

  10. Language-independent talker-specificity in first-language and second-language speech production by bilingual talkers: L1 speaking rate predicts L2 speaking rate

    PubMed Central

    Bradlow, Ann R.; Kim, Midam; Blasingame, Michael

    2017-01-01

    Second-language (L2) speech is consistently slower than first-language (L1) speech, and L1 speaking rate varies within- and across-talkers depending on many individual, situational, linguistic, and sociolinguistic factors. It is asked whether speaking rate is also determined by a language-independent talker-specific trait such that, across a group of bilinguals, L1 speaking rate significantly predicts L2 speaking rate. Two measurements of speaking rate were automatically extracted from recordings of read and spontaneous speech by English monolinguals (n = 27) and bilinguals from ten L1 backgrounds (n = 86): speech rate (syllables/second), and articulation rate (syllables/second excluding silent pauses). Replicating prior work, L2 speaking rates were significantly slower than L1 speaking rates both across-groups (monolinguals' L1 English vs bilinguals' L2 English), and across L1 and L2 within bilinguals. Critically, within the bilingual group, L1 speaking rate significantly predicted L2 speaking rate, suggesting that a significant portion of inter-talker variation in L2 speech is derived from inter-talker variation in L1 speech, and that individual variability in L2 spoken language production may be best understood within the context of individual variability in L1 spoken language production. PMID:28253679
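
    As a concrete illustration of the two rate measures defined above, here is a minimal sketch; the syllable count, duration, and pause list are made-up values, and an actual pipeline would extract them automatically from the recordings:

        def speech_rate(n_syllables, total_dur_s):
            """Syllables per second over the whole recording, pauses included."""
            return n_syllables / total_dur_s

        def articulation_rate(n_syllables, total_dur_s, pause_durs_s):
            """Syllables per second with silent pauses excluded."""
            return n_syllables / (total_dur_s - sum(pause_durs_s))

        n_syl, dur, pauses = 230, 60.0, [0.8, 1.2, 0.5, 2.0]  # hypothetical values
        print("speech rate:       %.2f syll/s" % speech_rate(n_syl, dur))
        print("articulation rate: %.2f syll/s" % articulation_rate(n_syl, dur, pauses))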

  11. Orbital component extraction by time-variant sinusoidal modeling.

    NASA Astrophysics Data System (ADS)

    Sinnesael, Matthias; Zivanovic, Miroslav; De Vleeschouwer, David; Claeys, Philippe; Schoukens, Johan

    2016-04-01

    Accurately deciphering periodic variations in paleoclimate proxy signals is essential for cyclostratigraphy. Classical spectral analysis often relies on methods based on the (Fast) Fourier Transform. This technique has no unique solution separating variations in amplitude and frequency. This characteristic makes it difficult to correctly interpret a proxy's power spectrum or to accurately evaluate simultaneous changes in amplitude and frequency in evolutionary analyses. Here, we circumvent this drawback by using a polynomial approach to estimate instantaneous amplitude and frequency in orbital components. This approach has proven useful for characterizing audio signals (music and speech), which are non-stationary in nature (Zivanovic and Schoukens, 2010, 2012). Paleoclimate proxy signals and audio signals have similar dynamics; the only difference is the frequency relationship between the different components. A harmonic frequency relationship exists in audio signals, whereas this relation is non-harmonic in paleoclimate signals. However, the latter difference is irrelevant for the problem at hand. Using a sliding window approach, the model captures time variations of an orbital component by modulating a stationary sinusoid centered at its mean frequency, with a single polynomial. Hence, the parameters that determine the model are the mean frequency of the orbital component and the polynomial coefficients. The first parameter depends on geologic interpretation, whereas the latter are estimated by means of linear least-squares. As an output, the model provides the orbital component waveform, either in the depth or time domain. Furthermore, it allows for a unique decomposition of the signal into its instantaneous amplitude and frequency. Frequency modulation patterns can be used to reconstruct changes in accumulation rate, whereas amplitude modulation can be used to reconstruct, e.g., eccentricity-modulated precession. The time-variant sinusoidal model is applied to well-established Pleistocene benthic isotope records to evaluate its performance. Zivanovic M. and Schoukens J. (2010) On The Polynomial Approximation for Time-Variant Harmonic Signal Modeling. IEEE Transactions on Audio, Speech, and Language Processing vol. 19, no. 3, pp. 458-467. doi: 10.1109/TASL.2010.2049673. Zivanovic M. and Schoukens J. (2012) Single and Piecewise Polynomials for Modeling of Pitched Sounds. IEEE Transactions on Audio, Speech, and Language Processing vol. 20, no. 4, pp. 1270-1281. doi: 10.1109/TASL.2011.2174228.
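
    The core of the modeling idea can be sketched in a few lines: a stationary sinusoid at the component's mean frequency is modulated by low-order polynomials whose coefficients follow from linear least squares. The sketch below assumes a single component, a user-chosen mean frequency f0, and a cubic polynomial; it illustrates the idea only and is not the authors' implementation:

        import numpy as np

        def fit_tv_sinusoid(x, t, f0, degree=3):
            """Fit x(t) ~ A(t) cos(2 pi f0 t) + B(t) sin(2 pi f0 t) with polynomial
            A, B; return instantaneous amplitude and instantaneous frequency."""
            c, s = np.cos(2 * np.pi * f0 * t), np.sin(2 * np.pi * f0 * t)
            powers = np.vstack([t ** k for k in range(degree + 1)])    # (K+1, N)
            design = np.hstack([(powers * c).T, (powers * s).T])       # linear in coeffs
            coef, *_ = np.linalg.lstsq(design, x, rcond=None)
            a_t = powers.T @ coef[:degree + 1]                         # A(t)
            b_t = powers.T @ coef[degree + 1:]                         # B(t)
            amplitude = np.hypot(a_t, b_t)
            phase = 2 * np.pi * f0 * t - np.arctan2(b_t, a_t)
            inst_freq = np.gradient(np.unwrap(phase), t) / (2 * np.pi)
            return amplitude, inst_freq

        # Toy "orbital component": slow amplitude growth plus a mild chirp.
        t = np.linspace(0.0, 10.0, 2000)
        x = (1.0 + 0.3 * t / 10.0) * np.cos(2 * np.pi * (t + 0.005 * t ** 2))
        amp, freq = fit_tv_sinusoid(x, t, f0=1.05)   # recovers ~1.0 -> 1.1 Hz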

  12. Comparison of formant detection methods used in speech processing applications

    NASA Astrophysics Data System (ADS)

    Belean, Bogdan

    2013-11-01

    The paper describes time-frequency representations of the speech signal together with the significance of formants in speech processing applications. Speech formants can be used in emotion recognition, sex discrimination, or diagnosing different neurological diseases. Taking into account the various applications of formant detection in speech signals, two methods for detecting formants are presented. First, the poles resulting from a complex analysis of LPC coefficients are used for formant detection. The second approach uses the Kalman filter for formant prediction along the speech signal. Results are presented for both approaches on real-life speech spectrograms. A comparison of the features of the proposed methods is also performed, in order to establish which method is more suitable for different speech processing applications.
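
    A minimal sketch of the first approach (formants from the poles of an LPC model) is given below; the LPC order, synthetic frame, and formant-selection heuristics are illustrative assumptions, and a production system would add pre-emphasis and frame-to-frame tracking (e.g., the Kalman filtering the paper discusses):

        import numpy as np
        from scipy.linalg import solve_toeplitz

        def lpc_coefficients(frame, order):
            """Autocorrelation-method LPC via the Toeplitz normal equations."""
            r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
            return np.concatenate(([1.0], -a))        # A(z) = 1 - sum a_k z^-k

        def lpc_formants(frame, fs, order=12):
            poles = np.roots(lpc_coefficients(frame, order))
            poles = poles[np.imag(poles) > 0]              # one of each conjugate pair
            freqs = np.angle(poles) * fs / (2 * np.pi)     # pole angle -> frequency (Hz)
            bws = -fs / np.pi * np.log(np.abs(poles))      # pole radius -> bandwidth
            return np.sort(freqs[(freqs > 90) & (bws < 400)])  # crude formant picks

        fs = 8000
        t = np.arange(0, 0.03, 1 / fs)
        # Synthetic vowel-like frame with damped resonances near 700 and 1200 Hz.
        frame = (np.exp(-60 * t) * np.sin(2 * np.pi * 700 * t)
                 + 0.5 * np.exp(-80 * t) * np.sin(2 * np.pi * 1200 * t))
        print(lpc_formants(frame * np.hamming(len(frame)), fs))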

  13. Sensory deprivation due to otitis media episodes in early childhood and its effect at later age: A psychoacoustic and speech perception measure.

    PubMed

    Shetty, Hemanth Narayan; Koonoor, Vishal

    2016-11-01

    Past research has reported that repeated occurrences of otitis media (OM) at an early age have a negative impact on speech perception at a later age. The present study documented the influence of temporal and spectral processing on speech perception in noise in normal and atypical groups, evaluating the relation between speech perception in noise and temporal and spectral processing abilities in children in each group. The study included two experiments. In the first experiment, temporal resolution and frequency discrimination were evaluated, using measures of the temporal modulation transfer function and a frequency discrimination test, in listeners in the normal group and in three subgroups of the atypical group (children with a history of OM): (a) fewer than four episodes, (b) four to nine episodes, and (c) more than nine episodes between the chronological ages of 6 months and 2 years. In the second experiment, SNR 50 was evaluated in each group of study participants. All participants had normal hearing and middle ear status during the course of testing. Results demonstrated that children in the atypical subgroups had significantly poorer modulation detection thresholds, peak sensitivity, and bandwidth, and poorer frequency discrimination at each F0, than normal-hearing listeners. Furthermore, there was a significant correlation between measures of temporal resolution, frequency discrimination, and speech perception in noise. This suggests that the atypical groups have significant impairment in extracting envelope as well as fine structure cues from the signal. The results supported the idea that episodes of OM before 2 years of age can produce periods of sensory deprivation that alter temporal and spectral skills, which in turn has negative consequences for speech perception in noise. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  14. The Queen's English: an alternative, biosocial hypothesis for the distinctive features of "gay speech".

    PubMed

    Rendall, Drew; Vasey, Paul L; McKenzie, Jared

    2008-02-01

    Popular stereotypes concerning the speech of homosexuals typically attribute speech patterns characteristic of the opposite sex, i.e., broadly feminized speech in gay men and broadly masculinized speech in lesbian women. A small body of recent empirical research has begun to address the subject more systematically and to consider specific mechanistic hypotheses to account for the potentially distinctive features of homosexual speech. Results do not yet fully endorse the stereotypes but they do not entirely discount them either; nor do they cleanly favor any single mechanistic hypothesis. To contribute to this growing body of research, we report acoustic analyses of 2,875 vowel sounds from a balanced set of 125 speakers representing heterosexual and homosexual individuals of each sex from southern Alberta, Canada. Analyses focused on voice pitch and formant frequencies, which together determine the principal perceptual features of vowels. There was no significant difference in mean voice pitch between heterosexual and homosexual men or between heterosexual and homosexual women, but there were significant differences in the formant frequencies of vowels produced by both homosexual groups compared to their heterosexual counterparts. Formant frequency differences were specific to only certain vowel sounds and some could be attributed to basic differences in body size between heterosexual and homosexual speakers. The remaining formant frequency differences were not obviously due to differences in vocal tract anatomy between heterosexual and homosexual speakers, nor did they reflect global feminization or masculinization of vowel production patterns in homosexual men and women, respectively. The vowel-specific differences observed could reflect social modeling processes in which only certain speech patterns of the opposite sex, or of same-sex homosexuals, are selectively adopted. However, we introduce an alternative biosocial hypothesis, specifically that the distinctive, vowel-specific features of homosexual speakers relative to heterosexual speakers arise incidentally as a product of broader psychobehavioral differences between the two groups that are, in turn, continuous with and flow from the physiological processes that affect sexual orientation to begin with.

  15. The Auditory-Brainstem Response to Continuous, Non-repetitive Speech Is Modulated by the Speech Envelope and Reflects Speech Processing

    PubMed Central

    Reichenbach, Chagit S.; Braiman, Chananel; Schiff, Nicholas D.; Hudspeth, A. J.; Reichenbach, Tobias

    2016-01-01

    The auditory-brainstem response (ABR) to short and simple acoustical signals is an important clinical tool used to diagnose the integrity of the brainstem. The ABR is also employed to investigate the auditory brainstem in a multitude of tasks related to hearing, such as processing speech or selectively focusing on one speaker in a noisy environment. Such research measures the response of the brainstem to short speech signals such as vowels or words. Because the voltage signal of the ABR has a tiny amplitude, several hundred to a thousand repetitions of the acoustic signal are needed to obtain a reliable response. The large number of repetitions poses a challenge to assessing cognitive functions due to neural adaptation. Here we show that continuous, non-repetitive speech, lasting several minutes, may be employed to measure the ABR. Because the speech is not repeated during the experiment, the precise temporal form of the ABR cannot be determined. We show, however, that important structural features of the ABR can nevertheless be inferred. In particular, the brainstem responds at the fundamental frequency of the speech signal, and this response is modulated by the envelope of the voiced parts of speech. We accordingly introduce a novel measure that assesses the ABR as modulated by the speech envelope, at the fundamental frequency of speech and at the characteristic latency of the response. This measure has a high signal-to-noise ratio and can hence be employed effectively to measure the ABR to continuous speech. We use this novel measure to show that the ABR is weaker to intelligible speech than to unintelligible, time-reversed speech. The methods presented here can be employed for further research on speech processing in the auditory brainstem and can lead to the development of future clinical diagnosis of brainstem function. PMID:27303286

  16. From prosodic structure to acoustic saliency: A fMRI investigation of speech rate, clarity, and emphasis

    NASA Astrophysics Data System (ADS)

    Golfinopoulos, Elisa

    Acoustic variability in fluent speech can arise at many stages in speech production planning and execution. For example, at the phonological encoding stage, the grouping of phonemes into syllables determines which segments are coarticulated and, by consequence, segment-level acoustic variation. Likewise, phonetic encoding, which determines the spatiotemporal extent of articulatory gestures, will affect the acoustic detail of segments. Functional magnetic resonance imaging (fMRI) was used to measure brain activity of fluent adult speakers in four speaking conditions: fast, normal, clear, and emphatic (or stressed) speech. These speech manner changes typically result in acoustic variations that do not change the lexical or semantic identity of productions but do affect the acoustic saliency of phonemes, syllables and/or words. Acoustic responses recorded inside the scanner were assessed quantitatively using eight acoustic measures, and sentence duration was used as a covariate of non-interest in the neuroimaging analysis. Compared to normal speech, emphatic speech was characterized acoustically by a greater difference between stressed and unstressed vowels in intensity, duration, and fundamental frequency, and neurally by increased activity in right middle premotor cortex and supplementary motor area, and bilateral primary sensorimotor cortex. These findings are consistent with right-lateralized motor planning of prosodic variation in emphatic speech. Clear speech involved an increase in average vowel and sentence durations and average vowel spacing, along with increased activity in left middle premotor cortex and bilateral primary sensorimotor cortex. These findings are consistent with an increased reliance on feedforward control, resulting in hyper-articulation, under clear as compared to normal speech. Fast speech was characterized acoustically by reduced sentence duration and average vowel spacing, and neurally by increased activity in left anterior frontal operculum and posterior dorsal inferior frontal gyrus pars opercularis -- regions thought to be involved in sequencing and phrase-level structural processing. Taken together, these findings identify the acoustic and neural correlates of adjusting speech manner and underscore the different processing stages that can contribute to acoustic variability in fluent sentence production.

  17. Speech coding

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal becoming corrupted by noise, cross-talk, and distortion. Long-haul transmissions, which use repeaters to compensate for the loss in signal strength on transmission links, also increase the associated noise and distortion. Digital transmission, on the other hand, is relatively immune to noise, cross-talk, and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely on the basis of a binary decision. End-to-end performance of a digital link therefore becomes essentially independent of the length and operating frequency bands of the link, so from a transmission point of view digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible, and secure services that can carry a multitude of signal types (such as voice, data, and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term speech coding refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters obtained by analyzing the speech signal. In either case, the codes are transmitted to the distant end, where speech is reconstructed or synthesized using the received set of codes. A more generic term that is often used interchangeably with speech coding is voice coding; it is more generic in the sense that the coding techniques are equally applicable to any voice signal, whether or not it carries intelligible information, as the term speech implies. Other commonly used terms are speech compression and voice compression, since the fundamental idea behind speech coding is to reduce (compress) the transmission rate (or, equivalently, the bandwidth) and/or reduce storage requirements. In this document the terms speech and voice are used interchangeably.
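
    To make the waveform-versus-parametric distinction concrete, here is a minimal sketch of the waveform-coding side: mu-law companding of PCM samples, the scheme standardized in ITU-T G.711. The 8-bit quantization is standard, but the code below is a simplified illustration rather than a conforming G.711 codec:

        import numpy as np

        MU = 255.0  # companding constant used by G.711 mu-law

        def mu_law_encode(x):
            """Compress samples in [-1, 1] and quantize to 8-bit codes."""
            y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
            return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

        def mu_law_decode(codes):
            y = codes.astype(np.float64) / 255.0 * 2.0 - 1.0
            return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

        t = np.arange(0, 0.01, 1 / 8000.0)          # 8 kHz telephony sampling rate
        x = 0.5 * np.sin(2 * np.pi * 300 * t)       # toy stand-in for a speech waveform
        err = x - mu_law_decode(mu_law_encode(x))
        print("max reconstruction error: %.4f" % np.max(np.abs(err)))

    Parametric coders, by contrast, transmit analysis parameters (for example, LPC coefficients and excitation information) instead of the quantized waveform itself.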

  18. The Role of Clinical Experience in Speech-Language Pathologists' Perception of Subphonemic Detail in Children's Speech

    PubMed Central

    Munson, Benjamin; Johnson, Julie M.; Edwards, Jan

    2013-01-01

    Purpose This study examined whether experienced speech-language pathologists differ from inexperienced people in their perception of phonetic detail in children's speech. Method Convenience samples comprising 21 experienced speech-language pathologists and 21 inexperienced listeners participated in a series of tasks in which they made visual-analog scale (VAS) ratings of children's natural productions of target /s/-/θ/, /t/-/k/, and /d/-/ɡ/ in word-initial position. Listeners rated the perceptual distance between individual productions and ideal productions. Results The experienced listeners' ratings differed from inexperienced listeners' in four ways: they had higher intra-rater reliability, they showed less bias toward a more frequent sound, their ratings were more closely related to the acoustic characteristics of the children's speech, and their responses were related to a different set of predictor variables. Conclusions Results suggest that experience working as a speech-language pathologist leads to better perception of phonetic detail in children's speech. Limitations and future research are discussed. PMID:22230182

  19. Oral Motor Abilities Are Task Dependent: A Factor Analytic Approach to Performance Rate.

    PubMed

    Staiger, Anja; Schölderle, Theresa; Brendel, Bettina; Bötzel, Kai; Ziegler, Wolfram

    2017-01-01

    Measures of performance rates in speech-like or volitional nonspeech oral motor tasks are frequently used to draw inferences about articulation rate abnormalities in patients with neurologic movement disorders. The study objective was to investigate the structural relationship between rate measures of speech and of oral motor behaviors different from speech. A total of 130 patients with neurologic movement disorders and 130 healthy subjects participated in the study. Rate data was collected for oral reading (speech), rapid syllable repetition (speech-like), and rapid single articulator movements (nonspeech). The authors used factor analysis to determine whether the different rate variables reflect the same or distinct constructs. The behavioral data were most appropriately captured by a measurement model in which the different task types loaded onto separate latent variables. The data on oral motor performance rates show that speech tasks and oral motor tasks such as rapid syllable repetition or repetitive single articulator movements measure separate traits.

  20. Spectral context affects temporal processing in awake auditory cortex

    PubMed Central

    Beitel, Ralph E.; Vollmer, Maike; Heiser, Marc A; Schreiner, Christoph E.

    2013-01-01

    Amplitude modulation encoding is critical for human speech perception and complex sound processing in general. The modulation transfer function (MTF) is a staple of auditory psychophysics, and has been shown to predict speech intelligibility performance in a range of adverse listening conditions and hearing impairments, including cochlear implant-supported hearing. Although both tonal and broadband carriers have been employed in psychophysical studies of modulation detection and discrimination, relatively little is known about differences in the cortical representation of such signals. We obtained MTFs in response to sinusoidal amplitude modulation (SAM) for both narrowband tonal carriers and 2-octave bandwidth noise carriers in the auditory core of awake squirrel monkeys. MTFs spanning modulation frequencies from 4 to 512 Hz were obtained using 16-channel linear recording arrays sampling across all cortical laminae. Carrier frequency for tonal SAM and center frequency for noise SAM were set at the estimated best frequency for each penetration. Changes in carrier type affected both rate and temporal MTFs in many neurons. Using spike discrimination techniques, we found that discrimination of modulation frequency was significantly better for tonal SAM than for noise SAM, though the differences were modest at the population level. Moreover, spike trains elicited by tonal and noise SAM could be readily discriminated in most cases. Collectively, our results reveal remarkable sensitivity to the spectral content of modulated signals, and indicate substantial interdependence between temporal and spectral processing in neurons of the core auditory cortex. PMID:23719811
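
    The two carrier types compared in this study are straightforward to reproduce. A minimal sketch follows, with illustrative parameter choices (a 4 kHz carrier/center frequency standing in for a penetration's best frequency, 16 Hz modulation, full modulation depth):

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        fs, dur, fm, depth = 44100, 0.5, 16.0, 1.0
        t = np.arange(0, dur, 1 / fs)
        modulator = 1.0 + depth * np.sin(2 * np.pi * fm * t)

        # SAM with a tonal carrier at the (assumed) best frequency.
        tonal_sam = modulator * np.sin(2 * np.pi * 4000 * t)

        # SAM with a 2-octave noise carrier centered near 4 kHz (2-8 kHz band).
        rng = np.random.default_rng(0)
        sos = butter(4, [2000, 8000], btype="bandpass", fs=fs, output="sos")
        noise_sam = modulator * sosfiltfilt(sos, rng.standard_normal(len(t)))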

  1. Implications of High-Frequency Cochlear Dead Regions for Fitting Hearing Aids to Adults with Mild to Moderately-Severe Hearing Loss

    PubMed Central

    Cox, Robyn M.; Johnson, Jani A.; Alexander, Genevieve C.

    2012-01-01

    Short Summary It has been suggested that existence of high-frequency cochlear dead regions (DRs) has implications for hearing aid fitting, and that the optimal amount of high-frequency gain is reduced for these patients. This investigation used laboratory and field measurements to examine the effectiveness of reduced high-frequency gain in typical hearing aid users with high-frequency DRs. Both types of data revealed that speech understanding was better with the evidence-based prescription than with reduced high-frequency gain, and that this was seen for listeners with and without DRs. Nevertheless, subjects did not always prefer the amplification condition that produced better speech understanding. PMID:22555183

  2. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives.

    PubMed

    Monson, Brian B; Lotto, Andrew J; Story, Brad H

    2012-09-01

    The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech.
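
    For readers wanting to replicate the band analysis, here is a minimal sketch of third-octave band levels computed from a long-term average power spectrum; the base-2 band spacing is standard, but the synthetic input spectrum and the 100 Hz to 16 kHz range are illustrative assumptions:

        import numpy as np

        def third_octave_levels(freqs, power, f_min=100.0, f_max=16000.0):
            """Sum power in third-octave bands and return dB levels per band."""
            levels, fc = {}, f_min
            while fc <= f_max:
                lo, hi = fc / 2 ** (1 / 6), fc * 2 ** (1 / 6)   # band edges
                band = (freqs >= lo) & (freqs < hi)
                if band.any():
                    levels[round(fc)] = 10 * np.log10(power[band].sum() + 1e-20)
                fc *= 2 ** (1 / 3)                               # next band center
            return levels

        freqs = np.linspace(0, 22050, 4096)
        power = 1.0 / (1.0 + (freqs / 500.0) ** 2)   # toy spectrum falling with frequency
        for fc, db in third_octave_levels(freqs, power).items():
            print("%6d Hz band: %6.1f dB" % (fc, db))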

  3. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech.

    PubMed

    Ben-David, Boaz M; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H H M

    2016-02-01

    Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5 discrete emotions (anger, fear, happiness, sadness, and neutral) presented in prosody and semantics. Listeners were asked to rate the sentence as a whole, integrating both speech channels, or to focus on one channel only (prosody or semantics). We observed supremacy of congruency, failure of selective attention, and prosodic dominance. Supremacy of congruency means that a sentence that presents the same emotion in both speech channels was rated highest; failure of selective attention means that listeners were unable to selectively attend to one channel when instructed; and prosodic dominance means that prosodic information plays a larger role than semantics in processing emotional speech. Emotional prosody and semantics are separate but not separable channels, and it is difficult to perceive one without the influence of the other. Our findings indicate that the Test for Rating of Emotions in Speech can reveal specific aspects in the processing of emotional speech and may in the future prove useful for understanding emotion-processing deficits in individuals with pathologies.

  4. The role of first formant information in simulated electro-acoustic hearing.

    PubMed

    Verschuur, Carl; Boland, Conor; Frost, Emily; Constable, Jack

    2013-06-01

    Cochlear implant (CI) recipients with residual hearing show improved performance with the addition of low-frequency acoustic stimulation (electro-acoustic stimulation, EAS). The present study sought to determine whether a synthesized first formant (F1) signal provided benefit to speech recognition in simulated EAS hearing and to compare such benefit with that from other low-frequency signals. A further aim was to determine if F1 amplitude or frequency was more important in determining benefit and if F1 benefit varied with formant bandwidth. In two experiments, sentence recordings from a male speaker were processed via a simulation of a partial insertion CI, and presented to normal hearing listeners in combination with various low-frequency signals, including a tone tracking fundamental frequency (F0), low-pass filtered speech, and signals based on F1 estimation. A simulated EAS benefit was found with F1 signals, and was similar to the benefit from F0 or low-pass filtered speech. The benefit did not differ significantly with the narrowing or widening of the F1 bandwidth. The benefit from low-frequency envelope signals was significantly less than the benefit from any low-frequency signal containing fine frequency information. Results indicate that F1 provides a benefit in simulated EAS hearing but low frequency envelope information is less important than low frequency fine structure in determining such benefit.

  5. Sentence intelligibility during segmental interruption and masking by speech-modulated noise: Effects of age and hearing loss

    PubMed Central

    Fogerty, Daniel; Ahlstrom, Jayne B.; Bologna, William J.; Dubno, Judy R.

    2015-01-01

    This study investigated how single-talker modulated noise impacts consonant and vowel cues to sentence intelligibility. Younger normal-hearing, older normal-hearing, and older hearing-impaired listeners completed speech recognition tests. All listeners received spectrally shaped speech matched to their individual audiometric thresholds to ensure sufficient audibility, with the exception of a second younger listener group who received spectral shaping that matched the mean audiogram of the hearing-impaired listeners. Results demonstrated minimal declines in intelligibility for older listeners with normal hearing and more evident declines for older hearing-impaired listeners, possibly related to impaired temporal processing. A correlational analysis suggests a common underlying ability to process information during vowels that is predictive of speech-in-modulated-noise abilities. In contrast, the ability to use consonant cues appears specific to the particular characteristics of the noise and interruption. Performance declines for older listeners were mostly confined to consonant conditions. Spectral shaping accounted for the primary contributions of audibility. However, comparison with the young spectral controls who received identical spectral shaping suggests that this procedure may reduce wideband temporal modulation cues due to frequency-specific amplification that affected high-frequency consonants more than low-frequency vowels. These spectral changes may impact speech intelligibility in certain modulation masking conditions. PMID:26093436

  6. A real-time phoneme counting algorithm and application for speech rate monitoring.

    PubMed

    Aharonson, Vered; Aharonson, Eran; Raichlin-Levi, Katia; Sotzianu, Aviv; Amir, Ofer; Ovadia-Blechman, Zehava

    2017-03-01

    Adults who stutter can learn to control and improve their speech fluency by modifying their speaking rate. Existing speech therapy technologies can assist this practice by monitoring speaking rate and providing feedback to the patient, but cannot provide an accurate, quantitative measurement of speaking rate. Moreover, most technologies are too complex and costly to be used for home practice. We developed an algorithm and a smartphone application that monitor a patient's speaking rate in real time and provide user-friendly feedback to both patient and therapist. Our speaking rate computation is performed by a phoneme counting algorithm which implements spectral transition measure extraction to estimate phoneme boundaries. The algorithm is implemented in real time in a mobile application that presents its results in a user-friendly interface. The application incorporates two modes: one provides the patient with visual feedback of his/her speech rate for self-practice and another provides the speech therapist with recordings, speech rate analysis and tools to manage the patient's practice. The algorithm's phoneme counting accuracy was validated on ten healthy subjects who read a paragraph at slow, normal and fast paces, and was compared to manual counting by speech experts. Test-retest and intra-counter reliability were assessed. Preliminary results indicate differences of -4% to 11% between automatic and human phoneme counting. Differences were largest for slow speech. The application can thus provide reliable, user-friendly, real-time feedback for speaking rate control practice. Copyright © 2017 Elsevier Inc. All rights reserved.
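
    The phoneme-counting idea lends itself to a compact illustration. The sketch below computes a simple spectral transition measure (frame-to-frame change in the log-magnitude spectrum) and counts its peaks as phoneme-boundary candidates; the frame sizes, threshold, and peak-picking rule are assumptions for illustration, not the authors' exact algorithm:

        import numpy as np
        from scipy.signal import stft, find_peaks

        def speaking_rate(signal, fs, frame_s=0.025, hop_s=0.010):
            nperseg = int(frame_s * fs)
            hop = int(hop_s * fs)
            _, _, spec = stft(signal, fs=fs, nperseg=nperseg, noverlap=nperseg - hop)
            logmag = np.log(np.abs(spec) + 1e-10)              # (freq, frames)
            # Spectral transition measure: RMS frame-to-frame spectral change.
            stm = np.sqrt(np.mean(np.diff(logmag, axis=1) ** 2, axis=0))
            peaks, _ = find_peaks(stm,
                                  height=stm.mean() + stm.std(),
                                  distance=max(1, int(0.04 / hop_s)))  # >= 40 ms apart
            return len(peaks) / (len(signal) / fs)             # boundary peaks per second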

  7. Speech intelligibility at high helium-oxygen pressures.

    PubMed

    Rothman, H B; Gelfand, R; Hollien, H; Lambertsen, C J

    1980-12-01

    Word-list intelligibility scores of unprocessed speech (mean of 4 subjects) were recorded in helium-oxygen atmospheres at stable pressures equivalent to 1600, 1400, 1200, 1000, 860, 690, 560, 392, and 200 fsw during Predictive Studies IV (1975) using wide-bandwidth condenser microphones (frequency responses not degraded by increased gas density). Intelligibility scores were substantially lower in helium-oxygen at 200 fsw than in air at 1 ATA, but there was little difference between 200 fsw and 1600 fsw. A previously documented prominent decrease in the intelligibility of speech between 200 and 600 fsw, attributed to helium and pressure, was probably due to degradation of microphone frequency response by high gas density.

  8. An Experimental Investigation of the Effect of Altered Auditory Feedback on the Conversational Speech of Adults Who Stutter

    ERIC Educational Resources Information Center

    Lincoln, Michelle; Packman, Ann; Onslow, Mark; Jones, Mark

    2010-01-01

    Purpose: To investigate the impact on percentage of syllables stuttered of various durations of delayed auditory feedback (DAF), levels of frequency-altered feedback (FAF), and masking auditory feedback (MAF) during conversational speech. Method: Eleven adults who stuttered produced 10-min conversational speech samples during a control condition…

  9. The Disfluent Speech of Bilingual Spanish-English Children: Considerations for Differential Diagnosis of Stuttering

    ERIC Educational Resources Information Center

    Byrd, Courtney T.; Bedore, Lisa M.; Ramos, Daniel

    2015-01-01

    Purpose: The primary purpose of this study was to describe the frequency and types of speech disfluencies that are produced by bilingual Spanish-English (SE) speaking children who do not stutter. The secondary purpose was to determine whether their disfluent speech is mediated by language dominance and/or language produced. Method: Spanish and…

  10. Results using the OPAL strategy in Mandarin speaking cochlear implant recipients.

    PubMed

    Vandali, Andrew E; Dawson, Pam W; Arora, Komal

    2017-01-01

    To evaluate the effectiveness of an experimental pitch-coding strategy for improving recognition of Mandarin lexical tone in cochlear implant (CI) recipients. Adult CI recipients were tested on recognition of Mandarin tones in quiet and speech-shaped noise at a signal-to-noise ratio of +10 dB; Mandarin sentence speech-reception threshold (SRT) in speech-shaped noise; and pitch discrimination of synthetic complex-harmonic tones in quiet. Two versions of the experimental strategy were examined: (OPAL) linear (1:1) mapping of fundamental frequency (F0) to the coded modulation rate; and (OPAL+) transposed mapping of high F0s to a lower coded rate. Outcomes were compared to results using the clinical ACE™ strategy. Five Mandarin speaking users of Nucleus® cochlear implants. A small but significant benefit in recognition of lexical tones was observed using OPAL compared to ACE in noise, but not in quiet, and not for OPAL+ compared to ACE or OPAL in quiet or noise. Sentence SRTs were significantly better using OPAL+ and comparable using OPAL to those using ACE. No differences in pitch discrimination thresholds were observed across strategies. OPAL can provide benefits to Mandarin lexical tone recognition in moderately noisy conditions and preserve perception of Mandarin sentences in challenging noise conditions.

  11. Duration of the speech disfluencies of beginning stutterers.

    PubMed

    Zebrowski, P M

    1991-06-01

    This study compared the duration of within-word disfluencies and the number of repeated units per instance of sound/syllable and whole-word repetitions of beginning stutterers to those produced by age- and sex-matched nonstuttering children. Subjects were 10 stuttering children [9 males and 1 female; mean age 4:1 (years:months); age range 3:2-5:1), and 10 nonstuttering children (9 males and 1 female; mean age 4:0; age range: 2:10-5:1). Mothers of the stuttering children reported that their children had been stuttering for 1 year or less. One 300-word conversational speech sample from each of the stuttering and nonstuttering children was analyzed for (a) mean duration of sound/syllable repetition and sound prolongation, (b) mean number of repeated units per instance of sound/syllable and whole-word repetition, and (c) various related measures of the frequency of all between- and within-word speech disfluencies. There were no significant between-group differences for either the duration of acoustically measured sound/syllable repetitions and sound prolongations or the number of repeated units per instance of sound/syllable and whole-word repetition. Unlike frequency and type of speech disfluency produced, average duration of within-word disfluencies and number of repeated units per repetition do not differentiate the disfluent speech of beginning stutterers and their nonstuttering peers. Additional analyses support findings from previous perceptual work that type and frequency of speech disfluency, not duration, are the principal characteristics listeners use in distinguishing these two talker groups.

  12. A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments

    PubMed Central

    Colburn, H. Steven

    2016-01-01

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. PMID:27698261

  13. A Binaural Grouping Model for Predicting Speech Intelligibility in Multitalker Environments.

    PubMed

    Mi, Jing; Colburn, H Steven

    2016-10-03

    Spatially separating speech maskers from target speech often leads to a large intelligibility improvement. Modeling this phenomenon has long been of interest to binaural-hearing researchers for uncovering brain mechanisms and for improving signal-processing algorithms in hearing-assistive devices. Much of the previous binaural modeling work focused on the unmasking enabled by binaural cues at the periphery, and little quantitative modeling has been directed toward the grouping or source-separation benefits of binaural processing. In this article, we propose a binaural model that focuses on grouping, specifically on the selection of time-frequency units that are dominated by signals from the direction of the target. The proposed model uses Equalization-Cancellation (EC) processing with a binary decision rule to estimate a time-frequency binary mask. EC processing is carried out to cancel the target signal and the energy change between the EC input and output is used as a feature that reflects target dominance in each time-frequency unit. The processing in the proposed model requires little computational resources and is straightforward to implement. In combination with the Coherence-based Speech Intelligibility Index, the model is applied to predict the speech intelligibility data measured by Marrone et al. The predicted speech reception threshold matches the pattern of the measured data well, even though the predicted intelligibility improvements relative to the colocated condition are larger than some of the measured data, which may reflect the lack of internal noise in this initial version of the model. © The Author(s) 2016.
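
    The EC-based mask estimation can be sketched compactly for the simplified case described here: a frontal target, identical at the two ears, with maskers that are not. The STFT settings and the 3 dB decision threshold below are assumptions of this sketch, not the published model's parameters:

        import numpy as np
        from scipy.signal import stft, istft

        def ec_binary_mask(left, right, fs, threshold_db=3.0, nperseg=512):
            """Label as target-dominated the units where cancelling the
            (frontal) target removes most of the energy."""
            _, _, L = stft(left, fs=fs, nperseg=nperseg)
            _, _, R = stft(right, fs=fs, nperseg=nperseg)
            input_energy = 0.5 * (np.abs(L) ** 2 + np.abs(R) ** 2)
            residual = 0.5 * np.abs(L - R) ** 2          # EC step cancels the target
            drop_db = 10 * np.log10((input_energy + 1e-12) / (residual + 1e-12))
            return drop_db > threshold_db                # estimated binary mask

        def apply_mask(left, fs, mask, nperseg=512):
            _, _, L = stft(left, fs=fs, nperseg=nperseg)
            _, out = istft(L * mask, fs=fs, nperseg=nperseg)
            return out

    In the full model, the equalization step additionally applies interaural time and level adjustments before cancellation, and the resulting mask is combined with the Coherence-based Speech Intelligibility Index to predict intelligibility.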

  14. Musicians change their tune: how hearing loss alters the neural code.

    PubMed

    Parbery-Clark, Alexandra; Anderson, Samira; Kraus, Nina

    2013-08-01

    Individuals with sensorineural hearing loss have difficulty understanding speech, especially in background noise. This deficit remains even when audibility is restored through amplification, suggesting that mechanisms beyond a reduction in peripheral sensitivity contribute to the perceptual difficulties associated with hearing loss. Given that normal-hearing musicians have enhanced auditory perceptual skills, including speech-in-noise perception, coupled with heightened subcortical responses to speech, we aimed to determine whether similar advantages could be observed in middle-aged adults with hearing loss. Results indicate that musicians with hearing loss, despite self-perceptions of average performance for understanding speech in noise, have a greater ability to hear in noise relative to nonmusicians. This is accompanied by more robust subcortical encoding of sound (e.g., stimulus-to-response correlations and response consistency) as well as more resilient neural responses to speech in the presence of background noise (e.g., neural timing). Musicians with hearing loss also demonstrate unique neural signatures of spectral encoding relative to nonmusicians: enhanced neural encoding of the speech-sound's fundamental frequency but not of its upper harmonics. This stands in contrast to previous outcomes in normal-hearing musicians, who have enhanced encoding of the harmonics but not the fundamental frequency. Taken together, our data suggest that although hearing loss modifies a musician's spectral encoding of speech, the musician advantage for perceiving speech in noise persists in a hearing-impaired population by adaptively strengthening underlying neural mechanisms for speech-in-noise perception. Copyright © 2013 Elsevier B.V. All rights reserved.

  15. Automatic speech recognition research at NASA-Ames Research Center

    NASA Technical Reports Server (NTRS)

    Coler, Clayton R.; Plummer, Robert P.; Huff, Edward M.; Hitchcock, Myron H.

    1977-01-01

    A trainable acoustic pattern recognizer manufactured by Scope Electronics is presented. The voice command system (VCS) encodes speech by sampling 16 bandpass filters with center frequencies in the range from 200 to 5000 Hz. Variations in speaking rate are compensated for by a compression algorithm that subdivides each utterance into eight subintervals in such a way that the amount of spectral change within each subinterval is the same. The recorded filter values within each subinterval are then reduced to a 15-bit representation, giving a 120-bit encoding for each utterance. The VCS incorporates a simple recognition algorithm that utilizes five training samples of each word in a vocabulary of up to 24 words. The recognition rate of approximately 85 percent correct for untrained speakers and 94 percent correct for trained speakers was not considered adequate for flight systems use. Therefore, the built-in recognition algorithm was disabled, and the VCS was modified to transmit 120-bit encodings to an external computer for recognition.
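
    The VCS's rate-compensation step, dividing each utterance into eight subintervals of equal cumulative spectral change, can be sketched directly; the filterbank frames below are synthetic stand-ins for the recognizer's 16 bandpass-filter samples:

        import numpy as np

        def equal_change_boundaries(frames, n_intervals=8):
            """frames: (n_frames, n_filters) band energies. Returns the interior
            frame indices that split the utterance into intervals containing
            equal amounts of cumulative spectral change."""
            change = np.linalg.norm(np.diff(frames, axis=0), axis=1)
            cum = np.concatenate(([0.0], np.cumsum(change)))
            targets = np.linspace(0.0, cum[-1], n_intervals + 1)
            return np.searchsorted(cum, targets[1:-1])

        rng = np.random.default_rng(1)
        traj = np.cumsum(rng.standard_normal((50, 16)), axis=0)  # toy 16-band trajectory
        slow = np.repeat(traj, 2, axis=0)                        # same utterance, half speed
        print(equal_change_boundaries(traj))
        print(equal_change_boundaries(slow))   # roughly doubled indices: rate-invariant cuts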

  16. Speech Rate Normalization and Phonemic Boundary Perception in Cochlear-Implant Users

    ERIC Educational Resources Information Center

    Jaekel, Brittany N.; Newman, Rochelle S.; Goupell, Matthew J.

    2017-01-01

    Purpose: Normal-hearing (NH) listeners rate normalize, temporarily remapping phonemic category boundaries to account for a talker's speech rate. It is unknown if adults who use auditory prostheses called cochlear implants (CI) can rate normalize, as CIs transmit degraded speech signals to the auditory nerve. Ineffective adjustment to rate…

  17. The effect of presentation level and stimulation rate on speech perception and modulation detection for cochlear implant users.

    PubMed

    Brochier, Tim; McDermott, Hugh J; McKay, Colette M

    2017-06-01

    In order to improve speech understanding for cochlear implant users, it is important to maximize the transmission of temporal information. The combined effects of stimulation rate and presentation level on temporal information transfer and speech understanding remain unclear. The present study systematically varied presentation level (60, 50, and 40 dBA) and stimulation rate [500 and 2400 pulses per second per electrode (pps)] in order to observe how the effect of rate on speech understanding changes for different presentation levels. Speech recognition in quiet and noise, and acoustic amplitude modulation detection thresholds (AMDTs) were measured with acoustic stimuli presented to speech processors via direct audio input (DAI). With the 500 pps processor, results showed significantly better performance for consonant-vowel nucleus-consonant words in quiet, and a reduced effect of noise on sentence recognition. However, no rate or level effect was found for AMDTs, perhaps partly because of amplitude compression in the sound processor. AMDTs were found to be strongly correlated with the effect of noise on sentence perception at low levels. These results indicate that AMDTs, at least when measured with the CP910 Freedom speech processor via DAI, explain between-subject variance of speech understanding, but do not explain within-subject variance for different rates and levels.

  18. Auditory rehabilitation after stroke: treatment of auditory processing disorders in stroke patients with personal frequency-modulated (FM) systems.

    PubMed

    Koohi, Nehzat; Vickers, Deborah; Chandrashekar, Hoskote; Tsang, Benjamin; Werring, David; Bamiou, Doris-Eva

    2017-03-01

    Auditory disability due to impaired auditory processing (AP) despite normal pure-tone thresholds is common after stroke, and it leads to isolation, reduced quality of life and physical decline. There are currently no proven remedial interventions for AP deficits in stroke patients. This is the first study to investigate the benefits of personal frequency-modulated (FM) systems in stroke patients with disordered AP. Fifty stroke patients had baseline audiological assessments, AP tests and completed the (modified) Amsterdam Inventory for Auditory Disability and Hearing Handicap Inventory for Elderly questionnaires. Nine out of these 50 patients were diagnosed with disordered AP based on severe deficits in understanding speech in background noise but with normal pure-tone thresholds. These nine patients underwent spatial speech-in-noise testing in a sound-attenuating chamber (the "crescent of sound") with and without FM systems. The signal-to-noise ratio (SNR) for 50% correct speech recognition performance was measured with speech presented from 0° azimuth and competing babble from ±90° azimuth. Spatial release from masking (SRM) was defined as the difference between SNRs measured with co-located speech and babble and SNRs measured with spatially separated speech and babble. The SRM significantly improved when babble was spatially separated from target speech, while the patients had the FM systems in their ears compared to without the FM systems. Personal FM systems may substantially improve speech-in-noise deficits in stroke patients who are not eligible for conventional hearing aids. FMs are feasible in stroke patients and show promise to address impaired AP after stroke. Implications for Rehabilitation This is the first study to investigate the benefits of personal frequency-modulated (FM) systems in stroke patients with disordered AP. All cases significantly improved speech perception in noise with the FM systems, when noise was spatially separated from the speech signal by 90° compared with unaided listening. Personal FM systems are feasible in stroke patients, and may be of benefit in just under 20% of this population, who are not eligible for conventional hearing aids.

  19. Out-of-synchrony speech entrainment in developmental dyslexia.

    PubMed

    Molinaro, Nicola; Lizarazu, Mikel; Lallier, Marie; Bourguignon, Mathieu; Carreiras, Manuel

    2016-08-01

    Developmental dyslexia is a reading disorder often characterized by reduced awareness of speech units. Whether the neural source of this phonological disorder in dyslexic readers results from the malfunctioning of the primary auditory system or damaged feedback communication between higher-order phonological regions (i.e., left inferior frontal regions) and the auditory cortex is still under dispute. Here we recorded magnetoencephalographic (MEG) signals from 20 dyslexic readers and 20 age-matched controls while they were listening to ∼10-s-long spoken sentences. Compared to controls, dyslexic readers had (1) an impaired neural entrainment to speech in the delta band (0.5-1 Hz); (2) a reduced delta synchronization in both the right auditory cortex and the left inferior frontal gyrus; and (3) an impaired feedforward functional coupling between neural oscillations in the right auditory cortex and the left inferior frontal regions. This shows that during speech listening, individuals with developmental dyslexia present reduced neural synchrony to low-frequency speech oscillations in primary auditory regions that hinders higher-order speech processing steps. The present findings, thus, strengthen proposals assuming that improper low-frequency acoustic entrainment affects speech sampling. This low speech-brain synchronization has the strong potential to cause severe consequences for both phonological and reading skills. Interestingly, the reduced speech-brain synchronization in dyslexic readers compared to normal readers (and its higher-order consequences across the speech processing network) appears preserved through the development from childhood to adulthood. Thus, the evaluation of speech-brain synchronization could possibly serve as a diagnostic tool for early detection of children at risk of dyslexia. Hum Brain Mapp 37:2767-2783, 2016. © 2016 Wiley Periodicals, Inc.

  20. Determining the energetic and informational components of speech-on-speech masking

    PubMed Central

    Kidd, Gerald; Mason, Christine R.; Swaminathan, Jayaganesh; Roverud, Elin; Clayton, Kameron K.; Best, Virginia

    2016-01-01

    Identification of target speech was studied under masked conditions consisting of two or four independent speech maskers. In the reference conditions, the maskers were colocated with the target, the masker talkers were the same sex as the target, and the masker speech was intelligible. The comparison conditions, intended to provide release from masking, included different-sex target and masker talkers, time-reversal of the masker speech, and spatial separation of the maskers from the target. Significant release from masking was found for all comparison conditions. To determine whether these reductions in masking could be attributed to differences in energetic masking, ideal time-frequency segregation (ITFS) processing was applied so that the time-frequency units where the masker energy dominated the target energy were removed. The remaining target-dominated “glimpses” were reassembled as the stimulus. Speech reception thresholds measured using these resynthesized ITFS-processed stimuli were the same for the reference and comparison conditions supporting the conclusion that the amount of energetic masking across conditions was the same. These results indicated that the large release from masking found under all comparison conditions was due primarily to a reduction in informational masking. Furthermore, the large individual differences observed generally were correlated across the three masking release conditions. PMID:27475139
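    The ITFS step above amounts to applying an ideal binary mask computed from the separately available target and masker signals. The paper used its own time-frequency decomposition and criterion; the STFT-based sketch below, with an assumed 0 dB local criterion and FFT parameters, only illustrates the principle of resynthesizing the target-dominated glimpses:

    ```python
    import numpy as np
    from scipy.signal import stft, istft

    def itfs_glimpses(target, masker, fs, lc_db=0.0, nperseg=512):
        """Resynthesize the target-dominated 'glimpses' of a speech mixture.

        target, masker: equal-length float arrays (known separately, hence
        'ideal'); lc_db: local criterion in dB for keeping a T-F unit.
        """
        _, _, T = stft(target, fs, nperseg=nperseg)
        _, _, M = stft(masker, fs, nperseg=nperseg)
        _, _, X = stft(target + masker, fs, nperseg=nperseg)

        # Keep only the units where the target energy dominates the masker.
        local_snr_db = 20.0 * (np.log10(np.abs(T) + 1e-12)
                               - np.log10(np.abs(M) + 1e-12))
        mask = (local_snr_db > lc_db).astype(float)

        _, glimpsed = istft(X * mask, fs, nperseg=nperseg)
        return glimpsed
    ```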

  1. Glove-TalkII--a neural-network interface which maps gestures to parallel formant speech synthesizer controls.

    PubMed

    Fels, S S; Hinton, G E

    1998-01-01

    Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
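    The vowel/consonant division of labor described above is essentially a small mixture of experts: a gating network produces a weight that blends the synthesizer control parameters proposed by the vowel and consonant networks. A schematic sketch with toy stand-ins (the real Glove-TalkII networks are trained multilayer networks over glove, tracker, and pedal inputs):

    ```python
    import numpy as np

    def gated_controls(x, vowel_net, consonant_net, gating_net):
        """Blend two expert networks into 10 formant-synthesizer controls.

        gating_net maps the gesture features x to a scalar in [0, 1]
        (~1 for vowel-like hand configurations).
        """
        g = gating_net(x)
        return g * vowel_net(x) + (1.0 - g) * consonant_net(x)

    # Toy stand-in networks: random linear experts, logistic gate.
    rng = np.random.default_rng(0)
    Wv, Wc, wg = (rng.normal(size=(10, 16)), rng.normal(size=(10, 16)),
                  rng.normal(size=16))
    controls = gated_controls(
        rng.normal(size=16),
        vowel_net=lambda x: Wv @ x,
        consonant_net=lambda x: Wc @ x,
        gating_net=lambda x: 1.0 / (1.0 + np.exp(-wg @ x)),
    )
    ```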

  2. Intracranial mapping of auditory perception: event-related responses and electrocortical stimulation.

    PubMed

    Sinai, A; Crone, N E; Wied, H M; Franaszczuk, P J; Miglioretti, D; Boatman-Reich, D

    2009-01-01

    We compared intracranial recordings of auditory event-related responses with electrocortical stimulation mapping (ESM) to determine their functional relationship. Intracranial recordings and ESM were performed, using speech and tones, in adult epilepsy patients with subdural electrodes implanted over lateral left cortex. Evoked N1 responses and induced spectral power changes were obtained by trial averaging and time-frequency analysis. ESM impaired perception and comprehension of speech, not tones, at electrode sites in the posterior temporal lobe. There was high spatial concordance between ESM sites critical for speech perception and the largest spectral power (100% concordance) and N1 (83%) responses to speech. N1 responses showed good sensitivity (0.75) and specificity (0.82), but poor positive predictive value (0.32). Conversely, increased high-frequency power (>60Hz) showed high specificity (0.98), but poorer sensitivity (0.67) and positive predictive value (0.67). Stimulus-related differences were observed in the spatial-temporal patterns of event-related responses. Intracranial auditory event-related responses to speech were associated with cortical sites critical for auditory perception and comprehension of speech. These results suggest that the distribution and magnitude of intracranial auditory event-related responses to speech reflect the functional significance of the underlying cortical regions and may be useful for pre-surgical functional mapping.

  3. Intracranial mapping of auditory perception: Event-related responses and electrocortical stimulation

    PubMed Central

    Sinai, A.; Crone, N.E.; Wied, H.M.; Franaszczuk, P.J.; Miglioretti, D.; Boatman-Reich, D.

    2010-01-01

    Objective We compared intracranial recordings of auditory event-related responses with electrocortical stimulation mapping (ESM) to determine their functional relationship. Methods Intracranial recordings and ESM were performed, using speech and tones, in adult epilepsy patients with subdural electrodes implanted over lateral left cortex. Evoked N1 responses and induced spectral power changes were obtained by trial averaging and time-frequency analysis. Results ESM impaired perception and comprehension of speech, not tones, at electrode sites in the posterior temporal lobe. There was high spatial concordance between ESM sites critical for speech perception and the largest spectral power (100% concordance) and N1 (83%) responses to speech. N1 responses showed good sensitivity (0.75) and specificity (0.82), but poor positive predictive value (0.32). Conversely, increased high-frequency power (>60 Hz) showed high specificity (0.98), but poorer sensitivity (0.67) and positive predictive value (0.67). Stimulus-related differences were observed in the spatial-temporal patterns of event-related responses. Conclusions Intracranial auditory event-related responses to speech were associated with cortical sites critical for auditory perception and comprehension of speech. Significance These results suggest that the distribution and magnitude of intracranial auditory event-related responses to speech reflect the functional significance of the underlying cortical regions and may be useful for pre-surgical functional mapping. PMID:19070540
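    For reference, the sensitivity, specificity, and positive predictive value quoted in both versions of this record follow the conventional definitions over per-site counts against the ESM gold standard; a minimal helper:

    ```python
    def diagnostic_metrics(tp, fp, tn, fn):
        """Agreement of an event-related marker with ESM across electrode sites.

        tp/fn: ESM-critical sites the marker does/doesn't flag;
        tn/fp: non-critical sites the marker rejects/flags.
        With such counts, values like sensitivity 0.75, specificity 0.82,
        and PPV 0.32 for the N1 follow directly.
        """
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        ppv = tp / (tp + fp)   # positive predictive value
        return sensitivity, specificity, ppv
    ```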

  4. Children with a cochlear implant: characteristics and determinants of speech recognition, speech-recognition growth rate, and speech production.

    PubMed

    Wie, Ona Bø; Falkenberg, Eva-Signe; Tvete, Ole; Tomblin, Bruce

    2007-05-01

    The objectives of the study were to describe the characteristics of the first 79 prelingually deaf cochlear implant users in Norway and to investigate to what degree the variation in speech recognition, speech-recognition growth rate, and speech production could be explained by the characteristics of the child, the cochlear implant, the family, and the educational setting. Data gathered longitudinally were analysed using descriptive statistics, multiple regression, and growth-curve analysis. The results show that more than 50% of the variation could be explained by these characteristics. Daily user-time, non-verbal intelligence, mode of communication, length of CI experience, and educational placement had the largest effects on the outcome. The results also indicate that children educated with a bilingual approach show better speech perception and a faster speech-perception growth rate as the focus on spoken language increases.

  5. LOOP - SIMULATION OF THE AUTOMATIC FREQUENCY CONTROL SUBSYSTEM OF A DIFFERENTIAL MINIMUM SHIFT KEYING RECEIVER

    NASA Technical Reports Server (NTRS)

    Davarian, F.

    1994-01-01

    The LOOP computer program was written to simulate the Automatic Frequency Control (AFC) subsystem of a Differential Minimum Shift Keying (DMSK) receiver with a bit rate of 2400 baud. The AFC simulated by LOOP is a first order loop configuration with a first order R-C filter. NASA has been investigating the concept of mobile communications based on low-cost, low-power terminals linked via geostationary satellites. Studies have indicated that low bit rate transmission is suitable for this application, particularly from the frequency and power conservation point of view. A bit rate of 2400 BPS is attractive due to its applicability to the linear predictive coding of speech. Input to LOOP includes the following: 1) the initial frequency error; 2) the double-sided loop noise bandwidth; 3) the filter time constants; 4) the amount of intersymbol interference; and 5) the bit energy to noise spectral density. LOOP output includes: 1) the bit number and the frequency error of that bit; 2) the computed mean of the frequency error; and 3) the standard deviation of the frequency error. LOOP is written in MS SuperSoft FORTRAN 77 for interactive execution and has been implemented on an IBM PC operating under PC DOS with a memory requirement of approximately 40K of 8 bit bytes. This program was developed in 1986.
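    As a rough illustration of what such a simulation computes, the sketch below steps a first-order AFC loop with a first-order RC filter bit by bit and reports the mean and standard deviation of the residual frequency error, mirroring LOOP's outputs. The discriminator model, loop gain, and noise injection here are illustrative assumptions, not the program's actual equations:

    ```python
    import numpy as np

    def simulate_afc(f_err0_hz, loop_gain, rc_tau_s, bit_rate=2400.0,
                     n_bits=2000, noise_std_hz=0.0, seed=0):
        """First-order AFC loop with an RC loop filter (a sketch)."""
        rng = np.random.default_rng(seed)
        dt = 1.0 / bit_rate                     # one update per bit
        alpha = dt / (rc_tau_s + dt)            # RC low-pass smoothing factor
        f_err, filt = f_err0_hz, 0.0
        history = np.empty(n_bits)
        for k in range(n_bits):
            measured = f_err + noise_std_hz * rng.standard_normal()
            filt += alpha * (measured - filt)   # first-order RC filter
            f_err -= loop_gain * filt * dt      # proportional loop correction
            history[k] = f_err
        return history.mean(), history.std(), history

    mean_err, std_err, _ = simulate_afc(f_err0_hz=200.0, loop_gain=800.0,
                                        rc_tau_s=0.01, noise_std_hz=20.0)
    ```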

  6. Voice-stress measure of mental workload

    NASA Technical Reports Server (NTRS)

    Alpert, Murray; Schneider, Sid J.

    1988-01-01

    In a planned experiment, male subjects between the ages of 18 and 50 will be required to produce speech while performing various tasks. Analysis of the speech produced should reveal which aspects of voice prosody are associated with increased workload. Preliminary results with two female subjects suggest a possible trend for voice frequency and amplitude to be higher, and for the variance of the voice frequency to be lower, in the high-workload condition.

  7. Adaptive method of recognition of signals for one and two-frequency signal system in the telephony on the background of speech

    NASA Astrophysics Data System (ADS)

    Kuznetsov, Michael V.

    2006-05-01

    Reliable interoperation of automatic telecommunication systems, including transmission systems in optical communication networks, requires dependable recognition of one- and two-frequency service signaling. Analysis of the temporal parameters of a received signal makes it possible to increase the reliability of detecting and recognizing service signals against a background of speech.

  8. Perceptual weighting of individual and concurrent cues for sentence intelligibility: Frequency, envelope, and fine structure

    PubMed Central

    Fogerty, Daniel

    2011-01-01

    The speech signal may be divided into frequency bands, each containing temporal properties of the envelope and fine structure. For maximal speech understanding, listeners must allocate their perceptual resources to the most informative acoustic properties. Understanding this perceptual weighting is essential for the design of assistive listening devices that need to preserve these important speech cues. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for sentence materials. Perceptual weights were obtained under two listening contexts: (1) when each acoustic property was presented individually and (2) when multiple acoustic properties were available concurrently. The processing method was designed to vary the availability of each acoustic property independently by adding noise at different levels. Perceptual weights were determined by correlating a listener’s performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated that weights were (1) equal when acoustic properties were presented individually and (2) biased toward envelope and mid-frequency information when multiple properties were available. Results suggest a complex interaction between the available acoustic properties and the listening context in determining how best to allocate perceptual resources when listening to speech in noise. PMID:21361454
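    The trial-by-trial correlational method described above can be sketched compactly: each cue's weight is estimated as the correlation between its availability across trials and the binary trial outcome. The variable shapes and the use of a point-biserial correlation are assumptions for illustration:

    ```python
    import numpy as np
    from scipy.stats import pointbiserialr

    def perceptual_weights(availability, correct):
        """Estimate one weight per acoustic property.

        availability: (n_trials, n_cues) array of how available each cue
            (envelope or fine structure in each band) was on a trial;
        correct: (n_trials,) binary array of listener performance.
        Relative correlation magnitudes serve as perceptual weights.
        """
        return np.array([pointbiserialr(correct, availability[:, j]).correlation
                         for j in range(availability.shape[1])])
    ```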

  9. Autonomic Nervous System Responses During Perception of Masked Speech may Reflect Constructs other than Subjective Listening Effort

    PubMed Central

    Francis, Alexander L.; MacPherson, Megan K.; Chandrasekaran, Bharath; Alvar, Ann M.

    2016-01-01

    Typically, understanding speech seems effortless and automatic. However, a variety of factors may, independently or interactively, make listening more effortful. Physiological measures may help to distinguish between the application of different cognitive mechanisms whose operation is perceived as effortful. In the present study, physiological and behavioral measures associated with task demand were collected along with behavioral measures of performance while participants listened to and repeated sentences. The goal was to measure psychophysiological reactivity associated with three degraded listening conditions, each of which differed in terms of the source of the difficulty (distortion, energetic masking, and informational masking), and therefore were expected to engage different cognitive mechanisms. These conditions were chosen to be matched for overall performance (keywords correct), and were compared to listening to unmasked speech produced by a natural voice. The three degraded conditions were: (1) unmasked speech produced by a computer speech synthesizer, (2) speech produced by a natural voice and masked by speech-shaped noise, and (3) speech produced by a natural voice and masked by two-talker babble. Masked conditions were both presented at a -8 dB signal-to-noise ratio (SNR), a level shown in previous research to result in comparable levels of performance for these stimuli and maskers. Performance was measured in terms of the proportion of key words identified correctly, and task demand or effort was quantified subjectively by self-report. Measures of psychophysiological reactivity included electrodermal (skin conductance) response frequency and amplitude, blood pulse amplitude, and pulse rate. Results suggest that the two masked conditions evoked stronger psychophysiological reactivity than did the two unmasked conditions, even when behavioral measures of listening performance and listeners' subjective perception of task demand were comparable across the three degraded conditions. PMID:26973564

  10. Prosody and Semantics Are Separate but Not Separable Channels in the Perception of Emotional Speech: Test for Rating of Emotions in Speech

    ERIC Educational Resources Information Center

    Ben-David, Boaz M.; Multani, Namita; Shakuf, Vered; Rudzicz, Frank; van Lieshout, Pascal H. H. M.

    2016-01-01

    Purpose: Our aim is to explore the complex interplay of prosody (tone of speech) and semantics (verbal content) in the perception of discrete emotions in speech. Method: We implement a novel tool, the Test for Rating of Emotions in Speech. Eighty native English speakers were presented with spoken sentences made of different combinations of 5…

  11. Tolerable hearing aid delays. V. Estimation of limits for open canal fittings.

    PubMed

    Stone, Michael A; Moore, Brian C J; Meisenbacher, Katrin; Derleth, Ralph P

    2008-08-01

    Open canal fittings are a popular alternative to close-fitting earmolds for use with patients whose low-frequency hearing is near normal. Open canal fittings reduce the occlusion effect but also provide little attenuation of external air-borne sounds. The wearer therefore receives a mixture of air-borne sound and amplified but delayed sound through the hearing aid. To explore systematically the effect of the mixing, we simulated with varying degrees of complexity the effects of both a hearing loss and a high-quality hearing aid programmed to compensate for that loss, and used normal-hearing participants to assess the processing. The off-line processing was intended to simulate the percept of listening to the speech of a single (external) talker. The effect of introducing a delay on a subjective measure of speech quality (disturbance rating on a scale from 1 to 7, 7 being maximal disturbance) was assessed using both a constant gain and a gain that varied across frequency. In three experiments we assessed the effects of different amounts of delay, maximum aid gain and rate of change of gain with frequency. The simulated hearing aids were chosen to be appropriate for typical mild to moderate high-frequency losses starting at 1 or 2 kHz. Two of the experiments used simulations of linear hearing aids, whereas the third used fast-acting multichannel wide-dynamic-range compression and a simulation of loudness recruitment. In one experiment, a condition was included in which spectral ripples produced by comb-filtering were partially removed using a digital filter. For linear hearing aids, disturbance increased progressively with increasing delay and with decreasing rate of change of gain; the effect of amount of gain was small when the gain varied across frequency. The effect of reducing spectral ripples was also small. When the simulation of dynamic processes was included (experiment 3), the pattern with delay remained similar, but disturbance increased with increasing gain. It is argued that this is mainly due to disturbance increasing with increasing simulated hearing loss, probably because of the dynamic processing involved in the hearing aid and recruitment simulation. A disturbance rating of 3 may be considered as just acceptable. This rating was reached for delays of about 5 and 6 msec, for simulated hearing losses starting at 2 and 1 kHz, respectively. The perceptual effect of reducing the spectral ripples produced by comb-filtering was small; the effect was greatest when the hearing aid gain was small and when the hearing loss started at a low frequency.
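    The disturbance being rated above originates in the comb-filtering produced when the delayed, amplified aid path mixes with the unattenuated direct sound in the open canal. A minimal sketch of that mixing (a flat gain is assumed; the experiments also used gains varying across frequency):

    ```python
    import numpy as np

    def open_fitting_mix(x, fs, delay_ms, gain_db):
        """Direct air-borne sound plus a delayed, amplified aid path.

        y[n] = x[n] + g * x[n - d] produces spectral ripples whose notches
        are spaced 1000/delay_ms Hz apart; ripple depth grows as the two
        paths approach equal level, which is why disturbance depends on
        both the delay and the gain.
        """
        d = int(round(fs * delay_ms / 1000.0))
        g = 10.0 ** (gain_db / 20.0)
        delayed = np.concatenate([np.zeros(d), x[:len(x) - d]])
        return x + g * delayed
    ```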

  12. Cochlear implantation with hearing preservation yields significant benefit for speech recognition in complex listening environments

    PubMed Central

    Gifford, René H.; Dorman, Michael F.; Skarzynski, Henryk; Lorens, Artur; Polak, Marek; Driscoll, Colin L. W.; Roland, Peter; Buchman, Craig A.

    2012-01-01

    Objective The aim of this study was to assess the benefit of having preserved acoustic hearing in the implanted ear for speech recognition in complex listening environments. Design The current study used a within-subjects, repeated-measures design including 21 English-speaking and 17 Polish-speaking cochlear implant recipients with preserved acoustic hearing in the implanted ear. The patients were implanted with electrodes that varied in insertion depth from 10 to 31 mm. Mean preoperative low-frequency thresholds (average of 125, 250 and 500 Hz) in the implanted ear were 39.3 and 23.4 dB HL for the English- and Polish-speaking participants, respectively. In one condition, speech perception was assessed in an 8-loudspeaker environment in which the speech signals were presented from one loudspeaker and restaurant noise was presented from all loudspeakers. In another condition, the signals were presented in a simulation of a reverberant environment with a reverberation time of 0.6 sec. The response measures included speech reception thresholds (SRTs) and percent correct sentence understanding for two test conditions: cochlear implant (CI) plus low-frequency hearing in the contralateral ear (bimodal condition) and CI plus low-frequency hearing in both ears (best-aided condition). A subset of 6 English-speaking listeners were also assessed on measures of interaural time difference (ITD) thresholds for a 250-Hz signal. Results Small, but significant, improvements in performance (1.7–2.1 dB and 6–10 percentage points) were found for the best-aided condition vs. the bimodal condition. Postoperative thresholds in the implanted ear were correlated with the degree of EAS benefit for speech recognition in diffuse noise. There was no reliable relationship between audiometric thresholds in the implanted ear, or their elevation following surgery, and improvement in speech understanding in reverberation. There was a significant correlation between ITD threshold at 250 Hz and EAS-related benefit for the adaptive SRT. Conclusions Our results suggest that (i) preserved low-frequency hearing improves speech understanding for CI recipients; (ii) testing in complex listening environments, in which binaural timing cues differ for signal and noise, may best demonstrate the value of having two ears with low-frequency acoustic hearing; and (iii) preservation of binaural timing cues, albeit poorer than observed for individuals with normal hearing, is possible following unilateral cochlear implantation with hearing preservation and is associated with EAS benefit. Our results demonstrate significant communicative benefit for hearing preservation in the implanted ear and provide support for the expansion of cochlear implant criteria to include individuals with low-frequency thresholds in even the normal to near-normal range. PMID:23446225

  13. Optimal speech level for speech transmission in a noisy environment for young adults and aged persons

    NASA Astrophysics Data System (ADS)

    Sato, Hayato; Ota, Ryo; Morimoto, Masayuki; Sato, Hiroshi

    2005-04-01

    Assessing the sound environment of classrooms for the aged is an important issue, because classrooms can be used by the aged for lifelong learning, especially in an aging society. Hearing loss due to aging is therefore a considerable factor for classrooms. In this study, the optimal speech level in noisy fields for both young adults and aged persons was investigated. Listening difficulty ratings and word intelligibility scores for familiar words were used to evaluate speech transmission performance. The results of the tests demonstrated that the optimal speech level for moderate background noise (i.e., less than around 60 dBA) was fairly constant. Meanwhile, the optimal speech level depended on the speech-to-noise ratio when the background noise level exceeded around 60 dBA. The minimum speech level required to minimize difficulty ratings for the aged was higher than that for the young. However, the minimum difficulty ratings for both the young and the aged were obtained at speech levels in the range of 70 to 80 dBA.

  14. Validity and Reliability of Visual Analog Scaling for Assessment of Hypernasality and Audible Nasal Emission in Children With Repaired Cleft Palate.

    PubMed

    Baylis, Adriane; Chapman, Kathy; Whitehill, Tara L; The Americleft Speech Group

    2015-11-01

    To investigate the validity and reliability of multiple listener judgments of hypernasality and audible nasal emission, in children with repaired cleft palate, using visual analog scaling (VAS) and equal-appearing interval (EAI) scaling. The study was a prospective comparative study of multiple listener ratings of hypernasality and audible nasal emission, conducted across multiple institutional sites. Five trained and experienced speech-language pathologist listeners from the Americleft Speech Project provided average VAS and EAI ratings of hypernasality and audible nasal emission/turbulence for 12 video-recorded speech samples. Intrarater and interrater reliability was computed, as well as linear and polynomial models of best fit. Intrarater and interrater reliability was acceptable for both rating methods; however, reliability was higher for VAS than for EAI ratings. When VAS ratings were plotted against EAI ratings, results revealed a relationship better captured by a curvilinear than a linear model. The results of this study provide additional evidence that alternate rating methods such as VAS may offer improved validity and reliability over EAI ratings of speech. VAS should be considered a viable method for rating hypernasality and nasal emission in the speech of children with repaired cleft palate.

  15. Effects of Additional Low-Pass-Filtered Speech on Listening Effort for Noise-Band-Vocoded Speech in Quiet and in Noise.

    PubMed

    Pals, Carina; Sarampalis, Anastasios; van Dijk, Mart; Başkent, Deniz

    2018-05-11

    Residual acoustic hearing in electric-acoustic stimulation (EAS) can benefit cochlear implant (CI) users in increased sound quality, speech intelligibility, and improved tolerance to noise. The goal of this study was to investigate whether the low-pass-filtered acoustic speech in simulated EAS can provide the additional benefit of reducing listening effort for the spectrotemporally degraded signal of noise-band-vocoded speech. Listening effort was investigated using a dual-task paradigm as a behavioral measure, and the NASA Task Load indeX as a subjective self-report measure. The primary task of the dual-task paradigm was identification of sentences presented in three experiments at three fixed intelligibility levels: at near-ceiling, 50%, and 79% intelligibility, achieved by manipulating the presence and level of speech-shaped noise in the background. Listening effort for the primary intelligibility task was reflected in the performance on the secondary, visual response time task. Experimental speech processing conditions included monaural or binaural vocoder, with added low-pass-filtered speech (to simulate EAS) or without (to simulate CI). In Experiment 1, in quiet with intelligibility near-ceiling, additional low-pass-filtered speech reduced listening effort compared with binaural vocoder, in line with our expectations, although not compared with monaural vocoder. In Experiments 2 and 3, for speech in noise, added low-pass-filtered speech allowed the desired intelligibility levels to be reached at less favorable speech-to-noise ratios, as expected. It is interesting that this came without the cost of increased listening effort usually associated with poor speech-to-noise ratios; at 50% intelligibility, even a reduction in listening effort on top of the increased tolerance to noise was observed. The NASA Task Load indeX did not capture these differences. The dual-task results provide partial evidence for a potential decrease in listening effort as a result of adding low-frequency acoustic speech to noise-band-vocoded speech. Whether these findings translate to CI users with residual acoustic hearing will need to be addressed in future research because the quality and frequency range of low-frequency acoustic sound available to listeners with hearing loss may differ from our idealized simulations, and additional factors, such as advanced age and varying etiology, may also play a role.
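    The simulated EAS conditions described above combine two signal paths: low-pass-filtered acoustic speech plus a noise-band vocoder standing in for electric hearing. A rough sketch under assumed parameters (600-Hz cutoff, eight log-spaced bands, 4th-order Butterworth filters, sampling rate above 16 kHz; the study's exact corner frequencies and band allocation may differ). Dropping the acoustic term gives the vocoder-only (CI) simulation:

    ```python
    import numpy as np
    from scipy.signal import butter, sosfilt, hilbert

    def simulate_eas(x, fs, cutoff=600.0, n_bands=8, hi=8000.0, seed=0):
        """Low-pass acoustic speech plus noise-band-vocoded speech (a sketch)."""
        # Acoustic portion: residual low-frequency hearing.
        acoustic = sosfilt(butter(4, cutoff, 'lowpass', fs=fs, output='sos'), x)

        # 'Electric' portion: envelope-modulated noise bands above the cutoff.
        edges = np.geomspace(cutoff, hi, n_bands + 1)
        rng = np.random.default_rng(seed)
        vocoded = np.zeros_like(x)
        for f1, f2 in zip(edges[:-1], edges[1:]):
            sos = butter(4, [f1, f2], 'bandpass', fs=fs, output='sos')
            env = np.abs(hilbert(sosfilt(sos, x)))          # band envelope
            carrier = sosfilt(sos, rng.standard_normal(len(x)))
            vocoded += env * carrier
        return acoustic + vocoded                            # EAS simulation
    ```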

  16. Fundamental frequency information for speech recognition via bimodal stimulation: cochlear implant in one ear and hearing aid in the other.

    PubMed

    Shpak, Talma; Most, Tova; Luntz, Michal

    2014-01-01

    The aim of this study was to examine the role of fundamental frequency (F0) information in improving speech perception of individuals with a cochlear implant (CI) who use a contralateral hearing aid (HA). The authors hypothesized that in bilateral-bimodal (CI/HA) users the perception of natural prosody speech would be superior to the perception of speech with monotonic flattened F0 contour, whereas in unilateral CI users the perception of both speech signals would be similar. They also hypothesized that in the CI/HA listening condition the speech perception scores would improve as a function of the magnitude of the difference between the F0 characteristics of the target speech signal and the F0 characteristics of the competitors, whereas in the CI-alone condition such a pattern would not be recognized, or at least not as clearly. Two tests were administered to 29 experienced CI/HA adult users who, regardless of their residual hearing or speech perception abilities, had chosen to continue using an HA in the nonimplanted ear for at least 75% of their waking hours. In the first test, the difference between the perception of speech characterized by natural prosody and speech characterized by monotonic flattened F0 contour was assessed in the presence of babble noise produced by three competing male talkers. In the second test, the perception of semantically unpredictable sentences was evaluated in the presence of a competing reversed speech sentence spoken by different single talkers with different F0 characteristics. Each test was carried out under two listening conditions: CI alone and CI/HA. Under both listening conditions, the perception of speech characterized by natural prosody was significantly better than the perception of speech in which the F0 contour was flattened to a monotone. Differences between the scores for natural prosody and for monotonic flattened F0 speech contour were significantly greater, however, in the CI/HA condition than with CI alone. In the second test, the overall scores for perception of semantically unpredictable sentences were higher in the CI/HA condition in the presence of all competitors. In both listening conditions, scores increased significantly with increasing difference between the F0 characteristics of the target speech signal and the F0 characteristics of the competitor. The higher scores obtained in the CI/HA condition than with CI alone in both of the task-specific tests suggested that the use of a contralateral HA provides improved low-frequency information, resulting in better performance by the CI/HA users.

  17. Auditory Processing and Speech Perception in Children with Specific Language Impairment: Relations with Oral Language and Literacy Skills

    ERIC Educational Resources Information Center

    Vandewalle, Ellen; Boets, Bart; Ghesquiere, Pol; Zink, Inge

    2012-01-01

    This longitudinal study investigated temporal auditory processing (frequency modulation and between-channel gap detection) and speech perception (speech-in-noise and categorical perception) in three groups of 6 years 3 months to 6 years 8 months-old children attending grade 1: (1) children with specific language impairment (SLI) and literacy delay…

  18. Two different phenomena in basic motor speech performance in premanifest Huntington disease.

    PubMed

    Skodda, Sabine; Grönheit, Wenke; Lukas, Carsten; Bellenberg, Barbara; von Hein, Sarah M; Hoffmann, Rainer; Saft, Carsten

    2016-03-09

    Dysarthria is a common feature in Huntington disease (HD). The aim of this cross-sectional pilot study was the description and objective analysis of different speech parameters, with special emphasis on the timing of connected speech and nonspeech verbal utterances, in premanifest HD (preHD). A total of 28 preHD mutation carriers and 28 age- and sex-matched healthy speakers performed a reading task and several syllable repetition tasks. Results of computerized acoustic analysis of different variables for the measurement of speech rate and regularity were correlated with clinical measures and MRI-based brain atrophy assessment by voxel-based morphometry. An impaired capacity to steadily repeat single syllables, with higher variability in preHD than in healthy controls, was found (variance 1: Cohen d = 1.46). Notably, speech rate was increased compared to controls and showed correlations with the volume of certain brain areas known to be involved in sensory-motor speech networks (net speech rate: Cohen d = 1.19). Furthermore, speech rate showed correlations with disease burden score, probability of disease onset, estimated years to onset, and clinical measures such as the cognitive score. Measurement of speech rate and regularity might be a helpful additional tool for monitoring subclinical functional disability in preHD. As a possible cause of the higher performance in preHD, we discuss huntingtin-dependent, temporarily advantageous developmental processes in the brain. © 2016 American Academy of Neurology.

  19. A simulation study of harmonics regeneration in noise reduction for electric and acoustic stimulation.

    PubMed

    Hu, Yi

    2010-05-01

    Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with the glimpse cues, provides the EAS benefit. Under noisy listening conditions, noise signals degrade these important cues by introducing undesired temporal-frequency components and corrupting the harmonics structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonics sideband constraints in the frequency domain (or equivalently, amplitude modulation in the temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and that it is also beneficial to preserve the harmonics sidebands under adverse listening conditions. This finding warrants further work on the development of algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.

  20. Neural encoding of the speech envelope by children with developmental dyslexia.

    PubMed

    Power, Alan J; Colling, Lincoln J; Mead, Natasha; Barnes, Lisa; Goswami, Usha

    2016-09-01

    Developmental dyslexia is consistently associated with difficulties in processing phonology (linguistic sound structure) across languages. One view is that dyslexia is characterised by a cognitive impairment in the "phonological representation" of word forms, which arises long before the child presents with a reading problem. Here we investigate a possible neural basis for developmental phonological impairments. We assess the neural quality of speech encoding in children with dyslexia by measuring the accuracy of low-frequency speech envelope encoding using EEG. We tested children with dyslexia and chronological age-matched (CA) and reading-level matched (RL) younger children. Participants listened to semantically-unpredictable sentences in a word report task. The sentences were noise-vocoded to increase reliance on envelope cues. Envelope reconstruction for envelopes between 0 and 10 Hz showed that the children with dyslexia had significantly poorer speech encoding in the 0–2 Hz band compared to both CA and RL controls. These data suggest that impaired neural encoding of low-frequency speech envelopes, related to speech prosody, may underpin the phonological deficit that causes dyslexia across languages. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.
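    Encoding accuracy in this paradigm is typically quantified by reconstructing the stimulus envelope from EEG with a linear decoder and correlating it with the actual envelope. The sketch below shows only the stimulus side, extracting the low-frequency envelope in an assumed band (here 0–2 Hz); the EEG decoder itself (e.g., ridge-regularized reverse correlation) is omitted:

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def low_freq_envelope(speech, fs, f_hi=2.0, fs_out=64):
        """Broadband speech envelope, low-pass filtered and downsampled.

        The magnitude of the analytic signal gives the envelope; a
        zero-phase low-pass at f_hi isolates the band of interest, and
        decimation brings it to roughly the EEG analysis rate.
        """
        env = np.abs(hilbert(speech))
        sos = butter(3, f_hi, 'lowpass', fs=fs, output='sos')
        env = sosfiltfilt(sos, env)
        step = max(1, int(fs // fs_out))   # approximate decimation
        return env[::step]
    ```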

  1. Detection of high-frequency energy level changes in speech and singing

    PubMed Central

    Monson, Brian B.; Lotto, Andrew J.; Story, Brad H.

    2014-01-01

    Previous work has shown that human listeners are sensitive to level differences in high-frequency energy (HFE) in isolated vowel sounds produced by male singers. Results indicated that sensitivity to HFE level changes increased with overall HFE level, suggesting that listeners would be more "tuned" to HFE in vocal production exhibiting higher levels of HFE. It follows that sensitivity to HFE level changes should be higher (1) for female vocal production than for male vocal production and (2) for singing than for speech. To test this hypothesis, difference limens for HFE level changes in male and female speech and singing were obtained. Listeners showed significantly greater ability to detect level changes in singing vs speech but not in female vs male speech. Mean difference limen scores for speech and singing were about 5 dB in the 8-kHz octave (5.6–11.3 kHz) but 8–10 dB in the 16-kHz octave (11.3–22 kHz). These scores are lower (better) than those previously reported for isolated vowels and some musical instruments. PMID:24437780

  2. The ability of cochlear implant users to use temporal envelope cues recovered from speech frequency modulation.

    PubMed

    Won, Jong Ho; Lorenzi, Christian; Nie, Kaibao; Li, Xing; Jameyson, Elyse M; Drennan, Ward R; Rubinstein, Jay T

    2012-08-01

    Previous studies have demonstrated that normal-hearing listeners can understand speech using the recovered "temporal envelopes," i.e., amplitude modulation (AM) cues from frequency modulation (FM). This study evaluated this mechanism in cochlear implant (CI) users for consonant identification. Stimuli containing only FM cues were created using 1, 2, 4, and 8-band FM-vocoders to determine if consonant identification performance would improve as the recovered AM cues become more available. A consistent improvement was observed as the band number decreased from 8 to 1, supporting the hypothesis that (1) the CI sound processor generates recovered AM cues from broadband FM, and (2) CI users can use the recovered AM cues to recognize speech. The correlation between the intact and the recovered AM components at the output of the sound processor was also generally higher when the band number was low, supporting the consonant identification results. Moreover, CI subjects who were better at using recovered AM cues from broadband FM cues showed better identification performance with intact (unprocessed) speech stimuli. This suggests that speech perception performance variability in CI users may be partly caused by differences in their ability to use AM cues recovered from FM speech cues.

  3. Perception of Filtered Speech by Children with Developmental Dyslexia and Children with Specific Language Impairments

    PubMed Central

    Goswami, Usha; Cumming, Ruth; Chait, Maria; Huss, Martina; Mead, Natasha; Wilson, Angela M.; Barnes, Lisa; Fosker, Tim

    2016-01-01

    Here we use two filtered speech tasks to investigate children’s processing of slow (<4 Hz) versus faster (∼33 Hz) temporal modulations in speech. We compare groups of children with either developmental dyslexia (Experiment 1) or speech and language impairments (SLIs, Experiment 2) to groups of typically-developing (TD) children age-matched to each disorder group. Ten nursery rhymes were filtered so that their modulation frequencies were either low-pass filtered (<4 Hz) or band-pass filtered (22–40 Hz). Recognition of the filtered nursery rhymes was tested in a picture recognition multiple choice paradigm. Children with dyslexia aged 10 years showed equivalent recognition overall to TD controls for both the low-pass and band-pass filtered stimuli, but showed significantly impaired acoustic learning during the experiment from low-pass filtered targets. Children with oral SLIs aged 9 years showed significantly poorer recognition of band-pass filtered targets compared to their TD controls, and showed comparable acoustic learning effects to TD children during the experiment. The SLI samples were also divided into children with and without phonological difficulties. The children with both SLI and phonological difficulties were impaired in recognizing both kinds of filtered speech. These data are suggestive of impaired temporal sampling of the speech signal at different modulation rates by children with different kinds of developmental language disorder. Both SLI and dyslexic samples showed impaired discrimination of amplitude rise times. Implications of these findings for a temporal sampling framework for understanding developmental language disorders are discussed. PMID:27303348
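    Modulation-filtered stimuli of this kind can be approximated by filtering the temporal envelope and recombining it with the fine structure. The published stimuli were built with a multi-band procedure; this single-band sketch (assumed 3rd-order zero-phase Butterworth filters) only illustrates the manipulation:

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt, hilbert

    def modulation_filter(x, fs, lo=None, hi=4.0):
        """Low-pass (<hi Hz) or band-pass (lo-hi Hz) the speech envelope.

        Splits x into envelope and temporal fine structure, filters the
        envelope's modulation content, and recombines the two.
        """
        analytic = hilbert(x)
        env = np.abs(analytic)
        tfs = np.cos(np.angle(analytic))        # temporal fine structure
        if lo is None:
            sos = butter(3, hi, 'lowpass', fs=fs, output='sos')
        else:
            sos = butter(3, [lo, hi], 'bandpass', fs=fs, output='sos')
        env_f = np.clip(sosfiltfilt(sos, env), 0.0, None)   # keep non-negative
        return env_f * tfs

    # slow = modulation_filter(x, fs, hi=4.0)         # <4 Hz condition
    # fast = modulation_filter(x, fs, lo=22, hi=40)   # 22-40 Hz condition
    ```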

  4. An integrated approach to improving noisy speech perception

    NASA Astrophysics Data System (ADS)

    Koval, Serguei; Stolbov, Mikhail; Smirnova, Natalia; Khitrov, Mikhail

    2002-05-01

    For a number of practical purposes and tasks, experts have to decode speech recordings of very poor quality. A combination of techniques is proposed to improve the intelligibility and quality of distorted speech messages and thus facilitate their comprehension. Along with the application of noise cancellation and speech signal enhancement techniques that remove and/or reduce various kinds of distortion and interference (primarily unmasking and normalization in the time and frequency domains), the approach incorporates optimal listener expert tactics based on selective listening, nonstandard binaural listening, accounting for short-term and long-term human ear adaptation to noisy speech, as well as some methods of speech signal enhancement to support speech decoding during listening. The approach integrating the suggested techniques ensures high-quality final results and has successfully been applied by Speech Technology Center experts and by numerous other users, mainly forensic institutions, to perform noisy speech record decoding for courts, law enforcement and emergency services, accident investigation bodies, etc.

  5. Post-treatment speech naturalness of comprehensive stuttering program clients and differences in ratings among listener groups.

    PubMed

    Teshima, Shelli; Langevin, Marilyn; Hagler, Paul; Kully, Deborah

    2010-03-01

    The purposes of this study were to investigate naturalness of the post-treatment speech of Comprehensive Stuttering Program (CSP) clients and differences in naturalness ratings by three listener groups. Listeners were 21 student speech-language pathologists, 9 community members, and 15 listeners who stutter. Listeners rated perceptually fluent speech samples of CSP clients obtained immediately post-treatment (Post) and at 5 years follow-up (F5), and speech samples of matched typically fluent (TF) speakers. A 9-point interval rating scale was used. A 3 (listener group) × 2 (time) × 2 (speaker) mixed ANOVA was used to test for differences among mean ratings. The difference between CSP Post and F5 mean ratings was statistically significant. The F5 mean rating was within the range reported for typically fluent speakers. Student speech-language pathologists were found to be less critical than community members and listeners who stutter in rating naturalness; however, there were no significant differences in ratings made by community members and listeners who stutter. Results indicate that the naturalness of post-treatment speech of CSP clients improves in the post-treatment period and that it is possible for clients to achieve levels of naturalness that appear to be acceptable to adults who stutter and that are within the range of naturalness ratings given to typically fluent speakers. Readers will be able to (a) summarize key findings of studies that have investigated naturalness ratings, and (b) interpret the naturalness ratings of Comprehensive Stuttering Program speaker samples and the ratings made by the three listener groups in this study.

  6. Benefits of Localization and Speech Perception with Multiple Noise Sources in Listeners with a Short-electrode Cochlear Implant

    PubMed Central

    Dunn, Camille C.; Perreau, Ann; Gantz, Bruce; Tyler, Richard

    2009-01-01

    Background Research suggests that for individuals with significant low-frequency hearing, implantation of a short-electrode cochlear implant may provide benefits of improved speech perception abilities. Because this strategy combines acoustic and electrical hearing within the same ear while at the same time preserving low-frequency residual acoustic hearing in both ears, localization abilities may also be improved. However, very little research has focused on the localization and spatial hearing abilities of users with a short-electrode cochlear implant. Purpose The purpose of this study was to evaluate localization abilities for listeners with a short-electrode cochlear implant who continue to wear hearing aids in both ears. A secondary purpose was to document speech perception abilities using a speech in noise test with spatially-separate noise sources. Research Design Eleven subjects who utilized a short-electrode cochlear implant and bilateral hearing aids were tested on localization and speech perception with multiple noise locations using an eight-loudspeaker array. Performance was assessed across four listening conditions using various combinations of cochlear implant and/or hearing aid use. Results Results for localization showed no significant difference between using bilateral hearing aids and bilateral hearing aids plus the cochlear implant. However, there was a significant difference between the bilateral hearing aid condition and the implant plus use of a contralateral hearing aid for all eleven subjects. Results for speech perception showed a significant benefit when using bilateral hearing aids plus the cochlear implant over use of the implant plus only one hearing aid. Conclusion Combined use of both hearing aids and the cochlear implant shows significant benefits for both localization and speech perception in noise for users with a short-electrode cochlear implant. These results emphasize the importance of low-frequency information in two ears for the purpose of localization and speech perception in noise. PMID:20085199

  7. Benefits of localization and speech perception with multiple noise sources in listeners with a short-electrode cochlear implant.

    PubMed

    Dunn, Camille C; Perreau, Ann; Gantz, Bruce; Tyler, Richard S

    2010-01-01

    Research suggests that for individuals with significant low-frequency hearing, implantation of a short-electrode cochlear implant may provide benefits of improved speech perception abilities. Because this strategy combines acoustic and electrical hearing within the same ear while at the same time preserving low-frequency residual acoustic hearing in both ears, localization abilities may also be improved. However, very little research has focused on the localization and spatial hearing abilities of users with a short-electrode cochlear implant. The purpose of this study was to evaluate localization abilities for listeners with a short-electrode cochlear implant who continue to wear hearing aids in both ears. A secondary purpose was to document speech perception abilities using a speech-in-noise test with spatially separate noise sources. Eleven subjects who utilized a short-electrode cochlear implant and bilateral hearing aids were tested on localization and speech perception with multiple noise locations using an eight-loudspeaker array. Performance was assessed across four listening conditions using various combinations of cochlear implant and/or hearing aid use. Results for localization showed no significant difference between using bilateral hearing aids and bilateral hearing aids plus the cochlear implant. However, there was a significant difference between the bilateral hearing aid condition and the implant plus use of a contralateral hearing aid for all 11 subjects. Results for speech perception showed a significant benefit when using bilateral hearing aids plus the cochlear implant over use of the implant plus only one hearing aid. Combined use of both hearing aids and the cochlear implant shows significant benefits for both localization and speech perception in noise for users with a short-electrode cochlear implant. These results emphasize the importance of low-frequency information in two ears for the purpose of localization and speech perception in noise.

  8. Deep neural network and noise classification-based speech enhancement

    NASA Astrophysics Data System (ADS)

    Shi, Wenhua; Zhang, Xiongwei; Zou, Xia; Han, Wei

    2017-07-01

    In this paper, a speech enhancement method using noise classification and a Deep Neural Network (DNN) is proposed. A Gaussian mixture model (GMM) was employed to determine the noise type in speech-absent frames. The DNN was used to model the relationship between the noisy observation and clean speech. Once the noise type was determined, the corresponding DNN model was applied to enhance the noisy speech. The GMM was trained on mel-frequency cepstral coefficients (MFCCs), with parameters estimated via an iterative expectation-maximization (EM) algorithm. The noise type was updated by spectrum entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and smaller distortion under stationary and non-stationary conditions.
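    The classification step described above can be sketched with one GMM per noise type, fit by EM on MFCC frames from speech-absent segments; at test time, the noise type with the highest likelihood selects the enhancement DNN. The MFCC extraction and the DNN models themselves are assumed to exist elsewhere (the names mfccs_by_type and dnn_models are hypothetical):

    ```python
    from sklearn.mixture import GaussianMixture

    def train_noise_gmms(mfccs_by_type, n_components=8):
        """Fit one GMM per noise type; fit() runs the EM algorithm.

        mfccs_by_type: dict mapping a noise-type name to an
        (n_frames, n_mfcc) array from speech-absent training frames.
        """
        return {name: GaussianMixture(n_components).fit(frames)
                for name, frames in mfccs_by_type.items()}

    def classify_noise(gmms, mfcc_frames):
        """Pick the noise type whose GMM best explains the current frames."""
        scores = {name: g.score(mfcc_frames)      # mean log-likelihood/frame
                  for name, g in gmms.items()}
        return max(scores, key=scores.get)

    # enhanced = dnn_models[classify_noise(gmms, frames)].enhance(noisy)
    ```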

  9. Effects of interior aircraft noise on speech intelligibility and annoyance

    NASA Technical Reports Server (NTRS)

    Pearsons, K. S.; Bennett, R. L.

    1977-01-01

    Recordings of the aircraft ambiance from ten different types of aircraft were used in conjunction with four distinct speech interference tests as stimuli to determine the effects of interior aircraft background levels and speech intelligibility on perceived annoyance in 36 subjects. Both speech intelligibility and background level significantly affected judged annoyance. However, the interaction between the two variables showed that above an 85 dB background level the speech intelligibility results had a minimal effect on annoyance ratings. Below this level, people rated the background as less annoying if there was adequate speech intelligibility.

  10. A Smartphone Application for Customized Frequency Table Selection in Cochlear Implants.

    PubMed

    Jethanamest, Daniel; Azadpour, Mahan; Zeman, Annette M; Sagi, Elad; Svirsky, Mario A

    2017-09-01

    A novel smartphone-based software application can facilitate self-selection of frequency allocation tables (FAT) in postlingually deaf cochlear implant (CI) users. CIs use FATs to represent the tonotopic organization of a normal cochlea. Current CI fitting methods typically use a standard FAT for all patients regardless of individual differences in cochlear size and electrode location. In postlingually deaf patients, different amounts of mismatch can result between the frequency-place function they experienced when they had normal hearing and the frequency-place function that results from the standard FAT. For some CI users, an alternative FAT may enhance sound quality or speech perception. Currently, no widely available tools exist to aid real-time selection of different FATs. This study aims to develop a new smartphone tool for this purpose and to evaluate speech perception and sound quality measures in a pilot study of CI subjects using this application. A smartphone application for a widely available mobile platform (iOS) was developed to serve as a preprocessor of auditory input to a clinical CI speech processor and enable interactive real-time selection of FATs. The application's output was validated by measuring electrodograms for various inputs. A pilot study was conducted in six CI subjects. Speech perception was evaluated using word recognition tests. All subjects successfully used the portable application with their clinical speech processors to experience different FATs while listening to running speech. The users were all able to select one table that they judged provided the best sound quality. All subjects chose a FAT different from the standard FAT in their everyday clinical processor. Using the smartphone application, the mean consonant-nucleus-consonant score with the default FAT selection was 28.5% (SD 16.8) and 29.5% (SD 16.4) when using a self-selected FAT. A portable smartphone application enables CI users to self-select frequency allocation tables in real time. Even though the self-selected FATs that were deemed to have better sound quality were only tested acutely (i.e., without long-term experience with them), speech perception scores were not inferior to those obtained with the clinical FATs. This software application may be a valuable tool for improving future methods of CI fitting.

  11. Affective Properties of Mothers' Speech to Infants With Hearing Impairment and Cochlear Implants

    PubMed Central

    Bergeson, Tonya R.; Xu, Huiping; Kitamura, Christine

    2015-01-01

    Purpose The affective properties of infant-directed speech influence the attention of infants with normal hearing to speech sounds. This study explored the affective quality of maternal speech to infants with hearing impairment (HI) during the 1st year after cochlear implantation as compared to speech to infants with normal hearing. Method Mothers of infants with HI and mothers of infants with normal hearing matched by age (NH-AM) or hearing experience (NH-EM) were recorded playing with their infants during 3 sessions over a 12-month period. Speech samples of 25 s were low-pass filtered, leaving intonation but not speech information intact. Sixty adults rated the stimuli along 5 scales: positive/negative affect and intention to express affection, to encourage attention, to comfort/soothe, and to direct behavior. Results Low-pass filtered speech to the HI and NH-EM groups was rated as more positive, affective, and comforting than such speech to the NH-AM group. Speech to infants with HI and with NH-AM was rated as more directive than speech to the NH-EM group. Mothers decreased affective qualities in speech to all infants but increased directive qualities in speech to infants with NH-EM over time. Conclusions Mothers fine-tune communicative intent in speech to their infant's developmental stage. They adjust affective qualities to infants' hearing experience rather than to chronological age but adjust directive qualities of speech to the chronological age of their infants. PMID:25679195

  12. Association of Orofacial Muscle Activity and Movement during Changes in Speech Rate and Intensity

    ERIC Educational Resources Information Center

    McClean, Michael D.; Tasko, Stephen M.

    2003-01-01

    Understanding how orofacial muscle activity and movement covary across changes in speech rate and intensity has implications for the neural control of speech production and the use of clinical procedures that manipulate speech prosody. The present study involved a correlation analysis relating average lower-lip and jaw-muscle activity to lip and…

  13. Of Mouths and Men: Non-Native Listeners' Identification and Evaluation of Varieties of English.

    ERIC Educational Resources Information Center

    Jarvella, Robert J.; Bang, Eva; Jakobsen, Arnt Lykke; Mees, Inger M.

    2001-01-01

    Advanced Danish students of English tried to identify the national origin of young men from Ireland, Scotland, England, and the United States from their speech and then rated the speech for attractiveness. Listeners rated speech produced by Englishmen as most attractive, and speech by Americans as least attractive. (Author/VWL)

  14. Utterances in infant-directed speech are shorter, not slower.

    PubMed

    Martin, Andrew; Igarashi, Yosuke; Jincho, Nobuyuki; Mazuka, Reiko

    2016-11-01

    It has become a truism in the literature on infant-directed speech (IDS) that IDS is pronounced more slowly than adult-directed speech (ADS). Using recordings of 22 Japanese mothers speaking to their infant and to an adult, we show that although IDS has an overall lower mean speech rate than ADS, this is not the result of an across-the-board slowing in which every vowel is expanded equally. Instead, the speech rate difference is entirely due to the effects of phrase-final lengthening, which disproportionately affects IDS because of its shorter utterances. These results demonstrate that taking utterance-internal prosodic characteristics into account is crucial to studies of speech rate. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Webcam delivery of the Camperdown Program for adolescents who stutter: a phase II trial.

    PubMed

    Carey, Brenda; O'Brian, Sue; Lowe, Robyn; Onslow, Mark

    2014-10-01

    This Phase II clinical trial examined stuttering adolescents' responsiveness to the Webcam-delivered Camperdown Program. Sixteen adolescents were treated by Webcam with no clinic attendance. Primary outcome was percentage of syllables stuttered (%SS). Secondary outcomes were number of sessions, weeks and hours to maintenance, self-reported stuttering severity, speech satisfaction, speech naturalness, self-reported anxiety, self-reported situation avoidance, self-reported impact of stuttering, and satisfaction with Webcam treatment delivery. Data were collected before treatment and up to 12 months after entry into maintenance. Fourteen participants completed the treatment. Group mean stuttering frequency was 6.1 %SS (range, 0.7-14.7) pretreatment and 2.8 %SS (range, 0-12.2) 12 months after entry into maintenance, with half the participants stuttering at 1.2 %SS or lower at this time. Treatment was completed in a mean of 25 sessions (15.5 hr). Self-reported stuttering severity ratings, self-reported stuttering impact, and speech satisfaction scores supported %SS outcomes. Minimal anxiety was evident either pre- or post-treatment. Individual responsiveness to the treatment varied, with half the participants showing little reduction in avoidance of speech situations. The Webcam service delivery model was appealing to participants, although it was efficacious and efficient for only half. Suggestions for future stuttering treatment development for adolescents are discussed.

  16. Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users.

    PubMed

    Goehring, Tobias; Bolner, Federico; Monaghan, Jessica J M; van Dijk, Bas; Zarowski, Andrzej; Bleeck, Stefan

    2017-02-01

    Speech understanding in noisy environments is still one of the major challenges for cochlear implant (CI) users in everyday life. We evaluated a speech enhancement algorithm based on neural networks (NNSE) for improving speech intelligibility in noise for CI users. The algorithm decomposes the noisy speech signal into time-frequency units, extracts a set of auditory-inspired features and feeds them to the neural network to produce an estimation of which frequency channels contain more perceptually important information (higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated and retain speech-dominated CI channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 14 CI users using three types of background noise. Two NNSE algorithms were compared: a speaker-dependent algorithm that was trained on the target speaker used for testing, and a speaker-independent algorithm that was trained on different speakers. Significant improvements in the intelligibility of speech in stationary and fluctuating noises were found relative to the unprocessed condition for the speaker-dependent algorithm in all noise types and for the speaker-independent algorithm in 2 out of 3 noise types. The NNSE algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of SNRs. The proposed algorithm has the potential to improve the intelligibility of speech in noise for CI users while meeting the requirements of low computational complexity and processing delay for application in CI devices. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
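
    The abstract specifies the channel-selection logic but not the network itself. A minimal sketch of the surrounding n-of-m selection, assuming a per-channel SNR estimate is already available (faked here with random numbers; the channel count and gain floor are arbitrary assumptions):

      import numpy as np

      def n_of_m_select(channel_env, snr_estimate, n):
          """Retain the n channels judged most speech-dominated and
          attenuate the rest, as in n-of-m CI coding strategies."""
          gains = np.full(len(channel_env), 0.1)        # attenuate noise-dominated
          gains[np.argsort(snr_estimate)[-n:]] = 1.0    # retain the n best channels
          return channel_env * gains

      env = np.abs(np.random.randn(22))   # envelopes of 22 CI channels (fake)
      snr = np.random.randn(22)           # stand-in for the network's SNR estimate
      stimulation = n_of_m_select(env, snr, n=8)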

  17. Effect of delayed auditory feedback on normal speakers at two speech rates

    NASA Astrophysics Data System (ADS)

    Stuart, Andrew; Kalinowski, Joseph; Rastatter, Michael P.; Lynch, Kerry

    2002-05-01

    This study investigated the effect of short and long auditory feedback delays at two speech rates with normal speakers. Seventeen participants spoke under delayed auditory feedback (DAF) at 0, 25, 50, and 200 ms at normal and fast rates of speech. Participants displayed two to three times more dysfluencies at 200 ms (p<0.05) than at no delay or the shorter delays. There were also significantly more dysfluencies at the fast rate of speech (p=0.028). These findings implicate the peripheral feedback system(s) of fluent speakers in the disruptive effects of DAF on normal speech production at long auditory feedback delays. Considering the contrast in fluency/dysfluency exhibited between normal speakers and those who stutter at short and long delays, it appears that speech disruption of normal speakers under DAF is a poor analog of stuttering.

  18. Design and Evaluation of a Cochlear Implant Strategy Based on a “Phantom” Channel

    PubMed Central

    Nogueira, Waldo; Litvak, Leonid M.; Saoji, Aniket A.; Büchner, Andreas

    2015-01-01

    Unbalanced bipolar stimulation, delivered using charge-balanced pulses, was used to produce “Phantom stimulation”: stimulation beyond the most apical contact of a cochlear implant’s electrode array. The Phantom channel was allocated audio frequencies below 300 Hz in a speech coding strategy, conveying energy some two octaves lower than the clinical strategy and hence delivering the fundamental frequency of speech and of many musical tones. A group of 12 Advanced Bionics cochlear implant recipients took part in a chronic study investigating the fitting of the Phantom strategy and speech and music perception when using Phantom. The evaluation of speech in noise was performed immediately after fitting Phantom for the first time (Session 1) and after one month of take-home experience (Session 2). A repeated-measures analysis of variance (ANOVA) with the within-subject factors strategy (Clinical, Phantom) and time (Session 1, Session 2) revealed a significant time-by-strategy interaction: Phantom obtained a significant improvement in speech intelligibility after one month of use. Furthermore, a trend towards better performance with Phantom (48%) with respect to F120 (37%) after 1 month of use failed to reach significance after type 1 error correction. Questionnaire results show a preference for Phantom when listening to music, likely driven by an improved balance between high and low frequencies. PMID:25806818

  19. Self-efficacy and quality of life in adults who stutter.

    PubMed

    Carter, Alice; Breen, Lauren; Yaruss, J Scott; Beilby, Janet

    2017-12-01

    Self-efficacy has emerged as a potential predictor of quality of life for adults who stutter. Research has focused primarily on the positive relationship self-efficacy has to treatment outcomes, but little is known about the relationship between self-efficacy and quality of life for adults who stutter. The purpose of this mixed-methods study is to determine the predictive value of self-efficacy and its relationship to quality of life for adults who stutter. The Self-Efficacy Scale for Adult Stutterers and the Overall Assessment of the Speaker's Experience with Stuttering were administered to 39 adults who stutter, aged 18-77. Percentage of syllables stuttered was calculated from a conversational speech sample as a measure of stuttered speech frequency. Qualitative interviews with semi-structured probes were conducted with 10 adults and analyzed using thematic analysis to explore the lived experience of adults who stutter. Self-efficacy emerged as a strong positive predictor of quality of life for adults living with a stuttered speech disorder. Stuttered speech frequency was a moderate negative predictor of self-efficacy. Major qualitative themes identified from the interviews with the participants were: encumbrance, self-concept, confidence, acceptance, life-long journey, treatment, and support. Results provide clarity on the predictive value of self-efficacy and its relationship to quality of life and stuttered speech frequency. Findings highlight that the unique life experiences of adults who stutter require a multidimensional approach to the assessment and treatment of stuttered speech disorders. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.

  20. Top-down modulation of auditory processing: effects of sound context, musical expertise and attentional focus.

    PubMed

    Tervaniemi, M; Kruck, S; De Baene, W; Schröger, E; Alter, K; Friederici, A D

    2009-10-01

    By recording auditory electrical brain potentials, we investigated whether the basic sound parameters (frequency, duration and intensity) are differentially encoded among speech vs. music sounds by musicians and non-musicians during different attentional demands. To this end, a pseudoword and an instrumental sound of comparable frequency and duration were presented. The accuracy of neural discrimination was tested by manipulations of frequency, duration and intensity. Additionally, the subjects' attentional focus was manipulated by instructions to ignore the sounds while watching a silent movie or to attentively discriminate the different sounds. In both musicians and non-musicians, the pre-attentively evoked mismatch negativity (MMN) component was larger to slight changes in music than in speech sounds. The MMN was also larger to intensity changes in music sounds and to duration changes in speech sounds. During attentional listening, all subjects more readily discriminated changes among speech sounds than among music sounds as indexed by the N2b response strength. Furthermore, during attentional listening, musicians displayed larger MMN and N2b than non-musicians for both music and speech sounds. Taken together, the data indicate that the discriminative abilities in human audition differ between music and speech sounds as a function of the sound-change context and the subjective familiarity of the sound parameters. These findings provide clear evidence for top-down modulatory effects in audition. In other words, the processing of sounds is realized by a dynamically adapting network considering type of sound, expertise and attentional demands, rather than by a strictly modularly organized stimulus-driven system.

  1. Hearing and seeing meaning in noise: Alpha, beta, and gamma oscillations predict gestural enhancement of degraded speech comprehension.

    PubMed

    Drijvers, Linda; Özyürek, Asli; Jensen, Ole

    2018-05-01

    During face-to-face communication, listeners integrate speech with gestures. The semantic information conveyed by iconic gestures (e.g., a drinking gesture) can aid speech comprehension in adverse listening conditions. In this magnetoencephalography (MEG) study, we investigated the spatiotemporal neural oscillatory activity associated with gestural enhancement of degraded speech comprehension. Participants watched videos of an actress uttering clear or degraded speech, accompanied by a gesture or not and completed a cued-recall task after watching every video. When gestures semantically disambiguated degraded speech comprehension, an alpha and beta power suppression and a gamma power increase revealed engagement and active processing in the hand-area of the motor cortex, the extended language network (LIFG/pSTS/STG/MTG), medial temporal lobe, and occipital regions. These observed low- and high-frequency oscillatory modulations in these areas support general unification, integration and lexical access processes during online language comprehension, and simulation of and increased visual attention to manual gestures over time. All individual oscillatory power modulations associated with gestural enhancement of degraded speech comprehension predicted a listener's correct disambiguation of the degraded verb after watching the videos. Our results thus go beyond the previously proposed role of oscillatory dynamics in unimodal degraded speech comprehension and provide first evidence for the role of low- and high-frequency oscillations in predicting the integration of auditory and visual information at a semantic level. © 2018 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.

  3. Adding part-of-speech information to the SUBTLEX-US word frequencies.

    PubMed

    Brysbaert, Marc; New, Boris; Keuleers, Emmanuel

    2012-12-01

    The SUBTLEX-US corpus has been parsed with the CLAWS tagger, so that researchers have information about the possible word classes (parts-of-speech, or PoSs) of the entries. Five new columns have been added to the SUBTLEX-US word frequency list: the dominant (most frequent) PoS for the entry, the frequency of the dominant PoS, the frequency of the dominant PoS relative to the entry's total frequency, all PoSs observed for the entry, and the respective frequencies of these PoSs. Because the current definition of lemma frequency does not seem to provide word recognition researchers with useful information (as illustrated by a comparison of the lemma frequencies and the word form frequencies from the Corpus of Contemporary American English), we have not provided a column with this variable. Instead, we hope that the full list of PoS frequencies will help researchers to collectively determine which combination of frequencies is the most informative.
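
    As a hedged illustration of how columns like these could be derived from raw word-by-PoS counts (the toy table, column names, and counts below are invented, not the actual SUBTLEX-US fields):

      import pandas as pd

      # Toy word-by-PoS frequency counts; names and numbers are invented.
      rows = [("run", "verb", 800), ("run", "noun", 200), ("dog", "noun", 500)]
      df = pd.DataFrame(rows, columns=["word", "pos", "freq"])

      def summarize(group):
          total = group["freq"].sum()
          top = group.loc[group["freq"].idxmax()]
          return pd.Series({
              "dom_pos": top["pos"],               # dominant (most frequent) PoS
              "dom_pos_freq": top["freq"],         # frequency of the dominant PoS
              "dom_pos_rel": top["freq"] / total,  # dominant PoS share of total
              "all_pos": ",".join(group["pos"]),   # all observed PoSs
              "all_freqs": ",".join(group["freq"].astype(str)),
          })

      print(df.groupby("word")[["pos", "freq"]].apply(summarize))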

  4. Fitting and verification of frequency modulation systems on children with normal hearing.

    PubMed

    Schafer, Erin C; Bryant, Danielle; Sanders, Katie; Baldus, Nicole; Algier, Katherine; Lewis, Audrey; Traber, Jordan; Layden, Paige; Amin, Aneeqa

    2014-06-01

    Several recent investigations support the use of frequency modulation (FM) systems in children with normal hearing and auditory processing or listening disorders such as those diagnosed with auditory processing disorders, autism spectrum disorders, attention-deficit hyperactivity disorder, Friedreich ataxia, and dyslexia. The American Academy of Audiology (AAA) published suggested procedures, but these guidelines do not cite research evidence to support the validity of the recommended procedures for fitting and verifying nonoccluding open-ear FM systems on children with normal hearing. Documenting the validity of these fitting procedures is critical to maximize the potential FM-system benefit in the above-mentioned populations of children with normal hearing and those with auditory-listening problems. The primary goal of this investigation was to determine the validity of the AAA real-ear approach to fitting FM systems on children with normal hearing. The secondary goal of this study was to examine speech-recognition performance in noise and loudness ratings without and with FM systems in children with normal hearing sensitivity. A two-group, cross-sectional design was used in the present study. Twenty-six typically functioning children, ages 5-12 yr, with normal hearing sensitivity participated in the study. Participants used a nonoccluding open-ear FM receiver during laboratory-based testing. Participants completed three laboratory tests: (1) real-ear measures, (2) speech recognition performance in noise, and (3) loudness ratings. Four real-ear measures were conducted to (1) verify that measured output met prescribed-gain targets across the 1000-4000 Hz frequency range for speech stimuli, (2) confirm that the FM-receiver volume did not exceed predicted uncomfortable loudness levels, and (3 and 4) measure changes to the real-ear unaided response when placing the FM receiver in the child's ear. After completion of the fitting, speech recognition in noise at a -5 dB signal-to-noise ratio and loudness ratings at a +5 dB signal-to-noise ratio were measured in four conditions: (1) no FM system, (2) FM receiver on the right ear, (3) FM receiver on the left ear, and (4) bilateral FM system. The results of this study suggested that the slightly modified AAA real-ear measurement procedures resulted in a valid fitting of one FM system on children with normal hearing. On average, prescriptive targets were met for 1000, 2000, 3000, and 4000 Hz within 3 dB, and the maximum output of the FM system never exceeded and was significantly lower than predicted uncomfortable loudness levels for the children. There was minimal change in the real-ear unaided response when the open-ear FM receiver was placed into the ear. Use of the FM system on one or both ears resulted in significantly better speech recognition in noise relative to the no-FM condition, and the unilateral and bilateral FM receivers resulted in a comfortably loud signal when listening in background noise. Real-ear measures are critical for obtaining an appropriate fit of an FM system on children with normal hearing. American Academy of Audiology.

  5. Slowed Speech Input has a Differential Impact on On-line and Off-line Processing in Children’s Comprehension of Pronouns

    PubMed Central

    Walenski, Matthew; Swinney, David

    2009-01-01

    The central question underlying this study revolves around how children process co-reference relationships—such as those evidenced by pronouns (him) and reflexives (himself)—and how a slowed rate of speech input may critically affect this process. Previous studies of child language processing have demonstrated that typical language developing (TLD) children as young as 4 years of age process co-reference relations in a manner similar to adults on-line. In contrast, off-line measures of pronoun comprehension suggest a developmental delay for pronouns (relative to reflexives). The present study examines dependency relations in TLD children (ages 5–13) and investigates how a slowed rate of speech input affects the unconscious (on-line) and conscious (off-line) parsing of these constructions. For the on-line investigations (using a cross-modal picture priming paradigm), results indicate that at a normal rate of speech TLD children demonstrate adult-like syntactic reflexes. At a slowed rate of speech the typical language developing children displayed a breakdown in automatic syntactic parsing (again, similar to the pattern seen in unimpaired adults). As demonstrated in the literature, our off-line investigations (sentence/picture matching task) revealed that these children performed much better on reflexives than on pronouns at a regular speech rate. However, at the slow speech rate, performance on pronouns was substantially improved, whereas performance on reflexives was not different than at the regular speech rate. We interpret these results in light of a distinction between fast automatic processes (relied upon for on-line processing in real time) and conscious reflective processes (relied upon for off-line processing), such that slowed speech input disrupts the former, yet improves the latter. PMID:19343495

  6. Central Tendency and Dispersion Measures of the Fundamental Frequencies of Four Vowels as Produced by Two-Year-Old and Four-Year-Old Children.

    NASA Astrophysics Data System (ADS)

    Monroe, Roberta Lynn

    The intrinsic fundamental frequency effect among vowels is a vocalic phenomenon of adult speech in which high vowels have higher fundamental frequencies in relation to low vowels. Acoustic investigations of children's speech have shown that variability of the speech signal decreases as children's ages increase. Fundamental frequency measures have been suggested as an indirect metric for the development of laryngeal stability and coordination. Studies of the intrinsic fundamental frequency effect have been conducted among 8- and 9-year-old children and in infants. The present study investigated this effect among 2- and 4-year-old children. Eight 2-year-old and eight 4-year-old children produced four vowels, /ae/, /i/, /u/, and /a/, in CVC syllables. Three measures of fundamental frequency were taken: mean fundamental frequency, the intra-utterance standard deviation of the fundamental frequency, and the extent to which the cycle-to-cycle pattern of the fundamental frequency was predicted by a linear trend. An analysis of variance was performed to compare the two age groups, the four vowels, and the earlier and later repetitions of the CVC syllables. A significant difference between the two age groups was detected using the intra-utterance standard deviation of the fundamental frequency. Mean fundamental frequencies and linear trend analysis showed that voicing of the preceding consonant determined the statistical significance of the age-group comparisons. Statistically significant differences among the fundamental frequencies of the four vowels were not detected for either age group.
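
    The three fundamental frequency measures named above are straightforward to compute from a cycle-by-cycle F0 track. A minimal sketch, with a fabricated F0 track as a placeholder:

      import numpy as np

      def f0_measures(f0_track):
          """Mean F0, intra-utterance SD, and how well a linear trend
          predicts the cycle-to-cycle F0 pattern (R^2 of a line fit)."""
          x = np.arange(len(f0_track))
          slope, intercept = np.polyfit(x, f0_track, 1)
          fitted = slope * x + intercept
          ss_res = np.sum((f0_track - fitted) ** 2)
          ss_tot = np.sum((f0_track - f0_track.mean()) ** 2)
          r2 = 1.0 - ss_res / ss_tot
          return f0_track.mean(), f0_track.std(ddof=1), r2

      f0 = 300 + 20 * np.random.randn(50)   # fake cycle-by-cycle F0 (Hz)
      print(f0_measures(f0))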

  7. The development and validation of the speech quality instrument.

    PubMed

    Chen, Stephanie Y; Griffin, Brianna M; Mancuso, Dean; Shiau, Stephanie; DiMattia, Michelle; Cellum, Ilana; Harvey Boyd, Kelly; Prevoteau, Charlotte; Kohlberg, Gavriel D; Spitzer, Jaclyn B; Lalwani, Anil K

    2017-12-08

    Although speech perception tests are available to evaluate hearing, there is no standardized validated tool to quantify speech quality. The objective of this study is to develop a validated tool to measure quality of speech heard. Prospective instrument validation study of 35 normal hearing adults recruited at a tertiary referral center. Participants listened to 44 speech clips of male/female voices reciting the Rainbow Passage. Speech clips included original and manipulated excerpts capturing goal qualities such as mechanical and garbled. Listeners rated clips on a 10-point visual analog scale (VAS) of 18 characteristics (e.g. cartoonish, garbled). Skewed distribution analysis identified mean ratings in the upper and lower 2-point limits of the VAS (ratings of 8-10, 0-2, respectively); items with inconsistent responses were eliminated. The test was pruned to a final instrument of nine speech clips that clearly define qualities of interest: speech-like, male/female, cartoonish, echo-y, garbled, tinny, mechanical, rough, breathy, soothing, hoarse, like, pleasant, natural. Mean ratings were highest for original female clips (8.8) and lowest for not-speech manipulation (2.1). Factor analysis identified two subsets of characteristics: internal consistency demonstrated Cronbach's alpha of 0.95 and 0.82 per subset. Test-retest reliability of total scores was high, with an intraclass correlation coefficient of 0.76. The Speech Quality Instrument (SQI) is a concise, valid tool for assessing speech quality as an indicator for hearing performance. SQI may be a valuable outcome measure for cochlear implant recipients who, despite achieving excellent speech perception, often experience poor speech quality. 2b. Laryngoscope, 2017. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.

  8. 'Just wait then and see what he does': a speech act analysis of healthcare professionals' interaction coaching with parents of children with autism spectrum disorders.

    PubMed

    McKnight, Lindsay M; O'Malley-Keighran, Mary-Pat; Carroll, Clare

    2016-11-01

    There is evidence indicating that parent training programmes including interaction coaching of parents of children with autism spectrum disorders (ASD) can increase parental responsiveness, promote language development and social interaction skills in children with ASD. However, there is a lack of research exploring precisely how healthcare professionals use language in interaction coaching. To identify the speech acts of healthcare professionals during individual video-recorded interaction coaching sessions of a Hanen-influenced parent training programme with parents of children with ASD. This retrospective study used speech act analysis. Healthcare professional participants included two speech-language therapists and one occupational therapist. Sixteen videos were transcribed and a speech act analysis was conducted to identify the form and functions of the language used by the healthcare professionals. Descriptive statistics provided frequencies and percentages for the different speech acts used across the 16 videos. Six types of speech acts used by the healthcare professionals during coaching sessions were identified. These speech acts were, in order of frequency: Instructing, Modelling, Suggesting, Commanding, Commending and Affirming. The healthcare professionals were found to tailor their interaction coaching to the learning needs of the parents. A pattern was observed in which more direct speech acts were used in instances where indirect speech acts did not achieve the intended response. The study provides an insight into the nature of interaction coaching provided by healthcare professionals during a parent training programme. It identifies the types of language used during interaction coaching. It also highlights additional important aspects of interaction coaching such as the ability of healthcare professionals to adjust the directness of the coaching in order to achieve the intended parental response to the child's interaction. The findings may be used to increase the awareness of healthcare professionals about the types of speech acts used during interaction coaching as well as the manner in which coaching sessions are conducted. © 2016 Royal College of Speech and Language Therapists.

  9. Neurogenic Orofacial Weakness and Speech in Adults With Dysarthria

    PubMed Central

    Makashay, Matthew J.; Helou, Leah B.; Clark, Heather M.

    2017-01-01

    Purpose This study compared orofacial strength between adults with dysarthria and neurologically normal (NN) matched controls. In addition, orofacial muscle weakness was examined for potential relationships to speech impairments in adults with dysarthria. Method Matched groups of 55 adults with dysarthria and 55 NN adults generated maximum pressure (Pmax) against an air-filled bulb during lingual elevation, protrusion and lateralization, and buccodental and labial compressions. These orofacial strength measures were compared with speech intelligibility, perceptual ratings of speech, articulation rate, and fast syllable-repetition rate. Results The dysarthria group demonstrated significantly lower orofacial strength than the NN group on all tasks. Lingual strength correlated moderately and buccal strength correlated weakly with most ratings of speech deficits. Speech intelligibility was not sensitive to dysarthria severity. Individuals with severely reduced anterior lingual elevation Pmax (< 18 kPa) had normal to profoundly impaired sentence intelligibility (99%–6%) and moderately to severely impaired speech (26%–94% articulatory imprecision; 33%–94% overall severity). Conclusions Results support the presence of orofacial muscle weakness in adults with dysarthrias of varying etiologies but reinforce tenuous links between orofacial strength and speech production disorders. By examining individual data, preliminary evidence emerges to suggest that speech, but not necessarily intelligibility, is likely to be impaired when lingual weakness is severe. PMID:28763804

  10. Development and Perceptual Evaluation of Amplitude-Based F0 Control in Electrolarynx Speech

    ERIC Educational Resources Information Center

    Saikachi, Yoko; Stevens, Kenneth N.; Hillman, Robert E.

    2009-01-01

    Purpose: Current electrolarynx (EL) devices produce a mechanical speech quality that has been largely attributed to the lack of natural fundamental frequency (F0) variation. In order to improve the quality of EL speech, in the present study the authors aimed to develop and evaluate an automatic F0 control scheme, in which F0 was modulated based on…

  11. Regular/Irregular is Not the Whole Story: The Role of Frequency and Generalization in the Acquisition of German Past Participle Inflection

    ERIC Educational Resources Information Center

    Szagun, Gisela

    2011-01-01

    The acquisition of German participle inflection was investigated using spontaneous speech samples from six children between 1;4 and 3;8 and ten children between 1;4 and 2;10 recorded longitudinally at regular intervals. Child-directed speech was also analyzed. In adult and child speech, weak participles were significantly more frequent than…

  12. The Interaction of Temporal and Spectral Acoustic Information with Word Predictability on Speech Intelligibility

    NASA Astrophysics Data System (ADS)

    Shahsavarani, Somayeh Bahar

    High-level, top-down information such as linguistic knowledge is a salient cortical resource that influences speech perception under most listening conditions. But are all listeners able to exploit these resources for speech facilitation to the same extent? It was found that children with cochlear implants showed different patterns of benefit from contextual information in speech perception compared with their normal-hearing peers. Previous studies have invoked non-acoustic factors such as linguistic and cognitive capabilities to account for this discrepancy. Given that the amount of acoustic information encoded and processed by the auditory nerves of listeners with cochlear implants differs from that of normal-hearing listeners, and even varies across individuals with cochlear implants, it is important to study the interaction of specific acoustic properties of the speech signal with contextual cues. This relationship has been mostly neglected in previous research. In this dissertation, we aimed to explore how different acoustic dimensions interact to affect listeners' abilities to combine top-down information with bottom-up information in speech perception, beyond the known effects of linguistic and cognitive capacities shown previously. Specifically, the present study investigated whether there were any distinct context effects based on the resolution of spectral versus slowly-varying temporal information in the perception of spectrally impoverished speech. To that end, two experiments were conducted. In both experiments, a noise-vocoding technique was adopted to generate spectrally-degraded speech that approximates the acoustic cues delivered to listeners with cochlear implants. The frequency resolution was manipulated by varying the number of frequency channels. The temporal resolution was manipulated by low-pass filtering the amplitude envelope with varying low-pass cutoff frequencies. The stimuli were presented to normal-hearing native speakers of American English. Our results revealed a significant interaction between spectral, temporal, and contextual information in the perception of spectrally-degraded speech. This suggests that specific types and degrees of degradation of bottom-up information combine differently to utilize contextual resources. These findings emphasize the importance of taking the listener's specific auditory abilities into consideration when studying context effects. These results also introduce a novel perspective for designing interventions for listeners with cochlear implants or other auditory prostheses.
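
    A minimal noise-vocoder sketch along the lines described, in which the number of channels sets spectral resolution and the envelope low-pass cutoff sets temporal resolution (filter orders, band edges, and sampling rate are illustrative assumptions, not the dissertation's parameters):

      import numpy as np
      from scipy.signal import butter, sosfiltfilt

      def noise_vocode(x, fs, n_channels, env_cutoff):
          """Noise-vocode x: n_channels controls spectral resolution,
          env_cutoff (Hz) controls temporal-envelope resolution."""
          edges = np.geomspace(100, 7000, n_channels + 1)
          env_sos = butter(4, env_cutoff, "low", fs=fs, output="sos")
          out = np.zeros_like(x)
          for lo, hi in zip(edges[:-1], edges[1:]):
              band_sos = butter(4, [lo, hi], "bandpass", fs=fs, output="sos")
              band = sosfiltfilt(band_sos, x)
              env = sosfiltfilt(env_sos, np.abs(band))     # slowly-varying envelope
              carrier = sosfiltfilt(band_sos, np.random.randn(len(x)))
              out += np.clip(env, 0, None) * carrier       # envelope-modulated noise
          return out

      fs = 16000
      speech = np.random.randn(fs)   # stand-in for a 1-s speech signal
      degraded = noise_vocode(speech, fs, n_channels=8, env_cutoff=50)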

  13. The influence of speech rate and accent on access and use of semantic information.

    PubMed

    Sajin, Stanislav M; Connine, Cynthia M

    2017-04-01

    Circumstances in which the speech input is presented in sub-optimal conditions generally lead to processing costs affecting spoken word recognition. The current study indicates that some processing demands imposed by listening to difficult speech can be mitigated by feedback from semantic knowledge. A set of lexical decision experiments examined how foreign accented speech and word duration impact access to semantic knowledge in spoken word recognition. Results indicate that when listeners process accented speech, the reliance on semantic information increases. Speech rate was not observed to influence semantic access, except in the setting in which unusually slow accented speech was presented. These findings support interactive activation models of spoken word recognition in which attention is modulated based on speech demands.

  14. Divided attention disrupts perceptual encoding during speech recognition.

    PubMed

    Mattys, Sven L; Palmer, Shekeila D

    2015-03-01

    Performing a secondary task while listening to speech has a detrimental effect on speech processing, but the locus of the disruption within the speech system is poorly understood. Recent research has shown that cognitive load imposed by a concurrent visual task increases dependency on lexical knowledge during speech processing, but it does not affect lexical activation per se. This suggests that "lexical drift" under cognitive load occurs either as a post-lexical bias at the decisional level or as a secondary consequence of reduced perceptual sensitivity. This study aimed to adjudicate between these alternatives using a forced-choice task that required listeners to identify noise-degraded spoken words with or without the addition of a concurrent visual task. Adding cognitive load increased the likelihood that listeners would select a word acoustically similar to the target even though its frequency was lower than that of the target. Thus, there was no evidence that cognitive load led to a high-frequency response bias. Rather, cognitive load seems to disrupt sublexical encoding, possibly by impairing perceptual acuity at the auditory periphery.

  15. CNTNAP2 Is Significantly Associated With Speech Sound Disorder in the Chinese Han Population.

    PubMed

    Zhao, Yun-Jing; Wang, Yue-Ping; Yang, Wen-Zhu; Sun, Hong-Wei; Ma, Hong-Wei; Zhao, Ya-Ru

    2015-11-01

    Speech sound disorder is the most common communication disorder. Some investigations support the possibility that the CNTNAP2 gene might be involved in the pathogenesis of speech-related diseases. To investigate single-nucleotide polymorphisms in the CNTNAP2 gene, 300 unrelated speech sound disorder patients and 200 normal controls were included in the study. Five single-nucleotide polymorphisms were amplified and directly sequenced. Significant differences were found in the genotype (P = .0003) and allele (P = .0056) frequencies of rs2538976 between patients and controls. The excess frequency of the A allele in the patient group remained significant after Bonferroni correction (P = .0280). A significant haplotype association with rs2710102T/+rs17236239A/+2538976A/+2710117A (P = 4.10e-006) was identified. A neighboring single-nucleotide polymorphism, rs10608123, was found in complete linkage disequilibrium with rs2538976, and the genotypes exactly corresponded to each other. The authors propose that these CNTNAP2 variants increase the susceptibility to speech sound disorder. The single-nucleotide polymorphisms rs10608123 and rs2538976 may merge into one single-nucleotide polymorphism. © The Author(s) 2015.

  16. Analysis of high-frequency energy in long-term average spectra of singing, speech, and voiceless fricatives

    PubMed Central

    Monson, Brian B.; Lotto, Andrew J.; Story, Brad H.

    2012-01-01

    The human singing and speech spectrum includes energy above 5 kHz. To begin an in-depth exploration of this high-frequency energy (HFE), a database of anechoic high-fidelity recordings of singers and talkers was created and analyzed. Third-octave band analysis from the long-term average spectra showed that production level (soft vs normal vs loud), production mode (singing vs speech), and phoneme (for voiceless fricatives) all significantly affected HFE characteristics. Specifically, increased production level caused an increase in absolute HFE level, but a decrease in relative HFE level. Singing exhibited higher levels of HFE than speech in the soft and normal conditions, but not in the loud condition. Third-octave band levels distinguished phoneme class of voiceless fricatives. Female HFE levels were significantly greater than male levels only above 11 kHz. This information is pertinent to various areas of acoustics, including vocal tract modeling, voice synthesis, augmentative hearing technology (hearing aids and cochlear implants), and training/therapy for singing and speech. PMID:22978902
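
    A hedged sketch of this kind of analysis, estimating the long-term average spectrum with Welch's method and deriving absolute and relative high-frequency-energy levels above 5 kHz (the study's exact third-octave banding is not reproduced, and the input below is a noise placeholder):

      import numpy as np
      from scipy.signal import welch

      def hfe_levels(x, fs):
          """Long-term average spectrum via Welch's method, then absolute
          and relative high-frequency-energy (>5 kHz) levels in dB."""
          freqs, psd = welch(x, fs=fs, nperseg=4096)
          df = freqs[1] - freqs[0]
          total = psd.sum() * df
          hfe = psd[freqs > 5000].sum() * df
          abs_level = 10 * np.log10(hfe)            # absolute HFE level
          rel_level = 10 * np.log10(hfe / total)    # HFE relative to total energy
          return abs_level, rel_level

      fs = 44100
      signal = np.random.randn(10 * fs)   # stand-in for a recording
      print(hfe_levels(signal, fs))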

  17. Spatiotemporal movement variability in ALS: Speaking rate effects on tongue, lower lip, and jaw motor control

    PubMed Central

    Kuruvilla-Dugdale, Mili; Mefferd, Antje

    2017-01-01

    Purpose Although it is frequently presumed that bulbar muscle degeneration in amyotrophic lateral sclerosis (ALS) is associated with progressive loss of speech motor control, empirical evidence is limited. Furthermore, because speaking rate slows with disease progression and rate manipulations are used to improve intelligibility in ALS, this study sought to (i) determine between- and within-group differences in articulatory motor control as a result of speaking rate changes and (ii) identify the strength of association between articulatory motor control and speech impairment severity. Method Ten talkers with ALS and 11 healthy controls repeated the target sentence at habitual, fast, and slow rates. The spatiotemporal variability index (STI) was calculated to determine tongue, lower lip, and jaw movement variability. Results During habitual speech, talkers with mild-moderate dysarthria displayed significantly lower tongue and lip movement variability, whereas those with severe dysarthria showed greater variability compared to controls. Within-group rate effects were significant only for talkers with ALS. Specifically, lip and tongue movement variability significantly increased during slow speech relative to habitual and fast speech. Finally, preliminary associations between speech impairment severity and movement variability were moderate to strong in talkers with ALS. Conclusion Between-group differences for habitual speech and within-group effects for slow speech replicated previous findings for lower lip and jaw movements. Preliminary findings of moderate to strong associations between speech impairment severity and STI suggest that articulatory variability may vary from pathologically low (possibly indicating articulatory compensation) to pathologically high (possibly indicating loss of control) with dysarthria progression in ALS. PMID:28528293
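
    The STI is commonly computed by amplitude- and time-normalizing repeated movement traces and summing the across-trial standard deviations at 2% intervals (50 points). A minimal sketch under that assumption, with fabricated trials of varying duration:

      import numpy as np

      def spatiotemporal_index(trials):
          """Amplitude- and time-normalize each repetition of a movement
          trace; sum the across-trial standard deviations sampled at
          2% intervals (50 points)."""
          norm = []
          for t in trials:
              z = (t - t.mean()) / t.std()          # amplitude-normalize
              x_old = np.linspace(0.0, 1.0, len(z))
              x_new = np.linspace(0.0, 1.0, 50)     # time-normalize to 50 points
              norm.append(np.interp(x_new, x_old, z))
          return np.sum(np.std(np.vstack(norm), axis=0))

      def fake_trial():
          n = 180 + np.random.randint(40)           # durations vary across trials
          return np.sin(np.linspace(0, 2 * np.pi, n)) + 0.05 * np.random.randn(n)

      trials = [fake_trial() for _ in range(10)]    # ten fabricated repetitions
      print(spatiotemporal_index(trials))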

  18. SUBTHALAMIC NUCLEUS NEURONS DIFFERENTIALLY ENCODE EARLY AND LATE ASPECTS OF SPEECH PRODUCTION.

    PubMed

    Lipski, W J; Alhourani, A; Pirnia, T; Jones, P W; Dastolfo-Hromack, C; Helou, L B; Crammond, D J; Shaiman, S; Dickey, M W; Holt, L L; Turner, R S; Fiez, J A; Richardson, R M

    2018-05-22

    Basal ganglia-thalamocortical loops mediate all motor behavior, yet little detail is known about the role of basal ganglia nuclei in speech production. Using intracranial recording during deep brain stimulation surgery in humans with Parkinson's disease, we tested the hypothesis that the firing rate of subthalamic nucleus neurons is modulated in sync with motor execution aspects of speech. Nearly half of seventy-nine unit recordings, across twelve subjects (male and female), exhibited firing rate modulation during a syllable reading task. Trial-to-trial timing of changes in subthalamic neuronal activity, relative to cue onset versus production onset, revealed that locking to cue presentation was associated more with units that decreased firing rate, while locking to speech onset was associated more with units that increased firing rate. These unique data indicate that subthalamic activity is dynamic during the production of speech, reflecting temporally-dependent inhibition and excitation of separate populations of subthalamic neurons. SIGNIFICANCE STATEMENT The basal ganglia are widely assumed to participate in speech production, yet no prior studies have reported detailed examination of speech-related activity in basal ganglia nuclei. Using microelectrode recordings from the subthalamic nucleus during a single-syllable reading task, in awake humans undergoing deep brain stimulation implantation surgery, we show that the firing rate of subthalamic nucleus neurons is modulated in response to motor execution aspects of speech. These results are the first to establish a role for subthalamic nucleus neurons in encoding aspects of speech production, and they lay the groundwork for launching a modern subfield to explore basal ganglia function in human speech. Copyright © 2018 the authors.

  19. Long short-term memory for speaker generalization in supervised speech separation

    PubMed Central

    Chen, Jitong; Wang, DeLiang

    2017-01-01

    Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. It is also found that the LSTM model is more advantageous for low-latency speech separation: even without future frames, it performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. PMID:28679261
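
    A minimal sketch of time-frequency mask estimation with an LSTM, in the spirit of the model described (written in PyTorch; the layer sizes and feature dimensions are arbitrary assumptions, not the paper's architecture):

      import torch
      import torch.nn as nn

      class MaskEstimator(nn.Module):
          """Estimate a time-frequency mask from acoustic features."""
          def __init__(self, n_features=64, n_freq=161, hidden=256):
              super().__init__()
              self.lstm = nn.LSTM(n_features, hidden, num_layers=2,
                                  batch_first=True)
              self.out = nn.Linear(hidden, n_freq)

          def forward(self, feats):                 # feats: (batch, time, feat)
              h, _ = self.lstm(feats)
              return torch.sigmoid(self.out(h))     # mask values in [0, 1]

      model = MaskEstimator()
      feats = torch.randn(4, 100, 64)   # 4 utterances, 100 frames each
      mask = model(feats)               # shape (4, 100, 161)
      # Training would minimize, e.g., the MSE between masked noisy
      # spectrogram and the clean target.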

  20. Age-group differences in speech identification despite matched audiometrically normal hearing: contributions from auditory temporal processing and cognition

    PubMed Central

    Füllgrabe, Christian; Moore, Brian C. J.; Stone, Michael A.

    2015-01-01

    Hearing loss with increasing age adversely affects the ability to understand speech, an effect that results partly from reduced audibility. The aims of this study were to establish whether aging reduces speech intelligibility for listeners with normal audiograms, and, if so, to assess the relative contributions of auditory temporal and cognitive processing. Twenty-one older normal-hearing (ONH; 60–79 years) participants with bilateral audiometric thresholds ≤ 20 dB HL at 0.125–6 kHz were matched to nine young (YNH; 18–27 years) participants in terms of mean audiograms, years of education, and performance IQ. Measures included: (1) identification of consonants in quiet and in noise that was unmodulated or modulated at 5 or 80 Hz; (2) identification of sentences in quiet and in co-located or spatially separated two-talker babble; (3) detection of modulation of the temporal envelope (TE) at frequencies 5–180 Hz; (4) monaural and binaural sensitivity to temporal fine structure (TFS); (5) various cognitive tests. Speech identification was worse for ONH than YNH participants in all types of background. This deficit was not reflected in self-ratings of hearing ability. Modulation masking release (the improvement in speech identification obtained by amplitude modulating a noise background) and spatial masking release (the benefit obtained from spatially separating masker and target speech) were not affected by age. Sensitivity to TE and TFS was lower for ONH than YNH participants, and was correlated positively with speech-in-noise (SiN) identification. Many cognitive abilities were lower for ONH than YNH participants, and generally were correlated positively with SiN identification scores. The best predictors of the intelligibility of SiN were composite measures of cognition and TFS sensitivity. These results suggest that declines in speech perception in older persons are partly caused by cognitive and perceptual changes separate from age-related changes in audiometric sensitivity. PMID:25628563

  1. Effect of age at cochlear implantation on auditory and speech development of children with auditory neuropathy spectrum disorder.

    PubMed

    Liu, Yuying; Dong, Ruijuan; Li, Yuling; Xu, Tianqiu; Li, Yongxin; Chen, Xueqing; Gong, Shusheng

    2014-12-01

    To evaluate the auditory and speech abilities in children with auditory neuropathy spectrum disorder (ANSD) after cochlear implantation (CI) and determine the role of age at implantation. Ten children participated in this retrospective case series study. All children had evidence of ANSD. All subjects had no cochlear nerve deficiency on magnetic resonance imaging and had used their cochlear implants for a period of 12-84 months. We divided the children into two groups: children who underwent implantation before 24 months of age and children who underwent implantation after 24 months of age. Their auditory and speech abilities were evaluated using the following: behavioral audiometry, the Categories of Auditory Performance (CAP), the Meaningful Auditory Integration Scale (MAIS), the Infant-Toddler Meaningful Auditory Integration Scale (IT-MAIS), the Standard-Chinese version of the Monosyllabic Lexical Neighborhood Test (LNT), the Multisyllabic Lexical Neighborhood Test (MLNT), the Speech Intelligibility Rating (SIR), and the Meaningful Use of Speech Scale (MUSS). All children showed progress in their auditory and language abilities. The 4-frequency average hearing level (500 Hz, 1000 Hz, 2000 Hz, and 4000 Hz) of aided hearing thresholds ranged from 17.5 to 57.5 dB HL. All children developed auditory perception and speech skills over time. Scores of children with ANSD who received cochlear implants before 24 months tended to be better than those of children who received cochlear implants after 24 months. Seven children completed the Mandarin Lexical Neighborhood Test. Approximately half of the children showed improved open-set speech recognition. Cochlear implantation is helpful for children with ANSD and may be a good treatment option for many children with ANSD. In addition, children with ANSD fitted with cochlear implants before 24 months tended to acquire auditory and speech skills better than children fitted with cochlear implants after 24 months. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.

  2. Cognitive control components and speech symptoms in people with schizophrenia.

    PubMed

    Becker, Theresa M; Cicero, David C; Cowan, Nelson; Kerns, John G

    2012-03-30

    Previous schizophrenia research suggests poor cognitive control is associated with schizophrenia speech symptoms. However, cognitive control is a broad construct. Two important cognitive control components are poor goal maintenance and poor verbal working memory storage. In the current research, people with schizophrenia (n=45) performed three cognitive tasks that varied in their goal maintenance and verbal working memory storage demands. Speech symptoms were assessed using clinical rating scales, ratings of disorganized speech from typed transcripts, and self-reported disorganization. Overall, alogia was associated with both goal maintenance and verbal working memory tasks. Objectively rated disorganized speech was associated with poor goal maintenance and with a task that included both goal maintenance and verbal working memory storage demands. In contrast, self-reported disorganization was unrelated to either amount of objectively rated disorganized speech or to cognitive control task performance, instead being associated with negative mood symptoms. Overall, our results suggest that alogia is associated with both poor goal maintenance and poor verbal working memory storage and that disorganized speech is associated with poor goal maintenance. In addition, patients' own assessment of their disorganization is related to negative mood, but perhaps not to objective disorganized speech or to cognitive control task performance. Published by Elsevier Ireland Ltd.

  3. Preliminary study of acoustic analysis for evaluating speech-aid oral prostheses: Characteristic dips in octave spectrum for comparison of nasality.

    PubMed

    Chang, Yen-Liang; Hung, Chao-Ho; Chen, Po-Yueh; Chen, Wei-Chang; Hung, Shih-Han

    2015-10-01

    Acoustic analysis is often used in speech evaluation but seldom for the evaluation of oral prostheses designed for reconstruction of surgical defects. This study aimed to introduce the application of acoustic analysis for patients with velopharyngeal insufficiency (VPI) resulting from oral surgery who were rehabilitated with oral speech-aid prostheses. The acoustic features of sustained vowel sounds from two patients with VPI, recorded before and after prosthetic rehabilitation, were analyzed and compared using the acoustic analysis software Praat. There were significant differences in the octave spectrum of sustained vowel speech sounds between pre- and postprosthetic rehabilitation. Acoustic measurements of sustained vowels for patients before and after prosthetic treatment showed no significant differences for the parameters of fundamental frequency, jitter, shimmer, noise-to-harmonics ratio, formant frequency, F1 bandwidth, and band energy difference. The decrease in objective nasality perceptions correlated very well with the decrease in dips of the spectra for the male patient with the higher speech bulb height. Acoustic analysis may be a potential technique for evaluating the function of oral speech-aid prostheses, which eliminate dysfunction due to the surgical defect and contribute to a high percentage of intelligible speech. Octave spectrum analysis may also be a valuable tool for detecting changes in nasality characteristics of the voice during prosthetic treatment of VPI. Copyright © 2014. Published by Elsevier B.V.

  4. Restoring the missing features of the corrupted speech using linear interpolation methods

    NASA Astrophysics Data System (ADS)

    Rassem, Taha H.; Makbol, Nasrin M.; Hasan, Ali Muttaleb; Zaki, Siti Syazni Mohd; Girija, P. N.

    2017-10-01

    One of the main challenges in Automatic Speech Recognition (ASR) is noise. The performance of an ASR system degrades significantly if the speech is corrupted by noise. In a spectrogram representation of a speech signal, deleting low Signal-to-Noise Ratio (SNR) elements leaves an incomplete spectrogram. One approach is for the speech recognizer itself to handle the incomplete spectrogram; another is to restore the missing elements before recognition is performed, which can be done using different spectrogram reconstruction methods. In this paper, the geometrical spectrogram reconstruction methods suggested by some researchers are implemented as a toolbox. In these geometrical reconstruction methods, linear interpolation along time or along frequency is used to predict the missing elements between adjacent observed elements in the spectrogram. Moreover, a new linear interpolation method using time and frequency together is presented. The CMU Sphinx III software is used in the experiments to test the performance of the linear interpolation reconstruction methods. The experiments are conducted under different conditions, such as different window lengths and different utterance lengths. The speech corpus consists of 20 male and 20 female speakers, each contributing two different utterances. As a result, 80% recognition accuracy is achieved at a 25% SNR.
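
    A minimal sketch of linear interpolation along time for missing spectrogram elements, one of the reconstruction directions described (interpolation along frequency is the same operation on the transposed spectrogram; the paper's combined time-frequency method is not reproduced here):

      import numpy as np

      def interpolate_missing(spec, mask):
          """Fill missing spectrogram elements (mask == False) by linear
          interpolation along time within each frequency channel; gaps at
          the edges take the nearest observed value (np.interp clamps)."""
          filled = spec.copy()
          t = np.arange(spec.shape[1])
          for f in range(spec.shape[0]):
              obs = mask[f]
              if obs.any() and not obs.all():
                  filled[f, ~obs] = np.interp(t[~obs], t[obs], spec[f, obs])
          return filled

      spec = np.abs(np.random.randn(64, 200))   # fake magnitude spectrogram
      mask = np.random.rand(64, 200) > 0.25     # False = low-SNR, deleted
      reconstructed = interpolate_missing(spec, mask)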

  5. Auditory perceptual simulation: Simulating speech rates or accents?

    PubMed

    Zhou, Peiyun; Christianson, Kiel

    2016-07-01

    When readers engage in Auditory Perceptual Simulation (APS) during silent reading, they mentally simulate characteristics of voices attributed to a particular speaker or a character depicted in the text. Previous research found that auditory perceptual simulation of a faster native English speaker during silent reading led to shorter reading times than auditory perceptual simulation of a slower non-native English speaker. Yet it was uncertain whether this difference was triggered by the different speech rates of the speakers or by the difficulty of simulating an unfamiliar accent. The current study investigates this question by comparing faster Indian-English speech and slower American-English speech in the auditory perceptual simulation paradigm. Analyses of reading times of individual words and the full sentence reveal that auditory perceptual simulation again modulated reading rate, and auditory perceptual simulation of the faster Indian-English speech led to faster reading rates compared to auditory perceptual simulation of the slower American-English speech. The comparison between this experiment and the data from Zhou and Christianson (2016) demonstrates further that the "speakers'" speech rates, rather than the difficulty of simulating a non-native accent, are the primary mechanism underlying auditory perceptual simulation effects. Copyright © 2016 Elsevier B.V. All rights reserved.

  6. Prediction of Beck Depression Inventory (BDI-II) Score Using Acoustic Measurements in a Sample of Iium Engineering Students

    NASA Astrophysics Data System (ADS)

    Fikri Zanil, Muhamad; Nur Wahidah Nik Hashim, Nik; Azam, Huda

    2017-11-01

    Psychiatrists currently rely on questionnaires and interviews for psychological assessment. These conservative methods often miss true positives and can lead to death, especially in cases where a patient is experiencing suicidal predisposition but is only diagnosed with major depressive disorder (MDD). With modern technology, an assessment tool might aid psychiatrists toward a more accurate diagnosis and thus help reduce casualties. This project explores the relationship between speech features of a spoken audio signal (reading) in Bahasa Malaysia and Beck Depression Inventory (BDI-II) scores. The speech features used in this project were Power Spectral Density (PSD), Mel-Frequency Cepstral Coefficients (MFCC), transition parameters, formants, and pitch. According to the analysis, the optimum combination of speech features to predict BDI-II scores comprises PSD, MFCC, and transition parameters. A linear regression approach with sequential forward/backward feature selection was used to predict BDI-II scores from reading speech. The results showed a mean absolute error (MAE) of 0.4096 for female reading speech. For males, 100% of the BDI-II scores were predicted to within a difference of less than 1, with an MAE of 0.098437. A prediction system called the Depression Severity Evaluator (DSE) was developed. The DSE correctly predicted the score for one out of five subjects. Although this rate was low, the system predicted each person's score to within a maximum difference of 4.93, demonstrating that the predicted scores are not random numbers.
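
    A hedged sketch of the modeling step, pairing linear regression with sequential forward feature selection via scikit-learn (the feature layout, data shapes, and scores below are fabricated placeholders, not the study's data):

      import numpy as np
      from sklearn.feature_selection import SequentialFeatureSelector
      from sklearn.linear_model import LinearRegression

      # Fake data: rows = speakers, columns = speech features (e.g., PSD
      # bands, MFCC means, transition parameters); targets = BDI-II scores.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(40, 30))
      y = rng.uniform(0, 63, size=40)      # BDI-II scores range 0-63

      selector = SequentialFeatureSelector(
          LinearRegression(), n_features_to_select=10, direction="forward")
      selector.fit(X, y)
      model = LinearRegression().fit(X[:, selector.get_support()], y)
      pred = model.predict(X[:, selector.get_support()])
      print(np.mean(np.abs(pred - y)))     # mean absolute error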

  7. Effects of ethanol intoxication on speech suprasegmentals

    NASA Astrophysics Data System (ADS)

    Hollien, Harry; Dejong, Gea; Martin, Camilo A.; Schwartz, Reva; Liljegren, Kristen

    2001-12-01

    The effects of ingesting ethanol have been shown to be somewhat variable in humans; to date, there appear to be but few universals. Yet the question often arises: is it possible to determine whether a person is intoxicated by observing them in some manner? A closely related question is: can speech be used for this purpose and, if so, can the degree of intoxication be determined? One of the many issues associated with these questions involves the relationships between a person's paralinguistic characteristics and the presence and level of inebriation. To this end, young, healthy speakers of both sexes were carefully selected and sorted into roughly equal groups of light, moderate, and heavy drinkers. They were asked to produce four types of utterances during a learning phase, when sober, and at four strictly controlled levels of intoxication (three ascending and one descending). The primary motor speech measures employed were speaking fundamental frequency, speech intensity, speaking rate, and nonfluencies. Several statistically significant changes were found with increasing intoxication; the primary ones were rises in F0, task duration, and nonfluencies. Minor gender differences were found, but they lacked statistical significance, as did the small differences among the drinking-category subgroups and the subject groupings related to levels of perceived intoxication. Finally, although it may be concluded that certain changes in speech suprasegmentals will occur as a function of increasing intoxication, these patterns cannot be viewed as universal, since a few subjects (about 20%) exhibited no (or negative) changes.
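
    Two of the measures above, speaking fundamental frequency and speech intensity, can be estimated from a recording along these lines; librosa is one common choice, and the file name and pitch range are placeholders rather than the study's protocol.

        import numpy as np
        import librosa

        y, sr = librosa.load("utterance.wav", sr=16000)   # placeholder recording

        # Speaking fundamental frequency via the pYIN tracker.
        f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
        mean_f0 = np.nanmean(f0[voiced])

        # Speech intensity as frame-wise RMS energy, in dB.
        rms = librosa.feature.rms(y=y)[0]
        intensity_db = 20 * np.log10(rms + 1e-10)

        print(f"mean F0 {mean_f0:.1f} Hz, mean level {intensity_db.mean():.1f} dB")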

  8. Spatial hearing benefits demonstrated with presentation of acoustic temporal fine structure cues in bilateral cochlear implant listeners.

    PubMed

    Churchill, Tyler H; Kan, Alan; Goupell, Matthew J; Litovsky, Ruth Y

    2014-09-01

    Most contemporary cochlear implant (CI) processing strategies discard acoustic temporal fine structure (TFS) information, and this may contribute to the observed deficits in bilateral CI listeners' ability to localize sounds when compared to normal hearing listeners. Additionally, for best speech envelope representation, most contemporary speech processing strategies use high-rate carriers (≥900 Hz) that exceed the limit for interaural pulse timing to provide useful binaural information. Many bilateral CI listeners are sensitive to interaural time differences (ITDs) in low-rate (<300 Hz) constant-amplitude pulse trains. This study explored the trade-off between superior speech temporal envelope representation with high-rate carriers and binaural pulse timing sensitivity with low-rate carriers. The effects of carrier pulse rate and pulse timing on ITD discrimination, ITD lateralization, and speech recognition in quiet were examined in eight bilateral CI listeners. Stimuli consisted of speech tokens processed at different electrical stimulation rates, and pulse timings that either preserved or did not preserve acoustic TFS cues. Results showed that CI listeners were able to use low-rate pulse timing cues derived from acoustic TFS when presented redundantly on multiple electrodes for ITD discrimination and lateralization of speech stimuli.
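
    For acoustic (rather than electrical) stimuli, an ITD like those the listeners discriminated can be estimated as the lag of maximum cross-correlation between the two ear signals; this is a simplified analogue of the paradigm, not the study's stimulation method, and the demo signal is synthetic.

        import numpy as np

        def estimate_itd(left, right, sr, max_itd_s=0.001):
            """Return the ITD (s) as the lag maximising the cross-correlation,
            searched within +/- max_itd_s."""
            max_lag = int(max_itd_s * sr)
            lags = np.arange(-max_lag, max_lag + 1)
            xcorr = [np.dot(left[max(0, -l):len(left) - max(0, l)],
                            right[max(0, l):len(right) - max(0, -l)])
                     for l in lags]
            return lags[int(np.argmax(xcorr))] / sr

        sr = 44100
        left = np.random.randn(sr // 10)
        right = np.roll(left, 22)            # delay the right ear ~0.5 ms
        print(estimate_itd(left, right, sr)) # ~ 22 / 44100 s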

  9. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

    PubMed

    Yu, Chengzhu; Hansen, John H L

    2017-03-01

    Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
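
    The formant-structure analysis mentioned above is often approximated with linear predictive coding (LPC): the angles of the LPC polynomial roots give candidate formant frequencies. This is a generic stand-in for illustration, not the paper's maximum likelihood frequency warping method.

        import numpy as np
        import librosa

        def lpc_formants(frame, sr, order=12):
            """Estimate formant frequencies (Hz) of one voiced frame from
            the angles of the LPC polynomial roots."""
            a = librosa.lpc(frame.astype(float), order=order)
            roots = [r for r in np.roots(a) if np.imag(r) > 0]
            freqs = sorted(np.angle(roots) * sr / (2 * np.pi))
            return [f for f in freqs if f > 90]   # drop near-DC roots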

  10. Production and perception of whispered vowels

    NASA Astrophysics Data System (ADS)

    Kiefte, Michael

    2005-09-01

    Information normally associated with pitch, such as intonation, can still be conveyed in whispered speech despite the absence of voicing. For example, it is possible to whisper the question "You are going today?" without any syntactic information to distinguish this sentence from a simple declarative. It has been shown that pitch change in whispered speech is correlated with the simultaneous raising or lowering of several formants [e.g., M. Kiefte, J. Acoust. Soc. Am. 116, 2546 (2004)]. However, spectral peak frequencies associated with formants have been identified as important correlates of vowel identity. Spectral peak frequencies may therefore serve two roles in the perception of whispered speech: to indicate both vowel identity and intended pitch. Data will be presented to examine the relative importance of several acoustic properties, including spectral peak frequencies and spectral shape parameters, in both the production and perception of whispered vowels. Speakers were asked to phonate and whisper vowels at three different pitches across a range of roughly a musical fifth. It will be shown that relative spectral change is preserved within vowels across intended pitches in whispered speech. In addition, several models of vowel identification by listeners will be presented. [Work supported by SSHRC.]

  11. Using Flanagan's phase vocoder to improve cochlear implant performance

    NASA Astrophysics Data System (ADS)

    Zeng, Fan-Gang

    2004-10-01

    The cochlear implant (CI) has restored partial hearing to more than 100,000 deaf people worldwide, allowing the average user to talk on the telephone in a quiet environment. However, significant difficulty still remains for speech recognition in noise, music perception, and tonal language understanding. This difficulty may be related to speech processing strategies in current cochlear implants that emphasize the extraction and encoding of the temporal envelope while discarding the temporal fine structure in speech sounds. A novel strategy was developed based on Flanagan's phase vocoder [Flanagan and Golden, Bell Syst. Tech. J. 45, 1493-1509 (1966)], in which frequency modulation is extracted from the temporal fine structure and then added to the amplitude modulation used in current cochlear implants. Acoustic simulation results showed that amplitude and frequency modulation contribute complementarily to speech perception: amplitude modulation contributes mainly to intelligibility, whereas frequency modulation contributes to speaker identification and auditory grouping. The results also showed that the novel strategy significantly improved cochlear implant performance under realistic listening situations. Overall, the present results demonstrate that Flanagan's classic work on the phase vocoder still sheds light on current problems of both theoretical and practical importance. [Work supported by NIH.]
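
    The AM/FM decomposition at the heart of the strategy can be sketched for a single filterbank channel with the analytic signal; a full processor would repeat this per band. This is a schematic reading of the approach, not the authors' implementation.

        import numpy as np
        from scipy.signal import hilbert

        def am_fm(band, sr):
            """Split one band-passed signal into its envelope (AM) and its
            instantaneous frequency (FM) via the analytic signal."""
            analytic = hilbert(band)
            am = np.abs(analytic)                    # temporal envelope
            phase = np.unwrap(np.angle(analytic))
            fm = np.diff(phase) * sr / (2 * np.pi)   # instantaneous freq, Hz
            return am, fm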

  12. Slowed articulation rate is a sensitive diagnostic marker for identifying non-fluent primary progressive aphasia

    PubMed Central

    Cordella, Claire; Dickerson, Bradford C.; Quimby, Megan; Yunusova, Yana; Green, Jordan R.

    2016-01-01

    Background: Primary progressive aphasia (PPA) is a neurodegenerative aphasic syndrome with three distinct clinical variants: non-fluent (nfvPPA), logopenic (lvPPA), and semantic (svPPA). Speech (non-)fluency is a key diagnostic marker used to aid identification of the clinical variants, and researchers have been actively developing diagnostic tools to assess speech fluency. Current approaches reveal coarse differences in fluency between subgroups, but often fail to clearly differentiate nfvPPA from the variably fluent lvPPA. More robust subtype differentiation may be possible with finer-grained measures of fluency. Aims: We sought to identify the quantitative measures of speech rate, including articulation rate and pausing measures, that best differentiated PPA subtypes, specifically the non-fluent group (nfvPPA) from the more fluent groups (lvPPA, svPPA). The diagnostic accuracy of the quantitative speech rate variables was compared to that of a speech fluency impairment rating made by clinicians. Methods and Procedures: Automatic estimates of pause and speech segment durations and rate measures were derived from connected speech samples of participants with PPA (N=38; 11 nfvPPA, 14 lvPPA, 13 svPPA) and healthy age-matched controls (N=8). Clinician ratings of fluency impairment were made using a previously validated clinician rating scale developed specifically for use in PPA. Receiver operating characteristic (ROC) analyses enabled a quantification of diagnostic accuracy. Outcomes and Results: Among the quantitative measures, articulation rate was the most effective for differentiating between nfvPPA and the more fluent lvPPA and svPPA groups. The diagnostic accuracy of both speech and articulation rate measures was markedly better than that of the clinician rating scale, and articulation rate was the best classifier overall. Area under the curve (AUC) values for articulation rate were good to excellent for identifying nfvPPA from both svPPA (AUC=.96) and lvPPA (AUC=.86). Cross-validation of accuracy results for articulation rate showed good generalizability outside the training dataset. Conclusions: Results provide empirical support for (1) the efficacy of quantitative assessments of speech fluency and (2) a distinct non-fluent PPA subtype characterized, at least in part, by an underlying disturbance in speech motor control. The trend toward improved classifier performance for quantitative rate measures demonstrates the potential for a more accurate and reliable approach to subtyping in the fluency domain, and suggests that articulation rate may be a useful input variable as part of a multi-dimensional clinical subtyping approach. PMID:28757671
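
    The ROC analysis above amounts to scoring a single continuous measure as a classifier; with scikit-learn this looks roughly as follows (the rates and labels below are illustrative, not the study's data).

        import numpy as np
        from sklearn.metrics import roc_auc_score, roc_curve

        # Illustrative articulation rates (syll/s); nfvPPA tends to be slower.
        rate   = np.array([2.1, 2.5, 2.8, 3.0, 4.1, 4.4, 4.8, 5.0, 5.3])
        nfvppa = np.array([1,   1,   1,   1,   0,   0,   0,   0,   0])

        auc = roc_auc_score(nfvppa, -rate)        # lower rate -> nfvPPA
        fpr, tpr, thr = roc_curve(nfvppa, -rate)  # operating points, if needed
        print(f"AUC = {auc:.2f}")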

  13. Inferior frontal sensitivity to common speech sounds is amplified by increasing word intelligibility.

    PubMed

    Vaden, Kenneth I; Kuchinsky, Stefanie E; Keren, Noam I; Harris, Kelly C; Ahlstrom, Jayne B; Dubno, Judy R; Eckert, Mark A

    2011-11-01

    The left inferior frontal gyrus (LIFG) exhibits increased responsiveness when people listen to words composed of speech sounds that frequently co-occur in the English language (Vaden, Piquado, & Hickok, 2011), termed high phonotactic frequency (Vitevitch & Luce, 1998). The current experiment aimed to further characterize the relation of phonotactic frequency to LIFG activity by manipulating word intelligibility in participants of varying age. Thirty-six native English speakers, 19-79 years old (mean=50.5, sd=21.0), indicated with a button press whether they recognized 120 binaurally presented consonant-vowel-consonant words during a sparse sampling fMRI experiment (TR=8 s). Word intelligibility was manipulated by low-pass filtering (cutoff frequencies of 400 Hz, 1000 Hz, 1600 Hz, and 3150 Hz). Group analyses revealed a significant positive correlation between phonotactic frequency and LIFG activity, which was unaffected by age and hearing thresholds. A region of interest analysis revealed that the relation between phonotactic frequency and LIFG activity was significantly strengthened for the most intelligible words (low-pass cutoff at 3150 Hz). These results suggest that the responsiveness of the left inferior frontal cortex to phonotactic frequency reflects the downstream impact of word recognition rather than support of word recognition, at least when there are no speech production demands. Published by Elsevier Ltd.
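
    The intelligibility manipulation described above can be reproduced with a zero-phase Butterworth low-pass filter; the cutoffs come from the abstract, while the filter order and the stand-in signal are assumptions.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt

        def lowpass(y, sr, cutoff_hz, order=6):
            """Zero-phase Butterworth low-pass filter (order assumed)."""
            sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
            return sosfiltfilt(sos, y)

        sr = 16000
        y = np.random.randn(sr)              # stand-in for a CVC word recording
        versions = {fc: lowpass(y, sr, fc) for fc in (400, 1000, 1600, 3150)}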

  14. Voiceprints: Hearing for Those Who Can't?

    ERIC Educational Resources Information Center

    Science News, 1979

    1979-01-01

    An electrical engineer and phoneticist has taught himself to read voiceprints, spectrograms that display a plot of the sound frequency of speech against time. This technique may prove useful in training the deaf to read voiceprints or to mimic natural speech. (BB)

  15. On pressure-frequency relations in the excised larynx.

    PubMed

    Alipour, Fariborz; Scherer, Ronald C

    2007-10-01

    The purpose of this study was to find relationships between subglottal pressure (P(s)) and fundamental frequency (F(0)) of phonation in excised larynx models, including the relation between F(0) and its rate of change with pressure (dF/dP). Canine larynges were prepared and mounted over a tapered tube that supplied pressurized, heated, and humidified air. Glottal adduction was accomplished either by using two-pronged probes to press the arytenoids together or by passing a suture to simulate lateral cricoarytenoid muscle activation. The pressure-frequency relation was obtained through a series of pressure-flow sweep experiments conducted on eight excised canine larynges. It was found that, at set adduction and elongation levels, the pressure-frequency relation is nonlinear and is highly influenced by adduction and elongation. The results indicated that for the lower phonation mode, the average rate of change of frequency with pressure was 2.9 +/- 0.7 Hz/cm H(2)O, while for the higher mode it was 5.3 +/- 0.5 Hz/cm H(2)O for adduction changes and 8.2 +/- 4.4 Hz/cm H(2)O for elongation changes. The results suggest that during speech and singing, the dF/dP relationships are taken into account.
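
    A dF/dP estimate like those reported above can be obtained from a pressure sweep by numerical differentiation; the sweep values below are invented for illustration.

        import numpy as np

        # Illustrative sweep: F0 (Hz) at increasing subglottal pressures
        # Ps (cm H2O) for one adduction/elongation setting.
        ps = np.array([ 8.0, 10.0, 12.0, 14.0, 16.0, 18.0])
        f0 = np.array([98.0, 104., 111., 119., 128., 138.])

        dfdp = np.gradient(f0, ps)           # local dF/dP, Hz per cm H2O
        print(f"mean dF/dP = {dfdp.mean():.1f} Hz/cm H2O")

        # The nonlinearity noted above appears as a non-constant slope,
        # e.g., in a second-order fit of the F0-Ps curve:
        quad = np.polyfit(ps, f0, 2)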

  16. Impairments of speech fluency in Lewy body spectrum disorder.

    PubMed

    Ash, Sharon; McMillan, Corey; Gross, Rachel G; Cook, Philip; Gunawardena, Delani; Morgan, Brianna; Boller, Ashley; Siderowf, Andrew; Grossman, Murray

    2012-03-01

    Few studies have examined connected speech in demented and non-demented patients with Parkinson's disease (PD). We assessed the speech production of 35 patients with Lewy body spectrum disorder (LBSD), including non-demented PD patients, patients with PD dementia (PDD), and patients with dementia with Lewy bodies (DLB), in a semi-structured narrative speech sample in order to characterize impairments of speech fluency and to determine the factors contributing to reduced speech fluency in these patients. Both demented and non-demented PD patients exhibited reduced speech fluency, characterized by reduced overall speech rate and long pauses between sentences. Reduced speech rate in LBSD correlated with measures of between-utterance pauses, executive functioning, and grammatical comprehension. Regression analyses related non-fluent speech, grammatical difficulty, and executive difficulty to atrophy in frontal brain regions. These findings indicate that multiple factors contribute to slowed speech in LBSD, and this is mediated in part by disease in frontal brain regions. Copyright © 2011 Elsevier Inc. All rights reserved.
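
    Measures like overall speech rate and between-utterance pauses can be extracted from a narrative sample with simple energy-based silence detection; the thresholds, frame sizes, file name, and syllable count below are assumptions, not the study's protocol.

        import numpy as np
        import librosa

        y, sr = librosa.load("narrative.wav", sr=16000)    # placeholder sample

        frame, hop = 400, 160                              # 25 ms / 10 ms frames
        rms = librosa.feature.rms(y=y, frame_length=frame, hop_length=hop)[0]
        silent = rms < 0.05 * rms.max()                    # assumed threshold

        pauses, run = [], 0
        for s in np.append(silent, False):                 # sentinel flushes run
            if s:
                run += 1
            elif run:
                pauses.append(run * hop / sr)              # frames -> seconds
                run = 0

        long_pauses = [p for p in pauses if p >= 0.5]      # between-utterance
        n_syllables = 250                                  # hypothetical count
        speech_rate = n_syllables / (len(silent) * hop / sr)
        print(f"{len(long_pauses)} long pauses, {speech_rate:.1f} syll/s")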

  17. Function word repetitions emerge when speakers are operantly conditioned to reduce frequency of silent pauses.

    PubMed

    Howell, P; Sackin, S

    2001-09-01

    Beattie and Bradbury (1979) reported a study in which, in one condition, they punished speakers when they produced silent pauses (by lighting a light they were supposed to keep switched off). They found speakers were able to reduce silent pauses and that this was not achieved at the expense of reduced overall speech rate. They reported an unexpected increase in word repetition rate. A recent theory proposed by Howell, Au-Yeung, and Sackin (1999) predicts that the change in word repetition rate will occur on function, not content words. This hypothesis is tested and confirmed. The results are used to assess the theory and to consider practical applications of this conditioning procedure.

  18. Speech and gait in Parkinson's disease: When rhythm matters.

    PubMed

    Ricciardi, Lucia; Ebreo, Michela; Graziosi, Adriana; Barbuto, Marianna; Sorbera, Chiara; Morgante, Letterio; Morgante, Francesca

    2016-11-01

    Speech disturbances in Parkinson's disease (PD) are heterogeneous, ranging from hypokinetic to hyperkinetic types. Repetitive speech disorder has been demonstrated in more advanced disease stages and has been considered the speech equivalent of freezing of gait (FOG). We aimed to verify a possible relationship between speech and FOG in patients with PD. Forty-three consecutive PD patients and 20 healthy control subjects underwent standardized speech evaluation using the Italian version of the Dysarthria Profile (DP), for its motor component, and subsets of the Battery for the Analysis of the Aphasic Deficit (BADA), for its procedural component. The DP is a scale composed of 7 sub-sections assessing different features of speech; the rate/prosody section of the DP includes items investigating the presence of repetitive speech disorder. Severity of FOG was evaluated with the New Freezing of Gait Questionnaire (NFGQ). PD patients performed worse on the DP and BADA compared to healthy controls; patients with FOG or with Hoehn-Yahr >2 had lower scores in the articulation, intelligibility, and rate/prosody sections of the DP and in the semantic verbal fluency test. Logistic regression analysis showed that only age and rate/prosody scores were significantly associated with FOG in PD. Multiple regression analysis showed that only the severity of FOG was associated with the rate/prosody score. Our data demonstrate that repetitive speech disorder is related to FOG, is associated with advanced disease stages, and is independent of disease duration. Speech dysfluency represents a disorder of motor speech control, possibly sharing pathophysiological mechanisms with FOG. Copyright © 2016 Elsevier Ltd. All rights reserved.
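
    The association analysis described (FOG as outcome, age and rate/prosody score as predictors) is a standard logistic regression; a sketch with statsmodels and simulated stand-in data follows, purely to show the shape of the analysis.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(1)
        n = 43                                            # as in the study
        age     = rng.normal(68, 8, n)                    # illustrative ages
        prosody = rng.normal(20, 4, n)                    # illustrative DP scores
        logit_p = 0.08 * (age - 68) - 0.3 * (prosody - 20)
        fog     = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))  # FOG yes/no

        X = sm.add_constant(np.column_stack([age, prosody]))
        fit = sm.Logit(fog, X).fit(disp=0)
        print(fit.summary())      # coefficients and p-values per predictor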

  19. Motor speech signature of behavioral variant frontotemporal dementia: Refining the phenotype.

    PubMed

    Vogel, Adam P; Poole, Matthew L; Pemberton, Hugh; Caverlé, Marja W J; Boonstra, Frederique M C; Low, Essie; Darby, David; Brodtmann, Amy

    2017-08-22

    To provide a comprehensive description of motor speech function in behavioral variant frontotemporal dementia (bvFTD), forty-eight individuals (24 bvFTD and 24 age- and sex-matched healthy controls) provided speech samples. These varied in complexity and thus cognitive demand. Their language was assessed using the Progressive Aphasia Language Scale and verbal fluency tasks. Speech was analyzed perceptually, to describe the nature of deficits, and acoustically, to quantify differences between patients with bvFTD and healthy controls. Cortical thickness and subcortical volume derived from MRI scans were correlated with speech outcomes in patients with bvFTD. Speech of affected individuals was significantly different from that of healthy controls. The speech signature of patients with bvFTD is characterized by reduced rate (75%) and accuracy (65%) on alternating syllable production tasks, and by prosodic deficits including reduced speech rate (45%), prolonged intervals (54%), and use of short phrases (41%). Groups differed on acoustic measures derived from the reading, unprepared monologue, and diadochokinetic tasks, but not from the days-of-the-week or sustained vowel tasks. Variability of silence length was associated with cortical thickness of the inferior frontal gyrus and insula, and speech rate with that of the precentral gyrus. One in eight patients presented with moderate speech timing deficits, with a further two-thirds rated as mild or subclinical. Subtle but measurable deficits in prosody are common in bvFTD and should be considered during disease management. Language function correlated with speech timing measures derived from the unprepared monologue only. © 2017 American Academy of Neurology.

  20. Long-term effectiveness of the SpeechEasy fluency-enhancement device.

    PubMed

    Gallop, Ronald F; Runyan, Charles M

    2012-12-01

    The SpeechEasy has been found to be an effective device for reducing stuttering frequency for many people who stutter (PWS); published studies typically have compared stuttering reduction at initial fitting of the device to results achieved up to one year later. This study examines long-term effectiveness by examining whether effects of the SpeechEasy were maintained for longer periods, from 13 to 59 months. Results indicated no significant change for seven device users from post-fitting to the time of the study (t=-.074, p=.943); however, findings varied greatly on a case-by-case basis. Most notably, when stuttering frequency for eleven users and former users prior to device fitting was compared to current stuttering frequency while not wearing the device, the change over time was found to be statistically significant (t=2.851, p=.017), suggesting a carry-over effect of the device. There was no significant difference in stuttering frequency when users were currently wearing versus not wearing the device (t=1.949, p=0.92). Examinations of these results, as well as directions for future research, are described herein. The reader will be able to: (a) identify and briefly describe the two types of altered auditory feedback that the SpeechEasy incorporates in order to help reduce stuttering; (b) describe the carry-over effect found in this study, which suggests effectiveness of the device over a longer period of time than previously reported, and its implications; and (c) list factors that might be assessed in future research involving this device in order to more narrowly determine which prospective users are most likely to benefit from employing the SpeechEasy. Copyright © 2012 Elsevier Inc. All rights reserved.
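
    The repeated-measures comparisons reported above (e.g., pre-fitting versus current stuttering frequency) are paired t-tests; with SciPy this looks as follows, using invented percent-syllables-stuttered values in place of the study's data.

        import numpy as np
        from scipy.stats import ttest_rel

        pre_fit = np.array([12.1, 8.4, 15.0, 6.3, 9.8, 11.2,
                            7.5, 13.4, 10.1, 5.9, 8.8])
        now_off = np.array([ 9.0, 7.1, 11.2, 5.0, 8.1,  9.4,
                             6.2, 10.8,  8.5, 5.1, 7.3])

        t, p = ttest_rel(pre_fit, now_off)
        print(f"t = {t:.3f}, p = {p:.3f}")   # significant -> carry-over effect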
