Formant-frequency variation and its effects on across-formant grouping in speech perception.
Roberts, Brian; Summers, Robert J; Bailey, Peter J
2013-01-01
How speech is separated perceptually from other speech remains poorly understood. In a series of experiments, perceptual organisation was probed by presenting three-formant (F1+F2+F3) analogues of target sentences dichotically, together with a competitor for F2 (F2C), or for F2+F3, which listeners must reject to optimise recognition. To control for energetic masking, the competitor was always presented in the opposite ear to the corresponding target formant(s). Sine-wave speech was used initially, and different versions of F2C were derived from F2 using separate manipulations of its amplitude and frequency contours. F2Cs with time-varying frequency contours were highly effective competitors, whatever their amplitude characteristics, whereas constant-frequency F2Cs were ineffective. Subsequent studies used synthetic-formant speech to explore the effects of manipulating the rate and depth of formant-frequency change in the competitor. Competitor efficacy was not tuned to the rate of formant-frequency variation in the target sentences; rather, the reduction in intelligibility increased with competitor rate relative to the rate for the target sentences. Therefore, differences in speech rate may not be a useful cue for separating the speech of concurrent talkers. Effects of competitors whose depth of formant-frequency variation was scaled by a range of factors were explored using competitors derived either by inverting the frequency contour of F2 about its geometric mean (plausibly speech-like pattern) or by using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Competitor efficacy depended on the overall depth of frequency variation, not depth relative to that for the other formants. Furthermore, the triangle-wave competitors were as effective as their more speech-like counterparts. Overall, the results suggest that formant-frequency variation is critical for the across-frequency grouping of formants but that this grouping does not depend on speech-specific constraints.
Effects of the rate of formant-frequency variation on the grouping of formants in speech perception.
Summers, Robert J; Bailey, Peter J; Roberts, Brian
2012-04-01
How speech is separated perceptually from other speech remains poorly understood. Recent research suggests that the ability of an extraneous formant to impair intelligibility depends on the modulation of its frequency, but not its amplitude, contour. This study further examined the effect of formant-frequency variation on intelligibility by manipulating the rate of formant-frequency change. Target sentences were synthetic three-formant (F1 + F2 + F3) analogues of natural utterances. Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3C; F2 + F3), where F2C + F3C constitute a competitor for F2 and F3 that listeners must reject to optimize recognition. Competitors were derived using formant-frequency contours extracted from extended passages spoken by the same talker and processed to alter the rate of formant-frequency variation, such that rate scale factors relative to the target sentences were 0, 0.25, 0.5, 1, 2, and 4 (0 = constant frequencies). Competitor amplitude contours were either constant, or time-reversed and rate-adjusted in parallel with the frequency contour. Adding a competitor typically reduced intelligibility; this reduction increased with competitor rate until the rate was at least twice that of the target sentences. Similarity in the results for the two amplitude conditions confirmed that formant amplitude contours do not influence across-formant grouping. The findings indicate that competitor efficacy is not tuned to the rate of the target sentences; most probably, it depends primarily on the overall rate of frequency variation in the competitor formants. This suggests that, when segregating the speech of concurrent talkers, differences in speech rate may not be a significant cue for across-frequency grouping of formants.
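One way to picture the rate manipulation described above is as resampling a longer formant-frequency contour onto the target's time base, so that the variation runs faster or slower over the same duration. Below is a minimal Python sketch under that assumption; the study's actual signal processing is not specified here.

```python
import numpy as np

def scale_variation_rate(source_contour, factor, n_out):
    """Return an n_out-frame contour whose frequency variation runs
    `factor` times faster than the source contour (one value per frame).
    A factor of 0 yields a constant contour, matching the 0-rate condition.
    """
    src = np.asarray(source_contour, dtype=float)
    positions = np.arange(n_out) * factor  # read head advances `factor` frames per output frame
    if positions[-1] > len(src) - 1:
        raise ValueError("source passage too short for this rate scale factor")
    return np.interp(positions, np.arange(len(src)), src)
```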
Speaking fundamental frequency and vowel formant frequencies: effects on perception of gender.
Gelfer, Marylou Pausewang; Bennett, Quinn E
2013-09-01
The purpose of the present study was to investigate the contribution of vowel formant frequencies to gender identification in connected speech, the distinctiveness of vowel formants in males versus females, and how ambiguous speaking fundamental frequencies (SFFs) and vowel formants might affect perception of gender. Study design: multivalent experimental. Speaker subjects (eight tall males, eight short females, and seven males and seven females of "middle" height) were recorded saying two carrier phrases to elicit the vowels /i/ and /α/ and a sentence. The gender/height groups were selected to (presumably) maximize formant differences between some groups (tall vs short) and minimize differences between others (middle height). Each subject's samples were digitally altered to distinct SFFs (116, 145, 155, 165, and 207 Hz) to represent SFFs typical of average males, average females, and an ambiguous range. Listeners judged the gender of each randomized altered speech sample. Results indicated that female speakers were perceived as female even with an SFF in the typical male range. For male speakers, gender perception was less accurate at SFFs of 165 Hz and higher. Although the ranges of vowel formants overlapped considerably between genders, significant differences in the formant frequencies of males and females were seen. Vowel formants appeared to be important to perception of gender, especially for SFFs in the range of 145-165 Hz; however, formants may be a more salient cue in connected speech than in isolated vowels or syllables.
Henry, Kenneth S; Amburgey, Kassidy N; Abrams, Kristina S; Idrobo, Fabio; Carney, Laurel H
2017-10-01
Vowels are complex sounds with four to five spectral peaks known as formants. The frequencies of the two lowest formants, F1 and F2, are sufficient for vowel discrimination. Behavioral studies show that many birds and mammals can discriminate vowels. However, few studies have quantified thresholds for formant-frequency discrimination. The present study examined formant-frequency discrimination in budgerigars (Melopsittacus undulatus) and humans using stimuli with one or two formants and a constant fundamental frequency of 200 Hz. Stimuli had spectral envelopes similar to natural speech and were presented with random level variation. Thresholds were estimated for frequency discrimination of F1, F2, and simultaneous F1 and F2 changes. The same two-down, one-up tracking procedure and single-interval, two-alternative task were used for both species. Formant-frequency discrimination thresholds were as sensitive in budgerigars as in humans and followed the same patterns across all conditions. Thresholds expressed as percent frequency difference were higher for F1 than for F2, and were unchanged between stimuli with one or two formants. Thresholds for simultaneous F1 and F2 changes indicated that discrimination was based on combined information from both formant regions. Results were consistent with previous human studies and show that budgerigars provide an exceptionally sensitive animal model of vowel feature discrimination.
Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques.
Fitch, W T
1997-08-01
Body weight, length, and vocal tract length were measured for 23 rhesus macaques (Macaca mulatta) of various sizes using radiographs and computer graphic techniques. Linear predictive coding analysis of tape-recorded threat vocalizations was used to determine vocal tract resonance frequencies ("formants") for the same animals. A new acoustic variable is proposed, "formant dispersion," which should theoretically depend upon vocal tract length. Formant dispersion is the average difference between successive formant frequencies, and was found to be closely tied to both vocal tract length and body size. Despite the common claim that voice fundamental frequency (F0) provides an acoustic indication of body size, repeated investigations have failed to support such a relationship in many vertebrate species, including humans. Formant dispersion, unlike voice pitch, is proposed to be a reliable predictor of body size in macaques, and probably in many other species.
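Fitch's definition of formant dispersion is simple enough to state as a one-line computation. A minimal Python sketch follows; the formant values in the example are illustrative, not measurements from the study.

```python
import numpy as np

def formant_dispersion(formants_hz):
    """Average spacing between successive formant frequencies.

    The mean of successive differences telescopes to
    (F_N - F_1) / (N - 1), so only the lowest and highest
    measured formants actually determine the value.
    """
    f = np.sort(np.asarray(formants_hz, dtype=float))
    return float(np.mean(np.diff(f)))

# Illustrative formant values for a short vocal tract
print(formant_dispersion([1100.0, 2500.0, 3900.0]))  # -> 1400.0 Hz
```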
Maxillary arch dimensions associated with acoustic parameters in prepubertal children.
Hamdan, Abdul-Latif; Khandakji, Mohannad; Macari, Anthony Tannous
2018-04-18
To evaluate the association between maxillary arch dimensions and fundamental frequency and formants of voice in prepubertal subjects. Thirty-five consecutive prepubertal patients seeking orthodontic treatment were recruited (mean age = 11.41 ± 1.46 years; range, 8 to 13.7 years). Participants with a history of respiratory infection, laryngeal manipulation, dysphonia, congenital facial malformations, or history of orthodontic treatment were excluded. Dental measurements included maxillary arch length, perimeter, depth, and width. Voice parameters comprising fundamental frequency (f0_sustained), habitual pitch (f0_count), jitter, shimmer, and different formant frequencies (F1, F2, F3, and F4) were measured using acoustic analysis prior to initiation of any orthodontic treatment. Pearson's correlation coefficients were used to measure the strength of associations between different dental and voice parameters. Multiple linear regressions were computed for the prediction of different dental measurements. Arch width and arch depth had moderate significant negative correlations with f0 (r = -0.52, P = .001 and r = -0.39, P = .022, respectively) and with habitual frequency (r = -0.51, P = .0014 and r = -0.34, P = .04, respectively). Arch depth and arch length were significantly correlated with formant F3 and formant F4, respectively. Predictors of arch depth included frequencies of F3 vowels, with a significant regression equation (P < .001; R² = 0.49). Similarly, fundamental frequency f0 and frequencies of formant F3 vowels were predictors of arch width, with a significant regression equation (P < .001; R² = 0.37). There is a significant association between arch dimensions, particularly arch length and depth, and voice parameters. The formant most predictive of arch depth and width is the third formant, along with fundamental frequency of voice.
Formant frequencies in country singers' speech and singing.
Stone, R E; Cleveland, T F; Sundberg, J
1999-06-01
In previous investigations breathing kinematics, subglottal pressures, and voice source characteristics of a group of premier country singers have been analyzed. The present study complements the description of these singers' voice properties by examining the formant frequencies in five of these country singers' spoken and sung versions of the national anthem and of a song of their own choosing. The formant frequencies were measured for identical phonemes under both conditions. Comparisons revealed that the singers used the same or slightly higher formant frequencies when they were singing than when they were speaking. The differences may be related to the higher fundamental frequency in singing. These findings are in good agreement with previous observations regarding breathing, subglottal pressures, and voice source, but are in marked contrast to what has been found for classically trained singers.
Formant frequencies in Middle Eastern singers.
Hamdan, Abdul-latif; Tabri, Dollen; Deeb, Reem; Rifai, Hani; Rameh, Charbel; Fuleihan, Nabil
2008-01-01
This work was conducted to describe the formant frequencies in a group of Middle Eastern singers and to look for the presence of the singer's formant described in operatic singers. A total of 13 Middle Eastern singers were enrolled in this study. There were 5 men and 8 women. Descriptive analysis was performed to report the various formants (F1, F2, F3, and F4) in both speaking and singing. The Wilcoxon test was used to compare the means of the formants under both conditions. For both sexes combined, for the /a/ vowel, F1 singing was significantly lower than F1 speaking (P = .05) and F3 singing was significantly higher than F3 speaking (P = .046). For the /u/ vowel, only F2 singing was significantly higher than F2 speaking (P = .012). For the /i/ vowel, both F2 and F3 singing were significantly lower than F2 and F3 speaking, respectively (P = .006 and .012, respectively). There was no clustering of the formants in any of the Middle Eastern sung vowels. Formant frequencies for the vowels /a/, /i/, and /u/ differ between Middle Eastern singing vs speaking. There is absence of the singer's formant.
Comparison of formant detection methods used in speech processing applications
NASA Astrophysics Data System (ADS)
Belean, Bogdan
2013-11-01
The paper describes time-frequency representations of the speech signal together with the significance of formants in speech processing applications. Speech formants can be used in emotion recognition, sex discrimination, or diagnosing different neurological diseases. Given these varied applications of formant detection, two methods for detecting formants are presented. First, the poles resulting from a complex analysis of LPC coefficients are used for formant detection. The second approach uses the Kalman filter for formant prediction along the speech signal. Results are presented for both approaches on real-life speech spectrograms. The features of the two methods are also compared, to establish which is more suitable for different speech processing applications.
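As a concrete illustration of the first approach, the sketch below fits an all-pole (LPC) model to a pre-emphasized, windowed frame and converts pole angles to formant frequencies. It assumes librosa is available for the LPC fit; the model order and the 400 Hz bandwidth threshold are conventional choices, not values from the paper.

```python
import numpy as np
import librosa  # used only for the LPC fit; any LPC routine would do

def lpc_formants(frame, sr, order=12, max_bw=400.0):
    """Estimate formant frequencies (Hz) from one speech frame via LPC roots."""
    a = librosa.lpc(np.asarray(frame, dtype=float), order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]           # one root per conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)  # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * sr / np.pi   # pole radius -> bandwidth
    keep = (freqs > 90.0) & (bws < max_bw)      # drop implausibly wide poles
    return np.sort(freqs[keep])
```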
The role of fundamental frequency and formants in the perception of speaker sex
NASA Astrophysics Data System (ADS)
Hillenbrand, James M.
2005-09-01
The purpose of this study was to determine the relative contributions of fundamental frequency (F0) and formants in controlling the speaker-sex percept. A source-filter synthesizer was used to create four versions of 25 sentences spoken by men: (1) unmodified synthesis; (2) F0 only shifted up toward values typical of women; (3) formants only shifted up toward values typical of women; and (4) both F0 and formants shifted up. Identical methods were used to generate four comparable versions of 25 sentences spoken by women (e.g., unmodified synthesis, F0 only shifted down toward values typical of men, etc.). Listening tests showed: (1) perceived talker sex for the unmodified synthesis conditions was nearly always correct; (2) shifting both F0 and formants was usually effective (~82%) in changing the perceived sex of the utterance; (3) shifting either F0 or formants alone was usually ineffective in changing the perceived sex of the utterance. Both F0 and formants are apparently needed to specify speaker sex, though even together these cues are not entirely effective. Results also suggested that F0 is just slightly more important than formants, despite the fact that the male-female difference in F0 is proportionally much larger than the difference in formants. [Work supported by NIH.]
The effect of change in spectral slope and formant frequencies on the perception of loudness.
Duvvuru, Sirisha; Erickson, Molly
2013-11-01
This study attempts to understand how changes in spectral slope and formant frequency influence changes in perceived loudness. It was hypothesized that voices synthesized with steeper spectral slopes would be perceived as less loud than voices synthesized with less steep spectral slopes, despite being of equal root mean square (RMS) amplitude. It was also hypothesized that stimuli with higher formant patterns would be perceived as louder than those with lower formant patterns, again despite equal RMS amplitude. Study design: repeated measures factorial. For the pitches A3, C4, B4, and F5, three different source signals were synthesized with varying slopes of -9, -12, and -15 dB/octave, using a frequency vibrato rate of 5.6 Hz and a frequency vibrato extent of 50 cents. Each of the three source signals was filtered using two formant patterns: a lower formant pattern typical of a mezzo-soprano (pattern A) and a higher formant pattern typical of a soprano (pattern B) for the vowel /a/. For each pitch, the six stimuli were combined into all possible pairs and normalized to equal RMS amplitude. Listeners were presented with 120 paired stimuli (60 pairs repeated twice). The listener's task was to indicate whether the first or second stimulus in the pair was louder. Generally, as the spectral slope decreased, perceived loudness increased, with the magnitude of the perceived difference in loudness being related to the degree of difference in spectral slope. Likewise, at all pitches except A3, perceived loudness increased as formant frequency increased. RMS amplitude is an important predictor of loudness perception, but many other factors also affect the perception of this important vocal parameter. Spectral composition is one such factor and must be considered when using loudness perception in the process of clinical diagnostics.
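The source synthesis described above can be sketched directly: a harmonic complex whose k-th harmonic is attenuated by slope × log2(k) dB, with sinusoidal frequency vibrato at 5.6 Hz and 50-cent extent. Below is a minimal Python illustration; the subsequent formant filtering (patterns A and B) and any phase or amplitude details of the study's synthesizer are omitted.

```python
import numpy as np

def vibrato_source(f0, slope_db_oct, dur=1.0, sr=44100,
                   vib_rate=5.6, vib_cents=50.0):
    """Harmonic source analogue with a fixed spectral slope and frequency
    vibrato (5.6 Hz rate, 50-cent extent, as in the stimuli above)."""
    t = np.arange(int(dur * sr)) / sr
    # instantaneous F0 with sinusoidal vibrato expressed in cents
    f_inst = f0 * 2.0 ** ((vib_cents / 1200.0) * np.sin(2 * np.pi * vib_rate * t))
    phase = 2 * np.pi * np.cumsum(f_inst) / sr
    k_max = int((sr / 2) / (f0 * 2.0 ** (vib_cents / 1200.0)))  # stay below Nyquist despite vibrato
    out = np.zeros_like(t)
    for k in range(1, k_max + 1):
        amp = 10.0 ** (slope_db_oct * np.log2(k) / 20.0)  # e.g. -12 dB/octave
        out += amp * np.sin(k * phase)
    return out / np.max(np.abs(out))

# Source for the -12 dB/octave condition at pitch A3 (220 Hz)
signal = vibrato_source(220.0, -12.0)
```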
NASA Astrophysics Data System (ADS)
Rendall, Drew; Kollias, Sophie; Ney, Christina; Lloyd, Peter
2005-02-01
Key voice features, fundamental frequency (F0) and formant frequencies, can vary extensively between individuals. Much of the variation can be traced to differences in the size of the larynx and vocal-tract cavities, but whether these differences in turn simply reflect differences in speaker body size (i.e., neutral vocal allometry) remains unclear. Quantitative analyses were therefore undertaken to test the relationship between speaker body size and voice F0 and formant frequencies for human vowels. To test the taxonomic generality of the relationships, the same analyses were conducted on the vowel-like grunts of baboons, whose phylogenetic proximity to humans and similar vocal production biology and voice acoustic patterns recommend them for such comparative research. For adults of both species, males were larger than females and had lower mean voice F0 and formant frequencies. However, beyond this, F0 variation did not track body-size variation between the sexes in either species, nor within sexes in humans. In humans, formant variation correlated significantly with speaker height, but only in males and not in females. Implications for general vocal allometry are discussed, as are implications for speech origins theories, and challenges to them, related to laryngeal position and vocal tract length.
Are men better than women at acoustic size judgements?
Charlton, Benjamin D; Taylor, Anna M; Reby, David
2013-08-23
Formants are important phonetic elements of human speech that are also used by humans and non-human mammals to assess the body size of potential mates and rivals. As a consequence, it has been suggested that formant perception, which is crucial for speech perception, may have evolved through sexual selection. Somewhat surprisingly, though, no previous studies have examined whether sexes differ in their ability to use formants for size evaluation. Here, we investigated whether men and women differ in their ability to use the formant frequency spacing of synthetic vocal stimuli to make auditory size judgements over a wide range of fundamental frequencies (the main determinant of vocal pitch). Our results reveal that men are significantly better than women at comparing the apparent size of stimuli, and that lower pitch improves the ability of both men and women to perform these acoustic size judgements. These findings constitute the first demonstration of a sex difference in formant perception, and lend support to the idea that acoustic size normalization, a crucial prerequisite for speech perception, may have been sexually selected through male competition. We also provide the first evidence that vocalizations with relatively low pitch improve the perception of size-related formant information.
A Chinese alligator in heliox: formant frequencies in a crocodilian
Reber, Stephan A.; Nishimura, Takeshi; Janisch, Judith; Robertson, Mark; Fitch, W. Tecumseh
2015-01-01
Crocodilians are among the most vocal non-avian reptiles. Adults of both sexes produce loud vocalizations known as ‘bellows’ year round, with the highest rate during the mating season. Although the specific function of these vocalizations remains unclear, they may advertise the caller's body size, because relative size differences strongly affect courtship and territorial behaviour in crocodilians. In mammals and birds, a common mechanism for producing honest acoustic signals of body size is via formant frequencies (vocal tract resonances). To our knowledge, formants have to date never been documented in any non-avian reptile, and formants do not seem to play a role in the vocalizations of anurans. We tested for formants in crocodilian vocalizations by using playbacks to induce a female Chinese alligator (Alligator sinensis) to bellow in an airtight chamber. During vocalizations, the animal inhaled either normal air or a helium/oxygen mixture (heliox) in which the velocity of sound is increased. Although heliox allows normal respiration, it alters the formant distribution of the sound spectrum. An acoustic analysis of the calls showed that the source signal components remained constant under both conditions, but an upward shift of high-energy frequency bands was observed in heliox. We conclude that these frequency bands represent formants. We suggest that crocodilian vocalizations could thus provide an acoustic indication of body size via formants. Because birds and crocodilians share a common ancestor with all dinosaurs, a better understanding of their vocal production systems may also provide insight into the communication of extinct Archosaurians.
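The logic of the heliox test can be checked with a back-of-the-envelope calculation: tract resonances scale with the speed of sound in the breathing gas, while the tissue-vibration source does not. The sound speeds and formant values below are illustrative assumptions, not measurements from the study.

```python
# Formants scale with the speed of sound in the vocal tract, so a
# helium/oxygen mixture shifts resonances upward while the glottal
# source components stay put.
C_AIR = 350.0     # m/s in warm, humid air (approximate)
C_HELIOX = 600.0  # m/s; the exact value depends on the He/O2 ratio

formants_air = [400.0, 900.0]  # hypothetical bellow formants, Hz
formants_heliox = [f * C_HELIOX / C_AIR for f in formants_air]
print(formants_heliox)  # ~[686, 1543] Hz; source bands would not move
```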
Optimizing Vowel Formant Measurements in Four Acoustic Analysis Systems for Diverse Speaker Groups
Derdemezis, Ekaterini; Kent, Ray D.; Fourakis, Marios; Reinicke, Emily L.; Bolt, Daniel M.
2016-01-01
Purpose This study systematically assessed the effects of select linear predictive coding (LPC) analysis parameter manipulations on vowel formant measurements for diverse speaker groups using 4 trademarked Speech Acoustic Analysis Software Packages (SAASPs): CSL, Praat, TF32, and WaveSurfer. Method Productions of 4 words containing the corner vowels were recorded from 4 speaker groups with typical development (male and female adults and male and female children) and 4 speaker groups with Down syndrome (male and female adults and male and female children). Formant frequencies were determined from manual measurements using a consensus analysis procedure to establish formant reference values, and from the 4 SAASPs (using both the default analysis parameters and with adjustments or manipulations to select parameters). Smaller differences between values obtained from the SAASPs and the consensus analysis implied more optimal analysis parameter settings. Results Manipulations of default analysis parameters in CSL, Praat, and TF32 yielded more accurate formant measurements, though the benefit was not uniform across speaker groups and formants. In WaveSurfer, manipulations did not improve formant measurements. Conclusions The effects of analysis parameter manipulations on accuracy of formant-frequency measurements varied by SAASP, speaker group, and formant. The information from this study helps to guide clinical and research applications of SAASPs.
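Of the packages named, Praat's Burg analysis is scriptable from Python via the parselmouth library, which makes the kind of parameter manipulation studied here easy to reproduce. A hedged sketch follows, assuming parselmouth is installed and a file vowel.wav exists; the parameter values shown are common defaults, not the paper's optimized settings.

```python
import parselmouth  # Python interface to Praat

snd = parselmouth.Sound("vowel.wav")
# The two parameters most often tuned per speaker group are the formant
# ceiling and the number of formants sought below it.
formant = snd.to_formant_burg(time_step=0.01,
                              max_number_of_formants=5,
                              maximum_formant=5500.0)  # ~5000 Hz is typical for adult males
t_mid = snd.duration / 2
f1 = formant.get_value_at_time(1, t_mid)
f2 = formant.get_value_at_time(2, t_mid)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")
```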
A model of acoustic interspeaker variability based on the concept of formant-cavity affiliation
NASA Astrophysics Data System (ADS)
Apostol, Lian; Perrier, Pascal; Bailly, Gérard
2004-01-01
A method is proposed to model the interspeaker variability of formant patterns for oral vowels. It is assumed that this variability originates in the differences existing among speakers in the respective lengths of their front and back vocal-tract cavities. In order to characterize these vocal-tract differences between speakers from the spectral description of the acoustic speech signal, each formant is interpreted, according to the concept of formant-cavity affiliation, as a resonance of a specific vocal-tract cavity. Its frequency can thus be directly related to the corresponding cavity length, and a transformation model can be proposed from a speaker A to a speaker B on the basis of the frequency ratios of the formants corresponding to the same resonances. To minimize the number of sounds that must be recorded from each speaker to carry out this transformation, the frequency ratios are computed exactly only for the three extreme cardinal vowels [i, a, u] and are approximated for the remaining vowels through an interpolation function. The method is evaluated through its capacity to transform the (F1, F2) formant patterns of eight oral vowels pronounced by five male speakers into the (F1, F2) patterns of the corresponding vowels generated by an articulatory model of the vocal tract. The resulting formant patterns are compared to those provided by normalization techniques published in the literature. The proposed method is found to be efficient, but a number of limitations are also observed and discussed. These limitations can be associated with the formant-cavity affiliation model itself or with a possible influence of speaker-specific vocal-tract geometry in the cross-sectional direction, which the model might not have taken into account.
Function and Evolution of Vibrato-like Frequency Modulation in Mammals.
Charlton, Benjamin D; Taylor, Anna M; Reby, David
2017-09-11
Why do distantly related mammals like sheep, giant pandas, and fur seals produce bleats that are characterized by vibrato-like fundamental frequency (F0) modulation? To answer this question, we used psychoacoustic tests and comparative analyses to investigate whether this distinctive vocal feature has evolved to improve the perception of formants, key acoustic components of animal calls that encode important information about the caller's size and identity [1]. Psychoacoustic tests on humans confirmed that vibrato-like F0 modulation improves the ability of listeners to detect differences in the formant patterns of synthetic bleat-like stimuli. Subsequent phylogenetically controlled comparative analyses revealed that vibrato-like F0 modulation has evolved independently in six mammalian orders in vocal signals with relatively high F0 and, therefore, low spectral density (i.e., fewer harmonic overtones). We also found that mammals modulate the vibrato in these calls over greater frequency extents when the number of harmonic overtones per formant is low, suggesting that this is a mechanism to improve formant perception in calls with low spectral density. Our findings constitute the first evidence that formant perception in non-speech sounds is improved by fundamental frequency modulation and provide a mechanism for the convergent evolution of bleat-like calls in mammals. They also indicate that selection pressures for animals to transmit important information encoded by formant frequencies (on size and identity, for example) are likely to have been a key driver in the evolution of mammal vocal diversity.
Volitional exaggeration of body size through fundamental and formant frequency modulation in humans
Pisanski, Katarzyna; Mora, Emanuel C.; Pisanski, Annette; Reby, David; Sorokowski, Piotr; Frackowiak, Tomasz; Feinberg, David R.
2016-01-01
Several mammalian species scale their voice fundamental frequency (F0) and formant frequencies in competitive and mating contexts, reducing vocal tract and laryngeal allometry, thereby exaggerating apparent body size. Although humans' rare capacity to volitionally modulate these same frequencies is thought to subserve articulated speech, the potential function of voice frequency modulation in human nonverbal communication remains largely unexplored. Here, the voices of 167 men and women from Canada, Cuba, and Poland were recorded in a baseline condition and while volitionally imitating a physically small and large body size. Modulation of F0, formant spacing (∆F), and apparent vocal tract length (VTL) were measured using Praat. Our results indicate that men and women spontaneously and systematically increased VTL and decreased F0 to imitate a large body size, and reduced VTL and increased F0 to imitate small size. These voice modulations did not differ substantially across cultures, indicating potentially universal sound-size correspondences or anatomical and biomechanical constraints on voice modulation. In each culture, men generally modulated their voices (particularly formants) more than did women. This latter finding could help to explain sexual dimorphism in F0 and formants that is currently unaccounted for by sexual dimorphism in human vocal anatomy and body size.
The analysis and detection of hypernasality based on a formant extraction algorithm
NASA Astrophysics Data System (ADS)
Qian, Jiahui; Fu, Fanglin; Liu, Xinyi; He, Ling; Yin, Heng; Zhang, Han
2017-08-01
In clinical practice, effective assessment of cleft palate speech disorders is important. For hypernasal speech, the resonance between the nasal cavity and the oral cavity introduces an additional nasal formant, so formant frequency is a crucial cue for the judgment of hypernasality in cleft palate speech. Because of the nasal formant, peak merger occurs more often in the spectrum of nasal speech, and it cannot be resolved by the classical linear-prediction-coefficient root-extraction method. In this paper, a method is proposed to detect the additional nasal formant in the low-frequency region and obtain its frequency. The experimental results show that the proposed method locates the nasal formant effectively. The extracted formants are then used as features for the detection of hypernasality. A total of 436 phonemes, collected from the Hospital of Stomatology, were used in the experiment. The detection accuracy of hypernasality in cleft palate speech was 95.2%.
NASA Astrophysics Data System (ADS)
DeRosa, Angela
The present study analyzed the acoustic and perceptual differences in non-singers' singing voices before and after a vocal warm-up. Experiments were conducted with 12 females who had no singing experience and considered themselves to be non-singers. Participants were recorded performing 3 tasks: a musical scale stretching to their most comfortable high and low pitches, sustained productions of the vowels /a/ and /i/, and a sung performance of the "Star Spangled Banner." Participants were recorded performing these tasks before a vocal warm-up, after a vocal warm-up, and again after 2-3 weeks of practice. Acoustical analysis consisted of formant frequency analysis, singer's formant/singing power ratio analysis, maximum phonation frequency range analysis, and an analysis of jitter, noise-to-harmonic ratio (NHR), relative average perturbation (RAP), and voice turbulence index (VTI). A perceptual analysis was also conducted, with 12 listeners rating comparison performances from before vs after the first vocal warm-up, before vs after the second vocal warm-up, and after both vocal warm-ups. There were no significant findings for the formant frequency analysis of the vowel /a/, but there was significance for the first formant frequency of the vowel /i/. The singer's formant, analyzed via singing power ratio, showed significance only for the vowel /i/. Maximum phonation frequency range showed a significant increase after the vocal warm-ups. There were no significant findings for the acoustic measures of jitter, NHR, RAP, and VTI. Perceptual analysis showed a significant difference after a vocal warm-up. The results indicate that a singing vocal warm-up can have a significant positive influence on the singing voice of non-singers.
Schenk, Barbara S; Baumgartner, Wolf Dieter; Hamzavi, Jafar Sasan
2003-12-01
The most obvious and best documented changes in the speech of postlingually deafened speakers are in rate, fundamental frequency, and volume (energy). These changes are due to the lack of auditory feedback, but auditory feedback affects more than the suprasegmental parameters of speech. The aim of this study was to determine changes at the segmental level of speech in terms of vowel formants. Twenty-three postlingually deafened and 18 normally hearing speakers were recorded reading a German text. The frequencies of the first and second formants and the vowel spaces of selected vowels in a word-in-context condition were compared. All first formant frequencies (F1) of the postlingually deafened speakers were significantly different from those of the normally hearing people. The values of F1 were higher for the vowels /e/ (418 ± 61 Hz compared with 359 ± 52 Hz, P = 0.006) and /o/ (459 ± 58 Hz compared with 390 ± 45 Hz, P = 0.0003) and lower for /a/ (765 ± 115 Hz compared with 851 ± 146 Hz, P = 0.038). The second formant frequency (F2) showed a significant difference only for the vowel /e/ (2016 ± 347 Hz compared with 2279 ± 250 Hz, P = 0.012). The postlingually deafened people were divided into two subgroups according to duration of deafness (shorter/longer than 10 years of deafness). There was no significant difference in formant changes between the two groups. Our report demonstrates an effect of auditory feedback on segmental features of the speech of postlingually deafened people as well.
Formant transitions in the fluent speech of Farsi-speaking people who stutter.
Dehqan, Ali; Yadegari, Fariba; Blomgren, Michael; Scherer, Ronald C
2016-06-01
Second formant (F2) transitions can be used to infer attributes of articulatory transitions. This study compared formant transitions during fluent speech segments of Farsi (Persian) speaking people who stutter and normally fluent Farsi speakers. Ten Iranian males who stutter and 10 normally fluent Iranian males participated. Sixteen different "CVt" tokens were embedded within the phrase "Begu CVt an". Measures included overall F2 transition frequency extents, durations, and derived overall slopes, initial F2 transition slopes at 30 ms and 60 ms, and speaking rate. (1) Mean overall formant frequency extent was significantly greater in 14 of the 16 CVt tokens for the group of stuttering speakers. (2) Stuttering speakers exhibited significantly longer overall F2 transitions for all 16 tokens compared to the nonstuttering speakers. (3) The overall F2 slopes were similar between the two groups. (4) The stuttering speakers exhibited significantly greater initial F2 transition slopes (positive or negative) for five of the 16 tokens at 30 ms and six of the 16 tokens at 60 ms. (5) The stuttering group produced a slower syllable rate than the non-stuttering group. During perceptually fluent utterances, the stuttering speakers had greater F2 frequency extents during transitions, took longer to reach vowel steady state, exhibited some evidence of steeper slopes at the beginning of transitions, had overall similar F2 formant slopes, and had slower speaking rates compared to nonstuttering speakers. Findings support the notion of different speech motor timing strategies in stuttering speakers and are likely to be independent of the language spoken. Educational objectives: This study compares aspects of F2 formant transitions between 10 stuttering and 10 nonstuttering speakers. Readers will be able to describe: (a) characteristics of formant frequency as a specific acoustic feature used to infer speech movements in stuttering and nonstuttering speakers, (b) two methods of measuring second formant (F2) transitions, the visual-criteria method and the fixed-time-criteria method, (c) characteristics of F2 transitions in the fluent speech of stuttering speakers and how those characteristics appear to differ from normally fluent speakers, and (d) possible cross-linguistic effects on acoustic analyses of stuttering.
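The fixed-time-criterion slope measure mentioned above amounts to fitting a straight line to the first 30 or 60 ms of an F2 track. Here is a minimal Python sketch under that assumption; the paper's endpoint rules and formant-tracking details are not reproduced.

```python
import numpy as np

def initial_f2_slope(times_s, f2_hz, window_s=0.030):
    """Initial F2 transition slope (Hz/s) over the first `window_s` seconds,
    estimated by a least-squares straight-line fit to the F2 track."""
    t = np.asarray(times_s, dtype=float)
    f = np.asarray(f2_hz, dtype=float)
    mask = t <= t[0] + window_s
    slope, _ = np.polyfit(t[mask], f[mask], 1)
    return slope
```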
Center-of-gravity effects in the perception of high front vowels
NASA Astrophysics Data System (ADS)
Jacewicz, Ewa; Feth, Lawrence L.
2002-05-01
When two formant peaks are close in frequency, changing their amplitude ratio can shift the perceived vowel quality. This center-of-gravity (COG) effect has been studied particularly in back vowels, whose F1 and F2 are close in frequency. Chistovich and Lublinskaja (1979) showed that the effect occurs when the frequency separation between the formants does not exceed 3.5 bark. The COG and critical-distance effects were manifested when a two-formant reference signal was matched by a single-formant target of variable frequency. This study investigates whether the COG effect extends to closely spaced higher formants, as in English /i/ and /I/. In /i/, the frequency separation between F2, F3, and F4 does not exceed 3.5 bark, suggesting the existence of one COG which may affect all three closely spaced formants (F2 = 2030, F3 = 2970, F4 = 3400 Hz). In /I/, each of the F2-F3 and F3-F4 separations is less than 3.5 bark, but the F2-F4 separation exceeds the critical distance, indicating two COGs (F2 = 1780, F3 = 2578, F4 = 3400 Hz). We examine the COG effects using matching of four-formant reference signals, in which we change the amplitude ratios, by two-formant targets with variable frequency of F2. The double-staircase adaptive procedure is used. [Work supported by an INRS award from NIH to R. Fox.]
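The 3.5-bark criterion is easy to check numerically with a standard Hz-to-Bark conversion. The sketch below uses Traunmueller's (1990) approximation, one of several in the literature, to verify the separations quoted for /i/.

```python
def hz_to_bark(f_hz):
    """Traunmueller (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# For /i/ (F2 = 2030, F3 = 2970, F4 = 3400 Hz), even the outermost
# pair stays within the 3.5-bark critical distance:
for lo, hi in [(2030, 2970), (2970, 3400), (2030, 3400)]:
    print(f"{lo}-{hi} Hz: {hz_to_bark(hi) - hz_to_bark(lo):.2f} bark")
# For /I/, the F2-F4 span (1780-3400 Hz) comes out near 4.3 bark,
# exceeding the critical distance, as the abstract states.
```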
On Short-Time Estimation of Vocal Tract Length from Formant Frequencies
Lammert, Adam C.; Narayanan, Shrikanth S.
2015-01-01
Vocal tract length is highly variable across speakers and determines many aspects of the acoustic speech signal, making it an essential parameter to consider for explaining behavioral variability. A method for accurate estimation of vocal tract length from formant frequencies would afford normalization of interspeaker variability and facilitate acoustic comparisons across speakers. A framework for considering estimation methods is developed from the basic principles of vocal tract acoustics, and an estimation method is proposed that follows naturally from this framework. The proposed method is evaluated using acoustic characteristics of simulated vocal tracts ranging from 14 to 19 cm in length, as well as real-time magnetic resonance imaging data with synchronous audio from five speakers whose vocal tracts range from 14.5 to 18.0 cm in length. Evaluations show improvements in accuracy over previously proposed methods, with 0.631 and 1.277 cm root mean square error on simulated and human speech data, respectively. Empirical results show that the effectiveness of the proposed method is based on emphasizing higher formant frequencies, which seem less affected by speech articulation. Theoretical predictions of formant sensitivity reinforce this empirical finding. Moreover, theoretical insights are explained regarding the reason for differences in formant sensitivity.
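The framework's starting point, the uniform closed-open tube, gives a direct way to turn formants into length estimates, and the finding that higher formants are less affected by articulation suggests weighting them more. The sketch below is the textbook model under those assumptions, not the paper's actual estimator.

```python
import numpy as np

def estimate_vtl_cm(formants_hz, c_cm_s=35000.0):
    """Vocal tract length estimate from formants, assuming an ideal uniform
    tube closed at the glottis and open at the lips: F_n = (2n - 1) c / (4 L).
    Each formant yields its own length estimate; averaging only the higher
    formants reflects their lower sensitivity to articulation."""
    f = np.asarray(formants_hz, dtype=float)
    n = np.arange(1, len(f) + 1)
    lengths = (2 * n - 1) * c_cm_s / (4 * f)
    return float(np.mean(lengths[2:]))  # emphasize F3 and above

# A neutral-tract example: 500/1500/2500/3500 Hz -> 17.5 cm
print(estimate_vtl_cm([500.0, 1500.0, 2500.0, 3500.0]))
```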
2014-01-01
How speech is separated perceptually from other speech remains poorly understood. Recent research indicates that the ability of an extraneous formant to impair intelligibility depends on the variation of its frequency contour. This study explored the effects of manipulating the depth and pattern of that variation. Three formants (F1+F2+F3) constituting synthetic analogues of natural sentences were distributed across the 2 ears, together with a competitor for F2 (F2C) that listeners must reject to optimize recognition (left = F1+F2C; right = F2+F3). The frequency contours of F1-F3 were each scaled to 50% of their natural depth, with little effect on intelligibility. Competitors were created either by inverting the frequency contour of F2 about its geometric mean (a plausibly speech-like pattern) or using a regular and arbitrary frequency contour (triangle wave, not plausibly speech-like) matched to the average rate and depth of variation for the inverted F2C. Adding a competitor typically reduced intelligibility; this reduction depended on the depth of F2C variation, being greatest for 100%-depth, intermediate for 50%-depth, and least for 0%-depth (constant) F2Cs. This suggests that competitor impact depends on overall depth of frequency variation, not depth relative to that for the target formants. The absence of tuning (i.e., no minimum in intelligibility for the 50% case) suggests that the ability to reject an extraneous formant does not depend on similarity in the depth of formant-frequency variation. Furthermore, triangle-wave competitors were as effective as their more speech-like counterparts, suggesting that the selection of formants from the ensemble also does not depend on speech-specific constraints.
Burns, P
1986-05-01
An acoustical analysis of the speaking and singing voices of two types of professional singers was conducted. The vowels /i/, /a/, and /o/ were spoken and sung ten times each by seven opera and seven country and western singers. Vowel spectra were derived by computer software techniques allowing quantitative assessment of formant structure (F1-F4), relative amplitude of resonance peaks (F1-F4), fundamental frequency, and harmonic high-frequency energy. Formant analysis was the most effective parameter for differentiating the two groups. Only opera singers lowered their fourth formant, creating a wide-band resonance area (approximately 2,800 Hz) corresponding to the well-known "singing formant." Country and western singers revealed similar resonatory voice characteristics for both spoken and sung output. These results implicate faulty vocal technique in country and western singers as a contributory reason for vocal abuse/fatigue.
Jafari, Narges; Yadegari, Fariba; Jalaie, Shohreh
2016-11-01
Vowel production is in essence auditorily controlled; hence, the role of auditory feedback in vowel production is very important. The purpose of this study was to compare formant frequencies and vowel space in Persian-speaking deaf children with cochlear implantation (CI), hearing-impaired children with hearing aids (HA), and their normal-hearing (NH) peers. A total of 40 prelingually hearing-impaired children and 20 NH peers participated in this study. Participants were native Persian speakers. The averages of the first formant frequency (F1) and second formant frequency (F2) of six vowels were measured using Praat software (version 5.1.44). One-way analysis of variance (ANOVA) was used to analyze the differences between the three groups. The mean value of F1 for vowel /i/ was significantly different (between CI and NH children and also between HA and NH groups) (F(2,57) = 9.229, P < 0.001). For vowel /a/, the mean value of F1 was significantly different (between HA and NH groups) (F(2,57) = 3.707, P < 0.05). Regarding the second formant frequency, a post hoc Tukey test revealed that the differences were between HA and NH children (P < 0.05). F2 for vowel /o/ was significantly different (F(2,57) = 4.572, P < 0.05). Also, the mean value of F2 for vowel /a/ was significantly different (F(2,57) = 3.184, P < 0.05). About 1 year after implantation, the formants shift closer to those of the NH listeners, who tend to have more expanded vowel spaces than hearing-impaired listeners with hearing aids. This is probably because CI has a subtly positive impact on the place of articulation of vowels.
Formant and voice source properties in two male Kunqu Opera roles: a pilot study.
Dong, Li; Sundberg, Johan; Kong, Jiangping
2013-01-01
This investigation analyzes flow glottogram and electroglottogram (EGG) parameters as well as the relationship between formant frequencies and partials in two male Kunqu Opera roles, Colorful face (CF) and Old man (OM). Four male professional Kunqu Opera singers volunteered as participants, 2 singers for each role. Using inverse filtering of the audio signal, flow glottogram parameters and formant frequencies were measured in each note of sung scales. Two EGG parameters, contact quotient (CoQ) and speed quotient, were measured. Formant tuning was observed in only 1 of the OM singers and appeared in a pitch range lower than the passaggio range of Western male opera singers. Both the CF and the OM role singers showed high CoQ values and low values of the normalized amplitude quotient in singing. For 3 of the 4 singers, CoQ and the level difference between the first and second partials showed a positive and a negative correlation with fundamental frequency (F0), respectively. Formant tuning may be applied by a singer of the OM role, and both CF and OM role singers may use a rather pressed type of phonation, CF singers more than OM singers in the lower part of the pitch range. Most singers increased glottal adduction with rising F0.
Hodges-Simeon, Carolyn R; Gurven, Michael; Puts, David A; Gaulin, Steven J C
2014-07-01
Fundamental and formant frequencies influence perceived pitch and are sexually dimorphic in humans. The information content of these acoustic parameters can illuminate the forces of sexual selection shaping vocal sex differences as well as the mechanisms that ensure signal reliability. We use multiple regression to examine the relationships between somatic (height, adiposity, and strength) and acoustic (fundamental frequency [F0], formant position [Pf], and fundamental frequency variation [F0-SD]) characteristics in a sample of peripubertal Bolivian Tsimane. Results indicate that among males, but not females, strength is the strongest predictor of F0 and Pf, and that F0 and Pf are independent predictors of strength when height and adiposity are controlled. These findings suggest that listeners may attend to vocal frequencies because they signal honest, nonredundant information about male strength and threat potential, which are strongly related to physical maturity and which cannot be ascertained from visual or other indicators of height or adiposity alone.
Lee, Shao-Hsuan; Hsiao, Tzu-Yu; Lee, Guo-She
2015-06-01
Sustained vocalizations of the vowels [a] and [i] and the syllable [mə] were collected from twenty normal-hearing individuals. During vocalization, five conditions of different audio-vocal feedback were introduced separately to the speakers: no masking, wearing supra-aural headphones only, speech-noise masking, high-pass noise masking, and broad-band-noise masking. Power spectral analysis of the vocal fundamental frequency (F0) was used to evaluate modulations of F0, and linear predictive coding was used to acquire the first two formants. The results showed that while the formant frequencies were not significantly shifted, low-frequency modulations (<3 Hz) of F0 significantly increased with reduced audio-vocal feedback across speech sounds and were significantly correlated with auditory awareness of speakers' own voices. For sustained speech production, motor control of F0 may depend on a feedback mechanism, whereas articulation should rely more on a feedforward mechanism. Power spectral analysis of F0 might be applied in the future to evaluate audio-vocal control in various hearing and neurological disorders.
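The core measurement here, low-frequency modulation power of the F0 contour, can be sketched as a periodogram of the detrended F0 track with the power below 3 Hz summed. A minimal Python illustration follows; the frame rate, detrending, and normalization choices are assumptions rather than the paper's exact procedure.

```python
import numpy as np
from scipy.signal import periodogram

def low_freq_f0_power(f0_track_hz, frame_rate_hz, cutoff_hz=3.0):
    """Fraction of F0-modulation power below `cutoff_hz`, from an F0 track
    sampled at `frame_rate_hz` (one F0 value per analysis frame)."""
    x = np.asarray(f0_track_hz, dtype=float)
    x = x - np.mean(x)  # remove the mean F0 so only modulation remains
    freqs, psd = periodogram(x, fs=frame_rate_hz)
    band = (freqs > 0) & (freqs < cutoff_hz)
    return float(psd[band].sum() / psd[freqs > 0].sum())
```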
Free-Ranging Male Koalas Use Size-Related Variation in Formant Frequencies to Assess Rival Males
Charlton, Benjamin D.; Whisson, Desley A.; Reby, David
2013-01-01
Although the use of formant frequencies in nonhuman animal vocal communication systems has received considerable recent interest, only a few studies have examined the importance of these acoustic cues to body size during intra-sexual competition between males. Here we used playback experiments to present free-ranging male koalas with re-synthesised bellow vocalisations in which the formants were shifted to simulate either a large or a small adult male. We found that male looking responses did not differ according to the size variant condition played back. In contrast, male koalas produced longer bellows and spent more time bellowing when they were presented with playbacks simulating larger rivals. In addition, males were significantly slower to respond to this class of playback stimuli than they were to bellows simulating small males. Our results indicate that male koalas invest more effort into their vocal responses when they are presented with bellows that have lower formants indicative of larger rivals, but also show that males are slower to engage in vocal exchanges with larger males that represent more dangerous rivals. By demonstrating that male koalas use formants to assess rivals during the breeding season we have provided evidence that male-male competition constitutes an important selection pressure for broadcasting and attending to size-related formant information in this species. Further empirical studies should investigate the extent to which the use of formants during intra-sexual competition is widespread throughout mammals.
NASA Technical Reports Server (NTRS)
Lokerson, D. C. (Inventor)
1977-01-01
A speech signal is analyzed by applying the signal to formant filters which derive first, second and third signals respectively representing the frequency of the speech waveform in the first, second and third formants. A first pulse train with a pulse rate approximately representing the average frequency of the first formant is derived; second and third pulse trains with pulse rates respectively representing zero crossings of the second and third formants are derived. The first formant pulse train is derived by establishing N signal level bands, where N is an integer at least equal to two. Adjacent ones of the signal bands have common boundaries, each of which is a predetermined percentage of the peak level of a complete cycle of the speech waveform.
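The second- and third-formant channels described in this patent reduce to counting zero crossings of a band-passed signal. Here is a minimal Python sketch of that idea; the band-pass filtering stage and the patent's analog pulse circuitry are not reproduced.

```python
import numpy as np

def zero_crossing_freq(band_signal, sr):
    """Rough frequency estimate (Hz) from zero crossings of a signal that has
    already been band-passed around one formant: two crossings per cycle."""
    x = np.asarray(band_signal, dtype=float)
    crossings = np.count_nonzero(np.signbit(x[:-1]) != np.signbit(x[1:]))
    return crossings * sr / (2.0 * len(x))
```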
NASA Astrophysics Data System (ADS)
Efremova, Kseniya O.; Volodin, Ilya A.; Volodina, Elena V.; Frey, Roland; Lapshina, Ekaterina N.; Soldatova, Natalia V.
2011-11-01
In goitred gazelles (Gazella subgutturosa), sexual dimorphism of larynx size and position is reminiscent of the case in humans, suggesting shared features of vocal ontogenesis in both species. This study investigates the ontogeny of nasal and oral calls in 23 (10 male and 13 female) individually identified goitred gazelles from shortly after birth up to adolescence. The fundamental frequency (f0) and formants were measured as the acoustic correlates of the developing sexual dimorphism. Settings for LPC analysis of formants were based on anatomical dissections of 5 specimens. Across ontogeny, male f0 was consistently lower than female f0 in both oral and nasal calls, and male formants were lower in oral calls, whereas the first two formants of nasal calls did not differ between sexes. In goitred gazelles, significant sex differences in f0 and formants appeared as early as the second week of life, while in humans they emerge only before puberty. This result suggests different pathways of vocal ontogenesis in goitred gazelles and in humans.
Zourmand, Alireza; Ting, Hua-Nong; Mirhassani, Seyed Mostafa
2013-03-01
Speech is one of the prevalent communication mediums for humans. Identifying the gender of a child speaker based on his/her speech is crucial in telecommunication and speech therapy. This article investigates the use of fundamental and formant frequencies from sustained vowel phonation to distinguish the gender of Malay children aged between 7 and 12 years. The Euclidean minimum distance and multilayer perceptron were used to classify the gender of 360 Malay children based on different combinations of fundamental and formant frequencies (F0, F1, F2, and F3). The Euclidean minimum distance with normalized frequency data achieved a classification accuracy of 79.44%, which was higher than that of the nonnormalized frequency data. Age-dependent modeling was used to improve the accuracy of gender classification; with it, the Euclidean distance method achieved an optimal classification accuracy of 84.17% across all age groups. The accuracy was further increased to 99.81% using a multilayer perceptron based on mel-frequency cepstral coefficients.
Human listeners attend to size information in domestic dog growls.
Taylor, Anna M; Reby, David; McComb, Karen
2008-05-01
The acoustic features of vocalizations have the potential to transmit information about the size of callers. Most acoustic studies have focused on intraspecific perceptual abilities, but here the ability of humans to use growls to assess the size of adult domestic dogs was tested. In a first experiment, the formants of growls were shifted to create playback stimuli with different formant dispersions (Δf), simulating different vocal tract lengths within the natural range of variation. Mean fundamental frequency (F0) was left unchanged and treated as a covariate. In a second experiment, F0 was resynthesized and Δf was left unchanged. In both experiments Δf and F0 influenced how participants rated the size of stimuli. Lower formant and fundamental frequencies were rated as belonging to larger dogs. Crucially, when F0 was manipulated and Δf was natural, ratings were strongly correlated with the actual weight of the dogs, while when Δf was varied and F0 was natural, ratings were not related to the actual weight. Taken together, this suggests that participants relied more heavily on Δf, in accordance with the fact that formants are better predictors of body size than F0.
Real time speech formant analyzer and display
Holland, George E.; Struve, Walter S.; Homer, John F.
1987-01-01
A speech analyzer for interpretation of sound includes a sound input which converts the sound into a signal representing the sound. The signal is passed through a plurality of frequency pass filters to derive a plurality of frequency formants. These formants are converted to voltage signals by frequency-to-voltage converters and then are prepared for visual display in continuous real time. Parameters from the inputted sound are also derived and displayed. The display may then be interpreted by the user. The preferred embodiment includes a microprocessor which is interfaced with a television set for displaying of the sound formants. The microprocessor software enables the sound analyzer to present a variety of display modes for interpretive and therapeutic use by the user.
2015-01-01
An important aspect of speech perception is the ability to group or select formants using cues in the acoustic source characteristics—for example, fundamental frequency (F0) differences between formants promote their segregation. This study explored the role of more radical differences in source characteristics. Three-formant (F1+F2+F3) synthetic speech analogues were derived from natural sentences. In Experiment 1, F1+F3 were generated by passing a harmonic glottal source (F0 = 140 Hz) through second-order resonators (H1+H3); in Experiment 2, F1+F3 were tonal (sine-wave) analogues (T1+T3). F2 could take either form (H2 or T2). In some conditions, the target formants were presented alone, either monaurally or dichotically (left ear = F1+F3; right ear = F2). In others, they were accompanied by a competitor for F2 (F1+F2C+F3; F2), which listeners must reject to optimize recognition. Competitors (H2C or T2C) were created using the time-reversed frequency and amplitude contours of F2. Dichotic presentation of F2 and F2C ensured that the impact of the competitor arose primarily through informational masking. In the absence of F2C, the effect of a source mismatch between F1+F3 and F2 was relatively modest. When F2C was present, intelligibility was lowest when F2 was tonal and F2C was harmonic, irrespective of which type matched F1+F3. This finding suggests that source type and context, rather than similarity, govern the phonetic contribution of a formant. It is proposed that wideband harmonic analogues are more effective informational maskers than narrowband tonal analogues, and so become dominant in across-frequency integration of phonetic information when placed in competition.
Visualizing sound emission of elephant vocalizations: evidence for two rumble production types.
Stoeger, Angela S; Heilmann, Gunnar; Zeppelzauer, Matthias; Ganswindt, André; Hensman, Sean; Charlton, Benjamin D
2012-01-01
Recent comparative data reveal that formant frequencies are cues to body size in animals, due to a close relationship between formant frequency spacing, vocal tract length and overall body size. Accordingly, intriguing morphological adaptations to elongate the vocal tract in order to lower formants occur in several species, with the size exaggeration hypothesis being proposed to justify most of these observations. While the elephant trunk is strongly implicated to account for the low formants of elephant rumbles, it is unknown whether elephants emit these vocalizations exclusively through the trunk, or whether the mouth is also involved in rumble production. In this study we used a sound visualization method (an acoustic camera) to record rumbles of five captive African elephants during spatial separation and subsequent bonding situations. Our results showed that the female elephants in our analysis produced two distinct types of rumble vocalizations based on vocal path differences: a nasally- and an orally-emitted rumble. Interestingly, nasal rumbles predominated during contact calling, whereas oral rumbles were mainly produced in bonding situations. In addition, nasal and oral rumbles varied considerably in their acoustic structure. In particular, the values of the first two formants reflected the estimated lengths of the vocal paths, corresponding to a vocal tract length of around 2 meters for nasal, and around 0.7 meters for oral rumbles. These results suggest that African elephants may be switching vocal paths to actively vary vocal tract length (with considerable variation in formants) according to context, and call for further research investigating the function of formant modulation in elephant vocalizations. Furthermore, by confirming the use of the elephant trunk in long distance rumble production, our findings provide an explanation for the extremely low formants in these calls, and may also indicate that formant lowering functions to increase call propagation distances in this species.
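The vocal-tract-length estimates above follow from modelling each vocal path as a uniform tube closed at the glottal end, where resonances fall at Fn = (2n − 1)c/(4L). A back-of-envelope sketch of that check (my own illustration, not the authors' analysis; c = 350 m/s assumed):

```python
# Predicted resonances of a uniform tube closed at one end.
def tube_formants(length_m, n_formants=2, c=350.0):
    """F_n = (2n - 1) * c / (4 * L), n = 1..n_formants (Hz)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

print(tube_formants(2.0))   # nasal path (~2 m):   ~[44, 131] Hz
print(tube_formants(0.7))   # oral path (~0.7 m):  ~[125, 375] Hz
```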
ERIC Educational Resources Information Center
Skuk, Verena G.; Schweinberger, Stefan R.
2014-01-01
Purpose: To determine the relative importance of acoustic parameters (fundamental frequency [F0], formant frequencies [FFs], aperiodicity, and spectrum level [SL]) on voice gender perception, the authors used a novel parameter-morphing approach that, unlike spectral envelope shifting, allows the application of nonuniform scale factors to transform…
Yilmaz, Atilla; Sarac, Elif Tuğba; Aydinli, Fatma Esen; Yildizgoren, Mustafa Turgut; Okuyucu, Emine Esra; Serarslan, Yurdal
2018-06-25
Parkinson's disease (PD) is the second most frequent progressive neurodegenerative disorder. In addition to motor symptoms, nonmotor symptoms and voice and speech disorders can also develop in 90% of PD patients. The aim of our study was to investigate the effects of deep brain stimulation (DBS) and of different DBS frequencies on the speech acoustics of vowels in PD patients. The study included 16 patients who underwent subthalamic nucleus (STN) DBS surgery due to PD. Voice recordings of the vowels [a], [e], [i], and [o] were made at stimulation frequencies of 230, 130, 90, and 60 Hz and at off-stimulation. The recordings were analyzed with the Praat software to assess effects on the first (F1), second (F2), and third (F3) formant frequencies. A significant difference was found for the F1 value of the vowel [a] at 130 Hz compared to off-stimulation. However, no other significant differences were found in the three formant frequencies across the stimulation frequencies and off-stimulation. In addition, though not statistically significant, stimulation at 60 and 230 Hz led to several differences in the formant frequencies of the other three vowels. Our results indicated that STN-DBS at 130 Hz had a significant positive effect on the articulation of [a] compared to off-stimulation. Although not statistically significant, stimulation at 60 and 230 Hz may also have an effect on the articulation of [e], [i], and [o], but this effect needs to be investigated in future studies with larger numbers of participants.
Lexical effects on speech production and intelligibility in Parkinson's disease
NASA Astrophysics Data System (ADS)
Chiu, Yi-Fang
Individuals with Parkinson's disease (PD) often have speech deficits that lead to reduced speech intelligibility. Previous research provides a rich database regarding the articulatory deficits associated with PD including restricted vowel space (Skodda, Visser, & Schlegel, 2011) and flatter formant transitions (Tjaden & Wilding, 2004; Walsh & Smith, 2012). However, few studies consider the effect of higher level structural variables of word usage frequency and the number of similar sounding words (i.e. neighborhood density) on lower level articulation or on listeners' perception of dysarthric speech. The purpose of the study is to examine the interaction of lexical properties and speech articulation as measured acoustically in speakers with PD and healthy controls (HC) and the effect of lexical properties on the perception of their speech. Individuals diagnosed with PD and age-matched healthy controls read sentences with words that varied in word frequency and neighborhood density. Acoustic analysis was performed to compare second formant transitions in diphthongs, an indicator of the dynamics of tongue movement during speech production, across different lexical characteristics. Young listeners transcribed the spoken sentences and the transcription accuracy was compared across lexical conditions. The acoustic results indicate that both PD and HC speakers adjusted their articulation based on lexical properties but the PD group had significant reductions in second formant transitions compared to HC. Both groups of speakers increased second formant transitions for words with low frequency and low density, but the lexical effect is diphthong dependent. The change in second formant slope was limited in the PD group when the required formant movement for the diphthong is small. The data from listeners' perception of the speech by PD and HC show that listeners identified high frequency words with greater accuracy suggesting the use of lexical knowledge during the recognition process. The relationship between acoustic results and perceptual accuracy is limited in this study suggesting that listeners incorporate acoustic and non-acoustic information to maximize speech intelligibility.
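The second-formant transition measure discussed above is typically quantified as the slope of F2 across the diphthong transition. A minimal sketch (the track values are invented for illustration):

```python
import numpy as np

def f2_slope(times_ms, f2_hz):
    """Least-squares slope of an F2 track, in Hz per ms."""
    slope, _ = np.polyfit(np.asarray(times_ms, float),
                          np.asarray(f2_hz, float), 1)
    return slope

# Hypothetical F2 track across an /ai/ transition:
print(f2_slope([0, 25, 50, 75, 100],
               [1100, 1400, 1700, 1950, 2100]))  # ~10 Hz/ms
```

A flatter (smaller) slope for the same diphthong would correspond to the reduced F2 transitions reported for the PD group.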
Rendall, Drew; Vasey, Paul L; McKenzie, Jared
2008-02-01
Popular stereotypes concerning the speech of homosexuals typically attribute speech patterns characteristic of the opposite sex, i.e., broadly feminized speech in gay men and broadly masculinized speech in lesbian women. A small body of recent empirical research has begun to address the subject more systematically and to consider specific mechanistic hypotheses to account for the potentially distinctive features of homosexual speech. Results do not yet fully endorse the stereotypes but they do not entirely discount them either; nor do they cleanly favor any single mechanistic hypothesis. To contribute to this growing body of research, we report acoustic analyses of 2,875 vowel sounds from a balanced set of 125 speakers representing heterosexual and homosexual individuals of each sex from southern Alberta, Canada. Analyses focused on voice pitch and formant frequencies, which together determine the principal perceptual features of vowels. There was no significant difference in mean voice pitch between heterosexual and homosexual men or between heterosexual and homosexual women, but there were significant differences in the formant frequencies of vowels produced by both homosexual groups compared to their heterosexual counterparts. Formant frequency differences were specific to only certain vowel sounds and some could be attributed to basic differences in body size between heterosexual and homosexual speakers. The remaining formant frequency differences were not obviously due to differences in vocal tract anatomy between heterosexual and homosexual speakers, nor did they reflect global feminization or masculinization of vowel production patterns in homosexual men and women, respectively. The vowel-specific differences observed could reflect social modeling processes in which only certain speech patterns of the opposite sex, or of same-sex homosexuals, are selectively adopted. However, we introduce an alternative biosocial hypothesis, specifically that the distinctive, vowel-specific features of homosexual speakers relative to heterosexual speakers arise incidentally as a product of broader psychobehavioral differences between the two groups that are, in turn, continuous with and flow from the physiological processes that affect sexual orientation to begin with.
NASA Astrophysics Data System (ADS)
Mattock, Karen; Rvachew, Susan; Polka, Linda; Turner, Sara
2005-04-01
It is well established that normally developing infants typically enter the canonical babbling stage of production between 6 and 8 months of age. However, whether the linguistic environment affects babbling, either in terms of the phonetic inventory of vowels produced by infants [Oller & Eilers (1982)] or the acoustics of vowel formants [Boysson-Bardies et al. (1989)], is controversial. The spontaneous speech of 42 Canadian English- and Canadian French-learning infants aged 8 to 11, 12 to 15, and 16 to 18 months was recorded and digitized to yield a total of 1253 vowels that were spectrally analyzed and statistically compared for differences in first and second formant frequencies. Language-specific influences on vowel acoustics were hypothesized. Preliminary results reveal changes in formant frequencies as a function of age and language background. There is evidence of decreases over age in the F1 values of French but not English infants' vowels, and decreases over age in the F2 values of English but not French infants' vowels. The notion of an age-related shift in infants' attention to language-specific acoustic features, and the implications of this for early vocal development as well as for the production of Canadian English and Canadian French vowels, will be discussed.
Synthesis fidelity and time-varying spectral change in vowels
NASA Astrophysics Data System (ADS)
Assmann, Peter F.; Katz, William F.
2005-02-01
Recent studies have shown that synthesized versions of American English vowels are less accurately identified when the natural time-varying spectral changes are eliminated by holding the formant frequencies constant over the duration of the vowel. A limitation of these experiments has been that vowels produced by formant synthesis are generally less accurately identified than the natural vowels after which they are modeled. To overcome this limitation, a high-quality speech analysis-synthesis system (STRAIGHT) was used to synthesize versions of 12 American English vowels spoken by adults and children. Vowels synthesized with STRAIGHT were identified as accurately as the natural versions, in contrast with previous results from our laboratory showing identification rates 9%-12% lower for the same vowels synthesized using the cascade formant model. Consistent with earlier studies, identification accuracy was not reduced when the fundamental frequency was held constant across the vowel. However, elimination of time-varying changes in the spectral envelope using STRAIGHT led to a greater reduction in accuracy (23%) than was previously found with cascade formant synthesis (11%). A statistical pattern recognition model, applied to acoustic measurements of the natural and synthesized vowels, predicted both the higher identification accuracy for vowels synthesized using STRAIGHT compared to formant synthesis, and the greater effects of holding the formant frequencies constant over time with STRAIGHT synthesis. Taken together, the experiment and modeling results suggest that formant estimation errors and incorrect rendering of spectral and temporal cues by cascade formant synthesis contribute to lower identification accuracy and underestimation of the role of time-varying spectral change in vowels.
Speech Spectrum's Correlation with Speakers' Eysenck Personality Traits
Hu, Chao; Wang, Qiandong; Short, Lindsey A.; Fu, Genyue
2012-01-01
The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen, and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed with the Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than those in truthful responses, whereas no difference existed in the vowel /i/ speech spectrum. The second formant bandwidth of the consonant /sh/ speech spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ speech spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown. PMID:22439014
A Comparative Analysis of Fluent and Cerebral Palsied Speech.
NASA Astrophysics Data System (ADS)
van Doorn, Janis Lee
Several features of the acoustic waveforms of fluent and cerebral palsied speech were compared, using six fluent and seven cerebral palsied subjects, with a major emphasis being placed on an investigation of the trajectories of the first three formants (vocal tract resonances). To provide an overall picture which included other acoustic features, fundamental frequency, intensity, speech timing (speech rate and syllable duration), and prevocalization (vocalization prior to initial stop consonants found in cerebral palsied speech) were also investigated. Measurements were made using repetitions of a test sentence which was chosen because it required large excursions of the speech articulators (lips, tongue and jaw), so that differences in the formant trajectories for the fluent and cerebral palsied speakers would be emphasized. The acoustic features were all extracted from the digitized speech waveform (10 kHz sampling rate): the fundamental frequency contours were derived manually, the intensity contours were measured using the signal covariance, speech rate and syllable durations were measured manually, as were the prevocalization durations, while the formant trajectories were derived from short time spectra which were calculated for each 10 ms of speech using linear prediction analysis. Differences which were found in the acoustic features can be summarized as follows. For cerebral palsied speakers, the fundamental frequency contours generally showed inappropriate exaggerated fluctuations, as did some of the intensity contours; the mean fundamental frequencies were either higher or the same as for the fluent subjects; speech rates were reduced, and syllable durations were longer; prevocalization was consistently present at the beginning of the test sentence; formant trajectories were found to have overall reduced frequency ranges, and to contain anomalous transitional features, but it is noteworthy that for any one cerebral palsied subject, the inappropriate trajectory pattern was generally reproducible. The anomalous transitional features took the form of (a) inappropriate transition patterns, (b) reduced frequency excursions, (c) increased transition durations, and (d) decreased maximum rates of frequency change.
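Formant trajectories of the kind described (one estimate per 10 ms of speech via linear prediction) can be reproduced with modern tools. The sketch below is an assumption, not the dissertation's implementation: librosa's LPC routine stands in for the original analysis, and real trackers would additionally filter candidates by bandwidth and enforce continuity across frames.

```python
import numpy as np
import librosa

def lpc_formants(frame, fs, order=12, n_formants=3):
    """Candidate formants from one frame: LPC polynomial roots in the
    upper half-plane, converted from angle to frequency."""
    a = librosa.lpc(frame * np.hamming(len(frame)), order=order)
    roots = [r for r in np.roots(a) if np.imag(r) > 0]
    freqs = sorted(np.angle(r) * fs / (2.0 * np.pi) for r in roots)
    return freqs[:n_formants]

def formant_trajectories(sig, fs, frame_ms=25, hop_ms=10):
    """One estimate every 10 ms, matching the analysis frame rate above."""
    frame = int(frame_ms * fs / 1000)
    hop = int(hop_ms * fs / 1000)
    return [lpc_formants(sig[i:i + frame], fs)
            for i in range(0, len(sig) - frame, hop)]
```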
The role of first formant information in simulated electro-acoustic hearing.
Verschuur, Carl; Boland, Conor; Frost, Emily; Constable, Jack
2013-06-01
Cochlear implant (CI) recipients with residual hearing show improved performance with the addition of low-frequency acoustic stimulation (electro-acoustic stimulation, EAS). The present study sought to determine whether a synthesized first formant (F1) signal provided benefit to speech recognition in simulated EAS hearing and to compare such benefit with that from other low-frequency signals. A further aim was to determine if F1 amplitude or frequency was more important in determining benefit and if F1 benefit varied with formant bandwidth. In two experiments, sentence recordings from a male speaker were processed via a simulation of a partial insertion CI, and presented to normal hearing listeners in combination with various low-frequency signals, including a tone tracking fundamental frequency (F0), low-pass filtered speech, and signals based on F1 estimation. A simulated EAS benefit was found with F1 signals, and was similar to the benefit from F0 or low-pass filtered speech. The benefit did not differ significantly with the narrowing or widening of the F1 bandwidth. The benefit from low-frequency envelope signals was significantly less than the benefit from any low-frequency signal containing fine frequency information. Results indicate that F1 provides a benefit in simulated EAS hearing but low frequency envelope information is less important than low frequency fine structure in determining such benefit.
Formants and musical harmonics matching in Brazilian lied
NASA Astrophysics Data System (ADS)
Raposo de Medeiros, Beatriz
2004-05-01
This paper reports a comparison of the formant patterns of speech and singing. Measurements of the first three formants were made on the stable portion of the vowels. The main finding of the study is an acoustic effect that can be described as the matching of the vowel formants to the harmonics of the sung note (A flat, 420 Hz). For example, for the vowel [a], F1 generally matched with the second harmonic (840 Hz) and F2 with the third harmonic. This finding is complementary to that of Sundberg (1977) according to which the higher the fundamental frequency of the musical note, e.g., 700 Hz, the more the mandible is lowered causing the elevation of the first formant of the sung vowel. As Sundberg himself named this phenomenon, there is a matching between the first formant and the phonation frequency, causing an increase in the sound energy. The present study establishes that the matching affects not only F1 but also F2 and F3. This finding will be discussed in connection with other manoeuvres (e.g., tongue movements) used by singers.
Harnsberger, James D.; Svirsky, Mario A.; Kaiser, Adam R.; Pisoni, David B.; Wright, Richard; Meyer, Ted A.
2012-01-01
Cochlear implant (CI) users differ in their ability to perceive and recognize speech sounds. Two possible reasons for such individual differences may lie in their ability to discriminate formant frequencies or to adapt to the spectrally shifted information presented by cochlear implants, a basalward shift related to the implant’s depth of insertion in the cochlea. In the present study, we examined these two alternatives using a method-of-adjustment (MOA) procedure with 330 synthetic vowel stimuli varying in F1 and F2 that were arranged in a two-dimensional grid. Subjects were asked to label the synthetic stimuli that matched ten monophthongal vowels in visually presented words. Subjects then provided goodness ratings for the stimuli they had chosen. The subjects’ responses to all ten vowels were used to construct individual perceptual “vowel spaces.” If CI users fail to adapt completely to the basalward spectral shift, then the formant frequencies of their vowel categories should be shifted lower in both F1 and F2. However, with one exception, no systematic shifts were observed in the vowel spaces of CI users. Instead, the vowel spaces differed from one another in the relative size of their vowel categories. The results suggest that differences in formant frequency discrimination may account for the individual differences in vowel perception observed in cochlear implant users. PMID:11386565
Acoustic analysis of speech under stress.
Sondhi, Savita; Khan, Munna; Vijay, Ritu; Salhan, Ashok K; Chouhan, Satish
2015-01-01
When a person is emotionally charged, stress can be discerned in his or her voice. This paper presents a simplified and non-invasive approach to detect psycho-physiological stress by monitoring the acoustic modifications during a stressful conversation. The voice database consists of audio clips from eight different popular FM broadcasts wherein the host of the show vexes the subjects, who are otherwise unaware of the charade. The audio clips are obtained from real-life stressful conversations (no simulated emotions). Analysis is done using the PRAAT software to evaluate mean fundamental frequency (F0) and formant frequencies (F1, F2, F3, F4) in both the neutral and stressed states. Results suggest that F0 increases with stress, whereas formant frequencies decrease with stress. Comparison of Fourier and chirp spectra of short vowel segments shows that for relaxed speech the two spectra are similar, whereas for stressed speech they differ in the high-frequency range due to increased pitch modulation.
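The study used PRAAT for its measurements; as a rough illustration of the F0 side of that analysis, here is a minimal autocorrelation pitch estimator (a sketch, not Praat's actual algorithm):

```python
import numpy as np

def estimate_f0(frame, fs, fmin=75.0, fmax=500.0):
    """Autocorrelation-peak F0 estimate for one voiced frame (Hz)."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)  # plausible lag range
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag
```

Mean F0 over voiced frames, computed this way for neutral and stressed utterances, would then be compared in the manner the abstract describes.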
Formant characteristics of human laughter.
Szameitat, Diana P; Darwin, Chris J; Szameitat, André J; Wildgruber, Dirk; Alter, Kai
2011-01-01
Although laughter is an important aspect of nonverbal vocalization, its acoustic properties are still not fully understood. Extreme articulation during laughter production, such as wide jaw opening, suggests that laughter can have very high first formant (F1) frequencies. We measured fundamental frequency and formant frequencies of the vowels produced in the vocalic segments of laughter. Vocalic segments showed higher average F1 frequencies than those previously reported, and individual values could be as high as 1100 Hz for male speakers and 1500 Hz for female speakers. To our knowledge, these are the highest F1 frequencies reported to date for human vocalizations, exceeding even the F1 frequencies reported for trained soprano singers. These exceptionally high F1 values are likely to be based on the extreme positions adopted by the vocal tract during laughter in combination with physiological constraints accompanying the production of a "pressed" voice. Copyright © 2011 The Voice Foundation. All rights reserved.
Three-month-old human infants use vocal cues of body size.
Pietraszewski, David; Wertz, Annie E; Bryant, Gregory A; Wynn, Karen
2017-06-14
Differences in vocal fundamental (F0) and average formant (Fn) frequencies covary with body size in most terrestrial mammals, such that larger organisms tend to produce lower frequency sounds than smaller organisms, both between species and also across different sex and life-stage morphs within species. Here we examined whether three-month-old human infants are sensitive to the relationship between body size and sound frequencies. Using a violation-of-expectation paradigm, we found that infants looked longer at stimuli inconsistent with the relationship (that is, a smaller organism producing lower frequency sounds, and a larger organism producing higher frequency sounds) than at stimuli that were consistent with it. This effect was stronger for fundamental frequency than it was for average formant frequency. These results suggest that by three months of age, human infants are already sensitive to the biologically relevant covariation between vocalization frequencies and visual cues to body size. This ability may be a consequence of developmental adaptations for building a phenotype capable of identifying and representing an organism's size, sex and life-stage. © 2017 The Author(s).
Voice Formants in Individuals With Congenital, Isolated, Lifetime Growth Hormone Deficiency.
Valença, Eugenia H O; Salvatori, Roberto; Souza, Anita H O; Oliveira-Neto, Luiz A; Oliveira, Alaíde H A; Gonçalves, Maria I R; Oliveira, Carla R P; D'Ávila, Jeferson S; Melo, Valdinaldo A; de Carvalho, Susana; de Andrade, Bruna M R; Nascimento, Larisse S; Rocha, Savinny B de V; Ribeiro, Thais R; Prado-Barreto, Valeria M; Melo, Enaldo V; Aguiar-Oliveira, Manuel H
2016-05-01
To analyze the voice formants (F1, F2, F3, and F4, in Hz) of seven oral vowels in Brazilian Portuguese, [a, ε, e, i, ɔ, o, and u], in adult individuals with congenital lifetime untreated isolated growth hormone deficiency (IGHD). This is a cross-sectional study. Acoustic analysis of isolated vowels was performed in 33 individuals with IGHD, age 44.5 (17.6) years (16 women), and 29 controls, age 51.1 (17.6) years (15 women). Compared with controls, IGHD men showed higher values of F3 [i, e, and ε] (P = 0.006, P = 0.022, and P = 0.006, respectively) and F4 [i] (P = 0.001), and lower values of F2 [u] (P = 0.034); IGHD women presented higher values of F1 [i and e] (P = 0.029 and P = 0.036), F2 [ɔ] (P = 0.006), and F4 [ɔ] (P = 0.031), and lower values of F2 [i] (P = 0.004). IGHD abolished most of the gender differences in formant frequencies present in controls. Congenital, severe IGHD results in higher values of most formant frequencies, suggesting smaller oral and pharyngeal cavities. In addition, it causes a reduction in the effect of gender on the structure of the formants, maintaining a prepubertal acoustic prediction. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Acoustic correlates of body size and individual identity in banded penguins
Gamba, Marco; Gili, Claudia; Pessani, Daniela
2017-01-01
Animal vocalisations play a role in individual recognition and mate choice. In nesting penguins, acoustic variation in vocalisations originates from distinctiveness in the morphology of the vocal apparatus. Using the source-filter theory approach, we investigated vocal individuality cues and correlates of body size and mass in the ecstatic display songs of the Humboldt and Magellanic penguins. We demonstrate that both fundamental frequency (f0) and formants (F1-F4) are essential vocal features to discriminate among individuals. However, we show that only duration and f0 are honest indicators of body size and mass, respectively. We did not find any effect of body dimensions on formants, formant dispersion, or estimated vocal tract length of the emitters. Overall, our findings provide the first evidence that the resonant frequencies of the vocal tract do not correlate with body size in penguins. Our results add important information to a growing body of literature on the role of the different vocal parameters in conveying biologically meaningful information in bird vocalisations. PMID:28199318
Dmitrieva, E S; Gel'man, V Ia; Zaĭtseva, K A; Orlov, A M
2009-01-01
A comparative study of the acoustic correlates of emotional intonation was conducted on two types of speech material: meaningful speech utterances and short meaningless words. A corpus of speech signals with different emotional intonations (happy, angry, frightened, sad, and neutral) was created using the actors' method of simulating emotions. Native Russian speakers aged 20-70 years (both professional actors and non-actors) participated in the study. In the corpus, the following characteristics were analyzed: mean values and standard deviations of the power, fundamental frequency, frequencies of the first and second formants, and utterance duration. Comparison of each emotional intonation with "neutral" utterances showed the greatest deviations in the fundamental frequency and the frequency of the first formant. The direction of these deviations was independent of the semantic content and duration of the utterance and of the speaker's age, gender, and acting experience, though the personal features of the speakers affected the absolute values of these frequencies.
ERIC Educational Resources Information Center
Molis, Michelle R.; Leek, Marjorie R.
2011-01-01
Purpose: This study examined the influence of presentation level and mild-to-moderate hearing loss on the identification of a set of vowel tokens systematically varying in the frequency locations of their second and third formants. Method: Five listeners with normal hearing (NH listeners) and five listeners with hearing impairment (HI listeners)…
Probing the independence of formant control using altered auditory feedback
MacDonald, Ewen N.; Purcell, David W.; Munhall, Kevin G.
2011-01-01
Two auditory feedback perturbation experiments were conducted to examine the nature of control of the first two formants in vowels. In the first experiment, talkers heard their auditory feedback with either F1 or F2 shifted in frequency. Talkers altered production of the perturbed formant by changing its frequency in the opposite direction to the perturbation but did not produce a correlated alteration of the unperturbed formant. Thus, the motor control system is capable of fine-grained independent control of F1 and F2. In the second experiment, a large meta-analysis was conducted on data from talkers who received feedback where both F1 and F2 had been perturbed. A moderate correlation was found between individual compensations in F1 and F2 suggesting that the control of F1 and F2 is processed in a common manner at some level. While a wide range of individual compensation magnitudes were observed, no significant correlations were found between individuals’ compensations and vowel space differences. Similarly, no significant correlations were found between individuals’ compensations and variability in normal vowel production. Further, when receiving normal auditory feedback, most of the population exhibited no significant correlation between the natural variation in production of F1 and F2. PMID:21361452
Soul and Musical Theater: A Comparison of Two Vocal Styles.
Hallqvist, Hanna; Lã, Filipa M B; Sundberg, Johan
2017-03-01
The phonatory and resonatory characteristics of nonclassical styles of singing have rarely been analyzed in voice research. Six professional singers volunteered to sing excerpts from two songs pertaining to the musical theater and soul styles of singing. Voice source parameters and formant frequencies were analyzed by inverse filtering of tones sung at the same fundamental frequencies in both excerpts. As compared with musical theater, the soul style was characterized by significantly higher subglottal pressure and maximum flow declination rate. Yet sound pressure level was lower, suggesting higher glottal resistance. The differences would be the effects of firmer glottal adduction and a greater frequency separation between the first formant and its closest spectrum partial in soul than in musical theater. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Vocal fundamental and formant frequencies affect perceptions of speaker cooperativeness.
Knowles, Kristen K; Little, Anthony C
2016-01-01
In recent years, the perception of social traits in faces and voices has received much attention. Facial and vocal masculinity are linked to perceptions of trustworthiness; however, while feminine faces are generally considered to be trustworthy, vocal trustworthiness is associated with masculinized vocal features. Vocal traits such as pitch and formants have previously been associated with perceived social traits such as trustworthiness and dominance, but the link between these measurements and perceptions of cooperativeness had yet to be examined. In Experiment 1, cooperativeness ratings of male and female voices were examined against four vocal measurements: fundamental frequency (F0), pitch variation (F0-SD), formant dispersion (Df), and formant position (Pf). Feminine pitch traits (F0 and F0-SD) and masculine formant traits (Df and Pf) were associated with higher cooperativeness ratings. In Experiment 2, manipulated voices with feminized F0 were found to be more cooperative than voices with masculinized F0, among both male and female speakers, confirming our results from Experiment 1. Feminine pitch qualities may indicate an individual who is friendly and non-threatening, while masculine formant qualities may reflect an individual who is socially dominant or prestigious, and the perception of these associated traits may influence the perceived cooperativeness of the speakers.
ERIC Educational Resources Information Center
Carrell, Thomas D.
This study investigated the contributions of fundamental frequency, formant spacing, and glottal waveform to talker identification. The first two experiments focused on the effect of glottal waveform in the perception of talker identity. Subjects in the first experiment, 30 undergraduate students enrolled in an introductory psychology course,…
Comparison of snoring sounds between natural and drug-induced sleep recorded using a smartphone.
Koo, Soo Kweon; Kwon, Soon Bok; Moon, Ji Seung; Lee, Sang Hoon; Lee, Ho Byung; Lee, Sang Jun
2018-08-01
Snoring is an important clinical feature of obstructive sleep apnea (OSA), and recent studies suggest that the acoustic quality of snoring sounds is markedly different in drug-induced sleep compared with natural sleep. However, considering differences in sound recording methods and analysis parameters, further studies are required. This study explored whether acoustic analysis of drug-induced sleep is useful as a screening test that reflects the characteristics of natural sleep in snoring patients. The snoring sounds of 30 male subjects (mean age = 41.8 years) were recorded using a smartphone during natural and induced sleep, with the site of vibration noted during drug-induced sleep endoscopy (DISE); then, we compared the sound intensity (dB), formant frequencies, and spectrograms of the snoring sounds. Regarding the intensity of snoring sounds, there were minor differences within the retrolingual-level obstruction group, but there was no significant difference between natural and induced sleep at either obstruction site. There was no significant difference in the F1 and F2 formant frequencies of snoring sounds between natural and induced sleep at either obstruction site. Compared with natural sleep, induced sleep was slightly more irregular, with a stronger intensity on the spectrogram, but the spectrograms showed the same pattern at both obstruction sites. Although further studies are required, the spectrograms and formant frequencies of the snoring sounds of induced sleep did not differ significantly from those of natural sleep, and may be used as a screening test that reflects the characteristics of natural sleep according to the obstruction site. Copyright © 2017 Elsevier B.V. All rights reserved.
Lundeborg, Inger; Hultcrantz, Elisabeth; Ericsson, Elisabeth; McAllister, Anita
2012-07-01
To evaluate the outcome of two types of tonsil surgery (tonsillectomy [TE]+adenoidectomy or tonsillotomy [TT]+adenoidectomy) on vocal function, perceptually and acoustically. Sixty-seven children, aged 50-65 months, on the waiting list for tonsil surgery were randomized to TE (n=33) or TT (n=34). Fifty-seven age- and gender-matched healthy preschool children were controls. Twenty-eight of them, aged 48-59 months, served as the control group before surgery, and 29, aged 60-71 months, served as the control group after surgery. Before surgery and 6 months postoperatively, the children were recorded producing three sustained vowels (/ɑ/, /u/, and /i/) and 14 words. The control groups were recorded only once. Three trained speech and language pathologists performed the perceptual analysis using a visual analog scale for eight voice quality parameters. Acoustic analysis of the sustained vowels included average fundamental frequency, jitter percent, shimmer percent, noise-to-harmonic ratio, and the center frequencies of formants 1-3. Before surgery, the children were rated as having more hyponasality and a more compressed/throaty voice (P<0.05) and lower mean pitch (P<0.01) in comparison to the control group. They also had higher perturbation measures and lower frequencies of the second and third formants. After surgery, there were no perceptual differences. Perturbation measures decreased but were still higher compared with those of the control group (P<0.05). Differences in formant frequencies for /i/ and /u/ remained. No differences were found between the two surgical methods. Voice quality is affected perceptually and acoustically by adenotonsillar hypertrophy. After surgery, the voice is perceptually normalized but acoustic differences remain. Outcome was equal for both surgical methods. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Montero Benavides, Ana; Blanco Murillo, José Luis; Fernández Pozo, Rubén; Espinoza Cuadros, Fernando; Torre Toledano, Doroteo; Alcázar-Ramírez, José D; Hernández Gómez, Luis A
2016-01-01
We investigated whether differences in formants and their bandwidths, previously reported comparing a small sample population of healthy individuals and patients with obstructive sleep apnea (OSA), are detected in a larger population representative of a clinical practice scenario. We examine possible indirect or mediated effects of clinical variables, which may shed some light on the connection between speech and OSA. In a retrospective study, 241 male subjects suspected to suffer from OSA were examined. The apnea-hypopnea index (AHI) was obtained for every subject using overnight polysomnography. Furthermore, the clinical variables usually reported as predictors of OSA, body mass index (BMI), cervical perimeter, height, weight, and age, were collected. Voice samples of sustained phonations of the vowels /a/, /e/, /i/, /o/, and /u/ were recorded. Formant frequencies F1, F2, and F3 and bandwidths BW1, BW2, and BW3 of the sustained vowels were determined using spectrographic analysis. Correlations among AHI, clinical parameters, and formants and bandwidths were determined. Correlations between AHI and clinical variables were stronger than those between AHI and voice features. AHI correlates only weakly with BW2 of /a/ and BW3 of /e/. A number of further weak but significant correlations were detected between voice and clinical variables. Most of them were for height and age, with two higher values for age and F2 of /o/ and F2 of /u/. Only a few very weak correlations were detected between voice and BMI, weight, and cervical perimeter, which are the clinical variables most correlated with AHI. No significant correlations were detected between AHI and formant frequencies and bandwidths. Correlations between voice and other clinical factors characterizing OSA are weak but highlight the importance of considering indirect or mediated effects of such clinical variables in any research on speech and OSA. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
A Formant Range Profile for Singers.
Titze, Ingo R; Maxfield, Lynn M; Walker, Megan C
2017-05-01
Vowel selection is important in differentiating between singing styles. The timbre of the vocal instrument, which is related to its frequency spectrum, is governed by both the glottal sound source and the vowel choices made by singers. Consequently, the ability to modify the vowel space is a measure of how successfully a singer can maintain a desired timbre across a range of pitches. Formant range profiles were produced as a means of quantifying this ability. Seventy-seven subjects (including trained and untrained vocalists) participated, producing vowels with three intended mouth shapes: (1) neutral or speech-like, (2) megaphone-shaped (wide open mouth), and (3) inverted-megaphone-shaped (widened oropharynx with moderate mouth opening). The first and second formant frequencies (F1 and F2) were estimated with fry phonation for each shape and values were plotted in F1-F2 space. By taking four vowels of a quadrangle /i, æ, a, u/, the resulting area was quantified in kHz² (kHz squared) as a measure of the subject's ability to modify the vocal tract for spectral differences. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
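The profile area described above reduces to the area of a polygon whose corners are the four vowels' (F1, F2) values, which the shoelace formula gives directly. A sketch with hypothetical corner values:

```python
def quad_area_khz2(points_khz):
    """Polygon area from (F1, F2) corners in kHz, listed in order
    around the shape (shoelace formula); result in kHz squared."""
    area = 0.0
    n = len(points_khz)
    for i in range(n):
        x1, y1 = points_khz[i]
        x2, y2 = points_khz[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# /i/, /ae/, /a/, /u/ corners (F1, F2) in kHz, illustration only:
print(quad_area_khz2([(0.30, 2.30), (0.75, 1.90),
                      (0.85, 1.20), (0.35, 0.90)]))  # ~0.50 kHz^2
```

A larger area indicates greater ability to reshape the vocal tract across the three mouth shapes.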
Thermal welding vs. cold knife tonsillectomy: a comparison of voice and speech.
Celebi, Saban; Yelken, Kursat; Celik, Oner; Taskin, Umit; Topak, Murat
2011-01-01
To compare acoustic, aerodynamic, and perceptual voice and speech parameters in thermal welding system tonsillectomy and cold knife tonsillectomy patients, in order to determine the impact of operation technique on voice and speech. Thirty tonsillectomy patients (22 children, 8 adults) participated in this study. The preferred technique was cold knife tonsillectomy in 15 patients and thermal welding system tonsillectomy in the remaining 15 patients. One week before and 1 month after surgery, the following parameters were estimated: average fundamental frequency, jitter, shimmer, harmonics-to-noise ratio, and formant frequency analyses of sustained vowels. Perceptual speech analysis and aerodynamic measurements (maximum phonation time and s/z ratio) were also conducted. There was no significant difference in any of the parameters between the cold knife tonsillectomy and thermal welding system tonsillectomy groups (p>0.05). When each group was compared with regard to preoperative and postoperative values, fundamental frequency was found to be significantly decreased after tonsillectomy in both groups (p<0.001). The first formant for the vowel /a/ in the cold knife tonsillectomy group and for the vowel /i/ in the thermal welding system tonsillectomy group, the second formant for the vowel /u/ in the thermal welding system tonsillectomy group, and the third formant for the vowel /u/ in the cold knife tonsillectomy group were found to be significantly decreased (p<0.05). The surgical technique, whether cold knife or thermal welding system, does not appear to affect voice and speech in tonsillectomy patients. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
Reliability of human-supervised formant-trajectory measurement for forensic voice comparison.
Zhang, Cuiling; Morrison, Geoffrey Stewart; Ochoa, Felipe; Enzinger, Ewald
2013-01-01
Acoustic-phonetic approaches to forensic voice comparison often include human-supervised measurement of vowel formants, but the reliability of such measurements is a matter of concern. This study assesses the within- and between-supervisor variability of three sets of formant-trajectory measurements made by each of four human supervisors. It also assesses the validity and reliability of forensic-voice-comparison systems based on these measurements. Each supervisor's formant-trajectory system was fused with a baseline mel-frequency cepstral-coefficient system, and performance was assessed relative to the baseline system. Substantial improvements in validity were found for all supervisors' systems, but some supervisors' systems were more reliable than others.
Mandarin compound vowels produced by prelingually deafened children with cochlear implants.
Yang, Jing; Xu, Li
2017-06-01
Compound vowels including diphthongs and triphthongs have complex, dynamic spectral features. The production of compound vowels by children with cochlear implants (CIs) has not been studied previously. The present study examined the dynamic features of compound vowels in native Mandarin-speaking children with CIs. Fourteen prelingually deafened children with CIs (aged 2.9-8.3 years old) and 14 age-matched, normal-hearing (NH) children produced monosyllables containing six Mandarin compound vowels (i.e., /aɪ/, /aʊ/, /uo/, /iɛ/, /iaʊ/, /ioʊ/). The frequency values of the first two formants were measured at nine equidistant time points over the course of the vowel duration. All formant frequency values were normalized and then used to calculate vowel trajectory length and overall spectral rate of change. The results revealed that the CI children produced significantly longer durations for all six compound vowels. The CI children's ability to produce formant movement for the compound vowels varied considerably. Some CI children produced relatively static formant trajectories for certain diphthongs, whereas others produced certain vowels with greater formant movement than did the NH children. As a group, the CI children roughly followed the NH children on the pattern of magnitude of formant movement, but they showed a slower rate of formant change than did the NH children. The findings suggested that prelingually deafened children with CIs, during the early stage of speech acquisition, had not established appropriate targets and articulatory coordination for compound vowel productions. This preliminary study may shed light on rehabilitation of prelingually deafened children with CIs. Copyright © 2017 Elsevier B.V. All rights reserved.
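The two dynamic measures used in this study, vowel trajectory length and overall spectral rate of change, are straightforward to compute from the nine equidistant formant samples. A sketch (the inputs would be the normalized F1 and F2 tracks described above):

```python
import numpy as np

def trajectory_length(f1, f2):
    """Sum of Euclidean steps between successive (F1, F2) samples."""
    d1, d2 = np.diff(f1), np.diff(f2)
    return np.sum(np.sqrt(d1 ** 2 + d2 ** 2))

def rate_of_change(f1, f2, duration_s):
    """Overall spectral rate of change: trajectory length per unit time."""
    return trajectory_length(f1, f2) / duration_s
```

A nearly static diphthong yields a short trajectory; a longer vowel duration with a similar trajectory yields a lower rate of change, the pattern reported for the CI group.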
Regional dialect variation in the vowel systems of typically developing children
Jacewicz, Ewa; Fox, Robert Allen; Salmons, Joseph
2015-01-01
Purpose To investigate regional dialect variation in the vowel systems of normally developing 8- to 12-year-old children. Method Thirteen vowels in isolated h_d words were produced by 94 children and 93 adults, males and females. All participants spoke American English and were born and raised in one of three distinct dialect regions in the United States: western North Carolina (Southern dialect), central Ohio (Midland) and southeastern Wisconsin (Northern Midwestern dialect). Acoustic analysis included formant frequencies (F1 and F2) measured at five equidistant time points in a vowel and formant movement (trajectory length). Results Children's productions showed many dialect-specific features comparable to those in adult speakers, both in terms of vowel dispersion patterns and formant movement. Different features were also found, including systemic vowel changes, significant monophthongization of selected vowels, and greater formant movement in diphthongs. Conclusions The acoustic results provide evidence for regional distinctiveness in children's vowel systems. Children acquire not only the systemic relations among vowels but also their dialect-specific patterns of formant dynamics. Directing attention to regional variation in the production of American English vowels, this work may prove helpful in better understanding and interpreting the development of vowel categories and vowel systems in children. PMID:20966384
Variability in Phonetics. York Papers in Linguistics, No. 6.
ERIC Educational Resources Information Center
Tatham, M. A. A.
Variability is a term used to cover several types of phenomena in language sound patterns and in phonetic realization of those patterns. Variability refers to the fact that every repetition of an utterance is different, in amplitude, rate of delivery, formant frequencies, fundamental frequency or minor phase relationship changes across the sound…
Auditory Spectral Integration in the Perception of Static Vowels
ERIC Educational Resources Information Center
Fox, Robert Allen; Jacewicz, Ewa; Chang, Chiung-Yun
2011-01-01
Purpose: To evaluate potential contributions of broadband spectral integration in the perception of static vowels. Specifically, can the auditory system infer formant frequency information from changes in the intensity weighting across harmonics when the formant itself is missing? Does this type of integration produce the same results in the lower…
[The investigation of formant on different artistic voice].
Wang, Jianqun; Gao, Xia; Liu, Xiaozhou; Feng, Yulin; Shen, Xiaohui; Yu, Chenjie; Yang, Ye
2008-08-01
To explore the characteristics of the formant, a key parameter in the spectrogram, in three types of artistic voice (Western style, Chinese style, and Beijing opera). We used MATLAB software to perform short-time Fourier transforms and spectrogram analysis on steady-state vowel samples of the three types. The Western style showed different realizations of the singer's formant (Fs) depending on voice part; the notable features of the Chinese style were continuous F1, F2, and F3 tracks whose energy changed smoothly; Beijing opera typically showed a very wide formant with smooth transitions between the formants and the various harmonics, as well as a component resembling the Fs (normally two merged formants). Each type of artistic voice showed its own characteristic formant pattern in the spectrogram, which has value for identification, objective evaluation, and prediction.
Acoustics and perception of overtone singing.
Bloothooft, G; Bringmann, E; van Cappellen, M; van Luipen, J B; Thomassen, K P
1992-10-01
Overtone singing, a technique of Asian origin, is a special type of voice production resulting in a very pronounced, high and separate tone that can be heard over a more or less constant drone. An acoustic analysis is presented of the phenomenon and the results are described in terms of the classical theory of speech production. The overtone sound may be interpreted as the result of an interaction of closely spaced formants. For the lower overtones, these may be the first and second formant, separated from the lower harmonics by a nasal pole-zero pair, as the result of a nasalized articulation shifting from /ɔ/ to /a/, or, as an alternative, the second formant alone, separated from the first formant by the nasal pole-zero pair, again as the result of a nasalized articulation around /ɔ/. For overtones with a frequency higher than 800 Hz, the overtone sound can be explained as a combination of the second and third formant as the result of a careful, retroflex, and rounded articulation from /ɔ/, via schwa /ə/, to /y/ and /i/ for the highest overtones. The results indicate a firm and relatively long closure of the glottis during overtone phonation. The corresponding short open duration of the glottis introduces a glottal formant that may enhance the amplitude of the intended overtone. Perception experiments showed that listeners categorized the overtone sounds differently from normally sung vowels, which possibly has its basis in an independent perception of the small bandwidth of the resonance underlying the overtone. Their verbal judgments were in agreement with the presented phonetic-acoustic explanation.
NASA Astrophysics Data System (ADS)
Rendall, Drew; Owren, Michael J.; Weerts, Elise; Hienz, Robert D.
2004-01-01
This study quantifies sex differences in the acoustic structure of vowel-like grunt vocalizations in baboons (Papio spp.) and tests the basic perceptual discriminability of these differences to baboon listeners. Acoustic analyses were performed on 1028 grunts recorded from 27 adult baboons (11 males and 16 females) in southern Africa, focusing specifically on the fundamental frequency (F0) and formant frequencies. The mean F0 and the mean frequencies of the first three formants were all significantly lower in males than they were in females, more dramatically so for F0. Experiments using standard psychophysical procedures subsequently tested the discriminability of adult male and adult female grunts. After learning to discriminate the grunt of one male from that of one female, five baboon subjects subsequently generalized this discrimination both to new call tokens from the same individuals and to grunts from novel males and females. These results are discussed in the context of both the possible vocal anatomical basis for sex differences in call structure and the potential perceptual mechanisms involved in their processing by listeners, particularly as these relate to analogous issues in human speech production and perception.
Formant discrimination in noise for isolated vowels
NASA Astrophysics Data System (ADS)
Liu, Chang; Kewley-Port, Diane
2004-11-01
Formant discrimination for isolated vowels presented in noise was investigated for normal-hearing listeners. Discrimination thresholds for F1 and F2, for the seven American English vowels /i, ɪ, ɛ, æ, ʌ, ɑ, u/, were measured under two types of noise, long-term speech-shaped noise (LTSS) and multitalker babble, and also under quiet listening conditions. Signal-to-noise ratios (SNR) varied from -4 to +4 dB in steps of 2 dB. All three factors, formant frequency, signal-to-noise ratio, and noise type, had significant effects on vowel formant discrimination. Significant interactions among the three factors showed that threshold-frequency functions depended on SNR and noise type. The thresholds at the lowest levels of SNR were highly elevated by a factor of about 3 compared to those in quiet. The masking functions (threshold vs SNR) were well described by a negative exponential over F1 and F2 for both LTSS and babble noise. Speech-shaped noise was a slightly more effective masker than multitalker babble, presumably reflecting small benefits (1.5 dB) due to the temporal variation of the babble.
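The negative-exponential masking functions reported here can be made concrete with a small curve fit. The sketch below uses scipy's curve_fit on hypothetical threshold data; the numbers are placeholders, not the study's measurements.

    import numpy as np
    from scipy.optimize import curve_fit

    def masking_fn(snr, t_quiet, span, decay):
        # Threshold decays exponentially toward the quiet threshold as SNR rises.
        return t_quiet + span * np.exp(-decay * snr)

    snr = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])             # dB
    thresholds = np.array([120.0, 85.0, 62.0, 50.0, 44.0])  # hypothetical Hz

    (t_quiet, span, decay), _ = curve_fit(masking_fn, snr, thresholds,
                                          p0=(40.0, 60.0, 0.5))
    print(f"asymptote {t_quiet:.1f} Hz, span {span:.1f} Hz, decay {decay:.2f}/dB")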
Maxfield, Lynn; Palaparthi, Anil; Titze, Ingo
2017-03-01
The traditional source-filter theory of voice production describes a linear relationship between the source (glottal flow pulse) and the filter (vocal tract). Such a linear relationship neither allows for nor explains how changes in the filter may impact the stability and regularity of the source. The objective of this experiment was to examine what effect unpredictable changes to vocal tract dimensions could have on fo stability and individual harmonic intensities in situations in which low-frequency harmonics cross formants in a fundamental frequency glide. To determine these effects, eight human subjects (five male, three female) were recorded producing fo glides while their vocal tracts were artificially lengthened by a section of vinyl tubing inserted into the mouth. It was hypothesized that if the source and filter operated as a purely linear system, harmonic intensities would increase and decrease at nearly the same rates as they passed through a formant bandwidth, resulting in a relatively symmetric peak on an intensity-time contour. Additionally, fo stability should not be predictably perturbed by formant/harmonic crossings in a linear system. Acoustic analysis of these recordings, however, revealed that harmonic intensity peaks were asymmetric in 76% of cases, and that 85% of fo instabilities aligned with a crossing of one of the first four harmonics with the first three formants. These results provide further evidence that nonlinear dynamics in the source-filter relationship can impact fo stability as well as harmonic intensities as harmonics cross through formant bandwidths.
Perceptual and acoustic study of professionally trained versus untrained voices.
Brown, W S; Rothman, H B; Sapienza, C M
2000-09-01
Acoustic and perceptual analyses were completed to determine the effect of vocal training on professional singers when speaking and singing. Twenty professional singers and 20 nonsingers, acting as the control, were recorded while sustaining a vowel, reading a modified Rainbow Passage, and singing "America the Beautiful." Acoustic measures included fundamental frequency, duration, percent jitter, percent shimmer, noise-to-harmonic ratio, and determination of the presence or absence of both vibrato and the singer's formant. Results indicated that, whereas certain acoustic parameters differentiated singers from nonsingers within sex, no consistently significant trends were found across males and females for either speaking or singing. The most consistent difference was the presence of vibrato and the singer's formant in the singers and their absence in the nonsingers. Perceptual analysis indicated that singers could be correctly identified from their singing, but not their speaking, utterances with greater frequency than by chance alone.
Human frequency-following response to speech-like sounds: correlates of off-frequency masking.
Krishnan, Ananthanarayan; Agrawal, Smita
2010-01-01
Off-frequency masking of the second formant by energy at the first formant has been shown to influence both identification and discrimination of the second formant in normal-hearing and hearing-impaired listeners. While both excitatory spread and two-tone suppression have been implicated in this simultaneous masking, their relative contribution has been shown to depend on both the level of the masker and the frequency separation between the probe and the masker. Off-frequency masking effects were evaluated in 10 normal-hearing human adults using the frequency-following response (FFR) to two two-tone approximations of vowel stimuli (/a/ and /u/). In the first experiment, the masking effect of F1 on F2 was evaluated by attenuating the level of F1 relative to a fixed F2 level. In the second experiment, the masking effect was evaluated by increasing the frequency separation between F1 and F2, using the F2 frequency as the variable. Results revealed that both attenuation of the F1 level and increasing the frequency separation between F1 and F2 increased the magnitude of the FFR component at F2. These results are consistent with a release from off-frequency masking. Given that the results presented here are for high signal and masker levels and for relatively small frequency separations between the masker and the probe, it is possible that both suppression and excitatory spread contributed to the masking effects observed in our data.
Acoustic-articulatory mapping in vowels by locally weighted regression
McGowan, Richard S.; Berger, Michael A.
2009-01-01
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829–836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings, indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented. PMID:19813812
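For readers unfamiliar with loess, a one-dimensional Python sketch of Cleveland's locally weighted linear regression is given below. The articulatory-to-acoustic "data" are synthetic, and the study's multivariate implementation is not reproduced; this only shows the tricube-weighted local fit at a single evaluation point.

    import numpy as np

    def loess_point(x0, x, y, span=0.5):
        # Fit a weighted local line around x0 using the nearest `span`
        # fraction of the data, with Cleveland's tricube weights.
        k = max(2, int(np.ceil(span * len(x))))
        d = np.abs(x - x0)
        idx = np.argsort(d)[:k]                      # k nearest neighbours
        w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3  # tricube kernel
        X = np.column_stack([np.ones(k), x[idx]])    # local line y = b0 + b1*x
        W = np.diag(w)
        b0, b1 = np.linalg.solve(X.T @ W @ X, X.T @ W @ y[idx])
        return b0 + b1 * x0

    # Toy mapping: one articulatory principal-component score -> F1 (Hz).
    rng = np.random.default_rng(0)
    pc = np.linspace(-2, 2, 60)
    f1 = 500 + 180 * np.tanh(pc) + rng.normal(0, 15, pc.size)
    print(loess_point(0.5, pc, f1))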
ERIC Educational Resources Information Center
Kazi, Rehan; Prasad, Vyas M. N.; Kanagalingam, Jeeve; Georgalas, Christos; Venkitaraman, Ramachandran; Nutting, Christopher M.; Clarke, Peter; Rhys-Evans, Peter; Harrington, Kevin J.
2007-01-01
Aims: To compare voice quality as defined by formant analysis using a sustained vowel in patients who have undergone a partial glossectomy with a group of normal subjects. Methods & Procedures: The design consisted of a single centre, cross-sectional cohort study. The setting was an Adult Tertiary Referral Unit. A total of 26 patients (19…
Bele, Irene Velsvik
2006-12-01
The current study concerns speaking voice quality in two groups of professional voice users, teachers (n = 35) and actors (n = 36), representing trained and untrained voices. The voice quality of text reading at two intensity levels was acoustically analyzed. The central concept was the speaker's formant (SPF), related to the perceptual characteristics "better normal voice quality" (BNQ) and "worse normal voice quality" (WNQ). The purpose of the current study was to get closer to the origin of the SPF phenomenon and to identify the differences in spectral and formant characteristics between the two professional groups and the two voice quality groups. The acoustic analyses were long-term average spectrum (LTAS) and spectrographic measurements of formant frequencies. At very high intensities, the spectral slope was rather quadrangular, without a clear SPF peak. The trained voices had a higher energy level in the SPF region compared with the untrained, significantly so in loud phonation. The SPF seemed to be related both to sufficiently strong overtones and to a glottal setting allowing for a lowering of F4 and a closeness of F3 and F4. However, the existence of an SPF also in the LTAS of the WNQ voices implies that more research is warranted concerning the formation of the SPF and concerning the acoustic correlates of the BNQ voices.
Koo, Soo Kweon; Kwon, Soon Bok; Kim, Yang Jae; Moon, J I Seung; Kim, Young Jun; Jung, Sung Hoon
2017-03-01
Snoring is a sign of increased upper airway resistance and is the most common symptom suggestive of obstructive sleep apnea. Acoustic analysis of snoring sounds is a non-invasive diagnostic technique and may provide a screening test that can determine the location of obstruction sites. We recorded snoring sounds according to obstruction level, determined by drug-induced sleep endoscopy (DISE), using a smartphone, and focused on the analysis of formant frequencies. The study group comprised 32 male patients (mean age 42.9 years). The spectrogram pattern, intensity (dB), fundamental frequency (F0), and formant frequencies (F1, F2, and F3) of the snoring sounds were analyzed for each subject. On spectrographic analysis, retropalatal level obstruction tended to produce sharp and regular peaks, while retrolingual level obstruction tended to show peaks with a gradual onset and decay. On formant frequency analysis, F1 (retropalatal vs. retrolingual: 488.1 ± 125.8 vs. 634.7 ± 196.6 Hz) and F2 (retropalatal vs. retrolingual: 1267.3 ± 306.6 vs. 1723.7 ± 550.0 Hz) were significantly higher for retrolingual level obstruction than for retropalatal level obstruction (p < 0.05). This suggests that the upper airway is more severely obstructed with retrolingual level obstruction and that there is a greater change in tongue position. Acoustic analysis of snoring is a non-invasive diagnostic technique that can be easily applied at a relatively low cost. The analysis of formant frequencies will be a useful screening test for the prediction of occlusion sites. Moreover, a smartphone can be effective for recording snoring sounds.
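Formant frequencies of the kind analyzed here are commonly estimated with linear predictive coding (LPC): fit an all-pole model to a windowed frame and convert pole angles to frequencies. The sketch below assumes the librosa package for the LPC fit and uses a synthetic two-resonance frame; it illustrates the generic method, not the study's analysis pipeline.

    import numpy as np
    import librosa

    def lpc_formants(frame, fs, order=8):
        # Rough formant estimates: LPC polynomial roots above the real axis,
        # with pole angles converted to Hz; near-DC poles are discarded.
        a = librosa.lpc(np.asarray(frame, dtype=float), order=order)
        roots = np.roots(a)
        roots = roots[np.imag(roots) > 0]     # one of each conjugate pair
        freqs = np.angle(roots) * fs / (2 * np.pi)
        return np.sort(freqs[freqs > 90.0])

    # Synthetic snore-like frame with resonances near 500 and 1300 Hz.
    fs = 8000
    t = np.arange(2048) / fs
    frame = (np.sin(2 * np.pi * 500 * t)
             + 0.6 * np.sin(2 * np.pi * 1300 * t)) * np.hanning(2048)
    print(lpc_formants(frame, fs))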
Vowel Formant Values in Hearing and Hearing-Impaired Children: A Discriminant Analysis
ERIC Educational Resources Information Center
Ozbic, Martina; Kogovsek, Damjana
2010-01-01
Hearing-impaired speakers show changes in vowel production and formant pitch and variability, as well as more cases of overlapping between vowels and more restricted formant space, than hearing speakers; consequently their speech is less intelligible. The purposes of this paper were to determine the differences in vowel formant values between 32…
The effect of filtered speech feedback on the frequency of stuttering
NASA Astrophysics Data System (ADS)
Rami, Manish Krishnakant
2000-10-01
This study investigated the effects of filtered components of speech and whispered speech on the frequency of stuttering. It is known that choral speech, shadowing, and altered auditory feedback are the only conditions that induce fluency without any effort beyond that normally required to speak on the part of people who stutter. All these conditions use speech as a second signal. This experiment examined the role of components of the speech signal as delineated by the source-filter theory of speech production. Three filtered speech signals, a whispered speech signal, and a choral speech signal formed the stimuli. It was postulated that if the speech signal as a whole was necessary for producing fluency in people who stutter, then all conditions except choral speech should fail to produce fluency enhancement. If the glottal source alone was adequate in restoring fluency, then only the conditions of NAF and whispered speech should fail in promoting fluency. In the event that full filter characteristics are necessary for the fluency-creating effects, then all conditions except the choral speech and whispered speech should fail to produce fluency. If any part of the filter characteristics is sufficient in yielding fluency, then only the NAF and the approximate glottal source should fail to demonstrate an increase in the amount of fluency. Twelve adults who stuttered read passages while receiving auditory feedback consisting of one of the six experimental conditions: (a) NAF; (b) approximate glottal source; (c) glottal source and first formant; (d) glottal source and first two formants; (e) whispered speech; and (f) choral speech. (A low-pass filtering sketch of the filtered conditions follows this abstract.) Frequencies of stuttering were obtained for each condition and submitted to descriptive and inferential statistical analysis. Statistically significant differences in means were found among the feedback conditions. Specifically, the choral speech, the source and first formant, the source and first two formants, and the whispered speech conditions all decreased the frequency of stuttering, while the approximate glottal source did not. It is suggested that articulatory events, chiefly the encoded speech output of vocal tract origin, afford effective cues and induce fluent speech in people who stutter.
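The filtered-feedback conditions can be approximated in code by low-pass filtering the speech signal so that only the source and lower formant region remains. The cutoff frequencies below are hypothetical (the abstract does not give the study's filter specifications); this is a sketch of the general technique using scipy.

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def lowpass_speech(x, fs, cutoff_hz):
        # Zero-phase low-pass filter; a ~1 kHz cutoff crudely preserves the
        # glottal source plus F1, ~2.5 kHz the source plus F1 and F2.
        sos = butter(8, cutoff_hz, btype="low", fs=fs, output="sos")
        return sosfiltfilt(sos, x)

    # White-noise stand-in for a speech vector, 1 s at 16 kHz.
    fs = 16000
    x = np.random.default_rng(2).normal(size=fs)
    source_plus_f1 = lowpass_speech(x, fs, 1000.0)     # hypothetical cutoff
    source_plus_f1_f2 = lowpass_speech(x, fs, 2500.0)  # hypothetical cutoff
    print(source_plus_f1.shape, source_plus_f1_f2.shape)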
NASA Astrophysics Data System (ADS)
Alexander, Joshua; Kluender, Keith
2005-09-01
All speech contrasts are multiply specified. For example, in addition to onsets and trajectories of formant transitions, gross spectral properties such as tilt, and the duration of spectral change (both local and global), contribute to perception of contrasts between stops such as /b,d,g/. It is likely that listeners resort to different acoustic characteristics under different listening conditions. Hearing-impaired listeners, for whom spectral details are compromised, may be more likely to use short-term gross spectral characteristics as well as durational information. Here, contributions of broad spectral onset properties as well as duration of spectral change are investigated in perception experiments with normal-hearing listeners. Two series of CVs, each varying perceptually from /b/ to /d/, were synthesized. Onset frequency of F2, duration of formant transitions, and gross spectral tilts were manipulated parametrically. Perception of /b/ was encouraged by shorter formant transition durations and by more negative spectral tilt at onset, independent of the rate of change in spectral tilt. Effects of spectral tilt at onset were contextual and depended on the tilt of the following vowel. Parallel studies with listeners with hearing impairment are ongoing. [Work supported by NIDCD.]
Study of acoustic correlates associate with emotional speech
NASA Astrophysics Data System (ADS)
Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth
2004-10-01
This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters, such as magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies, as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from the other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/ (as in "father") than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
Goswami, Usha; Fosker, Tim; Huss, Martina; Mead, Natasha; Szucs, Dénes
2011-01-01
Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory processing of brief, rapidly successive acoustic changes is compromised in dyslexia, thereby affecting phonetic discrimination (e.g. discriminating /b/ from /d/) via impaired discrimination of formant transitions (rapid acoustic changes in frequency and intensity). However, an alternative auditory temporal hypothesis is that the basic auditory processing of the slower amplitude modulation cues in speech is compromised (Goswami et al., 2002). Here, we contrast children's perception of a synthetic speech contrast (ba/wa) when it is based on the speed of the rate of change of frequency information (formant transition duration) versus the speed of the rate of change of amplitude modulation (rise time). We show that children with dyslexia have excellent phonetic discrimination based on formant transition duration, but poor phonetic discrimination based on envelope cues. The results explain why phonetic discrimination may be allophonic in developmental dyslexia (Serniclaes et al., 2004), and suggest new avenues for the remediation of developmental dyslexia.
Representations of Spectral Differences between Vowels in Tonotopic Regions of Auditory Cortex
ERIC Educational Resources Information Center
Fisher, Julia
2017-01-01
This work examines the link between low-level cortical acoustic processing and higher-level cortical phonemic processing. Specifically, using functional magnetic resonance imaging, it looks at 1) whether or not the vowels [alpha] and [i] are distinguishable in regions of interest defined by the first two resonant frequencies (formants) of those…
Perceptual aspects of singing.
Sundberg, J
1994-06-01
The relations between acoustic and perceived characteristics of vowel sounds are demonstrated with respect to timbre, loudness, pitch, and expressive time patterns. The conditions for perceiving an ensemble of sine tones as one tone or several tones are reviewed. There are two aspects of timbre of voice sounds: vowel quality and voice quality. Whereas vowel quality depends mainly on the frequencies of the lowest two formants, voice quality is shaped chiefly by the higher formants; in particular, the center frequency of the so-called singer's formant seems perceptually relevant. Vocal loudness, generally assumed to correspond closely to the sound pressure level, depends rather on the amplitude balance between the lower and the higher spectrum partials. The perceived pitch corresponds to the fundamental frequency or, for vibrato tones, the mean of this frequency. In rapid passages, such as coloratura singing, special fundamental-frequency patterns are used. Pitch and duration differences are categorically perceived in music. This means that small variations in tuning or duration do not affect the musical interval or the note value perceived. Categorical perception is used extensively in music performance for the purpose of musical expression because, without violating the score, the singer may sharpen or flatten and lengthen or shorten the tones, thereby creating musical expression.
Flaherty, Mary; Dent, Micheal L.; Sawusch, James R.
2017-01-01
The influence of experience with human speech sounds on speech perception in budgerigars, vocal mimics whose speech exposure can be tightly controlled in a laboratory setting, was measured. Budgerigars were divided into groups that differed in auditory exposure and then tested on a cue-trading identification paradigm with synthetic speech. Phonetic cue trading is a perceptual phenomenon observed when changes on one cue dimension are offset by changes in another cue dimension while still maintaining the same phonetic percept. The current study examined whether budgerigars would trade the cues of voice onset time (VOT) and the first formant onset frequency when identifying syllable initial stop consonants and if this would be influenced by exposure to speech sounds. There were a total of four different exposure groups: No speech exposure (completely isolated), Passive speech exposure (regular exposure to human speech), and two Speech-trained groups. After the exposure period, all budgerigars were tested for phonetic cue trading using operant conditioning procedures. Birds were trained to peck keys in response to different synthetic speech sounds that began with “d” or “t” and varied in VOT and frequency of the first formant at voicing onset. Once training performance criteria were met, budgerigars were presented with the entire intermediate series, including ambiguous sounds. Responses on these trials were used to determine which speech cues were used, if a trading relation between VOT and the onset frequency of the first formant was present, and whether speech exposure had an influence on perception. Cue trading was found in all birds and these results were largely similar to those of a group of humans. Results indicated that prior speech experience was not a requirement for cue trading by budgerigars. The results are consistent with theories that explain phonetic cue trading in terms of a rich auditory encoding of the speech signal. PMID:28562597
Discrimination of synthesized English vowels by American and Korean listeners
NASA Astrophysics Data System (ADS)
Yang, Byunggon
2004-05-01
This study explored the discrimination of synthesized English vowel pairs by 27 American and Korean male and female listeners. The average formant values of nine monophthongs produced by ten American English male speakers were employed to synthesize the vowels. Subjects were then instructed explicitly to respond to AX discrimination tasks in which the standard vowel was followed by another vowel whose original formant values had been incremented or decremented. The highest and lowest formant values of the same vowel quality were collected and compared to examine patterns of vowel discrimination. Results showed that the American and Korean groups discriminated the vowel pairs almost identically, and their center formant frequency values of the high and low boundary fell almost exactly on those of the standards. In addition, the acceptable range of the same vowel quality was similar among the language and gender groups. The acceptable thresholds of each vowel formed an oval, maintaining perceptual contrast from adjacent vowels. Pedagogical implications of these findings are discussed.
An Acoustic Study of Vowels Produced by Alaryngeal Speakers in Taiwan.
Liao, Jia-Shiou
2016-11-01
This study investigated the acoustic properties of 6 Taiwan Southern Min vowels produced by 10 laryngeal speakers (LA), 10 speakers with a pneumatic artificial larynx (PA), and 8 esophageal speakers (ES). Each of the 6 monophthongs of Taiwan Southern Min (/i, e, a, ɔ, u, ə/) was represented by a Taiwan Southern Min character and appeared randomly on a list 3 times (6 Taiwan Southern Min characters × 3 repetitions = 18 tokens). Each Taiwan Southern Min character in this study has the same syllable structure, /V/, and all were read with tone 1 (high and level). Acoustic measurements of the 1st formant, 2nd formant, and 3rd formant were taken for each vowel. Then, vowel space areas (VSAs) enclosed by /i, a, u/ were calculated for each group of speakers. The Euclidean distance between vowels in the pairs /i, a/, /i, u/, and /a, u/ was also calculated and compared across the groups. PA and ES have higher 1st or 2nd formant values than LA for each vowel. The distances between vowels in the corner vowel pairs /i, a/ and /i, u/ are significantly shorter for PA and ES. PA and ES also have a significantly smaller VSA compared with LA. In accordance with previous studies, alaryngeal speakers have higher formant frequency values than LA because they have a shortened vocal tract as a result of their total laryngectomy; resonance frequencies are inversely related to the length of the vocal tract (under the assumptions of the source-filter theory). PA and ES have a smaller VSA and shorter distances between corner vowels compared with LA, which may be related to speech intelligibility. This hypothesis needs further support from future study.
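Vowel space areas and inter-vowel distances of this kind are easy to compute from corner-vowel formant means. The sketch below applies the shoelace formula to invented (F1, F2) values; it illustrates the measures, not the study's code.

    import numpy as np

    def polygon_area(points):
        # Shoelace formula; vertices are (F1, F2) pairs in Hz,
        # listed in order around the polygon.
        x, y = points[:, 0], points[:, 1]
        return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

    # Hypothetical corner-vowel means (F1, F2) in Hz for one speaker group.
    corners = np.array([[300, 2300],   # /i/
                        [800, 1300],   # /a/
                        [350, 800]])   # /u/

    print("VSA (Hz^2):", polygon_area(corners))
    print("/i/-/a/ distance (Hz):", np.linalg.norm(corners[0] - corners[1]))
    print("/i/-/u/ distance (Hz):", np.linalg.norm(corners[0] - corners[2]))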
Acoustic Properties of the Voice Source and the Vocal Tract: Are They Perceptually Independent?
Erickson, Molly L
2016-11-01
This study sought to determine whether the properties of the voice source and vocal tract are perceptually independent. Within-subjects design. This study employed a paired-comparison paradigm in which listeners heard synthetic voices and rated them as same or different using a visual analog scale. Stimuli were synthesized using three different source slopes and two different formant patterns (mezzo-soprano and soprano) on the vowel /a/ at four pitches: A3, C4, B4, and F5. Although formant pattern was the strongest effect, differences in source slope also affected perceived quality. Source slope and formant pattern were not independently perceived. These results suggest that when judging laryngeal adduction from perceptual information, judgments may not be accurate when the stimuli have differing formant patterns.
Digitized Speech Characteristics in Patients with Maxillectomy Defects.
Elbashti, Mahmoud E; Sumita, Yuka I; Hattori, Mariko; Aswehlee, Amel M; Taniguchi, Hisashi
2017-12-06
Accurate evaluation of speech characteristics through formant frequency measurement is important for proper speech rehabilitation in patients after maxillectomy. This study aimed to evaluate the utility of digital acoustic analysis and vowel pentagon space for the prediction of speech ability after maxillectomy, by comparing the acoustic characteristics of vowel articulation in three classes of maxillectomy defects. Aramany's classifications I, II, and IV were used to group 27 male patients after maxillectomy. Digital acoustic analysis of five Japanese vowels (/a/, /e/, /i/, /o/, and /u/) was performed using a speech analysis system. First formant (F1) and second formant (F2) frequencies were calculated using an autocorrelation method. Data were plotted on an F1-F2 plane for each patient, and the F1 and F2 ranges were calculated. The vowel pentagon spaces were also determined. One-way ANOVA was applied to compare all results across the three groups. Class II maxillectomy patients had a significantly higher F2 range than did Class I and Class IV patients (p = 0.002). In contrast, there was no significant difference in the F1 range between the three classes. The vowel pentagon spaces were significantly larger in Class II maxillectomy patients than in Class I and Class IV patients (p = 0.014). The results of this study indicate that the acoustic characteristics of maxillectomy patients are affected by the defect area. This finding may provide information for obturator design based on vowel articulation and defect class.
Benders, Titia
2013-12-01
Exaggeration of the vowel space in infant-directed speech (IDS) is well documented for English, but not consistently replicated in other languages or for other speech-sound contrasts. A second attested, but less discussed, pattern of change in IDS is an overall rise of the formant frequencies, which may reflect an affective speaking style. The present study investigates longitudinally how Dutch mothers change their corner vowels, voiceless fricatives, and pitch when speaking to their infant at 11 and 15 months of age. In comparison to adult-directed speech (ADS), Dutch IDS has a smaller vowel space, higher second and third formant frequencies in the vowels, and a higher spectral frequency in the fricatives. The formants of the vowels and spectral frequency of the fricatives are raised more strongly for infants at 11 than at 15 months, while the pitch is more extreme in IDS to 15-month olds. These results show that enhanced positive affect is the main factor influencing Dutch mothers' realisation of speech sounds in IDS, especially to younger infants. This study provides evidence that mothers' expression of emotion in IDS can influence the realisation of speech sounds, and that the loss or gain of speech clarity may be secondary effects of affect.
Vampola, Tomáš; Horáček, Jaromír; Laukkanen, Anne-Maria; Švec, Jan G
2015-04-01
Resonance frequencies of the vocal tract have traditionally been modelled using one-dimensional models. These cannot accurately represent the events in the frequency region of the formant cluster around 2.5-4.5 kHz, however. Here, the vocal tract resonance frequencies and their mode shapes are studied using a three-dimensional finite element model obtained from computed tomography measurements of a subject phonating on vowel [a:]. Instead of the traditional five, up to eight resonance frequencies of the vocal tract were found below the prominent antiresonance around 4.7 kHz. The three extra resonances were found to correspond to modes which were axially asymmetric and involved the piriform sinuses, valleculae, and transverse vibrations in the oral cavity. The results therefore suggest that the phenomenon of speaker's and singer's formant clustering may be more complex than originally thought.
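For contrast with the three-dimensional model used here, the resonances predicted by the simplest one-dimensional idealization (a uniform tube closed at the glottis, open at the lips) follow the quarter-wave formula F_n = (2n - 1)c / 4L. The sketch below uses nominal textbook values; the axially asymmetric modes reported above are precisely what such a model cannot capture.

    # Quarter-wave resonances of a uniform closed-open tube.
    c = 350.0   # speed of sound in warm, humid air (m/s), nominal
    L = 0.175   # vocal tract length (m), nominal adult value

    for n in range(1, 6):
        print(f"F{n} = {(2 * n - 1) * c / (4 * L):.0f} Hz")
    # -> 500, 1500, 2500, 3500, 4500 Hz: the classic neutral-vowel pattern.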
Production and perception of whispered vowels
NASA Astrophysics Data System (ADS)
Kiefte, Michael
2005-09-01
Information normally associated with pitch, such as intonation, can still be conveyed in whispered speech despite the absence of voicing. For example, it is possible to whisper the question "You are going today?" without any syntactic information to distinguish this sentence from a simple declarative. It has been shown that pitch change in whispered speech is correlated with the simultaneous raising or lowering of several formants [e.g., M. Kiefte, J. Acoust. Soc. Am. 116, 2546 (2004)]. However, spectral peak frequencies associated with formants have been identified as important correlates of vowel identity. Spectral peak frequencies may therefore serve two roles in the perception of whispered speech: to indicate both vowel identity and intended pitch. Data will be presented to examine the relative importance of several acoustic properties, including spectral peak frequencies and spectral shape parameters, in both the production and perception of whispered vowels. Speakers were asked to phonate and whisper vowels at three different pitches across a range of roughly a musical fifth. It will be shown that relative spectral change is preserved within vowels across intended pitches in whispered speech. In addition, several models of vowel identification by listeners will be presented. [Work supported by SSHRC.]
Catalan speakers' perception of word stress in unaccented contexts.
Ortega-Llebaria, Marta; del Mar Vanrell, Maria; Prieto, Pilar
2010-01-01
In unaccented contexts, formant frequency differences related to vowel reduction constitute a consistent cue to word stress in English, whereas in languages such as Spanish that have no systematic vowel reduction, stress perception is based on duration and intensity cues. This article examines the perception of word stress by speakers of Central Catalan, in which, due to its vowel reduction patterns, words either alternate stressed open vowels with unstressed mid-central vowels as in English or contain no vowel quality cues to stress, as in Spanish. Results show that Catalan listeners perceive stress based mainly on duration cues in both word types. Other cues pattern together with duration to make stress perception more robust. However, no single cue is absolutely necessary and trading effects compensate for a lack of differentiation in one dimension by changes in another dimension. In particular, speakers identify longer mid-central vowels as more stressed than shorter open vowels. These results and those obtained in other stress-accent languages provide cumulative evidence that word stress is perceived independently of pitch accents by relying on a set of cues with trading effects so that no single cue, including formant frequency differences related to vowel reduction, is absolutely necessary for stress perception.
Neural Representation of Concurrent Vowels in Macaque Primary Auditory Cortex
Micheyl, Christophe; Steinschneider, Mitchell
2016-01-01
Successful speech perception in real-world environments requires that the auditory system segregate competing voices that overlap in frequency and time into separate streams. Vowels are major constituents of speech and are comprised of frequencies (harmonics) that are integer multiples of a common fundamental frequency (F0). The pitch and identity of a vowel are determined by its F0 and spectral envelope (formant structure), respectively. When two spectrally overlapping vowels differing in F0 are presented concurrently, they can be readily perceived as two separate “auditory objects” with pitches at their respective F0s. A difference in pitch between two simultaneous vowels provides a powerful cue for their segregation, which in turn, facilitates their individual identification. The neural mechanisms underlying the segregation of concurrent vowels based on pitch differences are poorly understood. Here, we examine neural population responses in macaque primary auditory cortex (A1) to single and double concurrent vowels (/a/ and /i/) that differ in F0 such that they are heard as two separate auditory objects with distinct pitches. We find that neural population responses in A1 can resolve, via a rate-place code, lower harmonics of both single and double concurrent vowels. Furthermore, we show that the formant structures, and hence the identities, of single vowels can be reliably recovered from the neural representation of double concurrent vowels. We conclude that A1 contains sufficient spectral information to enable concurrent vowel segregation and identification by downstream cortical areas. PMID:27294198
Acoustic Characteristics in Epiglottic Cyst.
Lee, YeonWoo; Kim, GeunHyo; Wang, SooGeun; Jang, JeonYeob; Cha, Wonjae; Choi, HongSik; Kim, HyangHee
2018-05-03
The purpose of this study was to analyze the acoustic characteristics associated with deformation of the vocal tract caused by a large epiglottic cyst, and to confirm the relation between this anatomical change and the resonant function of the vocal tract. Eight men with epiglottic cysts were enrolled in this study. The jitter, shimmer, noise-to-harmonic ratio, and first two formants were analyzed in the vowels /a:/, /e:/, /i:/, /o:/, and /u:/. These values were analyzed before and after laryngeal microsurgery. The F1 value of /a:/ was significantly raised after surgery. No significant differences were found in the formant frequencies of the other vowels or in jitter, shimmer, or noise-to-harmonic ratio. The results of this study could be used to analyze changes in the resonance of the vocal tract due to epiglottic cysts.
Cross-linguistic studies of children’s and adults’ vowel spaces
Chung, Hyunju; Kong, Eun Jong; Edwards, Jan; Weismer, Gary; Fourakis, Marios; Hwang, Youngdeok
2012-01-01
This study examines cross-linguistic variation in the location of shared vowels in the vowel space across five languages (Cantonese, American English, Greek, Japanese, and Korean) and three age groups (2-year-olds, 5-year-olds, and adults). The vowels /a/, /i/, and /u/ were elicited in familiar words using a word repetition task. The productions of target words were recorded and transcribed by native speakers of each language. For correctly produced vowels, first and second formant frequencies were measured. In order to remove the effect of vocal tract size on these measurements, a normalization approach that calculates distance and angular displacement from the speaker centroid was adopted. Language-specific differences in the location of shared vowels in the vowel space, as well as in the shape of the vowel spaces, were observed for both adults and children. PMID:22280606
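The centroid-based normalization described here can be sketched compactly: each token is re-expressed as a distance and angle from the speaker's own centroid, removing vocal-tract-size offsets while preserving the shape of the vowel space. The token values below are invented, and the function is an illustration rather than the authors' procedure.

    import numpy as np

    def centroid_polar(f1, f2):
        # Distance and angular displacement of each (F1, F2) token
        # from the speaker's own centroid.
        dx = f1 - f1.mean()
        dy = f2 - f2.mean()
        return np.hypot(dx, dy), np.arctan2(dy, dx)

    # Hypothetical child tokens for /i/, /a/, /u/ (F1, F2 in Hz).
    f1 = np.array([380.0, 900.0, 420.0])
    f2 = np.array([2800.0, 1600.0, 1100.0])
    distance, angle = centroid_polar(f1, f2)
    print(distance, np.degrees(angle))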
Laroche, Marilyn; Dajani, Hilmi R; Prévost, François; Marcoux, André M
2013-01-01
This study investigated speech auditory brainstem responses (speech ABR) with variants of a synthetic vowel in quiet and in background noise. Its objectives were to study the noise robustness of the brainstem response at the fundamental frequency F0 and at the first formant F1, evaluate how the resolved/unresolved harmonics regions in speech contribute to the response at F0, and investigate the origin of the response at F0 to resolved and unresolved harmonics in speech. In total, 18 normal-hearing subjects (11 women, aged 18-33 years) participated in this study. Speech ABRs were recorded using variants of a 300 msec formant-synthesized /a/ vowel in quiet and in white noise. The first experiment employed three variants containing the first three formants F1 to F3, F1 only, and F2 and F3 only with relative formant levels following those reported in the literature. The second experiment employed three variants containing F1 only, F2 only, and F3 only, with the formants equalized to the same level and the signal-to-noise ratio (SNR) maintained at -5 dB. Overall response latency was estimated, and the amplitude and local SNR of the envelope following response at F0 and of the frequency following response at F1 were compared for the different stimulus variants in quiet and in noise. The response at F0 was more robust to noise than that at F1. There were no statistically significant differences in the response at F0 caused by the three stimulus variants in both experiments in quiet. However, the response at F0 with the variant dominated by resolved harmonics was more robust to noise than the response at F0 with the stimulus variants dominated by unresolved harmonics. The latencies of the responses in all cases were very similar in quiet, but the responses at F0 due to resolved and unresolved harmonics combined nonlinearly when both were present in the stimulus. Speech ABR has been suggested as a marker of central auditory processing. The results of this study support earlier work on the differential susceptibility to noise of the F0 and F1 components of the evoked response. In the case of F0, the results support the view that in speech, the pitch of resolved harmonics and that of unresolved harmonics are processed in different but interacting pathways that converge in the upper brainstem. Pitch plays an important role in speech perception, and speech ABR can offer a window into the neural extraction of the pitch of speech and how it may change with hearing impairment.
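The local-SNR measure used for such responses is commonly computed by comparing the FFT amplitude at the target frequency with the mean amplitude of neighbouring bins. The sketch below does this on a synthetic signal; the exclusion and bandwidth parameters are assumptions for illustration, not the study's settings.

    import numpy as np

    def component_snr(freq, signal, fs, exclude=10.0, band=100.0):
        # Amplitude at one spectral component and its local SNR (dB),
        # with the noise floor taken from bins `exclude`-to-`band` Hz away.
        n = len(signal)
        spec = np.abs(np.fft.rfft(signal * np.hanning(n))) / n
        freqs = np.fft.rfftfreq(n, d=1.0 / fs)
        peak = spec[np.argmin(np.abs(freqs - freq))]
        noise = spec[(np.abs(freqs - freq) > exclude)
                     & (np.abs(freqs - freq) < band)].mean()
        return peak, 20 * np.log10(peak / noise)

    # Synthetic "response": a 100 Hz component buried in noise, 1 s at 2 kHz.
    fs, f0 = 2000, 100
    t = np.arange(fs) / fs
    x = 0.5 * np.sin(2 * np.pi * f0 * t) \
        + np.random.default_rng(1).normal(0.0, 1.0, fs)
    print(component_snr(f0, x, fs))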
2015-01-01
Vowels provide the acoustic foundation of communication through speech and song, but little is known about how the brain orchestrates their production. Positron emission tomography was used to study regional cerebral blood flow (rCBF) during sustained production of the vowel /a/. Acoustic and blood flow data from 13 normal, right-handed, native speakers of American English were analyzed to identify CBF patterns that predicted the stability of the first and second formants of this vowel. Formants are bands of resonance frequencies that provide vowel identity and contribute to voice quality. The results indicated that formant stability was directly associated with blood flow increases and decreases in both left- and right-sided brain regions. Secondary brain regions (those associated with the regions predicting formant stability) were more likely to have an indirect negative relationship with first formant variability, but an indirect positive relationship with second formant variability. These results are not definitive maps of vowel production, but they do suggest that the level of motor control necessary to produce stable vowels is reflected in the complexity of an underlying neural system. These results also extend a systems approach to functional image analysis, previously applied to normal and ataxic speech rate, that is solely based on identifying patterns of brain activity associated with specific performance measures. Understanding the complex relationships between multiple brain regions and the acoustic characteristics of vocal stability may provide insight into the pathophysiology of the dysarthrias, vocal disorders, and other speech changes in neurological and psychiatric disorders. PMID:25295385
Story, Brad H.
2008-01-01
A new set of area functions for vowels has been obtained with Magnetic Resonance Imaging (MRI) from the same speaker as that previously reported in 1996 [Story, Titze, & Hoffman, JASA, 100, 537–554 (1996)]. The new area functions were derived from image data collected in 2002, whereas the previously reported area functions were based on MR images obtained in 1994. When compared, the new area function sets indicated a tendency toward a constricted pharyngeal region and expanded oral cavity relative to the previous set. Based on calculated formant frequencies and sensitivity functions, these morphological differences were shown to have the primary acoustic effect of systematically shifting the second formant (F2) downward in frequency. Multiple instances of target vocal tract shapes from a specific speaker provide additional sampling of the possible area functions that may be produced during speech production. This may be of benefit for understanding intra-speaker variability in vowel production and for further development of speech synthesizers and speech models that utilize area function information. PMID:18177162
Vocal tract length and acoustics of vocalization in the domestic dog (Canis familiaris).
Riede, T; Fitch, T
1999-10-01
The physical nature of the vocal tract results in the production of formants during vocalisation. In some animals (including humans), receivers can derive information (such as body size) about sender characteristics on the basis of formant characteristics. Domestication and selective breeding have resulted in high variability in head size and shape in the dog (Canis familiaris), suggesting that there might be large differences in vocal tract length; such differences in formant behaviour could affect interbreed communication. Lateral radiographs were made of dogs from several breeds ranging in size from a Yorkshire terrier (2.5 kg) to a German shepherd (50 kg) and were used to measure vocal tract length. In addition, we recorded an acoustic signal (growling) from some dogs. Significant correlations were found between vocal tract length, body mass, and formant dispersion, suggesting that formant dispersion can deliver information about the body size of the vocalizer. Because of the low correlation between vocal tract length and the first formant, we predict a non-uniform vocal tract shape.
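Formant dispersion is the mean spacing between adjacent formants, and under a uniform-tube assumption it yields a vocal tract length estimate via L = c / (2 Df). The sketch below uses invented growl formants; it shows the standard calculation, not the study's measurements.

    import numpy as np

    def formant_dispersion(formants_hz, c=350.0):
        # Mean spacing of adjacent formants; for a uniform closed-open tube
        # the expected spacing is c / (2L), so L is estimated as c / (2 * Df).
        f = np.sort(np.asarray(formants_hz, dtype=float))
        df = np.diff(f).mean()
        return df, c / (2.0 * df)

    # Hypothetical growl formants (Hz) for a mid-sized dog.
    df, vtl = formant_dispersion([600, 1700, 2800, 3900])
    print(f"dispersion {df:.0f} Hz, estimated vocal tract {vtl * 100:.1f} cm")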
An Investigation of the Laryngeal System as the Resonance Source of the Singer's Formant.
NASA Astrophysics Data System (ADS)
Detweiler, Rebecca Finley
Since its introduction in 1974, Johan Sundberg's model of the laryngeal system as the resonance source of the singer's formant (Fs) has gained wide acceptance. There have heretofore been no studies directly testing its validity in vivo. The purpose of this study was to undertake a direct test of that hypothesis, utilizing as subjects professional male singers trained in the western Classical tradition. The vocal behaviors of three trained singer-subjects were evaluated during modal and pulse register phonation via magnetic resonance imaging (M.R.I.), strobolaryngoscopy, and acoustical analysis. Dr. Sundberg's hypothesis rests upon two premises: (1) that the laryngeal system is acoustically isolated and therefore capable of independent resonation during artistic singing, and (2) that the laryngeal ventricle contains an air volume adequate to function as the volume element of the proposed two-tube resonating system (Sundberg, 1974). Results of the above analyses revealed that none of the subjects achieved the requisite 6:1 laryngopharynx:laryngeal outlet area ratio to support acoustic isolation and independent resonation of the laryngeal system. Further, subjects demonstrated robust and stable singer's formants in pulse register phonation concomitant with the occlusion of the laryngeal ventricular spaces as documented by M.R.I. Therefore, these data indicated that the subjects' behaviors do not fit the model of the laryngeal system as the resonance source of the singer's formant, and that the model is inadequate to account for the generation of the singer's formant in these three subjects. Further analysis of these data suggested that the singer's formant is resolvable into two component formants, termed Fs1 and Fs2. These formants are apparently analogous to F4 and F5 of speech, but are approximated by the singer to produce the desired high-amplitude energy concentration. It was hypothesized that Fs1 arises from excitation of the fourth natural mode of the quarter-wave resonance of the vocal tract by the optimized voice source of the trained singer. Application of this model to data obtained in this and previous studies reported in the literature predicted the frequency locus of Fs1 with an accuracy of 92-100%.
Titze, Ingo R.; Palaparthi, Anil; Smith, Simeon L.
2014-01-01
Time-domain computer simulation of sound production in airways is a widely used tool, both for research and synthetic speech production technology. Speed of computation is generally the rationale for one-dimensional approaches to sound propagation and radiation. Transmission line and wave-reflection (scattering) algorithms are used to produce formant frequencies and bandwidths for arbitrarily shaped airways. Some benchmark graphs and tables are provided for formant frequencies and bandwidth calculations based on specific mathematical terms in the one-dimensional Navier–Stokes equation. Some rules are provided here for temporal and spatial discretization in terms of desired accuracy and stability of the solution. Kinetic losses, which have been difficult to quantify in frequency-domain simulations, are quantified here on the basis of the measurements of Scherer, Torkaman, Kucinschi, and Afjeh [(2010). J. Acoust. Soc. Am. 128(2), 828–838]. PMID:25480071
Fels, S S; Hinton, G E
1997-01-01
Glove-Talk II is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-Talk II uses several input devices, a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. With Glove-Talk II, the subject can speak slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
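The gating arrangement described here is a mixture-of-experts combination: a gating network weights the vowel and consonant networks' outputs. The numpy sketch below shows only the combination step, with random linear maps standing in for the trained networks; it is not the Glove-Talk II implementation.

    import numpy as np

    def synth_params(hand, gate, vowel_net, consonant_net):
        # Gated blend of the two expert networks' synthesizer parameters.
        g = gate(hand)                                  # scalar in [0, 1]
        return g * vowel_net(hand) + (1 - g) * consonant_net(hand)

    rng = np.random.default_rng(3)
    Wv = rng.normal(size=(10, 5))   # stand-in vowel network (10 control params)
    Wc = rng.normal(size=(10, 5))   # stand-in consonant network
    wg = rng.normal(size=5)         # stand-in gating weights

    vowel = lambda h: Wv @ h
    consonant = lambda h: Wc @ h
    gate = lambda h: 1.0 / (1.0 + np.exp(-wg @ h))  # sigmoid gate

    print(synth_params(rng.normal(size=5), gate, vowel, consonant))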
Zourmand, Alireza; Mirhassani, Seyed Mostafa; Ting, Hua-Nong; Bux, Shaik Ismail; Ng, Kwan Hoong; Bilgen, Mehmet; Jalaludin, Mohd Amin
2014-07-25
The phonetic properties of six Malay vowels are investigated using magnetic resonance imaging (MRI) to visualize the vocal tract in order to obtain dynamic articulatory parameters during speech production. To resolve image blurring due to tongue movement during the scanning process, a method based on active contour extraction is used to track tongue contours. The proposed method efficiently tracks tongue contours despite the partial blurring of MRI images. Consequently, the articulatory parameters are effectively measured as tongue movement is observed, and the specific shape and position of the tongue are determined for all six uttered Malay vowels. Speech rehabilitation procedures demand some kind of visually perceivable prototype of speech articulation. To investigate the validity of the measured articulatory parameters against the acoustic theory of speech production, an acoustic analysis of the vowels uttered by the subjects was performed. As the acoustic and articulatory parameters of the uttered speech were examined, a correlation between formant frequencies and articulatory parameters was observed. The experiments reported a positive correlation between the constriction location of the tongue body and the first formant frequency, as well as a negative correlation between the constriction location of the tongue tip and the second formant frequency. The results demonstrate that the proposed method is an effective tool for the dynamic study of speech production.
The impact of perilaryngeal vibration on the self-perception of loudness and the Lombard effect.
Brajot, François-Xavier; Nguyen, Don; DiGiovanni, Jeffrey; Gracco, Vincent L
2018-04-05
The role of somatosensory feedback in speech and the perception of loudness was assessed in adults without speech or hearing disorders. Participants completed two tasks: loudness magnitude estimation of a short vowel and oral reading of a standard passage. Both tasks were carried out in each of three conditions: no-masking, auditory masking alone, and mixed auditory masking plus vibration of the perilaryngeal area. A Lombard effect was elicited in both masking conditions: speakers unconsciously increased vocal intensity. Perilaryngeal vibration further increased vocal intensity above what was observed for auditory masking alone. Both masking conditions affected fundamental frequency and the first formant frequency as well, but only vibration was associated with a significant change in the second formant frequency. An additional analysis of pure-tone thresholds found no difference in auditory thresholds between masking conditions. Taken together, these findings indicate that perilaryngeal vibration effectively masked somatosensory feedback, resulting in an enhanced Lombard effect (increased vocal intensity) that did not alter speakers' self-perception of loudness. This implies that the Lombard effect results from a general sensorimotor process, rather than from a specific audio-vocal mechanism, and that the conscious self-monitoring of speech intensity is not directly based on either auditory or somatosensory feedback.
"Ring" in the solo child singing voice.
Howard, David M; Williams, Jenevora; Herbst, Christian T
2014-03-01
Listeners often describe the voices of solo child singers as being "pure" or "clear"; these terms would suggest that the voice is not only pleasant but also clearly audible. The audibility or clarity could be attributed to the presence of high-frequency partials in the sound: a "brightness" or "ring." This article aims to investigate spectrally the acoustic nature of this ring phenomenon in children's solo voices, and in particular, relating it to their "nonring" production. Additionally, this is set in the context of establishing to what extent, if any, the spectral characteristics of ring are shared with those of the singer's formant cluster associated with professional adult opera singers in the 2.5-3.5kHz region. A group of child solo singers, acknowledged as outstanding by a singing teacher who specializes in teaching professional child singers, were recorded in a major UK concert hall performing Come unto him, all ye that labour, from the aria He shall feed his flock from The Messiah by GF Handel. Their singing was accompanied by a recording of a piano played through in-ear headphones. Sound pressure recordings were made from well within the critical distance in the hall. The singers were observed to produce notes with and without ring, and these recordings were analyzed in the frequency domain to investigate their spectra. The results indicate that there is evidence to suggest that ring in child solo singers is carried in two areas of the output spectrum: first in the singer's formant cluster region, centered around 4kHz, which is more than 1000Hz higher than what is observed in adults; and second in the region around 7.5-11kHz where a significant strengthening of harmonic presence is observed. A perceptual test has been carried out demonstrating that 94% of 62 listeners label a synthesized version of the calculated overall average ring spectrum for all subjects as having ring when compared with a synthesized version of the calculated overall average nonring spectrum. The notion of ring in the child solo voice manifests itself not only with spectral features in common with the projection peak found in adult singers but also in a higher frequency region. It is suggested that the formant cluster at around 4kHz is the children's equivalent of the singers' formant cluster; the frequency is higher than in the adult, most likely due to the smaller dimensions of the epilaryngeal tube. The frequency cluster observed as a strong peak at about 7.5-11kHz, when added to the children's singers' formant cluster, may be the key to cueing the notion of ring in the child solo voice. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Kluender, K R; Lotto, A J
1994-02-01
When F1-onset frequency is lower, longer F1 cut-back (VOT) is required for human listeners to perceive synthesized stop consonants as voiceless. K. R. Kluender [J. Acoust. Soc. Am. 90, 83-96 (1991)] found comparable effects of F1-onset frequency on the "labeling" of stop consonants by Japanese quail (Coturnix coturnix japonica) trained to distinguish stop consonants varying in F1 cut-back. In that study, CVs were synthesized with natural-like rising F1 transitions, and endpoint training stimuli differed in the onset frequency of F1 because a longer cut-back resulted in a higher F1 onset. In order to assess whether earlier results were due to auditory predispositions or due to animals having learned the natural covariance between F1 cut-back and F1-onset frequency, the present experiment was conducted with synthetic continua having either a relatively low (375 Hz) or high (750 Hz) constant-frequency F1. Six birds were trained to respond differentially to endpoint stimuli from three series of synthesized /CV/s varying in duration of F1 cut-back. Second and third formant transitions were appropriate for labial, alveolar, or velar stops. Despite the fact that there was no opportunity for animal subjects to use experienced covariation of F1-onset frequency and F1 cut-back, quail typically exhibited shorter labeling boundaries (more voiceless stops) for intermediate stimuli of the continua when F1 frequency was higher. Responses by human subjects listening to the same stimuli were also collected. Results lend support to the earlier conclusion that part or all of the effect of F1 onset frequency on perception of voicing may be adequately explained by general auditory processes. (ABSTRACT TRUNCATED AT 250 WORDS)
Developmental weighting shifts for noise components of fricative-vowel syllables.
Nittrouer, S; Miller, M E
1997-07-01
Previous studies have convincingly shown that the weight assigned to vocalic formant transitions in decisions of fricative identity for fricative-vowel syllables decreases with development. Although these same studies suggested a developmental increase in the weight assigned to the noise spectrum, the role of the aperiodic-noise portions of the signals in these fricative decisions has not been as well studied. The purpose of these experiments was to examine more closely developmental shifts in the weight assigned to the aperiodic-noise components of the signals in decisions of syllable-initial fricative identity. Two experiments used noises varying along continua from a clear /s/ percept to a clear /ʃ/ percept. In experiment 1, these noises were created by combining /s/ and /ʃ/ noises produced by a human vocal tract at different amplitude ratios, a process that resulted in stimuli differing primarily in the amplitude of a relatively low-frequency (roughly 2.2-kHz) peak. In experiment 2, noises that varied only in the amplitude of a similar low-frequency peak were created with a software synthesizer. Both experiments used synthetic /a/ and /u/ portions, and efforts were made to minimize possible contributions of vocalic formant transitions to fricative labeling. Children and adults labeled the resulting stimuli as /s/-vowel or /ʃ/-vowel. Combined results of the two experiments showed that children's responses were less influenced than those of adults by the amplitude of the low-frequency peak of fricative noises.
Age-related changes in the anticipatory coarticulation in the speech of young children
NASA Astrophysics Data System (ADS)
Parson, Mathew; Lloyd, Amanda; Stoddard, Kelly; Nissen, Shawn L.
2003-10-01
This paper investigates the possible patterns of anticipatory coarticulation in the speech of young children. Speech samples were elicited from three groups of children between 3 and 6 years of age and one comparison group of adults. The utterances were recorded online in a quiet room environment using high quality microphones and direct analog-to-digital conversion to computer disk. Formant frequency measures (F1, F2, and F3) were extracted from a centralized and unstressed vowel (schwa) spoken prior to two different sets of productions. The first set of productions consisted of the target vowel followed by a series of real words containing an initial CV(C) syllable (voiceless obstruent-monophthongal vowel) in a range of phonetic contexts, while the second set consisted of a series of nonword productions with a relatively constrained phonetic context. An analysis of variance was utilized to determine if the formant frequencies varied systematically as a function of age, gender, and phonetic context. Results will also be discussed in association with spectral moment measures extracted from the obstruent segment immediately following the target vowel. [Work supported by research funding from Brigham Young University.]
The Effects of Emotion on Second Formant Frequency Fluctuations in Adults Who Stutter.
Bauerly, Kim R
2018-06-05
Changes in second formant frequency fluctuations (FFF2) were examined in adults who stutter (AWS) and adults who do not stutter (ANS) when producing nonwords under varying emotional conditions. Ten AWS and 10 ANS viewed images selected from the International Affective Picture System representing dimensions of arousal (e.g., excited versus bored) and hedonic valence (e.g., happy versus sad). Immediately following picture presentation, participants produced a consonant-vowel + final /t/ (CVt) nonword consisting of the initial sounds /p/, /b/, /s/, or /z/, followed by a vowel (/i/, /u/, /ε/) and a final /t/. CVt tokens were assessed for word duration and FFF2. Significantly slower word durations were shown in the AWS compared to the ANS across conditions. Although these differences appeared to increase under arousing conditions, no interaction was found. Results for FFF2 revealed a significant group-condition interaction. Post hoc analysis indicated that this was due to the AWS showing significantly greater FFF2 when speaking under conditions eliciting increases in arousal and unpleasantness. ANS showed little change in FFF2 across conditions. The results suggest that AWS' articulatory stability is more susceptible to breakdown under negative emotional influences. © 2018 S. Karger AG, Basel.
Fels, S S; Hinton, G E
1998-01-01
Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to ten control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a Cyberglove, a ContactGlove, a three-space tracker, and a foot pedal), a parallel formant speech synthesizer, and three neural networks. The gesture-to-speech task is divided into vowel and consonant production by using a gating network to weight the outputs of a vowel and a consonant neural network. The gating network and the consonant network are trained with examples from the user. The vowel network implements a fixed user-defined relationship between hand position and vowel sound and does not require any training examples from the user. Volume, fundamental frequency, and stop consonants are produced with a fixed mapping from the input devices. One subject has trained to speak intelligibly with Glove-TalkII. He speaks slowly but with far more natural sounding pitch variations than a text-to-speech synthesizer.
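The gating scheme described above can be sketched compactly. The Python fragment below is a minimal illustration, not the Glove-TalkII implementation: the network shapes, the 16-dimensional input frame, and the random weights are hypothetical stand-ins (the real vowel, consonant, and gating networks were trained on user data and drive ten formant-synthesizer controls).

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # One-hidden-layer network standing in for each Glove-TalkII component net.
    return np.tanh(x @ W1 + b1) @ W2 + b2

def glove_talk_step(x, vowel_p, consonant_p, gate_p):
    # Blend vowel and consonant network outputs with a sigmoid gate.
    g = 1.0 / (1.0 + np.exp(-mlp(x, *gate_p)))   # P(vowel | hand data)
    return g * mlp(x, *vowel_p) + (1.0 - g) * mlp(x, *consonant_p)

rng = np.random.default_rng(0)
def rand_params(n_in, n_hid, n_out):             # hypothetical untrained weights
    return (rng.normal(size=(n_in, n_hid)), np.zeros(n_hid),
            rng.normal(size=(n_hid, n_out)), np.zeros(n_out))

x = rng.normal(size=16)                          # one frame of glove/tracker input
controls = glove_talk_step(x, rand_params(16, 8, 10),
                           rand_params(16, 8, 10), rand_params(16, 8, 1))
print(controls.shape)                            # ten synthesizer control values
```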
Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants.
Vorperian, Houri K; Kurtzweil, Sara L; Fourakis, Marios; Kent, Ray D; Tillman, Katelyn K; Austin, Diane
2015-08-01
The anatomic basis and articulatory features of speech production are often studied with imaging studies that are typically acquired in the supine body position. It is important to determine if changes in body orientation to the gravitational field alter vocal tract dimensions and speech acoustics. The purpose of this study was to assess the effect of body position (upright versus supine) on (1) oral and pharyngeal measurements derived from acoustic pharyngometry and (2) acoustic measurements of fundamental frequency (F0) and the first four formant frequencies (F1-F4) for the quadrilateral point vowels. Data were obtained for 27 male and female participants, aged 17 to 35 yrs. Acoustic pharyngometry showed a statistically significant effect of body position on volumetric measurements, with smaller values in the supine than upright position, but no changes in length measurements. Acoustic analyses of vowels showed significantly larger values in the supine than upright position for the variables of F0, F3, and the Euclidean distance from the centroid to each corner vowel in the F1-F2-F3 space. Changes in body position affected measurements of vocal tract volume but not length. Body position also affected the aforementioned acoustic variables, but the main vowel formants were preserved.
The effect of obturator bulb height on speech in maxillectomy patients.
Kwon, H B; Chang, S W; Lee, S H
2011-03-01
The purpose of this study was to compare the speech function of low height bulb obturators with that of high height bulb obturators. Thirteen maxillectomy patients, who underwent post-operative prosthodontic rehabilitations, were included. Two obturators of the same design except for different bulb heights were fabricated for each maxillectomy patient. One of the two obturators had a high bulb design and the other a low bulb design. After one of the obturators was used for a period of 3 weeks, the patient's speaking functions were evaluated by measuring nasalance scores, formant frequencies, and vowel working space areas. The same procedures were repeated with the second obturator following another 3-week period of usage. In addition, the effect of delivery sequence and anatomic conditions related to maxillectomy were analysed. The results demonstrated that the nasalance scores with the low bulb obturators were significantly higher than those with the high bulb obturators. There were no significant differences in formant frequencies based on the bulb height of the obturators. The vowel working spaces for the two obturators were similar in shape, and there were no significant differences between the vowel working space areas created by the two obturators. The delivery sequence affected the results. However, there were no significant differences related to the other anatomical variables. Although low bulb obturators might function similarly to high bulb obturators in terms of the articulation of speech, they may have difficulty controlling hypernasality in maxillectomy patients. © 2010 Blackwell Publishing Ltd.
Two-dimensional model of vocal fold vibration for sound synthesis of voice and soprano singing
NASA Astrophysics Data System (ADS)
Adachi, Seiji; Yu, Jason
2005-05-01
Voiced sounds were simulated with a computer model of the vocal fold composed of a single mass vibrating both parallel and perpendicular to the airflow. Similarities with the two-mass model are found in the amplitudes of the glottal area and the glottal volume flow velocity, the variation in the volume flow waveform with the vocal tract shape, and the dependence of the oscillation amplitude upon the average opening area of the glottis, among other similar features. A few dissimilarities are also found in the more symmetric glottal and volume flow waveforms in the rising and falling phases. The major improvement of the present model over the two-mass model is that it yields a smooth transition between oscillations with an inductive load and a capacitive load of the vocal tract, with no sudden jumps in the vibration frequency. Self-excitation is possible both below and above the first formant frequency of the vocal tract. By taking advantage of the wider continuous frequency range, the two-dimensional model can successfully be applied to the sound synthesis of high-pitched soprano singing, where the fundamental frequency sometimes exceeds the first formant frequency.
A comparative analysis of whispered and normally phonated speech using an LPC-10 vocoder
NASA Astrophysics Data System (ADS)
Wilson, J. B.; Mosko, J. D.
1985-12-01
This study focused on determining the performance of an LPC-10 vocoder in processing adult male and female whispered and normally phonated connected speech. The LPC-10 vocoder's analysis of whispered speech compared quite favorably with similar studies that used sound spectrographic processing techniques. Shifting from phonated to whispered speech caused a substantial increase in the phonemic formant frequencies and formant bandwidths for both male and female speakers. The data from this study showed no evidence that the LPC-10 vocoder's ability to process voices with pitch and quality extremes was limited in any significant manner. A comparison of the unprocessed natural vowel waveforms and qualities with the synthesized vowel waveforms and qualities revealed almost imperceptible differences. An LPC-10 vocoder's ability to process linguistic and dialectal suprasegmental features such as intonation, rate, and stress at low bit rates should be a critical issue for future research.
Quantitative and descriptive comparison of four acoustic analysis systems: vowel measurements.
Burris, Carlyn; Vorperian, Houri K; Fourakis, Marios; Kent, Ray D; Bolt, Daniel M
2014-02-01
This study examines accuracy and comparability of 4 trademarked acoustic analysis software packages (AASPs): Praat, WaveSurfer, TF32, and CSL by using synthesized and natural vowels. Features of AASPs are also described. Synthesized and natural vowels were analyzed using each of the AASP's default settings to secure 9 acoustic measures: fundamental frequency (F0), formant frequencies (F1-F4), and formant bandwidths (B1-B4). The discrepancy between the software measured values and the input values (synthesized, previously reported, and manual measurements) was used to assess comparability and accuracy. Basic AASP features are described. Results indicate that Praat, WaveSurfer, and TF32 generate accurate and comparable F0 and F1-F4 data for synthesized vowels and adult male natural vowels. Results varied by vowel for women and children, with some serious errors. Bandwidth measurements by AASPs were highly inaccurate as compared with manual measurements and published data on formant bandwidths. Values of F0 and F1-F4 are generally consistent and fairly accurate for adult vowels and for some child vowels using the default settings in Praat, WaveSurfer, and TF32. Manipulation of default settings yields improved output values in TF32 and CSL. Caution is recommended especially before accepting F1-F4 results for children and B1-B4 results for all speakers.
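For readers wanting to reproduce this style of default-settings measurement, below is a minimal sketch using praat-parselmouth, an open-source Python interface to Praat (one of the four AASPs compared); the input file name is hypothetical, and the settings are Praat's defaults rather than values tuned per speaker.

```python
import parselmouth

snd = parselmouth.Sound("vowel.wav")        # hypothetical vowel recording
mid = 0.5 * (snd.xmin + snd.xmax)           # vowel midpoint (s)

pitch = snd.to_pitch()                      # default F0 analysis
f0 = pitch.get_value_at_time(mid)           # Hz (nan if unvoiced)

formants = snd.to_formant_burg()            # Burg LPC with default settings
for n in range(1, 5):                       # F1-F4 and bandwidths B1-B4
    fn = formants.get_value_at_time(n, mid)
    bn = formants.get_bandwidth_at_time(n, mid)
    print(f"F{n} = {fn:.0f} Hz, B{n} = {bn:.0f} Hz")
print(f"F0 = {f0:.1f} Hz")
```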
Contribution of the supraglottic larynx to the vocal product: imaging and acoustic analysis
NASA Astrophysics Data System (ADS)
Gracco, L. Carol
1996-04-01
Horizontal supraglottic laryngectomy is a surgical procedure to remove a mass lesion located in the region of the pharynx superior to the true vocal folds. In contrast to full or partial laryngectomy, patients who undergo horizontal supraglottic laryngectomy often present with little or no involvement of the true vocal folds. This population provides an opportunity to examine the acoustic consequences of altering the pharynx while sparing the laryngeal sound source. Acoustic and magnetic resonance imaging (MRI) data were acquired in a group of four patients before and after supraglottic laryngectomy. Acoustic measures included the identification of vocal tract resonances and the fundamental frequency of vocal fold vibration. 3D reconstructions of the pharyngeal portion of each subject's vocal tract were made from MRIs taken during phonation, and volume measures were obtained. These measures reveal a variable but often dramatic difference in the surgically altered area of the pharynx and changes in the formant frequencies of the vowel /i/ post-surgically. In some cases the presence of the tumor created a deviation from the expected formant values pre-operatively, with post-operative values approaching normal. Patients who also underwent radiation treatment post-surgically tended to have greater constriction in the pharyngeal area of the vocal tract.
Hunter, Eric J; Svec, Jan G; Titze, Ingo R
2006-12-01
Frequency and intensity ranges (in true decibel sound pressure level, 20 microPa at 1 m) of voice production in trained and untrained vocalists were compared with the perceived dynamic range (phons) and units of loudness (sones) of the ear. Results were reported in terms of standard voice range profiles (VRPs), perceived VRPs (as predicted by accepted measures of auditory sensitivities), and a new metric labeled as an overall perceptual level construct. Trained classical singers made use of the most sensitive part of the hearing range (around 3-4 kHz) through the use of the singer's formant. When mapped onto the contours of equal loudness (depicting nonuniform spectral and dynamic sensitivities of the auditory system), the formant is perceived at an even higher sound level, as measured in phons, than a flat or A-weighted spectrum would indicate. The contributions of effects like the singer's formant and the sensitivities of the auditory system helped the trained singers produce 20% to 40% more units of loudness, as measured in sones, than the untrained singers. Trained male vocalists had a maximum overall perceptual level construct that was 40% higher than the untrained male vocalists. Although the A-weighted spectrum (commonly used in VRP measurement) is a reasonable first-order approximation of auditory sensitivities, it misrepresents the most salient part of the sensitivities (where the singer's formant is found) by nearly 10 dB. PMID:16325373
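For orientation, the phon and sone scales used in this entry are linked by a standard relation: above roughly 40 phon, loudness in sones doubles for every 10-phon increase. A minimal sketch (the example levels are hypothetical):

```python
def sones_from_phons(phons):
    # Standard loudness mapping, valid above ~40 phon: +10 phon doubles sones.
    return 2.0 ** ((phons - 40.0) / 10.0)

# A 20-40% gain in sones corresponds to a gain of only ~2.6-4.9 phons:
print(sones_from_phons(90.0) / sones_from_phons(87.0))   # ~1.23
```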
Women use voice parameters to assess men's characteristics
Bruckert, Laetitia; Liénard, Jean-Sylvain; Lacroix, André; Kreutzer, Michel; Leboucher, Gérard
2005-01-01
The purpose of this study was: (i) to provide additional evidence regarding the existence of human voice parameters, which could be reliable indicators of a speaker's physical characteristics and (ii) to examine the ability of listeners to judge voice pleasantness and a speaker's characteristics from speech samples. We recorded 26 men enunciating five vowels. Voices were played to 102 female judges who were asked to assess vocal attractiveness and speakers' age, height and weight. Statistical analyses were used to determine: (i) which physical component predicted which vocal component and (ii) which vocal component predicted which judgment. We found that men with low-frequency formants and small formant dispersion tended to be older, taller and tended to have a high level of testosterone. Female listeners were consistent in their pleasantness judgment and in their height, weight and age estimates. Pleasantness judgments were based mainly on intonation. Female listeners were able to correctly estimate age by using formant components. They were able to estimate weight but we could not explain which acoustic parameters they used. However, female listeners were not able to estimate height, possibly because they used intonation incorrectly. Our study confirms that in all mammal species examined thus far, including humans, formant components can provide a relatively accurate indication of a vocalizing individual's characteristics. Human listeners have the necessary information at their disposal; however, they do not necessarily use it. PMID:16519239
Fundamental frequency estimation of singing voice
NASA Astrophysics Data System (ADS)
de Cheveigné, Alain; Henrich, Nathalie
2002-05-01
A method of fundamental frequency (F0) estimation recently developed for speech [de Cheveigné and Kawahara, J. Acoust. Soc. Am. (to be published)] was applied to the singing voice. An electroglottograph signal recorded together with the microphone provided a reference by which estimates could be validated. Using standard parameter settings as for speech, error rates were low despite the wide range of F0s (about 100 to 1600 Hz). Most "errors" were due to irregular vibration of the vocal folds, a sharp formant resonance that reduced the waveform to a single harmonic, or fast F0 changes such as in high-amplitude vibrato. Our database (18 singers from baritone to soprano) included examples of diphonic singing, for which melody is carried by variations of the frequency of a narrow formant rather than F0. By varying a parameter (the ratio of inharmonic to total power), the algorithm could be tuned to follow either frequency. Although the method has not been formally tested on a wide range of instruments, it seems appropriate for musical applications because it is accurate, accepts a wide range of F0s, and can be implemented with low latency for interactive applications. [Work supported by the Cognitique programme of the French Ministry of Research and Technology.]
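The estimator referred to here is the YIN algorithm. The sketch below shows its three core steps (difference function, cumulative-mean normalization, absolute threshold) for a single analysis frame; the function name, defaults, and frame-length requirement are illustrative choices, not the published implementation.

```python
import numpy as np

def yin_f0(x, fs, fmin=100.0, fmax=1600.0, threshold=0.1):
    # Bare-bones YIN for one frame x; len(x) must exceed 2 * fs / fmin.
    tau_min = int(fs / fmax)
    tau_max = int(fs / fmin)
    n = len(x) - tau_max
    # Step 1: difference function d(tau).
    d = np.array([np.sum((x[:n] - x[tau:tau + n]) ** 2)
                  for tau in range(tau_max + 1)])
    # Step 2: cumulative-mean-normalized difference d'(tau); d'(0) = 1.
    dprime = np.ones_like(d)
    running = np.cumsum(d[1:])
    dprime[1:] = d[1:] * np.arange(1, tau_max + 1) / np.where(running == 0, 1.0, running)
    # Step 3: smallest lag whose d'(tau) dips below the threshold, refined
    # by descending to the local minimum; lag -> F0 in Hz.
    for tau in range(tau_min, tau_max):
        if dprime[tau] < threshold:
            while tau + 1 < tau_max and dprime[tau + 1] < dprime[tau]:
                tau += 1
            return fs / tau
    return None  # no estimate: treat frame as unvoiced/irregular
```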
JND measurements of the speech formants parameters and its implication in the LPC pole quantization
NASA Astrophysics Data System (ADS)
Orgad, Yaakov
1988-08-01
The inherent sensitivity of auditory perception is explicitly used with the objective of designing an efficient speech encoder. Speech can be modelled by a filter representing the vocal tract shape that is driven by an excitation signal representing glottal air flow. This work concentrates on the filter encoding problem, assuming that excitation signal encoding is optimal. Linear predictive coding (LPC) techniques were used to model a short speech segment by an all-pole filter; each pole was directly related to the speech formants. Measurements were made of the auditory just noticeable difference (JND) corresponding to the natural speech formants, with the LPC filter poles as the best candidates to represent the speech spectral envelope. The JND is the maximum precision required in speech quantization; it was defined on the basis of the shift of one pole parameter of a single frame of a speech segment necessary to induce subjective perception of the distortion with 0.75 probability. The average JND in LPC filter poles in natural speech was found to increase with increasing pole bandwidth and, to a lesser extent, frequency. The JND measurements showed a large spread of the residuals around the average values, indicating that inter-formant coupling and perhaps other, not yet fully understood, factors were not taken into account at this stage of the research; a future treatment should consider these factors. The average JNDs obtained in this work were used to design pole quantization tables for speech coding and provided a better bit rate than the standard reflection-coefficient quantizer; a 30-bits-per-frame pole quantizer yielded speech quality similar to that obtained with a standard 41-bits-per-frame reflection-coefficient quantizer. Owing to the complexity of the numerical root extraction system, the practical implementation of the pole quantization approach remains to be proved.
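To make the pole-formant correspondence concrete, the sketch below fits an all-pole model to one frame by the autocorrelation (Yule-Walker) method and converts each complex pole into a candidate formant frequency and bandwidth; the frame x, sampling rate fs, and model order are assumed inputs, not the study's exact configuration.

```python
import numpy as np

def lpc_poles_to_formants(x, fs, order=12):
    # Fit LPC coefficients and map each upper-half-plane pole to
    # frequency F = angle * fs / (2*pi) and bandwidth B = -ln|p| * fs / pi.
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:order + 1])          # Yule-Walker equations
    poles = np.roots(np.concatenate(([1.0], a)))     # roots of A(z)
    poles = poles[np.imag(poles) > 0]                # one per conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bands = -np.log(np.abs(poles)) * fs / np.pi
    idx = np.argsort(freqs)
    return freqs[idx], bands[idx]
```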
Ng, Manwa L; Yan, Nan; Chan, Venus; Chen, Yang; Lam, Paul K Y
2018-06-28
Previous studies of the laryngectomized vocal tract using formant frequencies reported contradictory findings. Imaging studies of the vocal tract in alaryngeal speakers are limited due to possible radiation effects as well as the cost and time associated with such studies. The present study examined the vocal tract configuration of laryngectomized individuals using acoustic reflection technology. Thirty alaryngeal and 30 laryngeal male speakers of Cantonese participated in the study. A pharyngometer was used to obtain volumetric information about the vocal tract. All speakers were instructed to imitate the production of /a/ while the length and volume of the oral cavity, pharyngeal cavity, and entire vocal tract were measured. The data of alaryngeal and laryngeal speakers were compared. Pharyngometric measurements revealed no significant difference in vocal tract dimensions between laryngeal and alaryngeal speakers. Despite the removal of the larynx and a possible alteration of the pharyngeal cavity during total laryngectomy, the vocal tract configuration (length and volume) in laryngectomized individuals was not significantly different from that of laryngeal speakers. It is suggested that other factors might have affected formant measures in previous studies. © 2018 S. Karger AG, Basel.
2015-01-01
The goal of this study was to analyse perceptually and acoustically the voices of patients with Unilateral Vocal Fold Paralysis (UVFP) and compare them to the voices of normal subjects. The voices were analysed perceptually with the GRBAS scale and acoustically using the following parameters: mean fundamental frequency (F0), standard deviation of F0, jitter (ppq5), shimmer (apq11), mean harmonics-to-noise ratio (HNR), mean first (F1) and second (F2) formant frequencies, and standard deviations of the F1 and F2 frequencies. Statistically significant differences were found in all of the perceptual parameters. The jitter, shimmer, HNR, standard deviation of F0, and standard deviation of the F2 frequency also differed between groups, for both genders. In the male data, differences were also found in the F1 and F2 frequency values and in the standard deviation of the F1 frequency. This study allowed the documentation of the alterations resulting from UVFP and explored parameters for which information on this pathology is limited. PMID:26557690
Bizley, Jennifer K; Walker, Kerry M M; King, Andrew J; Schnupp, Jan W H
2013-01-01
Spectral timbre is an acoustic feature that enables human listeners to determine the identity of a spoken vowel. Despite its importance to sound perception, little is known about the neural representation of sound timbre and few psychophysical studies have investigated timbre discrimination in non-human species. In this study, ferrets were positively conditioned to discriminate artificial vowel sounds in a two-alternative-forced-choice paradigm. Animals quickly learned to discriminate the vowel sound /u/ from /ε/ and were immediately able to generalize across a range of voice pitches. They were further tested in a series of experiments designed to assess how well they could discriminate these vowel sounds under different listening conditions. First, a series of morphed vowels was created by systematically shifting the location of the first and second formant frequencies. Second, the ferrets were tested with single formant stimuli designed to assess which spectral cues they could be using to make their decisions. Finally, vowel discrimination thresholds were derived in the presence of noise maskers presented from either the same or a different spatial location. These data indicate that ferrets show robust vowel discrimination behavior across a range of listening conditions and that this ability shares many similarities with human listeners. PMID:23297909
Manfredi, Claudia; Barbagallo, Davide; Baracca, Giovanna; Orlandi, Silvia; Bandini, Andrea; Dejonckere, Philippe H
2015-07-01
The obvious perceptual differences between singing styles such as Western operatic and jazz rely on specific dissimilarities in vocal technique. The present study focuses on differences in vibrato acoustics and in the singer's formant, as analyzed by a novel software tool, named BioVoice, based on robust high-resolution and adaptive techniques whose validity has been proven on synthetic voice signals. A total of 48 professional singers were investigated (29 females; 19 males; 29 Western operatic; and 19 jazz). They were asked to sing "a cappella," but with artistic expression, a well-known musical phrase from Gershwin's Porgy and Bess, in their own style: either operatic or jazz. A specific sustained note was extracted for detailed vibrato analysis. Besides rate (s(-1)) and extent (cents), duration (seconds) and regularity were computed. Two new concepts are introduced: vibrato jitter and vibrato shimmer, by analogy with the traditional jitter and shimmer of voice signals. For the singer's formant, on the same sustained tone, the ratio of the acoustic energy in formants 1-2 to the energy in formants 3, 4, and 5 was automatically computed, providing a quality ratio (QR). Vibrato rates did not differ among groups. Extent was significantly larger in operatic singers, particularly females. Vibrato jitter and vibrato shimmer were significantly smaller in operatic singers. Duration of vibrato was also significantly longer in operatic singers. QR was significantly lower in male operatic singers. Some vibrato characteristics (extent, regularity, and duration) very clearly differentiate the Western operatic singing style from the jazz singing style. The singer's formant is typical of male operatic singers. The new software tool is well suited to provide useful feedback in a pedagogical context. Copyright © 2015 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Yu, Chengzhu; Hansen, John H L
2017-03-01
Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
Voice recognition through phonetic features with Punjabi utterances
NASA Astrophysics Data System (ADS)
Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.
2017-07-01
This paper deals with the perception and disorders of speech in the context of the Punjabi language. Given the importance of voice identification, various parameters of speaker identification have been studied. The speech material was recorded with a tape recorder in normal and disguised modes of utterance. From the recorded speech materials, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, and the amplitude ratio (A1:A2) were compared in normal and disguised speech. It was found that the formant frequencies of normal and disguised speech remain almost the same only when compared at positions of the same vowel quality and quantity. If the vowel is more closed or more open in the disguised utterance, the formant frequency changes relative to the normal utterance. The amplitude ratio (A1:A2) was found to be speaker dependent and remains unchanged in the disguised utterance. However, this value may shift in a disguised utterance if cross-sectioning is not done at the same location.
Perceptual, auditory and acoustic vocal analysis of speech and singing in choir conductors.
Rehder, Maria Inês Beltrati Cornacchioni; Behlau, Mara
2008-01-01
This study concerns the voice of choir conductors. The aim was to evaluate the vocal quality of choir conductors based on the production of a sustained vowel during singing and speaking, in order to observe auditory and acoustic differences. Participants were 100 choir conductors, with an equal distribution between genders. Participants were asked to produce the sustained vowel "é" using their singing and speaking voices. Speech samples were analyzed based on auditory-perceptive and acoustic parameters. The auditory-perceptive analysis was carried out by two speech-language pathologists, specialists in this field. The acoustic analysis was carried out with the support of the computer software Doctor Speech (Tiger Electronics, SRD, USA, version 4.0), using the Real Analysis module. The auditory-perceptive analysis of vocal quality indicated that most conductors have adapted voices, presenting more alterations in their speaking voice. The acoustic analysis indicated different values between genders and between the production modalities. The fundamental frequency was higher in the singing voice, as were the values for the first formant; the second formant presented lower values in the singing voice, with statistically significant results only for women. In conclusion, the voice of choir conductors is adapted, presenting fewer deviations in the singing voice than in the speaking voice. Productions differ based on the voice modality, singing or speaking.
Ahadi, Mohsen; Pourbakht, Akram; Jafari, Amir Homayoun; Shirjian, Zahra; Jafarpisheh, Amir Salar
2014-06-01
To investigate the influence of gender on the subcortical representation of speech acoustic parameters when presented simultaneously to both ears. Two-channel speech-evoked auditory brainstem responses were obtained in 25 female and 23 male normal-hearing young adults using binaural presentation of the 40-ms synthetic consonant-vowel /da/, and the encoding of the fast and slow elements of the speech stimulus at the subcortical level was compared between the sexes in the temporal and spectral domains using independent-sample, two-tailed t-tests. Highly detectable responses were established in both groups. Analysis in the time domain revealed earlier and larger fast onset responses in females, but there was no gender-related difference in the sustained segment or the offset of the response. Interpeak intervals between frequency following response peaks were also invariant to sex. Based on the shorter onset responses in females, composite onset measures were also sex dependent. Analysis in the spectral domain showed a more robust representation of the fundamental frequency, as well as of the first formant and its high-frequency components, in females than in males. Anatomical, biological, and biochemical distinctions between females and males could alter the neural encoding of the acoustic cues of speech stimuli at the subcortical level. Females have an advantage in binaural processing of the slow and fast elements of speech. This could be physiological evidence for better identification of the speaker and the emotional tone of voice, as well as better perception of the phonetic information of speech, in women. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Brainstem Correlates of Temporal Auditory Processing in Children with Specific Language Impairment
ERIC Educational Resources Information Center
Basu, Madhavi; Krishnan, Ananthanarayan; Weber-Fox, Christine
2010-01-01
Deficits in identification and discrimination of sounds with short inter-stimulus intervals or short formant transitions in children with specific language impairment (SLI) have been taken to reflect an underlying temporal auditory processing deficit. Using the sustained frequency following response (FFR) and the onset auditory brainstem responses…
Digitised evaluation of speech intelligibility using vowels in maxillectomy patients.
Sumita, Y I; Hattori, M; Murase, M; Elbashti, M E; Taniguchi, H
2018-03-01
Among the functional disabilities that patients face following maxillectomy, speech impairment is a major factor influencing quality of life. Proper rehabilitation of speech, which may include prosthodontic and surgical treatments and speech therapy, requires accurate evaluation of speech intelligibility (SI). A simple, less time-consuming yet accurate evaluation is desirable both for maxillectomy patients and the various clinicians providing maxillofacial treatment. This study sought to determine the utility of digital acoustic analysis of vowels for the prediction of SI in maxillectomy patients, based on a comprehensive understanding of speech production in the vocal tract of maxillectomy patients and its perception. Speech samples were collected from 33 male maxillectomy patients (mean age 57.4 years) in two conditions, without and with a maxillofacial prosthesis, and formant data for the vowels /a/, /e/, /i/, /o/, and /u/ were calculated based on linear predictive coding. The frequency range of formant 2 (F2) was determined as the difference between the minimum and maximum frequencies. An SI test was also conducted to reveal the relationship between SI score and F2 range. Statistical analyses were applied. F2 range and SI score were significantly different between the two conditions without and with a prosthesis (both P < .0001). F2 range was significantly correlated with SI score in both conditions (Spearman's r = .843, P < .0001; r = .832, P < .0001, respectively). These findings indicate that calculating the F2 range from the 5 vowels has clinical utility for the prediction of SI after maxillectomy. © 2017 John Wiley & Sons Ltd.
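A small sketch of the headline computation, with hypothetical F2 values and SI scores standing in for the patient data: take each speaker's F2 across the five vowels, compute the maximum-minus-minimum range, and correlate it with SI by Spearman's rank method.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical F2 values (Hz) for /a/, /e/, /i/, /o/, /u/, one row per
# speaker, plus hypothetical SI scores; real values would come from LPC.
f2 = np.array([[1210, 1890, 2240, 930, 810],
               [1150, 1600, 1780, 1010, 940],
               [1180, 1750, 2050, 960, 870],
               [1120, 1480, 1590, 1040, 990]])
si = np.array([81.0, 47.0, 66.0, 39.0])

f2_range = f2.max(axis=1) - f2.min(axis=1)   # per-speaker F2 range (Hz)
rho, p = spearmanr(f2_range, si)
print(f2_range, f"rho = {rho:.2f}, p = {p:.3f}")
```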
Oliveira Barrichelo, V M; Heuer, R J; Dean, C M; Sataloff, R T
2001-09-01
Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.
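A minimal LTAS sketch along these lines, assuming a mono numpy signal x at sampling rate fs: average the spectrum over the whole reading with Welch's method and report the energy in the speaker's-ring band relative to the 0-5 kHz total. The band edges are illustrative, not the study's exact definition.

```python
import numpy as np
from scipy.signal import welch

def ltas_ring_ratio(x, fs, lo=2000.0, hi=4000.0):
    # Welch-averaged power spectrum; ring-band energy re 0-5 kHz total, in dB.
    f, pxx = welch(x, fs=fs, nperseg=2048)
    ring = pxx[(f >= lo) & (f <= hi)].sum()
    total = pxx[f <= 5000.0].sum()
    return 10.0 * np.log10(ring / total)
```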
Whitfield, Jason A; Dromey, Christopher; Palmer, Panika
2018-05-17
The purpose of this study was to examine the effect of speech intensity on acoustic and kinematic vowel space measures and conduct a preliminary examination of the relationship between kinematic and acoustic vowel space metrics calculated from continuously sampled lingual marker and formant traces. Young adult speakers produced 3 repetitions of 2 different sentences at 3 different loudness levels. Lingual kinematic and acoustic signals were collected and analyzed. Acoustic and kinematic variants of several vowel space metrics were calculated from the formant frequencies and the position of 2 lingual markers. Traditional metrics included triangular vowel space area and the vowel articulation index. Acoustic and kinematic variants of sentence-level metrics based on the articulatory-acoustic vowel space and the vowel space hull area were also calculated. Both acoustic and kinematic variants of the sentence-level metrics significantly increased with an increase in loudness, whereas no statistically significant differences in traditional vowel-point metrics were observed for either the kinematic or acoustic variants across the 3 loudness conditions. In addition, moderate-to-strong relationships between the acoustic and kinematic variants of the sentence-level vowel space metrics were observed for the majority of participants. These data suggest that both kinematic and acoustic vowel space metrics that reflect the dynamic contributions of both consonant and vowel segments are sensitive to within-speaker changes in articulation associated with manipulations of speech intensity.
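For reference, the traditional vowel-point metrics mentioned here reduce to a polygon area over corner-vowel coordinates; the shoelace-formula sketch below works for either acoustic (F1, F2) points in Hz or kinematic marker positions, with hypothetical corner-vowel values shown.

```python
import numpy as np

def polygon_area(points):
    # Shoelace formula over vertices given in order around the polygon.
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical corner vowels /i/, /a/, /u/ as (F1, F2) pairs in Hz:
tvsa = polygon_area([(300, 2300), (750, 1300), (350, 900)])
print(f"Triangular vowel space area = {tvsa:.0f} Hz^2")
```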
Acoustic and Durational Properties of Indian English Vowels
ERIC Educational Resources Information Center
Maxwell, Olga; Fletcher, Janet
2009-01-01
This paper presents findings of an acoustic phonetic analysis of vowels produced by speakers of English as a second language from northern India. The monophthongal vowel productions of a group of male speakers of Hindi and male speakers of Punjabi were recorded, and acoustic phonetic analyses of vowel formant frequencies and vowel duration were…
Maurer, D; Hess, M; Gross, M
1996-12-01
Theoretical investigations of the "source-filter" model have indicated a pronounced acoustic interaction of the glottal source and the vocal tract. Empirical investigations of formant pattern variations apart from changes in vowel identity have demonstrated a direct relationship between the fundamental frequency and the patterns. As a consequence of both findings, the independence of phonation and articulation may be limited in the speech process. Within the present study, the possible interdependence of phonation and phoneme was investigated: vocal fold vibrations and larynx position during vocalizations of different vowels in a healthy man and woman were examined by high-speed light-intensified digital imaging. We found 1) different movements of the vocal folds for vocalizations of different vowel identities within one speaker and at similar fundamental frequency, and 2) constant larynx position within vocalization of one vowel identity, but different positions for vocalizations of different vowel identities. A possible relationship between vocal fold vibrations and the phoneme is discussed.
NASA Astrophysics Data System (ADS)
Ménard, Lucie; Schwartz, Jean-Luc; Boë, Louis-Jean; Kandel, Sonia; Vallée, Nathalie
2002-04-01
The present article aims at exploring the invariant parameters involved in the perceptual normalization of French vowels. A set of 490 stimuli, including the ten French vowels /i, y, u, e, ø, o, ɛ, œ, ɔ, a/ produced by an articulatory model simulating seven growth stages and seven fundamental frequency values, was submitted as a perceptual identification test to 43 subjects. The results confirm the important effect of the tonality distance between F1 and f0 in perceived height. It does not seem, however, that height perception involves a binary organization determined by the 3-3.5-Bark critical distance. Regarding place of articulation, the tonotopic distance between F1 and F2 appears to be the best predictor of the perceived front-back dimension. Nevertheless, the role of the difference between F2 and F3 remains important. Roundedness is also examined and correlated to the effective second formant, involving spectral integration of higher formants within the 3.5-Bark critical distance. The results shed light on the issue of perceptual invariance and can be interpreted as perceptual constraints imposed on speech production.
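The F1-f0 and F2-F1 distances at issue are computed on the Bark scale; below is a sketch using the Traunmüller (1990) Hz-to-Bark approximation (one of several published conversions), with hypothetical vowel values.

```python
def hz_to_bark(f):
    # Traunmueller (1990) approximation of the Bark scale.
    return 26.81 * f / (1960.0 + f) - 0.53

# Hypothetical vowel with f0 = 220 Hz, F1 = 400 Hz, F2 = 2200 Hz:
f0, f1, f2 = 220.0, 400.0, 2200.0
print(f"F1 - f0 = {hz_to_bark(f1) - hz_to_bark(f0):.2f} Bark")  # height cue
print(f"F2 - F1 = {hz_to_bark(f2) - hz_to_bark(f1):.2f} Bark")  # front-back cue
```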
Varnet, Léo; Knoblauch, Kenneth; Serniclaes, Willy; Meunier, Fanny; Hoen, Michel
2015-01-01
Although there is a large consensus regarding the involvement of specific acoustic cues in speech perception, the precise mechanisms underlying the transformation from continuous acoustical properties into discrete perceptual units remain undetermined. This gap in knowledge is partially due to the lack of a turnkey solution for isolating critical speech cues from natural stimuli. In this paper, we describe a psychoacoustic imaging method known as the Auditory Classification Image technique that allows experimenters to estimate the relative importance of time-frequency regions in categorizing natural speech utterances in noise. Importantly, this technique enables the testing of hypotheses on the listening strategies of participants at the group level. We exemplify this approach by identifying the acoustic cues involved in /da/-/ga/ categorization in two phonetic contexts, /al/- or /ar/-. The application of Auditory Classification Images to our group of 16 participants revealed significant critical regions on the second and third formant onsets, as predicted by the literature, as well as an unexpected temporal cue on the first formant. Finally, through a cluster-based nonparametric test, we demonstrate that this method is sufficiently sensitive to detect fine modifications of the classification strategies between different utterances of the same phoneme.
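At its core, the technique regresses trial-by-trial categorization responses on the noise's time-frequency bins. The sketch below substitutes a plain L2-penalized logistic regression for the smoothness-penalized GLM of the published method, and simulates a listener whose decisions leak a single critical bin; all sizes and the simulated-listener setup are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_trials, n_freq, n_time = 2000, 64, 32
noise = rng.normal(size=(n_trials, n_freq * n_time))  # flattened spectrograms

true_w = np.zeros(n_freq * n_time)
true_w[100] = 2.0                                     # one "critical" cue bin
p = 1.0 / (1.0 + np.exp(-(noise @ true_w)))           # simulated listener
responses = rng.random(n_trials) < p                  # e.g. /da/ vs /ga/

model = LogisticRegression(C=0.1).fit(noise, responses)
aci = model.coef_.reshape(n_freq, n_time)             # the classification image
print(abs(aci).argmax())                              # ~100: cue bin recovered
```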
Acoustic passaggio pedagogy for the male voice.
Bozeman, Kenneth Wood
2013-07-01
Awareness of interactions between the lower harmonics of the voice source and the first formant of the vocal tract, and of the passive vowel modifications that accompany them, can assist in working out a smooth transition through the passaggio of the male voice. A stable vocal tract length establishes the general location of all formants, including the higher formants that form the singer's formant cluster. Untrained males instinctively shorten the tube to preserve the strong F1/H2 acoustic coupling of voce aperta, resulting in 'yell' timbre. If tube length and shape are kept stable during pitch ascent, the yell can be avoided by allowing the second harmonic to rise above the first formant, creating the balanced timbre of voce chiusa.
The relationship between professional operatic soprano voice and high range spectral energy
NASA Astrophysics Data System (ADS)
Barnes, Jennifer J.; Davis, Pamela; Oates, Jennifer; Chapman, Janice
2004-07-01
Operatic sopranos need to be audible over an orchestra yet they are not considered to possess a singer's formant. As in other voice types, some singers are more successful than others at being heard and so this work investigated the frequency range of the singer's formant between 2000 and 4000 Hz to consider the question of extra energy in this range. Such energy would give an advantage over an orchestra, so the aims were to ascertain what levels of excess energy there might be and look at any relationship between extra energy levels and performance level. The voices of six operatic sopranos (national and international standard) were recorded performing vowel and song tasks and subsequently analyzed acoustically. Measures taken from vowel data were compared with song task data to assess the consistency of the approaches. Comparisons were also made with regard to two conditions of intended projection (maximal and comfortable), two song tasks (anthem and aria), two recording environments (studio and anechoic room), and between subjects. Ranking the singers from highest energy result to lowest showed the consistency of the results from both vowel and song methods and correlated reasonably well with the performance level of the subjects. The use of formant tuning is considered and examined.
Bálint, Anna; Faragó, Tamás; Miklósi, Ádám; Pongrácz, Péter
2016-11-01
Body size is an important feature that affects fighting ability; however, size-related parameters of agonistic vocalizations are difficult to manipulate because of anatomical constraints within the vocal production system. Rare examples of acoustic size modulation are due to specific features that enable the sender to steadily communicate exaggerated body size. However, one could argue that it would be more adaptive if senders could adjust their signaling behavior to the fighting potential of their actual opponent. So far there has been no experimental evidence for this possibility. We tested this hypothesis by exposing family dogs (Canis familiaris) to humans with potentially different fighting ability. In a within-subject experiment, 64 dogs of various breeds consecutively faced two threateningly approaching humans, either two men or two women of different stature, or a man and a woman of similar or different stature. We found that the dogs' vocal responses were affected by the gender of the threatening stranger and the dog owner's gender. Dogs with a female owner, or those dogs which came from a household where both genders were present, reacted with growls of lower values of the Pitch-Formant component (including deeper fundamental frequency and lower formant dispersion) to threatening men. Our results are the first to show that non-human animals react with dynamic alteration of acoustic parameters related to their individual indexical features (body size), depending on the level of threat in an agonistic encounter.
Vowel Acoustic Space Development in Children: A Synthesis of Acoustic and Anatomic Data
ERIC Educational Resources Information Center
Vorperian, Houri K.; Kent, Ray D.
2007-01-01
Purpose: This article integrates published acoustic data on the development of vowel production. Age specific data on formant frequencies are considered in the light of information on the development of the vocal tract (VT) to create an anatomic-acoustic description of the maturation of the vowel acoustic space for English. Method: Literature…
NASA Astrophysics Data System (ADS)
Sibiryakova, Olga V.; Volodin, Ilya A.; Frey, Roland; Zuther, Steffen; Kisebaev, Talgat B.; Salemgareev, Albert R.; Volodina, Elena V.
2017-04-01
Saiga antelopes Saiga tatarica tatarica give birth in large aggregations, and offspring follow the herd soon after birth. Herding is advantageous as an anti-predator strategy; however, communication between mothers and neonates is strongly complicated in large aggregations. Individual series of nasal and oral contact calls of mother and neonate saiga antelopes were selected from recordings made with automated recording systems placed near hiding neonates on the saiga breeding grounds in Northern Kazakhstan during the synchronized parturitions of 30,000 calving females. For the comparison of the acoustic structure of nasal and oral contact calls, we used 168 nasal calls of 18 mothers, 192 oral calls of 21 mothers, 78 nasal calls of 16 neonates, and 197 oral calls of 22 neonates. In the oral calls of either mothers or neonates, formant frequencies were higher and durations longer than in the nasal calls, whereas fundamental frequencies did not differ between oral and nasal calls. Discriminant function analysis (DFA) based on six acoustic variables accurately classified individual identity for 99.4% of oral calls of 18 mothers, 89.3% of nasal calls of 18 mothers, and 94.4% of oral calls of 18 neonates. The average value of correct classification to individual was higher in mother oral than in mother nasal calls, and higher in mother oral calls than in neonate oral calls; no significant difference was observed between mother nasal and neonate oral calls. The variables mainly responsible for vocal identity were the fundamental frequency and the second and third formants, in both mothers and neonates and in both nasal and oral calls. The high vocal identity of mothers and neonates suggests a powerful potential for mutual mother-offspring recognition in dense aggregations of saiga antelopes as an important component of their survival strategy.
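A sketch of the DFA step using scikit-learn's linear discriminant analysis: fit caller identity from six acoustic variables and cross-validate the fraction correctly classified. Random numbers stand in for the measurements, so this toy score will sit near chance (about 1/18), unlike the strong identity signal reported above.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Hypothetical data: 18 mothers x 9 calls each, six acoustic variables per
# call (e.g. duration, f0, and formant measures).
rng = np.random.default_rng(0)
y = np.repeat(np.arange(18), 9)            # caller identity labels
X = rng.normal(size=(len(y), 6))           # stand-in acoustic variables

lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X, y, cv=3)  # fraction correctly classified
print(f"Cross-validated correct classification: {scores.mean():.1%}")
```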
Formant Amplitude of Children with Down's Syndrome.
ERIC Educational Resources Information Center
Pentz, Arthur L., Jr.
1987-01-01
The sustained vowel sounds of 14 noninstitutionalized 7- to 10-year-old children with Down's syndrome were analyzed acoustically for vowel formant amplitude levels. The subjects with Down's syndrome had formant amplitude intensity levels significantly lower than those of a similar group of speakers without Down's syndrome. (Author/DB)
[Singing formant analysis of KunQu Opera actors during voice mutation and adulthood].
Zhu, Mei; Zhang, Dao-Xing; Liu, Yong-Xiang; Yang, Xiao-ju
2005-04-01
To compare the singing formant (Fs) differences between successful and unsuccessful opera actors during adolescence, and to compare the same index between the adolescent and adult periods of the successful actors. From 1985 to 1986, the author recorded the voices of 21 adolescent actors, all from the Beijing KunQu Opera troupe. In 2000, all 21 subjects had their voices recorded and their singing formant analyzed using a computer and sound spectrograph; 7 of them had become adult actors, while the others had quit acting after adolescence. Successful actors had an obvious Fs and stronger acoustic energy; the successful actors had weaker Fs values during adolescence than during adulthood (t = 2.9600, P < 0.05). The presence of the Fs and its acoustic energy were important for evaluating adolescent actors' future vocal potential.
Vocal production mechanisms in a non-human primate: morphological data and a model.
Riede, Tobias; Bronson, Ellen; Hatzikirou, Haralambos; Zuberbühler, Klaus
2005-01-01
Human beings are thought to be unique amongst the primates in their capacity to produce rapid changes in the shape of their vocal tracts during speech production. Acoustically, vocal tracts act as resonance chambers, whose geometry determines the position and bandwidth of the formants. Formants provide the acoustic basis for vowels, which enable speakers to refer to external events and to produce other kinds of meaningful communication. Formant-based referential communication is also present in non-human primates, most prominently in Diana monkey alarm calls. Previous work has suggested that the acoustic structure of these calls is the product of a non-uniform vocal tract capable of some degree of articulation. In this study we test this hypothesis by providing morphological measurements of the vocal tract of three adult Diana monkeys, using both radiography and dissection. We use these data to generate a vocal tract computational model capable of simulating the formant structures produced by wild individuals. The model performed best when it combined a non-uniform vocal tract consisting of three different tubes with a number of articulatory manoeuvres. We discuss the implications of these findings for evolutionary theories of human and non-human vocal production.
An Analysis of The Parameters Used In Speech ABR Assessment Protocols.
Sanfins, Milaine D; Hatzopoulos, Stavros; Donadon, Caroline; Diniz, Thais A; Borges, Leticia R; Skarzynski, Piotr H; Colella-Santos, Maria Francisca
2018-04-01
The aim of this study was to assess the parameters of choice, such as duration, intensity, rate, polarity, number of sweeps, window length, stimulated ear, fundamental frequency, first formant, and second formant, from previously published speech ABR studies. To identify candidate articles, five databases were assessed using the following keyword descriptors: speech ABR, ABR-speech, speech auditory brainstem response, auditory evoked potential to speech, speech-evoked brainstem response, and complex sounds. The search identified 1288 articles published between 2005 and 2015. After filtering the total number of papers according to the inclusion and exclusion criteria, 21 studies were selected. Analyzing the protocol details used in 21 studies suggested that there is no consensus to date on a speech-ABR protocol and that the parameters of analysis used are quite variable between studies. This inhibits the wider generalization and extrapolation of data across languages and studies.
Neural Coding of Formant-Exaggerated Speech in the Infant Brain
ERIC Educational Resources Information Center
Zhang, Yang; Koerner, Tess; Miller, Sharon; Grice-Patil, Zach; Svec, Adam; Akbari, David; Tusler, Liz; Carney, Edward
2011-01-01
Speech scientists have long proposed that formant exaggeration in infant-directed speech plays an important role in language acquisition. This event-related potential (ERP) study investigated neural coding of formant-exaggerated speech in 6-12-month-old infants. Two synthetic /i/ vowels were presented in alternating blocks to test the effects of…
Hearing Loss Severity: Impaired Processing of Formant Transition Duration
ERIC Educational Resources Information Center
Coez, A.; Belin, P.; Bizaguet, E.; Ferrary, E.; Zilbovicius, M.; Samson, Y.
2010-01-01
Normal hearing listeners exploit the formant transition (FT) detection to identify place of articulation for stop consonants. Neuro-imaging studies revealed that short FT induced less cortical activation than long FT. To determine the ability of hearing impaired listeners to distinguish short and long formant transitions (FT) from vowels of the…
Speech evaluation after palatal augmentation in patients undergoing glossectomy.
de Carvalho-Teles, Viviane; Sennes, Luiz Ubirajara; Gielow, Ingrid
2008-10-01
To assess, in patients undergoing glossectomy, the influence of the palatal augmentation prosthesis on the speech intelligibility and acoustic spectrographic characteristics of the formants of oral vowels in Brazilian Portuguese, specifically the first 3 formants (F1 [/a,e,u/], F2 [/o,ó,u/], and F3 [/a,ó/]). Speech evaluation with and without a palatal augmentation prosthesis using blinded randomized listener judgments. Tertiary referral center. Thirty-six patients (33 men and 3 women) aged 30 to 80 (mean [SD], 53.9 [10.5]) years underwent glossectomy (14, total glossectomy; 12, total glossectomy and partial mandibulectomy; 6, hemiglossectomy; and 4, subtotal glossectomy) with use of the augmentation prosthesis for at least 3 months before inclusion in the study. Spontaneous speech intelligibility (assessed by expert listeners using a 4-category scale) and spectrographic formants assessment. We found a statistically significant improvement of spontaneous speech intelligibility and the average number of correctly identified syllables with the use of the prosthesis (P < .05). Statistically significant differences occurred for the F1 values of the vowels /a,e,u/; for F2 values, there was a significant difference of the vowels /o,ó,u/; and for F3 values, there was a significant difference of the vowels /a,ó/ (P < .001). The palatal augmentation prosthesis improved the intelligibility of spontaneous speech and syllables for patients who underwent glossectomy. It also increased the F2 and F3 values for all vowels and the F1 values for the vowels /o,ó,u/. This effect brought the values of many vowel formants closer to normal.
On the number of channels needed to classify vowels: Implications for cochlear implants
NASA Astrophysics Data System (ADS)
Fourakis, Marios; Hawks, John W.; Davis, Erin
2005-09-01
In cochlear implants the incoming signal is analyzed by a bank of filters. Each filter is associated with an electrode to constitute a channel. The present research seeks to determine the number of channels needed for optimal vowel classification. Formant measurements of vowels produced by men and women [Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)] were converted to channel assignments. The number of channels varied from 4 to 20 over two frequency ranges (180-4000 and 180-6000 Hz) in equal bark steps. Channel assignments were submitted to linear discriminant analysis (LDA). Classification accuracy increased with the number of channels, ranging from 30% with 4 channels to 98% with 20 channels, both for the female voice. To determine asymptotic performance, LDA classification scores were plotted against the number of channels and fitted with quadratic equations. The number of channels at which no further improvement occurred was determined, averaging 19 across all conditions with little variation. This number of channels seems to resolve the frequency range spanned by the first three formants finely enough to maximize vowel classification. This resolution may not be achieved using six or eight channels as previously proposed. [Work supported by NIH.]
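Two steps of this analysis can be sketched compactly: mapping formant frequencies to channel assignments via filters spaced in equal bark steps, and locating the asymptote of the accuracy-versus-channels curve from a fitted quadratic. The Traunmueller bark approximation, the filter edges, and the accuracy numbers below are illustrative assumptions, not the study's exact values:

```python
import numpy as np

def hz_to_bark(f):
    # Traunmueller (1990) approximation of the Bark scale
    return 26.81 * f / (1960.0 + f) - 0.53

def channel_of(formant_hz, n_channels, lo=180.0, hi=4000.0):
    """Map a formant frequency to one of n_channels filters in equal bark steps."""
    edges = np.linspace(hz_to_bark(lo), hz_to_bark(hi), n_channels + 1)
    z = hz_to_bark(np.clip(formant_hz, lo, hi))
    return int(np.clip(np.searchsorted(edges, z) - 1, 0, n_channels - 1))

# F1 and F2 of a male /i/ (Peterson-Barney-like values, for illustration only)
for n in (4, 8, 20):
    print(n, [channel_of(f, n) for f in (270.0, 2290.0)])

# Asymptote of accuracy vs. channels: vertex of a fitted quadratic
channels = np.array([4, 8, 12, 16, 20], dtype=float)
accuracy = np.array([30.0, 62.0, 85.0, 94.0, 98.0])  # invented scores
a, b, c = np.polyfit(channels, accuracy, 2)
print("no further improvement past ~", round(-b / (2 * a), 1), "channels")
```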
Multichannel Compression: Effects of Reduced Spectral Contrast on Vowel Identification
ERIC Educational Resources Information Center
Bor, Stephanie; Souza, Pamela; Wright, Richard
2008-01-01
Purpose: To clarify if large numbers of wide dynamic range compression channels provide advantages for vowel identification and to measure its acoustic effects. Methods: Eight vowels produced by 12 talkers in the /hVd/ context were compressed using 1, 2, 4, 8, and 16 channels. Formant contrast indices (mean formant peak minus mean formant trough;…
Cameron, Sharon; Chong-White, Nicky; Mealings, Kiri; Beechey, Tim; Dillon, Harvey; Young, Taegan
2018-02-01
Previous research suggests that a proportion of children experiencing reading and listening difficulties may have an underlying primary deficit in the way that the central auditory nervous system analyses the perceptually important, rapidly varying, formant frequency components of speech. The Phoneme Identification Test (PIT) was developed to investigate the ability of children to use spectro-temporal cues to perceptually categorize speech sounds based on their rapidly changing formant frequencies. The PIT uses an adaptive two-alternative forced-choice procedure whereby the participant identifies a synthesized consonant-vowel (CV) (/ba/ or /da/) syllable. CV syllables differed only in the second formant (F2) frequency along an 11-step continuum (between 0% and 100%, representing an ideal /ba/ and /da/, respectively). The CV syllables were presented in either quiet (PIT Q) or noise at a 0 dB signal-to-noise ratio (PIT N). Development of the PIT stimuli and test protocols, and collection of normative and test-retest reliability data. Twelve adults (aged 23 yr 10 mo to 50 yr 9 mo, mean 32 yr 5 mo) and 137 typically developing, primary-school children (aged 6 yr 0 mo to 12 yr 4 mo, mean 9 yr 3 mo). There were 73 males and 76 females. Data were collected using a touchscreen computer. Psychometric functions were automatically fit to individual data by the PIT software. Performance was determined by the width of the continuum for which responses were neither clearly /ba/ nor /da/ (referred to as the uncertainty region [UR]). A shallower psychometric function slope reflected greater uncertainty. Age effects were determined based on raw scores. Z scores were calculated to account for the effect of age on performance. Outliers, and individual data for which the confidence interval of the UR exceeded a maximum allowable value, were removed. Nonparametric tests were used as the data were skewed toward negative performance. Across participants, the median value of the F2 range that resulted in uncertain responses was 33% in quiet and 40% in noise. There was a significant effect of age on the width of this UR (p < 0.00001) in both quiet and noise, with performance becoming adult-like by age 9 on the PIT Q and age 10 on the PIT N. A skewed distribution toward negative performance occurred in both quiet (p = 0.01) and noise (p = 0.006). Median UR scores were significantly wider in noise than in quiet (T = 2041, p < 0.0000001). Performance (z scores) across the two tests was significantly correlated (r = 0.36, p = 0.000009). Test-retest z scores were significantly correlated in both quiet and noise (r = 0.4 and 0.37, respectively, p < 0.0001). The PIT normative data show that the ability to identify phonemes based on changes in formant transitions improves with age, and that some children in the general population perform much worse than their age peers. In children, uncertainty increases when the stimuli are presented in noise. The test is suitable for use in planned studies in a clinical population. American Academy of Audiology
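The uncertainty region (UR) described above falls out of a fitted psychometric function. A sketch with scipy, assuming a logistic shape and 20%/80% cutoffs for "clearly /ba/" and "clearly /da/" (the PIT's exact criterion may differ); the response proportions are invented for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    # Proportion of /da/ responses along the /ba/-/da/ F2 continuum
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

# Invented response proportions at 11 continuum steps (0-100% toward /da/)
x = np.linspace(0.0, 100.0, 11)
p = np.array([0.02, 0.03, 0.05, 0.15, 0.35, 0.55, 0.75, 0.90, 0.96, 0.98, 0.99])
(x0, k), _ = curve_fit(logistic, x, p, p0=[50.0, 0.1])

# Uncertainty region: span where responses are neither clearly /ba/ nor /da/
lo = x0 - np.log(4.0) / k  # continuum position where p(/da/) = 0.2
hi = x0 + np.log(4.0) / k  # continuum position where p(/da/) = 0.8
print(f"UR width = {hi - lo:.1f}% of the continuum")
```

A shallower fitted slope (smaller k) directly widens the UR, matching the abstract's note that slope reflects uncertainty.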
Effect of cognitive load on articulation rate and formant frequencies during simulator flights.
Huttunen, Kerttu H; Keränen, Heikki I; Pääkkönen, Rauno J; Päivikki Eskelinen-Rönkä, R; Leino, Tuomo K
2011-03-01
It was explored how three types of intensive cognitive load typical of military aviation (load on situation awareness, information processing, or decision-making) affect speech. The utterances of 13 male military pilots were recorded during simulated combat flights. Articulation rate was calculated from the speech samples, and the first formant (F1) and second formant (F2) were tracked from first-syllable short vowels in pre-defined phoneme environments. Articulation rate was found to correlate negatively (albeit with low coefficients) with loads on situation awareness and decision-making but not with changes in F1 or F2. Changes were seen in the spectrum of the vowels: mean F1 of front vowels usually increased and their mean F2 decreased as a function of cognitive load, and both F1 and F2 of back vowels increased. The strongest associations were seen between the three types of cognitive load and F1 and F2 changes in back vowels. Because fluent and clear radio speech communication is vital to safety in aviation and temporal and spectral changes may affect speech intelligibility, careful use of standard aviation phraseology and training in the production of clear speech during a high level of cognitive load are important measures that diminish the probability of possible misunderstandings. © 2011 Acoustical Society of America
Flow Glottogram Characteristics and Perceived Degree of Phonatory Pressedness.
Millgård, Moa; Fors, Tobias; Sundberg, Johan
2016-05-01
Phonatory pressedness is a clinically relevant aspect of voice, which generally is analyzed by auditory perception. The present investigation aimed at identifying voice source and formant characteristics related to experts' ratings of phonatory pressedness. Experimental study of the relations between visual analog scale ratings of phonatory pressedness and voice source parameters in healthy voices. Audio, electroglottogram, and subglottal pressure, estimated from oral pressure during /p/ occlusion, were recorded from five female and six male subjects, each of whom deliberately varied phonation type between neutral, flow, and pressed in the syllable /pae/, produced at three loudness levels and three pitches. Speech-language pathologists rated, along a visual analog scale, the degree of perceived phonatory pressedness in these samples. The samples were analyzed by means of inverse filtering with regard to closed quotient, dominance of the voice source fundamental, normalized amplitude quotient, peak-to-peak flow amplitude, as well as formant frequencies and the alpha ratio of spectrum energy above and below 1000 Hz. The results were compared with the rating data, which showed that the ratings were closely related to voice source parameters. Approximately, 70% of the variance of the ratings could be explained by the voice source parameters. A multiple linear regression analysis suggested that perceived phonatory pressedness is related most closely to subglottal pressure, closed quotient, and the two lowest formants. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
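Of the spectral measures listed, the alpha ratio is straightforward to compute directly: the ratio of energy above to below 1000 Hz. A minimal sketch using a Welch PSD on a synthetic harmonic-rich signal standing in for a recorded syllable; the cutoff and window settings follow common practice rather than this study's exact configuration:

```python
import numpy as np
from scipy.signal import welch

def alpha_ratio(x, fs, cutoff=1000.0):
    """Ratio (dB) of spectral energy above vs. below `cutoff` Hz.

    A higher (less negative) alpha ratio typically accompanies more
    pressed phonation, since pressing boosts the higher harmonics.
    """
    f, psd = welch(x, fs=fs, nperseg=2048)
    hi = psd[f >= cutoff].sum()
    lo = psd[f < cutoff].sum()
    return 10.0 * np.log10(hi / lo)

# Synthetic harmonic-rich signal standing in for a recorded /pae/ syllable
fs = 16000
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * 200.0 * n * t) / n for n in range(1, 40))
print(f"alpha ratio: {alpha_ratio(x, fs):.1f} dB")
```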
Phonetic Modification of Vowel Space in Storybook Speech to Infants up to 2 Years of Age
Burnham, Evamarie B.; Wieland, Elizabeth A.; Kondaurova, Maria V.; McAuley, J. Devin; Bergeson, Tonya R.
2015-01-01
Purpose A large body of literature has indicated vowel space area expansion in infant-directed (ID) speech compared with adult-directed (AD) speech, which may promote language acquisition. The current study tested whether this expansion occurs in storybook speech read to infants at various points during their first 2 years of life. Method In 2 studies, mothers read a storybook containing target vowels in ID and AD speech conditions. Study 1 was longitudinal, with 11 mothers recorded when their infants were 3, 6, and 9 months old. Study 2 was cross-sectional, with 48 mothers recorded when their infants were 3, 9, 13, or 20 months old (n = 12 per group). The 1st and 2nd formants of vowels /i/, /ɑ/, and /u/ were measured, and vowel space area and dispersion were calculated. Results Across both studies, 1st and/or 2nd formant frequencies shifted systematically for /i/ and /u/ vowels in ID compared with AD speech. No difference in vowel space area or dispersion was found. Conclusions The results suggest that a variety of communication and situational factors may affect phonetic modifications in ID speech, but that vowel space characteristics in speech to infants stay consistent across the first 2 years of life. PMID:25659121
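Vowel space area and dispersion, the two summary measures used here, can be computed from per-vowel (F1, F2) means with the shoelace formula and with distances from the centroid of the space. The corner-vowel values below are plausible placeholders, not the study's data:

```python
import numpy as np

def vowel_space_area(points):
    """Shoelace area of the polygon spanned by per-vowel (F1, F2) means (Hz^2)."""
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

def vowel_dispersion(points):
    """Mean Euclidean distance of the vowels from the centre of the space."""
    pts = np.asarray(points, dtype=float)
    return float(np.mean(np.linalg.norm(pts - pts.mean(axis=0), axis=1)))

# Placeholder corner-vowel means (F1, F2 in Hz) for /i/, /ɑ/, /u/
ad_speech = [(300.0, 2300.0), (850.0, 1220.0), (340.0, 870.0)]
id_speech = [(280.0, 2500.0), (900.0, 1250.0), (320.0, 780.0)]  # shifted /i/, /u/
print(vowel_space_area(ad_speech), vowel_space_area(id_speech))
print(vowel_dispersion(ad_speech), vowel_dispersion(id_speech))
```

Note how systematic shifts of /i/ and /u/ alone, as reported above, can leave the overall area nearly unchanged.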
Vowel reduction across tasks for male speakers of American English.
Kuo, Christina; Weismer, Gary
2016-07-01
This study examined acoustic variation of vowels within speakers across speech tasks. The overarching goal of the study was to understand within-speaker variation as one index of the range of normal speech motor behavior for American English vowels. Ten male speakers of American English performed four speech tasks including citation form sentence reading with a clear-speech style (clear-speech), citation form sentence reading (citation), passage reading (reading), and conversational speech (conversation). Eight monophthong vowels in a variety of consonant contexts were studied. Clear-speech was operationally defined as the reference point for describing variation. Acoustic measures associated with the conventions of vowel targets were obtained and examined. These included temporal midpoint formant frequencies for the first three formants (F1, F2, and F3) and the derived Euclidean distances in the F1-F2 and F2-F3 planes. Results indicated that reduction toward the center of the F1-F2 and F2-F3 planes increased in magnitude across the tasks in the order of clear-speech, citation, reading, and conversation. The cross-task variation was comparable for all speakers despite fine-grained individual differences. The characteristics of systematic within-speaker acoustic variation across tasks have potential implications for the understanding of the mechanisms of speech motor control and motor speech disorders.
Obstructive Sleep Apnea in Women: Study of Speech and Craniofacial Characteristics
Tyan, Marina; Fernández Pozo, Rubén; Toledano, Doroteo; Lopez Gonzalo, Eduardo; Alcazar Ramirez, Jose Daniel; Hernandez Gomez, Luis Alfonso
2017-01-01
Background: Obstructive sleep apnea (OSA) is a common sleep disorder characterized by frequent cessation of breathing lasting 10 seconds or longer. The diagnosis of OSA is performed through an expensive procedure, which requires an overnight stay at the hospital. This has led to several proposals based on the analysis of patients' facial images and speech recordings as an attempt to develop simpler and cheaper methods to diagnose OSA. Objective: The objective of this study was to analyze possible relationships between OSA and speech and facial features in a female population, to examine whether these possible connections may be affected by the specific clinical characteristics of the OSA population and, more specifically, to explore how the connection between OSA and speech and facial features can be affected by gender. Methods: All the subjects are Spanish subjects suspected to suffer from OSA and referred to a sleep disorders unit. Voice recordings and photographs were collected in a supervised but not highly controlled way, trying to test a scenario close to a realistic clinical practice scenario where OSA is assessed using an app running on a mobile device. Furthermore, clinical variables such as weight, height, age, and cervical perimeter, which are usually reported as predictors of OSA, were also gathered. Acoustic analysis is centered on sustained vowels. Facial analysis consists of a set of local craniofacial features related to OSA, which were extracted from images after detecting facial landmarks using active appearance models. To study the probable OSA connection with speech and craniofacial features, correlations among the apnea-hypopnea index (AHI), clinical variables, and acoustic and facial measurements were analyzed. Results: The results obtained for the female population indicate mainly weak correlations (r values between .20 and .39). Correlations between AHI, clinical variables, and speech features show the prevalence of formant frequencies over bandwidths, with F2/i/ being the most appropriate formant frequency for OSA prediction in women. Results obtained for the male population indicate mainly very weak correlations (r values between .01 and .19). In this case, bandwidths prevail over formant frequencies. Correlations between AHI, clinical variables, and craniofacial measurements are very weak. Conclusions: In accordance with previous studies, some clinical variables are found to be good predictors of OSA. Besides, strong correlations are found between AHI and some clinical variables with speech and facial features. Regarding speech features, the results show the prevalence of the formant frequency F2/i/ over the rest of the features for the female population as an OSA predictive feature. Although the correlation reported is weak, this study aims to find some traces that could explain the possible connection between OSA and speech in women. In the case of craniofacial measurements, the results evidence that some features that can be used for predicting OSA in male patients are not suitable for testing the female population. PMID:29109068
Mi, Lin; Tao, Sha; Wang, Wenjing; Dong, Qi; Guan, Jingjing; Liu, Chang
2016-03-01
The purpose of this study was to examine the relationship between English vowel identification and English vowel formant discrimination for native Mandarin Chinese- and native English-speaking listeners. The identification of 12 English vowels was measured with the duration cue preserved or removed. The thresholds of vowel formant discrimination on the F2 of two English vowels, /ʌ/ and /i/, were also estimated using an adaptive-tracking procedure. Native Mandarin Chinese-speaking listeners showed significantly higher thresholds of vowel formant discrimination and lower identification scores than native English-speaking listeners. The duration effect on English vowel identification was similar between native Mandarin Chinese- and native English-speaking listeners. Moreover, regardless of listeners' language background, vowel identification was significantly correlated with vowel formant discrimination for the listeners who were less dependent on duration cues, whereas the correlation between vowel identification and vowel formant discrimination was not significant for the listeners who were highly dependent on duration cues. This study revealed individual variability in using multiple acoustic cues to identify English vowels for both native and non-native listeners. Copyright © 2016 Elsevier B.V. All rights reserved.
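Formant discrimination thresholds of the kind estimated here typically come from an adaptive staircase. A sketch of a 2-down 1-up track (converging on the 70.7%-correct point) with a simulated listener; the multiplicative step sizes, stopping rule, and the listener's underlying sensitivity are all assumptions, not the study's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def p_correct(delta_hz, jnd=30.0):
    # Simulated 2AFC listener: chance is 50%; detectability grows with the F2 increment
    return 0.5 + 0.5 / (1.0 + np.exp(-(delta_hz - jnd) / 8.0))

delta = 100.0                    # starting F2 increment (Hz) above the standard
up, down = 2 ** 0.5, 2 ** -0.5   # multiplicative step sizes
consecutive, last_dir = 0, 0
reversals = []
while len(reversals) < 8:
    if rng.random() < p_correct(delta):
        consecutive += 1
        if consecutive == 2:     # two correct in a row -> make the task harder
            consecutive = 0
            delta *= down
            if last_dir == +1:
                reversals.append(delta)
            last_dir = -1
    else:
        consecutive = 0
        delta *= up              # any error -> make the task easier
        if last_dir == -1:
            reversals.append(delta)
        last_dir = +1

# Threshold estimate: geometric mean of the last six reversal points
print("F2 threshold =", round(float(np.exp(np.mean(np.log(reversals[-6:])))), 1), "Hz")
```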
Speaker Recognition Through NLP and CWT Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown-VanHoozer, S.A.; Kercel, S.W.; Tucker, R.W.
The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "large population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (of verbal predicate cues, e.g., see, sound, feel, etc.), while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequencies and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms, as well as the formant frequencies, are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.
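The CWT analysis described can be sketched with PyWavelets: a scalogram of a toy voiced signal separates the slow glottal-pulse structure (coarse scales) from formant-region energy (finer scales). The Morlet wavelet, the scale range, and the toy source-filter signal are assumptions for illustration, not the authors' implementation:

```python
import numpy as np
import pywt  # PyWavelets

fs = 8000
n = int(0.05 * fs)  # 50 ms analysis frame
# Toy "voiced vowel": glottal pulses at ~120 Hz exciting a 700 Hz formant
pulses = (np.arange(n) % (fs // 120) == 0).astype(float)
k = np.arange(200)
impulse_response = np.exp(-k / fs * 300.0) * np.sin(2 * np.pi * 700.0 * k / fs)
x = np.convolve(pulses, impulse_response)[:n]

scales = np.geomspace(2, 128, 48)
coefs, freqs = pywt.cwt(x, scales, "morl", sampling_period=1 / fs)
# Each row of `coefs` is the response at one scale; the row with the most
# energy indicates the dominant frequency region (here, near the formant),
# while coarse-scale rows carry the glottal-pulse (excitation) detail.
dominant = freqs[np.argmax(np.sum(np.abs(coefs) ** 2, axis=1))]
print(f"dominant CWT frequency = {dominant:.0f} Hz")
```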
ERIC Educational Resources Information Center
Masapollo, Matthew; Polka, Linda; Ménard, Lucie
2016-01-01
To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to…
Sapira, J D
1995-09-01
Egophony is a change in timbre (Ee to A) but not pitch or volume. It is due to a decrease in the amplitude and an increase in the frequency [corrected] of the second formant, produced by solid material (including compressed lung) interposed between the resonator and the stethoscope head. This explains certain difficulties in learning this valuable but currently neglected sign, as well as in understanding certain physiologic false-positive occurrences.
Dong, Li; Kong, Jiangping; Sundberg, Johan
2014-07-01
Long-term-average spectrum (LTAS) characteristics were analyzed for ten Kunqu Opera singers, two in each of five roles. Each singer performed singing, stage speech, and conversational speech. Differences between the roles and between their performances of these three conditions are examined. After compensating for the Leq difference, LTAS characteristics still differ between the roles but are similar across the three conditions, especially for the Colorful face (CF) and Old man roles, and especially between reading and singing. The curves show no evidence of a singer's formant cluster peak, but the CF role demonstrates a speaker's formant peak near 3 kHz. The LTAS characteristics deviate markedly from non-singers' standard conversational speech, as well as from those of Western opera singing.
Effect of artificially lengthened vocal tract on vocal fold oscillation's fundamental frequency.
Hanamitsu, Masakazu; Kataoka, Hideyuki
2004-06-01
The fundamental frequency of vocal fold oscillation (F(0)) is controlled by laryngeal mechanics and aerodynamic properties. The change in F(0) per unit change in transglottal pressure (dF/dP), studied using a shutter valve, has been found to have a nonlinear, V-shaped relationship with F(0). The vocal tract is also known to affect vocal fold oscillation. This study examined the effect of an artificially lengthened vocal tract on dF/dP. dF/dP was measured in six men using two mouthpieces of different lengths. The dF/dP graph for the longer vocal tract was shifted leftward relative to that for the shorter one. In simulations with the one-mass model, the nadir of the "V" on the dF/dP graph was strongly influenced by the resonance around the first formant frequency. However, a more precise model is needed to account for the effects of viscosity and turbulence.
Vowel category dependence of the relationship between palate height, tongue height, and oral area.
Hasegawa-Johnson, Mark; Pizza, Shamala; Alwan, Abeer; Cha, Jul Setsu; Haker, Katherine
2003-06-01
This article evaluates intertalker variance of oral area, logarithm of the oral area, tongue height, and formant frequencies as a function of vowel category. The data consist of coronal magnetic resonance imaging (MRI) sequences and acoustic recordings of 5 talkers, each producing 11 different vowels. Tongue height (left, right, and midsagittal), palate height, and oral area were measured in 3 coronal sections anterior to the oropharyngeal bend and were subjected to multivariate analysis of variance, variance ratio analysis, and regression analysis. The primary finding of this article is that oral area (between palate and tongue) showed less intertalker variance during production of vowels with an oral place of articulation (palatal and velar vowels) than during production of vowels with a uvular or pharyngeal place of articulation. Although oral area variance is place dependent, percentage variance (log area variance) is not place dependent. Midsagittal tongue height in the molar region was positively correlated with palate height during production of palatal vowels, but not during production of nonpalatal vowels. Taken together, these results suggest that small oral areas are characterized by relatively talker-independent vowel targets and that meeting these talker-independent targets is important enough that each talker adjusts his or her own tongue height to compensate for talker-dependent differences in constriction anatomy. Computer simulation results are presented to demonstrate that these results may be explained by an acoustic control strategy: When talkers with very different anatomical characteristics try to match talker-independent formant targets, the resulting area variances are minimized near the primary vocal tract constriction.
Snoring classified: The Munich-Passau Snore Sound Corpus.
Janott, Christoph; Schmitt, Maximilian; Zhang, Yue; Qian, Kun; Pandit, Vedhas; Zhang, Zixing; Heiser, Clemens; Hohenhorst, Winfried; Herzog, Michael; Hemmert, Werner; Schuller, Björn
2018-03-01
Snoring can be excited in different locations within the upper airways during sleep. It was hypothesised that the excitation locations are correlated with distinct acoustic characteristics of the snoring noise. To verify this hypothesis, a database of snore sounds is developed, labelled with the location of sound excitation. Video and audio recordings taken during drug induced sleep endoscopy (DISE) examinations from three medical centres have been semi-automatically screened for snore events, which subsequently have been classified by ENT experts into four classes based on the VOTE classification. The resulting dataset containing 828 snore events from 219 subjects has been split into Train, Development, and Test sets. An SVM classifier has been trained using low level descriptors (LLDs) related to energy, spectral features, mel frequency cepstral coefficients (MFCC), formants, voicing, harmonic-to-noise ratio (HNR), spectral harmonicity, pitch, and microprosodic features. An unweighted average recall (UAR) of 55.8% could be achieved using the full set of LLDs including formants. Best performing subset is the MFCC-related set of LLDs. A strong difference in performance could be observed between the permutations of train, development, and test partition, which may be caused by the relatively low number of subjects included in the smaller classes of the strongly unbalanced data set. A database of snoring sounds is presented which are classified according to their sound excitation location based on objective criteria and verifiable video material. With the database, it could be demonstrated that machine classifiers can distinguish different excitation location of snoring sounds in the upper airway based on acoustic parameters. Copyright © 2018 Elsevier Ltd. All rights reserved.
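The classification pipeline reported (low-level descriptors, an SVM, and unweighted average recall) maps directly onto scikit-learn. A sketch on synthetic features with the corpus's event count and four VOTE classes; the actual LLD extraction (MFCCs, formants, HNR, etc.) is not reproduced here:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

rng = np.random.default_rng(2)
# Stand-in for low-level descriptors of 828 snore events, labelled with
# the four VOTE excitation classes; real features would be MFCCs,
# formants, HNR, and so on, extracted per event.
X = rng.normal(size=(828, 40))
y = rng.integers(0, 4, size=828)
X[y == 2] += 0.8  # give one class some artificial separability

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X_tr, y_tr)

# Unweighted average recall (UAR): the mean of per-class recalls, which is
# robust to the strongly unbalanced class sizes noted in the abstract
print("UAR:", recall_score(y_te, clf.predict(X_te), average="macro"))
```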
NASA Astrophysics Data System (ADS)
Kim, Yunjung; Weismer, Gary; Kent, Ray D.
2005-09-01
In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that were different than values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]
How musical expertise shapes speech perception: evidence from auditory classification images.
Varnet, Léo; Wang, Tianyun; Peter, Chloe; Meunier, Fanny; Hoen, Michel
2015-09-24
It is now well established that extensive musical training percolates to higher levels of cognition, such as speech processing. However, the lack of a precise technique to investigate the specific listening strategy involved in speech comprehension has made it difficult to determine how musicians' higher performance in non-speech tasks contributes to their enhanced speech comprehension. The recently developed Auditory Classification Image approach reveals the precise time-frequency regions used by participants when performing phonemic categorizations in noise. Here we used this technique on 19 non-musicians and 19 professional musicians. We found that both groups used very similar listening strategies, but the musicians relied more heavily on the two main acoustic cues: the onset of the first formant and the onsets of the second and third formants. Additionally, they responded more consistently to stimuli. These observations provide a direct visualization of auditory plasticity resulting from extensive musical training and shed light on the level of functional transfer between auditory processing and speech perception.
Auditory Cortex Processes Variation in Our Own Speech
Sitek, Kevin R.; Mathalon, Daniel H.; Roach, Brian J.; Houde, John F.; Niziolek, Caroline A.; Ford, Judith M.
2013-01-01
As we talk, we unconsciously adjust our speech to ensure it sounds the way we intend it to sound. However, because speech production involves complex motor planning and execution, no two utterances of the same sound will be exactly the same. Here, we show that auditory cortex is sensitive to natural variations in self-produced speech from utterance to utterance. We recorded event-related potentials (ERPs) from ninety-nine subjects while they uttered “ah” and while they listened to those speech sounds played back. Subjects' utterances were sorted based on their formant deviations from the previous utterance. Typically, the N1 ERP component is suppressed during talking compared to listening. By comparing ERPs to the least and most variable utterances, we found that N1 was less suppressed to utterances that differed greatly from their preceding neighbors. In contrast, an utterance's difference from the median formant values did not affect N1. Trial-to-trial pitch (f0) deviation and pitch difference from the median similarly did not affect N1. We discuss mechanisms that may underlie the change in N1 suppression resulting from trial-to-trial formant change. Deviant utterances require additional auditory cortical processing, suggesting that speaking-induced suppression mechanisms are optimally tuned for a specific production. PMID:24349399
On pure word deafness, temporal processing, and the left hemisphere.
Stefanatos, Gerry A; Gershkoff, Arthur; Madigan, Sean
2005-07-01
Pure word deafness (PWD) is a rare neurological syndrome characterized by severe difficulties in understanding and reproducing spoken language, with sparing of written language comprehension and speech production. The pathognomonic disturbance of auditory comprehension appears to be associated with a breakdown in processes involved in mapping auditory input to lexical representations of words, but the functional locus of this disturbance and the localization of the responsible lesion have long been disputed. We report here on a woman with PWD resulting from a circumscribed unilateral infarct involving the left superior temporal lobe who demonstrated significant problems processing transitional spectrotemporal cues in both speech and nonspeech sounds. On speech discrimination tasks, she exhibited poor differentiation of stop consonant-vowel syllables distinguished by voicing onset and brief formant frequency transitions. Isolated formant transitions could be reliably discriminated only at very long durations (> 200 ms). By contrast, click fusion threshold, which depends on millisecond-level resolution of brief auditory events, was normal. These results suggest that the problems with speech analysis in this case were not secondary to general constraints on auditory temporal resolution. Rather, they point to a disturbance of left hemisphere auditory mechanisms that preferentially analyze rapid spectrotemporal variations in frequency. The findings have important implications for our conceptualization of PWD and its subtypes.
An acoustic study of nasal consonants in three Central Australian languages.
Tabain, Marija; Butcher, Andrew; Breen, Gavan; Beare, Richard
2016-02-01
This study presents nasal consonant data from 21 speakers of three Central Australian languages: Arrernte, Pitjantjatjara and Warlpiri. The six nasals considered are bilabial /m/, dental /n̪/, alveolar /n/, retroflex /ɳ/, alveo-palatal /ɲ/, and velar /ŋ/. Nasal formant and bandwidth values are examined, as are the locations of spectral minima. Several differences are found between the bilabial /m/ and the velar /ŋ/, and also the palatal /ɲ/. The remaining coronal nasals /n̪ n ɳ/ are not well differentiated within the nasal murmur, but their average bandwidths are lower than for the other nasal consonants. Broader spectral shape measures (Centre of Gravity and Standard Deviation) are also considered, and comparisons are made with data for stops and laterals in these languages based on the same spectral measures. It is suggested that nasals are not as easily differentiated using the various measures examined here as are stops and laterals. It is also suggested that existing models of nasal consonants do not fully account for the observed differences between the various nasal places of articulation, and that oral formants, in addition to anti-formants, contribute substantially to the output spectrum of nasal consonants.
2017-01-01
In three experiments, we asked whether diverse scripts contain interpretable information about the speech sounds they represent. When presented with a pair of unfamiliar letters, adult readers correctly guess which is /i/ (the ‘ee’ sound in ‘feet’), and which is /u/ (the ‘oo’ sound in ‘shoe’) at rates higher than expected by chance, as shown in a large sample of Singaporean university students (Experiment 1) and replicated in a larger sample of international Internet users (Experiment 2). To uncover what properties of the letters contribute to different scripts' ‘guessability,’ we analysed the visual spatial frequencies in each letter (Experiment 3). We predicted that the lower spectral frequencies in the formants of the vowel /u/ would pattern with lower spatial frequencies in the corresponding letters. Instead, we found that across all spatial frequencies, the letter with more black/white cycles (i.e. more ink) was more likely to be guessed as /u/, and the larger the difference between the glyphs in a pair, the higher the script's guessability. We propose that diverse groups of humans across historical time and geographical space tend to employ similar iconic strategies for representing speech in visual form, and provide norms for letter pairs from 56 diverse scripts. PMID:28989784
Body height, immunity, facial and vocal attractiveness in young men.
Skrinda, Ilona; Krama, Tatjana; Kecko, Sanita; Moore, Fhionna R; Kaasik, Ants; Meija, Laila; Lietuvietis, Vilnis; Rantala, Markus J; Krams, Indrikis
2014-12-01
Health, facial and vocal attributes and body height of men may affect a diverse range of social outcomes such as attractiveness to potential mates and competition for resources. Despite evidence that each parameter plays a role in mate choice, the relative role of each, and the inter-relationships between them, are still poorly understood. In this study, we tested relationships both between these parameters and with testosterone and immune function. We report positive relationships of testosterone with facial masculinity and attractiveness, and we found that facial masculinity predicted facial attractiveness and antibody response to a vaccine. Moreover, the relationship between antibody response to a hepatitis B vaccine and body height was found to be non-linear, with a positive relationship up to a height of 188 cm, but an inverse relationship in taller men. We found that vocal attractiveness was dependent upon vocal masculinity. The relationship between vocal attractiveness and body height was also non-linear, with a positive relationship up to 178 cm, which then decreased in taller men. We did not find a significant relationship between body height and the fundamental frequency of vowel sounds provided by young men, while body height negatively correlated with the frequency of the second formant. However, formant frequency was not associated with the strength of immune response. Our results demonstrate the potential of vaccination research to reveal costly traits that govern the evolution of mate choice in humans, and the importance of trade-offs among these traits.
Jafari, Narges; Drinnan, Michael; Mohamadi, Reyhane; Yadegari, Fariba; Nourbakhsh, Mandana; Torabinezhad, Farhad
2016-05-01
Normal-hearing (NH) acuity and auditory feedback control are crucial for human voice production and articulation. The lack of auditory feedback in individuals with profound hearing impairment changes their vowel production. The purpose of this study was to compare Persian vowel production in deaf children with cochlear implants (CIs) with that in NH children. The participants were 20 children (12 girls and 8 boys) aged 5 years 1 month to 9 years. All patients had congenital hearing loss and received a multichannel CI at an average age of 3 years; they had at least 6 months of experience with their current device. The control group consisted of 20 NH children (12 girls and 8 boys) aged 5 to 9 years. The two groups were matched by age. Participants were native Persian speakers who were asked to produce the vowels /i/, /e/, /ӕ/, /u/, /o/, and /a/. The averages for the first formant frequency (F1) and second formant frequency (F2) of the six vowels were measured using Praat software (Version 5.1.44, Boersma & Weenink, 2012). Independent-samples t tests were conducted to assess the differences in F1 and F2 values and in the area of the vowel space between the two groups. Mean values of F1 were increased in CI children; the mean values of F1 for the vowels /i/ and /a/, and of F2 for the vowels /a/ and /o/, differed significantly (P < 0.05). The changes in F1 and F2 showed a centralized vowel space for CI children. F1 is increased in CI children, probably because CI children tend to overarticulate. We hypothesize that this is due to a lack of auditory feedback: hearing-impaired children attempt to compensate via proprioceptive feedback during the articulatory process. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
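Formant measurements like these are commonly scripted against Praat. A sketch using the parselmouth library (the study used Praat directly, not this library); the file name, interval times, and the raised formant ceiling for child speech are assumptions:

```python
import numpy as np
import parselmouth  # Python interface to Praat

def mean_f1_f2(path, t_start, t_end, max_formant=8000.0):
    """Average F1/F2 over a vowel interval via Praat's Burg method.

    `max_formant` is raised above the adult default (5500 Hz) because
    children's shorter vocal tracts shift formants upward.
    """
    snd = parselmouth.Sound(path)
    formant = snd.to_formant_burg(maximum_formant=max_formant)
    times = np.linspace(t_start, t_end, 10)
    f1 = np.nanmean([formant.get_value_at_time(1, t) for t in times])
    f2 = np.nanmean([formant.get_value_at_time(2, t) for t in times])
    return f1, f2

# Hypothetical segmented token of Persian /i/ (file name and times assumed)
# print(mean_f1_f2("child_i.wav", 0.12, 0.28))
```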
Modelling category goodness judgments in children with residual sound errors.
Dugan, Sarah Hamilton; Silbert, Noah; McAllister, Tara; Preston, Jonathan L; Sotto, Carolyn; Boyce, Suzanne E
2018-05-24
This study investigates category goodness judgments of /r/ in adults and children with and without residual speech errors (RSEs) using natural speech stimuli. Thirty adults, 38 children with RSE (ages 7-16) and 35 age-matched typically developing (TD) children provided category goodness judgments on whole words, recorded from 27 child speakers, with /r/ in various phonetic environments. The salient acoustic property of /r/ - the lowered third formant (F3) - was normalized in two ways. A logistic mixed-effect model quantified the relationships between listeners' responses and the third formant frequency, vowel context and clinical group status. Goodness judgments from the adult group showed a statistically significant interaction with the F3 parameter when compared to both child groups (p < 0.001) using both normalization methods. The RSE group did not differ significantly from the TD group in judgments of /r/. All listeners were significantly more likely to judge /r/ as correct in a front-vowel context. Our results suggest that normalized /r/ F3 is a statistically significant predictor of category goodness judgments for both adults and children, but children do not appear to make adult-like judgments. Category goodness judgments do not have a clear relationship with /r/ production abilities in children with RSE. These findings may have implications for clinical activities that include category goodness judgments in natural speech, especially for recorded productions.
The "Overdrive" Mode in the "Complete Vocal Technique": A Preliminary Study.
Sundberg, Johan; Bitelli, Maddalena; Holmberg, Annika; Laaksonen, Ville
2017-09-01
"Complete Vocal Technique," or CVT, is an internationally widespread method for teaching voice. It classifies voicing into four types, referred to as "vocal modes," one of which is called "Overdrive." The physiological correlates of these types are unclear. This study presents an attempt to analyze its voice source and formant frequency characteristics. A male and a female expert of CVT sang a set of "Overdrive" and falsetto tones on the syllable /pᴂ/. The voice source could be analyzed by inverse filtering in the case of the male subject. Results showed that subglottal pressure, measured as the oral pressure during /p/ occlusion, was low in falsetto and high in "Overdrive", and it was strongly correlated with each of the voice source parameters. These correlations could be described in terms of equations. The deviations from these equations of the different voice source parameters for the various voice samples suggested that "Overdrive" phonation was produced with stronger vocal fold adduction than the falsetto tones. Further, the subject was also found to tune the first formant to the second partial in "Overdrive" tones. The results support the conclusion that the method used, to compensate for the influence of subglottal pressure on the voice source, seems promising to use for analyses of other CVT vocal modes and also for other types of phonation. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Gilichinskaya, Yana D.; Hisagi, Miwako; Law, Franzo F.; Berkowitz, Shari; Ito, Kikuyo
2005-04-01
Contextual variability of vowels in three languages with large vowel inventories was examined previously. Here, variability of vowels in two languages with small inventories (Russian, Japanese) was explored. Vowels were produced by three female speakers of each language in four contexts: (Vba) disyllables and in 3-syllable nonsense words (gaC1VC2a) embedded within carrier sentences; contexts included bilabial stops (bVp) in normal rate sentences and alveolar stops (dVt) in both normal and rapid rate sentences. Dependent variables were syllable durations and formant frequencies at syllable midpoint. Results showed very little variation across consonant and rate conditions in formants for /i/ in both languages. Japanese short /u, o, a/ showed fronting (F2 increases) in alveolar context relative to labial context (1.3-2.0 Barks), which was more pronounced in rapid sentences. Fronting of Japanese long vowels was less pronounced (0.3 to 0.9 Barks). Japanese long/short vowel ratios varied with speaking style (syllables versus sentences) and speaking rate. All Russian vowels except /i/ were fronted in alveolar vs labial context (1.1-3.1 Barks) but showed little change in either spectrum or duration with speaking rate. Comparisons of these patterns of variability with American English, French and German vowel results will be discussed.
Reby, D; Wyman, M T; Frey, R; Passilongo, D; Gilbert, J; Locatelli, Y; Charlton, B D
2016-04-15
With an average male body mass of 320 kg, the wapiti, Cervus canadensis, is the largest extant species of Old World deer (Cervinae). Despite this large body size, male wapiti produce whistle-like sexual calls called bugles characterised by an extremely high fundamental frequency. Investigations of the biometry and physiology of the male wapiti's relatively large larynx have so far failed to account for the production of such a high fundamental frequency. Our examination of spectrograms of male bugles suggested that the complex harmonic structure is best explained by a dual-source model (biphonation), with one source oscillating at a mean of 145 Hz (F0) and the other oscillating independently at an average of 1426 Hz (G0). A combination of anatomical investigations and acoustical modelling indicated that the F0 of male bugles is consistent with the vocal fold dimensions reported in this species, whereas the secondary, much higher source at G0 is more consistent with an aerodynamic whistle produced as air flows rapidly through a narrow supraglottic constriction. We also report a possible interaction between the higher frequency G0 and vocal tract resonances, as G0 transiently locks onto individual formants as the vocal tract is extended. We speculate that male wapiti have evolved such a dual-source phonation to advertise body size at close range (with a relatively low-frequency F0 providing a dense spectrum to highlight size-related information contained in formants) while simultaneously advertising their presence over greater distances using the very high-amplitude G0 whistle component. © 2016. Published by The Company of Biologists Ltd.
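The dual-source account is easy to illustrate by synthesis: adding an independent, near-sinusoidal whistle at G0 to a harmonically rich vocal-fold source at F0 yields two interleaved spectral series rather than one harmonic series. A sketch using the mean frequencies reported above; the amplitudes and partial counts are arbitrary choices:

```python
import numpy as np

fs = 44100
t = np.arange(int(0.5 * fs)) / fs
f0, g0 = 145.0, 1426.0  # mean source frequencies reported for male bugles

# Vocal-fold source: pulse-like, rich in harmonics (first 30 partials)
fold_source = sum(np.sin(2 * np.pi * f0 * n * t) / n for n in range(1, 31))
# Aerodynamic whistle: nearly sinusoidal and high in amplitude
whistle = 2.0 * np.sin(2 * np.pi * g0 * t)

# Biphonation: the two sources oscillate independently, so the spectrum
# shows two interleaved series rather than a single harmonic stack
bugle = fold_source + whistle
spec = np.abs(np.fft.rfft(bugle))
freqs = np.fft.rfftfreq(bugle.size, 1 / fs)
print("strongest component:", freqs[np.argmax(spec)], "Hz")  # ~G0
```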
Zhou, Xinhui; Espy-Wilson, Carol Y.; Boyce, Suzanne; Tiede, Mark; Holland, Christy; Choe, Ann
2008-01-01
Speakers of rhotic dialects of North American English show a range of different tongue configurations for /r/. These variants produce acoustic profiles that are indistinguishable for the first three formants [Delattre, P., and Freeman, D. C. (1968). “A dialect study of American English r’s by x-ray motion picture,” Linguistics 44, 28–69; Westbury, J. R. et al. (1998). “Differences among speakers in lingual articulation for American English /r/,” Speech Commun. 26, 203–206]. It is puzzling why this should be so, given the very different vocal tract configurations involved. In this paper, two subjects whose productions of “retroflex” /r/ and “bunched” /r/ show similar patterns of F1–F3 but very different spacing between F4 and F5 are contrasted. Using finite element analysis and area functions based on magnetic resonance images of the vocal tract for sustained productions, the results of computer vocal tract models are compared to actual speech recordings. In particular, formant-cavity affiliations are explored using formant sensitivity functions and vocal tract simple-tube models. The difference in F4/F5 patterns between the subjects is confirmed for several additional subjects with retroflex and bunched vocal tract configurations. The results suggest that the F4/F5 differences between the variants can be largely explained by differences in whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator. PMID:18537397
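The half- versus quarter-wavelength account can be made concrete with the simple-tube idealization the paper mentions. A sketch under that idealization (cavity length and speed of sound below are illustrative, not the study's measurements):

```python
# Resonances of a uniform tube, showing why the same back cavity gives
# different upper-formant spacing as a quarter-wavelength resonator
# (closed-open) versus a half-wavelength resonator (matched ends).

C = 35000.0  # speed of sound in warm, moist air, cm/s

def quarter_wave(L_cm: float, n: int) -> float:   # closed at one end
    return (2 * n - 1) * C / (4 * L_cm)

def half_wave(L_cm: float, n: int) -> float:      # same condition at both ends
    return n * C / (2 * L_cm)

L = 8.0  # hypothetical back-cavity length, cm
print([round(quarter_wave(L, n)) for n in (1, 2, 3)])  # ~1094, 3281, 5469 Hz
print([round(half_wave(L, n)) for n in (1, 2, 3)])     # ~2188, 4375, 6563 Hz
```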
Torsional resonance frequency analysis: a novel method for assessment of dental implant stability.
Tang, Yu-Long; Li, Bing; Jin, Wei; Li, De-Hua
2015-06-01
To establish and experimentally validate a novel resonance frequency analysis (RFA) method for measurement of dental implant stability by analyzing torsional resonance frequency (TRF). A numerical study and in vitro measurements were performed to evaluate the feasibility and reliability of the method of torsional RFA (T-RFA) using a T-shaped bilateral cantilever beam transducer. The sensitivity of this method was assessed by measuring the TRFs of dental implants with 8 sizes of T-shaped transducers during polymerization, which simulated the process of bone healing around an implant. The TRFs of the test implants detected using this new method and the bending resonance frequencies (BRFs) measured by Osstell(®) ISQ were compared. TRFs and BRFs on implant models in polymethyl methacrylate (PMMA) blocks with three exposure heights were also measured to assess the specificity of this method. Finite element analysis showed two bending modes (5333 and 6008 Hz) followed by a torsional mode (8992 Hz) in the lower frequency range. During in vitro measurements, a bending formant (mean 6075 Hz) and a torsional formant (mean 10225 Hz) appeared, which were verified by multipoint measurement with invariable excitation frequency in the laboratory. In the self-curing resin experiments, the average growth rate across all time points of TRFs using the new method with Transducer II was 2.36%, and that of BRFs using Osstell(®) ISQ was 1.97%. In the implant exposure height tests, the mean rate of decline of TRFs was 2.06%, and that of BRFs using Osstell(®) ISQ was 12.34%. A novel method for assessment of implant stability through TRF was established using a T-shaped transducer, which showed high reliability and sensitivity. The method alleviated the effects of implant exposure height on the measurements compared with Osstell(®) ISQ. The application of T-RFA represents an alternative approach to investigating dental implant osseointegration. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Objective and subjective assessment of tracheoesophageal prosthesis voice outcome.
D'Alatri, Lucia; Bussu, Francesco; Scarano, Emanuele; Paludetti, Gaetano; Marchese, Maria Raffaella
2012-09-01
To investigate the relationships between objective measures and the results of subjective assessment of voice quality and speech intelligibility in patients who had undergone total laryngectomy and tracheoesophageal (TE) puncture. Retrospective. Twenty patients implanted with voice prosthesis were studied. After surgery, all patients underwent speech rehabilitation. The assessment protocol included maximum phonation time (MPT), number of syllables per deep breath, acoustic analysis of the sustained vowel /a/ and of a bisyllabic word, perceptual evaluation (pleasantness and intelligibility%), and self-assessment. The correlation between pleasantness and intelligibility% was statistically significant. Both were significantly correlated with the acoustic signal type, the number of formant peaks, and the F2-F1 difference. The intelligibility% and number of formant peaks were significantly correlated with the MPT and number of syllables per deep breath. Moreover, significant correlations were found between the number of formant peaks and both intelligibility% and pleasantness. The higher the number of syllables per deep breath and the longer the MPT, the higher the number of formant peaks and the intelligibility%. The study failed to show a significant correlation between patients' self-assessment of voice quality and both pleasantness and communication effectiveness. The multidimensional assessment seems to be a reliable tool to evaluate the TE functional outcome. In particular, the results showed that both pleasantness and intelligibility of TE speech are correlated with the availability of expired air and the function of the vocal tract. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
OVERLAP OF HEARING AND VOICING RANGES IN SINGING
Hunter, Eric J.; Titze, Ingo R.
2008-01-01
Frequency and intensity ranges in voice production by trained and untrained singers were superimposed onto the average normal human hearing range. The vocal output for all subjects was shown both in Voice Range Profiles and Spectral Level Profiles. Trained singers took greater advantage of the dynamic range of the auditory system with harmonic energy (45% of the hearing range compared to 38% for untrained vocalists). This difference seemed to come from the trained singers’ ability to exploit the most sensitive part of the hearing range (around 3 to 4 kHz) through the use of the singer’s formant. The trained vocalists’ average maximum third-octave spectral band level was 95 dB SPL, compared to 80 dB SPL for untrained. PMID:19844607
Acoustic evolution of old Italian violins from Amati to Stradivari.
Tai, Hwan-Ching; Shen, Yen-Ping; Lin, Jer-Horng; Chung, Dai-Ting
2018-06-05
The shape and design of the modern violin are largely influenced by two makers from Cremona, Italy: The instrument was invented by Andrea Amati and then improved by Antonio Stradivari. Although the construction methods of Amati and Stradivari have been carefully examined, the underlying acoustic qualities which contribute to their popularity are little understood. According to Geminiani, a Baroque violinist, the ideal violin tone should "rival the most perfect human voice." To investigate whether Amati and Stradivari violins produce voice-like features, we recorded the scales of 15 antique Italian violins as well as male and female singers. The frequency response curves are similar between the Andrea Amati violin and human singers, up to ∼4.2 kHz. By linear predictive coding analyses, the first two formants of the Amati exhibit vowel-like qualities (F1/F2 = 503/1,583 Hz), mapping to the central region on the vowel diagram. Its third and fourth formants (F3/F4 = 2,602/3,731 Hz) resemble those produced by male singers. Using F1 to F4 values to estimate the corresponding vocal tract length, we observed that antique Italian violins generally resemble basses/baritones, but Stradivari violins are closer to tenors/altos. Furthermore, the vowel qualities of Stradivari violins show reduced backness and height. The unique formant properties displayed by Stradivari violins may represent the acoustic correlate of their distinctive brilliance perceived by musicians. Our data demonstrate that the pioneering designs of Cremonese violins exhibit voice-like qualities in their acoustic output. Copyright © 2018 the Author(s). Published by PNAS.
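The vocal tract length estimate from F1 to F4 can be approximated by treating the resonator as a uniform quarter-wavelength tube, a standard first-order model; whether the authors used exactly this procedure is an assumption. A minimal sketch using the Amati formant values quoted above:

```python
# Estimate an effective "vocal tract" length from measured formants under
# the closed-open uniform-tube approximation, where Fn = (2n - 1) * c / (4L).
import numpy as np

C = 35000.0  # speed of sound, cm/s

def vtl_from_formants(formants_hz):
    """Invert Fn = (2n - 1) * c / (4L) per formant and average the lengths."""
    est = [(2 * n - 1) * C / (4 * f) for n, f in enumerate(formants_hz, start=1)]
    return float(np.mean(est))

amati = [503.0, 1583.0, 2602.0, 3731.0]  # F1-F4 from the abstract
print(f"Effective tract length: {vtl_from_formants(amati):.1f} cm")  # ~16.8 cm
```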
Distinct Acoustic Features and Glottal Changes Define Two Modes of Singing in Peking Opera.
Li, Gelin; Li, Haiqing; Hou, Qian; Jiang, Zhen
2018-04-06
We aimed to delineate the acoustic characteristics of the Laodan and Qingyi roles in Peking Opera and to define glottal closure states and mucosal wave changes during singing in the two roles. The range of singing at A4 (440 Hz) pitch in seven female Peking Opera singers was determined using two classic pieces of Peking Opera. Glottal changes during singing were examined with a stroboscopic laryngoscope. The fundamental frequency of /i/ in the first 15 seconds of the two pieces and the /i/ pitch range were determined. The relative length of the glottal fissure and the relative maximum mucosal amplitude were calculated. Qingyi had a significantly higher mean fundamental frequency than Laodan. The long-term average spectrum showed an obvious formant cluster near 3000 Hz in Laodan versus Qingyi. No formant cluster was observed in singing in the regular mode. Strobe laryngoscopy showed complete glottal closure in Laodan and incomplete glottal closure in Qingyi in the maximal glottal closure phase. The relative length of the glottal fissure of Laodan was significantly lower than that of Qingyi in the singing mode. The relative maximum mucosal amplitude of Qingyi was significantly lower than that of Laodan. The Laodan and Qingyi roles in Peking Opera sing in fundamental frequency ranges compatible with the respective use of da sang (big voice) and xiao sang (small voice). The morphological patterns of glottal changes also indicate that the Laodan role and the Qingyi role sing with da sang and xiao sang, respectively. Copyright © 2018 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Clear Speech Variants: An Acoustic Study in Parkinson's Disease.
Lam, Jennifer; Tjaden, Kris
2016-08-01
The authors investigated how different variants of clear speech affect segmental and suprasegmental acoustic measures of speech in speakers with Parkinson's disease and a healthy control group. A total of 14 participants with Parkinson's disease and 14 control participants served as speakers. Each speaker produced 18 different sentences selected from the Sentence Intelligibility Test (Yorkston & Beukelman, 1996). All speakers produced stimuli in 4 speaking conditions (habitual, clear, overenunciate, and hearing impaired). Segmental acoustic measures included vowel space area and first moment (M1) coefficient difference measures for consonant pairs. Second formant slope of diphthongs and measures of vowel and fricative durations were also obtained. Suprasegmental measures included fundamental frequency, sound pressure level, and articulation rate. For the majority of adjustments, all variants of clear speech instruction differed from the habitual condition. The overenunciate condition elicited the greatest magnitude of change for segmental measures (vowel space area, vowel durations) and the slowest articulation rates. The hearing impaired condition elicited the greatest fricative durations and suprasegmental adjustments (fundamental frequency, sound pressure level). Findings have implications for a model of speech production for healthy speakers as well as for speakers with dysarthria. Findings also suggest that particular clear speech instructions may target distinct speech subsystems.
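Vowel space area is commonly computed as the area of the polygon spanned by corner-vowel (F1, F2) means; a minimal sketch under that convention, with illustrative values rather than the study's data:

```python
# Vowel space area (VSA) via the shoelace formula over corner-vowel means.

def vowel_space_area(corners):
    """corners: list of (F1, F2) tuples ordered around the perimeter."""
    area = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

quad = [(300, 2300), (700, 1800), (650, 1100), (350, 900)]  # /i ae a u/, Hz
print(f"VSA = {vowel_space_area(quad):.0f} Hz^2")
```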
Alexander, Joshua M.
2016-01-01
By varying parameters that control nonlinear frequency compression (NFC), this study examined how different ways of compressing inaudible mid- and/or high-frequency information at lower frequencies influences perception of consonants and vowels. Twenty-eight listeners with mild to moderately severe hearing loss identified consonants and vowels from nonsense syllables in noise following amplification via a hearing aid simulator. Low-pass filtering and the selection of NFC parameters fixed the output bandwidth at a frequency representing a moderately severe (3.3 kHz, group MS) or a mild-to-moderate (5.0 kHz, group MM) high-frequency loss. For each group (n = 14), effects of six combinations of NFC start frequency (SF) and input bandwidth [by varying the compression ratio (CR)] were examined. For both groups, the 1.6 kHz SF significantly reduced vowel and consonant recognition, especially as CR increased; whereas, recognition was generally unaffected if SF increased at the expense of a higher CR. Vowel recognition detriments for group MS were moderately correlated with the size of the second formant frequency shift following NFC. For both groups, significant improvement (33%–50%) with NFC was confined to final /s/ and /z/ and to some VCV tokens, perhaps because of listeners' limited exposure to each setting. No set of parameters simultaneously maximized recognition across all tokens. PMID:26936574
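One common formulation of NFC leaves frequencies below the start frequency (SF) unchanged and compresses frequencies above it in the log domain by the compression ratio (CR); whether the simulator in this study used exactly this mapping is an assumption. A sketch:

```python
# Nonlinear frequency compression mapping: input frequencies above SF are
# log-compressed toward SF, lowering high-frequency cues into the audible
# output bandwidth.

def nfc_map(f_in: float, sf: float, cr: float) -> float:
    if f_in <= sf:
        return f_in                                # pass-through region
    return sf * (f_in / sf) ** (1.0 / cr)          # log-domain compression

# A 6 kHz /s/ cue with SF = 1.6 kHz lands progressively lower as CR grows:
for cr in (1.5, 2.0, 3.0):
    print(cr, round(nfc_map(6000.0, 1600.0, cr)))  # ~3862, 3098, 2486 Hz
```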
Behavioral and subcortical signatures of musical expertise in Mandarin Chinese speakers
Tervaniemi, Mari; Aalto, Daniel
2018-01-01
Both musical training and native language have been shown to have experience-based plastic effects on auditory processing. However, the combined effects within individuals are unclear. Recent research suggests that musical training and tone language speaking are not clearly additive in their effects on processing of auditory features and that there may be a disconnect between perceptual and neural signatures of auditory feature processing. The literature has only recently begun to investigate the effects of musical expertise on basic auditory processing for different linguistic groups. This work provides a profile of primary auditory feature discrimination for Mandarin speaking musicians and nonmusicians. The musicians showed enhanced perceptual discrimination for both frequency and duration as well as enhanced duration discrimination in a multifeature discrimination task, compared to nonmusicians. However, there were no differences between the groups in duration processing of nonspeech sounds at a subcortical level or in subcortical frequency representation of a nonnative tone contour, for fo or for the first or second formant region. The results indicate that musical expertise provides a cognitive, but not subcortical, advantage in a population of Mandarin speakers. PMID:29300756
The Interaction of Lexical Characteristics and Speech Production in Parkinson's Disease.
Chiu, Yi-Fang; Forrest, Karen
2017-01-01
This study sought to investigate the interaction of speech movement execution with higher order lexical parameters. The authors examined how lexical characteristics affect speech output in individuals with Parkinson's disease (PD) and healthy control (HC) speakers. Twenty speakers with PD and 12 healthy speakers read sentences with target words that varied in word frequency and neighborhood density. The formant transitions (F2 slopes) of the diphthongs in the target words were compared across lexical categories between PD and HC groups. Both groups of speakers produced steeper F2 slopes for the diphthongs in less frequent words and words from sparse neighborhoods. The magnitude of the increase in F2 slopes was significantly less in the PD than HC group. The lexical effect on the F2 slope differed among the diphthongs and between the 2 groups. PD and healthy speakers varied their acoustic output on the basis of word frequency and neighborhood density. F2 slope variations can be traced to higher level lexical differences. This lexical effect on articulation, however, appears to be constrained by PD.
Social Communication and Vocal Recognition in Free-Ranging Rhesus Monkeys
NASA Astrophysics Data System (ADS)
Rendall, Christopher Andrew
Kinship and individual identity are key determinants of primate sociality, and the capacity for vocal recognition of individuals and kin is hypothesized to be an important adaptation facilitating intra-group social communication. Research was conducted on adult female rhesus monkeys on Cayo Santiago, Puerto Rico to test this hypothesis for three acoustically distinct calls characterized by varying selective pressures on communicating identity: coos (contact calls), grunts (close-range social calls), and noisy screams (agonistic recruitment calls). Vocalization playback experiments confirmed a capacity for both individual and kin recognition of coos, but not screams (grunts were not tested). Acoustic analyses, using traditional spectrographic methods as well as linear predictive coding techniques, indicated that coos (but not grunts or screams) were highly distinctive, and that the effects of vocal tract filtering (formants) contributed more to statistical discriminations of both individuals and kin groups than did temporal or laryngeal source features. Formants were identified from very short (23 ms) segments of coos and were stable within calls, indicating that formant cues to individual and kin identity were available throughout a call. This aspect of formant cues is predicted to be an especially important design feature for signaling identity efficiently in complex acoustic environments. Results of playback experiments involving manipulated coo stimuli provided preliminary perceptual support for the statistical inference that formant cues take precedence in facilitating vocal recognition. The similarity of formants among female kin suggested a mechanism for the development of matrilineal vocal signatures from the genetic and environmental determinants of vocal tract morphology shared among relatives. The fact that screams, calls strongly expected to communicate identity, were neither individually distinctive nor recognized suggested that their acoustic structure and role in signaling identity might be constrained by functional or morphological design requirements associated with their role in signaling submission.
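Formant identification from very short segments, as described here, is typically done with linear predictive coding. A minimal sketch of that step, with an assumed model order and illustrative frequency limits (not the dissertation's exact analysis settings):

```python
# Estimate formants from a short frame (e.g., 23 ms) via LPC: fit an
# all-pole model from the autocorrelation, then read candidate formants off
# the angles of the complex roots of the prediction polynomial.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, sr, order=12):
    frame = frame * np.hamming(len(frame))
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])  # normal eqs.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]               # keep one of each pair
    freqs = np.angle(roots) * sr / (2 * np.pi)
    return sorted(f for f in freqs if 90 < f < sr / 2 - 90)

# e.g., formants = lpc_formants(signal[i:i + int(0.023 * sr)].astype(float), sr)
```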
The influence of formant levels on the perception of synthetic vowel sounds
NASA Astrophysics Data System (ADS)
Kubzdela, Henryk; Owsianny, Mariuz
A computer model of a generator of periodic complex sounds simulating vowels was developed. The system makes possible independent regulation of the level of each of the formants and instant generation of the sound. A trapezoid approximates the spectral envelope within the range of each formant. Using this model, each member of a group of six listeners experimentally selected synthesis parameters for six sounds that seemed to him optimal approximations of Polish vowels. From these, another six sounds were selected that were identified by a majority of the six listeners and several additional listeners as best qualified to serve as prototypes of Polish vowels. These prototypes were then used to randomly generate sounds with various combinations of the levels of the second and third formants, and these were presented to seven listeners for identification. The identification results are presented in tables in three variants and are discussed from the point of view of the requirements of automatic recognition of vowels in continuous speech.
Speaking-rate-induced variability in F2 trajectories.
Tjaden, K; Weismer, G
1998-10-01
This study examined speaking-rate-induced spectral and temporal variability of F2 formant trajectories for target words produced in a carrier phrase at speaking rates ranging from fast to slow. F2 onset frequency, measured at the first glottal pulse following the stop consonant release in target words, was used to quantify the extent to which adjacent consonantal and vocalic gestures overlapped; F2 target frequency was operationally defined as the first occurrence of a frequency minimum or maximum following F2 onset frequency. Regression analyses indicated that 70% of functions relating F2 onset and vowel duration were statistically significant. The strength of the effect was variable, however, and the direction of significant functions often differed from that predicted by a simple model of overlapping, sliding gestures. Results of a partial correlation analysis examining interrelationships among F2 onset, F2 target frequency, and vowel duration across the speaking rate range indicated that covariation of F2 target with vowel duration may obscure the relationship between F2 onset and vowel duration across rate. The results further suggested that a model of rate-induced acoustic variability based on overlapping, sliding gestures only partially accounts for the present data, and that such a view accounts for some speakers' data better than others.
Statistical learning of music- and language-like sequences and tolerance for spectral shifts.
Daikoku, Tatsuya; Yatomi, Yutaka; Yumoto, Masato
2015-02-01
In our previous study (Daikoku, Yatomi, & Yumoto, 2014), we demonstrated that the N1m response could be a marker for the statistical learning process of pitch sequence, in which each tone was ordered by a Markov stochastic model. The aim of the present study was to investigate how the statistical learning of music- and language-like auditory sequences is reflected in the N1m responses based on the assumption that both language and music share domain generality. By using vowel sounds generated by a formant synthesizer, we devised music- and language-like auditory sequences in which higher-ordered transitional rules were embedded according to a Markov stochastic model by controlling fundamental (F0) and/or formant frequencies (F1-F2). In each sequence, F0 and/or F1-F2 were spectrally shifted in the last one-third of the tone sequence. Neuromagnetic responses to the tone sequences were recorded from 14 right-handed normal volunteers. In the music- and language-like sequences with pitch change, the N1m responses to the tones that appeared with higher transitional probability were significantly decreased compared with the responses to the tones that appeared with lower transitional probability within the first two-thirds of each sequence. Moreover, the amplitude difference was even retained within the last one-third of the sequence after the spectral shifts. However, in the language-like sequence without pitch change, no significant difference could be detected. The pitch change may facilitate the statistical learning in language and music. Statistically acquired knowledge may be appropriated to process altered auditory sequences with spectral shifts. The relative processing of spectral sequences may be a domain-general auditory mechanism that is innate to humans. Copyright © 2014 Elsevier Inc. All rights reserved.
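A first-order Markov sequence of the kind described can be generated in a few lines; the state set and transition matrix below are illustrative, not the study's stimulus specification:

```python
# Generate a tone sequence ordered by a first-order Markov stochastic model,
# so that some transitions occur with high probability and others with low
# probability, as in the statistical-learning sequences described above.
import numpy as np

rng = np.random.default_rng(0)
states = ["A", "B", "C"]            # e.g., three F0 and/or F1-F2 settings
P = np.array([[0.1, 0.8, 0.1],      # each row sums to 1; the 0.8 entries
              [0.1, 0.1, 0.8],      # are the high-probability transitions
              [0.8, 0.1, 0.1]])

def generate(n, start=0):
    seq, s = [], start
    for _ in range(n):
        s = rng.choice(len(states), p=P[s])
        seq.append(states[s])
    return seq

print("".join(generate(30)))
```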
Techniques for decoding speech phonemes and sounds: A concept
NASA Technical Reports Server (NTRS)
Lokerson, D. C.; Holby, H. G.
1975-01-01
The techniques studied involve conversion of speech sounds into machine-compatible pulse trains. (1) A voltage-level quantizer produces a number of output pulses proportional to the amplitude characteristics of vowel-type phoneme waveforms. (2) Pulses produced by a quantizer of the first speech formant are compared with pulses produced by the second formant.
Range and Precision of Formant Movement in Pediatric Dysarthria
ERIC Educational Resources Information Center
Allison, Kristen M.; Annear, Lucas; Policicchio, Marisa; Hustad, Katherine C.
2017-01-01
Purpose: This study aimed to improve understanding of speech characteristics associated with dysarthria in children with cerebral palsy by analyzing segmental and global formant measures in single-word and sentence contexts. Method: Ten 5-year-old children with cerebral palsy and dysarthria and 10 age-matched, typically developing children…
Borch, D Zangger; Sundberg, Johan
2011-09-01
This investigation aims at describing voice function of four nonclassical styles of singing, Rock, Pop, Soul, and Swedish Dance Band. A male singer, professionally experienced in performing in these genres, sang representative tunes, both with their original lyrics and on the syllable /pae/. In addition, he sang tones in a triad pattern ranging from the pitch Bb2 to the pitch C4 on the syllable /pae/ in pressed and neutral phonation. An expert panel was successful in classifying the samples, thus suggesting that the samples were representative of the various styles. Subglottal pressure was estimated from oral pressure during the occlusion for the consonant [p]. Flow glottograms were obtained from inverse filtering. The four lowest formant frequencies differed between the styles. The mean of the subglottal pressure and the mean of the normalized amplitude quotient (NAQ), that is, the ratio between the flow pulse amplitude and the product of period and maximum flow declination rate, were plotted against the mean of fundamental frequency. In these graphs, Rock and Swedish Dance Band assumed opposite extreme positions with respect to subglottal pressure and mean phonation frequency, whereas the mean NAQ values differed less between the styles. Copyright © 2011 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
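The normalized amplitude quotient is computable directly from one period of the inverse-filtered flow glottogram, using the definition given in the abstract; a minimal sketch (the input array is assumed to be one flow period sampled at `sr` Hz):

```python
# NAQ = flow pulse amplitude / (period * maximum flow declination rate),
# following the definition in the abstract above.
import numpy as np

def naq(flow_period: np.ndarray, sr: float) -> float:
    t0 = len(flow_period) / sr                       # period duration, s
    ac_amplitude = flow_period.max() - flow_period.min()
    mfdr = -(np.diff(flow_period).min()) * sr        # steepest flow decline
    return ac_amplitude / (t0 * mfdr)
```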
Differences between vocalization evoked by social stimuli in feral cats and house cats.
Yeon, Seong C; Kim, Young K; Park, Se J; Lee, Scott S; Lee, Seung Y; Suh, Euy H; Houpt, Katherine A; Chang, Hong H; Lee, Hee C; Yang, Byung G; Lee, Hyo J
2011-06-01
To investigate how socialization can affect the types and characteristics of vocalization produced by cats, feral cats (n=25) and house cats (n=13) were used as subjects, allowing a comparison between cats socialized to people and non-socialized cats. To record vocalization and assess the cats' responses to behavioural stimuli, five test situations were used: approach by a familiar caretaker, by a threatening stranger, by a large doll, by a stranger with a dog, and by a stranger with a cat. Feral cats showed extremely aggressive and defensive behaviour in most test situations, and produced higher call rates than house cats, which could be attributed to less socialization to other animals and greater sensitivity to fearful situations. Differences were also observed in the acoustic parameters of feral cats in comparison to those of house cats. Feral cats produced significantly higher fundamental frequencies, peak frequencies, and 1st and 3rd quartile frequencies in growls and hisses in agonistic test situations. In contrast, for meows, all acoustic parameters (fundamental frequency, first formant, peak frequency, and 1st and 3rd quartile frequencies) were significantly higher in house cats than in feral cats. Also, house cats produced calls significantly shorter in duration than those of feral cats in agonistic test situations. These results support the conclusion that a lack of socialization may affect both the types of vocalizations used and their acoustic characteristics, so that proper socialization may be essential for a cat to become a suitable house companion. Copyright © 2011 Elsevier B.V. All rights reserved.
Acoustics of snoring and automatic snore sound detection in children.
Çavuşoğlu, M; Poets, C F; Urschitz, M S
2017-10-31
Acoustic analyses of snoring sounds have been used to objectively assess snoring and have been applied to various clinical problems in adult patients. Such studies require highly automated tools to analyze the sound recordings of a whole night's sleep, in order to extract clinically relevant snore-related statistics. The existing techniques and software used for adults are not efficiently applicable to snoring sounds in children, chiefly because the acoustic signal properties differ. In this paper, we present a broad range of acoustic characteristics of snoring sounds in children (N = 38) in comparison to adult (N = 30) patients. Acoustic characteristics of the signals were calculated, including frequency domain representations, spectrogram-based characteristics, spectral envelope analysis, formant structures and loudness of the snoring sounds. We observed significant differences in spectral features, formant structures and loudness of the snoring signals of children compared to adults, which may arise from the diversity of upper airway anatomy as the principal determinant of the snore sound generation mechanism. Furthermore, based on the specific audio features of snoring children, we propose a novel algorithm for the automatic detection of snoring sounds from ambient acoustic data specifically in a pediatric population. The respiratory sounds were recorded using a pair of microphones and a multi-channel data acquisition system simultaneously with full-night polysomnography during sleep. Brief sound chunks of 0.5 s were classified as either belonging to a snoring event or not with a multi-layer perceptron, which was trained in a supervised fashion using stochastic gradient descent on a large hand-labeled dataset of frequency domain features. The method proposed here has been used to extract snore-related statistics for the whole night's sleep, including the number of snore episodes (total snoring time), the ratio of snore time to whole sleep time, variation of snoring rate, regularity of snoring episodes in time and amplitude, and snore loudness. These statistics will ultimately serve as a clinical tool providing information for the objective evaluation of snoring in several clinical applications.
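The detection step described (0.5-s chunks, frequency-domain features, an SGD-trained multi-layer perceptron) can be sketched as follows; the feature set and network size here are assumptions, not the paper's exact configuration:

```python
# Snore/non-snore classification of 0.5-s audio chunks with an MLP trained
# by stochastic gradient descent on frequency-domain features.
import numpy as np
from sklearn.neural_network import MLPClassifier

def chunk_features(chunk, n_bands=32):
    """Coarse log-power spectral envelope of one windowed chunk."""
    spec = np.abs(np.fft.rfft(chunk * np.hanning(len(chunk)))) ** 2
    bands = np.array_split(spec, n_bands)
    return np.log([b.sum() + 1e-12 for b in bands])

# X: one feature row per 0.5-s chunk; y: hand labels (1 = snore, 0 = other)
# X = np.vstack([chunk_features(c) for c in chunks])
clf = MLPClassifier(hidden_layer_sizes=(32,), solver="sgd", max_iter=500)
# clf.fit(X_train, y_train); is_snore = clf.predict(X_test)
```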
Effect on LTAS of vocal loudness variation.
Nordenberg, Maria; Sundberg, Johan
2004-01-01
Long-term-average spectrum (LTAS) is an efficient method for voice analysis, revealing both voice source and formant characteristics. However, the LTAS contour is non-uniformly affected by vocal loudness. This variation was analyzed in 15 male and 16 female untrained voices reading a text 7 times at different degrees of vocal loudness, mean change in overall equivalent sound level (Leq) amounting to 27.9 dB and 28.4 dB for the female and male subjects. For all frequency values up to 4 kHz, spectrum level was strongly and linearly correlated with Leq for each subject. The gain factor, that is to say, the rate of level increase, varied with frequency, from about 0.5 at low frequencies to about 1.5 in the frequency range 1.5-3 kHz. Using the gain factors for a subject, LTAS contours could be predicted at any Leq within the measured range, with an average accuracy of 2-3 dB below 4 kHz. Mean LTAS calculated for an Leq of 70 dB for each subject showed considerable individual variation for both males and females, SD of the level varying between 7 dB and 4 dB depending on frequency. On the other hand, the results also suggest that meaningful comparisons of LTAS, recorded for example before and after voice therapy, can be made, provided that the documentation includes a set of recordings at different loudness levels from one recording session.
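The prediction rule implied above is linear extrapolation with the frequency-dependent gain factor: the level at each frequency moves by g(f) dB per dB of Leq change. A sketch with illustrative values:

```python
# Predict an LTAS contour at a new loudness from a reference recording:
# L(f, Leq) = L(f, Leq_ref) + g(f) * (Leq - Leq_ref).
import numpy as np

def predict_ltas(ltas_ref_db, gain, leq_ref, leq_target):
    """ltas_ref_db and gain are arrays over the same frequency bins."""
    return ltas_ref_db + gain * (leq_target - leq_ref)

freqs = np.array([250, 500, 1000, 2000, 3000])    # Hz
gain = np.array([0.5, 0.7, 1.0, 1.4, 1.5])        # ~0.5 low, ~1.5 at 1.5-3 kHz
ltas_70 = np.array([58, 55, 48, 40, 36])          # illustrative levels, dB
print(predict_ltas(ltas_70, gain, 70.0, 80.0))    # contour at Leq = 80 dB
```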
Masapollo, Matthew; Polka, Linda; Ménard, Lucie
2016-03-01
To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to vowel sounds with infant vocal properties over vowel sounds with adult vocal properties. A listening preference favoring infant vowels may derive from their higher voice pitch, which has been shown to attract infant attention in infant-directed speech (IDS). In addition, infants' nascent articulatory abilities may induce a bias favoring infant speech given that 4- to 6-month-olds are beginning to produce vowel sounds. We created infant and adult /i/ ('ee') vowels using a production-based synthesizer that simulates the act of speaking in talkers at different ages and then tested infants across four experiments using a sequential preferential listening task. The findings provide the first evidence that infants preferentially attend to vowel sounds with infant voice pitch and/or formants over vowel sounds with no infant-like vocal properties, supporting the view that infants' production abilities influence how they process infant speech. The findings with respect to voice pitch also reveal parallels between IDS and infant speech, raising new questions about the role of this speech register in infant development. Research exploring the underpinnings and impact of this perceptual bias can expand our understanding of infant language development. © 2015 John Wiley & Sons Ltd.
Speech perception of sine-wave signals by children with cochlear implants
Nittrouer, Susan; Kuess, Jamie; Lowenstein, Joanna H.
2015-01-01
Children need to discover linguistically meaningful structures in the acoustic speech signal. Being attentive to recurring, time-varying formant patterns helps in that process. However, that kind of acoustic structure may not be available to children with cochlear implants (CIs), thus hindering development. The major goal of this study was to examine whether children with CIs are as sensitive to time-varying formant structure as children with normal hearing (NH) by asking them to recognize sine-wave speech. The same materials were presented as speech in noise, as well, to evaluate whether any group differences might simply reflect general perceptual deficits on the part of children with CIs. Vocabulary knowledge, phonemic awareness, and “top-down” language effects were all also assessed. Finally, treatment factors were examined as possible predictors of outcomes. Results showed that children with CIs were as accurate as children with NH at recognizing sine-wave speech, but poorer at recognizing speech in noise. Phonemic awareness was related to that recognition. Top-down effects were similar across groups. Having had a period of bimodal stimulation near the time of receiving a first CI facilitated these effects. Results suggest that children with CIs have access to the important time-varying structure of vocal-tract formants. PMID:25994709
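Sine-wave speech of the kind used here replaces each formant track with a single amplitude- and frequency-modulated sinusoid and sums the (typically three) sinusoids. A minimal synthesis sketch; the formant and amplitude tracks are assumed inputs (e.g., frame-by-frame estimates from natural speech):

```python
# Minimal sine-wave speech synthesis: one time-varying sinusoid per formant.
import numpy as np

def sine_wave_speech(formant_tracks, amp_tracks, sr, frame_rate=100):
    hop = sr // frame_rate
    out = None
    for f_trk, a_trk in zip(formant_tracks, amp_tracks):
        f = np.repeat(f_trk, hop)                  # expand frames to samples
        a = np.repeat(a_trk, hop)
        phase = 2 * np.pi * np.cumsum(f) / sr      # integrate frequency
        tone = a * np.sin(phase)
        out = tone if out is None else out + tone
    return out / np.abs(out).max()
```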
An acoustic study of multiple lateral consonants in three Central Australian languages.
Tabain, Marija; Butcher, Andrew; Breen, Gavan; Beare, Richard
2016-01-01
This study presents dental, alveolar, retroflex, and palatal lateral /l̪ l ɭ ʎ/ data from three Central Australian languages: Arrernte, Pitjantjatjara, and Warlpiri. Formant results show that the laminal laterals (dental /l̪/ and palatal /ʎ/) have a relatively low F1, presumably due to a high jaw position for these sounds, as well as a higher F4. In addition, the palatal /ʎ/ has a very high F2. There is relatively little difference in F3 between the four lateral places of articulation. However, the retroflex /ɭ/ appears to have slightly lower F3 and F4 in comparison to the other lateral sounds. Importantly, spectral moment analyses suggest that centre of gravity and standard deviation (the first and second spectral moments) are sufficient to characterize the four places of articulation. The retroflex has a concentration of energy at slightly lower frequencies than the alveolar, while the palatal has a concentration of energy at higher frequencies. The dental is characterized by a more even spread of energy. These various results are discussed in light of different acoustic models of lateral production, and the possibility of spectral cues to place of articulation across manners of articulation is considered.
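The first two spectral moments used above are computable directly from the power spectrum; a minimal sketch (windowing and analysis band are assumptions, not the study's exact settings):

```python
# Centre of gravity (1st spectral moment) and standard deviation (2nd
# moment, as a spread measure) of a signal's power spectrum.
import numpy as np

def spectral_moments(signal, sr):
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal)))) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    p = spec / spec.sum()                           # normalize to a distribution
    cog = np.sum(freqs * p)                         # centre of gravity, Hz
    sd = np.sqrt(np.sum(((freqs - cog) ** 2) * p))  # spread around it, Hz
    return cog, sd
```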
Formant Centralization Ratio: A Proposal for a New Acoustic Measure of Dysarthric Speech
ERIC Educational Resources Information Center
Sapir, Shimon; Ramig, Lorraine O.; Spielman, Jennifer L.; Fox, Cynthia
2010-01-01
Purpose: The vowel space area (VSA) has been used as an acoustic metric of dysarthric speech, but with varying degrees of success. In this study, the authors aimed to test an alternative metric to the VSA--the "formant centralization ratio" (FCR), which is hypothesized to more effectively differentiate dysarthric from healthy speech and register…
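The FCR, as given in the published version of this study (Sapir et al., 2010), combines corner-vowel formants so that the ratio rises as the vowel space centralizes; a sketch with illustrative values:

```python
# Formant centralization ratio: formants that rise with centralization go in
# the numerator, those that fall go in the denominator, so FCR increases as
# the vowel space shrinks. Values below are illustrative, not study data.

def fcr(f2u, f2a, f1i, f1u, f2i, f1a):
    """Formants in Hz from the corner vowels /i/, /u/, /a/."""
    return (f2u + f2a + f1i + f1u) / (f2i + f1a)

# FCR near 1 is typical of healthy speech; dysarthric speech trends higher.
print(round(fcr(f2u=1000, f2a=1400, f1i=300, f1u=350, f2i=2300, f1a=750), 2))
```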
ERIC Educational Resources Information Center
Varghese, Peter; Kalashnikova, Marina; Burnham, Denis
2018-01-01
Purpose: An important skill in the development of speech perception is to apply optimal weights to acoustic cues so that phonemic information is recovered from speech with minimum effort. Here, we investigated the development of acoustic cue weighting of amplitude rise time (ART) and formant rise time (FRT) cues in children as measured by mismatch…
Mefferd, Antje S.
2016-01-01
The degree of speech movement pattern consistency can provide information about speech motor control. Although tongue motor control is particularly important because of the tongue's primary contribution to the speech acoustic signal, capturing tongue movements during speech remains difficult and costly. This study sought to determine if formant movements could be used to estimate tongue movement pattern consistency indirectly. Two age groups (seven young adults and seven older adults) and six speech conditions (typical, slow, loud, clear, fast, bite block speech) were selected to elicit an age- and task-dependent performance range in tongue movement pattern consistency. Kinematic and acoustic spatiotemporal indexes (STI) were calculated based on sentence-length tongue movement and formant movement signals, respectively. Kinematic and acoustic STI values showed strong associations across talkers and moderate to strong associations for each talker across speech tasks; although, in cases where task-related tongue motor performance changes were relatively small, the acoustic STI values were poorly associated with kinematic STI values. These findings suggest that, depending on the sensitivity needs, formant movement pattern consistency could be used in lieu of direct kinematic analysis to indirectly examine speech motor control. PMID:27908069
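The spatiotemporal index referenced here is conventionally computed by time- and amplitude-normalizing repeated trajectories and summing the across-repetition standard deviations at fixed relative time points; a sketch under that convention (number of points is an assumption):

```python
# Spatiotemporal index (STI) over repeated movement or formant trajectories.
import numpy as np

def sti(trajectories, n_points=50):
    norm = []
    for trk in trajectories:                        # one record per repetition
        t = np.linspace(0, 1, len(trk))             # linear time normalization
        resampled = np.interp(np.linspace(0, 1, n_points), t, trk)
        norm.append((resampled - resampled.mean()) / resampled.std())
    return float(np.sum(np.std(np.vstack(norm), axis=0)))

# Lower STI = more consistent movement patterns across repetitions.
```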
The 'F-complex' and MMN tap different aspects of deviance.
Laufer, Ilan; Pratt, Hillel
2005-02-01
To compare the 'F(fusion)-complex' with the mismatch negativity (MMN), both components associated with automatic detection of changes in the acoustic stimulus flow. Ten right-handed adult native Hebrew speakers discriminated vowel-consonant-vowel (V-C-V) sequences /ada/ (deviant) and /aga/ (standard) in an active auditory 'oddball' task, and the brain potentials associated with performance of the task were recorded from 21 electrodes. Stimuli were generated by fusing the acoustic elements of the V-C-V sequences as follows: the base was always presented in front of the subject, and formant transitions were presented to the front, left or right in a virtual reality room. An illusion of a lateralized echo (duplex sensation) accompanied fusion of the base with the lateralized formant locations. Source current density estimates were derived for the net response to the fusion of the speech elements (F-complex) and for the MMN, using low-resolution electromagnetic tomography (LORETA). Statistical non-parametric mapping was used to estimate the current density differences between the brain sources of the F-complex and the MMN. Occipito-parietal and prefrontal regions were associated with the F-complex in all formant locations, whereas the vicinity of the supratemporal plane was bilaterally associated with the MMN, but only in the case of front-fusion (no duplex effect). The MMN is sensitive to the novelty of the auditory object in relation to other stimuli in a sequence, whereas the F-complex is sensitive to the acoustic features of the auditory object and reflects a process of matching them with target categories. The F-complex and MMN reflect different aspects of auditory processing in a stimulus-rich and changing environment: content analysis of the stimulus and novelty detection, respectively.
Compton, Michael T; Lunden, Anya; Cleary, Sean D; Pauselli, Luca; Alolayan, Yazeed; Halpern, Brooke; Broussard, Beth; Crisafio, Anthony; Capulong, Leslie; Balducci, Pierfrancesco Maria; Bernardini, Francesco; Covington, Michael A
2018-02-12
Acoustic phonetic methods are useful in examining some symptoms of schizophrenia; we used such methods to understand the underpinnings of aprosody. We hypothesized that, compared to controls and patients without clinically rated aprosody, patients with aprosody would exhibit reduced variability in: pitch (F0), jaw/mouth opening and tongue height (formant F1), tongue front/back position and/or lip rounding (formant F2), and intensity/loudness. Audiorecorded speech was obtained from 98 patients (including 25 with clinically rated aprosody and 29 without) and 102 unaffected controls using five tasks: one describing a drawing, two based on spontaneous speech elicited through a question (Tasks 2 and 3), and two based on reading prose excerpts (Tasks 4 and 5). We compared groups on variation in pitch (F0), formant F1 and F2, and intensity/loudness. Regarding pitch variation, patients with aprosody differed significantly from controls in Task 5 in both unadjusted tests and those adjusted for sociodemographics. For the standard deviation (SD) of F1, no significant differences were found in adjusted tests. Regarding SD of F2, patients with aprosody had lower values than controls in Task 3, 4, and 5. For variation in intensity/loudness, patients with aprosody had lower values than patients without aprosody and controls across the five tasks. Findings could represent a step toward developing new methods for measuring and tracking the severity of this specific negative symptom using acoustic phonetic parameters; such work is relevant to other psychiatric and neurological disorders. Copyright © 2018 Elsevier B.V. All rights reserved.
Understanding the intentional acoustic behavior of humpback whales: a production-based approach.
Cazau, Dorian; Adam, Olivier; Laitman, Jeffrey T; Reidenberg, Joy S
2013-09-01
Following a production-based approach, this paper deals with the acoustic behavior of humpback whales. This approach investigates various physical factors, which are either internal (e.g., physiological mechanisms) or external (e.g., environmental constraints) to the whale's respiratory tract, for their implications in sound production. This paper aims to describe a functional scenario of this tract for the generation of vocal sounds. To do so, a division of the tract into three different configurations is proposed, based on the air recirculation process, which determines air sources and laryngeal valves. Then, assuming a vocal function (in sound generation or modification) for several specific anatomical components, an acoustic characterization of each of these configurations is proposed to link different spectral features, namely fundamental frequencies and formant structures, to specific vocal production mechanisms. A discussion of whether the whale is able to fully exploit the acoustic potential of its respiratory tract is eventually provided.
Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments
NASA Astrophysics Data System (ADS)
Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos
2009-12-01
This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity and duration) of vowel production in a group of 14 young speakers suffering from different kinds of speech impairments due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine the reference values for these features in speakers without any kind of speech impairment within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract the formant and pitch values is based on a Linear Prediction Coefficients (LPC) analysis of the segments considered as vowels in a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are also based on the outcome of the automated segmentation. The main conclusion of the work is that intelligibility of the vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power in energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
The contribution of waveform interactions to the perception of concurrent vowels.
Assmann, P F; Summerfield, Q
1994-01-01
Models of the auditory and phonetic analysis of speech must account for the ability of listeners to extract information from speech when competing voices are present. When two synthetic vowels are presented simultaneously and monaurally, listeners can exploit cues provided by a difference in fundamental frequency (F0) between the vowels to help determine their phonemic identities. Three experiments examined the effects of stimulus duration on the perception of such "double vowels." Experiment 1 confirmed earlier findings that a difference in F0 provides a smaller advantage when the duration of the stimulus is brief (50 ms rather than 200 ms). With brief stimuli, there may be insufficient time for attentional mechanisms to switch from the "dominant" member of the pair to the "nondominant" vowel. Alternatively, brief segments may restrict the availability of cues that are distributed over the time course of a longer segment of a double vowel. In experiment 1, listeners did not perform better when the same 50-ms segment was presented four times in succession (with 100-ms silent intervals) rather than only once, suggesting that limits on attention switching do not underlie the duration effect. However, performance improved in some conditions when four successive 50-ms segments were extracted from the 200-ms double vowels and presented in sequence, again with 100-ms silent intervals. Similar improvements were observed in experiment 2 between performance with the first 50-ms segment and one or more of the other three segments when the segments were presented individually. Experiment 3 demonstrated that part of the improvement observed in experiments 1 and 2 could be attributed to waveform interactions that either reinforce or attenuate harmonics that lie near vowel formants. Such interactions were beneficial only when the difference in F0 was small (0.25-1 semitone). These results are compatible with the idea that listeners benefit from small differences in F0 by performing a sequence of analyses of different time segments of a double vowel to determine where the formants of the constituent vowels are best defined.
The shift-invariant discrete wavelet transform and application to speech waveform analysis.
Enders, Jörg; Geng, Weihua; Li, Peijun; Frazier, Michael W; Scholl, David J
2005-04-01
The discrete wavelet transform may be used as a signal-processing tool for visualization and analysis of nonstationary, time-sampled waveforms. The highly desirable property of shift invariance can be obtained at the cost of a moderate increase in computational complexity, and accepting a least-squares inverse (pseudoinverse) in place of a true inverse. A new algorithm for the pseudoinverse of the shift-invariant transform that is easier to implement in array-oriented scripting languages than existing algorithms is presented together with self-contained proofs. Representing only one of the many and varied potential applications, a recorded speech waveform illustrates the benefits of shift invariance with pseudoinvertibility. Visualization shows the glottal modulation of vowel formants and frication noise, revealing secondary glottal pulses and other waveform irregularities. Additionally, performing sound waveform editing operations (i.e., cutting and pasting sections) on the shift-invariant wavelet representation automatically produces quiet, click-free section boundaries in the resulting sound. The capabilities of this wavelet-domain editing technique are demonstrated by changing the rate of a recorded spoken word. Individual pitch periods are repeated to obtain a half-speed result, and alternate individual pitch periods are removed to obtain a double-speed result. The original pitch and formant frequencies are preserved. In informal listening tests, the results are clear and understandable.
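PyWavelets' stationary wavelet transform offers a readily available shift-invariant analysis/synthesis pair that can stand in for the transform described here (it is not the paper's own implementation); a round-trip sketch, with coefficient editing marked where the click-free cut-and-paste operations would go:

```python
# Shift-invariant (stationary) wavelet analysis and reconstruction with
# PyWavelets; editing the coefficient arrays between swt and iswt is where
# waveform operations such as repeating or removing pitch periods belong.
import numpy as np
import pywt

x = np.random.randn(4096)            # placeholder for a recorded waveform
level = 3                            # length must be divisible by 2**level

coeffs = pywt.swt(x, "db4", level=level)   # list of (approx, detail) pairs
# ... cut/paste/repeat sections of the coefficient arrays here ...
y = pywt.iswt(coeffs, "db4")

print(np.allclose(x, y, atol=1e-8))  # unmodified round trip is (near-)exact
```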
Long-term-average spectrum characteristics of country singers during speaking and singing.
Cleveland, T F; Sundberg, J; Stone, R E
2001-03-01
Five premier male country singers involved in our previous studies spoke and sang the words of both the national anthem and a country song of their choice. Long-term-average spectra were made of the spoken and sung material of each singer. The spectral characteristics of country singers' speech and singing were similar. A prominent peak in the upper part of the spectrum, previously described as the "speaker's formant," was found in the country singers' speech and singing. The singer's formant, a strong spectral peak near 2.8 kHz that is an important part of the spectrum of classically trained singers, was not found in the spectra of the country singers. The results support the conclusion that the resonance characteristics of speech and singing are similar in country singing and that country singing is not characterized by a singer's formant.
Cai, Shanqing; Beal, Deryk S.; Ghosh, Satrajit S.; Tiede, Mark K.; Guenther, Frank H.; Perkell, Joseph S.
2012-01-01
Previous empirical observations have led researchers to propose that auditory feedback (the auditory perception of self-produced sounds when speaking) functions abnormally in the speech motor systems of persons who stutter (PWS). Researchers have theorized that an important neural basis of stuttering is the aberrant integration of auditory information into incipient speech motor commands. Because of the circumstantial support for these hypotheses and the differences and contradictions between them, there is a need for carefully designed experiments that directly examine auditory-motor integration during speech production in PWS. In the current study, we used real-time manipulation of auditory feedback to directly investigate whether the speech motor system of PWS utilizes auditory feedback abnormally during articulation and to characterize potential deficits of this auditory-motor integration. Twenty-one PWS and 18 fluent control participants were recruited. Using a short-latency formant-perturbation system, we examined participants’ compensatory responses to unanticipated perturbation of auditory feedback of the first formant frequency during the production of the monophthong [ε]. The PWS showed compensatory responses that were qualitatively similar to the controls’ and had close-to-normal latencies (∼150 ms), but the magnitudes of their responses were substantially and significantly smaller than those of the control participants (by 47% on average, p<0.05). Measurements of auditory acuity indicate that the weaker-than-normal compensatory responses in PWS were not attributable to a deficit in low-level auditory processing. These findings are consistent with the hypothesis that stuttering is associated with functional defects in the inverse models responsible for the transformation from the domain of auditory targets and auditory error information into the domain of speech motor commands. PMID:22911857
Using speech sounds to test functional spectral resolution in listeners with cochlear implants
Winn, Matthew B.; Litovsky, Ruth Y.
2015-01-01
In this study, spectral properties of speech sounds were used to test functional spectral resolution in people who use cochlear implants (CIs). Specifically, perception of the /ba/-/da/ contrast was tested using two spectral cues: Formant transitions (a fine-resolution cue) and spectral tilt (a coarse-resolution cue). Higher weighting of the formant cues was used as an index of better spectral cue perception. Participants included 19 CI listeners and 10 listeners with normal hearing (NH), for whom spectral resolution was explicitly controlled using a noise vocoder with variable carrier filter widths to simulate electrical current spread. Perceptual weighting of the two cues was modeled with mixed-effects logistic regression, and was found to systematically vary with spectral resolution. The use of formant cues was greatest for NH listeners for unprocessed speech, and declined in the two vocoded conditions. Compared to NH listeners, CI listeners relied less on formant transitions, and more on spectral tilt. Cue-weighting results showed moderately good correspondence with word recognition scores. The current approach to testing functional spectral resolution uses auditory cues that are known to be important for speech categorization, and can thus potentially serve as the basis upon which CI processing strategies and innovations are tested. PMID:25786954
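Cue weighting of this kind is estimated by regressing trial-by-trial responses on the two cue values; a plain fixed-effects logistic sketch with synthetic data (the study itself used mixed-effects models, and the cue grid below is illustrative):

```python
# Perceptual cue weighting via logistic regression: coefficients on
# standardized cues index how heavily each cue drives /ba/-/da/ responses.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# X: one row per trial, columns = [formant_transition_step, spectral_tilt_step]
# y: 1 if the listener responded /da/, else 0 (synthetic responses here)
X = np.array([[i, j] for i in range(5) for j in range(5)] * 8, dtype=float)
y = (X[:, 0] + 0.3 * X[:, 1] + np.random.randn(len(X)) > 4).astype(int)

model = LogisticRegression().fit(StandardScaler().fit_transform(X), y)
w_formant, w_tilt = model.coef_[0]
print(f"formant weight {w_formant:.2f} vs tilt weight {w_tilt:.2f}")
```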
Daliri, Ayoub; Max, Ludo
2018-02-01
Auditory modulation during speech movement planning is limited in adults who stutter (AWS), but the functional relevance of the phenomenon itself remains unknown. We investigated for AWS and adults who do not stutter (AWNS) (a) a potential relationship between pre-speech auditory modulation and auditory feedback contributions to speech motor learning and (b) the effect on pre-speech auditory modulation of real-time versus delayed auditory feedback. Experiment I used a sensorimotor adaptation paradigm to estimate auditory-motor speech learning. Using acoustic speech recordings, we quantified subjects' formant frequency adjustments across trials when continually exposed to formant-shifted auditory feedback. In Experiment II, we used electroencephalography to determine the same subjects' extent of pre-speech auditory modulation (reductions in auditory evoked potential N1 amplitude) when probe tones were delivered prior to speaking versus not speaking. To manipulate subjects' ability to monitor real-time feedback, we included speaking conditions with non-altered auditory feedback (NAF) and delayed auditory feedback (DAF). Experiment I showed that auditory-motor learning was limited for AWS versus AWNS, and the extent of learning was negatively correlated with stuttering frequency. Experiment II yielded several key findings: (a) our prior finding of limited pre-speech auditory modulation in AWS was replicated; (b) DAF caused a decrease in auditory modulation for most AWNS but an increase for most AWS; and (c) for AWS, the amount of auditory modulation when speaking with DAF was positively correlated with stuttering frequency. Lastly, AWNS showed no correlation between pre-speech auditory modulation (Experiment II) and extent of auditory-motor learning (Experiment I) whereas AWS showed a negative correlation between these measures. Thus, findings suggest that AWS show deficits in both pre-speech auditory modulation and auditory-motor learning; however, limited pre-speech modulation is not directly related to limited auditory-motor adaptation; and in AWS, DAF paradoxically tends to normalize their otherwise limited pre-speech auditory modulation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Shao, Jing; Huang, Xunan
2017-01-01
Congenital amusia is a lifelong disorder of fine-grained pitch processing in music and speech. However, it remains unclear whether amusia is a pitch-specific deficit, or whether it affects frequency/spectral processing more broadly, such as the perception of formant frequency in vowels, apart from pitch. In this study, in order to illuminate the scope of the deficits, we compared the performance of 15 Cantonese-speaking amusics and 15 matched controls on the categorical perception of sound continua in four stimulus contexts: lexical tone, pure tone, vowel, and voice onset time (VOT). Whereas lexical tone, pure tone and vowel continua rely on frequency/spectral processing, the VOT continuum depends on duration/temporal processing. We found that the amusic participants performed similarly to controls in all stimulus contexts in the identification task, in terms of the across-category boundary location and boundary width. However, the amusic participants performed systematically worse than controls in discriminating stimuli in the three contexts that depended on frequency/spectral processing (lexical tone, pure tone and vowel), whereas they performed normally when discriminating duration differences (VOT). These findings suggest that the deficit of amusia is probably not pitch specific, but affects frequency/spectral processing more broadly. Furthermore, there appeared to be differences in the impairment of frequency/spectral discrimination between speech and nonspeech contexts. The amusic participants exhibited less benefit in between-category discriminations than controls in speech contexts (lexical tone and vowel), suggesting reduced categorical perception; on the other hand, they performed worse than controls across the board, in both between- and within-category discriminations, in the nonspeech context (pure tone), suggesting impaired general auditory processing. These differences imply that the frequency/spectral-processing deficit might be manifested differentially in speech and nonspeech contexts in amusics: it is manifested as a deficit of higher-level phonological processing in speech sounds, and as a deficit of lower-level auditory processing in nonspeech sounds. PMID:28829808
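Boundary location and width are conventionally read off a logistic fit to the identification functions; a minimal sketch with invented response proportions:

```python
# Sketch: fit a logistic to proportion-"category B" responses along a
# 9-step continuum; boundary = 50% point, width = 25%-75% distance.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 10)
p_b = np.array([0.02, 0.05, 0.10, 0.30, 0.55, 0.80, 0.93, 0.97, 0.99])
(x0, k), _ = curve_fit(logistic, steps, p_b, p0=[5.0, 1.0])

boundary = x0                      # across-category boundary location
width = 2 * np.log(3) / k          # distance between the 25% and 75% points
print(f"boundary = {boundary:.2f} steps, width = {width:.2f} steps")
```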
Laukkanen, Anne-Maria; Horáček, Jaromir; Havlík, Radan
2012-07-01
Vocal warm-up (WU)-related changes were studied in one male musical singer and one female speech trainer. They sustained vowels before and after WU in a magnetic resonance imaging (MRI) device. Acoustic recordings were made in a studio. The vocal tract area increased after WU, a formant cluster appeared between 2 and 4.5 kHz, and SPL increased. Evidence of larynx lowering was only found for the male. The ratio of the pharyngeal inlet area to the epilaryngeal outlet area (Aph/Ae) increased by 10%-28%, being 3-4 for the male and 5-7 for the female. The results seem to represent different voice training traditions. A singer's formant cluster may be achievable without a high Aph/Ae (≥ 6), but limitations of the 2D method should be taken into account.
Analysis of levels of support and resonance demonstrated by an elite singing teacher
NASA Astrophysics Data System (ADS)
Scherer, Ronald C.; Radhakrishnan, Nandhakumar; Poulimenos, Andreas
2003-04-01
This was a study of levels of singing expertise demonstrated by an elite operatic singer and teacher. This approach may prove advantageous because the teacher demonstrates what he thinks is important, not what the nonsinging scientist thinks should be important. Two pedagogical sequences were studied: (1) the locations of support: glottis (poor), chest (better), abdomen (best); (2) the locations of resonance: hard palate/straight tone (poor), mouth (better), sinus/head (best). Measures were obtained for a single frequency (196 Hz), the vowel /ae/, and for mezzo-forte loudness using the /pae pae pae/ technique. Sequence differences: The support sequence was characterized by formant frequency lowering suggestive of vocal tract lengthening. The resonance sequence was characterized by flow (AC, mean flow) and abduction increases. Sequence similarities: The best locations had the widest F2 bandwidths. The better and best locations had the largest dB difference between F2 and F3. Although acoustic power increased through the sequences, acoustic efficiency was not a discriminating factor. Open and speed quotients were not differentiating. The flow resistance was highest and aerodynamic power the lowest for the first of each sequence. Combined data: The maximum flow declination rate correlated highly with the AC flow (r = -0.92) and SPL (r = 0.901).
Akimov, Alexander G; Egorova, Marina A; Ehret, Günter
2017-02-01
Selectivity for processing of species-specific vocalizations and communication sounds has often been associated with the auditory cortex. The midbrain inferior colliculus, however, is the first center in the auditory pathways of mammals integrating acoustic information processed in separate nuclei and channels in the brainstem and, therefore, could significantly contribute to enhance the perception of species' communication sounds. Here, we used natural wriggling calls of mouse pups, which communicate need for maternal care to adult females, and a further 15 synthesized sounds to test the hypothesis that neurons in the central nucleus of the inferior colliculus of adult females optimize their response rates for reproduction of the three main harmonics (formants) of wriggling calls. The results confirmed the hypothesis, showing that average response rates, as recorded extracellularly from single units, were highest and spectral facilitation most effective for both onset and offset responses to the call and call models with three resolved frequencies according to critical bands in perception. In addition, the general on- and/or off-response enhancement in almost half the investigated 122 neurons favors not only perception of single calls but also of vocalization rhythm. In summary, our study provides strong evidence that critical-band resolved frequency components within a communication sound increase the probability of its perception by boosting the signal-to-noise ratio of neural response rates within the inferior colliculus by at least 20% (our criterion for facilitation). These mechanisms, including enhancement of rhythm coding, are generally favorable to processing of other animal and human vocalizations, including formants of speech sounds. © 2016 Federation of European Neuroscience Societies and John Wiley & Sons Ltd.
Investigation of the impact of thyroid surgery on vocal tract steadiness.
Timon, Conrad I; Hirani, Shashi P; Epstein, Ruth; Rafferty, Mark A
2010-09-01
Subjective nonspecific upper aerodigestive symptoms are not uncommon after thyroid surgery. These are postulated to be related to injury of an extrinsic perithyroid nerve plexus that innervates the muscles of the supraglottic and glottic larynx. This plexus is thought to receive contributing branches from both the recurrent and superior laryngeal nerves. The technique of linear predictive coding was used to estimate the F2 values from a sustained vowel /a/ in patients before and 48 hours after thyroid or parathyroid surgery. These patients were controlled against a matched pair undergoing surgery without any theoretical effect on the supraglottic musculature. In total, 12 patients were recruited into each group. Each patient had the formant frequency fluctuation (FFF) and the formant frequency fluctuation ratio (FFFR) calculated for F1 and F2. Mixed analysis of variance (ANOVA) for all acoustic parameters revealed that the mean F2 FFF showed a significant "time" main effect (F(1,22) = 7.196, P = 0.014, partial η² = 0.246) and a significant "time by group" interaction effect (F(1,22) = 8.036, P = 0.010, partial η² = 0.268), with changes over time for the thyroid group but not for the controls. Similarly, the mean F2 FFFR showed a similar significant "time" main effect (F(1,22) = 6.488, P = 0.018, partial η² = 0.228) and a "time by group" interaction effect (F(1,22) = 7.134, P = 0.014, partial η² = 0.245). This work suggests that thyroid surgery produces a significant reduction in vocal tract stability in contrast to the controls. This noninvasive measurement offers a potential instrument to investigate the functional implications of any disturbance that thyroid surgery may have on pharyngeal innervations. 2010 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
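A rough sketch of how such a fluctuation measure could be computed is given below: it tracks F2 with LPC (via librosa) and takes FFF as the mean absolute frame-to-frame change, with FFFR as FFF normalised by the mean. Both definitions, the file name, and the analysis settings are assumptions rather than the paper's exact formulas.

```python
# Sketch: LPC-based F2 track from a sustained vowel, plus an assumed
# formant-frequency-fluctuation measure.
import numpy as np
import librosa

def formants(frame, sr, order=12, max_formants=2):
    # LPC polynomial roots above the real axis correspond to resonances;
    # their angles give the formant frequencies.
    a = librosa.lpc(frame, order=order)
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    freqs = np.sort(np.angle(roots) * sr / (2 * np.pi))
    return freqs[freqs > 90][:max_formants]    # drop near-DC artefacts

def fff(track_hz):
    # Assumed definition: mean absolute frame-to-frame formant change (Hz).
    return np.mean(np.abs(np.diff(np.asarray(track_hz))))

y, sr = librosa.load("sustained_a.wav", sr=16000)   # hypothetical recording
frames = librosa.util.frame(y, frame_length=int(0.03 * sr),
                            hop_length=int(0.01 * sr)).T
f2_track = []
for f in frames:
    fmts = formants(np.hamming(len(f)) * f, sr)
    if len(fmts) >= 2:
        f2_track.append(fmts[1])
print("F2 FFF =", fff(f2_track), "Hz; F2 FFFR =", fff(f2_track) / np.mean(f2_track))
```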
Physiological characteristics of the supported singing voice. A preliminary study.
Griffin, B; Woo, P; Colton, R; Casper, J; Brewer, D
1995-03-01
The purpose of this study was to develop a definition of the supported singing voice based on physiological characteristics by comparing the subjects' concepts of a supported voice with objective measurements of their supported and unsupported voice. This preliminary report presents findings based on data from eight classically trained singers. Subjects answered questions about their concepts of the characteristics of the supported singing voice and how it is produced. Samples of the supported and unsupported singing voice produced at low, medium, and high pitches at a comfortable loudness level were collected for acoustic, spectral, airflow, electroglottographic, air volume, and stroboscopic analyses. Significant differences between the supported and unsupported voice were found for sound pressure level (SPL), peak airflow, subglottal pressure (Ps), glottal open time, and frequency of the fourth formant (F4). Mean flow and F2 frequency differences were sex and pitch related. Males adjusted laryngeal configuration to produce supported voice, whereas glottal configuration differences were greater in females. Breathing patterns were variable and not significantly different between supported and unsupported voice. Subjects in this study believe that the supported singing voice is resonant, clear, and easy to manage and is produced by correct breath management. Results of data analysis show that the supported singing voice has different spectral characteristics from and higher SPL, peak airflow, and Ps than the unsupported voice. Singers adjust laryngeal and/or glottal configuration to account for these changes, but no significant differences in breathing activity were found.
Volodin, Ilya A; Matrosova, Vera A; Frey, Roland; Kozhevnikova, Julia D; Isaeva, Inna L; Volodina, Elena V
2018-06-11
Non-hibernating pikas collect winter food reserves and store them in hay piles. Individualization of alarm calls might allow discrimination between colony members and conspecifics trying to steal food items from a colony pile. We investigated vocal posture, vocal tract length, and individual acoustic variation of alarm calls emitted by wild-living Altai pikas Ochotona alpina toward a researcher. Recording started when a pika started calling and lasted as long as possible. The alarm call series of 442 individual callers from different colonies consisted of discrete short (0.073-0.157 s), high-frequency (7.31-15.46 kHz), and frequency-modulated calls separated by irregular intervals. Analysis of 442 discrete calls, the second of each series, revealed that 44.34% of calls lacked nonlinear phenomena, in 7.02% nonlinear phenomena covered less than half of the call duration, and in 48.64% they covered more than half of the call duration. Peak frequencies varied among individuals but always fitted one of three maxima corresponding to the vocal tract resonance frequencies (formants) calculated for an estimated 45-mm oral vocal tract. Discriminant analysis using variables of 8 calls per series of 36 different callers, each from a different colony, correctly assigned over 90% of the calls to individuals. Consequently, Altai pika alarm calls are individualistic, and nonlinear phenomena might further increase this acoustic individualization. Additionally, video analysis revealed a call-synchronous, very fast (0.13-0.23 s) folding, depression, and subsequent re-expansion of the pinna, confirming an earlier report of this behavior that apparently contributes to protecting the hearing apparatus from damage by the self-generated high-intensity alarm calls.
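For context, such formant estimates are often checked against the simplest acoustic model, a uniform tube closed at the glottis and open at the lips (a quarter-wave resonator). The sketch below uses that approximation; the paper's actual model may differ, and the sound speed is an assumption.

```python
# Back-of-envelope resonances of a uniform 45-mm tube, closed at one end:
# F_n = (2n - 1) * c / (4 * L).
c = 343.0   # speed of sound (m/s), assumed
L = 0.045   # estimated oral vocal tract length (m)
for n in (1, 2, 3):
    print(f"F{n} ~ {(2 * n - 1) * c / (4 * L) / 1000:.1f} kHz")
# -> roughly 1.9, 5.7, 9.5 kHz; a shorter effective tube or a different
#    boundary condition would shift these maxima toward the reported range.
```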
Chang, Yen-Liang; Hung, Chao-Ho; Chen, Po-Yueh; Chen, Wei-Chang; Hung, Shih-Han
2015-10-01
Acoustic analysis is often used in speech evaluation but seldom for the evaluation of oral prostheses designed for reconstruction of surgical defects. This study aimed to introduce the application of acoustic analysis for patients with velopharyngeal insufficiency (VPI) due to oral surgery and rehabilitated with oral speech-aid prostheses. The pre- and postprosthetic rehabilitation acoustic features of sustained vowel sounds from two patients with VPI were analyzed and compared with the acoustic analysis software Praat. There were significant differences in the octave spectrum of sustained vowel speech sounds between the pre- and postprosthetic rehabilitation. Acoustic measurements of sustained vowels for patients before and after prosthetic treatment showed no significant differences for any of the parameters of fundamental frequency, jitter, shimmer, noise-to-harmonics ratio, formant frequency, F1 bandwidth, and band energy difference. The decrease in objective nasality perceptions correlated very well with the decrease in dips of the spectra for the male patient with a higher speech bulb height. Acoustic analysis may be a potential technique for evaluating the functions of oral speech-aid prostheses, which eliminate dysfunction due to the surgical defect and contribute to a high percentage of intelligible speech. Octave spectrum analysis may also be a valuable tool for detecting changes in nasality characteristics of the voice during prosthetic treatment of VPI. Copyright © 2014. Published by Elsevier B.V.
Evaluation of articulation simulation system using artificial maxillectomy models.
Elbashti, M E; Hattori, M; Sumita, Y I; Taniguchi, H
2015-09-01
Acoustic evaluation is valuable for guiding the treatment of maxillofacial defects and determining the effectiveness of rehabilitation with an obturator prosthesis. Model simulations are important in terms of pre-surgical planning and pre- and post-operative speech function. This study aimed to evaluate the acoustic characteristics of voice generated by an articulation simulation system using a vocal tract model with or without artificial maxillectomy defects. More specifically, we aimed to establish a speech simulation system for maxillectomy defect models that both surgeons and maxillofacial prosthodontists can use in guiding treatment planning. Artificially simulated maxillectomy defects were prepared according to Aramany's classification (Classes I-VI) in a three-dimensional vocal tract plaster model of a subject uttering the vowel /a/. Formant and nasalance acoustic data were analysed using Computerized Speech Lab and the Nasometer, respectively. Formants and nasalance of simulated /a/ sounds were successfully detected and analysed. Values of Formants 1 and 2 for the non-defect model were 675.43 and 976.64 Hz, respectively. Median values of Formants 1 and 2 for the defect models were 634.36 and 1026.84 Hz, respectively. Nasalance was 11% in the non-defect model, whereas median nasalance was 28% in the defect models. The results suggest that an articulation simulation system can be used to help surgeons and maxillofacial prosthodontists plan post-surgical defects in a way that will facilitate maxillofacial rehabilitation. © 2015 John Wiley & Sons Ltd.
The Effect of Timbre and Vibrato on Vocal Pitch Matching Accuracy
NASA Astrophysics Data System (ADS)
Duvvuru, Sirisha
Research has shown that singers are better able to match pitch when the target stimulus has a timbre close to their own voice. This study seeks to answer the following questions: (1) Do classically trained female singers more accurately match pitch when the target stimulus is more similar to their own timbre? (2) Does the ability to match pitch vary with increasing pitch? (3) Does the ability to match pitch differ depending on whether the target stimulus is produced with or without vibrato? (4) Are mezzo sopranos less accurate than sopranos?
Acoustic markers to differentiate gender in prepubescent children's speaking and singing voice.
Guzman, Marco; Muñoz, Daniel; Vivero, Martin; Marín, Natalia; Ramírez, Mirta; Rivera, María Trinidad; Vidal, Carla; Gerhard, Julia; González, Catalina
2014-10-01
Investigation sought to determine whether there is any acoustic variable to objectively differentiate gender in children with normal voices. A total of 30 children, 15 boys and 15 girls, with perceptually normal voices were examined. They were between 7 and 10 years old (mean: 8.1, SD: 0.7 years). Subjects were required to perform the following phonatory tasks: (1) to phonate sustained vowels [a:], [i:], [u:], (2) to read a phonetically balanced text, and (3) to sing a song. Acoustic analysis included long-term average spectrum (LTAS), fundamental frequency (F0), speaking fundamental frequency (SFF), equivalent continuous sound level (Leq), linear predictive coding (LPC) to obtain formant frequencies, perturbation measures, harmonic-to-noise ratio (HNR), and cepstral peak prominence (CPP). Auditory perceptual analysis was performed by four blinded judges to determine gender. No significant gender-related differences were found for most acoustic variables. Perceptual assessment showed good intra- and inter-rater reliability for gender. Cepstrum for [a:], alpha ratio in text, shimmer for [i:], F3 in [a:], and F3 in [i:] were the parameters that composed the multivariate logistic regression model that best differentiated male and female children's voices. Since perceptual assessment reliably detected gender, it is likely that other acoustic markers (not evaluated in the present study) are able to differentiate gender more clearly. For example, gender-specific patterns of intonation may be a more accurate feature for differentiating gender in children's voices. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Green, Tim; Faulkner, Andrew; Rosen, Stuart; Macherey, Olivier
2005-07-01
Standard continuous interleaved sampling processing, and a modified processing strategy designed to enhance temporal cues to voice pitch, were compared on tests of intonation perception, and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400 Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude-modulation by a sawtooth-like wave form whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as question or statement was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.
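One channel of the modified strategy, as described, can be sketched as follows; the filter orders, the fixed 120-Hz F0, and the use of white noise both as the band signal and as the carrier are illustrative stand-ins.

```python
# Sketch: one channel of the modified strategy. A slow (32-Hz) envelope
# carries spectral variation, a sawtooth-like F0-rate modulator adds
# temporal pitch cues, and their product modulates a noise carrier.
import numpy as np
from scipy.signal import butter, filtfilt, sawtooth

fs = 16000
t = np.arange(int(0.5 * fs)) / fs
band_signal = np.random.randn(len(t))   # stand-in for one band-passed speech band

# Slow-rate envelope: rectify, then low-pass filter at 32 Hz.
b, a = butter(2, 32 / (fs / 2))
env = np.clip(filtfilt(b, a, np.abs(band_signal)), 0, None)

# Higher-rate modulator: 100% amplitude modulation by a sawtooth-like wave
# whose periodicity follows the input F0 (fixed at 120 Hz here for simplicity).
f0 = 120.0
mod = 0.5 * (1 + sawtooth(2 * np.pi * f0 * t))

# Channel level is the product of the lower- and higher-rate components;
# a noise carrier stands in for the acoustic-simulation condition.
carrier = np.random.randn(len(t))
channel_out = env * mod * carrier
```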
Lip Movement Exaggerations During Infant-Directed Speech
Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana
2011-01-01
Purpose Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants. Method Lip movements were recorded from 25 mothers as they spoke to their infants and other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequencies of each vowel also were measured. Results Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech. Conclusion The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech. PMID:20699342
Makagon, Maja M; Funayama, E Sumie; Owren, Michael J
2008-07-01
Relatively few empirical data are available concerning the role of auditory experience in nonverbal human vocal behavior, such as laughter production. This study compared the acoustic properties of laughter in 19 congenitally, bilaterally, and profoundly deaf college students and in 23 normally hearing control participants. Analyses focused on degree of voicing, mouth position, air-flow direction, temporal features, relative amplitude, fundamental frequency, and formant frequencies. Results showed that laughter produced by the deaf participants was fundamentally similar to that produced by the normally hearing individuals, which in turn was consistent with previously reported findings. Finding comparable acoustic properties in the sounds produced by deaf and hearing vocalizers confirms the presumption that laughter is importantly grounded in human biology, and that auditory experience with this vocalization is not necessary for it to emerge in species-typical form. Some differences were found between the laughter of deaf and hearing groups; the most important being that the deaf participants produced lower-amplitude and longer-duration laughs. These discrepancies are likely due to a combination of the physiological and social factors that routinely affect profoundly deaf individuals, including low overall rates of vocal fold use and pressure from the hearing world to suppress spontaneous vocalizations.
Effects of frequency shifts and visual gender information on vowel category judgments
NASA Astrophysics Data System (ADS)
Glidden, Catherine; Assmann, Peter F.
2003-10-01
Visual morphing techniques were used together with a high-quality vocoder to study the audiovisual contribution of talker gender to the identification of frequency-shifted vowels. A nine-step continuum ranging from "bit" to "bet" was constructed from natural recorded syllables spoken by an adult female talker. Upward and downward frequency shifts in spectral envelope (scale factors of 0.85 and 1.0) were applied in combination with shifts in fundamental frequency, F0 (scale factors of 0.5 and 1.0). Downward frequency shifts generally resulted in malelike voices whereas upward shifts were perceived as femalelike. Two separate nine-step visual continua from "bit" to "bet" were also constructed, one from a male face and the other a female face, each producing the end-point words. Each step along the two visual continua was paired with the corresponding step on the acoustic continuum, creating natural audiovisual utterances. Category boundary shifts were found for both acoustic cues (F0 and formant frequency shifts) and visual cues (visual gender). The visual gender effect was larger when acoustic and visual information were matched appropriately. These results suggest that visual information provided by the speech signal plays an important supplemental role in talker normalization.
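The described manipulation, scaling F0 and the spectral envelope independently, can be approximated with an off-the-shelf vocoder. The sketch below uses the WORLD vocoder (pyworld) and the soundfile package rather than the study's own high-quality vocoder, and the file names are hypothetical.

```python
# Sketch: independent F0 and spectral-envelope scaling with the WORLD
# vocoder. Scale factors follow the abstract (envelope 0.85, F0 0.5 for a
# malelike shift); this is not the study's actual processing chain.
import numpy as np
import soundfile as sf
import pyworld as pw

x, fs = sf.read("bit_female.wav")          # hypothetical recording
x = x.astype(np.float64)

f0, t = pw.harvest(x, fs)
sp = pw.cheaptrick(x, f0, t, fs)           # smoothed spectral envelope
ap = pw.d4c(x, f0, t, fs)                  # aperiodicity

def scale_envelope(sp, alpha):
    """Compress (alpha < 1) or expand the envelope along the frequency axis."""
    n_bins = sp.shape[1]
    src = np.minimum(np.arange(n_bins) / alpha, n_bins - 1)
    out = np.empty_like(sp)
    for i in range(sp.shape[0]):
        out[i] = np.interp(src, np.arange(n_bins), sp[i])
    return out

y = pw.synthesize(f0 * 0.5, scale_envelope(sp, 0.85), ap, fs)
sf.write("bit_malelike.wav", y, fs)
```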
The influence of speaking rate on nasality in the speech of hearing-impaired individuals.
Dwyer, Claire H; Robb, Michael P; O'Beirne, Greg A; Gilbert, Harvey R
2009-10-01
The purpose of this study was to determine whether deliberate increases in speaking rate would serve to decrease the amount of nasality in the speech of severely hearing-impaired individuals. The participants were 11 severely to profoundly hearing-impaired students, ranging in age from 12 to 19 years (M = 16 years). Each participant provided a baseline speech sample (R1) followed by 3 training sessions during which participants were trained to increase their speaking rate. Following the training sessions, a second speech sample was obtained (R2). Acoustic and perceptual analyses of the speech samples obtained at R1 and R2 were undertaken. The acoustic analysis focused on changes in first (F1) and second (F2) formant frequencies and formant bandwidths. The perceptual analysis involved listener ratings of the speech samples (at R1 and R2) for perceived nasality. Findings indicated a significant increase in speaking rate at R2. In addition, significantly narrower F2 bandwidth and lower perceptual rating scores of nasality were obtained at R2 across all participants, suggesting a decrease in nasality as speaking rate increases. The nasality demonstrated by hearing-impaired individuals is amenable to change when speaking rate is increased. The influences of speaking rate changes on the perception and production of nasality in hearing-impaired individuals are discussed.
Human vocal attractiveness as signaled by body size projection.
Xu, Yi; Lee, Albert; Wu, Wing-Li; Liu, Xuan; Birkholz, Peter
2013-01-01
Voice, as a secondary sexual characteristic, is known to affect the perceived attractiveness of human individuals. But the underlying mechanism of vocal attractiveness has remained unclear. Here, we presented human listeners with acoustically altered natural sentences and fully synthetic sentences with systematically manipulated pitch, formants and voice quality based on a principle of body size projection reported for animal calls and emotional human vocal expressions. The results show that male listeners preferred a female voice that signals a small body size, with relatively high pitch, wide formant dispersion and breathy voice, while female listeners preferred a male voice that signals a large body size with low pitch and narrow formant dispersion. Interestingly, however, male vocal attractiveness was also enhanced by breathiness, which presumably softened the aggressiveness associated with a large body size. These results, together with the additional finding that the same vocal dimensions also affect emotion judgment, indicate that humans still employ a vocal interaction strategy used in animal calls despite the development of complex language.
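Formant dispersion here denotes the average spacing between adjacent formants, so narrower dispersion projects a longer vocal tract and hence a larger body. A minimal sketch, with invented formant values:

```python
# Sketch: formant dispersion as mean spacing between adjacent formants.
import numpy as np

def formant_dispersion(formants_hz):
    f = np.sort(np.asarray(formants_hz, dtype=float))
    return np.mean(np.diff(f))                 # (F_n - F_1) / (n - 1)

male_like = [500, 1450, 2400, 3350]            # narrow spacing: large-body projection
female_like = [600, 1750, 2900, 4050]          # wide spacing: small-body projection
print(formant_dispersion(male_like), formant_dispersion(female_like))
```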
Modeling source-filter interaction in belting and high-pitched operatic male singing
Titze, Ingo R.; Worley, Albert S.
2009-01-01
Nonlinear source-filter theory is applied to explain some acoustic differences between two contrasting male singing productions at high pitches: operatic style versus jazz belt or theater belt. Several stylized vocal tract shapes (caricatures) are discussed that form the bases of these styles. It is hypothesized that operatic singing uses vowels that are modified toward an inverted megaphone mouth shape for transitioning into the high-pitch range. This allows all the harmonics except the fundamental to be “lifted” over the first formant. Belting, on the other hand, uses vowels that are consistently modified toward the megaphone (trumpet-like) mouth shape. Both the fundamental and the second harmonic are then kept below the first formant. The vocal tract shapes provide collective reinforcement to multiple harmonics in the form of inertive supraglottal reactance and compliant subglottal reactance. Examples of lip openings from four well-known artists are used to infer vocal tract area functions and the corresponding reactances. PMID:19739766
Relationship between perceived politeness and spectral characteristics of voice
NASA Astrophysics Data System (ADS)
Ito, Mika
2005-04-01
This study investigates the role of voice quality in perceiving politeness under conditions of varying relative social status among Japanese male speakers. The work focuses on four important methodological issues: experimental control of sociolinguistic aspects, eliciting natural spontaneous speech, obtaining recording quality suitable for voice quality analysis, and assessment of glottal characteristics through the use of non-invasive direct measurements of the speech spectrum. To obtain natural, unscripted utterances, the speech data were collected with a Map Task. This methodology allowed us to study the effect of manipulating relative social status among participants in the same community. We then computed the relative amplitudes of harmonics and formant peaks in spectra obtained from the Map Task recordings. Finally, an experiment was conducted to observe the alignment between acoustic measures and the perceived politeness of the voice samples. The results suggest that listeners' perceptions of politeness are determined by spectral characteristics of speakers, in particular, spectral tilts obtained by computing the difference in amplitude between the first harmonic and the third formant.
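The tilt measure described, the amplitude difference between the first harmonic and the spectral peak at the third formant (often written H1-A3), can be sketched as follows; the synthetic voiced frame, the fixed F0 and F3 values, and the search bandwidth are all illustrative assumptions.

```python
# Sketch: H1 - A3 spectral tilt read off an FFT of a voiced frame. F0 and F3
# are given rather than estimated, and no formant correction is applied.
import numpy as np

def h1_minus_a3(frame, sr, f0, f3, search_hz=50):
    spec = 20 * np.log10(np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) + 1e-12)
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)

    def peak_db(center):
        band = (freqs > center - search_hz) & (freqs < center + search_hz)
        return spec[band].max()

    return peak_db(f0) - peak_db(f3)           # in dB

sr = 16000
t = np.arange(1024) / sr
frame = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 25))  # toy voiced frame
print(h1_minus_a3(frame, sr, f0=120, f3=2900))
```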
Spectral characteristics of speech with fixed jaw displacements
NASA Astrophysics Data System (ADS)
Solomon, Nancy P.; Makashay, Matthew J.; Munson, Benjamin
2004-05-01
During speech, movements of the mandible and the tongue are interdependent. For some research purposes, the mandible may be constrained to ensure independent tongue motion. To examine specific spectral characteristics of speech with different jaw positions, ten normal adults produced sentences with multiple instances of /t/, /s/, /ʃ/, /i/, /ai/, and /ʃi/. Talkers produced stimuli with the jaw free to vary, and while gently biting on 2- and 5-mm bite blocks unilaterally. Spectral moments of /s/ and /ʃ/ frication and /t/ bursts differed such that mean spectral energy decreased, and diffuseness and skewness increased with bite blocks. The specific size of the bite block had minimal effect on these results, which were most consistent for /s/. Formant analysis for the vocoids revealed lower F2 frequency in /i/ and at the end of the transition in /ai/ when bite blocks were used; F2 slope for diphthongs was not sensitive to differences in jaw position. Two potential explanations for these results involve the physical presence of the bite blocks in the lateral oral cavity, and the oromotor system's ability to compensate for fixed jaw displacements. [Work supported by NIDCD R03-DC06096.]
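Spectral moments are conventionally computed by treating the normalised power spectrum as a probability distribution over frequency; a minimal sketch:

```python
# Sketch: the four spectral moments of a frication or burst spectrum:
# centre of gravity, standard deviation, skewness, and (excess) kurtosis.
import numpy as np

def spectral_moments(frame, sr):
    power = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1 / sr)
    p = power / power.sum()                    # normalise to a distribution
    m1 = np.sum(p * freqs)                     # centre of gravity (mean)
    m2 = np.sum(p * (freqs - m1) ** 2)         # variance
    sd = np.sqrt(m2)
    skew = np.sum(p * (freqs - m1) ** 3) / sd ** 3
    kurt = np.sum(p * (freqs - m1) ** 4) / sd ** 4 - 3.0
    return m1, sd, skew, kurt

rng = np.random.default_rng(1)
print(spectral_moments(rng.standard_normal(1024), 16000))  # toy /s/-like noise frame
```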
Peters, Jörg; Heeringa, Wilbert J; Schoormann, Heike E
2017-08-01
The present study compares the acoustic realization of Saterland Frisian, Low German, and High German vowels by trilingual speakers in the Saterland. The Saterland is a rural municipality in northwestern Germany. It offers the unique opportunity to study trilingualism with languages that differ both by their vowel inventories and by external factors, such as their social status and the autonomy of their speech communities. The objective of the study was to examine whether the trilingual speakers differ in their acoustic realizations of vowel categories shared by the three languages and whether those differences can be interpreted as effects of either the differences in the vowel systems or of external factors. Monophthongs produced in a /hVt/ frame revealed that High German vowels show the most divergent realizations in terms of vowel duration and formant frequencies, whereas Saterland Frisian and Low German vowels show small differences. These findings suggest that vowels of different languages are likely to share the same phonological space when the speech communities largely overlap, as is the case with Saterland Frisian and Low German, but may resist convergence if at least one language is shared with a larger, monolingual speech community, as is the case with High German.
Acoustic characteristics of modern Greek Orthodox Church music.
Delviniotis, Dimitrios S
2013-09-01
Some acoustic characteristics of the two types of vocal music of Greek Orthodox Church music, the Byzantine chant (BC) and ecclesiastical speech (ES), are studied in relation to common Greek speech and Western opera. Vocal samples were obtained, and their acoustic parameters of sound pressure level (SPL), fundamental frequency (F0), and long-time average spectrum (LTAS) characteristics were analyzed. Twenty chanters, including two chanter-singers of opera, sang (BC) and read (ES) the same hymn of Byzantine music (BM), the two opera singers sang the same aria of opera, common speech samples were obtained, and all audio recordings were analyzed. The distribution of SPL values showed that BC and ES have higher SPL, by 9 and 12 dB respectively, than common speech. The average F0 in ES tends to be lower than in common speech, and the smallest standard deviation (SD) of F0 values characterizes its monotonicity. The tone-scale intervals of BC are close enough to the currently accepted theory, with SD equal to 0.24 semitones. The rate and extent of vibrato, which is rare in BC, equal 4.1 Hz and 0.6 semitones, respectively. The average LTAS slope is greatest in BC (+4.5 dB), though still smaller than in opera (+5.7 dB). In both BC and ES, instead of the singer's formant appearing in an opera voice, a speaker's formant (SPF) was observed around 3300 Hz, with relative levels of +6.3 and +4.6 dB, respectively. The two vocal types of BM, BC and ES, differ both from each other and from common Greek speech and opera style regarding SPL, the mean and SD of F0, the LTAS slope, and the relative level of the SPF. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
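A rough sketch of the two LTAS-based measures, the spectral slope and the relative level of a speaker's formant, is given below; the Welch settings and band edges are illustrative choices, not the paper's.

```python
# Sketch: long-time average spectrum via Welch averaging, a crude slope
# estimate over a fixed band, and the speaker's-formant (SPF) level near
# 3.3 kHz relative to its neighbourhood.
import numpy as np
from scipy.signal import welch

def ltas_features(y, sr):
    freqs, pxx = welch(y, fs=sr, nperseg=4096)
    db = 10 * np.log10(pxx + 1e-20)
    band = (freqs >= 1000) & (freqs <= 5000)
    slope = np.polyfit(freqs[band] / 1000, db[band], 1)[0]   # dB per kHz
    spf_band = (freqs >= 3000) & (freqs <= 3600)
    ref_band = (freqs >= 2000) & (freqs <= 4000)
    spf_level = db[spf_band].max() - db[ref_band].mean()     # relative SPF level
    return slope, spf_level

sr = 16000
y = np.random.randn(sr * 10)               # stand-in for a recorded chant
print(ltas_features(y, sr))
```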
Auditory Perceptual Abilities Are Associated with Specific Auditory Experience
Zaltz, Yael; Globerson, Eitan; Amir, Noam
2017-01-01
The extent to which auditory experience can shape general auditory perceptual abilities is still under constant debate. Some studies show that specific auditory expertise may have a general effect on auditory perceptual abilities, while others show a more limited influence, exhibited only in a relatively narrow range associated with the area of expertise. The current study addresses this issue by examining experience-dependent enhancement in perceptual abilities in the auditory domain. Three experiments were performed. In the first experiment, 12 pop and rock musicians and 15 non-musicians were tested in frequency discrimination (DLF), intensity discrimination, spectrum discrimination (DLS), and time discrimination (DLT). Results showed significant superiority of the musician group only for the DLF and DLT tasks, illuminating enhanced perceptual skills in the key features of pop music, in which minuscule changes in amplitude and spectrum are not critical to performance. The next two experiments attempted to differentiate between generalization and specificity in the influence of auditory experience, by comparing subgroups of specialists. First, seven guitar players and eight percussionists were tested in the DLF and DLT tasks that were found superior for musicians. Results showed superior abilities on the DLF task for guitar players, though no difference between the groups in DLT, demonstrating some dependency of auditory learning on the specific area of expertise. Subsequently, a third experiment was conducted, testing a possible influence of vowel density in native language on auditory perceptual abilities. Ten native speakers of German (a language characterized by a dense vowel system of 14 vowels), and 10 native speakers of Hebrew (characterized by a sparse vowel system of five vowels), were tested in a formant discrimination task. This is the linguistic equivalent of a DLS task. Results showed that German speakers had superior formant discrimination, demonstrating highly specific effects for auditory linguistic experience as well. Overall, results suggest that auditory superiority is associated with the specific auditory exposure. PMID:29238318
Dilley, Laura C; Wieland, Elizabeth A; Gamache, Jessica L; McAuley, J Devin; Redford, Melissa A
2013-02-01
As children mature, changes in voice spectral characteristics co-vary with changes in speech, language, and behavior. In this study, spectral characteristics were manipulated to alter the perceived ages of talkers' voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Speech was modified by lowering formants and fundamental frequency, for 5-year-old children's utterances, or raising them, for adult caregivers' utterances. Next, participants differing in awareness of the manipulation (Experiment 1A) or amount of speech-language training (Experiment 1B) made judgments of prosodic, segmental, and talker attributes. Experiment 2 investigated the effects of spectral modification on intelligibility. Finally, in Experiment 3, trained analysts used formal prosody coding to assess prosodic characteristics of spectrally modified and unmodified speech. Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work.
Effects of Levodopa on Vowel Articulation in Patients with Parkinson's Disease.
Okada, Yukihiro; Murata, Miho; Toda, Tatsushi
2016-04-27
The effects of levodopa on articulatory dysfunction in patients with Parkinson's disease remain inconclusive. This study aimed to investigate the effects of levodopa on isolated vowel articulation and motor performance in patients with moderate to severe Parkinson's disease, excluding speech fluctuations caused by dyskinesia. Twenty-one patients (14 males and 7 females) and 21 age- and sex-matched healthy subjects were enrolled. Together with motor assessment, the patients phonated five Japanese isolated vowels (/a/, /i/, /u/, /e/, and /o/) 20 times before and 1 h after levodopa treatment. We performed a frequency analysis of each vowel and measured the first and second formants, from which we constructed the pentagonal vowel space area, a good indicator of articulatory dysfunction for vowels. In control subjects, only speech samples were analyzed. To investigate the sequential relationship between plasma levodopa concentrations, motor performance, and acoustic measurements after treatment, entire drug cycle tests were performed in 4 patients. The pentagonal vowel space area expanded significantly, together with motor amelioration, after levodopa treatment, although the enlargement did not reach the vowel space area of the control subjects. Drug cycle tests revealed that sequential increases or decreases in plasma levodopa levels after treatment correlated well with expansion or contraction of the vowel space areas and improvement or deterioration of motor manifestations. Levodopa expanded the vowel space area and ameliorated motor performance, suggesting that dysfunctions in vowel articulation and motor performance in patients with Parkinson's disease are based on dopaminergic pathology.
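The pentagonal vowel space area follows directly from the five (F1, F2) points via the shoelace formula; a minimal sketch with invented formant values:

```python
# Sketch: vowel space area from (F1, F2) means of the five Japanese vowels,
# ordered around the pentagon's perimeter, using the shoelace formula.
import numpy as np

def polygon_area(points):
    x, y = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

vowels = {"i": (300, 2300), "e": (450, 2000), "a": (750, 1300),
          "o": (500, 900), "u": (350, 1200)}    # hypothetical (F1, F2) in Hz
perimeter = [vowels[v] for v in "ieaou"]        # perimeter order, not /a i u e o/
print(polygon_area(perimeter), "Hz^2")
```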
Simultaneous F0-F1 modifications of Arabic for the improvement of natural-sounding
NASA Astrophysics Data System (ADS)
Ykhlef, F.; Bensebti, M.
2013-03-01
Pitch (F0) modification is one of the most important problems in the area of speech synthesis. Several techniques have been developed in the literature to achieve this goal. The main restrictions of these techniques concern the modification range and the synthesised speech quality, intelligibility and naturalness. The control of formants in a spoken language can significantly improve the naturalness of the synthesised speech. This improvement is mainly dependent on the control of the first formant (F1). Inspired by this observation, this article proposes a new approach that modifies both F0 and F1 of Arabic voiced sounds in order to improve the naturalness of the pitch-shifted speech. The developed strategy takes a parallel processing approach, in which the analysis segments are decomposed into sub-bands in the wavelet domain, modified in the desired sub-band by using a resampling technique, and reconstructed without affecting the remaining sub-bands. Pitch marking and voicing detection are performed in the frequency decomposition step based on the comparison of the multi-level approximation and detail signals. The performance of the proposed technique is evaluated by listening tests and compared to the pitch synchronous overlap and add (PSOLA) technique at the third approximation level. Experimental results have shown that manipulating F0 in conjunction with F1 in the wavelet domain yields more natural-sounding synthesised speech than the classical pitch modification technique. This improvement was especially evident for large pitch modifications.
Lee, Norman; Schrode, Katrina M; Bee, Mark A
2017-09-01
Diverse animals communicate using multicomponent signals. How a receiver's central nervous system integrates multiple signal components remains largely unknown. We investigated how female green treefrogs (Hyla cinerea) integrate the multiple spectral components present in male advertisement calls. Typical calls have a bimodal spectrum consisting of formant-like low-frequency (~0.9 kHz) and high-frequency (~2.7 kHz) components that are transduced by different sensory organs in the inner ear. In behavioral experiments, only bimodal calls reliably elicited phonotaxis in no-choice tests, and they were selectively chosen over unimodal calls in two-alternative choice tests. Single neurons in the inferior colliculus of awake, passively listening subjects were classified as combination-insensitive units (27.9%) or combination-sensitive units (72.1%) based on patterns of relative responses to the same bimodal and unimodal calls. Combination-insensitive units responded similarly to the bimodal call and one or both unimodal calls. In contrast, combination-sensitive units exhibited both linear responses (i.e., linear summation) and, more commonly, nonlinear responses (e.g., facilitation, compressive summation, or suppression) to the spectral combination in the bimodal call. These results are consistent with the hypothesis that nonlinearities play potentially critical roles in spectral integration and in the neural processing of multicomponent communication signals.
Emotions in freely varying and mono-pitched vowels, acoustic and EGG analyses.
Waaramaa, Teija; Palo, Pertti; Kankare, Elina
2015-12-01
Vocal emotions are expressed either by speech or singing. The difference is that in singing the pitch is predetermined while in speech it may vary freely. It was of interest to study whether there were voice quality differences between freely varying and mono-pitched vowels expressed by professional actors. Given their profession, actors have to be able to express emotions both by speech and singing. Electroglottogram and acoustic analyses of emotional utterances embedded in expressions of freely varying vowels [a:], [i:], [u:] (96 samples) and mono-pitched protracted vowels (96 samples) were studied. Contact quotient (CQEGG) was calculated using 35%, 55%, and 80% threshold levels. Three different threshold levels were used in order to evaluate their effects on emotions. Genders were studied separately. The results suggested significant gender differences for CQEGG 80% threshold level. SPL, CQEGG, and F4 were used to convey emotions, but to a lesser degree, when F0 was predetermined. Moreover, females showed fewer significant variations than males. Both genders used more hypofunctional phonation type in mono-pitched utterances than in the expressions with freely varying pitch. The present material warrants further study of the interplay between CQEGG threshold levels and formant frequencies, and listening tests to investigate the perceptual value of the mono-pitched vowels in the communication of emotions.
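The threshold method for CQEGG can be sketched as follows: the contact quotient is the fraction of a glottal cycle during which the EGG signal exceeds a criterion percentage of its peak-to-peak amplitude; the toy waveform is illustrative.

```python
# Sketch: contact quotient from one EGG cycle at the three threshold levels
# used in the study (35%, 55%, 80% of the peak-to-peak amplitude).
import numpy as np

def cq_egg(cycle, threshold=0.35):
    cycle = np.asarray(cycle, dtype=float)
    level = cycle.min() + threshold * (cycle.max() - cycle.min())
    return np.mean(cycle > level)              # proportion of samples above level

# Toy EGG-like cycle: fast contacting rise, slower de-contacting fall.
t = np.linspace(0, 1, 200, endpoint=False)
cycle = np.where(t < 0.3, t / 0.3, (1 - t) / 0.7)
for th in (0.35, 0.55, 0.80):
    print(f"CQ at {int(th * 100)}% threshold: {cq_egg(cycle, th):.2f}")
```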
Perceptual effects of dialectal and prosodic variation in vowels
NASA Astrophysics Data System (ADS)
Fox, Robert Allen; Jacewicz, Ewa; Hatcher, Kristin; Salmons, Joseph
2005-09-01
As was reported earlier [Fox et al., J. Acoust. Soc. Am. 114, 2396 (2003)], certain vowels in the Ohio and Wisconsin dialects of American English are shifting in different directions. In addition, we have found that the spectral characteristics of these vowels (e.g., duration and formant frequencies) changed systematically under varying degrees of prosodic prominence, with somewhat different changes occurring within each dialect. The question addressed in the current study is whether naive listeners from these two dialects are sensitive to both the dialect variations and to the prosodically induced spectral differences. Listeners from Ohio and Wisconsin listened to the stimulus tokens [beɪt] and [bɛt] produced in each of three prosodic contexts (representing three different levels of prominence). These words were produced by speakers from Ohio or from Wisconsin (none of the listeners were also speakers). Listeners identified the stimulus tokens in terms of vowel quality and indicated whether each was a good, fair, or poor exemplar of that phonetic category. Results showed that both phonetic quality decisions and goodness ratings were systematically and significantly affected by speaker dialect, listener dialect, and prosodic context. Implications for the source and nature of ongoing vowel changes in these two dialects will be discussed. [Work partially supported by NIDCD R03 DC005560-01.]
Age of acquisition and allophony in Spanish-English bilinguals.
Barlow, Jessica A
2014-01-01
This study examines age of acquisition (AoA) in Spanish-English bilinguals' phonetic and phonological knowledge of /l/ in English and Spanish. In English, the lateral approximant /l/ varies in darkness by context [based on the second formant (F2) and the difference between F2 and the first formant (F1)], but the Spanish /l/ does not. Further, English /l/ is overall darker than Spanish /l/. Thirty-eight college-aged adults participated: 11 Early Spanish-English bilinguals who learned English before the age of 5 years, 14 Late Spanish-English bilinguals who learned English after the age of 6 years, and 13 English monolinguals. Participants' /l/ productions were acoustically analyzed by language and context. The results revealed a Spanish-to-English phonetic influence on /l/ productions for both Early and Late bilinguals, as well as an English-to-Spanish phonological influence on the patterning of /l/ for the Late Bilinguals. These findings are discussed in terms of the Speech Learning Model and the effect of AoA on the interaction between a bilingual speaker's two languages.
Stop-like modification of the dental fricative /ð/: An acoustic analysis
Zhao, Sherry Y.
2010-01-01
This study concentrates on one of the commonly occurring phonetic variations in English: the stop-like modification of the dental fricative /ð/. The variant exhibits a drastic change from the canonical /ð/; the manner of articulation is changed from one that is fricative to one that is stop-like. Furthermore, the place of articulation of stop-like /ð/ has been a point of uncertainty, leading to confusion between stop-like /ð/ and /d/. In this study, acoustic and spectral moment measures were taken from 100 stop-like /ð/ and 102 /d/ tokens produced by 59 male and 23 female speakers in the TIMIT corpus. Data analysis indicated that stop-like /ð/ is significantly different from /d/ in burst amplitude, burst spectrum shape, burst peak frequency, second formant at following-vowel onset, and spectral moments. Moreover, the acoustic differences from /d/ are consistent with those expected for a dental stop-like /ð/. Automatic classification experiments involving these acoustic measures suggested that they are salient in distinguishing stop-like /ð/ from /d/. PMID:20968372
Improved vocal tract reconstruction and modeling using an image super-resolution technique.
Zhou, Xinhui; Woo, Jonghye; Stone, Maureen; Prince, Jerry L; Espy-Wilson, Carol Y
2013-06-01
Magnetic resonance imaging has been widely used in speech production research. Often only one image stack (sagittal, axial, or coronal) is used for vocal tract modeling. As a result, complementary information from other available stacks is not utilized. To overcome this, a recently developed super-resolution technique was applied to integrate three orthogonal low-resolution stacks into one isotropic volume. The results on vowels show that the super-resolution volume produces better vocal tract visualization than any of the low-resolution stacks. Its derived area functions generally produce formant predictions closer to the ground truth, particularly for those formants sensitive to area perturbations at constrictions.
Reliable but weak voice-formant cues to body size in men but not women
NASA Astrophysics Data System (ADS)
Rendall, Drew; Vokey, John R.; Nemeth, Christie; Ney, Christina
2005-04-01
Whether voice formants provide reliable cues to adult body size has been contested recently for some animals and humans, and the outcome bears critically on theories of social competition and mate choice, language origins, and speaker normalization. We report two experiments to test listeners' ability to assess speaker body size. In Experiment 1, listeners heard paired comparisons of the same short phrase spoken by two adults of the same sex, paired randomly with respect to height, and indicated which was larger. Both sexes (M = 20; F = 22) showed an equal but modest ability to identify the larger male (mean correct = 58.5%, T = 31.5, P < 0.001) that correlated with the magnitude of their height difference, but could not pick the larger female (mean correct = 52.0%, T = 1.05, P = 0.305) regardless of the height difference. Experiment 2 used single-word comparisons, focused only on male voices, and controlled F0 while manipulating F1-F4 between speakers. When F0 was equal but F1-F4 predicted the height difference between speakers, both sexes (M = 12; F = 18) correctly chose the taller male (80%). When F1-F4 values of the shorter male were reduced below those of the taller male (or vice versa), subjects shifted to pick the shorter male as being larger.
Colloquial Arabic vowels in Israel: a comparative acoustic study of two dialects.
Amir, Noam; Amir, Ofer; Rosenhouse, Judith
2014-10-01
This study explores the acoustic properties of the vowel systems of two dialects of colloquial Arabic spoken in Israel. One dialect is spoken in the Galilee region in the north of Israel, and the other is spoken in the Triangle (Muthallath) region, in central Israel. These vowel systems have five short and five long vowels /i, i:, e, e:, a, a:, o, o:, u, u:/. Twenty men and twenty women from each region were included, uttering 30 vowels each. All speakers were adult Muslim native speakers of these two dialects. The studied vowels were uttered in non-pharyngeal and non-laryngeal environments in the context of CVC words, embedded in a carrier sentence. The acoustic parameters studied were the two first formants, F0, and duration. Results revealed that long vowels were approximately twice as long as short vowels and differed also in their formant values. The two dialects diverged mainly in the short vowels rather than in the long ones. An overlap was found between the two short vowel pairs /i/-/e/ and /u/-/o/. This study demonstrates the existence of dialectal differences in the colloquial Arabic vowel systems, underlining the need for further research into the numerous additional dialects found in the region.
Speaker normalization for chinese vowel recognition in cochlear implants.
Luo, Xin; Fu, Qian-Jie
2005-07-01
Because of the limited spectro-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.
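The normalization step, rescaling the analysis filter bank by the ratio of mean F3 values, can be sketched as below; the logarithmic corner-frequency spacing and the 200-7000 Hz reference range are assumptions, not the study's exact processor settings.

```python
# Sketch: F3-ratio speaker normalization of a 4-channel analysis filter bank.
import numpy as np

def analysis_band_edges(f_low, f_high, n_channels):
    # Corner frequencies, logarithmically spaced (an assumed design choice).
    return np.geomspace(f_low, f_high, n_channels + 1)

def normalised_edges(f_low, f_high, n_channels, f3_speaker, f3_reference):
    scale = f3_speaker / f3_reference
    return analysis_band_edges(f_low * scale, f_high * scale, n_channels)

# A speaker whose mean F3 is 10% above the reference speaker's gets the
# whole analysis range shifted up by 10%.
print(normalised_edges(200, 7000, 4, f3_speaker=2750, f3_reference=2500))
```

Because the whole range is multiplied by a single scalar, channel bandwidths scale proportionally while the number of channels stays fixed.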
Acoustic characteristics of the vowel systems of six regional varieties of American English
NASA Astrophysics Data System (ADS)
Clopper, Cynthia G.; Pisoni, David B.; de Jong, Kenneth
2005-09-01
Previous research by speech scientists on the acoustic characteristics of American English vowel systems has typically focused on a single regional variety, despite decades of sociolinguistic research demonstrating the extent of regional phonological variation in the United States. In the present study, acoustic measures of duration and first and second formant frequencies were obtained from five repetitions of 11 different vowels produced by 48 talkers representing both genders and six regional varieties of American English. Results revealed consistent variation due to region of origin, particularly with respect to the production of low vowels and high back vowels. The Northern talkers produced shifted low vowels consistent with the Northern Cities Chain Shift, the Southern talkers produced fronted back vowels consistent with the Southern Vowel Shift, and the New England, Midland, and Western talkers produced the low back vowel merger. These findings indicate that the vowel systems of American English are better characterized in terms of the region of origin of the talkers than in terms of a single set of idealized acoustic-phonetic baselines of "General" American English and provide benchmark data for six regional varieties.
Speech Perception and Short Term Memory Deficits in Persistent Developmental Speech Disorder
Kenney, Mary Kay; Barac-Cikoja, Dragana; Finnegan, Kimberly; Jeffries, Neal; Ludlow, Christy L.
2008-01-01
Children with developmental speech disorders may have additional deficits in speech perception and/or short-term memory. To determine whether these are only transient developmental delays that can accompany the disorder in childhood or persist as part of the speech disorder, adults with a persistent familial speech disorder were tested on speech perception and short-term memory. Nine adults with a persistent familial developmental speech disorder without language impairment were compared with 20 controls on tasks requiring the discrimination of fine acoustic cues for word identification and on measures of verbal and nonverbal short-term memory. Significant group differences were found in the slopes of the discrimination curves for first formant transitions for word identification with stop gaps of 40 and 20 ms with effect sizes of 1.60 and 1.56. Significant group differences also occurred on tests of nonverbal rhythm and tonal memory, and verbal short-term memory with effect sizes of 2.38, 1.56 and 1.73. No group differences occurred in the use of stop gap durations for word identification. Because frequency-based speech perception and short-term verbal and nonverbal memory deficits both persisted into adulthood in the speech-impaired adults, these deficits may be involved in the persistence of speech disorders without language impairment. PMID:15896836
Sharp and round shapes of seen objects have distinct influences on vowel and consonant articulation.
Vainio, L; Tiainen, M; Tiippana, K; Rantala, A; Vainio, M
2017-07-01
The shape and size-related sound symbolism phenomena assume that, for example, the vowel [i] and the consonant [t] are associated with sharp-shaped and small-sized objects, whereas [ɑ] and [m] are associated with round and large objects. It has been proposed that these phenomena are mostly based on the involvement of articulatory processes in representing shape and size properties of objects. For example, [i] might be associated with sharp and small objects, because it is produced by a specific front-close shape of articulators. Nevertheless, very little work has examined whether these object properties indeed have an impact on speech sound vocalization. In the present study, the participants were presented with a sharp- or round-shaped object in a small or large size. They were required to pronounce one out of two meaningless speech units (e.g., [i] or [ɑ]) according to the size or shape of the object. We investigated how a task-irrelevant object property (e.g., the shape when responses are made according to size) influences reaction times, accuracy, intensity, fundamental frequency, and formant 1 and formant 2 of vocalizations. Size did not influence vocal responses, but shape did. Specifically, the vowel [i] and consonant [t] were vocalized relatively rapidly when the object was sharp-shaped, whereas [u] and [m] were vocalized relatively rapidly when the object was round-shaped. The study supports the view that the shape-related sound symbolism phenomena might reflect mapping of the perceived shape onto the corresponding articulatory gestures.
Variations in respiratory sounds in relation to fluid accumulation in the upper airways.
Yadollahi, Azadeh; Rudzicz, Frank; Montazeri, Aman; Bradley, T Douglas
2013-01-01
Obstructive sleep apnea (OSA) is a common disorder due to recurrent collapse of the upper airway (UA) during sleep that increases the risk for several cardiovascular diseases. Recently, we showed that nocturnal fluid accumulation in the neck can narrow the UA and predispose to OSA. Our goal is to develop non-invasive methods to study the pathogenesis of OSA and the factors that increase the risk of developing it. Respiratory sound analysis is a simple and non-invasive way to study variations in the properties of the UA. In this study we examine whether such analysis can be used to estimate neck fluid volume and whether fluid accumulation in the neck alters the properties of these sounds. Our acoustic features include estimates of formants, pitch, energy, duration, zero crossing rate, average power, Mel frequency power, Mel cepstral coefficients, skewness, and kurtosis across segments of sleep. Our results show that while all acoustic features varied significantly among subjects, only the variations in respiratory sound energy, power, duration, pitch, and formants varied significantly over time. Decreases in energy and power over time accompany increases in neck fluid volume, which may indicate narrowing of the UA and, consequently, an increased risk of OSA. Finally, simple discriminant analysis was used to estimate broad classes of neck fluid volume from acoustic features with an accuracy of 75%. These results suggest that acoustic analysis of respiratory sounds might be used to assess the role of fluid accumulation in the neck in the pathogenesis of OSA.
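As a rough illustration of the classification step, the sketch below computes a few of the features named above per segment and cross-validates a linear discriminant classifier; the feature subset, sampling rate, and synthetic stand-in data are assumptions, not the authors' pipeline.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def segment_features(x, fs):
    # Four of the features named above, computed per respiratory-sound segment.
    energy = np.sum(x ** 2)
    power = energy / len(x)
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2   # zero-crossing rate
    duration = len(x) / fs
    return [energy, power, zcr, duration]

# Synthetic stand-in data: 30 one-second segments in three coarse fluid-volume classes.
rng = np.random.default_rng(0)
segments = [rng.standard_normal(8000) * amp for amp in (0.5, 1.0, 2.0) * 10]
labels = ["low", "mid", "high"] * 10
X = np.array([segment_features(s, fs=8000) for s in segments])
acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean()
print(f"discriminant-analysis accuracy: {acc:.2f}")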
Acoustic voice analysis of prelingually deaf adults before and after cochlear implantation.
Evans, Maegan K; Deliyski, Dimitar D
2007-11-01
It is widely accepted that many severe to profoundly deaf adults have benefited from cochlear implants (CIs). However, limited research has been conducted to investigate changes in voice and speech of prelingually deaf adults who receive CIs, a population well known for presenting with a variety of voice and speech abnormalities. The purpose of this study was to use acoustic analysis to explore changes in voice and speech for three prelingually deaf males pre- and postimplantation over 6 months. The following measurements, some measured in varying contexts, were obtained: fundamental frequency (F0), jitter, shimmer, noise-to-harmonic ratio, voice turbulence index, soft phonation index, amplitude- and F0-variation, F0-range, speech rate, nasalance, and vowel production. Characteristics of vowel production were measured by determining the first formant (F1) and second formant (F2) of vowels in various contexts, magnitude of F2-variation, and rate of F2-variation. Perceptual measurements of pitch, pitch variability, loudness variability, speech rate, and intonation were obtained for comparison. Results are reported using descriptive statistics. The results showed patterns of change for some of the parameters while there was considerable variation across the subjects. All participants demonstrated a decrease in F0 in at least one context and demonstrated a change in nasalance toward the norm as compared to their normal hearing control. The two participants who were oral-language communicators were judged to produce vowels with an average of 97.2% accuracy and the sign-language user demonstrated low percent accuracy for vowel production.
Dilley, Laura C.; Wieland, Elizabeth A.; Gamache, Jessica L.; McAuley, J. Devin; Redford, Melissa A.
2013-01-01
Purpose As children mature, changes in voice spectral characteristics covary with changes in speech, language, and behavior. Spectral characteristics were manipulated to alter the perceived ages of talkers’ voices while leaving critical acoustic-prosodic correlates intact, to determine whether perceived age differences were associated with differences in judgments of prosodic, segmental, and talker attributes. Method Speech was modified by lowering formants and fundamental frequency, for 5-year-old children’s utterances, or raising them, for adult caregivers’ utterances. Next, participants differing in awareness of the manipulation (Exp. 1a) or amount of speech-language training (Exp. 1b) made judgments of prosodic, segmental, and talker attributes. Exp. 2 investigated the effects of spectral modification on intelligibility. Finally, in Exp. 3 trained analysts used formal prosody coding to assess prosodic characteristics of spectrally-modified and unmodified speech. Results Differences in perceived age were associated with differences in ratings of speech rate, fluency, intelligibility, likeability, anxiety, cognitive impairment, and speech-language disorder/delay; effects of training and awareness of the manipulation on ratings were limited. There were no significant effects of the manipulation on intelligibility or formally coded prosody judgments. Conclusions Age-related voice characteristics can greatly affect judgments of speech and talker characteristics, raising cautionary notes for developmental research and clinical work. PMID:23275414
Mecke, Ann-Christine; Sundberg, Johan; Richter, Bernhard
2010-10-01
In this investigation the voice source from trained boy singers was processed with a transfer function that contained the singer's formant cluster of a bass, a baritone, or a tenor. The modified voices were evaluated by a panel of highly specialized experts. The experts were asked 1) to assess how similar the examples sounded to the voice of the last castrato Alessandro Moreschi, and 2) to rate how similar they thought the examples were to their imagination of an 18th-century castrato voice. For both questions, the voices with tenor formants produced significantly higher ratings than the other voice types. However, the mean ratings for the second question were generally lower than those for the first.
Acoustical study of classical Peking Opera singing.
Sundberg, Johan; Gu, Lide; Huang, Qiang; Huang, Ping
2012-03-01
Acoustic characteristics of classical opera singing differ considerably between the Western and the Chinese cultures. Singers in the classical Peking opera tradition specialize on one out of a limited number of standard roles. Audio and electroglottograph signals were recorded for four performers of the Old Man role and three performers of the Colorful Face role. Recordings were made of the singers' speech and when they sang recitatives and songs from their roles. Sound pressure level, fundamental frequency, and spectrum characteristics were analyzed. Histograms showing the distribution of fundamental frequency showed marked peaks for the songs, suggesting a scale tone structure. Some of the intervals between these peaks were similar to those used in Western music. Vibrato rate was about 3.5 Hz, that is, considerably slower than in Western classical singing. Spectra of vibrato-free tones contained unbroken series of harmonic partials sometimes reaching up to 17 000 Hz. Long-term-average spectrum (LTAS) curves showed no trace of a singer's formant cluster. However, the Colorful Face role singers' LTAS showed a marked peak near 3300 Hz, somewhat similar to that found in Western pop music singers. The mean LTAS spectrum slope between 700 and 6000 Hz decreased by about 0.2 dB/octave per dB of equivalent sound level.
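For readers unfamiliar with long-term average spectrum (LTAS) analysis, a minimal sketch: average the power spectrum over the whole recording, then estimate the spectral slope by regressing band level on log2(frequency) between 700 and 6000 Hz, as in the abstract. The Welch parameters and the noise stand-in for a recording are arbitrary.

import numpy as np
from scipy.signal import welch

def ltas_db(x, fs, nperseg=4096):
    # Long-term average spectrum: Welch-averaged power spectrum, in dB.
    f, pxx = welch(x, fs=fs, nperseg=nperseg)
    return f, 10 * np.log10(pxx + 1e-20)

def ltas_slope_db_per_octave(f, level_db, lo=700.0, hi=6000.0):
    # Regress level on log2(frequency) between lo and hi: dB per octave.
    sel = (f >= lo) & (f <= hi)
    slope, _intercept = np.polyfit(np.log2(f[sel]), level_db[sel], 1)
    return slope

# Noise stand-in for a 5 s recording at 44.1 kHz.
fs = 44100
x = np.random.default_rng(1).standard_normal(5 * fs)
f, level = ltas_db(x, fs)
print(f"LTAS slope: {ltas_slope_db_per_octave(f, level):.2f} dB/octave")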
Comparison of hearing and voicing ranges in singing
NASA Astrophysics Data System (ADS)
Hunter, Eric J.; Titze, Ingo R.
2003-04-01
The spectral and dynamic ranges of the human voice of professional and nonprofessional vocalists were compared to the auditory hearing and feeling thresholds at a distance of one meter. In order to compare these, an analysis was done in true dB SPL, not just relative dB as is usually done in speech analysis. The methodology of converting the recorded acoustic signal to absolute pressure units was described. The human voice range of a professional vocalist appeared to match the dynamic range of the auditory system at some frequencies. In particular, it was demonstrated that professional vocalists were able to make use of the most sensitive part of the hearing thresholds (around 4 kHz) through the use of a learned vocal ring or singer's formant. [Work sponsored by NIDCD.]
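The conversion to absolute pressure units hinges on a calibration recording at a known level. A sketch under the common assumption of a 94 dB SPL (about 1 Pa RMS) calibrator tone; the signals here are synthetic stand-ins rather than the authors' recordings.

import numpy as np

P_REF = 20e-6  # reference pressure: 20 micropascals

def calibration_sensitivity(cal_tone, cal_level_db_spl=94.0):
    # Derive the recording chain's sensitivity (full-scale units per pascal)
    # from a calibrator tone of known level.
    rms_units = np.sqrt(np.mean(cal_tone ** 2))
    rms_pa = P_REF * 10 ** (cal_level_db_spl / 20)
    return rms_units / rms_pa

def db_spl(x, units_per_pascal):
    # Convert a waveform in full-scale units to absolute dB SPL.
    rms_pa = np.sqrt(np.mean(x ** 2)) / units_per_pascal
    return 20 * np.log10(rms_pa / P_REF)

# Synthetic stand-ins: a 1 kHz calibrator tone and a "voice" recording.
fs = 44100
t = np.arange(fs) / fs
cal = 0.1 * np.sqrt(2) * np.sin(2 * np.pi * 1000 * t)
voice = 0.05 * np.random.default_rng(2).standard_normal(fs)
print(f"{db_spl(voice, calibration_sensitivity(cal)):.1f} dB SPL")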
Cross-dialectal variation in formant dynamics of American English vowels
Fox, Robert Allen; Jacewicz, Ewa
2009-01-01
This study aims to characterize the nature of the dynamic spectral change in vowels in three distinct regional varieties of American English spoken in western North Carolina, central Ohio, and southern Wisconsin. The vowels /ɪ, ε, e, æ, aɪ/ were produced by 48 women for a total of 1920 utterances and were contained in words of the structure /bVts/ and /bVdz/ in sentences which elicited nonemphatic and emphatic vowels. Measurements made at the vowel target (i.e., the central 60% of the vowel) produced a set of acoustic parameters which included position and movement in the F1 by F2 space, vowel duration, amount of spectral change [measured as vector length (VL) and trajectory length (TL)], and spectral rate of change. Results revealed expected variation in formant dynamics as a function of phonetic factors (vowel emphasis and consonantal context). However, for each vowel and for each measure employed, dialect was a strong source of variation in vowel-inherent spectral change. In general, the dialect-specific nature and amount of spectral change can be characterized quite effectively by position and movement in the F1 by F2 space, vowel duration, TL (but not VL which underestimates formant movement), and spectral rate of change. PMID:19894839
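The two spectral-change measures are straightforward to compute: VL is the straight-line distance between the first and last measurement points in the F1 by F2 plane, and TL sums the distances between successive points, which is why VL underestimates curved formant movement. A sketch with a hypothetical five-point track:

import numpy as np

def vector_length(f1, f2):
    # VL: straight-line distance in the F1 x F2 plane from first to last point.
    return np.hypot(f1[-1] - f1[0], f2[-1] - f2[0])

def trajectory_length(f1, f2):
    # TL: sum of the section lengths between successive measurement points,
    # so curved trajectories are not underestimated the way VL is.
    return np.sum(np.hypot(np.diff(f1), np.diff(f2)))

# Hypothetical five-point formant track (Hz) across the central 60% of a vowel.
f1 = np.array([550.0, 580.0, 600.0, 590.0, 560.0])
f2 = np.array([1900.0, 1850.0, 1800.0, 1780.0, 1790.0])
print(f"VL = {vector_length(f1, f2):.0f} Hz, TL = {trajectory_length(f1, f2):.0f} Hz")
# Spectral rate of change would then be TL divided by vowel duration.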
Baudonck, Nele; Van Lierde, K; Dhooge, I; Corthals, P
2011-01-01
The purpose of this study was to compare vowel productions by deaf cochlear implant (CI) children, hearing-impaired hearing aid (HA) children and normal-hearing (NH) children. 73 children [mean age: 9;14 years (years;months)] participated: 40 deaf CI children, 34 moderately to profoundly hearing-impaired HA children and 42 NH children. For the 3 corner vowels [a], [i] and [u], F(1), F(2) and the intrasubject SD were measured using the Praat software. Spectral separation between these vowel formants and vowel space were calculated. The significant effects in the CI group all pertain to a higher intrasubject variability in formant values, whereas the significant effects in the HA group all pertain to lower formant values. Both hearing-impaired subgroups showed a tendency toward greater intervowel distances and vowel space. Several subtle deviations in the vowel production of deaf CI children and hearing-impaired HA children could be established, using a well-defined acoustic analysis. CI children as well as HA children in this study tended to overarticulate, which hypothetically can be explained by a lack of auditory feedback and an attempt to compensate for it with proprioceptive feedback during articulatory maneuvers.
Speaker compensation for local perturbation of fricative acoustic feedback.
Casserly, Elizabeth D
2011-04-01
Feedback perturbation studies of speech acoustics have revealed a great deal about how speakers monitor and control their productions of segmental (e.g., formant frequencies) and non-segmental (e.g., pitch) linguistic elements. The majority of previous work, however, overlooks the role of acoustic feedback in consonant production and makes use of acoustic manipulations that affect either entire utterances or the entire acoustic signal, rather than more temporally and phonetically restricted alterations. This study, therefore, seeks to expand the feedback perturbation literature by examining perturbation of consonant acoustics that is applied in a time-restricted and phonetically specific manner. The spectral center of the alveopalatal fricative [ʃ] produced in vowel-fricative-vowel nonwords was incrementally raised until it reached the potential for [s]-like frequencies, but the characteristics of high-frequency energy outside the target fricative remained unaltered. An "offline," more widely accessible signal processing method was developed to perform this manipulation. The local feedback perturbation resulted in changes to speakers' fricative production that were more variable, idiosyncratic, and restricted than the compensation seen in more global acoustic manipulations reported in the literature. Implications and interpretations of the results, as well as future directions for research based on the findings, are discussed.
Perceptual adaptation of voice gender discrimination with spectrally shifted vowels.
Li, Tianhao; Fu, Qian-Jie
2011-08-01
To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. Voice gender discrimination was measured for 10 normal-hearing subjects, during 5 days of adaptation to spectrally shifted vowels, produced by processing the speech of 5 male and 5 female talkers with 16-channel sine-wave vocoders. The subjects were randomly divided into 2 groups; one subjected to 50-Hz, and the other to 200-Hz, temporal envelope cutoff frequencies. No preview or feedback was provided. There was significant adaptation in voice gender discrimination with the 200-Hz cutoff frequency, but significant improvement was observed only for 3 female talkers with F(0) > 180 Hz and 3 male talkers with F(0) < 170 Hz. There was no significant adaptation with the 50-Hz cutoff frequency. Temporal envelope cues are important for voice gender discrimination under spectral shift conditions with perceptual adaptation, but spectral shift may limit the exclusive use of spectral information and/or the use of formant structure on voice gender discrimination. The results have implications for cochlear implant users and for understanding voice gender discrimination.
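For orientation, the following is a generic sine-wave vocoder sketch of the kind of processing described, with the envelope low-pass cutoff as the 50 Hz versus 200 Hz manipulation. The filter orders, band edges, and envelope-extraction details are assumptions, not the authors' implementation, and the spectral shift itself (carrier sines offset from the analysis bands) is not applied here.

import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def sine_vocoder(x, fs, n_channels=16, lo=200.0, hi=7000.0, env_cutoff=200.0):
    # Band-pass each channel, half-wave rectify, low-pass the envelope
    # (env_cutoff = 50 vs 200 Hz is the manipulation above), and use it to
    # modulate a sine at the channel's geometric centre frequency.
    edges = np.geomspace(lo, hi, n_channels + 1)
    t = np.arange(len(x)) / fs
    env_lp = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros(len(x))
    for lo_e, hi_e in zip(edges[:-1], edges[1:]):
        band = sosfilt(butter(4, [lo_e, hi_e], btype="band", fs=fs, output="sos"), x)
        env = sosfiltfilt(env_lp, np.maximum(band, 0.0))
        out += env * np.sin(2 * np.pi * np.sqrt(lo_e * hi_e) * t)
    return out

# One second of noise as a stand-in for speech, processed in the 50-Hz condition.
fs = 16000
x = np.random.default_rng(3).standard_normal(fs)
y = sine_vocoder(x, fs, env_cutoff=50.0)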
The temporal representation of speech in a nonlinear model of the guinea pig cochlea
NASA Astrophysics Data System (ADS)
Holmes, Stephen D.; Sumner, Christian J.; O'Mard, Lowel P.; Meddis, Ray
2004-12-01
The temporal representation of speechlike stimuli in the auditory-nerve output of a guinea pig cochlea model is described. The model consists of a bank of dual resonance nonlinear filters that simulate the vibratory response of the basilar membrane followed by a model of the inner hair cell/auditory nerve complex. The model is evaluated by comparing its output with published physiological auditory nerve data in response to single and double vowels. The evaluation includes analyses of individual fibers, as well as ensemble responses over a wide range of best frequencies. In all cases the model response closely follows the patterns in the physiological data, particularly the tendency for the temporal firing pattern of each fiber to represent the frequency of a nearby formant of the speech sound. In the model this behavior is largely a consequence of filter shapes; nonlinear filtering has only a small contribution at low frequencies. The guinea pig cochlear model produces a useful simulation of the measured physiological response to simple speech sounds and is therefore suitable for use in more advanced applications, including attempts to generalize these principles to the response of the human auditory system, both normal and impaired.
The singer's formant and speaker's ring resonance: a long-term average spectrum analysis.
Lee, Sang-Hyuk; Kwon, Hee-Jun; Choi, Hyun-Jin; Lee, Nam-Hun; Lee, Sung-Jin; Jin, Sung-Min
2008-06-01
We previously showed that a trained tenor's voice has the conventional singer's formant at the region of 3 kHz and another energy peak at 8-9 kHz. Singers in other operatic voice ranges are assumed to have the same peak in their singing and speaking voice. However, to date, no specific measurement of this has been made. Tenors, baritones, sopranos and mezzo sopranos were chosen to participate in this study of the singer's formant and the speaker's ring resonance. Untrained males (n=15) and females (n=15) were included in the control group. Each subject was asked to produce successive /a/ vowel sounds in their singing and speaking voice. For singing, the low pitch was produced in the chest register and the high notes in the head register. We collected the data on the long-term average spectra of the speaking and singing voices of the trained singers and the control groups. For the sounds produced from the head register, a significant energy concentration was seen in both 2.2-3.4 kHz and 7.5-8.4 kHz regions (except for the voices of the mezzo sopranos) in the trained singer group when compared to the control groups. Also, in the chest register, all four trained singer groups showed a significant energy concentration at 2.2-3.1 kHz and 7.8-8.4 kHz. For the speaking voice, all trained singers had a significant energy concentration at 2.2-5.3 kHz and sopranos had another energy concentration at 9-10 kHz. The results of this study suggest that opera singers have more energy concentration in the singer's formant/speaker's ring region, in both singing and speaking voices. Furthermore, another region of energy concentration, at 8-9 kHz, was identified in the opera singers' singing voices and in the sopranos' speaking voices. The authors believe that these energy concentrations may contribute to the rich voice of trained singers.
Native language shapes automatic neural processing of speech.
Intartaglia, Bastien; White-Schwoch, Travis; Meunier, Christine; Roman, Stéphane; Kraus, Nina; Schön, Daniele
2016-08-01
The development of the phoneme inventory is driven by the acoustic-phonetic properties of one's native language. Neural representation of speech is known to be shaped by language experience, as indexed by cortical responses, and recent studies suggest that subcortical processing also exhibits this attunement to native language. However, most work to date has focused on the differences between tonal and non-tonal languages that use pitch variations to convey phonemic categories. The aim of this cross-language study is to determine whether subcortical encoding of speech sounds is sensitive to language experience by comparing native speakers of two non-tonal languages (French and English). We hypothesized that neural representations would be more robust and fine-grained for speech sounds that belong to the native phonemic inventory of the listener, and especially for the dimensions that are phonetically relevant to the listener such as high frequency components. We recorded neural responses of American English and French native speakers, listening to natural syllables of both languages. Results showed that, independently of the stimulus, American participants exhibited greater neural representation of the fundamental frequency compared to French participants, consistent with the importance of the fundamental frequency to convey stress patterns in English. Furthermore, participants showed more robust encoding and more precise spectral representations of the first formant when listening to the syllable of their native language as compared to non-native language. These results align with the hypothesis that language experience shapes sensory processing of speech and that this plasticity occurs as a function of what is meaningful to a listener.
Inferior colliculus contributions to phase encoding of stop consonants in an animal model
Warrier, Catherine M; Abrams, Daniel A; Nicol, Trent G; Kraus, Nina
2011-01-01
The human auditory brainstem is known to be exquisitely sensitive to fine-grained spectro-temporal differences between speech sound contrasts, and the ability of the brainstem to discriminate between these contrasts is important for speech perception. Recent work has described a novel method for translating brainstem timing differences in response to speech contrasts into frequency-specific phase differentials. Results from this method have shown that the human brainstem response is surprisingly sensitive to phase-differences inherent to the stimuli across a wide extent of the spectrum. Here we use an animal model of the auditory brainstem to examine whether the stimulus-specific phase signatures measured in human brainstem responses represent an epiphenomenon associated with far field (i.e., scalp-recorded) measurement of neural activity, or alternatively whether these specific activity patterns are also evident in auditory nuclei that contribute to the scalp-recorded response, thereby representing a more fundamental temporal processing phenomenon. Responses in anaesthetized guinea pigs to three minimally-contrasting consonant-vowel stimuli were collected simultaneously from the cortical surface vertex and directly from central nucleus of the inferior colliculus (ICc), measuring volume conducted neural activity and multiunit, near-field activity, respectively. Guinea pig surface responses were similar to human scalp-recorded responses to identical stimuli in gross morphology as well as phase characteristics. Moreover, surface recorded potentials shared many phase characteristics with near-field ICc activity. Response phase differences were prominent during formant transition periods, reflecting spectro-temporal differences between syllables, and showed more subtle differences during the identical steady-state periods. ICc encoded stimulus distinctions over a broader frequency range, with differences apparent in the highest frequency ranges analyzed, up to 3000 Hz. Based on the similarity of phase encoding across sites, and the consistency and sensitivity of response phase measured within ICc, results suggest that a general property of the auditory system is a high degree of sensitivity to fine-grained phase information inherent to complex acoustical stimuli. Furthermore, results suggest that temporal encoding in ICc contributes to temporal features measured in speech-evoked scalp-recorded responses. PMID:21945200
Matsushima, J; Kumagai, M; Harada, C; Takahashi, K; Inuyama, Y; Ifukube, T
1992-09-01
Our previous reports showed that, using a speech coding method, second formant information could be transmitted through an electrode on the promontory. However, second formant information can also be transmitted by tactile stimulation. Therefore, to find out whether electrical stimulation of the auditory nerve would be superior to tactile stimulation for our speech coding method, the time resolutions of the two modes of stimulation were compared. The results showed that the time resolution of electrical promontory stimulation was three times better than the time resolution of tactile stimulation of the finger. This indicates that electrical stimulation of the auditory nerve is much better suited to our speech coding method than tactile stimulation of the finger.
Kayes, Gillyanne; Welch, Graham F
2017-05-01
Using an empirical design, this study investigated perceptual and acoustic differences between the recorded vocal products of songs and scales of professional female singers of classical Western Lyric (WL) and non-legit Musical Theater (MT) styles. A total of 54 audio-recorded samples of songs and scales from professional female singers were rated in a blind randomized testing process by seven expert listeners as being performed by either a WL or MT singer. Songs and scales that were accurately perceived by genre were then analyzed intra- and inter-genre using long-term average spectrum analysis. A high level of agreement was found between judges in ratings for both songs and scales according to genre (P < 0.0001). Judges were more successful in identifying WL than MT, but accuracy was always >50%. For the long-term average spectrum analysis intra-genre, song and scale matched better than chance. The highest spectral peak for the WL singers was at the mean fundamental frequency, whereas this spectral area was weaker for the MT singers, who showed a marked peak at 1 kHz. The other main inter-genre difference appeared in the higher frequency region, with a peak in the MT spectrum between 4 and 5 kHz, the region of the "speaker's formant." In comparing female singers of WL and MT styles, scales as well as song tasks appear to be indicative of singer genre behavior. This implied difference in vocal production may be useful to teachers and clinicians dealing with multiple genres. The addition of a scale-in-genre task may be useful in future research seeking to identify genre-distinctive behaviors.
Loud speech over noise: some spectral attributes, with gender differences.
Ternström, Sten; Bohman, Mikael; Södersten, Maria
2006-03-01
In seeking an acoustic description of overloaded voice, simulated environmental noise was used to elicit loud speech. A total of 23 adults, 12 females and 11 males, read six passages of 90 s duration, over realistic noise presented over loudspeakers. The noise was canceled out, exposing the speech signal to analysis. Spectrum balance (SB) was defined as the level of the 2-6 kHz band relative to the 0.1-1 kHz band. SB averaged across many similar vowel segments became less negative with increasing sound pressure level (SPL), as described in the literature, but only at moderate SPL. At high SPL, SB exhibited a personal "saturation" point, above which the high-band level no longer increased faster than the overall SPL, or even stopped increasing altogether, on average at 90.3 dB (@30 cm) for females and 95.5 dB for males. Saturation occurred 6-8 dB below the personal maximum SPL, regardless of gender. The loudest productions were often characterized by a relative increase in low-frequency energy, apparently in a sharpened first formant. This suggests a change of vocal strategy when the high spectrum can rise no further. The progression of SB with SPL was characteristically different for individual subjects.
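SB as defined here reduces to the difference between two band levels. A minimal sketch (the Welch parameters and the noise stand-in signal are arbitrary):

import numpy as np
from scipy.signal import welch

def band_level_db(x, fs, lo, hi):
    # Level of one frequency band, from a Welch power-spectrum estimate.
    f, pxx = welch(x, fs=fs, nperseg=4096)
    sel = (f >= lo) & (f <= hi)
    return 10 * np.log10(np.trapz(pxx[sel], f[sel]) + 1e-20)

def spectrum_balance(x, fs):
    # SB as defined above: the 2-6 kHz level minus the 0.1-1 kHz level, in dB
    # (negative for typical speech).
    return band_level_db(x, fs, 2000, 6000) - band_level_db(x, fs, 100, 1000)

# Noise stand-in for a vowel segment.
fs = 44100
x = np.random.default_rng(4).standard_normal(fs)
print(f"SB = {spectrum_balance(x, fs):.1f} dB")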
A mathematical model of vowel identification by users of cochlear implants
Sagi, Elad; Meyer, Ted A.; Kaiser, Adam R.; Teoh, Su Wooi; Svirsky, Mario A.
2010-01-01
A simple mathematical model is presented that predicts vowel identification by cochlear implant users based on these listeners’ resolving power for the mean locations of first, second, and/or third formant energies along the implanted electrode array. This psychophysically based model provides hypotheses about the mechanism cochlear implant users employ to encode and process the input auditory signal to extract information relevant for identifying steady-state vowels. Using one free parameter, the model predicts most of the patterns of vowel confusions made by users of different cochlear implant devices and stimulation strategies, and who show widely different levels of speech perception (from near chance to near perfect). Furthermore, the model can predict results from the literature, such as Skinner et al. [(1995). Ann. Otol. Rhinol. Laryngol. 104, 307–311] frequency mapping study, and the general trend in the vowel results of Zeng and Galvin’s [(1999). Ear Hear. 20, 60–74] studies of output electrical dynamic range reduction. The implementation of the model presented here is specific to vowel identification by cochlear implant users, but the framework of the model is more general. Computational models such as the one presented here can be useful for advancing knowledge about speech perception in hearing impaired populations, and for providing a guide for clinical research and clinical practice. PMID:20136228
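The abstract does not spell out the model's equations, but its general scheme can be illustrated: map each vowel's mean formant locations to places along an electrode array, jitter those places with Gaussian noise whose spread stands in for the single free parameter (the listener's resolving power), and identify each trial by the nearest vowel template. Every number below is invented for illustration; this is not the authors' implementation.

import numpy as np

rng = np.random.default_rng(5)
VOWELS = {"i": (300, 2300), "a": (750, 1200), "u": (320, 900), "ae": (660, 1700)}
EDGES = np.geomspace(200, 7000, 13)   # hypothetical 12-electrode filter bank

def electrode_of(freq_hz):
    # Which electrode's analysis band contains this formant frequency.
    return np.searchsorted(EDGES, freq_hz) - 1

def simulate_confusions(sigma, trials=2000):
    # sigma, in electrode units, plays the role of the single free parameter:
    # the listener's resolving power for formant place along the array.
    names = list(VOWELS)
    places = np.array([[electrode_of(f1), electrode_of(f2)]
                       for f1, f2 in VOWELS.values()], dtype=float)
    conf = np.zeros((len(names), len(names)))
    for i, p in enumerate(places):
        noisy = p + rng.normal(0.0, sigma, size=(trials, 2))
        dists = np.linalg.norm(noisy[:, None, :] - places[None, :, :], axis=2)
        for j in np.argmin(dists, axis=1):
            conf[i, j] += 1
    return names, conf / trials

names, conf = simulate_confusions(sigma=0.8)
print(names)
print(np.round(conf, 2))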
Analysis of speech and tongue motion in normal and post-glossectomy speaker using cine MRI.
Ha, Jinhee; Sung, Iel-Yong; Son, Jang-Ho; Stone, Maureen; Ord, Robert; Cho, Yeong-Cheol
2016-01-01
Since the tongue is the oral structure responsible for mastication, pronunciation, and swallowing functions, patients who undergo glossectomy can be affected in various aspects of these functions. The vowel /i/ uses the tongue shape, whereas /u/ uses tongue and lip shapes. The purpose of this study is to investigate the morphological changes of the tongue and the adaptation of pronunciation using cine MRI for speech of patients who undergo glossectomy. Twenty-three controls (11 males and 12 females) and 13 patients (eight males and five females) volunteered to participate in the experiment. The patients underwent glossectomy surgery for T1 or T2 lateral lingual tumors. The speech tasks "a souk" and "a geese" were spoken by all subjects providing data for the vowels /u/ and /i/. Cine MRI and speech acoustics were recorded and measured to compare the changes in the tongue with vowel acoustics after surgery. 2D measurements were made of the interlip distance, tongue-palate distance, tongue position (anterior-posterior and superior-inferior), tongue height on the left and right sides, and pharynx size. Vowel formants F1, F2, and F3 were measured. The patients had significantly lower F2/F1 ratios (F=5.911, p=0.018), and lower F3/F1 ratios that approached significance. This was seen primarily in the /u/ data. Patients had flatter tongue shapes than controls with a greater effect seen in /u/ than /i/. The patients showed complex adaptation motion in order to preserve the acoustic integrity of the vowels, and the tongue modified cavity size relationships to maintain the value of the formant frequencies.
A nose that roars: anatomical specializations and behavioural features of rutting male saiga
Frey, Roland; Volodin, Ilya; Volodina, Elena
2007-01-01
The involvement of the unique saiga nose in vocal production has been neglected so far. Rutting male saigas produce loud nasal roars. Prior to roaring, they tense and extend their noses in a highly stereotypic manner. This change of nose configuration includes dorsal folding and convex curving of the nasal vestibulum and is maintained until the roar ends. Red and fallow deer males that orally roar achieve a temporary increase of vocal tract length (vtl) by larynx retraction. Saiga males attain a similar effect by pulling their flexible nasal vestibulum rostrally, allowing for a temporary elongation of the nasal vocal tract by about 20%. Decrease of formant frequencies and formant dispersion, as acoustic effects of an increase of vtl, are assumed to convey important information on the quality of a dominant male to conspecifics, e.g. on body size and fighting ability. Nasal roaring in saiga may equally serve to deter rival males and to attract females. Anatomical constraints might have set a limit to the rostral pulling of the nasal vestibulum. It seems likely that the sexual dimorphism of the saiga nose was induced by sexual selection. Adult males of many mammalian species, after sniffing or licking female urine or genital secretions, raise their head and strongly retract their upper lip and small nasal vestibulum while inhaling orally. This flehmen behaviour is assumed to promote transport of non-volatile substances via the incisive ducts into the vomeronasal organs for pheromone detection. The flehmen aspect in saiga involves the extensive flexible walls of the greatly enlarged nasal vestibulum and is characterized by a distinctly concave configuration of the nose region, the reverse of that observed in nasal roaring. A step-by-step model for the gradual evolution of the saiga nose is presented here. PMID:17971116
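The acoustic link between vocal tract length (vtl) and formants invoked here follows from the uniform-tube approximation: resonances fall at Fi = (2i - 1)c/4L, so adjacent formants are spaced c/2L apart, and a 20% elongation of the tract lowers formant dispersion by about 17%. A worked sketch with hypothetical formant values:

C = 35000.0  # speed of sound in a warm vocal tract, cm/s

def formant_dispersion(formants_hz):
    # Mean spacing between adjacent formants (Fitch-style dispersion).
    return (formants_hz[-1] - formants_hz[0]) / (len(formants_hz) - 1)

def vtl_cm_from_dispersion(df_hz):
    # Uniform tube: dispersion = c / (2 L), so L = c / (2 * dispersion).
    return C / (2.0 * df_hz)

# Hypothetical formants (Hz) of a neutral vocal tract:
df = formant_dispersion([500.0, 1500.0, 2500.0, 3500.0])
print(f"dispersion = {df:.0f} Hz -> vtl = {vtl_cm_from_dispersion(df):.1f} cm")  # ~17.5 cm
# Lengthening this tract by 20% would lower the dispersion to about
# 1000 / 1.2 = 833 Hz, with all formants shifted down accordingly.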
Different Vocal Parameters Predict Perceptions of Dominance and Attractiveness.
Hodges-Simeon, Carolyn R; Gaulin, Steven J C; Puts, David A
2010-12-01
Low mean fundamental frequency (F(0)) in men's voices has been found to positively influence perceptions of dominance by men and attractiveness by women using standardized speech. Using natural speech obtained during an ecologically valid social interaction, we examined relationships between multiple vocal parameters and dominance and attractiveness judgments. Male voices from an unscripted dating game were judged by men for physical and social dominance and by women in fertile and non-fertile menstrual cycle phases for desirability in short-term and long-term relationships. Five vocal parameters were analyzed: mean F(0) (an acoustic correlate of vocal fold size), F(0) variation, intensity (loudness), utterance duration, and formant dispersion (D(f), an acoustic correlate of vocal tract length). Parallel but separate ratings of speech transcripts served as controls for content. Multiple regression analyses were used to examine the independent contributions of each of the predictors. Physical dominance was predicted by low F(0) variation and physically dominant word content. Social dominance was predicted only by socially dominant word content. Ratings of attractiveness by women were predicted by low mean F(0), low D(f), high intensity, and attractive word content across cycle phase and mating context. Low D(f) was perceived as attractive by fertile-phase women only. We hypothesize that competitors and potential mates may attend more strongly to different components of men's voices because of the different types of information these vocal parameters provide.
Some components of the "cocktail-party effect," as revealed when it fails
NASA Astrophysics Data System (ADS)
Divenyi, Pierre L.; Gygi, Brian
2003-04-01
The precise way listeners cope with cocktail-party situations, i.e., understand speech in the midst of other, simultaneously ongoing conversations, has by and large remained a puzzle, despite research committed to studying the problem over the past half century. In contrast, it is widely acknowledged that the cocktail-party effect (CPE) deteriorates in aging. Our investigations during the last decade have assessed the deterioration of the CPE in elderly listeners and attempted to uncover specific auditory tasks on which the performance of the same listeners will also exhibit a deficit. Correlated performance on CPE and such auditory tasks arguably signifies that the tasks in question are necessary for perceptual segregation of the target speech and the background babble. We will present results on three tasks correlated with CPE performance. All three tasks require temporal processing-based perceptual segregation of specific non-speech stimuli (amplitude- and/or frequency-modulated sinusoidal complexes): discrimination of formant transition patterns, segregation of streams with different syllabic rhythms, and selective attention to AM or FM features in the designated stream. [Work supported by a grant from the National Institute on Aging and by the V.A. Medical Research.]
Lower Vocal Tract Morphologic Adjustments Are Relevant for Voice Timbre in Singing
Mainka, Alexander; Poznyakovskiy, Anton; Platzek, Ivan; Fleischer, Mario; Sundberg, Johan; Mürbe, Dirk
2015-01-01
The vocal tract shape is crucial to voice production. Its lower part seems particularly relevant for voice timbre. This study analyzes the detailed morphology of parts of the epilaryngeal tube and the hypopharynx for the sustained German vowels /a/, /e/, /i/, /o/, and /u/ by thirteen male singer subjects who were at the beginning of their academic singing studies. Analysis was based on two different phonatory conditions: a natural, speech-like phonation and a singing phonation, like in classical singing. 3D models of the vocal tract were derived from magnetic resonance imaging and compared with long-term average spectrum analysis of audio recordings from the same subjects. Comparison of singing to the speech-like phonation, which served as reference, showed significant adjustments of the lower vocal tract: an average lowering of the larynx by 8 mm and an increase of the hypopharyngeal cross-sectional area (+ 21.9%) and volume (+ 16.8%). Changes in the analyzed epilaryngeal portion of the vocal tract were not significant. Consequently, lower larynx-to-hypopharynx area and volume ratios were found in singing compared to the speech-like phonation. All evaluated measures of the lower vocal tract varied significantly with vowel quality. Acoustically, an increase of high frequency energy in singing correlated with a wider hypopharyngeal area. The findings offer an explanation how classical male singers might succeed in producing a voice timbre with increased high frequency energy, creating a singer's formant cluster. PMID:26186691
Vowel change across three age groups of speakers in three regional varieties of American English
Jacewicz, Ewa; Fox, Robert A.; Salmons, Joseph
2011-01-01
This acoustic study examines sound (vowel) change in apparent time across three successive generations of 123 adult female speakers ranging in age from 20 to 65 years old, representing three regional varieties of American English, typical of western North Carolina, central Ohio and southeastern Wisconsin. A set of acoustic measures characterized the dynamic nature of formant trajectories, the amount of spectral change over the course of vowel duration and the position of the spectral centroid. The study found a set of systematic changes to /ɪ, ε, æ/, including positional changes in the acoustic space (mostly lowering of the vowels) and significant variation in formant dynamics (increased monophthongization). This common sound change is evident in both emphatic (articulated clearly) and nonemphatic (casual) productions and occurs regardless of dialect-specific vowel dispersions in the vowel space. The cross-generational and cross-dialectal patterns of variation found here support an earlier report by Jacewicz, Fox, and Salmons (2011) which found this recent development in these three dialect regions in isolated citation-form words. While confirming the new North American Shift in different styles of production, the study underscores the importance of addressing the stress-related variation in vowel production in a careful and valid assessment of sound change. PMID:22125350
The Singer's Formant and Speaker's Ring Resonance: A Long-Term Average Spectrum Analysis
Lee, Sang-Hyuk; Kwon, Hee-Jun; Choi, Hyun-Jin; Lee, Nam-Hun; Lee, Sung-Jin
2008-01-01
Objectives We previously showed that a trained tenor's voice has the conventional singer's formant at the region of 3 kHz and another energy peak at 8-9 kHz. Singers in other operatic voice ranges are assumed to have the same peak in their singing and speaking voice. However, to date, no specific measurement of this has been made. Methods Tenors, baritones, sopranos and mezzo sopranos were chosen to participate in this study of the singer's formant and the speaker's ring resonance. Untrained males (n=15) and females (n=15) were included in the control group. Each subject was asked to produce successive /a/ vowel sounds in their singing and speaking voice. For singing, the low pitch was produced in the chest register and the high notes in the head register. We collected the data on the long-term average spectra of the speaking and singing voices of the trained singers and the control groups. Results For the sounds produced from the head register, a significant energy concentration was seen in both 2.2-3.4 kHz and 7.5-8.4 kHz regions (except for the voices of the mezzo sopranos) in the trained singer group when compared to the control groups. Also, the chest register had a significant energy concentration in the 4 trained singer groups at the 2.2-3.1 kHz and 7.8-8.4 kHz. For speaking sound, all trained singers had a significant energy concentration at 2.2-5.3 kHz and sopranos had another energy concentration at 9-10 kHz. Conclusion The results of this study suggest that opera singers have more energy concentration in the singer's formant/speaker's ring region, in both singing and speaking voices. Furthermore, another region of energy concentration was identified in opera singer's singing sound and in sopranos' speaking sound at 8-9 kHz. The authors believe that these energy concentrations may contribute to the rich voice of trained singers. PMID:19434279
Sex-Biased Sound Symbolism in English-Language First Names
Pitcher, Benjamin J.; Mesoudi, Alex; McElligott, Alan G.
2013-01-01
Sexual selection has resulted in sex-based size dimorphism in many mammals, including humans. In Western societies, average to taller stature men and comparatively shorter, slimmer women have higher reproductive success and are typically considered more attractive. This size dimorphism also extends to vocalisations in many species, again including humans, with larger individuals exhibiting lower formant frequencies than smaller individuals. Further, across many languages there are associations between phonemes and the expression of size (e.g. large /a, o/, small /i, e/), consistent with the frequency-size relationship in vocalisations. We suggest that naming preferences are a product of this frequency-size relationship, driving male names to sound larger and female names smaller, through sound symbolism. In a 10-year dataset of the most popular British, Australian and American names we show that male names are significantly more likely to contain larger sounding phonemes (e.g. “Thomas”), while female names are significantly more likely to contain smaller phonemes (e.g. “Emily”). The desire of parents to have comparatively larger, more masculine sons, and smaller, more feminine daughters, and the increased social success that accompanies more sex-stereotyped names, is likely to be driving English-language first names to exploit sound symbolism of size in line with sexual body size dimorphism. PMID:23755148
A recursive linear predictive vocoder
NASA Astrophysics Data System (ADS)
Janssen, W. A.
1983-12-01
A non-real-time, 10-pole recursive autocorrelation linear predictive coding vocoder was created for use in studying the effects of recursive autocorrelation on speech. The vocoder is composed of two interchangeable pitch detectors, a speech analyzer, and a speech synthesizer. The time between updates of the filter coefficients is allowed to vary from 0.125 msec to 20 msec. The best quality was found using 0.125 msec between updates. The greatest change in quality was noted when changing from 20 msec/update to 10 msec/update. Pitch period plots for the center-clipping autocorrelation pitch detector and the simplified inverse filtering technique are provided. Plots of speech into and out of the vocoder are given. Three-dimensional formant-versus-time plots are shown. Effects of noise on pitch detection and formants are shown. Noise affects the voiced/unvoiced decision process, causing voiced speech to be reconstructed as unvoiced.
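As background, the heart of an autocorrelation LPC analyzer like the one described is the Levinson-Durbin recursion, which turns a frame's autocorrelation sequence into all-pole filter coefficients. A compact sketch (10 poles, hypothetical frame; the vocoder's recursive autocorrelation update is not reproduced here):

import numpy as np

def lpc_autocorr(frame, order=10):
    # Autocorrelation-method LPC via the Levinson-Durbin recursion; returns
    # coefficients a[0..order] (a[0] = 1) of the prediction-error filter and
    # the residual energy.
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Hypothetical 20 msec frame at 8 kHz (white noise standing in for speech).
frame = np.random.default_rng(6).standard_normal(160)
coeffs, residual = lpc_autocorr(frame, order=10)
print(np.round(coeffs, 3))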
Wood, Adrienne; Martin, Jared; Niedenthal, Paula
2017-01-01
Recent work has identified the physical features of smiles that accomplish three tasks fundamental to human social living: rewarding behavior, establishing and managing affiliative bonds, and negotiating social status. The current work extends the social functional account to laughter. Participants (N = 762) rated the degree to which reward, affiliation, or dominance (between-subjects) was conveyed by 400 laughter samples acquired from a commercial sound effects website. Inclusion of a fourth rating dimension, spontaneity, allowed us to situate the current approach in the context of existing laughter research, which emphasizes the distinction between spontaneous and volitional laughter. We used 11 acoustic properties extracted from the laugh samples to predict participants' ratings. Actor sex moderated, and sometimes even reversed, the relation between acoustics and participants' judgments. Spontaneous laughter appears to serve the reward function in the current framework, as similar acoustic properties guided perceiver judgments of spontaneity and reward: reduced voicing and increased pitch, increased duration for female actors, and increased pitch slope, center of gravity, first formant, and noisiness for male actors. Affiliation ratings diverged from reward in their sex-dependent relationship to intensity and, for females, reduced pitch range and raised second formant. Dominance displayed the most distinct pattern of acoustic predictors, including increased pitch range, reduced second formant in females, and decreased pitch variability in males. We relate the current findings to existing findings on laughter and human and non-human vocalizations, concluding laughter can signal much more than felt or faked amusement.
Preliminary comparison of infants speech with and without hearing loss
NASA Astrophysics Data System (ADS)
McGowan, Richard S.; Nittrouer, Susan; Chenausky, Karen
2005-04-01
The speech of ten children with hearing loss and ten children without hearing loss, all aged 12 months, is examined. All the children with hearing loss were identified before six months of age, and all have parents who wish them to become oral communicators. The data are from twenty-minute sessions with the caregiver and child, with their normal prostheses in place, in semi-structured settings. These data come from a larger test battery, administered to both caregiver and child, within a project comparing the development of children with and without hearing loss, known as the Early Development of Children with Hearing Loss. The speech comparisons are in terms of number of utterances, syllable shapes, and segment type. A subset of the data was given a detailed acoustic analysis, including formant frequencies and voice quality measures. [Work supported by NIDCD R01 006237 to Susan Nittrouer.]
Neural mechanisms underlying auditory feedback control of speech
Reilly, Kevin J.; Guenther, Frank H.
2013-01-01
The neural substrates underlying auditory feedback control of speech were investigated using a combination of functional magnetic resonance imaging (fMRI) and computational modeling. Neural responses were measured while subjects spoke monosyllabic words under two conditions: (i) normal auditory feedback of their speech, and (ii) auditory feedback in which the first formant frequency of their speech was unexpectedly shifted in real time. Acoustic measurements showed compensation to the shift within approximately 135 ms of onset. Neuroimaging revealed increased activity in bilateral superior temporal cortex during shifted feedback, indicative of neurons coding mismatches between expected and actual auditory signals, as well as right prefrontal and Rolandic cortical activity. Structural equation modeling revealed increased influence of bilateral auditory cortical areas on right frontal areas during shifted speech, indicating that projections from auditory error cells in posterior superior temporal cortex to motor correction cells in right frontal cortex mediate auditory feedback control of speech. PMID:18035557
Martin, B A; Sigal, A; Kurtzberg, D; Stapells, D R
1997-03-01
This study investigated the effects of decreased audibility produced by high-pass noise masking on cortical event-related potentials (ERPs) N1, N2, and P3 to the speech sounds /ba/ and /da/ presented at 65 and 80 dB SPL. Normal-hearing subjects pressed a button in response to the deviant sound in an oddball paradigm. Broadband masking noise was presented at an intensity sufficient to completely mask the response to the 65-dB SPL speech sounds, and subsequently high-pass filtered at 4000, 2000, 1000, 500, and 250 Hz. With high-pass masking noise, pure-tone behavioral thresholds increased by an average of 38 dB at the high-pass cutoff and by 50 dB one octave above the cutoff frequency. Results show that as the cutoff frequency of the high-pass masker was lowered, ERP latencies to speech sounds increased and amplitudes decreased. The cutoff frequency where these changes first occurred and the rate of the change differed for N1 compared to N2, P3, and the behavioral measures. N1 showed gradual changes as the masker cutoff frequency was lowered. N2, P3, and behavioral measures showed marked changes below a masker cutoff of 2000 Hz. These results indicate that the decreased audibility resulting from the noise masking affects the various ERP components in a differential manner. N1 is related to the presence of audible stimulus energy, being present whether audible stimuli are discriminable or not. In contrast, N2 and P3 were absent when the stimuli were audible but not discriminable (i.e., when the second formant transitions were masked), reflecting stimulus discrimination. These data have implications regarding the effects of decreased audibility on cortical processing of speech sounds and for the study of cortical ERPs in populations with hearing impairment.
Neural processing of amplitude and formant rise time in dyslexia.
Peter, Varghese; Kalashnikova, Marina; Burnham, Denis
2016-06-01
This study aimed to investigate how children with dyslexia weight amplitude rise time (ART) and formant rise time (FRT) cues in phonetic discrimination. Passive mismatch responses (MMR) were recorded for a /ba/-/wa/ contrast in a multiple-deviant oddball paradigm to identify the neural response to cue weighting in 17 children with dyslexia and 17 age-matched control children. The deviant stimuli had either partial or full ART or FRT cues. The results showed that ART did not generate an MMR in either group, whereas both partial and full FRT cues generated an MMR in control children while only full FRT cues generated an MMR in children with dyslexia. These findings suggest that children, both controls and those with dyslexia, discriminate speech based on FRT cues and not ART cues. However, control children have greater sensitivity to FRT cues in speech compared to children with dyslexia. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Acoustic characteristics of Punjabi retroflex and dental stops.
Hussain, Qandeel; Proctor, Michael; Harvey, Mark; Demuth, Katherine
2017-06-01
The phonological category "retroflex" is found in many Indo-Aryan languages; however, it has not been clearly established which acoustic characteristics reliably differentiate retroflexes from other coronals. This study investigates the acoustic phonetic properties of Punjabi retroflex /ʈ/ and dental /t̪/ in word-medial and word-initial contexts across /i e a o u/, and in word-final context across /i a u/. Formant transitions, closure and release durations, and spectral moments of release bursts are compared in 2280 stop tokens produced by 30 speakers. Although burst spectral measures and formant transitions do not consistently differentiate retroflexes from dentals in some vowel contexts, stop release duration and total stop duration reliably differentiate Punjabi retroflex and dental stops across all word contexts and vocalic environments. These results suggest that Punjabi coronal place contrasts are signaled by the complex interaction of temporal and spectral cues.
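The spectral moments used in studies like this one are usually computed by treating the burst's magnitude (or power) spectrum as a probability distribution over frequency. A minimal numpy sketch under that common convention; the Hamming window and the use of the magnitude rather than power spectrum are illustrative choices, not the authors' exact analysis settings:

```python
import numpy as np

def spectral_moments(burst, fs):
    """First four spectral moments of a stop-burst segment: centre of
    gravity, standard deviation, skewness, and (excess) kurtosis.

    The magnitude spectrum is treated as a probability distribution over
    frequency; windowing here is an illustrative choice."""
    x = burst * np.hamming(len(burst))
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = spec / spec.sum()                                # normalise to a distribution
    cog = np.sum(freqs * p)                              # 1st moment: centre of gravity
    sd = np.sqrt(np.sum((freqs - cog) ** 2 * p))         # 2nd moment: spread
    skew = np.sum((freqs - cog) ** 3 * p) / sd ** 3      # 3rd moment: spectral tilt
    kurt = np.sum((freqs - cog) ** 4 * p) / sd ** 4 - 3  # 4th moment: peakedness
    return cog, sd, skew, kurt
```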
Prosodic domain-initial effects on the acoustic structure of vowels
NASA Astrophysics Data System (ADS)
Fox, Robert Allen; Jacewicz, Ewa; Salmons, Joseph
2003-10-01
In the process of language change, vowels tend to shift in ``chains,'' leading to reorganizations of entire vowel systems over time. A long research tradition has described such patterns, but little is understood about what factors motivate such shifts. Drawing data from changes in progress in American English dialects, the broad hypothesis is tested that changes in vowel systems are related to prosodic organization and stress patterns. Changes in vowels under greater prosodic prominence correlate directly with, and likely underlie, historical patterns of shift. This study examines acoustic characteristics of vowels at initial edges of prosodic domains [Fougeron and Keating, J. Acoust. Soc. Am. 101, 3728-3740 (1997)]. The investigation is restricted to three distinct prosodic levels: utterance (sentence-initial), phonological phrase (strong branch of a foot), and syllable (weak branch of a foot). The predicted changes in vowels /e/ and /ɛ/ in two American English dialects (from Ohio and Wisconsin) are examined along a set of acoustic parameters: duration, formant frequencies (including dynamic changes over time), and fundamental frequency (F0). In addition to traditional methodology which elicits list-like intonation, a design is adapted to examine prosodic patterns in more typical sentence intonations. [Work partially supported by NIDCD R03 DC005560-01.]
Revisiting the Canadian English vowel space
NASA Astrophysics Data System (ADS)
Hagiwara, Robert
2005-04-01
In order to fill a need for experimental-acoustic baseline measurements of Canadian English vowels, a database is currently being constructed in Winnipeg, Manitoba. The database derives from multiple repetitions of fifteen English vowels (eleven standard monophthongs, syllabic /r/ and three standard diphthongs) in /hVd/ and /hVt/ contexts, as spoken by multiple speakers. Frequencies of the first four formants are taken from three timepoints in every vowel token (25, 50, and 75% of vowel duration). Preliminary results (from five men and five women) confirm some features characteristic of Canadian English, but call others into question. For instance, the merger of low back vowels appears to be complete for these speakers, but the result is a lower-mid and probably rounded vowel rather than the low back unround vowel often described. With these data, Canadian Raising can be quantified as an average 200 Hz or 1.5 Bark downward shift in the frequency of F1 before voiceless /t/. Analysis of the database will lead to a more accurate picture of the Canadian English vowel system, as well as provide a practical and up-to-date point of reference for further phonetic and sociophonetic comparisons.
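The pairing of a 200 Hz shift with roughly 1.5 Bark can be sanity-checked with Traunmüller's (1990) Hz-to-Bark approximation. A small sketch in which the F1 values are assumed illustrative numbers, not measurements from the study:

```python
def hz_to_bark(f):
    """Traunmüller's (1990) approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

# Assumed illustrative F1 values (not measurements from the study):
f1_plain, f1_raised = 850.0, 650.0
print(round(hz_to_bark(f1_plain) - hz_to_bark(f1_raised), 2))  # ~1.43 Bark
```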
Information content and acoustic structure of male African elephant social rumbles
Stoeger, Angela S.; Baotic, Anton
2016-01-01
Until recently, the prevailing theory about male African elephants (Loxodonta africana) was that, once adult and sexually mature, males are solitary and intent only on finding estrous females. While this is true during the state of ‘musth’ (a condition characterized by aggressive behavior and elevated androgen levels), ‘non-musth’ males exhibit a social system seemingly based on companionship, dominance and established hierarchies. Research on elephant vocal communication has so far focused on females, and very little is known about the acoustic structure and the information content of male vocalizations. Using the source and filter theory approach, we analyzed social rumbles of 10 male African elephants. Our results reveal that male rumbles encode information about individuality and maturity (age and size), with formant frequencies and absolute fundamental frequency values having the most informative power. This first comprehensive study on male elephant vocalizations gives important indications of their potential functional relevance for male-male and male-female communication. Our results suggest that, similar to the highly social females, future research on male elephant vocal behavior will reveal a complex communication system in which social knowledge, companionship, hierarchy, reproductive competition and the need to communicate over long distances play key roles. PMID:27273586
Delgado-Hernandez, J
2017-02-01
Acoustic analysis is a tool that provides objective data on speech changes in dysarthria. The aim was to evaluate, in ataxic dysarthria, the relationship of the vowel space area (VSA), the formant centralization ratio (FCR), and the mean of the primary distances with speech intelligibility. A sample of fourteen Spanish speakers, ten with dysarthria and four controls, was used. The values of the first and second formants in 140 vowels extracted from 140 words were analyzed. To estimate the level of intelligibility, seven listeners performed an identification task on verbal stimuli. The dysarthric subjects showed less contrast between mid and high vowels and between back vowels. Significant differences in the VSA, FCR, and mean of the primary distances compared with control subjects (p = 0.007, 0.005, and 0.030, respectively) were observed. Regression analyses showed a relationship of the VSA and the mean of the primary distances with the level of speech intelligibility (r = 0.60 and 0.74, respectively). Subjects with ataxic dysarthria show reduced contrast and vowel centralization when producing vowels. The acoustic measures studied in this preliminary work have high sensitivity in detecting dysarthria, but only the VSA and the mean of the primary distances provide information on the severity of this type of speech disturbance.
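Both metrics are standard formant-based measures: the triangular VSA is the area of the /i/-/a/-/u/ triangle in the F1 x F2 plane, and the FCR (Sapir et al., 2010) is a ratio that rises as vowels centralize. A sketch under those usual definitions, with placeholder formant values rather than data from this study:

```python
def triangular_vsa(f1, f2):
    """Area of the /i/-/a/-/u/ triangle in the F1 x F2 plane (shoelace formula).

    f1, f2: dicts mapping vowel -> mean formant frequency in Hz."""
    i, a, u = "i", "a", "u"
    return 0.5 * abs(
        f1[i] * (f2[a] - f2[u])
        + f1[a] * (f2[u] - f2[i])
        + f1[u] * (f2[i] - f2[a])
    )

def fcr(f1, f2):
    """Formant centralization ratio: increases as the vowel space shrinks."""
    return (f2["u"] + f2["a"] + f1["i"] + f1["u"]) / (f2["i"] + f1["a"])

# Placeholder values for a healthy adult male speaker (not study data):
f1 = {"i": 300.0, "a": 750.0, "u": 350.0}
f2 = {"i": 2300.0, "a": 1300.0, "u": 800.0}
print(triangular_vsa(f1, f2), round(fcr(f1, f2), 2))  # ~312500 Hz^2, ~0.90
```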
Easwar, Vijayalakshmi; Purcell, David W; Aiken, Steven J; Parsa, Vijay; Scollie, Susan D
2015-01-01
The use of auditory evoked potentials as an objective outcome measure in infants fitted with hearing aids has gained interest in recent years. This article proposes a test paradigm using speech-evoked envelope following responses (EFRs) for use as an objective aided outcome measure. The method uses a running speech-like, naturally spoken stimulus token /susa∫i/ (fundamental frequency [f0] = 98 Hz; duration 2.05 sec) to elicit EFRs by eight carriers representing low, mid, and high frequencies. Each vowel elicited two EFRs simultaneously, one from the region of formant one (F1) and one from the higher formants region (F2+). The simultaneous recording of two EFRs was enabled by lowering f0 in the region of F1 alone. Fricatives were amplitude modulated to enable recording of EFRs from high-frequency spectral regions. The present study aimed to evaluate the effect of level and bandwidth on speech-evoked EFRs in adults with normal hearing. The study also aimed to test the convergent validity of the EFR paradigm by comparing it with changes in behavioral tasks due to bandwidth. Single-channel electroencephalogram was recorded from the vertex to the nape of the neck over 300 sweeps in two polarities from 20 young adults with normal hearing. To evaluate the effects of level in experiment I, EFRs were recorded at test levels of 50 and 65 dB SPL. To evaluate the effects of bandwidth in experiment II, EFRs were elicited by /susa∫i/ low-pass filtered at 1, 2, and 4 kHz, presented at 65 dB SPL. The 65 dB SPL condition from experiment I represented the full bandwidth condition. EFRs were averaged across the two polarities and estimated using a Fourier analyzer. An F test was used to determine whether an EFR was detected. Speech discrimination using the University of Western Ontario Distinctive Feature Differences test and sound quality rating using the Multiple Stimulus Hidden Reference and Anchors paradigm were measured in identical bandwidth conditions. In experiment I, the increase in level resulted in a significant increase in response amplitudes for all eight carriers (mean increase of 14 to 50 nV) and in the number of detections (mean increase of 1.4 detections). In experiment II, an increase in bandwidth resulted in a significant increase in the number of EFRs detected up to the low-pass filtered 4 kHz condition, and carrier-specific changes in response amplitude up to the full bandwidth condition. Scores in both behavioral tasks increased with bandwidth up to the full bandwidth condition. The number of detections and composite amplitude (sum of all eight EFR amplitudes) correlated significantly with changes in behavioral test scores. Results suggest that the EFR paradigm is sensitive to changes in level and audible bandwidth. It may be a useful tool as an objective aided outcome measure considering its running speech-like stimulus, representation of spectral regions important for speech understanding, level and bandwidth sensitivity, and clinically feasible test times. The paradigm requires further validation in individuals with hearing loss, with and without hearing aids.
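Fourier-analyzer-plus-F-test detection of this kind is commonly implemented by comparing spectral power at the response frequency of the averaged recording against the mean power of surrounding noise bins. A sketch of that general logic only; the plain-FFT estimator, the 60 noise bins, and the alpha level are assumptions, not this study's exact implementation:

```python
import numpy as np
from scipy.stats import f as f_dist

def efr_f_test(response_avg, fs, f_mod, n_noise_bins=60, alpha=0.05):
    """F-test detection of an envelope following response at f_mod (Hz):
    spectral power at the response bin versus mean power in nearby bins.

    The 60 noise bins and alpha = 0.05 are illustrative choices."""
    spec = np.abs(np.fft.rfft(response_avg)) ** 2
    freqs = np.fft.rfftfreq(len(response_avg), d=1.0 / fs)
    sig = int(np.argmin(np.abs(freqs - f_mod)))
    noise = np.r_[sig - n_noise_bins // 2:sig, sig + 1:sig + n_noise_bins // 2 + 1]
    F = spec[sig] / spec[noise].mean()
    p = 1.0 - f_dist.cdf(F, 2, 2 * len(noise))  # each bin's power carries 2 df
    return F, p, p < alpha
```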
NASA Astrophysics Data System (ADS)
Ertmer, David Joseph
1994-01-01
The effectiveness of vowel production training that incorporated direct instruction in combination with spectrographic models and feedback was assessed for two children who exhibited profound hearing impairment. A multiple-baseline design across behaviors, with replication across subjects, was implemented to determine whether vowel production accuracy improved following the introduction of treatment. Listener judgments of vowel correctness were obtained during the baseline, training, and follow-up phases of the study. Data were analyzed through visual inspection of changes in levels of accuracy, changes in trends of accuracy, and changes in variability of accuracy within and across phases. One subject showed significant improvement on all three trained vowel targets; the second subject, on the first trained target only (Kolmogorov-Smirnov two-sample test). Performance trends during training sessions suggest that continued treatment would have resulted in further improvement for both subjects. Vowel duration, fundamental frequency, and the frequency locations of the first and second formants were measured before and after training. Acoustic analysis revealed highly individualized changes in the frequency locations of F1 and F2. Vowels which received the most training were maintained at higher levels than those which were introduced later in training. Some generalization of practiced vowel targets to untrained words was observed in both subjects. A bias toward judging productions as "correct" was observed for both subjects during self-evaluation tasks using spectrographic feedback.
The Korean Prevocalic Palatal Glide: A Comparison with the Russian Glide and Palatalization.
Suh, Yunju; Hwang, Jiwon
2016-01-01
Phonetic studies of the Korean prevocalic glides have often suggested that they are shorter in duration than those of languages like English, and lack a prolonged steady state. In addition, the formant frequencies of the Korean labiovelar glide are reported to be greatly influenced by the following vowel. In this study the Korean prevocalic palatal glide is investigated vis-à-vis the two phonologically similar configurations of another language - the glide /j/ and the secondary palatalization of Russian, with regard to the inherent duration of the glide component, F2 trajectory, vowel-to-glide coarticulation and glide-to-vowel coarticulation. It is revealed that the Korean palatal glide is closer to the Russian palatalization in duration and F2 trajectory, indicating a lack of steady state, and to the Russian segmental glide in the vowel-to-glide coarticulation degree. When the glide-to-vowel coarticulation is considered, the Korean palatal glide is distinguished from both Russian categories. The results suggest that both the Korean palatal glide and the Russian palatalization involve significant articulatory overlap, the former with the vowel and the latter with the consonant. Phonological implications of such a difference in coarticulation pattern are discussed, as well as the comparison between the Korean labiovelar and palatal glides. © 2016 S. Karger AG, Basel.
Amplitude Rise Time Does Not Cue the /bɑ/–/wɑ/ Contrast for Adults or Children
Nittrouer, Susan; Lowenstein, Joanna H.; Tarr, Eric
2013-01-01
Purpose Previous research has demonstrated that children weight the acoustic cues to many phonemic decisions differently than do adults and gradually shift those strategies as they gain language experience. However, that research has focused on spectral and duration cues rather than on amplitude cues. In the current study, the authors examined amplitude rise time (ART; an amplitude cue) and formant rise time (FRT; a spectral cue) in the /bɑ/–/wɑ/ manner contrast for adults and children, and related those speech decisions to outcomes of nonspeech discrimination tasks. Method Twenty adults and 30 children (ages 4–5 years) labeled natural and synthetic speech stimuli manipulated to vary ARTs and FRTs, and discriminated nonspeech analogs that varied only by ART in an AX paradigm. Results Three primary results were obtained. First, listeners in both age groups based speech labeling judgments on FRT, not on ART. Second, the fundamental frequency of the natural speech samples did not influence labeling judgments. Third, discrimination performance for the nonspeech stimuli did not predict how listeners would perform with the speech stimuli. Conclusion Even though both adults and children are sensitive to ART, it was not weighted in phonemic judgments by these typical listeners. PMID:22992704
An acoustic comparison of two women's infant- and adult-directed speech
NASA Astrophysics Data System (ADS)
Andruski, Jean; Katz-Gershon, Shiri
2003-04-01
In addition to having prosodic characteristics that are attractive to infant listeners, infant-directed (ID) speech shares certain characteristics of adult-directed (AD) clear speech, such as increased acoustic distance between vowels, that might be expected to make ID speech easier for adults to perceive in noise than AD conversational speech. However, perceptual tests of two women's ID productions by Andruski and Bessega [J. Acoust. Soc. Am. 112, 2355] showed that this is not always the case. In a word identification task that compared ID speech with AD clear and conversational speech, one speaker's ID productions were less well identified than her AD clear speech, but better identified than her AD conversational speech. For the second woman, ID speech was the least accurately identified of the three speech registers. For both speakers, hard words (infrequent words with many lexical neighbors) were also at an increased disadvantage relative to easy words (frequent words with few lexical neighbors) in speech registers that were less accurately perceived. This study will compare several acoustic properties of these women's productions, including pitch and formant-frequency characteristics. Results of the acoustic analyses will be examined alongside the original perceptual results to suggest reasons for differences in listeners' accuracy in identifying these two women's ID speech in noise.
Acoustic analysis of the singing and speaking voice in singing students.
Lundy, D S; Roy, S; Casiano, R R; Xue, J W; Evans, J
2000-12-01
The singing power ratio (SPR) is an objective means of quantifying the singer's formant. SPR has been shown to differentiate trained singers from nonsingers and sung from spoken tones. This study was designed to evaluate SPR and acoustic parameters in singing students to determine if the singer-in-training has an identifiable difference between sung and spoken voices. Digital audio recordings were made of both sung and spoken vowel sounds in 55 singing students for acoustic analysis. SPR values were not significantly different between the sung and spoken samples. Shimmer and noise-to-harmonic ratio were significantly higher in spoken samples. SPR analysis may provide an objective tool for monitoring the student's progress.
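SPR is conventionally the level difference, in dB, between the strongest spectral peak in the 2-4 kHz region (the singer's-formant band) and the strongest peak below 2 kHz. A sketch under that common definition; the band edges and windowing are assumptions rather than this study's exact settings:

```python
import numpy as np

def singing_power_ratio(signal, fs):
    """SPR in dB: strongest spectral peak in 2-4 kHz minus the strongest
    peak in 0-2 kHz. Requires fs >= 8 kHz; band edges follow the common
    SPR definition (an assumption, not this study's exact settings)."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    low = spec[(freqs >= 0) & (freqs < 2000)].max()
    high = spec[(freqs >= 2000) & (freqs <= 4000)].max()
    # Typically negative; values nearer 0 dB mean a stronger singer's formant.
    return 20.0 * np.log10(high / low)
```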
The Effect of Timbre, Pitch, and Vibrato on Vocal Pitch-Matching Accuracy.
Duvvuru, Sirisha; Erickson, Molly
2016-05-01
This study seeks to examine how target stimulus timbre, vibrato, pitch, and singer classification affect pitch-matching accuracy. This is a repeated-measures factorial design. Source signals were synthesized with a source slope of -12 dB/octave, with and without vibrato, at each of the pitches C4, B4, and F5. These source signals were filtered using five formant patterns (A-E), constituting a total of 30 stimuli (5 formant patterns × 3 pitches × 2 vibrato conditions). Twelve sopranos and 11 mezzo-sopranos with at least 3 years of individual voice training were recruited from the University of Tennessee, Knoxville, School of Music and the Knoxville Opera Company. Each singer attempted to match the pitch of all 30 stimuli, presented twice in a random order. Results indicated that there was no significant effect of formant pattern on pitch-matching accuracy. With increasing pitch from C4 to F5, pitch-matching accuracy increased at the midpoint of the vowel but not in the prephonatory set. Mezzo-sopranos moved toward being in tune from the prephonatory set to the midpoint of the vowel. However, sopranos at C4 sang closer to being in tune at the prephonatory set but lowered the pitch at the midpoint of the vowel. Presence or absence of vibrato did not affect pitch-matching accuracy. However, an interesting finding of the study was that singers attempted to match the timbre of stimuli with vibrato. The results of this study show that pitch matching is a complex process affected by many parameters. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Gowda, Dhananjaya; Airaksinen, Manu; Alku, Paavo
2017-09-01
Recently, a quasi-closed phase (QCP) analysis of speech signals for accurate glottal inverse filtering was proposed. However, the QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially in computing WLP models with a hard-limiting weighting function: a sample-selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted based on its past as well as its future samples, thereby utilizing the available samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach, as well as on natural speech utterances, show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.
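The forward-backward idea can be illustrated with a simplified temporally weighted linear predictor in which one shared coefficient set minimizes the weighted sum of forward and backward squared prediction errors. This is a sketch of the principle only, assuming a given weighting function w; it does not reproduce the authors' exact QCP-FB formulation (in QCP, w is designed around the glottal closed phase):

```python
import numpy as np

def wlp_forward_backward(x, order, w):
    """Temporally weighted LP with a combined forward-backward error.

    One shared coefficient vector a minimises
        sum_t w[t] * ((x[t] - a.past)^2 + (x[t] - a.future)^2),
    a simplified sketch of the forward-backward principle."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    R = np.zeros((order, order))
    r = np.zeros(order)
    for t in range(order, n - order):
        past = x[t - order:t][::-1]        # x[t-1], ..., x[t-order]
        future = x[t + 1:t + order + 1]    # x[t+1], ..., x[t+order]
        for frame in (past, future):       # forward term, then backward term
            R += w[t] * np.outer(frame, frame)
            r += w[t] * x[t] * frame
    a = np.linalg.solve(R, r)
    return a                               # inverse filter A(z) = 1 - sum a_k z^-k
```

The benefit is visible in the accumulation loop: every retained sample contributes two error equations instead of one, so a hard-limiting weight discards fewer effective samples.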
NASA Astrophysics Data System (ADS)
Rong, Panying
The speech of individuals with velopharyngeal incompetency (VPI) is characterized by hypernasality, a speech quality related to excessive emission of acoustic energy through the nose, as caused by failure of velopharyngeal closure. As an attempt to reduce hypernasality and, in turn, improve the quality of VPI-related hypernasal speech, this study is dedicated to developing an approach that uses speech-dependent articulatory adjustments to reduce hypernasality caused by excessive velopharyngeal opening (VPO). A preliminary study has been done to derive such articulatory adjustments for hypernasal /i/ vowels based on the simulation of an articulatory model (Speech Processing and Synthesis Toolboxes; Childers, 2000). Nasal /i/ vowels both with and without articulatory adjustments were synthesized by the model. Spectral analysis found that nasal acoustic features were attenuated and oral formant structures were restored after articulatory adjustments. In addition, comparisons of perceptual ratings of nasality between the two types of nasal vowels showed that the articulatory adjustments generated by the model significantly reduced the perception of nasality for nasal /i/ vowels. Such articulatory adjustments for nasal /i/ have two patterns: (1) a consistent adjustment pattern, which corresponds to an expansion at the velopharynx, and (2) speech-dependent fine-tuning adjustment patterns, including adjustments in the lip area and the upper pharynx. The long-term goal of this study is to apply this approach of articulatory adjustment as a therapeutic tool in clinical speech treatment to detect and correct the maladaptive articulatory behaviors developed spontaneously by speakers with VPI on an individual basis. This study constructed a speaker-adaptive articulatory model on the basis of the framework of Childers's vocal tract model to simulate articulatory adjustments aimed at compensating for the acoustic outcome caused by velopharyngeal opening and reducing nasality. To construct such a speaker-adaptive articulatory model, (1) an articulatory-acoustic-aerodynamic database was recorded using articulography and aerodynamic instruments to provide point-wise articulatory data to be fitted into the framework of Childers's standard vocal tract model; (2) the length and transverse dimension of the vocal tract were adjusted to fit the individual speaker by minimizing the acoustic discrepancy between the model simulation and the target derived from the acoustic signal in the database, using the simulated annealing algorithm; and (3) the articulatory space of the model was adjusted to fit individual articulatory features by adapting the movement ranges of all articulators. With the speaker-adaptive articulatory model, the articulatory configurations of the oral and nasal vowels in the database were simulated and synthesized. Given the acoustic targets derived from the oral vowels in the database, speech-dependent articulatory adjustments were simulated to compensate for the acoustic outcome caused by VPO. The resultant articulatory configurations correspond to nasal vowels with articulatory adjustment, which were synthesized to serve as the perceptual stimuli for a listening task of nasality rating. The oral and nasal vowels synthesized based on the oral and nasal vowel targets in the database also served as perceptual stimuli. The results suggest both acoustic and perceptual effects of the model-generated articulatory adjustment on the nasal vowels /a/, /i/ and /u/.
In terms of acoustics, the articulatory adjustment (1) restores the altered formant structures due to nasal coupling, including shifted formant frequency, attenuated formant intensity and expanded formant bandwidth and (2) attenuates the peaks and zeros caused by nasal resonances. Perceptually, the articulatory adjustment generated by the speaker-adaptive model significantly reduces the perceived nasality for all three vowels (/a/, /i/, /u/). The acoustic and perceptual effects of articulatory adjustment suggest achievement of the acoustic goal of compensating for the acoustic discrepancy caused by VPO and the auditory goal of reducing the perception of nasality. Such a finding is consistent with motor equivalence (Hughes and Abbs, 1976; Maeda, 1990), which enables inter-articulator coordination to compensate for the deviation from the acoustic/auditory goal caused by the shifted position of an articulator. The articulatory adjustment responsible for the acoustic and perceptual effects as described above was decomposed into a set of empirical orthogonal modes (Story and Titze, 1998). Both gross articulatory patterns and fine-tuning adjustments were found in the principal orthogonal modes, which lead to the acoustic compensation and reduction of nasality. For /a/ and /i/, a direct relationship was found among the acoustic features, nasality, and articulatory adjustment patterns. Specifically, the articulatory adjustments indicated by the principal orthogonal modes of the adjusted nasal /a/ and /i/ were directly correlated with the attenuation of the acoustic cues of nasality (i.e., shifting of F1 and F2 frequencies) and the reduction of nasality rating. For /u/, such a direct relationship among the acoustic features, nasality and articulatory adjustment was not as prominent, suggesting the possibility of additional acoustic correlates of nasality other than F1 and F2. The findings of this study demonstrate the possibility of using articulatory adjustment to reduce the perception of nasality through model simulation. A speaker-adaptive articulatory model is able to simulate individual-based articulatory adjustment strategies that can be applied in clinical settings to serve as the articulatory targets for correction of the maladaptive articulatory behaviors developed spontaneously by speakers with hypernasal speech. Such a speaker-adaptive articulatory model provides an intuitive way of articulatory learning and self-training for speakers with VPI to learn appropriate articulatory strategies through model-speaker interaction.
NASA Astrophysics Data System (ADS)
Fikri Zanil, Muhamad; Nur Wahidah Nik Hashim, Nik; Azam, Huda
2017-11-01
Psychiatrists currently rely on questionnaires and interviews for psychological assessment. These conservative methods often miss true positives, which can be fatal, especially where a patient experiencing a suicidal predisposition is diagnosed only with major depressive disorder (MDD). With modern technology, an assessment tool might aid psychiatrists toward a more accurate diagnosis and thus help reduce casualties. This project explored the relationship between speech features of spoken audio (reading) in Bahasa Malaysia and Beck Depression Inventory (BDI-II) scores. The speech features used in this project were power spectral density (PSD), Mel-frequency cepstral coefficients (MFCC), transition parameters, formants, and pitch. According to the analysis, the optimum combination of speech features to predict BDI-II scores comprised PSD, MFCC, and the transition parameters. A linear regression approach with sequential forward/backward feature selection was used to predict the BDI-II scores from reading speech. The result showed a mean absolute error (MAE) of 0.4096 for female reading speech. For males, all predicted BDI-II scores fell within 1 point of the actual scores, with an MAE of 0.098437. A prediction system called Depression Severity Evaluator (DSE) was developed. The DSE correctly identified one out of five subjects; although this prediction rate was low, the system predicted each subject's score within a maximum difference of 4.93. This demonstrates that the scores are not random numbers.
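The pipeline described (linear regression with sequential forward/backward feature selection, scored by MAE) can be sketched with scikit-learn; the feature matrix, number of selected features, and cross-validation settings below are placeholders, not the project's actual data or configuration:

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_absolute_error

# X: rows = speakers, columns = speech features (PSD, MFCC, transition
# parameters, formants, pitch); y: BDI-II scores. Placeholder random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
y = rng.uniform(0, 63, size=40)          # BDI-II totals range 0-63

model = LinearRegression()
# direction="backward" gives the backward-elimination variant.
sfs = SequentialFeatureSelector(model, n_features_to_select=5,
                                direction="forward", cv=5)
X_sel = sfs.fit_transform(X, y)

pred = cross_val_predict(model, X_sel, y, cv=5)
print("MAE:", mean_absolute_error(y, pred))
```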
Hutka, Stefanie; Bidelman, Gavin M; Moreno, Sylvain
2015-05-01
Psychophysiological evidence supports a music-language association, such that experience in one domain can impact processing required in the other domain. We investigated the bidirectionality of this association by measuring event-related potentials (ERPs) in native English-speaking musicians, native tone language (Cantonese) nonmusicians, and native English-speaking nonmusician controls. We tested the degree to which pitch expertise stemming from musicianship or tone language experience similarly enhances the neural encoding of auditory information necessary for speech and music processing. Early cortical discriminatory processing for music and speech sounds was characterized using the mismatch negativity (MMN). Stimuli included 'large deviant' and 'small deviant' pairs of sounds that differed minimally in pitch (fundamental frequency, F0; contrastive musical tones) or timbre (first formant, F1; contrastive speech vowels). Behavioural F0 and F1 difference limen tasks probed listeners' perceptual acuity for these same acoustic features. Musicians and Cantonese speakers performed comparably in pitch discrimination; only musicians showed an additional advantage on timbre discrimination performance and enhanced MMN responses to both music and speech. Cantonese language experience was not associated with enhancements on neural measures, despite enhanced behavioural pitch acuity. These data suggest that while both musicianship and tone language experience enhance some aspects of auditory acuity (behavioural pitch discrimination), musicianship confers farther-reaching enhancements to auditory function, tuning both pitch- and timbre-related brain processes. Copyright © 2015 Elsevier Ltd. All rights reserved.
Brown, Jennifer A; Derksen, Frederik J; Stick, John A; Hartmann, William M; Robinson, N Edward
2005-01-01
To report the effect of unilateral laser vocal cordectomy on respiratory noise and airway function in horses with experimentally induced laryngeal hemiplegia (LH). Experimental study. Six Standardbred horses without upper airway abnormalities at rest or during high-speed treadmill exercise. Respiratory sounds and inspiratory trans-upper airway pressure (P(Ui)) were measured before (baseline) and 14 days after induction of LH by left recurrent laryngeal neurectomy, and again 30, 60, 90, and 120 days after endoscopically assisted laser cordectomy of the left vocal cord. Data were collected with the horses exercising on a treadmill at a speed producing maximum heart rate (HR(max)). In horses exercising at HR(max), induction of LH caused a significant increase in P(Ui), sound level (SL), and the sound intensity of formant 2 (F(2)) and 3 (F(3)). The sound intensity of formant 1 (F(1)) was unaffected by induction of LH. Laser vocal cordectomy had no effect on SL, or on the sound intensity of F(1) and F(3). At 30, 60, 90, and 120 days after surgery, P(Ui) and the sound intensity of F(2) were significantly reduced, but these variables remained significantly different from baseline values. Unilateral laser vocal cordectomy did not effectively improve upper airway noise in horses with LH. The procedure decreased upper airway obstruction to the same degree as bilateral ventriculocordectomy. Currently, laser vocal cordectomy cannot be recommended for the treatment of upper airway noise in horses with LH.
Hardy, Teresa L D; Boliek, Carol A; Wells, Kristopher; Dearden, Carol; Zalmanowitz, Connie; Rieger, Jana M
2016-05-01
The purpose of this study was to describe the pretreatment acoustic characteristics of individuals with male-to-female gender identity (IMtFGI) and investigate the ability of the acoustic measures to predict ratings of gender, femininity, and vocal naturalness. This retrospective descriptive study included 2 groups of participants. Speakers were IMtFGI who had not previously received communication feminization treatment (N = 25). Listeners were members of the lay community (N = 30). Acoustic data were retrospectively obtained from pretreatment recordings, and pretreatment recordings also served as stimuli for 3 perceptual rating tasks (completed by listeners). Acoustic data generally were within normal limits for male speakers. All but 2 speakers were perceived to be male, limiting information about the relationship between acoustic measures and gender perception. Fundamental frequency (reading) significantly predicted femininity ratings (p = .000). A total of 3 stepwise regression models indicated that minimum frequency (range task), second vowel formant (sustained vowel), and shimmer percentage (sustained vowel) together significantly predicted naturalness ratings (p = .005, p = .003, and p = .002, respectively). Study aims were achieved with the exception of acoustic predictors of gender perception, which could be described for only 2 speakers. Future research should investigate measures of prosody, voice quality, and other aspects of communication as predictors of gender, femininity, and naturalness.
Koda, Hiroki; Tokuda, Isao T; Wakita, Masumi; Ito, Tsuyoshi; Nishimura, Takeshi
2015-06-01
Whistle-like high-pitched "phee" calls are often used as long-distance vocal advertisements by small-bodied marmosets and tamarins in the dense forests of South America. While the source-filter theory proposes that vibration of the vocal fold is modified independently from the resonance of the supralaryngeal vocal tract (SVT) in human speech, a source-filter coupling that constrains the vibration frequency to SVT resonance effectively produces loud tonal sounds in some musical instruments. Here, a combined approach of acoustic analyses and simulation with helium-modulated voices was used to show that phee calls are produced principally with the same mechanism as in human speech. The animal keeps the fundamental frequency (f0) close to the first formant (F1) of the SVT, to amplify f0. Although f0 and F1 are primarily independent, the degree of their tuning can be strengthened further by a flexible source-filter interaction, the variable strength of which depends upon the cross-sectional area of the laryngeal cavity. The results highlight the evolutionary antiquity and universality of the source-filter model in primates, but the study can also explore the diversification of vocal physiology, including source-filter interaction and its anatomical basis in non-human primates.
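The amplification described falls out of the standard one-formant (two-pole resonator) transfer function: harmonic energy at f0 is boosted increasingly as f0 approaches F1. A sketch with assumed illustrative values for F1 and its bandwidth, not measurements from the study:

```python
import numpy as np

def formant_gain(f, fc, bw):
    """Magnitude response at f (Hz) of one formant, modelled as a
    continuous-time two-pole resonator centred at fc with bandwidth bw,
    normalised to unity gain at DC."""
    s = 2j * np.pi * f
    p = -np.pi * bw + 2j * np.pi * fc          # pole; conjugate pole implied
    return abs(p) ** 2 / (abs(s - p) * abs(s - np.conj(p)))

F1, BW1 = 7000.0, 500.0                        # assumed illustrative values
for f0 in (5000.0, 6500.0, 7000.0):
    print(f0, round(20 * np.log10(formant_gain(f0, F1, BW1)), 1), "dB")
# gain climbs steeply (here ~6 -> ~16 -> ~23 dB) as f0 approaches F1,
# which is the acoustic payoff of keeping f0 tuned near the resonance
```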
The early phase of /ɹ/ production development in adult Japanese learners of English.
Saito, Kazuya; Munro, Murray J
2014-12-01
Although previous research indicates that Japanese speakers' second language (L2) perception and production of English /ɹ/ may improve with increased L2 experience, relatively little is known about the fine phonetic details of their /ɹ/ productions, especially during the early phase of L2 speech learning. This cross-sectional study examined acoustic properties of word-initial /ɹ/ from 60 Japanese learners with a length of residence (LOR) of between one month and one year in Canada. Their performance was compared to that of 15 native speakers of English and 15 low-proficiency Japanese learners of English. Formant frequencies (F2 and F3) and F1 transition durations were evaluated under three task conditions: word reading, sentence reading, and timed picture description. Learners with as little as two to three months of residence demonstrated target-like F2 frequencies. In addition, increased LOR was predictive of more target-like transition durations. Although the learners showed some improvement in F3 as a function of LOR, they did so mainly at a controlled level of speech production. The findings suggest that during the early phase of L2 segmental development, production accuracy is task-dependent and is influenced by the availability of L1 phonetic cues for redeployment in L2.
Acoustic correlates of Japanese expressions associated with voice quality of male adults
NASA Astrophysics Data System (ADS)
Kido, Hiroshi; Kasuya, Hideki
2004-05-01
Japanese expressions associated with the voice quality of male adults were extracted through a series of questionnaire surveys and statistical multivariate analysis. One hundred and thirty-seven Japanese expressions were collected through the first questionnaire and careful investigation of well-established Japanese dictionaries and articles. From the second questionnaire, addressed to 249 subjects and concerning familiarity with each of the expressions and their synonymity, 25 expressions were extracted. The third questionnaire asked subjects to evaluate their own voice quality. By applying a statistical clustering method and a correlation analysis to the results of the questionnaires, eight bipolar expressions and one unipolar expression were obtained: high-pitched/low-pitched, masculine/feminine, hoarse/clear, calm/excited, powerful/weak, youthful/elderly, thick/thin, tense/lax, and nasal, respectively. Acoustic correlates of each of the eight bipolar expressions were extracted by means of perceptual evaluation experiments conducted with sentence utterances of 36 males and by a statistical decision tree method. They included the average fundamental frequency (F0) of the utterance, speaking rate, spectral tilt, a formant frequency parameter, the standard deviation of F0 values, and glottal noise; the SPL of each of the stimuli was kept identical in the perceptual experiments.
Perceptual invariance of coarticulated vowels over variations in speaking rate.
Stack, Janet W; Strange, Winifred; Jenkins, James J; Clarke, William D; Trent, Sonja A
2006-04-01
This study examined the perception and acoustics of a large corpus of vowels spoken in consonant-vowel-consonant syllables produced in citation-form (lists) and spoken in sentences at normal and rapid rates by a female adult. Listeners correctly categorized the speaking rate of sentence materials as normal or rapid (2% errors) but did not accurately classify the speaking rate of the syllables when they were excised from the sentences (25% errors). In contrast, listeners accurately identified the vowels produced in sentences spoken at both rates when presented the sentences and when presented the excised syllables blocked by speaking rate or randomized. Acoustical analysis showed that formant frequencies at syllable midpoint for vowels in sentence materials showed "target undershoot" relative to citation-form values, but little change over speech rate. Syllable durations varied systematically with vowel identity, speaking rate, and voicing of final consonant. Vowel-inherent-spectral-change was invariant in direction of change over rate and context for most vowels. The temporal location of maximum F1 frequency further differentiated spectrally adjacent lax and tense vowels. It was concluded that listeners were able to utilize these rate- and context-independent dynamic spectrotemporal parameters to identify coarticulated vowels, even when sentential information about speaking rate was not available.
Gelfer, Marylou Pausewang; Tice, Ruthanne M
2013-05-01
The present study examined how effectively listeners' perceptions of gender could be changed from male to female for male-to-female (MTF) transgender (TG) clients based on the voice signal alone, immediately after voice therapy and at long-term follow-up. Short- and long-term changes in masculinity and femininity ratings and acoustic measures of speaking fundamental frequency (SFF) and vowel formant frequencies were also investigated. Prospective treatment study. Five MTF TG clients, five control female speakers, and five control male speakers provided a variety of speech samples for later analysis. The TG clients then underwent 8 weeks of voice therapy. Voice samples were collected immediately at the termination of therapy and again 15 months later. Two groups of listeners were recruited to evaluate gender and provide masculinity and femininity ratings. Perceptual results revealed that TG subjects were perceived as female 1.9% of the time in the pretest, 50.8% of the time in the immediate posttest, and 33.1% of the time in the long-term posttest. The TG speakers were also perceived as significantly less masculine and more feminine in the immediate posttest and the long-term posttest compared with the pretest. Some acoustic measures showed significant differences between the pretest and the immediate posttest and long-term posttest. It appeared that 8 weeks of voice therapy could result in vocal changes in MTF TG individuals that persist at least partially for up to 15 months. However, some TG subjects were more successful with voice feminization than others. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Effects of HearFones on speaking and singing voice quality.
Laukkanen, Anne-Maria; Mickelson, Nils Peter; Laitala, Marja; Syrjä, Tiina; Salo, Arla; Sihvo, Marketta
2004-12-01
HearFones (HF) have been designed to enhance auditory feedback during phonation. This study investigated the effects of HF (1) on the sound perceivable by the subject, (2) on voice quality in reading and singing, and (3) on voice production in speech and singing at the same pitch and sound level. Test 1: Text reading was recorded with two identical microphones in the ears of a subject. One ear was covered with HF, and the other was free. Four subjects attended this test. Tests 2 and 3: A reading sample was recorded from 13 subjects and a song from 12 subjects, without and with HF on. Test 4: Six females repeated [pa:p:a] in speaking and singing modes, without and with HF, at the same pitch and sound level. Long-term average spectra were computed (Tests 1-3), and formant frequencies, fundamental frequency, and sound level were measured (Tests 2 and 3). Subglottic pressure was estimated from oral pressure in [p], and electroglottography (EGG) was registered simultaneously during voicing on [a:] (Test 4). Voice quality in speech and singing was evaluated by three professional voice trainers (Tests 2-4). HF seemed to enhance the sound perceivable over the whole range studied (0-8 kHz), with the greatest enhancement (up to ca. 25 dB) at 1-3 kHz and 4-7 kHz. The subjects tended to decrease loudness with HF (when sound level was not being monitored). In more than half of the cases, voice quality was evaluated as "less strained" and "better controlled" with HF. When pitch and loudness were constant, no clear differences were heard, but the closed quotient of the EGG signal was higher and the signal more skewed, suggesting a better glottal closure and/or diminished activity of the thyroarytenoid muscle.
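Long-term average spectra of this kind are commonly computed by averaging the power spectrum over many overlapping frames, e.g., with Welch's method. A minimal sketch in which the frame length and overlap are illustrative assumptions, not the study's settings:

```python
import numpy as np
from scipy.signal import welch

def ltas_db(signal, fs, nperseg=4096):
    """Long-term average spectrum as a Welch-averaged PSD in dB.
    Frame length and 50% overlap are illustrative choices."""
    freqs, psd = welch(signal, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    return freqs, 10.0 * np.log10(psd)

# Subtracting the bare-ear LTAS from the HF-ear LTAS (same recording setup)
# would display enhancement regions such as those around 1-3 kHz and 4-7 kHz.
```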
Cross-Channel Amplitude Sweeps Are Crucial to Speech Intelligibility
ERIC Educational Resources Information Center
Prendergast, Garreth; Green, Gary G. R.
2012-01-01
Classical views of speech perception argue that the static and dynamic characteristics of spectral energy peaks (formants) are the acoustic features that underpin phoneme recognition. Here we use representations where the amplitude modulations of sub-band filtered speech are described, precisely, in terms of co-sinusoidal pulses. These pulses are…
The Human Voice in Speech and Singing
NASA Astrophysics Data System (ADS)
Lindblom, Björn; Sundberg, Johan
This chapter describes various aspects of the human voice as a means of communication in speech and singing. From the point of view of function, vocal sounds can be regarded as the end result of a three-stage process: (1) the compression of air in the respiratory system, which produces an exhalatory airstream, (2) the vibrating vocal folds' transformation of this airstream into an intermittent or pulsating airstream, which is a complex tone, referred to as the voice source, and (3) the filtering of this complex tone in the vocal tract resonator. The main function of the respiratory system is to generate an overpressure of air under the glottis, or a subglottal pressure. Section 16.1 describes different aspects of the respiratory system of significance to speech and singing, including lung volume ranges, subglottal pressures, and how this pressure is affected by the ever-varying recoil forces. The complex tone generated when the airstream from the lungs passes the vibrating vocal folds can be varied in at least three dimensions: fundamental frequency, amplitude and spectrum. Section 16.2 describes how these properties of the voice source are affected by the subglottal pressure, the length and stiffness of the vocal folds, and how firmly the vocal folds are adducted. Section 16.3 gives an account of the vocal tract filter and how its form determines the frequencies of its resonances, and Sect. 16.4 gives an account of how these resonance frequencies, or formants, shape the vocal sounds by imposing spectrum peaks separated by spectrum valleys, and how the frequencies of these peaks determine vowel and voice qualities. The remaining sections of the chapter describe various aspects of the acoustic signals used for vocal communication in speech and singing. The syllable structure is discussed in Sect. 16.5, the closely related aspects of rhythmicity and timing in speech and singing are described in Sect. 16.6, and pitch and rhythm aspects in Sect. 16.7. The impressive control of all these acoustic characteristics of vocal signals is discussed in Sect. 16.8, while Sect. 16.9 considers expressive aspects of vocal communication.
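The vocal tract filter of stage (3) is often introduced via the uniform-tube approximation: a tube of length L, closed at the glottis and open at the lips, resonates at odd quarter-wavelength frequencies, F_n = (2n - 1)c / 4L. A worked sketch of that textbook first approximation (not a claim about the chapter's own derivations):

```python
def uniform_tube_formants(length_m, n_formants=4, c=350.0):
    """Resonances of a uniform tube closed at one end (glottis) and open
    at the other (lips): F_n = (2n - 1) * c / (4 * L), the textbook first
    approximation to vowel formants (c = speed of sound in warm, moist air)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

print(uniform_tube_formants(0.175))  # ~[500, 1500, 2500, 3500] Hz for a 17.5 cm tract
```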
The Effects of Surgical Rapid Maxillary Expansion (SRME) on Vowel Formants
ERIC Educational Resources Information Center
Sari, Emel; Kilic, Mehmet Akif
2009-01-01
The objective of this study was to investigate the effect of surgical rapid maxillary expansion (SRME) on vowel production. The subjects included 12 patients, whose speech was considered perceptually normal, who had undergone surgical RME for expansion of a narrow maxilla. They uttered the following Turkish vowels: [a], [ɛ]…
ERIC Educational Resources Information Center
Klein, Harriet B.; Grigos, Maria I.; Byun, Tara McAllister; Davidson, Lisa
2012-01-01
This study examined inexperienced listeners' perceptions of children's naturally produced /r/ sounds with reference to levels of accuracy determined by consensus between two expert clinicians. Participants rated /r/ sounds as fully correct, distorted or incorrect/non-rhotic. Second and third formant heights were measured to explore the…
ERIC Educational Resources Information Center
Roy, Nelson; Nissen, Shawn L.; Dromey, Christopher; Sapir, Shimon
2009-01-01
In a preliminary study, we documented significant changes in formant transitions associated with successful manual circumlaryngeal treatment (MCT) of muscle tension dysphonia (MTD), suggesting improvement in speech articulation. The present study explores further the effects of MTD on vowel articulation by means of additional vowel acoustic…
Pulse register phonation in Diana monkey alarm calls
NASA Astrophysics Data System (ADS)
Riede, Tobias; Zuberbühler, Klaus
2003-05-01
Adult male Diana monkeys (Cercopithecus diana) produce predator-specific alarm calls in response to two of their predators, crowned eagles and leopards. The acoustic structure of these alarm calls is remarkable for a number of theoretical and empirical reasons. First, although pulsed phonation has been described in a variety of mammalian vocalizations, very little is known about the underlying production mechanism. Second, Diana monkey alarm calls are based almost exclusively on this vocal production mechanism, to an extent that has never been documented in mammalian vocal behavior. Finally, the Diana monkeys' pulsed phonation strongly resembles the pulse register in human speech, where fundamental frequency is mainly controlled by subglottal pressure. Here, we report the results of a detailed acoustic analysis to investigate the production mechanism of Diana monkey alarm calls. Within calls, we found a positive correlation between the fundamental frequency and the pulse amplitude, suggesting that both humans and monkeys control fundamental frequency by subglottal pressure. While in humans pulsed phonation is usually considered pathological or artificial, male Diana monkeys rely exclusively on pulsed phonation, suggesting a functional adaptation. Moreover, we were unable to document any nonlinear phenomena, despite the fact that they occur frequently in the vocal repertoires of humans and nonhumans, further suggesting that the very robust Diana monkey pulse production mechanism has evolved for a particular functional purpose. We discuss the implications of these findings for the structural evolution of Diana monkey alarm calls and suggest that the restricted variability in fundamental frequency and the robustness of the source signal gave rise to the formant patterns observed in Diana monkey alarm calls, used to convey predator information.
Mishima, Katsuaki; Moritani, Norifumi; Nakano, Hiroyuki; Matsushita, Asuka; Iida, Seiji; Ueyama, Yoshiya
2013-12-01
The purpose of this study was to explore the voice characteristics of patients with mandibular prognathism, and to investigate the effects of mandibular setback surgery on these characteristics using nonlinear dynamics and conventional acoustic analyses. Sixteen patients (8 males and 8 females) who had skeletal Class III malocclusion without cleft palate, and who underwent a bilateral sagittal split ramus osteotomy (BSSRO), were enrolled. As controls, 50 healthy adults (25 males and 25 females) were enrolled. The mean first Lyapunov exponents (mLE1s), computed for each one-second interval, and the fundamental frequency (F0) and frequencies of the first and second formants (F1, F2) were calculated for each Japanese vowel. The mLE1s for /u/ in males and /o/ in females, and the F2s for /i/ and /u/ in males, changed significantly after BSSRO. Class III voice characteristics were observed in the mLE1s for /i/ in both males and females, in the F0 for /a/, /i/, /u/ and /o/ in females, in the F1 and F2 for /a/ in males, and in the F1 for /u/ and the F2 for /i/ in females. Most of these characteristics were preserved after BSSRO. Copyright © 2013 European Association for Cranio-Maxillo-Facial Surgery. Published by Elsevier Ltd. All rights reserved.
Perception of speech in noise: neural correlates.
Song, Judy H; Skoe, Erika; Banai, Karen; Kraus, Nina
2011-09-01
The presence of irrelevant auditory information (other talkers, environmental noises) presents a major challenge to listening to speech. The fundamental frequency (F(0)) of the target speaker is thought to provide an important cue for the extraction of the speaker's voice from background noise, but little is known about the relationship between speech-in-noise (SIN) perceptual ability and neural encoding of the F(0). Motivated by recent findings that music and language experience enhance brainstem representation of sound, we examined the hypothesis that brainstem encoding of the F(0) is diminished to a greater degree by background noise in people with poorer perceptual abilities in noise. To this end, we measured speech-evoked auditory brainstem responses to /da/ in quiet and two multitalker babble conditions (two-talker and six-talker) in native English-speaking young adults who ranged in their ability to perceive and recall SIN. Listeners who were poorer performers on a standardized SIN measure demonstrated greater susceptibility to the degradative effects of noise on the neural encoding of the F(0). Particularly diminished was their phase-locked activity to the fundamental frequency in the portion of the syllable known to be most vulnerable to perceptual disruption (i.e., the formant transition period). Our findings suggest that the subcortical representation of the F(0) in noise contributes to the perception of speech in noisy conditions.
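Phase-locked encoding of F(0) in such brainstem responses is commonly quantified as the spectral magnitude at the fundamental over a chosen analysis window, e.g., the formant transition period. A sketch of that general approach; the window bounds and the 5 Hz analysis half-width are assumptions, not this study's exact parameters:

```python
import numpy as np

def f0_band_magnitude(response, fs, f0, t_start, t_end, halfwidth=5.0):
    """Mean FFT magnitude in a narrow band around f0, over the analysis
    window [t_start, t_end] (seconds) of an averaged brainstem response.

    Window bounds (e.g., the formant transition period) and the 5 Hz
    half-width are illustrative choices."""
    seg = response[int(t_start * fs):int(t_end * fs)]
    spec = np.abs(np.fft.rfft(seg * np.hanning(len(seg)))) / len(seg)
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / fs)
    band = (freqs >= f0 - halfwidth) & (freqs <= f0 + halfwidth)
    return spec[band].mean()

# Comparing, e.g., f0_band_magnitude(resp_quiet, fs, 100, 0.02, 0.06) against
# f0_band_magnitude(resp_babble, fs, 100, 0.02, 0.06) indexes how much the
# added babble degrades the neural representation of the fundamental.
```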
Acoustic and phonatory characterization of the Fado voice.
Mendes, Ana P; Rodrigues, Aira F; Guerreiro, David Michael
2013-09-01
Fado is a Portuguese musical genre, instrumentally accompanied by a Portuguese guitar and an acoustic guitar. Fado singers' voices are perceptually characterized by a low pitch and a hoarse, strained quality. The present research study sketches the acoustic and phonatory profile of the Fado singer's voice. Fifteen Fado singers produced spoken and sung phonatory tasks. For the spoken voice measures, the maximum phonation time and s/z ratio of Fado singers were near the inefficient physiological threshold. Fundamental frequency was higher than that found in nonsingers and lower than that found in Western Classical singers. Jitter and shimmer mean values were higher compared with nonsingers. Harmonic-to-noise ratio (HNR) was similar to the mean values for nonsingers. For the sung voice, jitter was higher compared with Country, Musical Theater, Soul, Jazz, and Western Classical singers and lower than Pop singers. Shimmer mean values were lower than Country, Musical Theater, Pop, Soul, and Jazz singers and higher than Western Classical singers. HNR was similar to that of Western Classical singers. The maximum phonational frequency range of Fado singers indicated that male and female subjects had a lower range compared with Western Classical singers. Additionally, Fado singers produced vibrato, but the singer's formant was rarely produced. These sung voice characteristics could be related to lifestyle habits or to little or no singing training, or could simply be a characteristic of the Fado voice. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
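Jitter and shimmer are cycle-to-cycle perturbation measures. Given period durations and cycle peak amplitudes from a pitch-marking step (assumed here as input, since that step is the hard part), the common "local" variants reduce to short formulas; the cycle data below are placeholders, not values from this study:

```python
import numpy as np

def jitter_local(periods):
    """Local jitter (%): mean absolute difference between consecutive
    glottal periods, relative to the mean period."""
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / periods.mean()

def shimmer_local(amplitudes):
    """Local shimmer (%): mean absolute difference between consecutive
    cycle peak amplitudes, relative to the mean amplitude."""
    amps = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amps))) / amps.mean()

# Placeholder cycles (periods in seconds, linear peak amplitudes):
print(jitter_local([0.0080, 0.0081, 0.0079, 0.0082]))   # ~2.5 %
print(shimmer_local([0.91, 0.88, 0.93, 0.90]))          # ~3.7 %
```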
Examining Acoustic and Kinematic Measures of Articulatory Working Space: Effects of Speech Intensity
ERIC Educational Resources Information Center
Whitfield, Jason A.; Dromey, Christopher; Palmer, Panika
2018-01-01
Purpose: The purpose of this study was to examine the effect of speech intensity on acoustic and kinematic vowel space measures and conduct a preliminary examination of the relationship between kinematic and acoustic vowel space metrics calculated from continuously sampled lingual marker and formant traces. Method: Young adult speakers produced 3…
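Acoustic vowel space area of the kind examined above is often computed as the area of the convex hull around the (F2, F1) points of the measured vowels; a small scipy-based sketch (an assumption about the metric's form, not the authors' code):

    import numpy as np
    from scipy.spatial import ConvexHull

    def vowel_space_area(f1, f2):
        """Acoustic vowel space area (Hz^2) from arrays of F1 and F2 values."""
        pts = np.column_stack([f2, f1])
        return ConvexHull(pts).volume   # in 2-D, .volume is the enclosed area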
ERIC Educational Resources Information Center
Goswami, Usha; Fosker, Tim; Huss, Martina; Mead, Natasha; Szucs, Denes
2011-01-01
Across languages, children with developmental dyslexia have a specific difficulty with the neural representation of the sound structure (phonological structure) of speech. One likely cause of their difficulties with phonology is a perceptual difficulty in auditory temporal processing (Tallal, 1980). Tallal (1980) proposed that basic auditory…
ERIC Educational Resources Information Center
Seitz, Aaron R.; Protopapas, Athanassios; Tsushima, Yoshiaki; Vlahou, Eleni L.; Gori, Simone; Grossberg, Stephen; Watanabe, Takeo
2010-01-01
Learning a second language as an adult is particularly effortful when new phonetic representations must be formed. Therefore the processes that allow learning of speech sounds are of great theoretical and practical interest. Here we examined whether perception of single formant transitions, that is, sound components critical in speech perception,…
Speech production in experienced cochlear implant users undergoing short-term auditory deprivation
NASA Astrophysics Data System (ADS)
Greenman, Geoffrey; Tjaden, Kris; Kozak, Alexa T.
2005-09-01
This study examined the effect of short-term auditory deprivation on the speech production of five postlingually deafened women, all of whom were experienced cochlear implant users. Each cochlear implant user, as well as age- and gender-matched control speakers, produced CVC target words embedded in a reading passage. Speech samples for the deafened adults were collected on two separate occasions. First, the speakers were recorded after wearing their speech processor consistently for at least two to three hours prior to recording (implant "ON"). The second recording occurred when the speakers had their speech processors turned off for approximately ten to twelve hours prior to recording (implant "OFF"). Acoustic measures, including fundamental frequency (F0), the first (F1) and second (F2) formants of the vowels, vowel space area, vowel duration, spectral moments of the consonants, as well as utterance duration and sound pressure level (SPL) across the entire utterance, were analyzed in both speaking conditions. For each implant speaker, acoustic measures will be compared across the implant "ON" and implant "OFF" speaking conditions, and will also be compared to data obtained from normal-hearing speakers.
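The spectral moments mentioned above treat the magnitude spectrum of a consonant frame as a probability distribution; a hedged sketch using the standard definitions (the windowing choice is an assumption):

    import numpy as np

    def spectral_moments(frame, fs):
        """Centroid (Hz), variance (Hz^2), skewness, and excess kurtosis
        of one windowed consonant frame."""
        mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        p = mag / mag.sum()                       # spectrum as a distribution
        m1 = np.sum(freqs * p)                    # centroid
        m2 = np.sum((freqs - m1) ** 2 * p)        # variance
        m3 = np.sum((freqs - m1) ** 3 * p) / m2 ** 1.5
        m4 = np.sum((freqs - m1) ** 4 * p) / m2 ** 2 - 3
        return m1, m2, m3, m4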
Paradoxical vocal changes in a trained singer by focally cooling the right superior temporal gyrus
Katlowitz, Kalman A.; Oya, Hiroyuki; Howard, Matthew A.; Greenlee, Jeremy D.W.; Long, Michael A.
2017-01-01
The production and perception of music are preferentially mediated by cortical areas within the right hemisphere, but little is known about how these brain regions individually contribute to this process. In an experienced singer undergoing awake craniotomy, we demonstrated that direct electrical stimulation to a portion of the right posterior superior temporal gyrus (pSTG) selectively interrupted singing but not speaking. We then focally cooled this region to modulate its activity during vocalization. In contrast to similar manipulations in left hemisphere speech production regions, pSTG cooling did not elicit any changes in vocal timing or quality. However, this manipulation led to an increase in the pitch of speaking with no such change in singing. Further analysis revealed that all vocalizations exhibited a cooling-induced increase in the frequency of the first formant, raising the possibility that potential pitch offsets may have been actively avoided during singing. Our results suggest that the right pSTG plays a key role in vocal sensorimotor processing whose impact is dependent on the type of vocalization produced.
Frey, Roland; Gebler, Alban; Fritsch, Guido; Nygrén, Kaarlo; Weissengruber, Gerald E
2007-01-01
Laryngeal air sacs have evolved convergently in diverse mammalian lineages including insectivores, bats, rodents, pinnipeds, ungulates and primates, but their precise function has remained elusive. Among cervids, the vocal tract of reindeer has evolved an unpaired inflatable ventrorostral laryngeal air sac. This air sac is not present at birth but emerges during ontogenetic development. It protrudes from the laryngeal vestibulum via a short duct between the epiglottis and the thyroid cartilage. In the female the growth of the air sac stops at the age of 2–3 years, whereas in males it continues to grow up to the age of about 6 years, leading to a pronounced sexual dimorphism of the air sac. In adult females it is of moderate size (about 100 cm³), whereas in adult males it is large (3000–4000 cm³) and becomes asymmetric, extending either to the left or to the right side of the neck. In both adult females and males the ventral air sac walls touch the integument. In the adult male the air sac is laterally covered by the mandibular portion of the sternocephalic muscle and the skin. Both sexes of reindeer have a double stylohyoid muscle and a thyroepiglottic muscle. Possibly these muscles assist in inflation of the air sac. Head-and-neck specimens were subjected to macroscopic anatomical dissection, computed tomographic analysis and skeletonization. In addition, isolated larynges were studied for comparison. Acoustic recordings were made during an autumn round-up of semi-domestic reindeer in Finland and in a small zoo herd. Male reindeer adopt a specific posture when emitting their serial hoarse rutting calls. Head and neck are kept low and the throat region is extended. In the ventral neck region, roughly corresponding to the position of the large air sac, there is a mane of longer hairs. Neck swelling and mane spreading during vocalization may act as an optical signal to other males and females. The air sac, as a side branch of the vocal tract, can be considered as an additional acoustic filter. Individual acoustic recognition may have been the primary function in the evolution of a size-variable air sac, and this function is retained in mother–young communication. In males sexual selection seems to have favoured a considerable size increase of the air sac and a switch to call series instead of single calls. Vocalization became restricted to the rutting period, serving to attract females. We propose two possibilities for the acoustic function of the air sac in vocalization that do not exclude each other. The first assumes a coupling between air sac and the environment, resulting in an acoustic output that is a combination of the vocal tract resonance frequencies emitted via mouth and nostrils and the resonance frequencies of the air sac transmitted via the neck skin. The second assumes a weak coupling so that resonance frequencies of the air sac are lost to surrounding tissues by dissipation. In this case the resonance frequencies of the air sac solely influence the signal that is further filtered by the remaining vocal tract. According to our results one acoustic effect of the air sac in adult reindeer might be to mask formants of the vocal tract proper. In other cervid species, however, formants of rutting calls convey essential information on the quality of the sender, related to its potential reproductive success, to conspecifics. Further studies are required to resolve this inconsistency.
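To see why sac volume matters acoustically, the air sac can be idealized as a Helmholtz resonator with resonance f = (c/2π)·sqrt(A/(V·L)). The neck area and effective length below are placeholders (the paper does not report them), so the numbers only illustrate the order-of-magnitude downward shift from a ~100 cm³ female sac to a ~3500 cm³ male sac.

    import numpy as np

    def helmholtz_f(V_cm3, A_cm2=1.0, L_cm=1.0, c_cm_s=35000.0):
        """Helmholtz resonance (Hz); all geometric values are illustrative."""
        return (c_cm_s / (2 * np.pi)) * np.sqrt(A_cm2 / (V_cm3 * L_cm))

    print(helmholtz_f(100.0), helmholtz_f(3500.0))  # female vs male sac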
Vowels in infant-directed speech: More breathy and more variable, but not clearer.
Miyazawa, Kouki; Shinya, Takahito; Martin, Andrew; Kikuchi, Hideaki; Mazuka, Reiko
2017-09-01
Infant-directed speech (IDS) is known to differ from adult-directed speech (ADS) in a number of ways, and it has often been argued that some of these IDS properties facilitate infants' acquisition of language. An influential study in support of this view is Kuhl et al. (1997), which found that vowels in IDS are produced with expanded first and second formants (F1/F2) on average, indicating that the vowels are acoustically further apart in IDS than in ADS. These results have been interpreted to mean that the way vowels are produced in IDS makes infants' task of learning vowel categories easier. The present paper revisits this interpretation by means of a thorough analysis of IDS vowels using a large-scale corpus of natural Japanese utterances. We show that the expansion of F1/F2 values does occur in spontaneous IDS even when the vowels' prosodic position, lexical pitch accent, and lexical bias are accounted for. When IDS vowels are compared to carefully read speech (CS) by the same mothers, however, the larger variability among IDS vowel tokens means that, relative to ADS, the vowels are acoustically farther apart only in CS, not in IDS. Finally, we show that IDS vowels are significantly more breathy than ADS or CS vowels. Taken together, our results demonstrate that even though expansion of formant values occurs in spontaneous IDS, this expansion cannot be interpreted as an indication that the vowels are acoustically farther apart, as is the case in CS. Instead, we found that IDS vowels are characterized by breathy voice, which has been associated with the communication of emotional affect.
[Acoustic characteristics of adductor spasmodic dysphonia].
Yang, Yang; Wang, Li-Ping
2008-06-01
To explore the acoustic characteristics of adductor spasmodic dysphonia, the acoustic characteristics, including the acoustic signal of recorded voice, three-dimensional sonogram patterns, and subjective assessment of voice, were compared between 10 patients (7 women, 3 men) with adductor spasmodic dysphonia and 10 healthy volunteers (5 women, 5 men). The main clinical manifestations of adductor spasmodic dysphonia were disorders of sound quality, rhythm, and fluency: tension dysphonia when reading, acoustic jitter, momentary fluctuations of frequency and volume, voice squeezing, interruption, voice prolongation, and loss of normal chime. Among the 10 patients, there were 1 with mild dysphonia (abnormal syllables < 25%), 6 with moderate dysphonia (abnormal syllables 25%-49%), 1 with severe dysphonia (abnormal syllables 50%-74%), and 2 with extremely severe dysphonia (abnormal syllables ≥ 75%). The average reading time in the 10 patients was 49 s, with prolonged reading time and aphonic interruptions in the acoustic signals, whereas the average reading time in the healthy control group was 30 s, without voice interruption. The aphonia ratio averaged 42%. The symptom syllables of individual patients were demonstrated in the three-dimensional sonograms: voice onset time prolongation, and irregular, interrupted, or even absent vowel formants. The consonants of symptom syllables occasionally displayed absence or prolongation of the friction murmur in block-friction murmurs. The acoustic characteristics of adductor spasmodic dysphonia are disorders of sound quality, rhythm, and fluency; the three-dimensional sonograms of the symptom syllables show distinctive changes in the corresponding vowel or consonant phonemes.
Examining the Effects of Multiple Sclerosis on Speech Production: Does Phonetic Structure Matter?
ERIC Educational Resources Information Center
Rosen, Kristin M.; Goozee, Justine V.; Murdoch, Bruce E.
2008-01-01
The second formant (F2) is well-known to be important to intelligibility (e.g. [Delattre, P., Liberman, A., & Cooper, F. (1955). Acoustic loci and transitional cues for consonants. "Journal of the Acoustical Society of America, 27", 769-774]) and is affected by a variety of dysarthrias [Weismer, G., & Martin, R. (1992). Acoustic and perceptual…
ERIC Educational Resources Information Center
Tomita, Kaoru; Yamada, Jun; Takatsuka, Shigenobu
2010-01-01
This study investigated how Japanese-speaking learners of English pronounce the three point vowels /i/, /u/, and /a/ appearing in the first and second monosyllabic words of English noun phrases, and the schwa /ə/ appearing in English disyllabic words. First and second formant (F1 and F2) values were measured for four Japanese…
The Acoustic Characteristics of Diphthongs in Indian English
ERIC Educational Resources Information Center
Maxwell, Olga; Fletcher, Janet
2010-01-01
This paper presents the results of an acoustic analysis of English diphthongs produced by three L1 speakers of Hindi and four L1 speakers of Punjabi. Formant trajectories of rising and falling diphthongs (i.e., vowels where there is a clear rising or falling trajectory through the F1/F2 vowel space) were analysed in a corpus of citation-form…
ERIC Educational Resources Information Center
Recasens, Daniel
2015-01-01
Purpose: The goal of this study was to ascertain the effect of changes in stress and speech rate on vowel coarticulation in vowel-consonant-vowel sequences. Method: Data on second formant coarticulatory effects as a function of changing /i/ versus /a/ were collected for five Catalan speakers' productions of vowel-consonant-vowel sequences with the…
Individual Sensitivity to Spectral and Temporal Cues in Listeners With Hearing Impairment
Souza, Pamela E.; Wright, Richard A.; Blackburn, Michael C.; Tatman, Rachael; Gallun, Frederick J.
2015-01-01
Purpose The present study was designed to evaluate use of spectral and temporal cues under conditions in which both types of cues were available. Method Participants included adults with normal hearing and hearing loss. We focused on 3 categories of speech cues: static spectral (spectral shape), dynamic spectral (formant change), and temporal (amplitude envelope). Spectral and/or temporal dimensions of synthetic speech were systematically manipulated along a continuum, and recognition was measured using the manipulated stimuli. Level was controlled to ensure cue audibility. Discriminant function analysis was used to determine to what degree spectral and temporal information contributed to the identification of each stimulus. Results Listeners with normal hearing were influenced to a greater extent by spectral cues for all stimuli. Listeners with hearing impairment generally utilized spectral cues when the information was static (spectral shape) but used temporal cues when the information was dynamic (formant transition). The relative use of spectral and temporal dimensions varied among individuals, especially among listeners with hearing loss. Conclusion Information about spectral and temporal cue use may aid in identifying listeners who rely to a greater extent on particular acoustic cues and applying that information toward therapeutic interventions.
Fu, Q Y; Liang, Y; Zou, A; Wang, T; Zhao, X D; Wan, J
2016-04-07
To investigate the relationships between electrophysiological characteristics of the speech-evoked auditory brainstem response (s-ABR) and the Mandarin phonetically balanced maximum (PBmax) in listeners with different types of hearing impairment, so as to provide clues to the mechanisms underlying speech perception. Forty-one ears of 41 normal-hearing adults (NH), thirty ears of 30 patients with conductive hearing loss (CHL), and twenty-seven ears of 27 patients with sensorineural hearing loss (SNHL) were included in the present study. Speech discrimination scores were obtained with Mandarin phonemically balanced monosyllable lists via speech audiometric software. s-ABRs were recorded with the speech syllable /da/ presented at the intensity corresponding to PBmax. The electrophysiological characteristics of the s-ABR, as well as the relationships between PBmax and s-ABR parameters, including latency in the time domain and the fundamental frequency (F0) and first formant (F1) in the frequency domain, were analyzed statistically. All subjects completed the speech perception tests; the PBmax values of the CHL and SNHL groups did not differ significantly (P>0.05), but both were significantly lower than that of the NH group (P<0.05). When the subjects were divided into three groups by 90%
Mochida, Takemi; Gomi, Hiroaki; Kashino, Makio
2010-11-08
There has been plentiful evidence of kinesthetically induced rapid compensation for unanticipated perturbation in speech articulatory movements. However, the role of auditory information in stabilizing articulation has been little studied except for the control of voice fundamental frequency, voice amplitude and vowel formant frequencies. Although the influence of auditory information on the articulatory control process is evident in unintended speech errors caused by delayed auditory feedback, the direct and immediate effect of auditory alteration on the movements of articulators has not been clarified. This work examined whether temporal changes in the auditory feedback of bilabial plosives immediately affects the subsequent lip movement. We conducted experiments with an auditory feedback alteration system that enabled us to replace or block speech sounds in real time. Participants were asked to produce the syllable /pa/ repeatedly at a constant rate. During the repetition, normal auditory feedback was interrupted, and one of three pre-recorded syllables /pa/, /Φa/, or /pi/, spoken by the same participant, was presented once at a different timing from the anticipated production onset, while no feedback was presented for subsequent repetitions. Comparisons of the labial distance trajectories under altered and normal feedback conditions indicated that the movement quickened during the short period immediately after the alteration onset, when /pa/ was presented 50 ms before the expected timing. Such change was not significant under other feedback conditions we tested. The earlier articulation rapidly induced by the progressive auditory input suggests that a compensatory mechanism helps to maintain a constant speech rate by detecting errors between the internally predicted and actually provided auditory information associated with self movement. The timing- and context-dependent effects of feedback alteration suggest that the sensory error detection works in a temporally asymmetric window where acoustic features of the syllable to be produced may be coded.
Elie, Julie E.; Theunissen, Frédéric E.
2018-01-01
Although a universal code for the acoustic features of animal vocal communication calls may not exist, the thorough analysis of the distinctive acoustical features of vocalization categories is important not only to decipher the acoustical code for a specific species but also to understand the evolution of communication signals and the mechanisms used to produce and understand them. Here, we recorded more than 8,000 examples of almost all the vocalizations of the domesticated zebra finch, Taeniopygia guttata: vocalizations produced to establish contact, to form and maintain pair bonds, to sound an alarm, to communicate distress or to advertise hunger or aggressive intents. We characterized each vocalization type using complete representations that avoided any a priori assumptions on the acoustic code, as well as classical bioacoustics measures that could provide more intuitive interpretations. We then used these acoustical features to rigorously determine the potential information-bearing acoustical features for each vocalization type using both a novel regularized classifier and an unsupervised clustering algorithm. Vocalization categories are discriminated by the shape of their frequency spectrum and by their pitch saliency (noisy to tonal vocalizations) but not particularly by their fundamental frequency. Notably, the spectral shape of zebra finch vocalizations contains peaks or formants that vary systematically across categories and that would be generated by active control of both the vocal organ (source) and the upper vocal tract (filter).
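The study's own regularized classifier is not reproduced here; as a generic stand-in, an L2-regularized logistic regression over a hypothetical bioacoustic feature matrix shows the shape of such an analysis (file names and features below are assumptions):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X = np.load("call_features.npy")   # rows = calls, cols = acoustic measures
    y = np.load("call_types.npy")      # vocalization category per call
    clf = make_pipeline(StandardScaler(),
                        LogisticRegression(penalty="l2", C=0.1, max_iter=1000))
    print(cross_val_score(clf, X, y, cv=5))  # category discriminability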
Speech intelligibility in opera singing.
Novák, A; Vokrál, J
2000-01-01
The authors investigated speech intelligibility in opera singing. They analysed several arias sung by soprano voices from the Czech opera "Rusalka" (The Water Nymph), from Puccini's opera "La Boheme", and several parts of arias from "Il barbiere di Siviglia" sung by baritone, tenor, bass, and soprano. The sonographic pictures of the selected arias were compared with a subjective evaluation; the difference between the two authors' evaluations was about 70%. The authors' opinion is that the singer's formant is not the only factor affecting the speech intelligibility of opera singing. An important role is played by the ability to change the shape of the vocal tract and by the capacity for rapid and exact articulatory movements. This ability shapes the transients, which are important in normal speech and also in singing.
The dispersion-focalization theory of sound systems
NASA Astrophysics Data System (ADS)
Schwartz, Jean-Luc; Abry, Christian; Boë, Louis-Jean; Vallée, Nathalie; Ménard, Lucie
2005-04-01
The Dispersion-Focalization Theory states that sound systems in human languages are shaped by two major perceptual constraints: dispersion, driving auditory contrast towards maximal or sufficient values [B. Lindblom, J. Phonetics 18, 135-152 (1990)], and focalization, driving auditory spectra towards patterns with close neighboring formants. Dispersion is computed from the sum of the inverse squared inter-spectra distances in the (F1, F2, F3, F4) space, using a non-linear process based on the 3.5 Bark critical distance to estimate F2'. Focalization is based on the idea that close neighboring formants produce vowel spectra with marked peaks, easier to process and memorize in the auditory system. Evidence for increased stability of focal vowels in short-term memory was provided in a discrimination experiment on adult French subjects [J. L. Schwartz and P. Escudier, Speech Comm. 8, 235-259 (1989)]. A reanalysis of infant discrimination data shows that focalization could well be responsible for recurrent discrimination asymmetries [J. L. Schwartz et al., Speech Comm. (in press)]. Recent data on children's vowel production indicate that focalization seems to be part of the perceptual templates driving speech development. The Dispersion-Focalization Theory produces valid predictions for both vowel and consonant systems, in relation to available databases of human language inventories.
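The dispersion term can be written down directly from the description above as the sum of inverse squared inter-vowel distances; this sketch uses plain Euclidean distances and omits the Bark transform and the F2' estimation step:

    import numpy as np

    def dispersion_energy(vowels):
        """Sum of inverse squared distances between all vowel pairs;
        rows of `vowels` are points in a perceptual formant space."""
        vowels = np.asarray(vowels, float)
        e = 0.0
        for i in range(len(vowels)):
            for j in range(i + 1, len(vowels)):
                e += 1.0 / np.sum((vowels[i] - vowels[j]) ** 2)
        return e   # lower energy = better-dispersed system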
Carey, Daniel; Miquel, Marc E.; Evans, Bronwen G.; Adank, Patti; McGettigan, Carolyn
2017-01-01
Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question by using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels, compared with native vowels; further, ST for nonnative vowels activated somatomotor cortex bilaterally, compared with ST of native vowels. Using test representational similarity analysis (RSA) models constructed from participants' vocal tract images and from stimulus formant distances, we found that RSA searchlight analyses of fMRI data showed either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters, during prearticulatory ST.
Panchapagesan, Sankaran; Alwan, Abeer
2011-01-01
In this paper, a quantitative study of acoustic-to-articulatory inversion for vowel speech sounds by analysis-by-synthesis using the Maeda articulatory model is performed. For chain matrix calculation of vocal tract (VT) acoustics, the chain matrix derivatives with respect to area function are calculated and used in a quasi-Newton method for optimizing articulatory trajectories. The cost function includes a distance measure between natural and synthesized first three formants, and parameter regularization and continuity terms. Calibration of the Maeda model to two speakers, one male and one female, from the University of Wisconsin x-ray microbeam (XRMB) database, using a cost function, is discussed. Model adaptation includes scaling the overall VT and the pharyngeal region and modifying the outer VT outline using measured palate and pharyngeal traces. The inversion optimization is initialized by a fast search of an articulatory codebook, which was pruned using XRMB data to improve inversion results. Good agreement between estimated midsagittal VT outlines and measured XRMB tongue pellet positions was achieved for several vowels and diphthongs for the male speaker, with average pellet-VT outline distances around 0.15 cm, smooth articulatory trajectories, and less than 1% average error in the first three formants.
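A schematic version of the cost function described above, with `synth_formants` standing in for the Maeda-model chain-matrix synthesis and with arbitrary placeholder weights; the quasi-Newton step mirrors the paper's optimization strategy only loosely:

    import numpy as np
    from scipy.optimize import minimize

    def inversion_cost(flat_traj, f_nat, synth_formants, lam=1e-3, mu=1e-2):
        """Formant-match + regularization + continuity cost over a trajectory
        of articulatory parameter frames (T frames x P parameters)."""
        traj = flat_traj.reshape(f_nat.shape[0], -1)
        f_syn = np.array([synth_formants(p) for p in traj])
        match = np.sum(((f_syn - f_nat) / f_nat) ** 2)   # relative formant error
        reg = lam * np.sum(traj ** 2)                    # parameter regularization
        cont = mu * np.sum(np.diff(traj, axis=0) ** 2)   # smooth trajectories
        return match + reg + cont

    # res = minimize(inversion_cost, traj0.ravel(),
    #                args=(f_nat, synth_formants), method="L-BFGS-B")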
Two-dimensional vocal tracts with three-dimensional behavior in the numerical generation of vowels.
Arnela, Marc; Guasch, Oriol
2014-01-01
Two-dimensional (2D) numerical simulations of vocal tract acoustics may provide a good balance between the high quality of three-dimensional (3D) finite element approaches and the low computational cost of one-dimensional (1D) techniques. However, 2D models are usually generated by considering the 2D vocal tract as a midsagittal cut of a 3D version, i.e., using the same radius function, wall impedance, glottal flow, and radiation losses as in 3D, which leads to strong discrepancies in the resulting vocal tract transfer functions. In this work, a four step methodology is proposed to match the behavior of 2D simulations with that of 3D vocal tracts with circular cross-sections. First, the 2D vocal tract profile becomes modified to tune the formant locations. Second, the 2D wall impedance is adjusted to fit the formant bandwidths. Third, the 2D glottal flow gets scaled to recover 3D pressure levels. Fourth and last, the 2D radiation model is tuned to match the 3D model following an optimization process. The procedure is tested for vowels /a/, /i/, and /u/ and the obtained results are compared with those of a full 3D simulation, a conventional 2D approach, and a 1D chain matrix model.
A theoretical study of F0-F1 interaction with application to resonant speaking and singing voice.
Titze, Ingo R
2004-09-01
An interactive source-filter system, consisting of a three-mass body-cover model of the vocal folds and a wave reflection model of the vocal tract, was used to test the dependence of vocal fold vibration on the vocal tract. The degree of interaction is governed by the epilarynx tube, which raises the vocal tract impedance to match the impedance of the glottis. The key component of the impedance is inertive reactance. Whenever there is inertive reactance, the vocal tract assists the vocal folds in vibration. The amplitude of vibration and the glottal flow can more than double, and the oral radiated power can increase up to 10 dB. As F0 approaches F1, the first formant frequency, the interactive source-filter system loses its advantage (because inertive reactance changes to compliant reactance) and the noninteractive system produces greater vocal output. Thus, from a voice training and control standpoint, there may be reasons to operate the system in either interactive or noninteractive modes. The harmonics 2F0 and 3F0 can also benefit from being positioned slightly below F1.
Beautemps, D; Badin, P; Bailly, G
2001-05-01
The following contribution addresses several issues concerning speech degrees of freedom in French oral vowels and stop and fricative consonants, based on an analysis of tongue and lip shapes extracted from cineradio- and labio-films. The midsagittal tongue shapes were submitted to a linear decomposition in which some of the loading factors, such as jaw and larynx position, were selected directly, while four other components were derived from principal component analysis (PCA). For the lips, in addition to the more traditional protrusion and opening components, a supplementary component was extracted to explain the upward movement of both the upper and lower lips in [v] production. A linear articulatory model was developed; the six tongue degrees of freedom were used as the articulatory control parameters of the midsagittal tongue contours and explained 96% of the tongue data variance. These control parameters were also used to specify the frontal lip width dimension derived from the labio-film front views. Finally, this model was complemented by a conversion model going from the midsagittal plane to the area function, based on a fitting of the midsagittal distances and the formant frequencies for both vowels and consonants.
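A minimal sketch of the PCA step described above, assuming a hypothetical file of flattened midsagittal tongue-contour coordinates; the jaw and larynx loading factors that the study selected directly are omitted here:

    import numpy as np
    from sklearn.decomposition import PCA

    contours = np.load("tongue_contours.npy")     # rows = frames, cols = (x, y) points
    pca = PCA(n_components=4)                     # four data-derived components
    scores = pca.fit_transform(contours)
    print(pca.explained_variance_ratio_.cumsum()) # variance captured
    recon = pca.inverse_transform(scores)         # linear model of tongue shape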
Mechanisms of Vowel Variation in African American English.
Holt, Yolanda Feimster
2018-02-15
This research explored mechanisms of vowel variation in African American English by comparing 2 geographically distant groups of African American and White American English speakers for participation in the African American Shift and the Southern Vowel Shift. Thirty-two male (African American: n = 16, White American controls: n = 16) lifelong residents of cities in eastern and western North Carolina produced heed, hid, heyd, head, had, hod, hawed, whod, hood, hoed, hide, howed, hoyd, and heard 3 times each in random order. Formant frequency, duration, and acoustic analyses were completed for the vowels /i, ɪ, e, ɛ, æ, ɑ, ɔ, u, ʊ, o, aɪ, aʊ, oɪ, ɝ/ produced in the listed words. African American English speakers show vowel variation. In the west, the African American English speakers are participating in the Southern Vowel Shift and hod fronting of the African American Shift. In the east, neither the African American English speakers nor their White peers are participating in the Southern Vowel Shift. The African American English speakers show limited participation in the African American Shift. The results provide evidence of regional and socio-ethnic variation in African American English in North Carolina.
The influence of phonological priming on variability in articulation
NASA Astrophysics Data System (ADS)
Babel, Molly E.; Munson, Benjamin
2004-05-01
Previous research [Sevald and Dell, Cognition 53, 91-127 (1994)] has found that reiterant sequences of CVC words are produced more quickly when the prime word and target word share VC sequences (i.e., sequences like sit sick) than when they are identical (sequences like sick sick). Even slower production rates are found when primes and targets share a CV sequence (sequences like kick sick). These data have been used to support a model of speech production in which lexical items and their constituent phonemes are activated sequentially. The current experiment investigated whether phonological priming also influences variability in the acoustic characteristics of words. Specifically, we examined whether greater variability in the acoustic characteristics of target words was noted in the CV-related prime context than in the identical-prime context, and whether less variability was noted in the VC-related context. Thirty adult subjects with typical speech, language, and hearing ability produced reiterant two-word sequences that varied in their phonological similarity. The duration and the first and second formant frequencies of the target words' vowels were measured. Preliminary analyses indicate that phonological priming does not have a systematic effect on variability in these acoustic parameters.
Revisiting the "enigma" of musicians with dyslexia: Auditory sequencing and speech abilities.
Zuk, Jennifer; Bishop-Liebler, Paula; Ozernov-Palchik, Ola; Moore, Emma; Overy, Katie; Welch, Graham; Gaab, Nadine
2017-04-01
Previous research has suggested a link between musical training and auditory processing skills. Musicians have shown enhanced perception of auditory features critical to both music and speech, suggesting that this link extends beyond basic auditory processing. It remains unclear to what extent musicians who also have dyslexia show these specialized abilities, considering often-observed persistent deficits that coincide with reading impairments. The present study evaluated auditory sequencing and speech discrimination in 52 adults comprised of musicians with dyslexia, nonmusicians with dyslexia, and typical musicians. An auditory sequencing task measuring perceptual acuity for tone sequences of increasing length was administered. Furthermore, subjects were asked to discriminate synthesized syllable continua varying in acoustic components of speech necessary for intraphonemic discrimination, which included spectral (formant frequency) and temporal (voice onset time [VOT] and amplitude envelope) features. Results indicate that musicians with dyslexia did not significantly differ from typical musicians and performed better than nonmusicians with dyslexia for auditory sequencing as well as discrimination of spectral and VOT cues within syllable continua. However, typical musicians demonstrated superior performance relative to both groups with dyslexia for discrimination of syllables varying in amplitude information. These findings suggest a distinct profile of speech processing abilities in musicians with dyslexia, with specific weaknesses in discerning amplitude cues within speech. Because these difficulties seem to remain persistent in adults with dyslexia despite musical training, this study only partly supports the potential for musical training to enhance the auditory processing skills known to be crucial for literacy in individuals with dyslexia.
NASA Astrophysics Data System (ADS)
Toscano, Joseph Christopher
Several fundamental questions about speech perception concern how listeners understand spoken language despite considerable variability in speech sounds across different contexts (the problem of lack of invariance in speech). This contextual variability is caused by several factors, including differences between individual talkers' voices, variation in speaking rate, and effects of coarticulatory context. A number of models have been proposed to describe how the speech system handles differences across contexts. Critically, these models make different predictions about (1) whether contextual variability is handled at the level of acoustic cue encoding or categorization, (2) whether it is driven by feedback from category-level processes or interactions between cues, and (3) whether listeners discard fine-grained acoustic information to compensate for contextual variability. Separating the effects of cue- and category-level processing has been difficult because behavioral measures tap processes that occur well after initial cue encoding and are influenced by task demands and linguistic information. Recently, we have used the event-related brain potential (ERP) technique to examine cue encoding and online categorization. Specifically, we have looked at differences in the auditory N1 as a measure of acoustic cue encoding and the P3 as a measure of categorization. This allows us to examine multiple levels of processing during speech perception and can provide a useful tool for studying effects of contextual variability. Here, I apply this approach to determine the point in processing at which context has an effect on speech perception and to examine whether acoustic cues are encoded continuously. Several types of contextual variability (talker gender, speaking rate, and coarticulation), as well as several acoustic cues (voice onset time, formant frequencies, and bandwidths), are examined in a series of experiments. The results suggest that (1) at early stages of speech processing, listeners encode continuous differences in acoustic cues, independent of phonological categories; (2) at post-perceptual stages, fine-grained acoustic information is preserved; and (3) there is preliminary evidence that listeners encode cues relative to context via feedback from categories. These results are discussed in relation to proposed models of speech perception and sources of contextual variability.
Winn, Matthew B.; Won, Jong Ho; Moon, Il Joon
2016-01-01
Objectives This study was conducted to measure auditory perception by cochlear implant users in the spectral and temporal domains, using tests of either categorization (using speech-based cues) or discrimination (using conventional psychoacoustic tests). We hypothesized that traditional nonlinguistic tests assessing spectral and temporal auditory resolution would correspond to speech-based measures assessing specific aspects of phonetic categorization assumed to depend on spectral and temporal auditory resolution. We further hypothesized that speech-based categorization performance would ultimately be a superior predictor of speech recognition performance, because of the fundamental nature of speech recognition as categorization. Design Nineteen CI listeners and 10 listeners with normal hearing (NH) participated in a suite of tasks that included spectral ripple discrimination (SRD), temporal modulation detection (TMD), and syllable categorization, which was split into a spectral-cue-based task (targeting the /ba/-/da/ contrast) and a timing-cue-based task (targeting the /b/-/p/ and /d/-/t/ contrasts). Speech sounds were manipulated in order to contain specific spectral or temporal modulations (formant transitions or voice onset time, respectively) that could be categorized. Categorization responses were quantified using logistic regression in order to assess perceptual sensitivity to acoustic phonetic cues. Word recognition testing was also conducted for CI listeners. Results CI users were generally less successful at utilizing both spectral and temporal cues for categorization compared to listeners with normal hearing. For the CI listener group, SRD was significantly correlated with the categorization of formant transitions; both were correlated with better word recognition. TMD using 100 Hz and 10 Hz modulated noise was not correlated with the CI subjects' categorization of VOT, nor with word recognition. Word recognition was correlated more closely with categorization of the controlled speech cues than with performance on the psychophysical discrimination tasks. Conclusions When evaluating people with cochlear implants, controlled speech-based stimuli are feasible to use in tests of auditory cue categorization, to complement traditional measures of auditory discrimination. Stimuli based on specific speech cues correspond to counterpart non-linguistic measures of discrimination, but potentially show better correspondence with speech perception more generally. The ubiquity of the spectral (formant transition) and temporal (VOT) stimulus dimensions across languages highlights the potential to use this testing approach even in cases where English is not the native language.
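Quantifying categorization responses with logistic regression, as in the design above, can be sketched as follows; the VOT continuum, trial counts, and simulated responses are illustrative only:

    import numpy as np
    import statsmodels.api as sm

    vot = np.repeat(np.arange(0, 45, 5), 10)           # 9 VOT steps x 10 trials
    rng = np.random.default_rng(0)                     # toy /p/-response data
    resp_p = (rng.random(len(vot))
              < 1 / (1 + np.exp(-(vot - 20) / 4))).astype(int)
    fit = sm.Logit(resp_p, sm.add_constant(vot)).fit(disp=0)
    print(fit.params)   # steeper slope = sharper sensitivity to the VOT cue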
Knez Ambrožič, Mojca; Hočevar Boltežar, Irena; Ihan Hren, Nataša
2015-09-01
Skeletal anterior open bite (AOB), or apertognathism, is characterized by the absence of contact between the anterior teeth and affects articulation, chewing, biting, and voice quality. The treatment of AOB consists of orthognathic surgical procedures. The aim of this study was to evaluate the effects of treatment on voice quality, articulation, and nasality in speech with respect to skeletal changes. The study was prospective; 15 patients with AOB were evaluated before and after surgery. Lateral cephalometric x-ray parameters (facial angle, interincisal distance, Wits appraisal) were measured to determine skeletal changes. Before surgery, nine patients still had articulation disorders despite speech therapy during childhood. Voice quality parameters were determined by acoustic analysis of the vowel /a/ (fundamental frequency F0, jitter, shimmer). Spectral analysis of the vowels /a/, /e/, /i/, /o/, /u/ was carried out by determining the mean frequencies of the first (F1) and second (F2) formants. Nasality in speech was expressed as the ratio between the nasal and oral sound energies during speech samples. After surgery, normalization of the facial skeletal parameters was observed in all patients, but no statistically significant changes in articulation or voice quality parameters occurred, despite subjective observations of easier articulation. Velopharyngeal insufficiency did not appear or worsen in any of the patients. In conclusion, surgical treatment of skeletal AOB does not lead to deterioration in voice, resonance, or articulation quality. However, despite surgical correction of the unfavourable skeletal configuration of the speech apparatus, a pre-existing articulation disorder cannot improve without professional intervention.
Roy, Nelson; Nissen, Shawn L; Dromey, Christopher; Sapir, Shimon
2009-01-01
In a preliminary study, we documented significant changes in formant transitions associated with successful manual circumlaryngeal treatment (MCT) of muscle tension dysphonia (MTD), suggesting improvement in speech articulation. The present study explores further the effects of MTD on vowel articulation by means of additional vowel acoustic measures. Pre- and post-treatment audio recordings of 111 women with MTD were analyzed acoustically using two measures: vowel space area (VSA) and the vowel articulation index (VAI), constructed using the first (F1) and second (F2) formants of the 4 point vowels /a, i, æ, u/, extracted from eight words within a standard reading passage. Pairwise t-tests revealed significant increases in both VSA and VAI, confirming that successful treatment of MTD is associated with vowel space expansion. Although MTD is considered a voice disorder, its treatment with MCT appears to positively affect vocal tract dynamics. While the precise mechanism underlying vowel space expansion remains unknown, improvements may be related to lowering of the larynx, expansion of the oropharyngeal space, and improved articulatory movements. The reader will be able to: (1) describe possible articulatory changes associated with successful treatment of muscle tension dysphonia; (2) describe two acoustic methods to assess vowel centralization and decentralization; and (3) understand the basis for viewing muscle tension dysphonia as a disorder not solely confined to the larynx.
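Both measures reduce to simple formulas over the point-vowel formants. The VAI expression below follows the formulation usually attributed to Sapir and colleagues (an assumption here, since the abstract does not quote its formula), and the area uses the shoelace rule over the four vowels taken in order around the perimeter:

    def vowel_articulation_index(F1a, F2a, F1i, F2i, F1u, F2u):
        """VAI: formants that rise with vowel expansion over those that rise
        with centralization (assumed Sapir-style formulation)."""
        return (F2i + F1a) / (F1i + F1u + F2u + F2a)

    def quad_vsa(points):
        """Shoelace area of the /a, ae, i, u/ quadrilateral; `points` are
        (F1, F2) pairs ordered around the perimeter."""
        n = len(points)
        return 0.5 * abs(sum(points[i][0] * points[(i + 1) % n][1]
                             - points[(i + 1) % n][0] * points[i][1]
                             for i in range(n)))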
Bernardini, Francesco; Lunden, Anya; Covington, Michael; Broussard, Beth; Halpern, Brooke; Alolayan, Yazeed; Crisafio, Anthony; Pauselli, Luca; Balducci, Pierfrancesco M; Capulong, Leslie; Attademo, Luigi; Lucarini, Emanuela; Salierno, Gianfranco; Natalicchi, Luca; Quartesan, Roberto; Compton, Michael T
2016-05-30
This is the first cross-language study of the effect of schizophrenia on speech as measured by analyzing phonetic parameters with sound spectrography. We hypothesized that reduced variability in pitch and formants would be correlated with negative symptom severity in two samples of patients with schizophrenia, one from Italy, and one from the United States. Audio recordings of spontaneous speech were available from 40 patients. From each speech sample, a file of F0 (pitch) and formant values (F1 and F2, resonance bands indicating the moment-by-moment shape of the oral cavity), and the portion of the recording in which there was speaking ("fraction voiced," FV), was created. Correlations between variability in the phonetic indices and negative symptom severity were tested and further examined using regression analyses. Meaningful negative correlations between Scale for the Assessment of Negative Symptoms (SANS) total score and standard deviation (SD) of F2, as well as variability in pitch (SD F0), were observed in the Italian sample. We also found meaningful associations of SANS affective flattening and SANS alogia with SD F0, and of SANS avolition/apathy with SD F2, in the Italian sample. In both samples, FV was meaningfully correlated with SANS total score, avolition/apathy, and anhedonia/asociality.
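The phonetic indices above reduce to a few lines, assuming a frame-level F0 track in which unvoiced frames are coded as zero (a common convention, not necessarily the study's):

    import numpy as np

    def prosodic_indices(f0_track):
        """SD of F0 over voiced frames, plus 'fraction voiced' (FV)."""
        f0 = np.asarray(f0_track, float)
        voiced = f0 > 0
        return f0[voiced].std(), voiced.mean()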
1982-03-01
13: p. 27]. There are some connected-speech recognizers on the market today but they are expensive ($50,000-$100,200) and their capabilities have...readout, and stock market quotations [Ref. 17: p. 6]. The second voice response technique, formant synthesis, uses a method in which a word library (again...users. Marketing brochures, therefore, should be looked at rather carefully, the best guarantee of recognition accuracy being a test with the desired
Stress-Induced Acoustic Variation in L2 and L1 Spanish Vowels.
Romanelli, Sofía; Menegotto, Andrea; Smyth, Ron
2018-05-28
We assessed the effect of lexical stress on the duration and quality of Spanish word-final vowels /a, e, o/ produced by American English late-intermediate learners of L2 Spanish, as compared to those of native L1 Argentine Spanish speakers. Participants read 54 real words ending in /a, e, o/, with either final or penultimate lexical stress, embedded in a text and a word list. We measured vowel duration and both F1 and F2 frequencies at 3 temporal points. Stressed vowels were longer than unstressed vowels in both L1 and L2 Spanish. L1 and L2 Spanish stressed /a/ and /e/ had higher F1 values than their unstressed counterparts. Only the L2 speakers showed evidence of rising offglides for /e/ and /o/. The L2 and L1 Spanish vowel space was compressed in the absence of stress. Lexical stress affected the vowel quality of L1 and L2 Spanish vowels. We provide an up-to-date account of the formant trajectories of Argentine River Plate Spanish word-final /a, e, o/ and offer experimental support to the claim that stress affects the quality of Spanish vowels in word-final contexts.
Enzinger, Ewald; Morrison, Geoffrey Stewart
2017-08-01
In a 2012 case in New South Wales, Australia, the identity of a speaker on several audio recordings was in question. Forensic voice comparison testimony was presented based on an auditory-acoustic-phonetic-spectrographic analysis. No empirical demonstration of the validity and reliability of the analytical methodology was presented. Unlike the admissibility standards in some other jurisdictions (e.g., US Federal Rule of Evidence 702 and the Daubert criteria, or England & Wales Criminal Practice Directions 19A), Australia's Unified Evidence Acts do not require demonstration of the validity and reliability of analytical methods and their implementation before testimony based upon them is presented in court. The present paper reports on empirical tests of the performance of an acoustic-phonetic-statistical forensic voice comparison system which exploited the same features as were the focus of the auditory-acoustic-phonetic-spectrographic analysis in the case, i.e., second-formant (F2) trajectories in /o/ tokens and mean fundamental frequency (f0). The tests were conducted under conditions similar to those in the case. The performance of the acoustic-phonetic-statistical system was very poor compared to that of an automatic system.
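A hedged sketch of per-token features in the spirit of the system described above: polynomial coefficients summarizing the /o/ F2 trajectory plus mean f0. Polynomial parameterization is a common choice for formant trajectories, not necessarily the one used in the paper:

    import numpy as np

    def token_features(f2_track, f0_track, order=3):
        """Compact per-token feature vector: F2-contour polynomial + mean f0."""
        t = np.linspace(0.0, 1.0, len(f2_track))
        coefs = np.polyfit(t, f2_track, order)
        return np.concatenate([coefs, [np.mean(f0_track)]])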
Data-driven automated acoustic analysis of human infant vocalizations using neural network tools.
Warlaumont, Anne S; Oller, D Kimbrough; Buder, Eugene H; Dale, Rick; Kozma, Robert
2010-04-01
Acoustic analysis of infant vocalizations has typically employed traditional acoustic measures drawn from adult speech acoustics, such as f(0), duration, formant frequencies, amplitude, and pitch perturbation. Here an alternative and complementary method is proposed in which data-derived spectrographic features are central. 1-s-long spectrograms of vocalizations produced by six infants recorded longitudinally between ages 3 and 11 months are analyzed using a neural network consisting of a self-organizing map and a single-layer perceptron. The self-organizing map acquires a set of holistic, data-derived spectrographic receptive fields. The single-layer perceptron receives self-organizing map activations as input and is trained to classify utterances into prelinguistic phonatory categories (squeal, vocant, or growl), identify the ages at which they were produced, and identify the individuals who produced them. Classification performance was significantly better than chance for all three classification tasks. Performance is compared to another popular architecture, the fully supervised multilayer perceptron. In addition, the network's weights and patterns of activation are explored from several angles, for example, through traditional acoustic measurements of the network's receptive fields. Results support the use of this and related tools for deriving holistic acoustic features directly from infant vocalization data and for the automatic classification of infant vocalizations.
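The architecture described above can be approximated with off-the-shelf tools; this sketch assumes the third-party minisom package and hypothetical input files, and the Gaussian-of-distance activation is a simplification of the original receptive-field model:

    import numpy as np
    from minisom import MiniSom                  # third-party package, assumed
    from sklearn.linear_model import Perceptron

    specs = np.load("infant_spectrograms.npy")   # rows = 1-s spectrogram vectors
    labels = np.load("labels.npy")               # squeal / vocant / growl
    som = MiniSom(8, 8, specs.shape[1], sigma=1.0, learning_rate=0.5,
                  random_seed=0)
    som.train_random(specs, 5000)                # learn spectrographic receptive fields

    def activation(v):                           # SOM activation map for one input
        return np.exp(-np.linalg.norm(som.get_weights() - v, axis=2).ravel())

    X = np.array([activation(v) for v in specs])
    clf = Perceptron().fit(X, labels)            # single-layer category readout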
Soltis, Joseph; Blowers, Tracy E; Savage, Anne
2011-02-01
As in other mammals, there is evidence that the African elephant voice reflects affect intensity, but it is less clear if positive and negative affective states are differentially reflected in the voice. An acoustic comparison was made between African elephant "rumble" vocalizations produced in negative social contexts (dominance interactions), neutral social contexts (minimal social activity), and positive social contexts (affiliative interactions) by four adult females housed at Disney's Animal Kingdom®. Rumbles produced in the negative social context exhibited higher and more variable fundamental frequencies (F(0)) and amplitudes, longer durations, increased voice roughness, and higher first formant locations (F1), compared to the neutral social context. Rumbles produced in the positive social context exhibited similar shifts in most variables (F(0) variation, amplitude, amplitude variation, duration, and F1), but the magnitude of response was generally less than that observed in the negative context. Voice roughness and F(0) observed in the positive social context remained similar to that observed in the neutral context. These results are most consistent with the vocal expression of affect intensity, in which the negative social context elicited higher intensity levels than the positive context, but differential vocal expression of positive and negative affect cannot be ruled out.
NASA Astrophysics Data System (ADS)
Yildirim, Serdar; Montanari, Simona; Andersen, Elaine; Narayanan, Shrikanth S.
2003-10-01
Understanding the fine details of children's speech and gestural characteristics helps, among other things, in creating natural computer interfaces. We analyze the acoustic, lexical/non-lexical and spoken/gestural discourse characteristics of young children's speech using audio-video data gathered using a Wizard of Oz technique from 4- to 6-year-old children engaged in resolving a series of age-appropriate cognitive challenges. Fundamental and formant frequencies exhibited greater variation between subjects, consistent with previous results on read speech [Lee et al., J. Acoust. Soc. Am. 105, 1455-1468 (1999)]. Also, our analysis showed that, in a given bandwidth, the phonemic information contained in the speech of young children is significantly less than that in the speech of older children and adults. To enable an integrated analysis, a multi-track annotation board was constructed using the ANVIL tool kit [M. Kipp, Eurospeech 1367-1370 (2001)]. Along with speech transcriptions and acoustic analysis, non-lexical and discourse characteristics, and children's gestures (facial expressions, body movements, hand/head movements) were annotated in a synchronized multilayer system. Initial results showed that younger children rely more on gestures to emphasize their verbal assertions. Younger children use non-lexical speech (e.g., um, huh) associated with frustration and pondering/reflecting more frequently than older ones. Younger children also repair more with humans than with a computer.
Training to Improve Hearing Speech in Noise: Biological Mechanisms
Song, Judy H.; Skoe, Erika; Banai, Karen
2012-01-01
We investigated training-related improvements in listening in noise and the biological mechanisms mediating these improvements. Training-related malleability was examined using a program that incorporates cognitively based listening exercises to improve speech-in-noise perception. Before and after training, auditory brainstem responses to a speech syllable were recorded in quiet and multitalker noise from adults who ranged in their speech-in-noise perceptual ability. Controls did not undergo training but were tested at intervals equivalent to the trained subjects. Trained subjects exhibited significant improvements in speech-in-noise perception that were retained 6 months later. Subcortical responses in noise demonstrated training-related enhancements in the encoding of pitch-related cues (the fundamental frequency and the second harmonic), particularly for the time-varying portion of the syllable that is most vulnerable to perceptual disruption (the formant transition region). Subjects with the largest strength of pitch encoding at pretest showed the greatest perceptual improvement. Controls exhibited neither neurophysiological nor perceptual changes. We provide the first demonstration that short-term training can improve the neural representation of cues important for speech-in-noise perception. These results implicate and delineate biological mechanisms contributing to learning success, and they provide a conceptual advance to our understanding of the kind of training experiences that can influence sensory processing in adulthood. PMID:21799207
Tanaka, Yasuhiro; Tsuboi, Takashi; Watanabe, Hirohisa; Kajita, Yasukazu; Nakatsubo, Daisuke; Fujimoto, Yasushi; Ohdake, Reiko; Ito, Mizuki; Atsuta, Naoki; Yamamoto, Masahiko; Wakabayashi, Toshihiko; Katsuno, Masahisa; Sobue, Gen
2016-10-19
Voice and speech disorders are one of the most important issues after subthalamic nucleus deep brain stimulation (STN-DBS) in Parkinson's disease (PD) patients. However, articulation features in this patient population remain unclear. We studied the articulation features of PD patients with STN-DBS. Participants were 56 PD patients treated with STN-DBS (STN-DBS group) and 41 patients treated only with medical therapy (medical-therapy-alone group). Articulation function was evaluated with acoustic and auditory-perceptual analyses. The vowel space area (VSA) was calculated using the formant frequency data of three vowels (/a/, /i/, and /u/) from a sustained phonation task. The VSA reportedly reflects the distance of mouth/jaw and tongue movements during speech and phonation. Correlations between acoustic and auditory-perceptual measurements were also assessed. The VSA did not significantly differ between the medical-therapy-alone group and the STN-DBS group in the off-stimulation condition. In the STN-DBS group, the VSA was larger in the on-stimulation condition than in the off-stimulation condition. However, individual analysis showed that the VSA changes after stopping stimulation were heterogeneous. In total, 89.8% of the STN-DBS group showed a larger VSA in the on- than in the off-stimulation condition. In contrast, the VSA of the remaining patients in that group was smaller in the on- than in the off-stimulation condition. STN-DBS may resolve hypokinesia of the articulation structures, including the mouth/jaw and tongue, and improve maximal vowel articulation. However, in the on-stimulation condition, the VSA was not significantly correlated with speech intelligibility. This may be because STN-DBS potentially affects other speech processes such as voice and/or respiratory processes.
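The triangular VSA used above can be computed directly from corner-vowel formant measurements; a minimal sketch using the shoelace formula, with illustrative (not study) formant values:

def vowel_space_area(formants):
    """formants: dict vowel -> (F1, F2) in Hz; triangle area via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = (formants[v] for v in ("a", "i", "u"))
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

on_stim = {"a": (800, 1300), "i": (300, 2300), "u": (350, 800)}   # hypothetical values
print(f"VSA = {vowel_space_area(on_stim):,.0f} Hz^2")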
Sayles, Mark; Stasiak, Arkadiusz; Winter, Ian M.
2015-01-01
The auditory system typically processes information from concurrently active sound sources (e.g., two voices speaking at once), in the presence of multiple delayed, attenuated and distorted sound-wave reflections (reverberation). Brainstem circuits help segregate these complex acoustic mixtures into “auditory objects.” Psychophysical studies demonstrate a strong interaction between reverberation and fundamental-frequency (F0) modulation, leading to impaired segregation of competing vowels when segregation is on the basis of F0 differences. Neurophysiological studies of complex-sound segregation have concentrated on sounds with steady F0s, in anechoic environments. However, F0 modulation and reverberation are quasi-ubiquitous. We examine the ability of 129 single units in the ventral cochlear nucleus (VCN) of the anesthetized guinea pig to segregate the concurrent synthetic vowel sounds /a/ and /i/, based on temporal discharge patterns under closed-field conditions. We address the effects of added real-room reverberation, F0 modulation, and the interaction of these two factors, on brainstem neural segregation of voiced speech sounds. A firing-rate representation of single vowels' spectral envelopes is robust to the combination of F0 modulation and reverberation: local firing-rate maxima and minima across the tonotopic array code vowel-formant structure. However, single-vowel F0-related periodicity information in shuffled inter-spike interval distributions is significantly degraded in the combined presence of reverberation and F0 modulation. Hence, segregation of double vowels' spectral energy into two streams (corresponding to the two vowels), on the basis of temporal discharge patterns, is impaired by reverberation, specifically when F0 is modulated. All unit types (primary-like, chopper, onset) are similarly affected. These results offer neurophysiological insights into the perceptual organization of complex acoustic scenes under realistically challenging listening conditions. PMID:25628545
Objective measurement of motor speech characteristics in the healthy pediatric population.
Wong, A W; Allegro, J; Tirado, Y; Chadha, N; Campisi, P
2011-12-01
To obtain objective measurements of motor speech characteristics in normal children, using a computer-based motor speech software program. Cross-sectional, observational design in a university-based ambulatory pediatric otolaryngology clinic. Participants included 112 subjects (54 females and 58 males) aged 4-18 years. Participants with previously diagnosed hearing loss, voice and motor disorders, and children unable to repeat a passage in English were excluded. Voice samples were recorded and analysed using the Motor Speech Profile (MSP) software (KayPENTAX, Lincoln Park, NJ). The MSP produced measures of diadochokinetics, second formant transition, intonation, and syllabic rates. Demographic data, including sex, age, and cigarette smoke exposure, were obtained. Normative data for several motor speech characteristics were derived for children ranging from age 4 to 18 years. A number of age-dependent changes were identified, including an increase in average diadochokinetic rate (p<0.001) and standard syllabic duration (p<0.001) with age. There were no identified differences in motor speech characteristics between males and females across the measured age range. Variations in fundamental frequency (F0) during speech did not change significantly with age for both males and females. To our knowledge, this is the first pediatric normative database for the MSP program. The MSP is suitable for testing children and can be used to study developmental changes in motor speech. The analysis demonstrated that males and females behave similarly and show the same relationship with age for the motor speech characteristics studied. This normative database will provide essential comparative data for future studies exploring alterations in motor speech that may occur with hearing, voice, and motor disorders and to assess the results of targeted therapies. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Acoustic, respiratory kinematic and electromyographic effects of vocal training
NASA Astrophysics Data System (ADS)
Mendes, Ana Paula De Brito Garcia
The longitudinal effects of vocal training on the respiratory, phonatory and articulatory systems were investigated in this study. During four semesters, fourteen voice major students were recorded while speaking and singing. Acoustic, temporal, respiratory kinematic and electromyographic parameters were measured to determine changes in the three systems as a function of vocal training. Acoustic measures of the speaking voice included fundamental frequency, sound pressure level (SPL), percent jitter and shimmer, and harmonic-to-noise ratio. Temporal measures included duration of sentences, diphthongs and the closure durations of stop consonants. Acoustic measures of the singing voice included fundamental frequency and sound pressure level of the phonational range, vibrato pulses per second, vibrato amplitude variation and the presence of the singer's formant. Analysis of the data revealed that vocal training had a significant effect on the singing voice. Fundamental frequency and SPL of the 90% level and 90-10% of the phonational range increased significantly during four semesters of vocal training. Physiological data were collected from four subjects during three semesters of vocal training. Respiratory kinematic measures included lung volume, rib cage and abdominal excursions extracted from spoken and sung samples. Descriptive statistics revealed that rib cage and abdominal excursions increased from the 1st to the 2nd semester and decreased from the 2nd to the 3rd semester of vocal training. Electromyographic measures of the pectoralis major, rectus abdominis and external obliques muscles revealed that burst duration means decreased from the 1st to the 2nd semester and increased from the 2nd to the 3rd semester. Peak amplitude means increased from the 1st to the 2nd and decreased from the 2nd to the 3rd semester of vocal training. Chest wall excursions and muscle force generation of the three muscles increased as the demanding level and the length of the phonatory tasks increased.
Fricative-stop coarticulation: acoustic and perceptual evidence.
Repp, B H; Mann, V A
1982-06-01
Eight native speakers of American English each produced ten tokens of all possible CV, FCV, and VFCV utterances with V = [a] or [u], F = [s] or [ʃ], and C = [t] or [k]. Acoustic analysis showed that the formant transition onsets following the stop consonant release were systematically influenced by the preceding fricative, although there were large individual differences. In particular, F3 and F4 tended to be higher following [s] than following [ʃ]. The coarticulatory effects were equally large in FCV (e.g., /sta/) and VFCV (e.g., /asda/) utterances; that is, they were not reduced when a syllable boundary intervened between fricative and stop. In a parallel perceptual study, the CV portions of these utterances (with release bursts removed to provoke errors) were presented to listeners for identification of the stop consonant. The pattern of place-of-articulation confusions, too, revealed coarticulatory effects due to the excised fricative context.
The Effect of Vocal Fold Inferior Surface Hypertrophy on Voice Function in Excised Canine Larynges.
Wang, Ruiqing; Bao, Huijing; Xu, Xinlin; Piotrowski, David; Zhang, Yu; Zhuang, Peiyun
2017-08-18
This study aimed to explore the effects of vocal fold inferior surface hypertrophy (VFISH) on vocal fold vibration by aerodynamic and acoustic analysis. The present study allows us to gain new insights into the subglottal convergence angle (SCA), which changes with VFISH. The study is prospective and designed for repeated measures, with each excised canine larynx serving as its own control. Three degrees of VFISH (initial, mild, and severe) were simulated by injecting different doses of fructose into the inferior surface of the vocal folds of 10 excised canine larynges. Computed tomographic images of the larynx were gathered, and three-dimensional models of the airway and vocal folds were reconstructed using the Mimics software. The SCA was measured from the reconstructed models. Phonation threshold flow (PTF), phonation threshold pressure (PTP), and mean flow rate (MFR) were recorded directly in the excised canine larynx phonation setup. Glottal resistance (GR), sound pressure level (SPL), fundamental frequency (F0), and formants 1-4 (F1-4) were measured when subglottal pressure (Psub) was at 1.5 kPa or 2.5 kPa, separately. Using ordinary one-way analysis of variance, we compared the aerodynamic outcomes and voice quality among the three groups of hypertrophy. The SCA, PTP, and PTF increased with the degree of VFISH. When the Psub was controlled at 1.5 kPa or 2.5 kPa, F0 also increased significantly with the degree of VFISH of the excised canine larynges. The MFR, GR, SPL, and F1-4 had little change between the three groups and were not significantly different. The VFISH makes onset phonation more difficult, increases the SCA, and increases the F0 in sustained phonation. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Speech Evoked Auditory Brainstem Response in Stuttering
Tahaei, Ali Akbar; Ashayeri, Hassan; Pourbakht, Akram; Kamali, Mohammad
2014-01-01
Auditory processing deficits have been hypothesized as an underlying mechanism for stuttering. Previous studies have demonstrated abnormal responses in subjects with persistent developmental stuttering (PDS) at the higher level of the central auditory system using speech stimuli. Recently, the potential usefulness of speech evoked auditory brainstem responses in central auditory processing disorders has been emphasized. The current study used the speech evoked ABR to investigate the hypothesis that subjects with PDS have specific auditory perceptual dysfunction. Objectives. To determine whether brainstem responses to speech stimuli differ between PDS subjects and normal fluent speakers. Methods. Twenty-five subjects with PDS participated in this study. The speech-ABRs were elicited by the 5-formant synthesized syllable /da/, with a duration of 40 ms. Results. There were significant group differences for the onset and offset transient peaks. Subjects with PDS had longer latencies for the onset and offset peaks relative to the control group. Conclusions. Subjects with PDS showed deficient neural timing in the early stages of the auditory pathway, consistent with temporal processing deficits, and this abnormal timing may underlie their disfluency. PMID:25215262
The influence of different native language systems on vowel discrimination and identification
NASA Astrophysics Data System (ADS)
Kewley-Port, Diane; Bohn, Ocke-Schwen; Nishi, Kanae
2005-04-01
The ability to identify the vowel sounds of a language reliably is dependent on the ability to discriminate between vowels at a more sensory level. This study examined how the complexity of the vowel systems of three native languages (L1) influenced listeners' perception of American English (AE) vowels. AE has a fairly complex vowel system with 11 monophthongs. In contrast, Japanese has only 5 spectrally different vowels, while Swedish has 9 and Danish has 12. Six listeners, with exposure of less than 4 months in English-speaking environments, participated from each L1. Their performance in two tasks was compared to 6 AE listeners. As expected, there were large differences in a linguistic identification task using 4 confusable AE low vowels. Japanese listeners performed quite poorly compared to listeners with more complex L1 vowel systems. Thresholds for formant discrimination for the 3 groups were very similar to those of native AE listeners. Thus it appears that sensory abilities for discriminating vowels are only slightly affected by native vowel systems, and that vowel confusions occur at a more central, linguistic level. [Work supported by funding from NIHDCD-02229 and the American-Scandinavian Foundation.]
Hodges-Simeon, Carolyn R; Gaulin, Steven J C; Puts, David A
2011-06-01
Men's copulatory success can often be predicted by measuring traits involved in male contests and female choice. Previous research has demonstrated relationships between one such vocal trait in men, mean fundamental frequency (F(0)), and the outcomes and indicators of sexual success with women. The present study investigated the role of another vocal parameter, F(0) variation (the within-subject SD in F(0) across the utterance, F(0)-SD), in predicting men's reported number of female sexual partners in the last year. Male participants (N = 111) competed with another man for a date with a woman. Recorded interactions with the competitor ("competitive recording") and the woman ("courtship recording") were analyzed for five non-linguistic vocal parameters: F(0)-SD, mean F(0), intensity, duration, and formant dispersion (D(f), an acoustic correlate of vocal tract length), as well as dominant and attractive linguistic content. After controlling for age and attitudes toward uncommitted sex (SOI), lower F(0)-SD (i.e., a more monotone voice) and more dominant linguistic content were strong predictors of the number of past-year sexual partners, whereas mean F(0) and D(f) did not significantly predict past-year partners. These contrasts have implications for the relative importance of male contests and female choice in shaping men's mating success and hence the origins and maintenance of sexually dimorphic traits in humans.
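For concreteness, a rough numpy sketch of the two measures at the center of this study, mean F(0) and F(0)-SD, using a crude per-frame autocorrelation pitch estimate on a synthetic "voice". The frame length, voicing threshold, and test signal are assumptions, not the study's analysis pipeline.

import numpy as np

def f0_track(x, sr, frame=0.04, fmin=60.0, fmax=300.0):
    """Crude per-frame F0 estimate via the autocorrelation peak."""
    n = int(frame * sr)
    lo, hi = int(sr / fmax), int(sr / fmin)
    f0s = []
    for i in range(0, len(x) - n, n):
        seg = x[i:i + n] - x[i:i + n].mean()
        ac = np.correlate(seg, seg, mode="full")[n - 1:]   # lags 0..n-1
        if ac[0] <= 0:
            continue
        lag = lo + np.argmax(ac[lo:hi])
        if ac[lag] / ac[0] > 0.3:          # keep only clearly periodic frames
            f0s.append(sr / lag)
    return np.array(f0s)

sr = 16000
t = np.arange(sr) / sr
voice = np.sign(np.sin(2 * np.pi * 110 * t * (1 + 0.05 * np.sin(2 * np.pi * 3 * t))))
f0 = f0_track(voice, sr)
print(f"mean F0 = {f0.mean():.1f} Hz, F0-SD = {f0.std(ddof=1):.1f} Hz")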
Zhu, Lianzhang; Chen, Leiming; Zhao, Dehai
2017-01-01
Accurate emotion recognition from speech is important for applications like smart health care, smart entertainment, and other smart services. High accuracy emotion recognition from Chinese speech is challenging due to the complexities of the Chinese language. In this paper, we explore how to improve the accuracy of speech emotion recognition, including speech signal feature extraction and emotion classification methods. Five types of features are extracted from a speech sample: mel frequency cepstrum coefficient (MFCC), pitch, formant, short-term zero-crossing rate and short-term energy. By comparing statistical features with deep features extracted by a Deep Belief Network (DBN), we attempt to find the best features to identify the emotion status for speech. We propose a novel classification method that combines DBN and SVM (support vector machine) instead of using only one of them. In addition, a conjugate gradient method is applied to train DBN in order to speed up the training process. Gender-dependent experiments are conducted using an emotional speech database created by the Chinese Academy of Sciences. The results show that DBN features can reflect emotion status better than artificial features, and our new classification approach achieves an accuracy of 95.8%, which is higher than using either DBN or SVM separately. Results also show that DBN can work very well for small training databases if it is properly designed. PMID:28737705
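As a concrete point of reference, a compact baseline in the spirit of the paper's artificial-feature pipeline: frame-level MFCC, short-term zero-crossing rate, and short-term energy statistics pooled per utterance and fed to an RBF-kernel SVM. The DBN stage is omitted, and synthetic audio with random labels stands in for the corpus, so this is a sketch of the feature-plus-SVM approach rather than the authors' system.

import numpy as np
import librosa
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def utterance_features(y, sr=16000):
    """Pool frame-level MFCC, zero-crossing rate, and energy into one vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # spectral envelope
    zcr = librosa.feature.zero_crossing_rate(y)          # short-term zero-crossing rate
    rms = librosa.feature.rms(y=y)                       # short-term energy
    frames = np.vstack([mfcc, zcr, rms])
    return np.concatenate([frames.mean(axis=1), frames.std(axis=1)])

# synthetic audio and random labels stand in for the emotional-speech corpus
rng = np.random.default_rng(0)
X = np.array([utterance_features(rng.normal(size=16000).astype(np.float32))
              for _ in range(20)])
y = rng.integers(0, 6, size=20)                          # six emotion classes
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0)).fit(X, y)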
Vocal tract and glottal function during and after vocal exercising with resonance tube and straw.
Guzman, Marco; Laukkanen, Anne-Maria; Krupa, Petr; Horáček, Jaromir; Švec, Jan G; Geneid, Ahmed
2013-07-01
The present study aimed to investigate the vocal tract and glottal function during and after phonation into a tube and a stirring straw. A male classically trained singer was assessed. Computerized tomography (CT) was performed while the subject produced [a:] at a comfortable speaking pitch, while he phonated into the resonance tube, and while he repeated [a:] after the exercise. A similar procedure was performed with a narrow straw after 15 minutes of silence. Anatomic distances and area measures were obtained from CT midsagittal and transversal images. Acoustic, perceptual, electroglottographic (EGG), and subglottic pressure measures were also obtained. During and after phonation into the tube or straw, the velum closed the nasal passage better, the larynx position lowered, and the hypopharynx area widened. Moreover, the ratio between the inlet of the lower pharynx and the outlet of the epilaryngeal tube became larger during and after tube/straw phonation. Acoustic results revealed a stronger spectral prominence in the singer/speaker's formant cluster region after exercising. A listening test demonstrated better voice quality after straw/tube phonation than before. Contact quotient derived from the EGG decreased during both tube and straw phonation and remained lower after exercising. Subglottic pressure increased during straw phonation and remained somewhat higher after it. CT and acoustic results indicated that vocal exercises with increased vocal tract impedance lead to increased vocal efficiency and economy. One of the major changes was the more prominent singer's/speaker's formant cluster. Vocal tract and glottal modifications were more prominent during and after straw exercising compared with tube phonation. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Neurophysiology of spectrotemporal cue organization of spoken language in auditory memory.
Moberly, Aaron C; Bhat, Jyoti; Welling, D Bradley; Shahin, Antoine J
2014-03-01
Listeners assign different weights to spectral dynamics, such as formant rise time (FRT), and temporal dynamics, such as amplitude rise time (ART), during phonetic judgments. We examined the neurophysiological basis of FRT and ART weighting in the /ba/-/wa/ contrast. Electroencephalography was recorded for thirteen adult English speakers during a mismatch negativity (MMN) design using synthetic stimuli: a /ba/ with /ba/-like FRT and ART; a /wa/ with /wa/-like FRT and ART; and a /ba/(wa) with /ba/-like FRT and /wa/-like ART. We hypothesized that because of stronger reliance on FRT, subjects would encode a stronger memory trace and exhibit larger MMN during the FRT than the ART contrast. Results supported this hypothesis. The effect was most robust in the later portion of MMN. Findings suggest that MMN is generated by multiple sources, differentially reflecting acoustic change detection (earlier MMN, bottom-up process) and perceptual weighting of ART and FRT (later MMN, top-down process). Copyright © 2014 Elsevier Inc. All rights reserved.
Precision of working memory for speech sounds.
Joseph, Sabine; Iverson, Paul; Manohar, Sanjay; Fox, Zoe; Scott, Sophie K; Husain, Masud
2015-01-01
Memory for speech sounds is a key component of models of verbal working memory (WM). But how good is verbal WM? Most investigations assess this using binary report measures to derive a fixed number of items that can be stored. However, recent findings in visual WM have challenged such "quantized" views by employing measures of recall precision with an analogue response scale. WM for speech sounds might rely on both continuous and categorical storage mechanisms. Using a novel speech matching paradigm, we measured WM recall precision for phonemes. Vowel qualities were sampled from a formant space continuum. A probe vowel had to be adjusted to match the vowel quality of a target on a continuous, analogue response scale. Crucially, this provided an index of the variability of a memory representation around its true value and thus allowed us to estimate how memories were distorted from the original sounds. Memory load affected the quality of speech sound recall in two ways. First, there was a gradual decline in recall precision with increasing number of items, consistent with the view that WM representations of speech sounds become noisier with an increase in the number of items held in memory, just as for vision. Based on multidimensional scaling (MDS), the level of noise appeared to be reflected in distortions of the formant space. Second, as memory load increased, there was evidence of greater clustering of participants' responses around particular vowels. A mixture model captured both continuous and categorical responses, demonstrating a shift from continuous to categorical memory with increasing WM load. This suggests that direct acoustic storage can be used for single items, but when more items must be stored, categorical representations must be used.
Zielińska-Bliźniewska, Hanna; Sułkowski, Wiesław J; Pietkiewicz, Piotr; Miłoński, Jarosław; Mazurek, Agnieszka; Olszewski, Jurek
2012-06-01
The aim of this study was to compare the parameters of vocal acoustic and vocal efficiency analyses in medical students and academic teachers with the use of the IRIS and DiagnoScope Specialist software and to evaluate their usefulness in the prevention and certification of occupational disease. The study group comprised 40 women, including students and employees of the Military Medical Faculty, Medical University of Łodź. After informed consent had been obtained from the participating women, the primary medical history was taken, videolaryngoscopic and stroboscopic examinations were performed, and diagnostic vocal acoustic analysis was carried out with the use of the IRIS and DiagnoScope Specialist software. Based on the results of the performed measurements, the statistical analysis evidenced the compatibility between the two software programs, IRIS and DiagnoScope Specialist, with the only exception of the F4 formant. The mean values of vocal acoustic parameters in medical students and academic teachers, obtained by means of the IRIS software, can be used as standards for the female population, which have not yet been developed by the producer. When using the DiagnoScope Specialist software, some mean values were higher and some lower than the standards specified by the producer. The study evidenced the compatibility between the two measurement software programs, IRIS and DiagnoScope Specialist, except for the F4 formant. It should be noted that the latter has an advantage over the former, since standard values of vocal acoustic parameters have been worked out by the producer. Moreover, they only slightly departed from the values obtained in our study and may be useful in the diagnostics of occupational voice disorders.
NASA Astrophysics Data System (ADS)
Mayo, Catherine; Turk, Alice
2004-06-01
It has been proposed that young children may have a perceptual preference for transitional cues [Nittrouer, S. (2002). J. Acoust. Soc. Am. 112, 711-719]. According to this proposal, this preference can manifest itself either as heavier weighting of transitional cues by children than by adults, or as heavier weighting of transitional cues than of other, more static, cues by children. This study tested this hypothesis by examining adults' and children's cue weighting for the contrasts /saɪ/-/ʃaɪ/, /de/-/be/, /ta/-/da/, and /ti/-/di/. Children were found to weight transitions more heavily than did adults for the fricative contrast /saɪ/-/ʃaɪ/, and were found to weight transitional cues more heavily than nontransitional cues for the voice-onset-time contrast /ta/-/da/. However, these two patterns of cue weighting were not found to hold for the contrasts /de/-/be/ and /ti/-/di/. Consistent with several studies in the literature, results suggest that children do not always show a bias towards vowel-formant transitions, but that cue weighting can differ according to segmental context, and possibly the physical distinctiveness of available acoustic cues.
Evaluating acoustic speaker normalization algorithms: evidence from longitudinal child data.
Kohn, Mary Elizabeth; Farrington, Charlie
2012-03-01
Speaker vowel formant normalization, a technique that controls for variation introduced by physical differences between speakers, is necessary in variationist studies to compare speakers of different ages, genders, and physiological makeup in order to understand non-physiological variation patterns within populations. Many algorithms have been established to reduce variation introduced into vocalic data from physiological sources. The lack of real-time studies tracking the effectiveness of these normalization algorithms from childhood through adolescence inhibits exploration of child participation in vowel shifts. This analysis compares normalization techniques applied to data collected from ten African American children across five time points. Linear regressions compare the reduction in variation attributable to age and gender for each speaker for the vowels BEET, BAT, BOT, BUT, and BOAR. A normalization technique is successful if it maintains variation attributable to a reference sociolinguistic variable, while reducing variation attributable to age. Results indicate that normalization techniques which rely on both a measure of central tendency and range of the vowel space perform best at reducing variation attributable to age, although some variation attributable to age persists after normalization for some sections of the vowel space. © 2012 Acoustical Society of America
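One member of the family this abstract favors, techniques that use both a speaker's central tendency and the spread of that speaker's vowel space, is Lobanov z-score normalization, which rescales each speaker's formants by that speaker's own mean and standard deviation. A minimal sketch, with hypothetical tokens:

import numpy as np

def lobanov(formants):
    """formants: (n_tokens, 2) array of (F1, F2) in Hz for one speaker at one time point."""
    f = np.asarray(formants, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0, ddof=1)

tokens_t1 = [(730, 2100), (640, 2300), (810, 1400), (700, 1500)]   # hypothetical child data
print(lobanov(tokens_t1))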
2015-01-01
Several competing aetiologies of developmental dyslexia suggest that the problems with acquiring literacy skills are causally entailed by low-level auditory and/or speech perception processes. The purpose of this study is to evaluate the diverging claims about the specific deficient perceptual processes under conditions of strong inference. Theoretically relevant acoustic features were extracted from a set of artificial speech stimuli that lie on a /bAk/-/dAk/ continuum. The features were tested on their ability to enable a simple classifier (Quadratic Discriminant Analysis) to reproduce the observed classification performance of average and dyslexic readers in a speech perception experiment. The ‘classical’ features examined were based on component process accounts of developmental dyslexia such as the supposed deficit in Envelope Rise Time detection and the deficit in the detection of rapid changes in the distribution of energy in the frequency spectrum (formant transitions). Studies examining these temporal processing deficit hypotheses do not employ measures that quantify the temporal dynamics of stimuli. It is shown that measures based on quantification of the dynamics of complex, interaction-dominant systems (Recurrence Quantification Analysis and the multifractal spectrum) enable QDA to classify the stimuli almost identically to the classifications observed in dyslexic and average-reading participants. It seems unlikely that participants used any of the features that are traditionally associated with accounts of (impaired) speech perception. The nature of the variables quantifying the temporal dynamics of the speech stimuli implies that the classification of speech stimuli cannot be regarded as a linear aggregate of component processes that each parse the acoustic signal independently of one another, as is assumed by the ‘classical’ aetiologies of developmental dyslexia. It is suggested that the results imply that the differences in speech perception performance between average and dyslexic readers represent a scaled continuum rather than being caused by a specific deficient component. PMID:25834769
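A schematic sketch of the strong-inference test described here: fit a Quadratic Discriminant Analysis classifier on per-stimulus feature vectors and measure how closely its /bAk/-/dAk/ labels reproduce listeners' observed classifications. The feature values and listener responses below are random stand-ins, not the study's measures.

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(1)
features = rng.normal(size=(40, 3))     # 40 continuum stimuli x 3 candidate measures
responses = (features[:, 0] + 0.3 * rng.normal(size=40) > 0).astype(int)  # stand-in labels

qda = QuadraticDiscriminantAnalysis().fit(features, responses)
agreement = (qda.predict(features) == responses).mean()
print(f"QDA reproduces {agreement:.0%} of the observed classifications")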
Cepstral domain modification of audio signals for data embedding: preliminary results
NASA Astrophysics Data System (ADS)
Gopalan, Kaliappan
2004-06-01
A method of embedding data in an audio signal using cepstral domain modification is described. Based on successful embedding in the spectral points of perceptually masked regions in each frame of speech, the technique was first extended to embedding in the log spectral domain. This extension resulted in approximately 62 bits/s of embedding with less than 2 percent bit error rate (BER) for clean cover speech (from the TIMIT database), and about 2.5 percent for noisy speech (from an air traffic controller database), when all frames - including silence and transitions between voiced and unvoiced segments - were used. The bit error rate increased significantly when the log spectrum in the vicinity of a formant was modified. In the next procedure, embedding by altering the mean cepstral values of two ranges of indices was studied. Tests on both a noisy utterance and a clean utterance indicated barely noticeable perceptual change in speech quality when the lower range of cepstral indices - corresponding to the vocal tract region - was modified in accordance with the data. With an embedding capacity of approximately 62 bits/s - using one bit per frame regardless of frame energy or type of speech - initial results showed a BER of less than 1.5 percent for a payload of 208 embedded bits using the clean cover speech. A BER of less than 1.3 percent resulted for the noisy host with a capacity of 316 bits. When the cepstrum was modified in the region of excitation, the BER increased to over 10 percent. With quantization causing no significant problem, the technique warrants further studies with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in regions of speech that are perceptually masked - analogous to embedding in frequency-masked regions - may yield imperceptible stego audio with low BER.
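The band-mean modification can be sketched schematically as follows: nudge the mean of a low-quefrency cepstral band (the vocal-tract region) up or down to encode one bit per frame. The frame size, index range, and step size here are assumptions for illustration, not the paper's exact parameters, and the phase spectrum is simply carried over unchanged.

import numpy as np

def embed_bit(frame, bit, band=slice(10, 30), delta=0.1):
    """Return a copy of `frame` whose low-quefrency cepstral mean encodes `bit`."""
    spec = np.fft.rfft(frame)
    mag, phase = np.abs(spec), np.angle(spec)
    ceps = np.fft.irfft(np.log(mag + 1e-12))             # real cepstrum
    ceps[band] += (delta if bit else -delta) - ceps[band].mean()
    new_mag = np.exp(np.fft.rfft(ceps).real)             # back to the magnitude spectrum
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(frame))

def extract_bit(frame, band=slice(10, 30)):
    ceps = np.fft.irfft(np.log(np.abs(np.fft.rfft(frame)) + 1e-12))
    return int(ceps[band].mean() > 0)

rng = np.random.default_rng(0)
frame = rng.normal(size=512)                             # stand-in for a speech frame
print(extract_bit(embed_bit(frame, 1)), extract_bit(embed_bit(frame, 0)))   # expect: 1 0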
An interdisciplinary study of the timbre of the classical guitar
NASA Astrophysics Data System (ADS)
Traube, Caroline
This dissertation proposes an interdisciplinary approach for the study of the timbre of the classical guitar. We start by identifying the static control parameters of timbre, relating to the structural components of the guitar, and the dynamic control parameters of timbre, relating to the gestures applied by the performer on the instrument. From the plucked-string physical model (obtained from the transverse wave equation), we derive a digital signal interpretation of the plucking effect, which is a comb filtering. Then we investigate how subjective characteristics of sound, like timbre, are related to gesture parameters. The starting point for exploration is an inventory of verbal descriptors commonly used by professional musicians to describe the brightness, the colour, the shape and the texture of the sounds they produce on their instruments. An explanation for the voice-like nature of guitar tones is proposed based on the observation that the maxima of the comb-filter-shaped magnitude spectrum of guitar tones are located at frequencies similar to the formant frequencies of a subset of identifiable vowels. These analogies at the spectral level might account for the origin of some timbre descriptors such as open, oval, round, thin, closed, nasal and hollow, that seem to refer to phonetic gestures. In an experiment conducted to confirm these analogies, participants were asked to associate a consonant with the attack and a vowel with the decay of guitar tones. The results of this study support the idea that some perceptual dimensions of the guitar timbre space can be borrowed from phonetics. Finally, we address the problem of the indirect acquisition of instrumental gesture parameters. Pursuing previous research on the estimation of the plucking position from a recording, we propose a new estimation method based on an iterative weighted least-squares algorithm, starting from a first approximation derived from a variant of the autocorrelation function of the signal.
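The comb-filtering interpretation mentioned above has a compact closed form for the ideal string: plucking at relative position R weights the nth harmonic by |sin(n*pi*R)| on top of the ideal string's 1/n^2 roll-off, cancelling harmonics at multiples of 1/R. A small sketch under those textbook assumptions:

import numpy as np

def plucked_spectrum(R, n_harmonics=20):
    """Harmonic amplitudes for an ideal string plucked at relative position R."""
    n = np.arange(1, n_harmonics + 1)
    return n, np.abs(np.sin(np.pi * n * R)) / n**2   # comb pattern x 1/n^2 roll-off

for R in (0.5, 0.2):                                 # mid-string vs nearer the bridge
    n, amp = plucked_spectrum(R)
    print(f"R={R}: cancelled harmonics {n[amp < 1e-9].tolist()}")

Moving the pluck toward the bridge (smaller R) pushes the first spectral notch higher, which is one way the performer's gesture reshapes the formant-like spectral envelope.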
Pillot, C; Vaissière, J
2006-01-01
What is vocal effectiveness in lyrical singing in comparison to speech? Our study tries to answer this question using vocal efficiency and spectral vocal effectiveness. Vocal efficiency was measured for a trained and an untrained subject. According to these invasive measures, it appears that the trained singer uses her larynx less efficiently. Efficiency of the larynx in terms of energy then appears to be secondary to the desired voice quality. The acoustic measures of spectral vocal effectiveness of vowels and sentences, spoken and sung by 23 singers, reveal two complementary markers: the "singing power ratio" and the difference in amplitude between the singing formant and the spectral minimum that follows it. Magnetic resonance imaging and simulations of [a], [i] and [o] spoken and sung show laryngeal lowering and the role of the piriform sinuses as the physiological foundations of spectral vocal effectiveness, perceptually related to carrying power. These scientific aspects allow applications in voice therapy, such as physiological and perceptual foundations for helping patients recover voice carrying power with or without background noise.
Kokkinakis, Kostas; Loizou, Philipos C
2011-09-01
The purpose of this study is to determine the relative impact of reverberant self-masking and overlap-masking effects on speech intelligibility by cochlear implant listeners. Sentences were presented in one condition wherein reverberant consonant segments were replaced with clean consonants, and in another condition wherein reverberant vowel segments were replaced with clean vowels. The underlying assumption is that self-masking effects would dominate in the first condition, whereas overlap-masking effects would dominate in the second condition. Results indicated that the degradation of speech intelligibility in reverberant conditions is caused primarily by self-masking effects that give rise to flattened formant transitions. © 2011 Acoustical Society of America
Ménard, Lucie; Polak, Marek; Denny, Margaret; Burton, Ellen; Lane, Harlan; Matthies, Melanie L; Marrone, Nicole; Perkell, Joseph S; Tiede, Mark; Vick, Jennell
2007-06-01
This study investigates the effects of speaking condition and auditory feedback on vowel production by postlingually deafened adults. Thirteen cochlear implant users produced repetitions of nine American English vowels prior to implantation, and at one month and one year after implantation. There were three speaking conditions (clear, normal, and fast), and two feedback conditions after implantation (implant processor turned on and off). Ten normal-hearing controls were also recorded once. Vowel contrasts in the formant space (expressed in mels) were larger in the clear than in the fast condition, both for controls and for implant users at all three time samples. Implant users also produced differences in duration between clear and fast conditions that were in the range of those obtained from the controls. In agreement with prior work, the implant users had contrast values lower than did the controls. The implant users' contrasts were larger with hearing on than off and improved from one month to one year postimplant. Because the controls and implant users responded similarly to a change in speaking condition, it is inferred that auditory feedback, although demonstrably important for maintaining normative values of vowel contrasts, is not needed to maintain the distinctiveness of those contrasts in different speaking conditions.
Integration of auditory and somatosensory error signals in the neural control of speech movements.
Feng, Yongqiang; Gracco, Vincent L; Max, Ludo
2011-08-01
We investigated auditory and somatosensory feedback contributions to the neural control of speech. In task I, sensorimotor adaptation was studied by perturbing one of these sensory modalities or both modalities simultaneously. The first formant (F1) frequency in the auditory feedback was shifted up by a real-time processor and/or the extent of jaw opening was increased or decreased with a force field applied by a robotic device. All eight subjects lowered F1 to compensate for the up-shifted F1 in the feedback signal regardless of whether or not the jaw was perturbed. Adaptive changes in subjects' acoustic output resulted from adjustments in articulatory movements of the jaw or tongue. Adaptation in jaw opening extent in response to the mechanical perturbation occurred only when no auditory feedback perturbation was applied or when the direction of adaptation to the force was compatible with the direction of adaptation to a simultaneous acoustic perturbation. In tasks II and III, subjects' auditory and somatosensory precision and accuracy were estimated. Correlation analyses showed that the relationships 1) between F1 adaptation extent and auditory acuity for F1 and 2) between jaw position adaptation extent and somatosensory acuity for jaw position were weak and statistically not significant. Taken together, the combined findings from this work suggest that, in speech production, sensorimotor adaptation updates the underlying control mechanisms in such a way that the planning of vowel-related articulatory movements takes into account a complex integration of error signals from previous trials but likely with a dominant role for the auditory modality.
Using the structure of natural scenes and sounds to predict neural response properties in the brain
NASA Astrophysics Data System (ADS)
Deweese, Michael
2014-03-01
The natural scenes and sounds we encounter in the world are highly structured. The fact that animals and humans are so efficient at processing these sensory signals compared with the latest algorithms running on the fastest modern computers suggests that our brains can exploit this structure. We have developed a sparse mathematical representation of speech that minimizes the number of active model neurons needed to represent typical speech sounds. The model learns several well-known acoustic features of speech such as harmonic stacks, formants, onsets and terminations, but we also find more exotic structures in the spectrogram representation of sound such as localized checkerboard patterns and frequency-modulated excitatory subregions flanked by suppressive sidebands. Moreover, several of these novel features resemble neuronal receptive fields reported in the Inferior Colliculus (IC), as well as auditory thalamus (MGBv) and primary auditory cortex (A1), and our model neurons exhibit the same tradeoff in spectrotemporal resolution as has been observed in IC. To our knowledge, this is the first demonstration that receptive fields of neurons in the ascending mammalian auditory pathway beyond the auditory nerve can be predicted based on coding principles and the statistical properties of recorded sounds. We have also developed a biologically-inspired neural network model of primary visual cortex (V1) that can learn a sparse representation of natural scenes using spiking neurons and strictly local plasticity rules. The representation learned by our model is in good agreement with measured receptive fields in V1, demonstrating that sparse sensory coding can be achieved in a realistic biological setting.
Speaker normalization and adaptation using second-order connectionist networks.
Watrous, R L
1993-01-01
A method for speaker normalization and adaptation using connectionist networks is developed. A speaker-specific linear transformation of observations of the speech signal is computed using second-order network units. Classification is accomplished by a multilayer feedforward network that operates on the normalized speech data. The network is adapted for a new talker by modifying the transformation parameters while leaving the classifier fixed. This is accomplished by backpropagating classification error through the classifier to the second-order transformation units. This method was evaluated for the classification of ten vowels for 76 speakers using the first two formant values of the Peterson-Barney data. The results suggest that rapid speaker adaptation resulting in high classification accuracy can be accomplished by this method.
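A compact sketch of this adaptation scheme in PyTorch (a plain linear input transform stands in for the paper's second-order units): classification error is backpropagated through a frozen classifier into a per-speaker transform initialized at the identity. The network sizes and random data are stand-ins for the two-formant Peterson-Barney setup.

import torch
import torch.nn as nn

classifier = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 10))
for p in classifier.parameters():
    p.requires_grad_(False)              # the vowel classifier stays fixed

transform = nn.Linear(2, 2)              # speaker-specific normalizing transform
with torch.no_grad():                    # start from the identity mapping
    transform.weight.copy_(torch.eye(2))
    transform.bias.zero_()

# random stand-ins for one new talker's (F1, F2) tokens and vowel labels
formants = torch.randn(64, 2)
labels = torch.randint(0, 10, (64,))

opt = torch.optim.SGD(transform.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):                     # adapt only the transform
    opt.zero_grad()
    loss = loss_fn(classifier(transform(formants)), labels)
    loss.backward()
    opt.step()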
Ananthakrishnan, Saradha; Krishnan, Ananthanarayan; Bartlett, Edward
2015-01-01
Objective: Listeners with sensorineural hearing loss (SNHL) typically experience reduced speech perception, which is not completely restored with amplification. This likely occurs because cochlear damage, in addition to elevating audiometric thresholds, alters the neural representation of speech transmitted to higher centers along the auditory neuroaxis. While the deleterious effects of SNHL on speech perception in humans have been well-documented using behavioral paradigms, our understanding of the neural correlates underlying these perceptual deficits remains limited. Using the scalp-recorded Frequency Following Response (FFR), the authors examine the effects of SNHL and aging on subcortical neural representation of acoustic features important for pitch and speech perception, namely the periodicity envelope (F0) and temporal fine structure (TFS) (formant structure), as reflected in the phase-locked neural activity generating the FFR. Design: FFRs were obtained from 10 listeners with normal hearing (NH) and 9 listeners with mild-moderate SNHL in response to a steady-state English back vowel /u/ presented at multiple intensity levels. Use of multiple presentation levels facilitated comparisons at equal sound pressure level (SPL) and equal sensation level (SL). In a second follow-up experiment to address the effect of age on envelope and TFS representation, FFRs were obtained from 25 NH and 19 listeners with mild to moderately-severe SNHL to the same vowel stimulus presented at 80 dB SPL. Temporal waveforms, Fast Fourier Transform (FFT) and spectrograms were used to evaluate the magnitude of the phase-locked activity at F0 (periodicity envelope) and F1 (TFS). Results: Neural representation of both envelope (F0) and TFS (F1) at equal SPLs was stronger in NH listeners compared to listeners with SNHL. Also, comparison of neural representation of F0 and F1 across stimulus levels expressed in SPL and SL (accounting for audibility) revealed that level-related changes in F0 and F1 magnitude were different for listeners with SNHL compared to listeners with normal hearing. Further, the degradation in subcortical neural representation was observed to persist in listeners with SNHL even when the effects of age were controlled for. Conclusions: Overall, our results suggest a relatively greater degradation in the neural representation of TFS compared to periodicity envelope in individuals with SNHL. This degraded neural representation of TFS in SNHL, as reflected in the brainstem FFR, may reflect a disruption in the temporal pattern of phase-locked neural activity arising from altered tonotopic maps and/or wider filters causing poor frequency selectivity in these listeners. Lastly, while preliminary results indicate that the deleterious effects of SNHL may be greater than age-related degradation in subcortical neural representation, the lack of a balanced age-matched control group in this study does not permit us to completely rule out the effects of age on subcortical neural representation. PMID:26583482
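A minimal sketch of the magnitude measures described above: the peak FFT amplitude of the response in narrow bands around F0 (periodicity envelope) and F1 (temporal fine structure). The synthetic "response" and band widths are illustrative assumptions only.

import numpy as np

def band_magnitude(x, sr, f_center, half_bw=20.0):
    """Peak spectral amplitude in a narrow band around f_center (Hz)."""
    spec = np.abs(np.fft.rfft(x)) / len(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    band = (freqs >= f_center - half_bw) & (freqs <= f_center + half_bw)
    return spec[band].max()

sr, dur, f0, f1 = 8000, 0.25, 100.0, 350.0               # toy values for a vowel /u/
t = np.arange(int(sr * dur)) / sr
ffr = np.sin(2 * np.pi * f0 * t) + 0.4 * np.sin(2 * np.pi * f1 * t)   # toy FFR
print(band_magnitude(ffr, sr, f0), band_magnitude(ffr, sr, f1))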
Some articulatory details of emotional speech
NASA Astrophysics Data System (ADS)
Lee, Sungbok; Yildirim, Serdar; Bulut, Murtaza; Kazemzadeh, Abe; Narayanan, Shrikanth
2005-09-01
Differences in speech articulation among four emotion types, neutral, anger, sadness, and happiness, are investigated by analyzing tongue tip, jaw, and lip movement data collected from one male and one female speaker of American English. The data were collected using an electromagnetic articulography (EMA) system while subjects produced simulated emotional speech. Pitch, root-mean-square (rms) energy and the first three formants were estimated for vowel segments. For both speakers, angry speech exhibited the largest rms energy and the largest articulatory activity in terms of displacement range and movement speed. Happy speech is characterized by the largest pitch variability. It has higher rms energy than neutral speech, but its articulatory activity is comparable to, or less than, that of neutral speech. That is, happy speech is more prominent in voicing activity than in articulation. Sad speech exhibits the longest sentence duration and lower rms energy. However, its articulatory activity is no less than that of neutral speech. Interestingly, for the male speaker, articulation for vowels in sad speech is consistently more peripheral (i.e., more forwarded displacements) when compared to other emotions. However, this does not hold for the female subject. These and other results will be discussed in detail with associated acoustics and perceived emotional qualities. [Work supported by NIH.]
Long-Term Memory for Affiliates in Ravens
Boeckle, Markus; Bugnyar, Thomas
2012-01-01
Complex social life requires individuals to recognize and remember group members [1] and, within those, to distinguish affiliates from nonaffiliates. Whereas long-term individual recognition has been demonstrated in some nonhuman animals [2–5], memory for the relationship valence to former group members has received little attention. Here we show that adult, pair-housed ravens not only respond differently to the playback of calls from previous group members and unfamiliar conspecifics but also discriminate between familiar birds according to the relationship valence they had to those subjects up to three years ago as subadult nonbreeders. The birds' distinction between familiar and unfamiliar individuals is reflected mainly in the number of calls, whereas their differentiation according to relationship valence is reflected in call modulation only. As compared to their response to affiliates, ravens responded to nonaffiliates by increasing chaotic parts of the vocalization and lowering formant spacing, potentially exaggerating the perceived impression of body size. Our findings indicate that ravens remember relationship qualities to former group members even after long periods of separation, confirming that their sophisticated social knowledge as nonbreeders is maintained into the territorial breeding stage. PMID:22521788
Production-perception relationships during speech development
NASA Astrophysics Data System (ADS)
Menard, Lucie; Schwartz, Jean-Luc; Boe, Louis-Jean; Aubin, Jerome
2005-04-01
It has been shown that nonuniform growth of the supraglottal cavities, motor control development, and perceptual refinement shape the vowel systems during speech development. In this talk, we propose to investigate the role of perceptual constraints as a guide to the speaker's task from birth to adulthood. Simulations with an articulatory-to-acoustic model, acoustic analyses of natural vowels, and results of perceptual tests provide evidence that the production-perception relationships evolve with age. At the perceptual level, results show that (i) linear combinations of spectral peaks are good predictors of vowel targets, and (ii) focalization, defined as an acoustic pattern with close neighboring formants [J.-L. Schwartz, L.-J. Boe, N. Vallee, and C. Abry, J. Phonetics 25, 255-286 (1997)], is part of the speech task. At the production level, we propose that (i) frequently produced vowels in the baby's early sound inventory can in part be explained by perceptual templates, and (ii) the achievement of these perceptual templates may require adaptive articulatory strategies for the child, compared with adults, to cope with morphological differences. Results are discussed in the light of a perception-for-action-control theory. [Work supported by the Social Sciences and Humanities Research Council of Canada.]
Effects of gender on the production of emphasis in Jordanian Arabic: A sociophonetic study
NASA Astrophysics Data System (ADS)
Abudalbuh, Mujdey D.
Emphasis, or pharyngealization, is a distinctive phonetic phenomenon and a phonemic feature of Semitic languages such as Arabic and Hebrew. The goal of this study is to investigate the effect of gender on the production of emphasis in Jordanian Arabic as manifested on the consonants themselves as well as on the adjacent vowels. To this end, 22 speakers of Jordanian Arabic, 12 males and 10 females, participated in a production experiment where they produced monosyllabic minimal CVC pairs contrasted on the basis of the presence of a word-initial plain or emphatic consonant. Several acoustic parameters were measured including Voice Onset Time (VOT), friction duration, the spectral mean of the friction noise, vowel duration and the formant frequencies (F1-F3) of the target vowels. The results of this study indicated that VOT is a reliable acoustic correlate of emphasis in Jordanian Arabic only for voiceless stops whose emphatic VOT was significantly shorter than their plain VOT. Also, emphatic fricatives were shorter than plain fricatives. Emphatic vowels were found to be longer than plain vowels. Overall, the results showed that emphatic vowels were characterized by a raised F1 at the onset and midpoint of the vowel, lowered F2 throughout the vowel, and raised F3 at the onset and offset of the vowel relative to the corresponding values of the plain vowels. Finally, results using Nearey's (1978) normalization algorithm indicated that emphasis was more acoustically evident in the speech of males than in the speech of females in terms of the F-pattern. The results are discussed from a sociolinguistic perspective in light of the previous literature and the notion of linguistic feminism.
Covington, Michael A; Lunden, S L Anya; Cristofaro, Sarah L; Wan, Claire Ramsay; Bailey, C Thomas; Broussard, Beth; Fogarty, Robert; Johnson, Stephanie; Zhang, Shayi; Compton, Michael T
2012-12-01
Aprosody, or flattened speech intonation, is a recognized negative symptom of schizophrenia, though it has rarely been studied from a linguistic/phonological perspective. To bring the latest advances in computational linguistics to the phenomenology of schizophrenia and related psychotic disorders, a clinical first-episode psychosis research team joined with a phonetics/computational linguistics team to conduct a preliminary, proof-of-concept study. Video recordings from a semi-structured clinical research interview were available from 47 first-episode psychosis patients. Audio tracks of the video recordings were extracted, and after review of quality, 25 recordings were available for phonetic analysis. These files were de-noised and a trained phonologist extracted a 1-minute sample of each patient's speech. WaveSurfer 1.8.5 was used to create, from each speech sample, a file of formant values (F0, F1, F2, where F0 is the fundamental frequency and F1 and F2 are resonance bands indicating the moment-by-moment shape of the oral cavity). Variability in these phonetic indices was correlated with severity of Positive and Negative Syndrome Scale negative symptom scores using Pearson correlations. A measure of variability of tongue front-to-back position-the standard deviation of F2-was statistically significantly correlated with the severity of negative symptoms (r=-0.446, p=0.03). This study demonstrates a statistically significant and meaningful correlation between negative symptom severity and phonetically measured reductions in tongue movements during speech in a sample of first-episode patients just initiating treatment. Further studies of negative symptoms, applying computational linguistics methods, are warranted. Copyright © 2012 Elsevier B.V. All rights reserved.
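A small sketch of the statistical step reported above: summarize each patient's F2 track by its standard deviation, then correlate that variability with negative symptom severity using a Pearson correlation. The arrays below are random stand-ins, not study data.

import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_patients = 25
f2_sd = np.array([rng.normal(1500, 300, size=600).std(ddof=1)   # SD of one patient's
                  for _ in range(n_patients)])                  # frame-by-frame F2 track
panss_negative = rng.uniform(8, 35, size=n_patients)            # stand-in symptom scores

r, p = pearsonr(f2_sd, panss_negative)
print(f"r = {r:.3f}, p = {p:.3f}")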
Unconstrained snoring detection using a smartphone during ordinary sleep.
Shin, Hangsik; Cho, Jaegeol
2014-08-15
Snoring can be a representative symptom of a sleep disorder, and thus snoring detection is quite important to improving the quality of an individual's daily life. The purpose of this research is to develop an unconstrained snoring detection technique that can be integrated into a smartphone application. In contrast with previous studies, we developed a practical technique for snoring detection during ordinary sleep by using the built-in sound recording system of a smartphone, and the recording was carried out in a standard private bedroom. The experimental protocol was designed to include a variety of actions that frequently produce noise (including coughing, playing music, talking, ringing an alarm, opening/closing doors, running a fan, playing the radio, and walking) in order to accurately recreate the actual circumstances during sleep. The sound data were recorded from 10 individuals during actual sleep. In total, 44 snoring datasets and 75 noise datasets were acquired. The algorithm uses formant analysis to examine sound features according to frequency and magnitude. Then, a quadratic classifier is used to distinguish snoring from non-snoring noises. Ten-fold cross-validation was used to evaluate the developed snoring detection methods, and validation was repeated 100 times with random partitions to improve statistical reliability. The overall results showed that the proposed method is competitive with those from previous research. The proposed method presented 95.07% accuracy, 98.58% sensitivity, 94.62% specificity, and 70.38% positive predictivity. Though there was a relatively high false positive rate, the results show the possibility of ubiquitous personal snoring detection through a smartphone application that takes into account normally occurring noises, without training on preexisting data.
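The classification-and-validation loop can be sketched as follows; the random placeholder features stand in for the formant-derived frequency and magnitude features (the paper's exact feature set is not reproduced here):

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)

# Hypothetical feature matrix: one row per sound event, columns standing in
# for formant-analysis features (e.g., dominant frequency and magnitude).
X = rng.normal(size=(119, 2))            # 44 snoring + 75 noise events
y = np.r_[np.ones(44), np.zeros(75)]     # 1 = snore, 0 = other noise

accs = []
for seed in range(100):                  # repeat 10-fold CV 100 times
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    accs.extend(cross_val_score(QuadraticDiscriminantAnalysis(), X, y, cv=cv))
print(f"mean accuracy = {np.mean(accs):.3f}")
```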
McGlashan, Julian; Thuesen, Mathias Aaen; Sadolin, Cathrine
2017-05-01
We aimed to study the categorizations "Overdrive" and "Edge" from the pedagogical method Complete Vocal Technique as refiners of the often ill-defined concept of "belting" by means of audio perception, laryngostroboscopic imaging, acoustics, long-term average spectrum (LTAS), and electroglottography (EGG). This is a case-control study. Twenty singers were recorded singing sustained vowels in a "belting" quality, refined by audio perception as "Overdrive" or "Edge." Two studies were performed: (1) a laryngostroboscopic examination using a videonasoendoscopic camera system (Olympus) and the Laryngostrobe program (Laryngograph); (2) a simultaneous recording of the EGG and acoustic signals using Speech Studio (Laryngograph). The images were analyzed based on consensus agreement. Statistical analysis of the acoustic, LTAS, and EGG parameters was undertaken using the Student paired t test. The two modes of singing determined by audio perception have visibly different laryngeal gestures: Edge has a more constricted setting than Overdrive, with the ventricular folds appearing to cover more of the vocal folds, the aryepiglottic folds showing a sharper edge, and the cuneiform cartilages rolled in anteromedially. LTAS analysis shows a statistical difference, particularly after the ninth harmonic, with a coinciding first formant. The combined group showed statistical differences in shimmer, harmonics-to-noise ratio, normalized noise energy, and mean sound pressure level (P ≤ 0.05). "Belting" sounds can be categorized using audio perception into two modes of singing: "Overdrive" and "Edge." This study demonstrates consistent, visibly different laryngeal gestures between these modes, with some correspondingly significant differences in LTAS, EGG, and acoustic measures. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
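A minimal sketch of an LTAS computation, assuming it can be approximated by a Welch-averaged power spectrum of the sustained vowel; the Speech Studio implementation and its windowing details may differ:

```python
import numpy as np
from scipy.signal import welch

def ltas_db(x, fs, nperseg=4096):
    """Long-term average spectrum: Welch-averaged power spectrum in dB."""
    f, pxx = welch(x, fs=fs, nperseg=nperseg)
    return f, 10.0 * np.log10(pxx + 1e-20)

# Toy stand-in for a sustained sung vowel (fundamental plus one overtone):
fs = 44_100
t = np.arange(2 * fs) / fs
vowel = np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
f, levels = ltas_db(vowel, fs)
# Band-wise Overdrive-vs-Edge comparisons across singers could then use a
# paired t-test, e.g. scipy.stats.ttest_rel on matched frequency bands.
```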
Gentilucci, Maurizio; Campione, Giovanna Cristina; Dalla Volta, Riccardo; Bernardis, Paolo
2009-12-01
Does the mirror system affect the control of speech? This issue was addressed in behavioral and Transcranial Magnetic Stimulation (TMS) experiments. In behavioral experiment 1, participants pronounced the syllable /da/ while observing (1) a hand grasping large and small objects with power and precision grasps, respectively, (2) a foot interacting with large and small objects and (3) differently sized objects presented alone. Voice formant 1 was higher when observing power as compared to precision grasp, whereas it remained unaffected by observation of the different types of foot interaction and objects alone. In TMS experiment 2, we stimulated hand motor cortex, while participants observed the two types of grasp. Motor Evoked Potentials (MEPs) of hand muscles active during the two types of grasp were greater when observing power than precision grasp. In experiments 3-5, TMS was applied to tongue motor cortex of participants silently pronouncing the syllable /da/ and simultaneously observing power and precision grasps, pantomimes of the two types of grasps, and differently sized objects presented alone. Tongue MEPs were greater when observing power than precision grasp either executed or pantomimed. Finally, in TMS experiment 6, the observation of foot interaction with large and small objects did not modulate tongue MEPs. We hypothesized that grasp observation activated motor commands to the mouth as well as to the hand that were congruent with the hand kinematics implemented in the observed type of grasp. The commands to the mouth selectively affected postures of phonation organs and consequently basic features of phonological units.
Maruthy, Santosh; Feng, Yongqiang; Max, Ludo
2018-03-01
A longstanding hypothesis about the sensorimotor mechanisms underlying stuttering suggests that stuttered speech dysfluencies result from a lack of coarticulation. Formant-based measures of either the stuttered or fluent speech of children and adults who stutter have generally failed to obtain compelling evidence in support of the hypothesis that these individuals differ in the timing or degree of coarticulation. Here, we used a sensitive acoustic technique, spectral coefficient analyses, that allowed us to compare stuttering and nonstuttering speakers with regard to vowel-dependent anticipatory influences as early as the onset burst of a preceding voiceless stop consonant. Eight adults who stutter and eight matched adults who do not stutter produced C₁VC₂ words, and the first four spectral coefficients were calculated for one analysis window centered on the burst of C₁ and two subsequent windows covering the beginning of the aspiration phase. Findings confirmed that the combined use of four spectral coefficients is an effective method for detecting the anticipatory influence of a vowel on the initial burst of a preceding voiceless stop consonant. However, the observed patterns of anticipatory coarticulation showed no statistically significant differences, or trends toward such differences, between the stuttering and nonstuttering groups. Combining the present results for fluent speech in one given phonetic context with prior findings from both stuttered and fluent speech in a variety of other contexts, we conclude that there is currently no support for the hypothesis that the fluent speech of individuals who stutter is characterized by limited coarticulation.
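The abstract does not spell the coefficients out; assuming they are the first four spectral moments (centroid, spread, skewness, kurtosis) commonly used for stop bursts, one window's worth of the computation could look like this:

```python
import numpy as np

def spectral_moments(frame, fs):
    """First four spectral moments of one analysis window:
    centroid (Hz), spread (Hz), skewness, excess kurtosis.
    Sketch assuming 'spectral coefficients' means spectral moments.
    """
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    f = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = spec / spec.sum()                      # spectrum as a distribution
    m1 = np.sum(f * p)                         # centroid
    m2 = np.sqrt(np.sum((f - m1) ** 2 * p))    # spread
    m3 = np.sum(((f - m1) / m2) ** 3 * p)      # skewness
    m4 = np.sum(((f - m1) / m2) ** 4 * p) - 3  # excess kurtosis
    return m1, m2, m3, m4

rng = np.random.default_rng(0)
burst = rng.normal(size=512)   # stand-in for a 512-sample burst window
print(spectral_moments(burst, fs=44_100))
```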
Jacewicz, Ewa; Fox, Robert Allen
2015-01-01
Purpose To investigate how linguistic knowledge interacts with indexical knowledge in older children's perception under demanding listening conditions created by extensive talker variability. Method Twenty-five 9- to 12-year-old children, 12 from North Carolina (NC) and 13 from Wisconsin (WI), identified 12 vowels in isolated hVd-words produced by 120 talkers representing the two dialects (NC and WI), both genders, and three age groups (generations) of residents from the same geographic locations as the listeners. Results Identification rates were higher for responses to talkers from the same dialect as the listeners and for female speech. Listeners were sensitive to systematic positional variations in vowels and their dynamic structure (formant movement) associated with generational differences in vowel pronunciation resulting from sound change in a speech community. The overall identification rate was 71.7%, which is 8.5% lower than for the adults responding to the same stimuli in Jacewicz and Fox (2012). Conclusions Typically developing older children are successful in dealing with both phonetic and indexical variation related to talker dialect, gender and generation. They are less consistent than the adults, most likely due to their less efficient encoding of acoustic-phonetic information in the speech of multiple talkers and relative inexperience with indexical variation. PMID:24686520
Dynamic spectral structure specifies vowels for children and adults
Nittrouer, Susan
2008-01-01
When it comes to making decisions regarding vowel quality, adults seem to weight dynamic syllable structure more strongly than static structure, although disagreement exists over the nature of the most relevant kind of dynamic structure: spectral change intrinsic to the vowel or structure arising from movements between consonant and vowel constrictions. Results have been even less clear regarding the signal components children use in making vowel judgments. In this experiment, listeners of four different ages (adults, and 3-, 5-, and 7-year-old children) were asked to label stimuli that sounded either like steady-state vowels or like CVC syllables which sometimes had middle sections masked by coughs. Four vowel contrasts were used, crossed for type (front/back or closed/open) and consonant context (strongly or only slightly constraining of vowel tongue position). All listeners recognized vowel quality with high levels of accuracy in all conditions, but children were disproportionately hampered by strong coarticulatory effects when only steady-state formants were available. Results clarified past studies, showing that dynamic structure is critical to vowel perception for listeners of all ages, but particularly for young children, and that it is the dynamic structure arising from vocal-tract movement between consonant and vowel constrictions that is most important. PMID:17902868
Multi-disciplinary clinical protocol for the diagnosis of bulbar amyotrophic lateral sclerosis.
Chiaramonte, Rita; Di Luciano, Carmela; Chiaramonte, Ignazio; Serra, Agostino; Bonfiglio, Marco
2018-04-23
The objective of this study was to examine the role of different specialists in the diagnosis of amyotrophic lateral sclerosis (ALS) and to understand changes in verbal expression and phonation, respiratory dynamics, and swallowing that occurred rapidly over a short period of time. 22 patients with bulbar ALS underwent voice assessment, ENT evaluation, the Multi-Dimensional Voice Program (MDVP), spectrography, electroglottography, and fiberoptic endoscopic evaluation of swallowing. In the early stage of the disease, the oral tract and velopharyngeal port were involved. Three months after the initial symptoms, most of the patients presented hoarseness, breathy voice, dysarthria, pitch modulation problems and difficulties in the pronunciation of plosive, velar and lingual consonants. MDVP values were altered. The spectrogram showed an additional formant due to nasal resonance. Electroglottography showed periodic oscillation of the vocal folds only during short vocal cycles. Swallowing was characterized by weakness and incoordination of the oro-pharyngeal muscles, with penetration or aspiration. A specific multidisciplinary clinical protocol was designed to report the vocal parameters and swallowing disorders that changed most quickly in bulbar ALS patients. Furthermore, the patients were stratified according to involvement of pharyngeal structures and severity index. Copyright © 2018 Sociedad Española de Otorrinolaringología y Cirugía de Cabeza y Cuello. Publicado por Elsevier España, S.L.U. All rights reserved.
Boë, Louis-Jean; Berthommier, Frédéric; Legou, Thierry; Captier, Guillaume; Kemp, Caralyn; Sawallis, Thomas R.; Becker, Yannick; Rey, Arnaud; Fagot, Joël
2017-01-01
Language is a distinguishing characteristic of our species, and the course of its evolution is one of the hardest problems in science. It has long been generally considered that human speech requires a low larynx, and that the high larynx of nonhuman primates should preclude their producing the vowel systems universally found in human language. Examining the vocalizations through acoustic analyses, tongue anatomy, and modeling of acoustic potential, we found that baboons (Papio papio) produce sounds sharing the F1/F2 formant structure of the human [ɨ æ ɑ ɔ u] vowels, and that, as in humans, those vocalic qualities are organized as a system on two acoustic-anatomic axes. This confirms that hominoids can produce contrasting vowel qualities despite a high larynx. It suggests that spoken languages evolved from ancient articulatory skills already present in our last common ancestor with Cercopithecoidea, about 25 MYA. PMID:28076426
Nittrouer, Susan; Lowenstein, Joanna H
2007-02-01
It has been reported that children and adults assign different weights to the various acoustic properties of the speech signal that support phonetic decisions. This finding is generally attributed to the fact that the amount of weight assigned to various acoustic properties by adults varies across languages, and that children have not yet discovered the mature weighting strategies of their own native languages. But an alternative explanation exists: Perhaps children's auditory sensitivities for some acoustic properties of speech are poorer than those of adults, and children cannot categorize stimuli based on properties to which they are not keenly sensitive. The purpose of the current study was to test that hypothesis. Edited-natural, synthetic-formant, and sine wave stimuli were all used, and all were modeled after words with voiced and voiceless final stops. Adults and children (5 and 7 years of age) listened to pairs of stimuli in 5 conditions: 2 involving a temporal property (1 with speech and 1 with nonspeech stimuli) and 3 involving a spectral property (1 with speech and 2 with nonspeech stimuli). An AX discrimination task was used in which a standard stimulus (A) was compared with all other stimuli (X) equal numbers of times (method of constant stimuli). Adults and children had similar difference thresholds (i.e., 50% point on the discrimination function) for 2 of the 3 sets of nonspeech stimuli (1 temporal and 1 spectral), but children's thresholds were greater for both sets of speech stimuli. Results are interpreted as evidence that children's auditory sensitivities are adequate to support weighting strategies similar to those of adults, and so observed differences between children and adults in speech perception cannot be explained by differences in auditory perception. Furthermore, it is concluded that listeners bring expectations to the listening task about the nature of the signals they are hearing based on their experiences with those signals.
The evolution of speech: a comparative review.
Fitch
2000-07-01
The evolution of speech can be studied independently of the evolution of language, with the advantage that most aspects of speech acoustics, physiology and neural control are shared with animals, and thus open to empirical investigation. At least two changes were necessary prerequisites for modern human speech abilities: (1) modification of vocal tract morphology, and (2) development of vocal imitative ability. Despite an extensive literature, attempts to pinpoint the timing of these changes using fossil data have proven inconclusive. However, recent comparative data from nonhuman primates have shed light on the ancestral use of formants (a crucial cue in human speech) to identify individuals and gauge body size. Second, comparative analysis of the diverse vertebrates that have evolved vocal imitation (humans, cetaceans, seals and birds) provides several distinct, testable hypotheses about the adaptive function of vocal mimicry. These developments suggest that, for understanding the evolution of speech, comparative analysis of living species provides a viable alternative to fossil data. However, the neural basis for vocal mimicry and for mimesis in general remains unknown.
Analysis of digital images into energy-angular momentum modes.
Vicent, Luis Edgar; Wolf, Kurt Bernardo
2011-05-01
The measurement of continuous wave fields by a digital (pixellated) screen of sensors can be used to assess the quality of a beam by finding its formant modes. A generic continuous field F(x, y) sampled at an N × N Cartesian grid of point sensors on a plane yields a matrix of values F(q_x, q_y), where (q_x, q_y) are integer coordinates. When the approximate rotational symmetry of the input field is important, one may use the sampled Laguerre-Gauss functions, with radial and angular modes (n, m), to analyze them into their corresponding coefficients F_{n,m} of energy and angular momentum (E-AM). The sampled E-AM modes span an N²-dimensional space, but are not orthogonal, except for parity. In this paper, we propose the properly orthonormal "Laguerre-Kravchuk" discrete functions Λ_{n,m}(q_x, q_y) as a convenient basis to analyze the sampled beams into their E-AM polar modes, and with them synthesize the input image exactly.
Hearing impaired speech in noisy classrooms
NASA Astrophysics Data System (ADS)
Shahin, Kimary; McKellin, William H.; Jamieson, Janet; Hodgson, Murray; Pichora-Fuller, M. Kathleen
2005-04-01
Noisy classrooms have been shown to induce among students patterns of interaction similar to those used by hearing-impaired people [W. H. McKellin et al., GURT (2003)]. In this research, the speech of children in a noisy classroom setting was investigated to determine whether noisy classrooms have an effect on students' speech. Audio recordings were made of the speech of students during group work in their regular classrooms (grades 1-7), and of the speech of the same students in a sound booth. Noise level readings in the classrooms were also recorded. Each student's noisy- and quiet-environment speech samples were acoustically analyzed for prosodic and segmental properties (f0, pitch range, pitch variation, phoneme duration, vowel formants), and compared. The analysis showed that the students' speech in the noisy classrooms had characteristics of the speech of hearing-impaired persons [e.g., R. O'Halpin, Clin. Ling. and Phon. 15, 529-550 (2001)]. Some educational implications of our findings were identified. [Work supported by the Peter Wall Institute for Advanced Studies, University of British Columbia.]
Classification of the Correct Quranic Letters Pronunciation of Male and Female Reciters
NASA Astrophysics Data System (ADS)
Khairuddin, Safiah; Ahmad, Salmiah; Embong, Abdul Halim; Nur Wahidah Nik Hashim, Nik; Altamas, Tareq M. K.; Nuratikah Syd Badaruddin, Syarifah; Shahbudin Hassan, Surul
2017-11-01
Recitation of the Holy Quran with the correct Tajweed is essential for every Muslim. Islam has encouraged Quranic education since an early age, as reciting the Quran correctly conveys the correct meaning of the words of Allah. It is important to recite the Quranic verses according to their characteristics (sifaat) and from their points of articulation (makhraj). This paper presents the identification and classification analysis of Quranic letter pronunciation for both male and female reciters, to obtain the unique representation of each letter by male as compared with female expert reciters. Linear Discriminant Analysis (LDA) was used as the classifier, with formants and Power Spectral Density (PSD) as the acoustic features. The results show that a linear classifier using the combination of PSD band-1 and band-2 power gives high classification accuracy for most of the Quranic letters. It is also shown that pronunciations by male reciters give better results in the classification of the Quranic letters.
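A sketch of the classification step with scikit-learn's LDA; the feature values are random placeholders for the formant and PSD band-power features named in the abstract:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)

# Hypothetical features per utterance: [F1, F2, band-1 power, band-2 power];
# labels stand in for the Quranic letters being classified.
X = rng.normal(size=(200, 4))
y = rng.integers(0, 5, size=200)  # 5 placeholder letter classes

print(cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean())
```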
Innovative /ye/ and /we/ sequences in recent loans in Japanese
NASA Astrophysics Data System (ADS)
Vance, Timothy; Matsugu, Yuka
2005-04-01
The GV sequences /ye/ and /we/ do not occur in Japanese except perhaps in recent loans. Katakana spellings of the relevant loans in authoritative dictionaries are inconsistent, and it is not clear whether native speakers treat them as containing the GV sequences /ye/ and /we/ or as containing the VV sequences /ie/ and /ue/. Native speakers of Japanese with minimal exposure to spoken English were recorded producing some relevant loans in response to picture prompts. The same speakers were also recorded producing some native words containing uncontroversial /ie/ and /ue/ sequences. All the productions are being analyzed acoustically to determine whether they show the expected contrast between GV and VV sequences. A VV sequence is disyllabic (and bimoraic) and should therefore have greater duration and more gradual formant movements than a monosyllabic (and monomoraic) GV sequence. Utterance-initially, a VV sequence should have an LH pitch pattern and should be preceded by a nondistinctive glottal stop, whereas a GV sequence should have a H pitch pattern and should have a smooth onset.
Whitfield, Jason A; Goberman, Alexander M
2014-01-01
Individuals with Parkinson disease (PD) often exhibit decreased range of movement secondary to the disease process, which has been shown to affect articulatory movements. A number of investigations have failed to find statistically significant differences between control and disordered groups, and between speaking conditions, using traditional vowel space area measures. The purpose of the current investigation was to evaluate both between-group (PD versus control) and within-group (habitual versus clear) differences in articulatory function using a novel vowel space measure, the articulatory-acoustic vowel space (AAVS). The novel AAVS is calculated from continuously sampled formant trajectories of connected speech. In the current study, habitual and clear speech samples from twelve individuals with PD along with habitual control speech samples from ten neurologically healthy adults were collected and acoustically analyzed. In addition, a group of listeners completed perceptual ratings of speech clarity for all samples. Individuals with PD were perceived to exhibit decreased speech clarity compared to controls. Similarly, the novel AAVS measure was significantly lower in individuals with PD. In addition, the AAVS measure significantly tracked changes between the habitual and clear conditions that were confirmed by perceptual ratings. In the current study, the novel AAVS measure is shown to be sensitive to disease-related group differences and within-person changes in articulatory function of individuals with PD. Additionally, these data confirm that individuals with PD can modulate the speech motor system to increase articulatory range of motion and speech clarity when given a simple prompt. The reader will be able to (i) describe articulatory behavior observed in the speech of individuals with Parkinson disease; (ii) describe traditional measures of vowel space area and how they relate to articulation; (iii) describe a novel measure of vowel space, the articulatory-acoustic vowel space and its relationship to articulation and the perception of speech clarity. Copyright © 2014 Elsevier Inc. All rights reserved.
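The exact AAVS formula is not given in the abstract; one plausible reading, sketched below under that assumption, summarizes the continuously sampled F1-F2 trajectory by its generalized variance (the published measure may scale this differently):

```python
import numpy as np

def aavs(f1_track, f2_track):
    """Sketch of an articulatory-acoustic vowel space measure.

    Assumes the measure can be summarized by the spread of the continuously
    sampled F1-F2 trajectory: here, the square root of the generalized
    variance (determinant of the 2x2 F1-F2 covariance matrix), in Hz^2.
    """
    cov = np.cov(np.vstack([f1_track, f2_track]))
    return float(np.sqrt(np.linalg.det(cov)))

rng = np.random.default_rng(2)
f1 = 500 + 120 * rng.standard_normal(1000)   # hypothetical F1 track (Hz)
f2 = 1500 + 350 * rng.standard_normal(1000)  # hypothetical F2 track (Hz)
print(aavs(f1, f2))
```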
Aguiar Santos, Susana; Robens, Anne; Boehm, Anna; Leonhardt, Steffen; Teichmann, Daniel
2016-01-01
A new prototype of a multi-frequency electrical impedance tomography system is presented. The system uses a field-programmable gate array as the main controller and is configured to measure at different frequencies simultaneously through a composite waveform. Both real and imaginary components of the data are computed for each frequency and sent to the personal computer over an ethernet connection, where both time-difference and frequency-difference images are reconstructed and visualized. The system has been tested for both time-difference and frequency-difference imaging for diverse sets of frequency pairs in a resistive/capacitive test unit and in self-experiments. To our knowledge, this is the first work that shows preliminary frequency-difference images of in-vivo experiments. Results of time-difference imaging were compared with simulation results and show that the new prototype performs well at all frequencies in the tested range of 60 kHz–960 kHz. For frequency-difference images, further development of algorithms and an improved normalization process are required to correctly reconstruct and interpret the resulting images. PMID:27463715
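Measuring "at different frequencies simultaneously through a composite waveform" amounts to demodulating each excitation component separately; a minimal digital lock-in sketch under that reading (frequencies and amplitudes are illustrative, not the prototype's actual parameters):

```python
import numpy as np

def demodulate(v, fs, freqs):
    """Extract the complex (real + imaginary) amplitude of a measured
    signal at each excitation frequency of a composite waveform.
    v: sampled signal, fs: sample rate (Hz), freqs: excitation frequencies.
    """
    t = np.arange(len(v)) / fs
    out = {}
    for f in freqs:
        ref = np.exp(-2j * np.pi * f * t)   # complex reference oscillator
        out[f] = 2 * np.mean(v * ref)       # complex amplitude at f
    return out

# Hypothetical composite excitation with components at 60 and 960 kHz:
fs = 10e6
t = np.arange(int(1e-3 * fs)) / fs
v = 1.0 * np.cos(2 * np.pi * 60e3 * t) + 0.5 * np.sin(2 * np.pi * 960e3 * t)
print(demodulate(v, fs, [60e3, 960e3]))
```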
Degraded Vowel Acoustics and the Perceptual Consequences in Dysarthria
NASA Astrophysics Data System (ADS)
Lansford, Kaitlin L.
Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria and overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments was conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed that vowel metrics capturing vowel centralization and distinctiveness and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility. The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, an above-chance level of agreement between the nature of vowel misclassifications and misidentification errors was demonstrated for all vowels, suggesting degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.
NASA Astrophysics Data System (ADS)
Liberman, A. M.
1984-08-01
This report (1 January-30 June) is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: Sources of variability in early speech development; Invariance: Functional or descriptive?; Brief comments on invariance in phonetic perception; Phonetic category boundaries are flexible; On categorizing aphasic speech errors; Universal and language-particular aspects of vowel-to-vowel coarticulation; Functionally specific articulatory cooperation following jaw perturbation during speech: Evidence for coordinative structures; Formant integration and the perception of nasal vowel height; Relative power of cues: F0 shifts vs. voice timing; Laryngeal management at utterance-internal word boundary in American English; Closure duration and release burst amplitude cues to stop consonant manner and place of articulation; Effects of temporal stimulus properties on perception of the /sl/-/spl/ distinction; The physics of controlled conditions: A reverie about locomotion; On the perception of intonation from sinusoidal sentences; Speech Perception; Speech Articulation; Motor Control; Speech Development.
Masculine men articulate less clearly.
Kempe, Vera; Puts, David A; Cárdenas, Rodrigo A
2013-12-01
In previous research, acoustic characteristics of the male voice have been shown to signal various aspects of mate quality and threat potential. But the human voice is also a medium of linguistic communication. The present study explores whether physical and vocal indicators of male mate quality and threat potential are linked to effective communicative behaviors such as vowel differentiation and use of more salient phonetic variants of consonants. We show that physical and vocal indicators of male threat potential, height and formant position, are negatively linked to vowel space size, and that height and levels of circulating testosterone are negatively linked to the use of the aspirated variant of the alveolar stop consonant /t/. Thus, taller, more masculine men display less clarity in their speech and prefer phonetic variants that may be associated with masculine attributes such as toughness. These findings suggest that vocal signals of men's mate quality and/or dominance are not confined to the realm of voice acoustics but extend to other aspects of communicative behavior, even if this means a trade-off with speech patterns that are considered communicatively advantageous, such as clarity and indexical cues to higher social class.
Analysis of Spanish consonant recognition in 8-talker babble.
Moreno-Torres, Ignacio; Otero, Pablo; Luna-Ramírez, Salvador; Garayzábal Heinze, Elena
2017-05-01
This paper presents the results of a closed-set recognition task for 80 Spanish consonant-vowel sounds (16 C × 5 V, spoken by 2 talkers) in 8-talker babble (-6, -2, +2 dB). A ranking of resistance to noise was obtained using the signal detection d' measure, and confusion patterns were analyzed using a graphical method (confusion graphs). The resulting ranking indicated the existence of three resistance groups: (1) high resistance: /ʧ, s, ʝ/; (2) mid resistance: /r, l, m, n/; and (3) low resistance: /t, θ, x, ɡ, b, d, k, f, p/. Confusions involved mostly place of articulation and voicing errors, and occurred especially among consonants in the same resistance group. Three perceptual confusion groups were identified: the three low-energy fricatives (i.e., /f, θ, x/), the six stops (i.e., /p, t, k, b, d, ɡ/), and three consonants with clear formant structure (i.e., /m, n, l/). The factors underlying consonant resistance and confusion patterns are discussed. The results are compared with data from other languages.
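The sensitivity index used for the ranking is standard signal-detection d'; a minimal sketch with hypothetical hit and false-alarm rates for one consonant:

```python
from scipy.stats import norm

def d_prime(hit_rate, fa_rate):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical rates for one consonant at -6 dB SNR:
print(d_prime(0.80, 0.10))  # about 2.12
```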
Resonant difference-frequency atomic force ultrasonic microscope
NASA Technical Reports Server (NTRS)
Cantrell, John H. (Inventor); Cantrell, Sean A. (Inventor)
2010-01-01
A scanning probe microscope and methodology called resonant difference-frequency atomic force ultrasonic microscopy (RDF-AFUM), employs an ultrasonic wave launched from the bottom of a sample while the cantilever of an atomic force microscope, driven at a frequency differing from the ultrasonic frequency by one of the contact resonance frequencies of the cantilever, engages the sample top surface. The nonlinear mixing of the oscillating cantilever and the ultrasonic wave in the region defined by the cantilever tip-sample surface interaction force generates difference-frequency oscillations at the cantilever contact resonance. The resonance-enhanced difference-frequency signals are used to create images of nanoscale near-surface and subsurface features.
Hofer, T; Ray, N; Wegmann, D; Excoffier, L
2009-01-01
Several studies have found strikingly different allele frequencies between continents. This has mainly been interpreted as being due to local adaptation. However, demographic factors can generate similar patterns. Namely, allelic surfing during a population range expansion may increase the frequency of alleles in newly colonised areas. In this study, we examined 772 STRs, 210 diallelic indels, and 2834 SNPs typed in 53 human populations worldwide from the HGDP-CEPH Diversity Panel to determine to what extent allele frequencies differ among four regions (Africa, Eurasia, East Asia, and America). We find that large allele frequency differences between continents are surprisingly common, and that Africa and America show the largest number of loci with extreme frequency differences. Moreover, more STR alleles have increased rather than decreased in frequency outside Africa, as expected under allelic surfing. Finally, there is no relationship between the extent of allele frequency differences and proximity to genes, as would be expected under selection. We therefore conclude that most of the observed large allele frequency differences between continents result from demography rather than from positive selection.
Method for ambiguity resolution in range-Doppler measurements
NASA Technical Reports Server (NTRS)
Heymsfield, Gerald M. (Inventor); Miller, Lee S. (Inventor)
1994-01-01
A method for resolving range and Doppler target ambiguities when the target has substantial range or a high relative velocity. A first signal is generated together with a second signal that is coherent with the first but at a slightly different frequency, such that the two signals differ in frequency by Δf_t. The first and second signals are converted into a dual-frequency pulsed signal, amplified, and transmitted towards a target. A reflected dual-frequency signal is received from the target, amplified, and changed to an intermediate dual-frequency signal. The intermediate dual-frequency signal is amplified, and a shifted difference frequency Δf_r is extracted from the amplified intermediate dual-frequency signal by a nonlinear detector. The final step is generating two quadrature signals from the difference frequency Δf_t and the shifted difference frequency Δf_r and processing the two quadrature signals to determine range and Doppler information of the target.
Pulsed infrared difference frequency generation in CdGeAs₂
Piltch, Martin S.; Rink, John P.; Tallman, Charles R.
1977-03-08
The disclosure relates to a laser apparatus for generating a line-tunable pulsed infrared difference frequency output. The apparatus comprises a CO₂ laser which produces a first frequency, a CO laser which produces a second frequency and a mixer for combining the output of the CO₂ and CO lasers so as to produce a final output comprising a difference frequency from the first and second frequency outputs.
Pulsed infrared difference frequency generation in CdGeAs₂
Piltch, M.S.; Rink, J.P.; Tallman, C.R.
1975-11-26
A laser apparatus for generating a line-tunable pulsed infrared difference frequency output is described. The apparatus comprises a CO₂ laser which produces a first frequency, a CO laser which produces a second frequency, and a mixer for combining the output of the CO₂ and CO lasers so as to produce a final output comprising a difference frequency from the first and second frequency outputs.
Digital-Difference Processing For Collision Avoidance.
NASA Technical Reports Server (NTRS)
Shores, Paul; Lichtenberg, Chris; Kobayashi, Herbert S.; Cunningham, Allen R.
1988-01-01
Digital system for automotive crash avoidance measures and displays difference in frequency between two sinusoidal input signals of slightly different frequencies. Designed for use with Doppler radars. Characterized as digital mixer coupled to frequency counter measuring difference frequency in mixer output. Technique determines target path mathematically. Used for tracking cars, missiles, bullets, baseballs, and other fast-moving objects.
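The mixer-plus-counter idea can be simulated directly: multiplying the two inputs yields sum and difference components, low-pass filtering keeps the difference, and counting zero crossings measures it. Sample rate, input frequencies, and filter length below are illustrative:

```python
import numpy as np

fs = 100_000.0
t = np.arange(int(0.5 * fs)) / fs
f1, f2 = 10_000.0, 10_240.0                    # two slightly different inputs

mixed = np.cos(2 * np.pi * f1 * t) * np.cos(2 * np.pi * f2 * t)  # mixer output
lp = np.convolve(mixed, np.ones(256) / 256, mode="same")  # crude low-pass
# Count sign changes to measure the surviving difference frequency:
crossings = np.sum(np.diff(np.signbit(lp).astype(int)) != 0)
print(crossings / (2 * t[-1]))                 # ~240 Hz difference frequency
```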
Nanoscale Subsurface Imaging via Resonant Difference-Frequency Atomic Force Ultrasonic Microscopy
NASA Technical Reports Server (NTRS)
Cantrell, Sean A.; Cantrell, John H.; Lilehei, Peter T.
2007-01-01
A novel scanning probe microscope methodology has been developed that employs an ultrasonic wave launched from the bottom of a sample while the cantilever of an atomic force microscope, driven at a frequency differing from the ultrasonic frequency by the fundamental resonance frequency of the cantilever, engages the sample top surface. The nonlinear mixing of the oscillating cantilever and the ultrasonic wave in the region defined by the cantilever tip-sample surface interaction force generates difference-frequency oscillations at the cantilever fundamental resonance. The resonance-enhanced difference-frequency signals are used to create images of embedded nanoscale features.
Vertical Vibration Characteristics of a High-Temperature Superconducting Maglev Vehicle System
NASA Astrophysics Data System (ADS)
Jiang, Jing; Li, Ke Cai; Zhao, Li Feng; Ma, Jia Qing; Zhang, Yong; Zhao, Yong
2013-06-01
The vertical vibration characteristics of a high-temperature superconducting maglev vehicle system are investigated experimentally. The displacement variations of the maglev vehicle system are measured at different external excitation frequencies for a fixed levitation gap. When the external vibration frequency is low, the amplitude variations of the response curve are small. As the vibration frequency increases, chaotic behavior can be found. The resonance frequencies at different levitation gaps are also investigated over an external excitation frequency range of 0-100 Hz. The resonance frequency varies with the levitation gap, and the relationship between the two is almost linear.
Krause, C M; Viemerö, V; Rosenqvist, A; Sillanmäki, L; Aström, T
2000-05-26
The reactivity of different narrow electroencephalographic (EEG) frequency bands (4-6, 6-8, 8-10 and 10-12 Hz) to three types of emotionally laden film clips (aggressive, sad, neutral) was examined. We observed that different EEG frequency bands responded differently to the three types of film content. In the 4-6 Hz frequency band, viewing aggressive film content elicited greater relative synchronization than viewing sad or neutral film content. The 6-8 Hz and 8-10 Hz frequency bands exhibited reactivity to the chronological succession of film viewing, whereas the responses of the 10-12 Hz frequency band evolved over minutes during film viewing. Our results suggest dissociations between the responses of different frequencies within the EEG to different emotion-related stimuli. Narrow frequency band EEG analysis offers an adequate tool for studying cortical activation patterns during emotion-related information processing.
Controlling Energy Radiations of Electromagnetic Waves via Frequency Coding Metamaterials.
Wu, Haotian; Liu, Shuo; Wan, Xiang; Zhang, Lei; Wang, Dan; Li, Lianlin; Cui, Tie Jun
2017-09-01
Metamaterials are artificial structures composed of subwavelength unit cells to control electromagnetic (EM) waves. The spatial coding representation of a metamaterial makes it possible to describe the material in a digital way. Spatial coding metamaterials are typically constructed from unit cells that have similar shapes with fixed functionality. Here, the concept of the frequency coding metamaterial is proposed, which achieves different control of EM energy radiation with a fixed spatial coding pattern as the frequency changes. In this case, not only the phase responses of the unit cells but also their phase sensitivities must be considered. Owing to the different frequency sensitivities of unit cells, two units with the same phase response at the initial frequency may have different phase responses at a higher frequency. To describe the frequency coding property of a unit cell, a digitalized frequency sensitivity is proposed, in which units are encoded with the digits "0" and "1" to represent low and high phase sensitivities, respectively. In this way, two degrees of freedom, spatial coding and frequency coding, are obtained to control EM energy radiation by a new class of frequency-spatial coding metamaterials. The above concepts and physical phenomena are confirmed by numerical simulations and experiments.
Wide bandwidth phase-locked loop circuit
NASA Technical Reports Server (NTRS)
Koudelka, Robert David (Inventor)
2005-01-01
A PLL circuit uses a multiple frequency range PLL in order to phase lock input signals having a wide range of frequencies. The PLL includes a VCO capable of operating in multiple different frequency ranges and a divider bank independently configurable to divide the output of the VCO. A frequency detector detects a frequency of the input signal and a frequency selector selects an appropriate frequency range for the PLL. The frequency selector automatically switches the PLL to a different frequency range as needed in response to a change in the input signal frequency. Frequency range hysteresis is implemented to avoid operating the PLL near a frequency range boundary.
Binaural beats at high frequencies.
McFadden, D; Pasanen, E G
1975-10-24
Binaural beats have long been believed to be audible only at low frequencies, but an interaction reminiscent of a binaural beat can sometimes be heard when different two-tone complexes of high frequency are presented to the two ears. The primary requirement is that the frequency separation in the complex at one ear be slightly different from that in the other; that is, that there be a small interaural difference in the envelope periodicities. This finding is in accord with other recent demonstrations that the auditory system is not deaf to interaural time differences at high frequencies.
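The stimulus construction is easy to reproduce: give each ear a two-tone complex whose component spacing (hence envelope rate) differs slightly across ears. The frequencies below are illustrative, not those of the original study:

```python
import numpy as np

fs = 44_100
t = np.arange(fs) / fs  # 1 s
# High-frequency two-tone complexes; envelope rates differ by 3 Hz across ears.
left = np.cos(2 * np.pi * 2000 * t) + np.cos(2 * np.pi * 2050 * t)   # 50-Hz envelope
right = np.cos(2 * np.pi * 2000 * t) + np.cos(2 * np.pi * 2053 * t)  # 53-Hz envelope
stereo = 0.4 * np.stack([left, right], axis=1)
# Written out as a stereo file (e.g., soundfile.write("beat.wav", stereo, fs)),
# this gives the small interaural difference in envelope periodicity that the
# paper identifies as the primary requirement.
```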
Multi-pulse frequency shifted (MPFS) multiple access modulation for ultra wideband
Nekoogar, Faranak [San Ramon, CA; Dowla, Farid U [Castro Valley, CA
2012-01-24
The multi-pulse frequency shifted technique uses mutually orthogonal short-duration pulses to transmit and receive information in a UWB multiuser communication system. The multiuser system uses the same pulse shape with different frequencies for the reference and data for each user. Different users have different pulse shapes (mutually orthogonal to each other) and different transmit and reference frequencies. At the receiver, the reference pulse is frequency shifted to match the data pulse, and a correlation scheme followed by a hard-decision block detects the data.
Žižys, Darius; Gaidys, Rimvydas; Ostaševičius, Vytautas; Narijauskaitė, Birutė
2017-04-27
Frequency up-conversion is a promising technique for energy harvesting in low frequency environments. In this approach, abundantly available environmental motion energy is absorbed by a Low Frequency Resonator (LFR) which transfers it to a high frequency Piezoelectric Vibration Energy Harvester (PVEH) via impact or magnetic coupling. As a result, a decaying alternating output signal is produced that can later be stored in a battery or transferred directly to the electric load. The paper reports an impact-coupled frequency up-converting tandem setup with different LFR to PVEH natural frequency ratios and varying contact point location along the length of the harvester. RMS power output of different frequency up-converting tandems with optimal resistive values was found from the transient analysis, revealing a strong relation between power output and the LFR-PVEH natural frequency ratio as well as the impact point location. Simulations revealed that higher power output is obtained at a higher natural frequency ratio between LFR and PVEH: power output increased by an order of magnitude when the natural frequency ratio was doubled, and differed by up to 150% between impact point locations. The theoretical results were experimentally verified.
High-Energy, Multi-Octave-Spanning Mid-IR Sources via Adiabatic Difference Frequency Generation
2016-10-17
We have achieved the main goals of our research plan. We have evaluated a brand-new concept in nonlinear optics, adiabatic difference frequency generation (ADFG), for the efficient transfer of ...
Moore, Carrie B.; Wallace, John R.; Wolfe, Daniel J.; Frase, Alex T.; Pendergrass, Sarah A.; Weiss, Kenneth M.; Ritchie, Marylyn D.
2013-01-01
Analyses investigating low frequency variants have the potential for explaining additional genetic heritability of many complex human traits. However, the natural frequencies of rare variation between human populations strongly confound genetic analyses. We have applied a novel collapsing method to identify biological features with low frequency variant burden differences in thirteen populations sequenced by the 1000 Genomes Project. Our flexible collapsing tool utilizes expert biological knowledge from multiple publicly available database sources to direct feature selection. Variants were collapsed according to genetically driven features, such as evolutionarily conserved regions, regulatory regions, genes, and pathways. We have conducted an extensive comparison of low frequency variant burden differences (MAF<0.03) between populations from 1000 Genomes Project Phase I data. We found that on average 26.87% of gene bins, 35.47% of intergenic bins, 42.85% of pathway bins, 14.86% of ORegAnno regulatory bins, and 5.97% of evolutionarily conserved regions show statistically significant differences in low frequency variant burden across populations from the 1000 Genomes Project. The proportion of bins with significant differences in low frequency burden depends on the ancestral similarity of the two populations compared and the types of features tested. Even closely related populations had notable differences in low frequency burden, but fewer differences than populations from different continents. Furthermore, conserved or functionally relevant regions had fewer significant differences in low frequency burden than regions under less evolutionary constraint. This degree of low frequency variant differentiation across diverse populations and feature elements highlights the critical importance of considering population stratification in the new era of DNA sequencing and low frequency variant genomic analyses. PMID:24385916
Kuffner, Tamara; Whitworth, William; Jairam, Maya; McNicholl, Janet
2003-06-01
Knowledge of population major histocompatibility complex gene frequencies is important for construction of organ donor pools and for studies of disease association. Human leukocyte antigen DRB1 (HLA-DRB1), HLA-DQB1, and TNFα -308 (G-A) promoter genetic typing was performed in 112 healthy, unrelated African Americans (AAs) from the southeastern United States. Allele frequencies were compared with published frequency data from other AA populations. Our AA population had the highest frequency of HLA-DRB1*09 (6.7%) reported in any AA population. The frequency of the TNFα -308A polymorphism was also high (14.4%) when compared with published frequencies in AAs. Significant regional differences in the distribution of most HLA-DRB1 and HLA-DQB1 alleles were observed in all AA populations examined. The AA HLA-DRB1 and -DQB1 frequencies also differed from published Caucasian frequencies. This is the first report describing the distribution of TNFα promoter alleles in the southeastern United States. The high DRB1*09 and TNFα -308A allele frequencies of our population most resemble the frequencies of these alleles in certain West African populations. These varying major histocompatibility complex gene frequencies may reflect different regional population structures among AAs in the United States, which may be due to differences in ancestral origins, migration, and racial admixture.
Perceptual Space of Superimposed Dual-Frequency Vibrations in the Hands.
Hwang, Inwook; Seo, Jeongil; Choi, Seungmoon
2017-01-01
The use of distinguishable complex vibrations that have multiple spectral components can improve the transfer of information by vibrotactile interfaces. We investigated the qualitative characteristics of dual-frequency vibrations, the simplest complex vibrations, in comparison with single-frequency vibrations. Two psychophysical experiments were conducted to elucidate the perceptual characteristics of these vibrations by measuring the perceptual distances among single-frequency and dual-frequency vibrations. In Experiment I, the perceptual distances between dual-frequency vibrations were measured as the relative intensity ratio of their two frequency components was varied. The estimated perceptual spaces for three frequency conditions showed non-linear perceptual differences between the dual-frequency and single-frequency vibrations. In Experiment II, a perceptual space was estimated from the measured perceptual distances among ten dual-frequency compositions and five single-frequency vibrations. The effects of component frequency and frequency ratio were revealed in the perceptual space. In the percept of a dual-frequency vibration, the lower-frequency component had a dominant effect. Additionally, the perceptual difference between single-frequency and dual-frequency vibrations increased when the relative difference between the two frequencies of a dual-frequency vibration was small. These results are expected to provide a fundamental understanding of the perception of complex vibrations to enrich the transfer of information using vibrotactile stimuli.
NASA Astrophysics Data System (ADS)
Piao, Daqing
2017-02-01
The magneto-thermo-acoustic effect that we predicted in 2013 refers to the generation of an acoustic-pressure wave from magnetic nanoparticles (MNPs) when thermally mediated under an alternating magnetic field (AMF) with pulsed or frequency-chirped application. Several independent experimental studies have since validated the magneto-thermo-acoustic effect, and a recent report has discovered acoustic-wave generation from MNPs at the second-harmonic frequency of the AMF when operating continuously. We propose that applying two AMFs of differing frequencies to MNPs will produce acoustic-pressure waves at the sum and difference of the two frequencies, in addition to the two second-harmonic frequencies. Analysis of the specific absorption dynamics of the MNPs when exposed to two AMFs of differing frequencies shows some interesting patterns of acoustic intensity at the multiple frequency components. The ratio of the acoustic intensity at the sum frequency to that at the difference frequency is determined by the frequency ratio of the two AMFs but remains independent of the AMF strengths. The ratio of the acoustic intensity at the sum or difference frequency to that at each of the two second-harmonic frequencies is determined by both the frequency ratio and the field-strength ratio of the two AMFs. The results indicate a potential strategy for localizing the source of a continuous-wave magneto-thermo-acoustic signal by examining the frequency spectrum of full-field, non-differentiating acoustic detection, with the field-strength ratio changed continuously at a fixed frequency ratio. The practicalities and challenges of this magnetic spatial localization approach for magneto-thermo-acoustic imaging, using a simple envisioned set of two AMFs arranged parallel to each other, are discussed.
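Why two drive frequencies yield sum, difference, and second-harmonic components can be seen in a toy model in which instantaneous heating scales with the squared total field (a deliberate simplification of the paper's absorption dynamics; all frequencies and amplitudes below are illustrative):

```python
import numpy as np

fs = 1_000_000
t = np.arange(fs // 10) / fs                  # 0.1 s at 1 MS/s
f1, f2 = 50_000, 70_000                       # two AMF frequencies (Hz)
H = 1.0 * np.cos(2 * np.pi * f1 * t) + 0.6 * np.cos(2 * np.pi * f2 * t)
heating = H ** 2                              # toy model: absorption ~ H^2

spec = np.abs(np.fft.rfft(heating))
freqs = np.fft.rfftfreq(len(t), d=1.0 / fs)
for target in (f2 - f1, 2 * f1, f1 + f2, 2 * f2):   # 20, 100, 120, 140 kHz
    k = np.argmin(np.abs(freqs - target))
    print(f"{freqs[k] / 1e3:6.1f} kHz  amplitude {spec[k]:.1f}")
```

In this model the sum and difference terms scale with the product of the two field strengths, while each second harmonic scales with one field strength squared, which is the kind of dependence the abstract exploits for localization.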
The Frequency of School Resource Officer Tasks and Incidents of School Violence
ERIC Educational Resources Information Center
Lane, James F.
2009-01-01
The purpose of this study was to determine if differences in the frequency of school resource officer behaviors exist and if any differences in behaviors were related to differences in the frequency of incidents of school violence. This study collected information about tasks that SROs complete and compared that information to the frequency of…
NASA Astrophysics Data System (ADS)
Lin, Yeong-Fen Emily
This thesis is the result of an investigation of the source-vowel interaction from the point of view of perception. Major objectives include the identification of the acoustic correlates of breathy voice and the disclosure of the interdependent relationship between the perception of vowel identity and breathiness. Two experiments were conducted to achieve these objectives. In the first experiment, voice samples from one control group and seven patient groups were compared. The control group consisted of five female and five male adults. The ten normals were recruited to perform a sustained vowel phonation task with constant pitch and loudness. The voice samples of seventy patients were retrieved from a hospital database, with vowels extracted from sentences repeated by patients at their habitual pitch and loudness. The seven patient groups were defined by unique combinations of the patients' measures of mean flow rate and glottal resistance. Eighteen acoustic variables were treated with a three-way (Gender × Group × Vowel) ANOVA. Parameters showing a significant female-male difference as well as group differences, especially those between the presumed breathy group and the other groups, were identified as relevant to the distinction of breathy voice. As a result, F1-F3 amplitude difference and slope were found to be most effective in distinguishing breathy voice. Other acoustic correlates of breathy voice included F1 bandwidth, RMS-H1 amplitude difference, and F1-F2 amplitude difference and slope. In the second experiment, a formant synthesizer was used to generate vowel stimuli with varying spectral tilt and F1 bandwidth. Thirteen native American English speakers made dissimilarity judgements on paired stimuli in terms of vowel identity and breathiness. Listeners' perceptual vowel spaces were found to be affected by changes in the acoustic correlates of breathy voice. The threshold for detecting a change of vocal quality in the breathiness domain was also found to be vowel-dependent.
Immediate effects of anticipatory coarticulation in spoken-word recognition
Salverda, Anne Pier; Kleinschmidt, Dave; Tanenhaus, Michael K.
2014-01-01
Two visual-world experiments examined listeners' use of pre-word-onset anticipatory coarticulation in spoken-word recognition. Experiment 1 established the shortest lag with which information in the speech signal influences eye-movement control, using stimuli such as “The … ladder is the target”. With a neutral token of the definite article preceding the target word, saccades to the referent were not more likely than saccades to an unrelated distractor until 200–240 ms after the onset of the target word. In Experiment 2, utterances contained definite articles which contained natural anticipatory coarticulation pertaining to the onset of the target word (“The ladder … is the target”). A simple Gaussian classifier was able to predict the initial sound of the upcoming target word from formant information in the first few pitch periods of the article's vowel. With these stimuli, effects of speech on eye-movement control began about 70 ms earlier than in Experiment 1, suggesting rapid use of anticipatory coarticulation. The results are interpreted as support for “data explanation” approaches to spoken-word recognition. Methodological implications for visual-world studies are also discussed. PMID:24511179
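For readers unfamiliar with the kind of classifier used in Experiment 2, a minimal sketch of a Gaussian classifier over formant features follows; the feature values, labels, and variable names are hypothetical, not taken from the study.

import numpy as np

# Hypothetical training data: F1/F2 (Hz) measured over the first few pitch
# periods of the article's vowel, labelled by the onset class of the
# upcoming word (values are illustrative only).
X = np.array([[520., 1450.], [510., 1480.], [530., 1820.], [540., 1850.]])
y = np.array([0, 0, 1, 1])

def fit_gaussians(X, y):
    # Per-class means with a pooled, regularised diagonal variance.
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    var = X.var(axis=0) + 1e-6
    return classes, means, var

def predict(x, classes, means, var):
    # Pick the class whose Gaussian gives x the highest log-likelihood.
    loglik = -0.5 * np.sum((x - means) ** 2 / var + np.log(2 * np.pi * var),
                           axis=1)
    return classes[np.argmax(loglik)]

classes, means, var = fit_gaussians(X, y)
print(predict(np.array([525., 1500.]), classes, means, var))  # -> 0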
Daily hunger sensation monitoring as a tool for investigating human circadian synchronization.
Cugini, P; Camillieri, G; Alessio, L; Cristina, G; De Rosa, R; Petrangeli, C M
2000-03-01
This study investigates within-day hunger sensation (HS) variability in Clinically Healthy Subjects Adapted to Living in Antarctica (CHSALA), as compared with coeval subjects living in their mother country. The aim is to examine how the orectic stimulus behaves under those environmental conditions and occupational schedules, in order to investigate individual synchronization to the sleep-wake alternation and meal-time schedule. HS was estimated via a self-rating of its intensity on a Visual Analog Scale, with the rating repeated every 30 min except during sleep. The individual time-qualified HS scores (the orexigram) were analyzed according to conventional and chronobiological procedures. The orexigrams of the CHSALA showed a more cadenced intermittence during the diurnal part of the day, strictly related to meal timing, and a preserved circadian rhythm as well. In addition, these orexigrams resolved into a spectrum of harmonic components indicating a subsidiary number of ultradian formants. These findings are convincing evidence that the individual orexigram can be used to investigate whether or not a single subject is synchronized to the sleep-wake cycle, meal-time schedule and socio-occupational routines, instead of using more complex and expensive techniques involving automated equipment and biohumoral assays.
NASA Astrophysics Data System (ADS)
Cui, Sheng; Jin, Shang; Xia, Wenjuan; Ke, Changjian; Liu, Deming
2015-11-01
Symbol rate identification (SRI) based on asynchronous delayed sampling is accurate, cost-effective and robust to impairments. For on-off keying (OOK) signals the symbol rate can be derived from the periodicity of the second-order autocorrelation function (ACF2) of the delay-tap samples. However, when this method is applied to advanced modulation-format signals with auxiliary amplitude modulation (AAM), incorrect results may be produced, because AAM has a significant impact on the ACF2 periodicity and can make the symbol period difficult or even impossible to identify correctly. In this paper it is demonstrated that for these signals the first-order autocorrelation function (ACF1) has stronger periodicity and can replace ACF2 to produce more accurate and robust results. Utilizing the characteristics of the ACFs, an improved SRI method is proposed that accommodates both OOK and advanced modulation-format signals in a transparent manner. Furthermore, it is proposed that by minimizing the peak-to-average power ratio (PAPR) of the delay-tap samples with an additional tunable dispersion compensator (TDC), the limited dispersion tolerance can be extended to the desired values.
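As an illustration of reading a symbol period off an autocorrelation function, here is a toy NRZ-OOK example; it is not the paper's estimator, and the sampling rate, symbol rate and threshold rule are all assumptions.

import numpy as np

# Toy NRZ-OOK intensity waveform; all parameter values are assumptions.
rng = np.random.default_rng(0)
fs = 100e9                # sampling rate (Hz)
rsym = 10e9               # true symbol rate (Hz)
sps = int(fs / rsym)      # samples per symbol
bits = rng.integers(0, 2, 2000)
x = np.repeat(bits.astype(float), sps)

# First-order autocorrelation versus tap delay; for ideal NRZ-OOK it
# decays linearly and first reaches ~0 at a lag of one symbol period.
xc = x - x.mean()
acf1 = np.array([np.mean(xc[:-k] * xc[k:]) for k in range(1, 4 * sps)])
k0 = int(np.argmax(acf1 <= 0.01 * acf1[0])) + 1   # first ~zero crossing
print(f"estimated symbol rate ~ {fs / k0:.3e} Hz")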
Measures to Evaluate the Effects of DBS on Speech Production
Weismer, Gary; Yunusova, Yana; Bunton, Kate
2011-01-01
The purpose of this paper is to review and evaluate measures of speech production that could be used to document effects of Deep Brain Stimulation (DBS) on speech performance, especially in persons with Parkinson disease (PD). A small set of evaluative criteria for these measures is presented first, followed by consideration of several speech physiology and speech acoustic measures that have been studied frequently and reported on in the literature on normal speech production, and speech production affected by neuromotor disorders (dysarthria). Each measure is reviewed and evaluated against the evaluative criteria. Embedded within this review and evaluation is a presentation of new data relating speech motions to speech intelligibility measures in speakers with PD, amyotrophic lateral sclerosis (ALS), and control speakers (CS). These data are used to support the conclusion that at the present time the slope of second formant transitions (F2 slope), an acoustic measure, is well suited to make inferences to speech motion and to predict speech intelligibility. The use of other measures should not be ruled out, however, and we encourage further development of evaluative criteria for speech measures designed to probe the effects of DBS or any treatment with potential effects on speech production and communication skills. PMID:24932066
Perception of temporally modified speech in auditory neuropathy.
Hassan, Dalia Mohamed
2011-01-01
Disrupted auditory nerve activity in auditory neuropathy (AN) significantly impairs the sequential processing of auditory information, resulting in poor speech perception. This study investigated the ability of AN subjects to perceive temporally modified consonant-vowel (CV) pairs and shed light on their phonological awareness skills. Four Arabic CV pairs were selected: /ki/-/gi/, /to/-/do/, /si/-/sti/ and /so/-/zo/. The formant transitions in consonants and the pauses between CV pairs were prolonged. Rhyming, segmentation and blending skills were tested using words at a natural rate of speech and with prolongation of the speech stream. Fourteen adult AN subjects were compared to a matched group of cochlear-impaired patients in their perception of acoustically processed speech. The AN group distinguished the CV pairs at a low speech rate, in particular with modification of the consonant duration. Phonological awareness skills deteriorated in adult AN subjects but improved with prolongation of the speech inter-syllabic time interval. A rehabilitation program for AN should consider temporal modification of speech, training for auditory temporal processing and the use of devices with innovative signal processing schemes. Verbal modifications as well as visual imaging appear to be promising compensatory strategies for remediating the affected phonological processing skills.
Acoustical study of the development of stop consonants in children
NASA Astrophysics Data System (ADS)
Imbrie, Annika K.
2003-10-01
This study focuses on the acoustic patterns of stop consonants and adjacent vowels as they develop in young children (ages 2.6-3.3 years) over a six-month period. The acoustic properties that are being measured for stop consonants include spectra of bursts, frication noise and aspiration noise, and formant movements. Additionally, acoustic landmarks are labeled for measurements of durations of events determined by these landmarks. These acoustic measurements are being interpreted in terms of the supraglottal, laryngeal, and respiratory actions that give rise to them. Preliminary data show that some details of the child's gestures are still far from achieving the adult pattern. The burst of frication noise at the release tends to be shorter than adult values, and often consists of multiple bursts. From the burst spectrum, the place of articulation appears to be normal. Finally, coordination of closure of the glottis and release of the primary articulator is still quite variable, as is apparent from a large standard deviation in VOT. Analysis of longitudinal data on young children will result in better models of the development of the coordination of articulation, phonation, and respiration for motor speech production. [Work supported by NIH Grants Nos. DC00038 and DC00075.]
Acoustical study of the development of stop consonants in children
NASA Astrophysics Data System (ADS)
Imbrie, Annika K.
2004-05-01
This study focuses on the acoustic patterns of stop consonants and adjacent vowels as they develop in young children (ages 2.6-3.3) over a 6-month period. The acoustic properties that are being measured for stop consonants include spectra of bursts, frication noise and aspiration noise, and formant movements. Additionally, acoustic landmarks are labeled for measurements of durations of events determined by these landmarks. These acoustic measurements are being interpreted in terms of the supraglottal, laryngeal, and respiratory actions that give rise to them. Preliminary data show that some details of the child's gestures are still far from achieving the adult pattern. The burst of frication noise at the release tends to be shorter than adult values, and often consists of multiple bursts, possibly due to greater compliance of the active articulator. From the burst spectrum, the place of articulation appears to be normal. Finally, coordination of closure of the glottis and release of the primary articulator is still quite variable, as is apparent from a large standard deviation in VOT. Analysis of longitudinal data on young children will result in better models of the development of motor speech production. [Work supported by NIH Grants DC00038 and DC00075.]
Keller, Peter E.; König, Rasmus; Novembre, Giacomo
2017-01-01
Human interaction through music is a vital part of social life across cultures. Influential accounts of the evolutionary origins of music favor cooperative functions related to social cohesion or competitive functions linked to sexual selection. However, work on non-human “chorusing” displays, as produced by congregations of male insects and frogs to attract female mates, suggests that cooperative and competitive functions may coexist. In such chorusing, rhythmic coordination between signalers, which maximizes the salience of the collective broadcast, can arise through competitive mechanisms by which individual males jam rival signals. Here, we show that mixtures of cooperative and competitive behavior also occur in human music. Acoustic analyses of the renowned St. Thomas Choir revealed that, in the presence of female listeners, boys with the deepest voices enhance vocal brilliance and carrying power by boosting high spectral energy. This vocal enhancement may reflect sexually mature males competing for female attention in a covert manner that does not undermine collaborative musical goals. The evolutionary benefits of music may thus lie in its aptness as a medium for balancing sexually motivated behavior and group cohesion. PMID:28959222
Advancements in text-to-speech technology and implications for AAC applications
NASA Astrophysics Data System (ADS)
Syrdal, Ann K.
2003-10-01
Intelligibility was the initial focus in text-to-speech (TTS) research, since it is clearly a necessary condition for the application of the technology. Sufficiently high intelligibility (approximating human speech) has been achieved in the last decade by the better formant-based and concatenative TTS systems. This led to commercially available TTS systems for highly motivated users, particularly the blind and vocally impaired. Some unnatural qualities of TTS were exploited by these users, such as very fast speaking rates and altered pitch ranges for flagging relevant information. Recently, the focus in TTS research has turned to improving naturalness, so that synthetic speech sounds more human and less robotic. Unit selection approaches to concatenative synthesis have dramatically improved TTS quality, although at the cost of larger and more complex systems. This advancement in naturalness has made TTS technology more acceptable to the general public. The vocally impaired appreciate a more natural voice with which to represent themselves when communicating with others. Unit selection TTS does not achieve such high speaking rates as the earlier TTS systems, however, which is a disadvantage to some AAC device users. An important new research emphasis is to improve and increase the range of emotional expressiveness of TTS.
Precursors of Dancing and Singing to Music in Three- to Four-Months-Old Infants
Fujii, Shinya; Watanabe, Hama; Oohashi, Hiroki; Hirashima, Masaya; Nozaki, Daichi; Taga, Gentaro
2014-01-01
Dancing and singing to music involve auditory-motor coordination and have been essential to our human culture since ancient times. Although scholars have been trying to understand the evolutionary and developmental origin of music, early human developmental manifestations of auditory-motor interactions in music have not been fully investigated. Here we report limb movements and vocalizations in three- to four-months-old infants while they listened to music and were in silence. In the group analysis, we found no significant increase in the amount of movement or in the relative power spectrum density around the musical tempo in the music condition compared to the silent condition. Intriguingly, however, there were two infants who demonstrated striking increases in the rhythmic movements via kicking or arm-waving around the musical tempo during listening to music. Monte-Carlo statistics with phase-randomized surrogate data revealed that the limb movements of these individuals were significantly synchronized to the musical beat. Moreover, we found a clear increase in the formant variability of vocalizations in the group during music perception. These results suggest that infants at this age are already primed with their bodies to interact with music via limb movements and vocalizations. PMID:24837135
Frequency modulation detection in cochlear implant subjects
NASA Astrophysics Data System (ADS)
Chen, Hongbin; Zeng, Fan-Gang
2004-10-01
Frequency modulation (FM) detection was investigated in acoustic and electric hearing to characterize cochlear-implant subjects' ability to detect dynamic frequency changes and to assess the relative contributions of temporal and spectral cues to frequency processing. Difference limens were measured for frequency upward sweeps, downward sweeps, and sinusoidal FM as a function of standard frequency and modulation rate. In electric hearing, factors including electrode position and stimulation level were also studied. Electric hearing data showed that the difference limen increased monotonically as a function of standard frequency regardless of the modulation type, the modulation rate, the electrode position, and the stimulation level. In contrast, acoustic hearing data showed that the difference limen was nearly constant as a function of standard frequency. This difference was interpreted to mean that temporal cues are used only at low standard frequencies and at low modulation rates. At higher standard frequencies and modulation rates, the reliance on the place cue is increased, accounting for the better performance in acoustic hearing than in electric hearing with single-electrode stimulation. The present data suggest a speech processing strategy that encodes slow frequency changes using lower stimulation rates than those typically employed by contemporary cochlear-implant speech processors.
NASA Astrophysics Data System (ADS)
Popescu, Gheorghe
2001-06-01
An international frequency comparison was carried out at the Bundesamt für Eich- und Vermessungswesen (BEV), Vienna, within the framework of EUROMET Project #498 from August 29 to September 5, 1999. The frequency differences obtained when the RO1 laser from the National Institute for Laser, Plasma and Radiation Physics (NILPRP), Romania, was compared with five lasers from Austria (BEV1), the Czech Republic (PLD1), France (BIPM3), Poland (GUM1) and Hungary (OMH1) are reported. Frequency differences were computed using the matrix determinations for the groups d, e, f, g. Considering the frequency differences measured for a group of three lasers compared with each other, we call the closing frequency the difference between the measured and the expected frequency difference (the latter resulting from the previous two measurements). For the RO1 laser, with BIPM3 as the reference laser, the closing frequencies ranged from +8.1 kHz to -3.8 kHz. The relative Allan standard deviation was used to express the frequency stability and was 3.8 parts in 10^12 for a 100 s sampling time and a 14000 s measurement duration. The averaged offset frequency relative to the BIPM4 stationary laser was 5.6 kHz, with a standard deviation of 9.9 kHz.
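For reference, the non-overlapping form of the Allan variance quoted above is, for M consecutive fractional-frequency averages \(\bar{y}_i\) taken over the sampling time \(\tau\),

\[
\sigma_y^2(\tau) = \frac{1}{2(M-1)} \sum_{i=1}^{M-1} \left(\bar{y}_{i+1} - \bar{y}_i\right)^2,
\]

so that \(\tau = 100\) s over a 14000 s record gives \(M \approx 140\) averages behind the quoted \(\sigma_y(100\,\mathrm{s}) \approx 3.8 \times 10^{-12}\). (The paper may have used an overlapping variant; the definition above is the standard one.)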
High sensitivity of p-modes near the acoustic cutoff frequency to solar model parameters
NASA Technical Reports Server (NTRS)
Guenther, D. B.
1991-01-01
The p-mode frequencies of low l have been calculated for solar models with initial helium mass fraction varying from Y = 0.2753 to 0.2875. The differences in frequency of the p-modes in the range 2500-4500 microHz do not exceed 1-5 microHz among the models, but in the vicinity of the acoustic cutoff frequency, near 5000 microHz, the p-mode frequency differences are enhanced by a factor of 4. The enhanced sensitivity of p-modes near the acoustic cutoff frequency was further tested by calculating and comparing low-l p-mode frequencies for two solar models, one incorporating the Eddington T-tau relation and the other the Krishna Swamy T-tau relation. Again, it is found that p-modes with frequencies near the acoustic cutoff frequency show a significant increase in sensitivity to the different T-tau relations, compared with lower-frequency p-modes. It is noted that frequencies above the acoustic cutoff frequency are complex and hence cannot be modeled by the adiabatic pulsation code (which assumes real eigenfrequencies) used in these calculations.
Wurdeman, Shane R; Huisinga, Jessie M; Filipi, Mary; Stergiou, Nicholas
2011-02-01
Multiple sclerosis is a progressive neurological disease that results in a high incidence of gait disturbance. Exploring the frequency content of the ground reaction forces generated during walking may provide additional insight into gait in patients with multiple sclerosis and could lead to specific tools for differential diagnosis. The purpose of this study was to investigate differences in the frequency content of these forces in an effort to contribute to improved clinical management of this disease. Eighteen patients and eighteen healthy controls walked across a 10-meter-long walkway. The anterior-posterior and vertical ground reaction forces generated during the stance phase of gait were evaluated in the frequency domain using the fast Fourier transform. T-tests were used to compare the median frequency, the 99.5% frequency, and the frequency bandwidth between patients and healthy controls, and also between patients with mild and moderate severity. Patients with multiple sclerosis had a significantly lower 99.5% frequency (P=0.006) and median frequency (P<0.001) in the vertical ground reaction force. No differences were found in the frequency content of the anterior-posterior force. There were no differences between patients with mild and moderate severity. The lower frequency content suggests lesser vertical oscillation of the center of gravity. The lack of differences between severities may indicate that differences are present before currently established diagnostic timelines. Analysis of the frequency content may potentially provide earlier diagnostic assessment of this debilitating disease.
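A sketch of how such frequency-domain measures can be computed; one common definition of the median and 99.5% frequencies (the frequencies below which 50% and 99.5% of the spectral power lies) is assumed here, and the paper's exact windowing may differ.

import numpy as np

def spectral_measures(force, fs):
    # Median frequency and 99.5% frequency of a stance-phase GRF signal.
    f = np.fft.rfftfreq(force.size, d=1 / fs)
    power = np.abs(np.fft.rfft(force - force.mean())) ** 2
    cum = np.cumsum(power) / power.sum()
    return f[np.searchsorted(cum, 0.5)], f[np.searchsorted(cum, 0.995)]

# Hypothetical example: 0.7 s of stance sampled at 1 kHz.
fs = 1000.0
t = np.arange(0, 0.7, 1 / fs)
grf = 800 * np.sin(np.pi * t / 0.7) + 50 * np.sin(2 * np.pi * 6 * t)
print(spectral_measures(grf, fs))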
Compact flexible multifrequency splitter based on plasmonic graded metallic grating arc waveguide.
Han, Chao; Wang, Zhaohong; Chu, Yangyang; Zhao, Xiaodan; Zhang, Xuanru
2018-04-15
A compact flexible multifrequency splitter based on an arc waveguide constructed of plasmonic metallic grating structures with graded-height T-grooves is proposed and studied. The dispersion curves and cutoff frequencies of the plasmonic grating waveguides differ with the T-groove metallic grating height. The guided spoof surface plasmon polariton waves at different frequencies can therefore be localized at dissimilar angles along the graded grating arc waveguide. Output flexibility at an arbitrary groove for different frequencies is realized by introducing an additional symmetrical T-groove structure as an output. Compact four-, seven-, and eight-output frequency splitters demonstrate the design's flexible multifrequency separation capability at different output-angle locations, with no increase in the dimensional size of the splitters. Measurement results at microwave frequencies display excellent agreement with numerical simulation results.
Wang, Mingjun; Zhou, Yufeng
2018-04-01
Inertial cavitation thresholds, defined here as bubble growth to twice the equilibrium radius, were calculated for two types of ultrasonic excitation: the classical single-frequency mode and the dual-frequency mode. The effect of dual-frequency excitation on the inertial cavitation threshold in different surrounding media (fluid and tissue) was studied, and the key parameters (driving frequency, amplitude ratio, phase difference, and frequency ratio) were optimized to maximize inertial cavitation. The numerical predictions confirm previous experimental results that dual-frequency excitation can reduce the inertial cavitation threshold in comparison to single-frequency excitation at the same output power. Dual-frequency excitation at the higher frequencies (i.e., 3.1 + 3.5 MHz vs. 1.1 + 1.3 MHz) is preferred in this study. The simulation results suggest that equal amplitudes of the individual components, zero phase difference, and a large frequency difference are beneficial for enhancing bubble cavitation. Overall, this work may provide a theoretical model for further investigation of dual-frequency excitation and guidance for its application toward better outcomes.
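For concreteness, a generic form of the dual-frequency driving pressure with the four optimized parameters made explicit (a sketch; the paper's exact bubble-dynamics equations are not reproduced here):

\[
p_d(t) = P_1 \sin(2\pi f_1 t) + P_2 \sin(2\pi f_2 t + \Delta\phi),
\]

with driving frequencies \(f_1, f_2\) (frequency ratio \(f_2/f_1\)), amplitude ratio \(P_2/P_1\), and phase difference \(\Delta\phi\); comparing against single-frequency excitation at the same output power amounts to holding \(P_1^2 + P_2^2\) constant.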
Spatial-frequency spectrum of patterns changes the visibility of spatial-phase differences
NASA Technical Reports Server (NTRS)
Lawton, T. B.
1985-01-01
It is shown that spatial-frequency components over a 4-octave range affected the visibility of spatial-phase differences. Contrast thresholds were measured for discrimination between two (+45- and -45-deg) spatial phases of a sinusoidal test grating added to a background grating. The background could contain one or several sinusoidal components, all in 0-deg phase. Phase differences between the test and the background were visible at lower contrasts when test and background frequencies were harmonically related than when they were not, when test and background frequencies were within 1 octave than when they were farther apart, when the fundamental frequency of the background was low than when it was high, and for some discriminations more than for others, after practice. The visibility of phase differences was not affected by additional components in the background if the fundamental and difference frequencies of the background remained unchanged. Observers' reports of their strategies gave information about the types of attentive processing that were used to discriminate phase differences. Attentive processing facilitated phase discrimination for multifrequency gratings spanning a much wider range of spatial frequencies than would be possible by using only local preattentive processing. These results were consistent with the visibility of phase differences being processed by some combination of even- and odd-symmetric simple cells tuned to a wide range of different spatial frequencies.
Xue, L J; Yang, A C; Chen, H; Huang, W X; Guo, J J; Liang, X Y; Chen, Z Q; Zheng, Q L
2017-11-20
Objective: To study how incorporating different weighted high-frequency hearing-threshold values affects the results and grading of occupational noise-induced deafness diagnoses, in order to provide a theoretical basis for revising the diagnostic criteria for occupational noise-induced deafness. Methods: A retrospective study was conducted of cases diagnosed with occupational noise-induced deafness at the Guangdong Province Hospital for Occupational Disease Prevention and Treatment from January 2016 to January 2017. Based on the results of three hearing tests per case, each separated by more than 3 days, the best threshold at each frequency was obtained. Using the 2007 edition of the diagnostic criteria for occupational noise-induced deafness, chi-square tests, t-tests and analysis of variance were run in SPSS 21.0 to test for differences between the speech-frequency means and the high-frequency weighted values across age groups, noise-exposure groups and diagnostic classifications. Results: 1. A total of 168 cases met the study criteria, 154 male and 14 female; the average age was 41.18 ± 6.07 years. 2. The diagnosis rate increased when different high-frequency weighted values were used instead of the pure speech-frequency mean: weighting 4 kHz increased it by 13.69% (χ²=9.880, P=0.002), 6 kHz by 15.47% (χ²=9.985, P=0.002) and 4 kHz+6 kHz by 15.47% (χ²=9.985, P=0.002); these differences were statistically significant. The diagnostic rates for the different high-frequency weightings did not differ significantly between the sexes. 3. When cases were divided into a ≤40-years group (group A) and a 40-50-years group (group B), the diagnostic rates with high-frequency weighting of 4 kHz (group A χ²=3.380, P=0.050; group B χ²=4.054, P=0.032), 6 kHz (group A χ²=6.362, P=0.012; group B χ²=4.054, P=0.032) and 4 kHz+6 kHz (group A χ²=6.362, P=0.012; group B χ²=4.054, P=0.032) were significantly higher than those based on the speech-frequency mean within the same group. There was no significant difference between the age groups (χ²=2.265, P=0.944). 4. Comparing the better ear's pure speech-frequency mean and the different high-frequency weighted values across working-years groups, the group with more than 10 working years had significantly higher mean thresholds in each frequency band than the 3-5-years group (F=2.271, P=0.001) and the 6-10-years group (F=1.563, P=0.046). The high-frequency weighted values were all higher than the pure speech-frequency mean, with the 4 kHz+6 kHz weighting showing the largest difference, an average increase of 2.83 dB. 5. The diagnostic rate with the different high-frequency weightings was higher than with pure speech frequency for the mild, moderate and severe grades. For mild occupational noise-induced deafness, apart from the 3 kHz weighting (χ²=3.117, P=0.077), which showed no significant difference, the weightings of 4 kHz (χ²=10.835, P=0.001), 6 kHz (χ²=9.985, P=0.002), 3 kHz+4 kHz (χ²=6.315, P=0.012), 3 kHz+6 kHz (χ²=6.315, P=0.012), 4 kHz+6 kHz (χ²=9.985, P=0.002) and 3 kHz+4 kHz+6 kHz (χ²=7.667, P=0.002) all gave significantly higher diagnosis rates than the pure speech-frequency mean. There were no significant differences for the moderate and severe grades (P>0.05). Conclusion: Incorporating weighted high-frequency hearing-threshold values increases the diagnostic rate of occupational noise-induced deafness; the 4 kHz, 6 kHz and 4 kHz+6 kHz weightings affect the result most strongly, and the 4 kHz+6 kHz weighted threshold has the greatest effect on the diagnosis.
Strings on a Violin: Location Dependence of Frequency Tuning in Active Dendrites.
Das, Anindita; Rathour, Rahul K; Narayanan, Rishikesh
2017-01-01
Strings on a violin are tuned to generate distinct sound frequencies in a manner that is firmly dependent on finger location along the fingerboard. Sound frequencies emerging from different violins could be very different based on their architecture, the nature of strings and their tuning. Analogously, active neuronal dendrites, dendrites endowed with active channel conductances, are tuned to distinct input frequencies in a manner that is dependent on the dendritic location of the synaptic inputs. Further, disparate channel expression profiles and differences in morphological characteristics could result in dendrites on different neurons of the same subtype tuned to distinct frequency ranges. Alternately, similar location-dependence along dendritic structures could be achieved through disparate combinations of channel profiles and morphological characteristics, leading to degeneracy in active dendritic spectral tuning. Akin to strings on a violin being tuned to different frequencies than those on a viola or a cello, different neuronal subtypes exhibit distinct channel profiles and disparate morphological characteristics endowing each neuronal subtype with unique location-dependent frequency selectivity. Finally, similar to the tunability of musical instruments to elicit distinct location-dependent sounds, neuronal frequency selectivity and its location-dependence are tunable through activity-dependent plasticity of ion channels and morphology. In this morceau, we explore the origins of neuronal frequency selectivity, and survey the literature on the mechanisms behind the emergence of location-dependence in distinct forms of frequency tuning. As a coda to this composition, we present some future directions for this exciting convergence of biophysical mechanisms that endow a neuron with frequency multiplexing capabilities.
Using wavelets to decompose the time frequency effects of monetary policy
NASA Astrophysics Data System (ADS)
Aguiar-Conraria, Luís; Azevedo, Nuno; Soares, Maria Joana
2008-05-01
Central banks have different objectives in the short and long run. Governments operate simultaneously at different timescales. Many economic processes are the result of the actions of several agents, who have different term objectives. Therefore, a macroeconomic time series is a combination of components operating on different frequencies. Several questions about economic time series are connected to the understanding of the behavior of key variables at different frequencies over time, but this type of information is difficult to uncover using pure time-domain or pure frequency-domain methods. To our knowledge, for the first time in an economic setup, we use cross-wavelet tools to show that the relation between monetary policy variables and macroeconomic variables has changed and evolved with time. These changes are not homogeneous across the different frequencies.
A New Test of Attention in Listening (TAIL) Predicts Auditory Performance
Zhang, Yu-Xuan; Barry, Johanna G.; Moore, David R.; Amitay, Sygal
2012-01-01
Attention modulates auditory perception, but there are currently no simple tests that specifically quantify this modulation. To fill the gap, we developed a new, easy-to-use test of attention in listening (TAIL) based on reaction time. On each trial, two clearly audible tones were presented sequentially, either at the same or different ears. The frequency of the tones was also either the same or different (by at least two critical bands). When the task required same/different frequency judgments, presentation at the same ear significantly speeded responses and reduced errors. A same/different ear (location) judgment was likewise facilitated by keeping tone frequency constant. Perception was thus influenced by involuntary orienting of attention along the task-irrelevant dimension. When information in the two stimulus dimensions was congruent (same-frequency same-ear, or different-frequency different-ear), response was faster and more accurate than when it was incongruent (same-frequency different-ear, or different-frequency same-ear), suggesting the involvement of executive control to resolve conflicts. In total, the TAIL yielded five independent outcome measures: (1) baseline reaction time, indicating information processing efficiency, (2) involuntary orienting of attention to frequency and (3) location, and (4) conflict resolution for frequency and (5) location. Processing efficiency and conflict resolution accounted for up to 45% of individual variances in the low- and high-threshold variants of three psychoacoustic tasks assessing temporal and spectral processing. Involuntary orientation of attention to the irrelevant dimension did not correlate with perceptual performance on these tasks. Given that TAIL measures are unlikely to be limited by perceptual sensitivity, we suggest that the correlations reflect modulation of perceptual performance by attention. The TAIL thus has the power to identify and separate contributions of different components of attention to auditory perception. PMID:23300934
Semi-automatic, octave-spanning optical frequency counter.
Liu, Tze-An; Shu, Ren-Huei; Peng, Jin-Long
2008-07-07
This work presents and demonstrates a semi-automatic optical frequency counter with octave-spanning counting capability using two fiber laser combs operated at different repetition rates. Monochromators are utilized to provide an approximate frequency of the laser under measurement to determine the mode number difference between the two laser combs. The exact mode number of the beating comb line is obtained from the mode number difference and the measured beat frequencies. The entire measurement process, except the frequency stabilization of the laser combs and the optimization of the beat signal-to-noise ratio, is controlled by a computer running a semi-automatic optical frequency counter.
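The mode-number arithmetic behind the scheme can be sketched with the standard comb relations (signs and offsets simplified here):

\[
f_x = n_k\, f_{\mathrm{rep},k} + f_{\mathrm{ceo},k} \pm f_{b,k}, \qquad k = 1, 2,
\]

one equation per comb, where \(n_k\) is the integer mode number, \(f_{\mathrm{rep},k}\) the repetition rate, \(f_{\mathrm{ceo},k}\) the carrier-envelope offset and \(f_{b,k}\) the measured beat. Equating the two right-hand sides ties \(n_1\) and \(n_2\) together; a coarse monochromator estimate of \(f_x\) then suffices to fix the integer difference \(n_1 - n_2\), after which the exact mode numbers, and hence \(f_x\), follow from the measured beat frequencies alone.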
The frequency-difference and frequency-sum acoustic-field autoproducts.
Worthmann, Brian M; Dowling, David R
2017-06-01
The frequency-difference and frequency-sum autoproducts are quadratic products of solutions of the Helmholtz equation at two different frequencies (ω+ and ω-), and may be constructed from the Fourier transform of any time-domain acoustic field. Interestingly, the autoproducts may carry wave-field information at the difference (ω+ - ω-) and sum (ω+ + ω-) frequencies even though these frequencies may not be present in the original acoustic field. This paper provides analytical and simulation results that justify and illustrate this possibility, and indicate its limitations. The analysis is based on the inhomogeneous Helmholtz equation and its solutions while the simulations are for a point source in a homogeneous half-space bounded by a perfectly reflecting surface. The analysis suggests that the autoproducts have a spatial phase structure similar to that of a true acoustic field at the difference and sum frequencies if the in-band acoustic field is a plane or spherical wave. For multi-ray-path environments, this phase structure similarity persists in portions of the autoproduct fields that are not suppressed by bandwidth averaging. Discrepancies between the bandwidth-averaged autoproducts and true out-of-band acoustic fields (with potentially modified boundary conditions) scale inversely with the product of the bandwidth and ray-path arrival time differences.
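In symbols, for a complex pressure field \(P(\mathbf{r}, \omega)\) the two autoproducts are the quadratic combinations

\[
AP_{\Delta}(\mathbf{r}) = P(\mathbf{r}, \omega_+)\, P^{*}(\mathbf{r}, \omega_-),
\qquad
AP_{\Sigma}(\mathbf{r}) = P(\mathbf{r}, \omega_+)\, P(\mathbf{r}, \omega_-).
\]

For a free-space spherical wave \(P = e^{i\omega r / c}/(4\pi r)\), for instance, \(AP_{\Delta} \propto e^{i(\omega_+ - \omega_-) r / c}\): the autoproduct carries the spatial phase of a field at the difference frequency, as described above.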
Controlling Energy Radiations of Electromagnetic Waves via Frequency Coding Metamaterials
Wu, Haotian; Liu, Shuo; Wan, Xiang; Zhang, Lei; Wang, Dan; Li, Lianlin
2017-01-01
Metamaterials are artificial structures composed of subwavelength unit cells to control electromagnetic (EM) waves. The spatial coding representation of a metamaterial has the ability to describe the material in a digital way. Spatial coding metamaterials are typically constructed from unit cells that have similar shapes with fixed functionality. Here, the concept of frequency coding metamaterial is proposed, which achieves different controls of EM energy radiations with a fixed spatial coding pattern when the frequency changes. In this case, not only different phase responses of the unit cells are considered, but different phase sensitivities are also required. Due to different frequency sensitivities of unit cells, two units with the same phase response at the initial frequency may have different phase responses at a higher frequency. To describe the frequency coding property of a unit cell, digitalized frequency sensitivity is proposed, in which the units are encoded with digits “0” and “1” to represent the low and high phase sensitivities, respectively. By this means, two degrees of freedom, spatial coding and frequency coding, are obtained to control the EM energy radiations by a new class of frequency-spatial coding metamaterials. The above concepts and physical phenomena are confirmed by numerical simulations and experiments. PMID:28932671
267 Spanish Exomes Reveal Population-Specific Differences in Disease-Related Genetic Variation
Dopazo, Joaquín; Amadoz, Alicia; Bleda, Marta; Garcia-Alonso, Luz; Alemán, Alejandro; García-García, Francisco; Rodriguez, Juan A.; Daub, Josephine T.; Muntané, Gerard; Rueda, Antonio; Vela-Boza, Alicia; López-Domingo, Francisco J.; Florido, Javier P.; Arce, Pablo; Ruiz-Ferrer, Macarena; Méndez-Vidal, Cristina; Arnold, Todd E.; Spleiss, Olivia; Alvarez-Tejado, Miguel; Navarro, Arcadi; Bhattacharya, Shomi S.; Borrego, Salud; Santoyo-López, Javier; Antiñolo, Guillermo
2016-01-01
Recent results from large-scale genomic projects suggest that allele frequencies, which are highly relevant for medical purposes, differ considerably across different populations. The need for a detailed catalog of local variability motivated the whole-exome sequencing of 267 unrelated individuals, representative of the healthy Spanish population. Like in other studies, a considerable number of rare variants were found (almost one-third of the described variants). There were also relevant differences in allelic frequencies in polymorphic variants, including ∼10,000 polymorphisms private to the Spanish population. The allelic frequencies of variants conferring susceptibility to complex diseases (including cancer, schizophrenia, Alzheimer disease, type 2 diabetes, and other pathologies) were overall similar to those of other populations. However, the trend is the opposite for variants linked to Mendelian and rare diseases (including several retinal degenerative dystrophies and cardiomyopathies) that show marked frequency differences between populations. Interestingly, a correspondence between differences in allelic frequencies and disease prevalence was found, highlighting the relevance of frequency differences in disease risk. These differences are also observed in variants that disrupt known drug binding sites, suggesting an important role for local variability in population-specific drug resistances or adverse effects. We have made the Spanish population variant server, a web page containing population frequency information for the complete list of the 170,888 variant positions we found, publicly available (http://spv.babelomics.org/). We show that it is fundamental to determine population-specific variant frequencies to distinguish real disease associations from population-specific polymorphisms. PMID:26764160
Interaural time sensitivity of high-frequency neurons in the inferior colliculus.
Yin, T C; Kuwada, S; Sujaku, Y
1984-11-01
Recent psychoacoustic experiments have shown that interaural time differences provide adequate cues for lateralizing high-frequency sounds, provided the stimuli are complex and not pure tones. We present here physiological evidence in support of these findings. Neurons of high best frequency in the cat inferior colliculus respond to interaural phase differences of amplitude modulated waveforms, and this response depends upon preservation of phase information of the modulating signal. Interaural phase differences were introduced in two ways: by interaural delays of the entire waveform and by binaural beats in which there was an interaural frequency difference in the modulating waveform. Results obtained with these two methods are similar. Our results show that high-frequency cells can respond to interaural time differences of amplitude modulated signals and that they do so by a sensitivity to interaural phase differences of the modulating waveform.
Li, Jiang; Meng, Xiang-Min; Li, Ru-Yi; Zhang, Ru; Zhang, Zheng; Du, Yi-Feng
2016-10-01
Studies have confirmed that low-frequency repetitive transcranial magnetic stimulation can decrease the activity of cortical neurons, and high-frequency repetitive transcranial magnetic stimulation can increase the excitability of cortical neurons. However, there are few studies concerning the use of different frequencies of repetitive transcranial magnetic stimulation on the recovery of upper-limb motor function after cerebral infarction. We hypothesized that different frequencies of repetitive transcranial magnetic stimulation in patients with cerebral infarction would produce different effects on the recovery of upper-limb motor function. This study enrolled 127 patients with upper-limb dysfunction during the subacute phase of cerebral infarction. These patients were randomly assigned to three groups. The low-frequency group comprised 42 patients who were treated with 1 Hz repetitive transcranial magnetic stimulation on the contralateral hemisphere primary motor cortex (M1). The high-frequency group comprised 43 patients who were treated with 10 Hz repetitive transcranial magnetic stimulation on ipsilateral M1. Finally, the sham group comprised 42 patients who were treated with 10 Hz of false stimulation on ipsilateral M1. A total of 135 seconds of stimulation was applied in the sham group and high-frequency group. At 2 weeks after treatment, cortical latency of motor-evoked potentials and central motor conduction time were significantly lower compared with before treatment. Moreover, motor function scores were significantly improved. The above indices for the low- and high-frequency groups were significantly different compared with the sham group. However, there was no significant difference between the low- and high-frequency groups. The results show that low- and high-frequency repetitive transcranial magnetic stimulation can similarly improve upper-limb motor function in patients with cerebral infarction.
Perception of the fundamental frequencies of children's voices by trained and untrained listeners.
Wilson, F B; Wellen, C J; Kimbarow, M L
1983-10-01
This study was designed to determine if trained voice clinicians were better than untrained listeners in judging differences in the fundamental frequencies of children's voices. We also attempted to determine the degree of difference in fundamental frequency necessary for accurate judgments. Finally, ability to perceive pitch differences in speaking voices was correlated with ability to judge puretone stimuli. Results indicated that trained clinicians were no better at judging average fundamental frequency than were untrained listeners. Both groups performed at chance level until differences in vocal fundamental frequency exceeded 20 Hz. Finally, there was no correlation between subjects' success on standardized puretone pitch tests and ability to judge average pitch in the speaking voice.
Shioiri, Satoshi; Matsumiya, Kazumichi
2009-05-29
We investigated spatiotemporal characteristics of motion mechanisms using a new type of motion aftereffect (MAE) we found. Our stimulus comprised two superimposed sinusoidal gratings with different spatial frequencies. After exposure to the moving stimulus, observers perceived the MAE in the static test in the direction opposite to that of the high spatial frequency grating even when low spatial frequency motion was perceived during adaptation. In contrast, in the flicker test, the MAE was perceived in the direction opposite to that of the low spatial frequency grating. These MAEs indicate that two different motion systems contribute to motion perception and can be isolated by using different test stimuli. Using a psychophysical technique based on the MAE, we investigated the differences between the two motion mechanisms. The results showed that the static MAE is the aftereffect of the motion system with a high spatial and low temporal frequency tuning (slow motion detector) and the flicker MAE is the aftereffect of the motion system with a low spatial and high temporal frequency tuning (fast motion detector). We also revealed that the two motion detectors differ in orientation tuning, temporal frequency tuning, and sensitivity to relative motion.
Kirchberger, Martin
2016-01-01
A novel algorithm for frequency lowering in music was developed and experimentally tested in hearing-impaired listeners. Harmonic frequency lowering (HFL) combines frequency transposition and frequency compression to preserve the harmonic content of music stimuli. Listeners were asked to make judgments regarding detail and sound quality in music stimuli. Stimuli were presented under different signal processing conditions: original, low-pass filtered, HFL, and nonlinear frequency compressed. Results showed that participants reported perceiving the most detail in the HFL condition. In addition, there was no difference in sound quality across conditions. PMID:26834122
Phase-locked loop with controlled phase slippage
Mestha, Lingappa K.
1994-01-01
A system for synchronizing a first subsystem controlled by a changing frequency sweeping from a first frequency to a second frequency, with a second subsystem operating at a steady state second frequency. Trip plan parameters are calculated in advance to determine the phase relationship between the frequencies of the first subsystem and second subsystem in order to obtain synchronism at the end of the frequency sweep of the first subsystem. During the time in which the frequency of the first subsystem is sweeping from the first frequency to the second frequency, the phase locked system compares the actual phase difference with the trip plan phase difference and incrementally changes the sweep frequency in a manner so that phase lock is achieved when the first subsystem reaches a frequency substantially identical to that of the second subsystem.
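A toy simulation of the idea; the gains, rates and correction law below are invented for illustration and are not taken from the patent.

import numpy as np

# Sweep f1 linearly toward the steady-state frequency f2 while servoing on
# the difference between the actual and a precomputed ("trip plan") phase
# difference, so the oscillators arrive phase-locked at the end of the sweep.
dt, T = 1e-6, 0.01
f1, f2 = 0.8e5, 1.0e5            # start and target frequencies (Hz)
rate = (f2 - f1) / T             # nominal sweep rate (Hz/s)
gain = 50.0                      # ad hoc correction gain (1/s)
n = int(T / dt)
t = np.arange(n) * dt
plan = 2 * np.pi * ((f1 - f2) * t + 0.5 * rate * t ** 2)  # planned phase diff

phi1 = phi2 = 0.0
f = f1
for k in range(n):
    err = (phi1 - phi2) - plan[k]                 # actual minus planned
    f += (rate - gain * err / (2 * np.pi)) * dt   # incremental frequency trim
    phi1 += 2 * np.pi * f * dt
    phi2 += 2 * np.pi * f2 * dt

print(f"final frequency offset: {f - f2:.3f} Hz")
print(f"final phase error: {phi1 - phi2 - plan[-1]:.3e} rad")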
Radar network communication through sensing of frequency hopping
Dowla, Farid; Nekoogar, Faranak
2013-05-28
In one embodiment, a radar communication system includes a plurality of radars having a communication range and being capable of operating at a sensing frequency and a reporting frequency, wherein the reporting frequency is different than the sensing frequency, each radar is adapted for operating at the sensing frequency until an event is detected, each radar in the plurality of radars has an identification/location frequency for reporting information different from the sensing frequency, a first radar of the radars which senses the event sends a reporting frequency corresponding to its identification/location frequency when the event is detected, and all other radars in the plurality of radars switch their reporting frequencies to match the reporting frequency of the first radar upon detecting the reporting frequency switch of a radar within the communication range. In another embodiment, a method is presented for communicating information in a radar system.
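A toy rendering of the reporting-frequency hand-off described in the first embodiment; the class and function names are illustrative, not taken from the patent.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Radar:
    ident_freq: float                    # unique identification/location frequency
    report_freq: Optional[float] = None  # None = still at the shared sensing frequency

def report_event(radars: List[Radar], detector: int) -> None:
    # The first radar to sense the event reports on its own frequency;
    # peers within communication range switch their reporting frequency to match.
    first = radars[detector]
    first.report_freq = first.ident_freq
    for r in radars:
        if r is not first:
            r.report_freq = first.ident_freq

net = [Radar(ident_freq=f) for f in (5.81e9, 5.83e9, 5.85e9)]
report_event(net, detector=1)
print([r.report_freq for r in net])      # all now on radar 1's frequency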
NASA Technical Reports Server (NTRS)
Maleki, Lutfollah (Inventor)
1993-01-01
Two different carrier frequencies modulated by a reference frequency are transmitted to each receiver to be synchronized therewith. Each receiver responds to local phase differences between the two received signals to correct the phase of one of them so as to maintain the corrected signal as a reliable synchronization reference.
Direct reading inductance meter
NASA Technical Reports Server (NTRS)
Kolby, R. B. (Inventor)
1977-01-01
A direct reading inductance meter comprised of a crystal oscillator and an LC tuned oscillator is presented. The oscillators function respectively to generate a reference frequency, f(r), and to generate an initial frequency, f(0), which when mixed produce a difference equal to zero. Upon connecting an inductor of small unknown value in the LC circuit to change its resonant frequency to f(x), a difference frequency (f(r)-f(x)) is produced that is very nearly a linear function of the inductance of the inductor. The difference frequency is measured and displayed on a linear scale in units of inductance.
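The near-linearity follows from a first-order expansion of the resonance formula; a sketch assuming the unknown inductance \(L_x\) adds in series to the tank inductance \(L_0\):

\[
f = \frac{1}{2\pi\sqrt{LC}}, \qquad
f(L_0 + L_x) = f_0\left(1 + \frac{L_x}{L_0}\right)^{-1/2} \approx f_0\left(1 - \frac{L_x}{2L_0}\right),
\]

so, with the initial difference nulled (\(f_r = f_0\)),

\[
f_r - f_x \approx \frac{f_0}{2L_0}\, L_x \quad \text{for } L_x \ll L_0,
\]

i.e. the difference frequency is proportional to the unknown inductance to first order, which is what permits a linear inductance scale.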
Graph Frequency Analysis of Brain Signals
Huang, Weiyu; Goldsberry, Leah; Wymbs, Nicholas F.; Grafton, Scott T.; Bassett, Danielle S.; Ribeiro, Alejandro
2016-01-01
This paper presents methods to analyze functional brain networks and signals from graph spectral perspectives. The notion of frequency and filters traditionally defined for signals supported on regular domains such as discrete time and image grids has been recently generalized to irregular graph domains, and defines brain graph frequencies associated with different levels of spatial smoothness across the brain regions. Brain network frequency also enables the decomposition of brain signals into pieces corresponding to smooth or rapid variations. We relate graph frequency with principal component analysis when the networks of interest denote functional connectivity. The methods are utilized to analyze brain networks and signals as subjects master a simple motor skill. We observe that brain signals corresponding to different graph frequencies exhibit different levels of adaptability throughout learning. Further, we notice a strong association between graph spectral properties of brain networks and the level of exposure to tasks performed, and recognize the most contributing and important frequency signatures at different levels of task familiarity. PMID:28439325
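A minimal sketch of the graph-frequency decomposition this line of work builds on: eigenvectors of the graph Laplacian play the role of Fourier modes, with eigenvalues ordering them from smooth to rapidly varying across the network. The 4-node connectivity matrix and signal below are arbitrary illustrations.

import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)  # toy functional-connectivity graph
L = np.diag(W.sum(axis=1)) - W             # combinatorial graph Laplacian
lam, V = np.linalg.eigh(L)                 # graph frequencies and modes

x = np.array([1.0, 0.9, 1.1, -2.0])        # a signal on the brain regions
x_hat = V.T @ x                            # graph Fourier transform
x_smooth = V[:, :2] @ x_hat[:2]            # keep the two lowest frequencies
print(lam)                                 # ascending graph frequencies
print(x_smooth)                            # spatially smooth part of x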
Cross, Deanna S; Ivacic, Lynn C; Stefanski, Elisha L; McCarty, Catherine A
2010-06-17
There is a lack of knowledge regarding the frequency of disease-associated polymorphisms in populations, and the population-attributable risk for many populations remains unknown. Factors that could affect the association of an allele with disease, either positively or negatively, such as race, ethnicity, and gender, may be impossible to determine without population-based allele frequencies. Here we used a panel of 51 polymorphisms previously associated with at least one disease and determined the allele frequencies within the entire Personalized Medicine Research Project population-based cohort. We compared these allele frequencies to those in dbSNP and other data sources, stratified by race. Differences in allele frequencies between self-reported race, region of origin, and sex were determined. There were 19,544 individuals who self-reported a single racial category; 19,027 (97.4%) self-reported white Caucasian, and 11,205 (57.3%) were female. Of the 11,208 (57%) individuals with an identifiable region of origin, 8,337 (74.4%) were German. Forty-one polymorphisms differed significantly between self-reported races at the 0.05 level. Stratification of our Caucasian population by self-reported region of origin revealed 19 polymorphisms that differed significantly (p = 0.05) between individuals of different origins. Further stratification of the population by gender revealed few significant differences in allele frequencies between the genders. This represents one of the largest population-based allele frequency studies to date. Stratification by self-reported race and region of origin revealed wide differences in allele frequencies, not only by race but also by region of origin within a single racial group. We report allele frequencies for our Asian/Hmong and American Indian populations; these two minority groups are not typically selected for population allele-frequency studies. Population-wide allele frequencies are important for the design and implementation of studies and for determining the relevance of a disease-associated polymorphism for a given population.
Lowet, Eric; Roberts, Mark; Hadjipapas, Avgis; Peter, Alina; van der Eerden, Jan; De Weerd, Peter
2015-02-01
Fine-scale temporal organization of cortical activity in the gamma range (∼25-80Hz) may play a significant role in information processing, for example by neural grouping ('binding') and phase coding. Recent experimental studies have shown that the precise frequency of gamma oscillations varies with input drive (e.g. visual contrast) and that it can differ among nearby cortical locations. This has challenged theories assuming widespread gamma synchronization at a fixed common frequency. In the present study, we investigated which principles govern gamma synchronization in the presence of input-dependent frequency modulations and whether they are detrimental for meaningful input-dependent gamma-mediated temporal organization. To this aim, we constructed a biophysically realistic excitatory-inhibitory network able to express different oscillation frequencies at nearby spatial locations. Similarly to cortical networks, the model was topographically organized with spatially local connectivity and spatially-varying input drive. We analyzed gamma synchronization with respect to phase-locking, phase-relations and frequency differences, and quantified the stimulus-related information represented by gamma phase and frequency. By stepwise simplification of our models, we found that the gamma-mediated temporal organization could be reduced to basic synchronization principles of weakly coupled oscillators, where input drive determines the intrinsic (natural) frequency of oscillators. The gamma phase-locking, the precise phase relation and the emergent (measurable) frequencies were determined by two principal factors: the detuning (intrinsic frequency difference, i.e. local input difference) and the coupling strength. In addition to frequency coding, gamma phase contained complementary stimulus information. Crucially, the phase code reflected input differences, but not the absolute input level. This property of relative input-to-phase conversion, contrasting with latency codes or slower oscillation phase codes, may resolve conflicting experimental observations on gamma phase coding. Our modeling results offer clear testable experimental predictions. We conclude that input-dependency of gamma frequencies could be essential rather than detrimental for meaningful gamma-mediated temporal organization of cortical activity.
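A stripped-down illustration of the weak-coupling picture the authors reduce their network to, written here in Kuramoto form (the paper's reduced model may differ): two phase oscillators whose intrinsic frequencies stand in for local input drive lock only when the detuning is small relative to the coupling, with a locked phase lag that grows with detuning.

import numpy as np

def phase_diff(f1, f2, K, T=5.0, dt=1e-4):
    # Integrate dtheta_i/dt = 2*pi*f_i + K*sin(theta_j - theta_i).
    th1 = th2 = 0.0
    for _ in range(int(T / dt)):
        d1 = 2 * np.pi * f1 + K * np.sin(th2 - th1)
        d2 = 2 * np.pi * f2 + K * np.sin(th1 - th2)
        th1, th2 = th1 + d1 * dt, th2 + d2 * dt
    return np.angle(np.exp(1j * (th1 - th2)))   # wrapped phase difference

K = 40.0                          # coupling strength (rad/s), arbitrary
print(phase_diff(41.0, 40.0, K))  # small detuning: locks with a small lag
print(phase_diff(55.0, 40.0, K))  # detuning beyond 2K/(2*pi) Hz: drifts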
Liang, Chun; Earl, Brian; Thompson, Ivy; Whitaker, Kayla; Cahn, Steven; Xiang, Jing; Fu, Qian-Jie; Zhang, Fawen
2016-01-01
Objective: The objectives of this study were: (1) to determine whether musicians are better able to detect frequency changes under quiet and noisy conditions; and (2) to use the acoustic change complex (ACC), a type of electroencephalographic (EEG) response, to understand the neural substrates of the difference between musicians and non-musicians in frequency-change detection. Methods: Twenty-four young normal-hearing listeners (12 musicians and 12 non-musicians) participated. All participants underwent psychoacoustic frequency-change detection tests with three types of stimuli: tones (base frequency of 160 Hz) containing frequency changes (Stim 1), tones containing frequency changes masked by low-level noise (Stim 2), and tones containing frequency changes masked by high-level noise (Stim 3). EEG data were recorded using tones (base frequencies of 160 and 1200 Hz, respectively) containing different magnitudes of frequency change (0, 5, and 50%, respectively). The late-latency evoked potential evoked by tone onset (onset LAEP, or N1-P2 complex) and that evoked by the frequency change within the tone (the acoustic change complex, or ACC, or N1′-P2′ complex) were analyzed. Results: Musicians significantly outperformed non-musicians in all stimulus conditions. The ACC and onset LAEP showed both similarities and differences. Increasing the magnitude of the frequency change increased ACC amplitudes. ACC measures differed significantly between musicians (larger P2′ amplitude) and non-musicians at the 160 Hz base frequency but not at 1200 Hz. Although onset-LAEP peak amplitudes appeared larger and latencies shorter in musicians than in non-musicians, the differences did not reach statistical significance. Onset-LAEP amplitude correlated significantly with ACC amplitude at the 160 Hz base frequency. Conclusion: Musicians do perform better than non-musicians in detecting frequency changes in quiet and noisy conditions. The ACC and onset LAEP may involve different but overlapping neural mechanisms. Significance: This is the first study to use the ACC to examine music-training effects. The ACC measures provide an objective tool for documenting the effects of musical training on frequency-change detection.
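Stimuli of the Stim 1 type can be approximated as a phase-continuous tone whose instantaneous frequency steps up partway through. The sketch below is a rough illustration; the duration, change position, and sample rate are assumptions, not the study's exact parameters:

```python
# Minimal sketch (assumed parameters): a tone with a mid-tone frequency
# change, analogous to a 160 Hz base tone containing a 5% upward change.
import numpy as np

fs = 44100                     # sample rate (Hz), assumed
dur, base_f, change = 1.0, 160.0, 0.05
t = np.arange(int(fs * dur)) / fs

# Piecewise instantaneous frequency: base_f, then base_f * (1 + change).
inst_f = np.where(t < dur / 2, base_f, base_f * (1 + change))

# Integrate frequency to phase so the waveform stays continuous at the change.
phase = 2 * np.pi * np.cumsum(inst_f) / fs
tone = 0.5 * np.sin(phase)
```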
Ma, Xiaomei; Wang, Di; Zhou, Yujing; Zhuo, Chuanjun; Qin, Wen; Zhu, Jiajia; Yu, Chunshui
2016-04-01
We aimed to investigate sex-dependent alterations in resting-state relative cerebral blood flow (rCBF), the amplitude of low-frequency fluctuations (ALFF) and rCBF-ALFF coupling in patients with schizophrenia. Resting-state functional magnetic resonance imaging and three-dimensional pseudo-continuous arterial spin labeling imaging were performed to obtain resting-state ALFF and rCBF in 95 schizophrenia patients and 99 healthy controls. Sex differences in rCBF and ALFF were compared in both groups. Diagnostic group differences in rCBF, ALFF and rCBF-ALFF coupling were assessed in male and female subjects separately. In both healthy controls and schizophrenia patients, males had higher rCBF in anterior brain regions and lower rCBF in posterior brain regions than females. In contrast to the multiple regions exhibiting sex differences in rCBF, only the left middle frontal gyrus showed a significant sex difference in ALFF. Among females, schizophrenia patients exhibited increased rCBF and ALFF in the basal ganglia, thalamus and hippocampus, and reduced rCBF and ALFF in frontal, parietal and occipital regions, compared with healthy controls; fewer brain regions showed diagnostic group differences in males. Brain regions with diagnostic group differences in rCBF and ALFF only partially overlapped. Only female patients exhibited increased rCBF-ALFF coupling relative to healthy females. The alterations in rCBF and ALFF in schizophrenia are thus sex-specific, which should be considered in future neuroimaging studies. rCBF and ALFF differ in their sensitivity to changes in neuronal activity in schizophrenia and provide complementary information.
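ALFF is conventionally computed as the mean FFT amplitude of a voxel's BOLD time series within the 0.01-0.08 Hz band, and the coupling reported above can be illustrated as an across-voxel correlation between the rCBF and ALFF maps. A minimal sketch on simulated data (the TR, volume count, and voxel count are assumptions):

```python
# Minimal sketch: ALFF per voxel as the mean 0.01-0.08 Hz FFT amplitude of a
# BOLD series, and rCBF-ALFF coupling as a spatial Pearson correlation.
import numpy as np

tr = 2.0                                   # repetition time (s), assumed
n_vols, n_voxels = 240, 1000
rng = np.random.default_rng(0)
bold = rng.standard_normal((n_voxels, n_vols))  # stand-in for preprocessed BOLD
cbf = rng.standard_normal(n_voxels)             # stand-in for an rCBF map

freqs = np.fft.rfftfreq(n_vols, d=tr)
band = (freqs >= 0.01) & (freqs <= 0.08)
amp = np.abs(np.fft.rfft(bold, axis=1)) * 2 / n_vols
alff = amp[:, band].mean(axis=1)                # ALFF per voxel

# Coupling: correlation between the rCBF and ALFF maps across voxels.
coupling = np.corrcoef(cbf, alff)[0, 1]
print(f"rCBF-ALFF coupling (Pearson r): {coupling:.3f}")
```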
Lamb, B C; Saleem, M; Scott, W; Thapa, N; Nevo, E
1998-05-01
We have studied whether there is natural genetic variation for mutation frequencies, and whether any such variation is environment-related. Mutation frequencies differed significantly between wild strains of the fungus Sordaria fimicola isolated from a harsher or a milder microscale environment in "Evolution Canyon," Israel. Strains from the harsher, drier, south-facing slope had higher frequencies of new spontaneous mutations and of accumulated mutations than strains from the milder, lusher, north-facing slope. Collective total mutation frequencies over many loci for ascospore pigmentation were 2.3, 3.5 and 4.4% for three strains from the south-facing slope, and 0.9, 1.1, 1.2, 1.3 and 1.3% for five strains from the north-facing slope. Some of this between-slope difference was inherited through two generations of selfing, with average spontaneous mutation frequencies of 1.9% for south-facing slope strains and 0.8% for north-facing slope strains. The remainder was caused by different frequencies of mutations arising in the original environments. There was also significant heritable genetic variation in mutation frequencies within slopes. Similar between-slope differences were found for ascospore germination-resistance to acriflavine, with much higher frequencies in strains from the south-facing slope. Such inherited variation provides a basis for natural selection for optimum mutation rates in each environment.
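The between-slope comparisons reduce to tests on proportions of mutant ascospores. As a minimal sketch (the spore totals are hypothetical; the abstract reports frequencies, not the underlying counts), a Fisher exact test comparing a 2.3% south-facing-slope strain with a 0.9% north-facing-slope strain might look like this:

```python
# Minimal sketch (hypothetical counts): testing whether mutation frequencies
# differ between strains from the two slopes.
from scipy.stats import fisher_exact

# (mutant, non-mutant) ascospore counts; 10,000 scored spores per strain
# is an illustrative assumption.
south = (230, 9770)    # 2.3% mutation frequency
north = (90, 9910)     # 0.9% mutation frequency

odds_ratio, p = fisher_exact([list(south), list(north)])
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.2g}")
```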