Effect of Vowel Context on the Recognition of Initial Consonants in Kannada.
Kalaiah, Mohan Kumar; Bhat, Jayashree S
2017-09-01
The present study was carried out to investigate the effect of vowel context on the recognition of Kannada consonants in quiet by young adults. A total of 17 young adults with normal hearing in both ears participated in the study. The stimuli included consonant-vowel syllables spoken by 12 native speakers of Kannada. The consonant recognition task was carried out in a closed-set format (fourteen-alternative forced choice). The present study showed an effect of vowel context on the perception of consonants. The maximum consonant recognition score was obtained in the /o/ vowel context, followed by the /a/ and /u/ vowel contexts, and then the /e/ context. The poorest consonant recognition score was obtained in the /i/ vowel context. Thus, vowel context affects the recognition of Kannada consonants, and the pattern of this effect appears unique to Kannada.
Rate and onset cues can improve cochlear implant synthetic vowel recognition in noise
Mc Laughlin, Myles; Reilly, Richard B.; Zeng, Fan-Gang
2013-01-01
Understanding speech in noise is difficult for most cochlear implant (CI) users. Speech-in-noise segregation cues are well understood for acoustic hearing but not for electric hearing. This study investigated the effects of stimulation rate and onset delay on synthetic vowel-in-noise recognition in CI subjects. In experiment I, synthetic vowels were presented at 50, 145, or 795 pulse/s and noise at the same three rates, yielding nine combinations. Recognition improved significantly if the noise had a lower rate than the vowel, suggesting that listeners can use temporal gaps in the noise to detect a synthetic vowel. This hypothesis is supported by accurate prediction of synthetic vowel recognition using a temporal integration window model. Using lower rates, a similar trend was observed in normal-hearing subjects. Experiment II found that for CI subjects, a vowel onset delay improved performance if the noise had a lower or higher rate than the synthetic vowel. These results show that differing rates or onset times can improve synthetic vowel-in-noise recognition, indicating a need to develop speech processing strategies that encode or emphasize these cues. PMID:23464025
Post interaural neural net-based vowel recognition
NASA Astrophysics Data System (ADS)
Jouny, Ismail I.
2001-10-01
Interaural head-related transfer functions are used to process speech signatures prior to neural-net-based recognition. Data representing the head-related transfer function of a dummy head were collected at MIT and made available on the Internet. These data are used to pre-process vowel signatures to mimic the effects of the human ear on speech perception. Signatures representing various vowels of the English language are then presented for recognition to a multi-layer perceptron trained with the back-propagation algorithm. The focus of this paper is to assess the effects of the human interaural system on vowel recognition performance, particularly when using a classification system that mimics the human brain, such as a neural net.
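A minimal sketch of the recognition stage described above, assuming the HRTF-processed vowel signatures have already been reduced to fixed-length feature vectors; the network size, training settings, toy data, and scikit-learn API are stand-ins for the paper's unspecified configuration.

```python
# Sketch of an MLP vowel classifier trained with backpropagation, as in
# the recognition stage above. HRTF pre-processing is assumed done; X
# holds one fixed-length spectral feature vector per vowel token.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tokens, n_features, n_vowels = 500, 64, 10   # hypothetical sizes
X = rng.normal(size=(n_tokens, n_features))    # stand-in for HRTF-filtered spectra
y = rng.integers(0, n_vowels, size=n_tokens)   # vowel class labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One hidden layer; stochastic gradient descent is the classic
# backpropagation training recipe.
net = MLPClassifier(hidden_layer_sizes=(32,), solver="sgd",
                    learning_rate_init=0.01, max_iter=2000, random_state=0)
net.fit(X_tr, y_tr)
print("held-out accuracy:", net.score(X_te, y_te))
```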
Dynamic Spectral Structure Specifies Vowels for Adults and Children
Nittrouer, Susan; Lowenstein, Joanna H.
2014-01-01
The dynamic specification account of vowel recognition suggests that formant movement between vowel targets and consonant margins is used by listeners to recognize vowels. This study tested that account by measuring contributions to vowel recognition of dynamic (i.e., time-varying) spectral structure and coarticulatory effects on stationary structure. Adults and children (four- and seven-year-olds) were tested with three kinds of consonant-vowel-consonant syllables: (1) unprocessed; (2) sine waves that preserved both stationary coarticulated and dynamic spectral structure; and (3) vocoded signals that primarily preserved the stationary, but not the dynamic, structure. Sections of two lengths were removed from syllable middles: (1) half the vocalic portion; and (2) all but the first and last three pitch periods. Adults performed accurately with unprocessed and sine-wave signals, as long as half the syllable remained; their recognition was poorer for vocoded signals, but above chance. Seven-year-olds performed more poorly than adults with both sorts of processed signals, but disproportionately worse with vocoded than sine-wave signals. Most four-year-olds were unable to recognize vowels at all with vocoded signals. Conclusions were that both dynamic and stationary coarticulated structures support vowel recognition for adults, but children attend to dynamic spectral structure more strongly because early phonological organization favors whole words. PMID:25536845
Congruent and Incongruent Semantic Context Influence Vowel Recognition
ERIC Educational Resources Information Center
Wotton, J. M.; Elvebak, R. L.; Moua, L. C.; Heggem, N. M.; Nelson, C. A.; Kirk, K. M.
2011-01-01
The influence of sentence context on the recognition of naturally spoken vowels degraded by reverberation and Gaussian noise was investigated. Target words were paired to have similar consonant sounds but different vowels (e.g., map/mop) and were embedded early in sentences which provided three types of semantic context. Fifty-eight…
Speaker recognition with temporal cues in acoustic and electric hearing
NASA Astrophysics Data System (ADS)
Vongphoe, Michael; Zeng, Fan-Gang
2005-08-01
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
Caballero-Morales, Santiago-Omar
2013-01-01
An approach for the recognition of emotions in speech is presented. The target language is Mexican Spanish, and for this purpose a speech database was created. The approach consists of the phoneme acoustic modelling of emotion-specific vowels. For this, a standard phoneme-based Automatic Speech Recognition (ASR) system was built with Hidden Markov Models (HMMs), where different phoneme HMMs were built for the consonants and for the emotion-specific vowels associated with four emotional states (anger, happiness, neutral, sadness). The emotional state of a spoken sentence is then estimated by counting the number of emotion-specific vowels found in the ASR's output for the sentence. With this approach, an accuracy of 87–100% was achieved for the recognition of the emotional state of Mexican Spanish speech. PMID:23935410
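A sketch of the counting rule described above, assuming a hypothetical tagging scheme in which the ASR output marks each recognized vowel with the emotion-specific model it matched; the paper's actual HMM symbol set may differ.

```python
# Decision rule: the estimated emotion is the one whose
# emotion-specific vowel models occur most often in the ASR output.
# The "<vowel>_<emotion>" tag format is a hypothetical stand-in.
from collections import Counter

def estimate_emotion(asr_output):
    counts = Counter(tag.split("_")[1] for tag in asr_output
                     if "_" in tag)          # keep only emotion-tagged vowels
    return counts.most_common(1)[0][0] if counts else "neutral"

# e.g., /a/ twice recognized by its "anger" vowel model:
hypothesized = ["k", "a_anger", "s", "a_anger", "o_neutral"]
print(estimate_emotion(hypothesized))        # -> "anger"
```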
Dissociation of tone and vowel processing in Mandarin idioms.
Hu, Jiehui; Gao, Shan; Ma, Weiyi; Yao, Dezhong
2012-09-01
Using event-related potentials, this study measured the access of suprasegmental (tone) and segmental (vowel) information in spoken word recognition with Mandarin idioms. Participants performed a delayed-response acceptability task, in which they judged the correctness of the last word of each idiom, which might deviate from the correct word in either tone or vowel. Results showed that, compared with the correct idioms, a larger early negativity appeared only for vowel violation. Additionally, a larger N400 effect was observed for vowel mismatch than tone mismatch. A control experiment revealed that these differences were not due to low-level physical differences across conditions; instead, they represented the greater constraining power of vowels than tones in the lexical selection and semantic integration of the spoken words. Furthermore, tone violation elicited a more robust late positive component than vowel violation, suggesting different reanalyses of the two types of information. In summary, the current results support a functional dissociation of tone and vowel processing in spoken word recognition. Copyright © 2012 Society for Psychophysiological Research.
Bouchon, Camillia; Floccia, Caroline; Fux, Thibaut; Adda-Decker, Martine; Nazzi, Thierry
2015-07-01
Consonants and vowels differ acoustically and articulatorily, but also functionally: Consonants are more relevant for lexical processing, and vowels for prosodic/syntactic processing. These functional biases could be powerful bootstrapping mechanisms for learning language, but their developmental origin remains unclear. The relative importance of consonants and vowels at the onset of lexical acquisition was assessed in French-learning 5-month-olds by testing sensitivity to minimal phonetic changes in their own name. Infants' reactions to mispronunciations revealed sensitivity to vowel but not consonant changes. Vowels were also more salient (in duration and intensity) but less distinct (on spectrally based measures) than consonants. Lastly, vowel (but not consonant) mispronunciation detection was modulated by acoustic factors, in particular spectrally based distance. These results establish that consonant changes do not affect lexical recognition at 5 months, while vowel changes do; the consonant bias observed later in development emerges only after 5 months, through additional language exposure. © 2014 John Wiley & Sons Ltd.
Soares, Ana Paula; Perea, Manuel; Comesaña, Montserrat
2014-01-01
Recent research with skilled adult readers has consistently revealed an advantage of consonants over vowels in visual-word recognition (i.e., the so-called "consonant bias"). Nevertheless, little is known about how early in development the consonant bias emerges. This work aims to address this issue by studying the relative contribution of consonants and vowels at the early stages of visual-word recognition in developing readers (2nd and 4th Grade children) and skilled adult readers (college students) using a masked priming lexical decision task. Target words starting either with a consonant or a vowel were preceded by a briefly presented masked prime (50 ms) that could be the same as the target (e.g., pirata-PIRATA [pirate-PIRATE]), a consonant-preserving prime (e.g., pureto-PIRATA), a vowel-preserving prime (e.g., gicala-PIRATA), or an unrelated prime (e.g., bocelo-PIRATA). Results revealed significant priming effects for the identity and consonant-preserving conditions in adult readers and 4th Grade children, whereas 2nd graders only showed priming for the identity condition. In adult readers, the advantage of consonants was observed both for words starting with a consonant or a vowel, while in 4th graders this advantage was restricted to words with an initial consonant. Thus, the present findings suggest that a Consonant/Vowel skeleton should be included in future (developmental) models of visual-word recognition and reading.
Processing voiceless vowels in Japanese: Effects of language-specific phonological knowledge
NASA Astrophysics Data System (ADS)
Ogasawara, Naomi
2005-04-01
There has been little research on the processing of allophonic variation in psycholinguistics. This study focuses on processing of the voiced/voiceless allophonic alternation of high vowels in Japanese. Three perception experiments were conducted to explore how listeners parse out vowels with the voicing alternation from other segments in the speech stream, and how the different voicing statuses of the vowel affect listeners' word recognition process. The results from the three experiments show that listeners use phonological knowledge of their native language both for phoneme processing and for word recognition. However, interactions of the phonological and acoustic effects differ in each process: the facilitatory phonological effect and the inhibitory acoustic effect cancel each other out in phoneme processing, while in word recognition the facilitatory phonological effect overrides the inhibitory acoustic effect.
Banzina, Elina; Dilley, Laura C; Hewitt, Lynne E
2016-08-01
The importance of secondary-stressed (SS) and unstressed-unreduced (UU) syllable accuracy for spoken word recognition in English is as yet unclear. An acoustic study first investigated the production of SS and UU syllables by Russian learners of English. Significant vowel quality and duration reductions in Russian-spoken SS and UU vowels were found, likely due to a transfer of native phonological features. Next, a cross-modal phonological priming technique combined with a lexical decision task assessed the effect of inaccurate SS and UU syllable productions on native American English listeners' speech processing. Inaccurate UU vowels led to significant inhibition of lexical access, while reduced SS vowels revealed less interference. The results have implications for understanding the role of SS and UU syllables in word recognition and for English pronunciation instruction.
Lexical Competition in Non-Native Spoken-Word Recognition
ERIC Educational Resources Information Center
Weber, Andrea; Cutler, Anne
2004-01-01
Four eye-tracking experiments examined lexical competition in non-native spoken-word recognition. Dutch listeners hearing English fixated longer on distractor pictures with names containing vowels that Dutch listeners are likely to confuse with vowels in a target picture name ("pencil," given target "panda") than on less confusable distractors…
Effects of Talker Variability on Vowel Recognition in Cochlear Implants
ERIC Educational Resources Information Center
Chang, Yi-ping; Fu, Qian-Jie
2006-01-01
Purpose: To investigate the effects of talker variability on vowel recognition by cochlear implant (CI) users and by normal-hearing (NH) participants listening to 4-channel acoustic CI simulations. Method: CI users were tested with their clinically assigned speech processors. For NH participants, 3 CI processors were simulated, using different…
The influence of sexual orientation on vowel production (L)
NASA Astrophysics Data System (ADS)
Pierrehumbert, Janet B.; Bent, Tessa; Munson, Benjamin; Bradlow, Ann R.; Bailey, J. Michael
2004-10-01
Vowel production in gay, lesbian, bisexual (GLB), and heterosexual speakers was examined. Differences in the acoustic characteristics of vowels were found as a function of sexual orientation. Lesbian and bisexual women produced less fronted /u/ and /ɑ/ than heterosexual women. Gay men produced a more expanded vowel space than heterosexual men. However, the vowels of GLB speakers were not generally shifted toward vowel patterns typical of the opposite sex. These results are inconsistent with the conjecture that innate biological factors have a broadly feminizing influence on the speech of gay men and a broadly masculinizing influence on the speech of lesbian/bisexual women. They are consistent with the idea that innate biological factors influence GLB speech patterns indirectly by causing selective adoption of certain speech patterns characteristic of the opposite sex.
Alexander, Joshua M.
2016-01-01
By varying parameters that control nonlinear frequency compression (NFC), this study examined how different ways of compressing inaudible mid- and/or high-frequency information at lower frequencies influences perception of consonants and vowels. Twenty-eight listeners with mild to moderately severe hearing loss identified consonants and vowels from nonsense syllables in noise following amplification via a hearing aid simulator. Low-pass filtering and the selection of NFC parameters fixed the output bandwidth at a frequency representing a moderately severe (3.3 kHz, group MS) or a mild-to-moderate (5.0 kHz, group MM) high-frequency loss. For each group (n = 14), effects of six combinations of NFC start frequency (SF) and input bandwidth [by varying the compression ratio (CR)] were examined. For both groups, the 1.6 kHz SF significantly reduced vowel and consonant recognition, especially as CR increased, whereas recognition was generally unaffected if SF increased at the expense of a higher CR. Vowel recognition detriments for group MS were moderately correlated with the size of the second formant frequency shift following NFC. For both groups, significant improvement (33%–50%) with NFC was confined to final /s/ and /z/ and to some VCV tokens, perhaps because of listeners' limited exposure to each setting. No set of parameters simultaneously maximized recognition across all tokens. PMID:26936574
ERIC Educational Resources Information Center
Pollo, Tatiana Cury; Kessler, Brett; Treiman, Rebecca
2005-01-01
Young Portuguese-speaking children have been reported to produce more vowel- and syllable-oriented spellings than have English speakers. To investigate the extent and source of such differences, we analyzed children's vocabulary and found that Portuguese words have more vowel letter names and a higher vowel-consonant ratio than do English words.…
Vergara-Martínez, Marta; Perea, Manuel; Marín, Alejandro; Carreiras, Manuel
2011-09-01
Recent research suggests that there is a processing distinction between consonants and vowels in visual-word recognition. Here we conjointly examine the time course of consonants and vowels in processes of letter identity and letter position assignment. Event-related potentials (ERPs) were recorded while participants read words and pseudowords in a lexical decision task. The stimuli were displayed under different conditions in a masked priming paradigm with a 50-ms SOA: (i) identity/baseline condition (e.g., chocolate-CHOCOLATE); (ii) vowels-delayed condition (e.g., choc_l_te-CHOCOLATE); (iii) consonants-delayed condition (e.g., cho_o_ate-CHOCOLATE); (iv) consonants-transposed condition (e.g., cholocate-CHOCOLATE); (v) vowels-transposed condition (e.g., chocalote-CHOCOLATE); and (vi) unrelated condition (e.g., editorial-CHOCOLATE). Results showed earlier ERP effects and longer reaction times for the delayed-letter compared to the transposed-letter conditions. Furthermore, at early stages of processing, consonants may play a greater role during letter identity processing. Differences between vowels and consonants regarding letter position assignment are discussed in terms of a later phonological level involved in lexical retrieval. Copyright © 2010 Elsevier Inc. All rights reserved.
Adaptation to novel accents by toddlers
White, Katherine S.; Aslin, Richard N.
2010-01-01
Word recognition is a balancing act: listeners must be sensitive to phonetic detail to avoid confusing similar words, yet, at the same time, be flexible enough to adapt to phonetically variable pronunciations, such as those produced by speakers of different dialects or by non-native speakers. Recent work has demonstrated that young toddlers are sensitive to phonetic detail during word recognition; pronunciations that deviate from the typical phonological form lead to a disruption of processing. However, it is not known whether young word learners show the flexibility that is characteristic of adult word recognition. The present study explores whether toddlers can adapt to artificial accents in which there is a vowel category shift with respect to the native language. 18–20-month-olds heard mispronunciations of familiar words (e.g., vowels were shifted from [a] to [æ]: “dog” pronounced as “dag”). In test, toddlers were tolerant of mispronunciations if they had recently been exposed to the same vowel shift, but not if they had been exposed to standard pronunciations or other vowel shifts. The effects extended beyond particular items heard in exposure to words sharing the same vowels. These results indicate that, like adults, toddlers show flexibility in their interpretation of phonological detail. Moreover, they suggest that effects of top-down knowledge on the reinterpretation of phonological detail generalize across the phono-lexical system. PMID:21479106
Fu, Qian-Jie; Chinchilla, Sherol; Galvin, John J
2004-09-01
The present study investigated the relative importance of temporal and spectral cues in voice gender discrimination and vowel recognition by normal-hearing subjects listening to an acoustic simulation of cochlear implant speech processing and by cochlear implant users. In the simulation, the number of speech processing channels ranged from 4 to 32, thereby varying the spectral resolution; the cutoff frequencies of the channels' envelope filters ranged from 20 to 320 Hz, thereby manipulating the available temporal cues. For normal-hearing subjects, results showed that both voice gender discrimination and vowel recognition scores improved as the number of spectral channels was increased. When only 4 spectral channels were available, voice gender discrimination significantly improved as the envelope filter cutoff frequency was increased from 20 to 320 Hz. For all spectral conditions, increasing the amount of temporal information had no significant effect on vowel recognition. Both voice gender discrimination and vowel recognition scores were highly variable among implant users. The performance of cochlear implant listeners was similar to that of normal-hearing subjects listening to comparable speech processing (4-8 spectral channels). The results suggest that both spectral and temporal cues contribute to voice gender discrimination and that temporal cues are especially important for cochlear implant users to identify the voice gender when there is reduced spectral resolution.
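A sketch of the kind of acoustic CI simulation described above: a noise-excited channel vocoder in which `n_channels` sets the spectral resolution and `env_cutoff` limits the available temporal envelope cues. The log-spaced band edges, filter orders, and overall frequency range are assumptions, not the study's exact parameters.

```python
# Minimal noise-excited channel vocoder for CI simulation.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def vocode(x, fs, n_channels=4, env_cutoff=160.0, fmin=100.0, fmax=6000.0):
    edges = np.geomspace(fmin, fmax, n_channels + 1)       # log-spaced band edges
    env_sos = butter(2, env_cutoff, "low", fs=fs, output="sos")
    noise = np.random.default_rng(0).normal(size=x.size)
    out = np.zeros_like(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], "bandpass", fs=fs, output="sos")
        # Envelope: rectify the band signal, then low-pass at env_cutoff.
        env = sosfiltfilt(env_sos, np.abs(sosfiltfilt(band_sos, x)))
        # Carrier: band-limited noise modulated by the envelope.
        out += sosfiltfilt(band_sos, noise) * np.clip(env, 0.0, None)
    return out

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))  # toy input
y4 = vocode(x, fs, n_channels=4, env_cutoff=20.0)    # coarse spectra, weak temporal cues
y32 = vocode(x, fs, n_channels=32, env_cutoff=320.0) # fine spectra, rich temporal cues
```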
Stress Effects in Vowel Perception as a Function of Language-Specific Vocabulary Patterns.
Warner, Natasha; Cutler, Anne
2017-01-01
Evidence from spoken word recognition suggests that for English listeners, distinguishing full versus reduced vowels is important, but discerning stress differences involving the same full vowel (as in mu- from music or museum) is not. In Dutch, in contrast, the latter distinction is important. This difference arises from the relative frequency of unstressed full vowels in the two vocabularies. The goal of this paper is to determine how this difference in the lexicon influences the perception of stressed versus unstressed vowels. All possible sequences of two segments (diphones) in Dutch and in English were presented to native listeners in gated fragments. We recorded identification performance over time throughout the speech signal. The data were analysed here specifically for patterns in perception of stressed versus unstressed vowels. The data reveal significantly larger stress effects (whereby unstressed vowels are harder to identify than stressed vowels) in English than in Dutch. Both language-specific and shared patterns appear regarding which vowels show stress effects. We explain the larger stress effect in English as reflecting the processing demands caused by the difference in use of unstressed vowels in the lexicon: English listeners have relatively little experience processing unstressed full vowels. © 2016 S. Karger AG, Basel.
Speaker normalization for Chinese vowel recognition in cochlear implants.
Luo, Xin; Fu, Qian-Jie
2005-07-01
Because of the limited spectro-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.
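A sketch of the normalization rule described above, where the analysis filter bank's overall frequency range is rescaled by the ratio of mean third-formant (F3) frequencies between a given speaker and the reference speaker; the band edges and F3 values below are illustrative, not the study's numbers.

```python
# Rescale the analysis filter-bank range by the speaker/reference
# mean-F3 ratio, per the normalization rule described above.
def normalized_filter_range(base_lo, base_hi, speaker_f3, reference_f3):
    k = speaker_f3 / reference_f3          # frequency scale factor
    return base_lo * k, base_hi * k

# Hypothetical values: a speaker with a higher mean F3 than the
# reference gets a proportionally shifted analysis band.
lo, hi = normalized_filter_range(200.0, 7000.0,
                                 speaker_f3=2900.0, reference_f3=2600.0)
print(f"analysis band: {lo:.0f}-{hi:.0f} Hz")
```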
Handwritten recognition of Tamil vowels using deep learning
NASA Astrophysics Data System (ADS)
Ram Prashanth, N.; Siddarth, B.; Ganesh, Anirudh; Naveen Kumar, Vaegae
2017-11-01
We come across a large volume of handwritten text in our daily lives, and handwritten character recognition has long been an important area of research in pattern recognition. The complexity of the task varies among languages, largely because of language-specific properties such as the similarity between characters, their distinct shapes, and the number of characters. There have been numerous works on character recognition of English alphabets, with laudable success, but regional languages have not been dealt with as frequently or with similar accuracies. In this paper, we explored the performance of Deep Belief Networks in the classification of handwritten Tamil vowels and compared the results obtained. The proposed method showed satisfactory recognition accuracy in light of the difficulties posed by regional languages, such as similarity between characters and the minute nuances that differentiate them. This approach can be further extended to all Tamil characters.
Finding Words in a Language that Allows Words without Vowels
ERIC Educational Resources Information Center
El Aissati, Abder; McQueen, James M.; Cutler, Anne
2012-01-01
Across many languages from unrelated families, spoken-word recognition is subject to a constraint whereby potential word candidates must contain a vowel. This constraint minimizes competition from embedded words (e.g., in English, disfavoring "win" in "twin" because "t" cannot be a word). However, the constraint would be counter-productive in…
Auditory, Visual, and Auditory-Visual Perception of Vowels by Hearing-Impaired Children.
ERIC Educational Resources Information Center
Hack, Zarita Caplan; Erber, Norman P.
1982-01-01
Vowels were presented through auditory, visual, and auditory-visual modalities to 18 hearing impaired children (12 to 15 years old) having good, intermediate, and poor auditory word recognition skills. All the groups had difficulty with acoustic information and visual information alone. The first two groups had only moderate difficulty identifying…
ERIC Educational Resources Information Center
Goswami, Usha
1993-01-01
Three experiments on vowel decoding involving primary school children partially tested an interactive model of reading acquisition. The model suggests that children begin learning to read by establishing orthographic recognition units for words that have phonological underpinning that is initially at the onset-rime level but that becomes…
Unspoken vowel recognition using facial electromyogram.
Arjunan, Sridhar P; Kumar, Dinesh K; Yau, Wai C; Weghorn, Hans
2006-01-01
The paper aims to identify speech from facial muscle activity, without audio signals. It presents an effective technique that measures the relative activity of the articulatory muscles. Five English vowels were used as recognition variables. The paper reports using the moving root mean square (RMS) of the surface electromyogram (SEMG) of four facial muscles to segment the signal and identify the start and end of the utterance. The RMS of the signal between the start and end markers was integrated and normalised, representing the relative activity of the four muscles. These features were classified using a back-propagation neural network to identify the speech. The technique successfully classified the 5 vowels into three classes and was not sensitive to variation in the speed and style of speaking across subjects. The results also show that the technique was suitable for classifying the 5 vowels into 5 classes when trained for each subject. It is suggested that such a technology may be used to give simple unvoiced commands when trained for the specific user.
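A sketch of the segmentation and feature-extraction steps described above, assuming a moving-RMS window and a relative threshold for placing the start/end markers; the window length, threshold, and toy signals are illustrative, not the paper's values.

```python
# Moving-RMS segmentation of multichannel sEMG, then integrated and
# normalized per-muscle activity between the utterance markers.
import numpy as np

def moving_rms(x, win):
    power = np.convolve(x**2, np.ones(win) / win, mode="same")
    return np.sqrt(power)

def utterance_features(channels, fs, win_ms=50, thresh=0.1):
    win = int(fs * win_ms / 1000)
    rms = np.stack([moving_rms(ch, win) for ch in channels])
    mean_rms = rms.mean(axis=0)
    active = mean_rms > thresh * mean_rms.max()      # relative threshold
    start, end = np.flatnonzero(active)[[0, -1]]     # start/end markers
    feats = rms[:, start:end].sum(axis=1)            # integrate RMS per muscle
    return feats / feats.sum()                       # normalize -> relative activity

fs = 1000
t = np.arange(fs) / fs
channels = [np.sin(2 * np.pi * f * t) * np.exp(-((t - 0.5) ** 2) / 0.02)
            for f in (60, 90, 120, 150)]             # toy 4-muscle recording
print(utterance_features(channels, fs))
```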
The Relative Position Priming Effect Depends on Whether Letters Are Vowels or Consonants
ERIC Educational Resources Information Center
Dunabeitia, Jon Andoni; Carreiras, Manuel
2011-01-01
The relative position priming effect is a type of subset priming in which target word recognition is facilitated as a consequence of priming the word with some of its letters, maintaining their relative position (e.g., "csn" as a prime for "casino"). Five experiments were conducted to test whether vowel-only and consonant-only…
Moradi, Shahram; Lidestam, Björn; Danielsson, Henrik; Ng, Elaine Hoi Ning; Rönnberg, Jerker
2017-09-18
We sought to examine the contribution of visual cues in audiovisual identification of consonants and vowels, in terms of isolation points (the shortest time required for correct identification of a speech stimulus), accuracy, and cognitive demands, in listeners with hearing impairment using hearing aids. The study comprised 199 participants with hearing impairment (mean age = 61.1 years) with bilateral, symmetrical, mild-to-severe sensorineural hearing loss. Gated Swedish consonants and vowels were presented aurally and audiovisually to participants. Linear amplification was adjusted for each participant to assure audibility. The reading span test was used to measure participants' working memory capacity. Audiovisual presentation resulted in shortened isolation points and improved accuracy for consonants and vowels relative to auditory-only presentation. This benefit was more evident for consonants than vowels. In addition, correlations and subsequent analyses revealed that listeners with higher scores on the reading span test identified both consonants and vowels earlier in auditory-only presentation, but only vowels (not consonants) in audiovisual presentation. Consonants and vowels differed in terms of the benefits afforded from their associative visual cues, as indicated by the degree of audiovisual benefit and reduction in cognitive demands linked to the identification of consonants and vowels presented audiovisually.
Perception of steady-state vowels and vowelless syllables by adults and children
NASA Astrophysics Data System (ADS)
Nittrouer, Susan
2005-04-01
Vowels can be produced as long, isolated, and steady-state, but that is not how they are found in natural speech. Instead, natural speech consists of almost continuously changing (i.e., dynamic) acoustic forms from which mature listeners recover underlying phonetic form. Some theories suggest that children need steady-state information to recognize vowels (and so learn vowel systems), even though that information is sparse in natural speech. The current study examined whether young children can recover vowel targets from dynamic forms, or whether they need steady-state information. Vowel recognition was measured for adults and children (3, 5, and 7 years) for natural productions of /dæd/ and /dʊd/, and of /æ/ and /ʊ/, edited to make six stimulus sets: three dynamic (whole syllables; syllables with the middle 50 percent replaced by a cough; syllables with all but the first and last three pitch periods replaced by a cough), and three steady-state (natural, isolated vowels; reiterated pitch periods from those vowels; reiterated pitch periods from the syllables). Adults scored nearly perfectly on all but the first/last-three-pitch-periods stimuli. Children performed nearly perfectly only when the entire syllable was heard, and performed similarly (near 80%) for all other stimuli. Consequently, children need dynamic forms to perceive vowels; steady-state forms are not preferred.
NASA Astrophysics Data System (ADS)
Green, Tim; Faulkner, Andrew; Rosen, Stuart; Macherey, Olivier
2005-07-01
Standard continuous interleaved sampling (CIS) processing and a modified processing strategy designed to enhance temporal cues to voice pitch were compared on tests of intonation perception and vowel perception, both in implant users and in acoustic simulations. In standard processing, 400-Hz low-pass envelopes modulated either pulse trains (implant users) or noise carriers (simulations). In the modified strategy, slow-rate envelope modulations, which convey the dynamic spectral variation crucial for speech understanding, were extracted by low-pass filtering (32 Hz). In addition, during voiced speech, higher-rate temporal modulation in each channel was provided by 100% amplitude modulation with a sawtooth-like waveform whose periodicity followed the fundamental frequency (F0) of the input. Channel levels were determined by the product of the lower- and higher-rate modulation components. Both in acoustic simulations and in implant users, the ability to use intonation information to identify sentences as question or statement was significantly better with modified processing. However, while there was no difference in vowel recognition in the acoustic simulation, implant users performed worse with modified processing both in vowel recognition and in formant frequency discrimination. It appears that, while enhancing pitch perception, modified processing harmed the transmission of spectral information.
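A sketch of how a channel level might be formed under the modified strategy described above: the product of a 32-Hz low-passed envelope and a 0-to-1 sawtooth modulator at the tracked F0 during voiced speech. The filter design, the fixed F0, and the stand-in channel signal are assumptions.

```python
# Modified-strategy channel level: slow (32-Hz) envelope multiplied by
# a sawtooth-like 100% modulator whose period follows the input F0.
import numpy as np
from scipy.signal import butter, sosfiltfilt, sawtooth

fs = 16000
t = np.arange(fs) / fs
band = np.random.default_rng(0).normal(size=fs)        # stand-in channel signal

slow_sos = butter(2, 32.0, "low", fs=fs, output="sos") # 32-Hz envelope filter
slow_env = np.clip(sosfiltfilt(slow_sos, np.abs(band)), 0.0, None)

f0 = 120.0                                             # tracked F0 (voiced frame)
f0_mod = 0.5 * (1 + sawtooth(2 * np.pi * f0 * t))      # 0..1 sawtooth at F0

channel_level = slow_env * f0_mod                      # product of both components
```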
Synthesis fidelity and time-varying spectral change in vowels
NASA Astrophysics Data System (ADS)
Assmann, Peter F.; Katz, William F.
2005-02-01
Recent studies have shown that synthesized versions of American English vowels are less accurately identified when the natural time-varying spectral changes are eliminated by holding the formant frequencies constant over the duration of the vowel. A limitation of these experiments has been that vowels produced by formant synthesis are generally less accurately identified than the natural vowels after which they are modeled. To overcome this limitation, a high-quality speech analysis-synthesis system (STRAIGHT) was used to synthesize versions of 12 American English vowels spoken by adults and children. Vowels synthesized with STRAIGHT were identified as accurately as the natural versions, in contrast with previous results from our laboratory showing identification rates 9%-12% lower for the same vowels synthesized using the cascade formant model. Consistent with earlier studies, identification accuracy was not reduced when the fundamental frequency was held constant across the vowel. However, elimination of time-varying changes in the spectral envelope using STRAIGHT led to a greater reduction in accuracy (23%) than was previously found with cascade formant synthesis (11%). A statistical pattern recognition model, applied to acoustic measurements of the natural and synthesized vowels, predicted both the higher identification accuracy for vowels synthesized using STRAIGHT compared to formant synthesis, and the greater effects of holding the formant frequencies constant over time with STRAIGHT synthesis. Taken together, the experiment and modeling results suggest that formant estimation errors and incorrect rendering of spectral and temporal cues by cascade formant synthesis contribute to lower identification accuracy and underestimation of the role of time-varying spectral change in vowels.
Human phoneme recognition depending on speech-intrinsic variability.
Meyer, Bernd T; Jürgens, Tim; Wesker, Thorsten; Brand, Thomas; Kollmeier, Birger
2010-11-01
The influence of different sources of speech-intrinsic variation (speaking rate, effort, style and dialect or accent) on human speech perception was investigated. In listening experiments with 16 listeners, confusions of consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) sounds in speech-weighted noise were analyzed. Experiments were based on the OLLO logatome speech database, which was designed for a man-machine comparison. It contains utterances spoken by 50 speakers from five dialect/accent regions and covers several intrinsic variations. By comparing results depending on intrinsic and extrinsic variations (i.e., different levels of masking noise), the degradation induced by variabilities can be expressed in terms of the SNR. The spectral level distance between the respective speech segment and the long-term spectrum of the masking noise was found to be a good predictor for recognition rates, while phoneme confusions were influenced by the distance to spectrally close phonemes. An analysis based on transmitted information of articulatory features showed that voicing and manner of articulation are comparatively robust cues in the presence of intrinsic variations, whereas the coding of place is more degraded. The database and detailed results have been made available for comparisons between human speech recognition (HSR) and automatic speech recognizers (ASR).
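A minimal sketch of the spectral-level-distance predictor described above, assuming dB magnitude spectra and a simple mean difference across frequency bins; the FFT size, the averaging scheme, and the toy signals are assumptions, not the study's exact computation.

```python
# Level distance between a speech segment's spectrum and the long-term
# spectrum of the masking noise, averaged across frequency (in dB).
import numpy as np

def spectral_level_distance(segment, noise, n_fft=512):
    seg_db = 20 * np.log10(np.abs(np.fft.rfft(segment, n_fft)) + 1e-12)
    noise_db = 20 * np.log10(np.abs(np.fft.rfft(noise, n_fft)) + 1e-12)
    return float(np.mean(seg_db - noise_db))   # mean level distance in dB

rng = np.random.default_rng(0)
speech_segment = rng.normal(size=1024) * 2.0   # toy segment, higher level
masker = rng.normal(size=1024)                 # toy speech-weighted noise
print(f"{spectral_level_distance(speech_segment, masker):.1f} dB")
```

Under the study's finding, larger (more positive) distances would predict higher recognition rates for the segment in that masker.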
Speaker-Machine Interaction in Automatic Speech Recognition. Technical Report.
ERIC Educational Resources Information Center
Makhoul, John I.
The feasibility and limitations of speaker adaptation in improving the performance of a "fixed" (speaker-independent) automatic speech recognition system were examined. A fixed vocabulary of 55 syllables is used in the recognition system which contains 11 stops and fricatives and five tense vowels. The results of an experiment on speaker…
Neural-scaled entropy predicts the effects of nonlinear frequency compression on speech perception
Rallapalli, Varsha H.; Alexander, Joshua M.
2015-01-01
The Neural-Scaled Entropy (NSE) model quantifies information in the speech signal that has been altered beyond simple gain adjustments by sensorineural hearing loss (SNHL) and various signal processing. An extension of Cochlear-Scaled Entropy (CSE) [Stilp, Kiefte, Alexander, and Kluender (2010). J. Acoust. Soc. Am. 128(4), 2112–2126], NSE quantifies information as the change in 1-ms neural firing patterns across frequency. To evaluate the model, data from a study that examined nonlinear frequency compression (NFC) in listeners with SNHL were used because NFC can recode the same input information in multiple ways in the output, resulting in different outcomes for different speech classes. Overall, predictions were more accurate for NSE than CSE. The NSE model accurately described the observed degradation in recognition, and lack thereof, for consonants in a vowel-consonant-vowel context that had been processed in different ways by NFC. While NSE accurately predicted recognition of vowel stimuli processed with NFC, it underestimated them relative to a low-pass control condition without NFC. In addition, without modifications, it could not predict the observed improvement in recognition for word final /s/ and /z/. Findings suggest that model modifications that include information from slower modulations might improve predictions across a wider variety of conditions. PMID:26627780
Meyer, Ted A; Frisch, Stefan A; Pisoni, David B; Miyamoto, Richard T; Svirsky, Mario A
2003-07-01
Do cochlear implants provide enough information to allow adult cochlear implant users to understand words in ways that are similar to listeners with acoustic hearing? Can we use a computational model to gain insight into the underlying mechanisms used by cochlear implant users to recognize spoken words? The Neighborhood Activation Model has been shown to be a reasonable model of word recognition for listeners with normal hearing. The Neighborhood Activation Model assumes that words are recognized in relation to other similar-sounding words in a listener's lexicon. The probability of correctly identifying a word is based on the phoneme perception probabilities from a listener's closed-set consonant and vowel confusion matrices modified by the relative frequency of occurrence of the target word compared with similar-sounding words (neighbors). Common words with few similar-sounding neighbors are more likely to be selected as responses than less common words with many similar-sounding neighbors. Recent studies have shown that several of the assumptions of the Neighborhood Activation Model also hold true for cochlear implant users. Closed-set consonant and vowel confusion matrices were obtained from 26 postlingually deafened adults who use cochlear implants. Confusion matrices were used to represent input errors to the Neighborhood Activation Model. Responses to the different stimuli were then generated by the Neighborhood Activation Model after incorporating the frequency of occurrence counts of the stimuli and their neighbors. Model outputs were compared with obtained performance measures on the Consonant-Vowel Nucleus-Consonant word test. Information transmission analysis was used to assess whether the Neighborhood Activation Model was able to successfully generate and predict word and individual phoneme recognition by cochlear implant users. The Neighborhood Activation Model predicted Consonant-Vowel Nucleus-Consonant test words at levels similar to those correctly identified by the cochlear implant users. The Neighborhood Activation Model also predicted phoneme feature information well. The results obtained suggest that the Neighborhood Activation Model provides a reasonable explanation of word recognition by postlingually deafened adults after cochlear implantation. It appears that multichannel cochlear implants give cochlear implant users access to their mental lexicons in a manner that is similar to listeners with acoustic hearing. The lexical properties of the test stimuli used to assess performance are important to spoken-word recognition and should be included in further models of the word recognition process.
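A sketch of the Neighborhood Activation Model's decision rule as summarized above: a candidate word's activation is the product of phoneme confusion probabilities given the stimulus, weighted by word frequency and normalized over the stimulus word and its similar-sounding neighbors. The toy lexicon, confusion values, and frequency counts are illustrative.

```python
# Neighborhood Activation Model decision rule (sketch): response
# probabilities over a stimulus word and its neighbors.
def nam_probabilities(stimulus, lexicon, confusion, freq):
    def activation(word):
        p = 1.0
        for stim_ph, word_ph in zip(stimulus, word):
            p *= confusion[stim_ph][word_ph]   # P(hear word_ph | presented stim_ph)
        return p * freq[word]                  # frequency-weighted activation
    acts = {w: activation(w) for w in lexicon if len(w) == len(stimulus)}
    total = sum(acts.values())
    return {w: round(a / total, 3) for w, a in acts.items()}

confusion = {"k": {"k": 0.8, "g": 0.2},        # toy confusion probabilities
             "a": {"a": 0.9, "o": 0.1},
             "t": {"t": 1.0}}
freq = {"kat": 120, "gat": 5, "kot": 40}       # toy frequency-of-occurrence counts
print(nam_probabilities("kat", ["kat", "gat", "kot"], confusion, freq))
# A common word with few, rarer neighbors dominates the response set.
```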
Improving Tone Recognition with Nucleus Modeling and Sequential Learning
ERIC Educational Resources Information Center
Wang, Siwei
2010-01-01
Mandarin Chinese and many other tonal languages use tones, defined as specific pitch patterns, to distinguish otherwise ambiguous syllables. It has been shown that tones carry at least as much information as vowels in Mandarin Chinese [Surendran et al., 2005]. Surprisingly, though, many speech recognition systems for Mandarin Chinese have…
Perea, Manuel; Acha, Joana
2009-02-01
Recently, a number of input coding schemes (e.g., SOLAR model, SERIOL model, open-bigram model, overlap model) have been proposed that capture the transposed-letter priming effect (i.e., faster response times for jugde-JUDGE than for jupte-JUDGE). In their current version, these coding schemes do not assume any processing differences between vowels and consonants. However, in a lexical decision task, Perea and Lupker (2004, JML; Lupker, Perea, & Davis, 2008, L&CP) reported that transposed-letter priming effects occurred for consonant transpositions but not for vowel transpositions. This finding poses a challenge for these recently proposed coding schemes. Here, we report four masked priming experiments that examine whether this consonant/vowel dissociation in transposed-letter priming is task-specific. In Experiment 1, we used a lexical decision task and found a transposed-letter priming effect only for consonant transpositions. In Experiments 2-4, we employed a same-different task - a task which taps early perceptual processes - and found a robust transposed-letter priming effect that did not interact with consonant/vowel status. We examine the implications of these findings for the front-end of the models of visual word recognition.
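A sketch of an open-bigram coding scheme of the kind discussed above, in which a letter string is coded as its set of ordered letter pairs and prime-target similarity is set overlap; note the measure is blind to consonant/vowel status, which is precisely what the reported dissociation challenges.

```python
# Open-bigram coding: a word is the set of its ordered letter pairs;
# prime-target match is the proportion of target bigrams in the prime.
from itertools import combinations

def open_bigrams(word):
    return {a + b for a, b in combinations(word, 2)}   # ordered pairs

def overlap(prime, target):
    p, t = open_bigrams(prime), open_bigrams(target)
    return len(p & t) / len(t)

print(overlap("jugde", "judge"))   # transposed-letter prime: high overlap (0.9)
print(overlap("jupte", "judge"))   # substituted-letter control: lower (0.3)
```

The scheme assigns "jugde" the same overlap with "judge" whether the transposed letters are consonants or vowels, so on its own it cannot produce the consonant/vowel asymmetry reported in the lexical decision data.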
Electronic system with memristive synapses for pattern recognition
Park, Sangsu; Chu, Myonglae; Kim, Jongin; Noh, Jinwoo; Jeon, Moongu; Lee, Byoung Hun; Hwang, Hyunsang; Lee, Boreom; Lee, Byung-geun
2015-01-01
Memristive synapses, the most promising passive devices for synaptic interconnections in artificial neural networks, are the driving force behind recent research on hardware neural networks. Despite significant efforts to utilize memristive synapses, progress to date has only shown the possibility of building a neural network system that can classify simple image patterns. In this article, we report a high-density cross-point memristive synapse array with improved synaptic characteristics. The proposed PCMO-based memristive synapse exhibits the necessary gradual and symmetrical conductance changes, and has been successfully adapted to a neural network system. The system learns, and later recognizes, the human thought pattern corresponding to three vowels, i.e., /a/, /i/, and /u/, using electroencephalography signals generated while a subject imagines speaking vowels. Our successful demonstration of a neural network system for EEG pattern recognition is likely to intrigue many researchers and stimulate a new research direction. PMID:25941950
Speech recognition: Acoustic-phonetic knowledge acquisition and representation
NASA Astrophysics Data System (ADS)
Zue, Victor W.
1988-09-01
The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segment durations for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. We achieved system performance comparable to that of the readers.
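A minimal sketch of a contextual duration model of the kind described above: segment duration regressed on one-hot-coded context factors. The factor set (stress, postvocalic voicing, phrase position), the toy durations, and the scikit-learn API are illustrative assumptions, not the report's method.

```python
# Contextual duration model (sketch): OLS regression of segment
# duration on categorical context factors; R^2 is the variance
# accounted for, the metric quoted above.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder

contexts = np.array([["stressed", "voiced", "final"],
                     ["stressed", "voiceless", "medial"],
                     ["unstressed", "voiced", "medial"],
                     ["unstressed", "voiceless", "final"],
                     ["stressed", "voiced", "medial"],
                     ["unstressed", "voiceless", "medial"]])
durations_ms = np.array([180.0, 140.0, 95.0, 110.0, 150.0, 70.0])  # toy data

X = OneHotEncoder().fit_transform(contexts).toarray()   # one-hot context factors
model = LinearRegression().fit(X, durations_ms)
print("variance accounted for (R^2):", round(model.score(X, durations_ms), 2))
```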
Shi, Lu-Feng; Morozova, Natalia
2012-08-01
Word recognition is a basic component in a comprehensive hearing evaluation, but data are lacking for listeners speaking two languages. This study obtained such data for Russian natives in the US and analysed the data using the perceptual assimilation model (PAM) and speech learning model (SLM). Listeners were randomly presented 200 NU-6 words in quiet. Listeners responded verbally and in writing. Performance was scored on words and phonemes (word-initial consonants, vowels, and word-final consonants). Seven normal-hearing, adult monolingual English natives (NM), 16 English-dominant (ED), and 15 Russian-dominant (RD) Russian natives participated. ED and RD listeners differed significantly in their language background. Consistent with the SLM, NM outperformed ED listeners and ED outperformed RD listeners, whether responses were scored on words or phonemes. NM and ED listeners shared similar phoneme error patterns, whereas RD listeners' errors had unique patterns that could be largely understood via the PAM. RD listeners had particular difficulty differentiating vowel contrasts /i-ɪ/, /æ-ɛ/, and /ɑ-ʌ/, word-initial consonant contrasts /p-h/ and /b-f/, and word-final contrasts /f-v/. Both first-language phonology and second-language learning history affect word and phoneme recognition. Current findings may help clinicians differentiate word recognition errors due to language background from hearing pathologies.
English vowel learning by speakers of Mandarin
NASA Astrophysics Data System (ADS)
Thomson, Ron I.
2005-04-01
One of the most influential models of second language (L2) speech perception and production [Flege, Speech Perception and Linguistic Experience (York, Baltimore, 1995) pp. 233-277] argues that during initial stages of L2 acquisition, perceptual categories sharing the same or nearly the same acoustic space as first language (L1) categories will be processed as members of that L1 category. Previous research has generally been limited to testing these claims on binary L2 contrasts, rather than larger portions of the perceptual space. This study examines the development of 10 English vowel categories by 20 Mandarin L1 learners of English. Imitation of English vowel stimuli by these learners, at 6 data collection points over the course of one year, were recorded. Using a statistical pattern recognition model, these productions were then assessed against native speaker norms. The degree to which the learners' perception/production shifted toward the target English vowels and the degree to which they matched L1 categories in ways predicted by theoretical models are discussed. The results of this experiment suggest that previous claims about perceptual assimilation of L2 categories to L1 categories may be too strong.
Effect of body position on vocal tract acoustics: Acoustic pharyngometry and vowel formants.
Vorperian, Houri K; Kurtzweil, Sara L; Fourakis, Marios; Kent, Ray D; Tillman, Katelyn K; Austin, Diane
2015-08-01
The anatomic basis and articulatory features of speech production are often studied with imaging studies that are typically acquired in the supine body position. It is important to determine whether changes in body orientation to the gravitational field alter vocal tract dimensions and speech acoustics. The purpose of this study was to assess the effect of body position (upright versus supine) on (1) oral and pharyngeal measurements derived from acoustic pharyngometry and (2) acoustic measurements of fundamental frequency (F0) and the first four formant frequencies (F1-F4) for the quadrilateral point vowels. Data were obtained for 27 male and female participants, aged 17 to 35 years. Acoustic pharyngometry showed a statistically significant effect of body position on volumetric measurements, with smaller values in the supine than the upright position, but no changes in length measurements. Acoustic analyses of vowels showed significantly larger values in the supine than the upright position for the variables of F0, F3, and the Euclidean distance from the centroid to each corner vowel in the F1-F2-F3 space. Changes in body position affected measurements of vocal tract volume but not length. Body position also affected the aforementioned acoustic variables, but the main vowel formants were preserved.
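A sketch of the vowel-space measure described above: the Euclidean distance from the centroid of the quadrilateral point vowels to each corner vowel in F1-F2-F3 space. The formant values are illustrative textbook-style numbers, not the study's data.

```python
# Euclidean distance from the vowel-space centroid to each corner
# vowel in F1-F2-F3 space.
import numpy as np

corner_vowels = {            # hypothetical F1, F2, F3 in Hz
    "i": (270, 2290, 3010),
    "u": (300, 870, 2240),
    "ae": (660, 1720, 2410),
    "a": (730, 1090, 2440),
}
formants = np.array(list(corner_vowels.values()), dtype=float)
centroid = formants.mean(axis=0)
for vowel, f in corner_vowels.items():
    d = np.linalg.norm(np.array(f) - centroid)
    print(f"/{vowel}/: {d:.0f} Hz from centroid")
```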
Exploration of Behavioral, Physiological, and Computational Approaches to Auditory Scene Analysis
2004-01-01
The Role of Phonetics in the Teaching of Foreign Languages in India
ERIC Educational Resources Information Center
Bansal, R. K.
1974-01-01
Oral work is considered the most effective way of laying the foundations for language proficiency. Recognition and production of vowels and consonants, use of a pronouncing dictionary, and practice in accent, rhythm, and intonation should all be included in a pronunciation course. (SC)
Qi, Beier; Liu, Bo; Liu, Sha; Liu, Haihong; Dong, Ruijuan; Zhang, Ning; Gong, Shusheng
2011-05-01
To study the effect of cochlear electrode coverage and insertion region on speech recognition, especially tone perception, in cochlear implant users whose native language is Mandarin Chinese, seven test conditions were set with fitting software. All conditions were created by switching respective channels on or off to simulate different insertion positions. Mandarin CI users then received four speech tests: vowel identification, consonant identification, tone identification (male speaker), and the Mandarin HINT test (SRS) in quiet and in noise. Across test conditions, the average vowel identification score differed significantly, ranging from 56% to 91% (rank sum test, P < 0.05), and the average consonant identification score differed significantly, ranging from 72% to 85% (ANOVA, P < 0.05). The average tone identification score did not differ significantly (ANOVA, P > 0.05); however, the more channels activated, the higher the scores obtained, from 68% to 81%. This study shows a correlation between insertion depth and speech recognition. Because all parts of the basilar membrane can help CI users improve their speech recognition ability, increasing insertion depth and actively stimulating the apical region of the cochlea are important for enhancing the verbal communication and social interaction abilities of CI users.
Cheng, Xiaoting; Liu, Yangwenyi; Wang, Bing; Yuan, Yasheng; Galvin, John J; Fu, Qian-Jie; Shu, Yilai; Chen, Bing
2018-01-01
The aim of this study was to investigate the benefits of residual hair cell function for speech and music perception in bimodal pediatric Mandarin-speaking cochlear implant (CI) listeners. Speech and music performance was measured in 35 Mandarin-speaking pediatric CI users for unilateral (CI-only) and bimodal listening. Mandarin speech perception was measured for vowels, consonants, lexical tones, and sentences in quiet. Music perception was measured for melodic contour identification (MCI). Combined electric and acoustic hearing significantly improved MCI and Mandarin tone recognition performance, relative to CI-only performance. For MCI, performance was significantly better with bimodal listening for all semitone spacing conditions ( p < 0.05 in all cases). For tone recognition, bimodal performance was significantly better only for tone 2 (rising; p < 0.05). There were no significant differences between CI-only and CI + HA for vowel, consonant, or sentence recognition. The results suggest that combined electric and acoustic hearing can significantly improve perception of music and Mandarin tones in pediatric Mandarin-speaking CI patients. Music and lexical tone perception depends strongly on pitch perception, and the contralateral acoustic hearing coming from residual hair cell function provided pitch cues that are generally not well preserved in electric hearing.
Tone Attrition in Mandarin Speakers of Varying English Proficiency
Creel, Sarah C.
2017-01-01
Purpose: The purpose of this study was to determine whether the degree of dominance of Mandarin–English bilinguals' languages affects phonetic processing of tone content in their native language, Mandarin. Method: We tested 72 Mandarin–English bilingual college students with a range of language-dominance profiles in the 2 languages and ages of acquisition of English. Participants viewed 2 photographs at a time while hearing a familiar Mandarin word referring to 1 photograph. The names of the 2 photographs diverged in tone, vowels, or both. Word recognition was evaluated using clicking accuracy, reaction times, and an online recognition measure (gaze) and was compared in the 3 conditions. Results: Relative proficiency in English was correlated with reduced word recognition success in tone-disambiguated trials, but not in vowel-disambiguated trials, across all 3 dependent measures. This selective attrition for tone content emerged even though all bilinguals had learned Mandarin from birth. Lengthy experience with English thus weakened tone use. Conclusions: This finding has implications for the question of the extent to which bilinguals' 2 phonetic systems interact. It suggests that bilinguals may not process pitch information language-specifically and that processing strategies from the dominant language may affect phonetic processing in the nondominant language, even when the latter was learned natively. PMID:28124064
Donaldson, Gail S; Dawson, Patricia K; Borden, Lamar Z
2011-01-01
Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many cochlear implant (CI) users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 wks during the main study; a subset of five subjects used Fidelity120 for three additional months after the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency, vowel F2 frequency, and consonant place of articulation; overall transmitted information for vowels and consonants; and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle, and basal regions of the implanted array using a psychophysical pitch-ranking task. With one exception, there was no effect of strategy (HiRes versus Fidelity120) on the speech measures tested, either during the main study (N = 10) or after extended use of Fidelity120 (N = 5). The exception was a small but significant advantage for HiRes over Fidelity120 for consonant perception during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 wks or longer experience with Fidelity120. Another three subjects exhibited initial decrements in spectral cue perception with Fidelity120 at the 8-wk time point; however, evidence from one subject suggested that such decrements may resolve with additional experience. Place-pitch thresholds were inversely related to improvements in vowel F2 frequency perception with Fidelity120 relative to HiRes. However, no relationship was observed between place-pitch thresholds and the other spectral measures (vowel F1 frequency or consonant place of articulation). Findings suggest that Fidelity120 supports small improvements in the perception of spectral speech cues in some Advanced Bionics CI users; however, many users show no clear benefit. Benefits are more likely to occur for vowel spectral cues (related to F1 and F2 frequency) than for consonant spectral cues (related to place of articulation). There was an inconsistent relationship between place-pitch sensitivity and improvements in spectral cue perception with Fidelity120 relative to HiRes. This may partly reflect the small number of sites at which place-pitch thresholds were measured. Contrary to some previous reports, there was no clear evidence that Fidelity120 supports improved sentence recognition in noise.
Cross-Frequency Integration for Consonant and Vowel Identification in Bimodal Hearing
ERIC Educational Resources Information Center
Kong, Ying-Yee; Braida, Louis D.
2011-01-01
Purpose: Improved speech recognition in binaurally combined acoustic-electric stimulation (otherwise known as "bimodal hearing") could arise when listeners integrate speech cues from the acoustic and electric hearing. The aims of this study were (a) to identify speech cues extracted in electric hearing and residual acoustic hearing in the…
Dysphonia Detected by Pattern Recognition of Spectral Composition.
ERIC Educational Resources Information Center
Leinonen, Lea; And Others
1992-01-01
This study analyzed production of a long vowel sound within Finnish words by normal and dysphonic voices, using the Self-Organizing Map, the artificial neural network algorithm of T. Kohonen, which produces two-dimensional representations of speech. The method was found to be both sensitive and specific in the detection of dysphonia. (Author/JDD)
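The Self-Organizing Map named above is straightforward to prototype. Below is a minimal sketch, assuming the third-party minisom package and random placeholder vectors in place of real short-term voice spectra; it illustrates the mapping step only, not the study's implementation.

    # Map spectral frames onto a 2-D Kohonen grid with MiniSom.
    import numpy as np
    from minisom import MiniSom

    rng = np.random.default_rng(0)
    spectral_vectors = rng.random((200, 12))   # placeholder spectra, one row per frame

    som = MiniSom(10, 10, input_len=12, sigma=1.5, learning_rate=0.5, random_seed=0)
    som.train_random(spectral_vectors, num_iteration=5000)

    # Each frame maps to its best-matching unit; the trajectory of winning
    # nodes over time forms the 2-D representation used to compare voices.
    trajectory = [som.winner(v) for v in spectral_vectors]
    print(trajectory[:5])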
ERIC Educational Resources Information Center
Singh, Leher; Tan, Aloysia; Wewalaarachchi, Thilanga D.
2017-01-01
Children undergo gradual progression in their ability to differentiate correct and incorrect pronunciations of words, a process that is crucial to establishing a native vocabulary. For the most part, the development of mature phonological representations has been researched by investigating children's sensitivity to consonant and vowel variation,…
Lip reading using neural networks
NASA Astrophysics Data System (ADS)
Kalbande, Dhananjay; Mishra, Akassh A.; Patil, Sanjivani; Nirgudkar, Sneha; Patel, Prashant
2011-10-01
Computerized lip reading, or speech reading, is concerned with the difficult task of converting a video signal of a speaking person to written text. It has several applications, such as teaching deaf people to speak and communicate effectively with others, its crime-fighting potential, and its invariance to the acoustic environment. We convert video of the subject speaking vowels into images, which are then selected manually for processing. Several factors, such as fast speech, poor pronunciation, poor illumination, movement of the face, and moustaches and beards, make lip reading difficult. Contour tracking and template matching are used to extract the lips from the face. A k-nearest-neighbor (KNN) algorithm is then used to classify the 'speaking' images and the 'silent' images, and the sequence of images is transformed into segments of utterances. A feature vector is calculated on each frame for all segments and stored in a database with a properly labeled class. Character recognition is performed using a modified KNN algorithm that assigns more weight to nearer neighbors. This paper reports the recognition of vowels using KNN algorithms.
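The distance-weighted k-NN step the authors describe can be sketched in a few lines. The snippet below uses scikit-learn's weights="distance" option on placeholder lip-feature vectors; the feature extraction itself (contour tracking, template matching) is not reproduced.

    # Distance-weighted k-NN: nearer neighbours get more voting weight.
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(1)
    X_train = rng.random((100, 20))            # placeholder lip-feature vectors
    y_train = rng.integers(0, 5, 100)          # five vowel classes

    knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
    knn.fit(X_train, y_train)
    print(knn.predict(rng.random((1, 20))))    # classify one new frame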
Rendall, Drew; Vasey, Paul L; McKenzie, Jared
2008-02-01
Popular stereotypes concerning the speech of homosexuals typically attribute speech patterns characteristic of the opposite sex, i.e., broadly feminized speech in gay men and broadly masculinized speech in lesbian women. A small body of recent empirical research has begun to address the subject more systematically and to consider specific mechanistic hypotheses to account for the potentially distinctive features of homosexual speech. Results do not yet fully endorse the stereotypes but they do not entirely discount them either; nor do they cleanly favor any single mechanistic hypothesis. To contribute to this growing body of research, we report acoustic analyses of 2,875 vowel sounds from a balanced set of 125 speakers representing heterosexual and homosexual individuals of each sex from southern Alberta, Canada. Analyses focused on voice pitch and formant frequencies, which together determine the principal perceptual features of vowels. There was no significant difference in mean voice pitch between heterosexual and homosexual men or between heterosexual and homosexual women, but there were significant differences in the formant frequencies of vowels produced by both homosexual groups compared to their heterosexual counterparts. Formant frequency differences were specific to only certain vowel sounds, and some could be attributed to basic differences in body size between heterosexual and homosexual speakers. The remaining formant frequency differences were not obviously due to differences in vocal tract anatomy between heterosexual and homosexual speakers, nor did they reflect global feminization or masculinization of vowel production patterns in homosexual men and women, respectively. The vowel-specific differences observed could reflect social modeling processes in which only certain speech patterns of the opposite sex, or of same-sex homosexuals, are selectively adopted. However, we introduce an alternative biosocial hypothesis, specifically that the distinctive, vowel-specific features of homosexual speakers relative to heterosexual speakers arise incidentally as a product of broader psychobehavioral differences between the two groups that are, in turn, continuous with and flow from the physiological processes that affect sexual orientation to begin with.
Weiss, Yael; Katzir, Tami; Bitan, Tali
2016-10-01
The current study examined the effects of orthographic transparency and familiarity on brain mechanisms involved in word recognition in adult dyslexic Hebrew readers. We compared functional Magnetic Resonance Imaging (fMRI) brain activation in 21 dyslexic readers and 22 typical readers, and examined the effects of diacritic marks that provide transparent but less familiar information and vowel letters that increase orthographic transparency without compromising familiarity. Dyslexic readers demonstrated reduced activation in left supramarginal gyrus (SMG) as compared to typical readers, as well as different patterns of activation within the left inferior frontal gyrus (IFG). Furthermore, in contrast to typical readers, dyslexic readers did not show increased activation for diacritics in left temporo-parietal junction regions, associated with mapping orthography to phonology. Nevertheless, both groups showed the facilitation effect of vowel letters on regions associated with lexical-semantic access. Altogether the results suggest that while typical readers can compensate for the reduced familiarity of pointed words with increased reliance on decoding of smaller units, dyslexic readers do not, and therefore they show a higher cost. Copyright © 2016 Elsevier Ltd. All rights reserved.
Learning L2 Pronunciation with a Mobile Speech Recognizer: French /y/
ERIC Educational Resources Information Center
Liakin, Denis; Cardoso, Walcir; Liakina, Natallia
2015-01-01
This study investigates the acquisition of the L2 French vowel /y/ in a mobile-assisted learning environment, via the use of automatic speech recognition (ASR). Particularly, it addresses the question of whether ASR-based pronunciation instruction using a mobile device can improve the production and perception of French /y/. Forty-two elementary…
Shannon, Robert V.; Cruz, Rachel J.; Galvin, John J.
2011-01-01
High stimulation rates in cochlear implants (CI) offer better temporal sampling, can induce stochastic-like firing of auditory neurons, and can increase the electric dynamic range, all of which could improve CI speech performance. While commercial CIs have employed increasingly high stimulation rates, no clear or consistent advantage has been shown for high rates. In this study, speech recognition was acutely measured with experimental processors in 7 CI subjects (Clarion CII users). The stimulation rate varied between (approx.) 600 and 4800 pulses per second per electrode (ppse) and the number of active electrodes varied between 4 and 16. Vowel, consonant, consonant-nucleus-consonant word and IEEE sentence recognition was acutely measured in quiet and in steady noise (+10 dB signal-to-noise ratio). Subjective quality ratings were obtained for each of the experimental processors in quiet and in noise. Except for a small difference for vowel recognition in quiet, there were no significant differences in performance among the experimental stimulation rates for any of the speech measures. There was also a small but significant increase in subjective quality rating as stimulation rates increased from 1200 to 2400 ppse in noise. Consistent with previous studies, performance significantly improved as the number of electrodes was increased from 4 to 8, but no significant difference was observed among 8, 12, and 16 electrodes. Altogether, there was little-to-no advantage of high stimulation rates in quiet or in noise, at least for the present speech tests and conditions. PMID:20639631
Poellmann, Katja; Mitterer, Holger; McQueen, James M.
2014-01-01
Three eye-tracking experiments tested whether native listeners recognized reduced Dutch words better after having heard the same reduced words, or different reduced words of the same reduction type and whether familiarization with one reduction type helps listeners to deal with another reduction type. In the exposure phase, a segmental reduction group was exposed to /b/-reductions (e.g., minderij instead of binderij, “book binder”) and a syllabic reduction group was exposed to full-vowel deletions (e.g., p'raat instead of paraat, “ready”), while a control group did not hear any reductions. In the test phase, all three groups heard the same speaker producing reduced-/b/ and deleted-vowel words that were either repeated (Experiments 1 and 2) or new (Experiment 3), but that now appeared as targets in semantically neutral sentences. Word-specific learning effects were found for vowel-deletions but not for /b/-reductions. Generalization of learning to new words of the same reduction type occurred only if the exposure words showed a phonologically consistent reduction pattern (/b/-reductions). In contrast, generalization of learning to words of another reduction type occurred only if the exposure words showed a phonologically inconsistent reduction pattern (the vowel deletions; learning about them generalized to recognition of the /b/-reductions). In order to deal with reductions, listeners thus use various means. They store reduced variants (e.g., for the inconsistent vowel-deleted words) and they abstract over incoming information to build up and apply mapping rules (e.g., for the consistent /b/-reductions). Experience with inconsistent pronunciations leads to greater perceptual flexibility in dealing with other forms of reduction uttered by the same speaker than experience with consistent pronunciations. PMID:24910622
ERIC Educational Resources Information Center
Zascavage, Victoria Selden; McKenzie, Ginger Kelley; Buot, Max; Woods, Carol; Orton-Gillingham, Fellow
2012-01-01
This study compared word recognition for words written in a traditional flat font to the same words written in a three-dimensional appearing font determined to create a right hemispheric stimulation. The participants were emergent readers enrolled in Montessori schools in the United States learning to read basic CVC (consonant, vowel, consonant)…
NASA Astrophysics Data System (ADS)
Studdert-Kennedy, M.; Obrien, N.
1983-05-01
This report is one of a regular series on the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Manuscripts cover the following topics: The influence of subcategorical mismatches on lexical access; The Serbo-Croatian orthography constrains the reader to a phonologically analytic strategy; Grammatical priming effects between pronouns and inflected verb forms; Misreadings by beginning readers of Serbo-Croatian; Bi-alphabetism and word recognition; Orthographic and phonemic coding for word identification: Evidence from Hebrew; Stress and vowel duration effects on syllable recognition; Phonetic and auditory trading relations between acoustic cues in speech perception: Further results; Linguistic coding by deaf children in relation to beginning reading success; Determinants of spelling ability in deaf and hearing adults: Access to linguistic structures; A dynamical basis for action systems; On the space-time structure of human interlimb coordination; Some acoustic and physiological observations on diphthongs; Relationship between pitch control and vowel articulation; Laryngeal vibrations: A comparison between high-speed filming and glottographic techniques; Compensatory articulation in hearing-impaired speakers: A cinefluorographic study; and Review (Pierre Delattre: Studies in comparative phonetics).
Reconsidering the role of temporal order in spoken word recognition.
Toscano, Joseph C; Anderson, Nathaniel D; McMurray, Bob
2013-10-01
Models of spoken word recognition assume that words are represented as sequences of phonemes. We evaluated this assumption by examining phonemic anadromes, words that share the same phonemes but differ in their order (e.g., sub and bus). Using the visual-world paradigm, we found that listeners show more fixations to anadromes (e.g., sub when bus is the target) than to unrelated words (well) and to words that share the same vowel but not the same set of phonemes (sun). This contrasts with the predictions of existing models and suggests that words are not defined as strict sequences of phonemes.
McGettigan, Carolyn; Rosen, Stuart; Scott, Sophie K.
2014-01-01
Noise-vocoding is a transformation which, when applied to speech, severely reduces spectral resolution and eliminates periodicity, yielding a stimulus that sounds "like a harsh whisper" (Scott et al., 2000, p. 2401). This process simulates a cochlear implant, where the activity of many thousand hair cells in the inner ear is replaced by direct stimulation of the auditory nerve by a small number of tonotopically-arranged electrodes. Although a cochlear implant offers a powerful means of restoring some degree of hearing to profoundly deaf individuals, the outcomes for spoken communication are highly variable (Moore and Shannon, 2009). Some variability may arise from differences in peripheral representation (e.g., the degree of residual nerve survival) but some may reflect differences in higher-order linguistic processing. In order to explore this possibility, we used noise-vocoding to examine speech recognition and perceptual learning in normal-hearing listeners tested across several levels of the linguistic hierarchy: segments (consonants and vowels), single words, and sentences. Listeners improved significantly on all tasks across two test sessions. In the first session, individual differences analyses revealed two independently varying sources of variability: one lexico-semantic in nature and implicating the recognition of words and sentences, and the other an acoustic-phonetic factor associated with words and segments. However, consequent to learning, by the second session there was a more uniform covariance pattern concerning all stimulus types. A further analysis of phonetic feature recognition allowed greater insight into learning-related changes in perception and showed that, surprisingly, participants did not make full use of cues that were preserved in the stimuli (e.g., vowel duration). We discuss these findings in relation to cochlear implantation, and suggest auditory training strategies to maximize speech recognition performance in the absence of typical cues. PMID:24616669
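Noise-vocoding as described above is easy to approximate in code. The sketch below assumes a 4-channel filterbank with illustrative band edges and a 30-Hz envelope smoother; published studies vary these parameters, so treat this as a schematic rather than the authors' exact processor.

    # Minimal noise vocoder: band-split, extract envelopes, remodulate noise.
    import numpy as np
    from scipy.signal import butter, filtfilt

    def noise_vocode(x, fs, edges=(100, 400, 1200, 3000, 6000)):
        noise = np.random.default_rng(0).standard_normal(len(x))
        out = np.zeros_like(x, dtype=float)
        b_env, a_env = butter(2, 30 / (fs / 2))          # 30-Hz envelope smoother
        for lo, hi in zip(edges[:-1], edges[1:]):
            b, a = butter(3, [lo / (fs / 2), hi / (fs / 2)], btype="band")
            env = filtfilt(b_env, a_env, np.abs(filtfilt(b, a, x)))  # channel envelope
            out += env * filtfilt(b, a, noise)           # envelope-modulated band noise
        return out / np.max(np.abs(out))

    fs = 16000
    t = np.arange(0, 0.5, 1 / fs)
    demo = noise_vocode(np.sin(2 * np.pi * 220 * t), fs)  # stand-in for a speech signal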
The Mechanics of Fingerspelling: Analyzing Ethiopian Sign Language
ERIC Educational Resources Information Center
Duarte, Kyle
2010-01-01
Ethiopian Sign Language utilizes a fingerspelling system that represents Amharic orthography. Just as each character of the Amharic abugida encodes a consonant-vowel sound pair, each sign in the Ethiopian Sign Language fingerspelling system uses handshape to encode a base consonant, as well as a combination of timing, placement, and orientation to…
Investigating the Pedagogical Potential of Recasts for L2 Vowel Acquisition
ERIC Educational Resources Information Center
Saito, Kazuya; Lyster, Roy
2012-01-01
Whereas second language (L2) education research has extensively examined how different types of interactional feedback can be facilitative of L2 development in meaning-oriented classrooms, most of these primary studies have focused on recasts (i.e., teachers' reformulations of students' errors). Some researchers have claimed that recasts serve an…
Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals
1988-10-12
Neural networks have been trained on a database of vowels to extract the characteristics of speech from the visual speech signals. The raw images of faces, aligned and preprocessed, were used as input to these networks, which were trained to estimate the corresponding envelope of the…
Speech feature discrimination in deaf children following cochlear implantation
NASA Astrophysics Data System (ADS)
Bergeson, Tonya R.; Pisoni, David B.; Kirk, Karen Iler
2002-05-01
Speech feature discrimination is a fundamental perceptual skill that is often assumed to underlie word recognition and sentence comprehension performance. To investigate the development of speech feature discrimination in deaf children with cochlear implants, we conducted a retrospective analysis of results from the Minimal Pairs Test (Robbins et al., 1988) selected from patients enrolled in a longitudinal study of speech perception and language development. The MP test uses a 2AFC procedure in which children hear a word and select one of two pictures (bat-pat). All 43 children were prelingually deafened, received a cochlear implant before 6 years of age or between ages 6 and 9, and used either oral or total communication. Children were tested once every 6 months to 1 year for 7 years; not all children were tested at each interval. By 2 years postimplant, the majority of these children achieved near-ceiling levels of discrimination performance for vowel height, vowel place, and consonant manner. Most of the children also achieved plateaus but did not reach ceiling performance for consonant place and voicing. The relationship between speech feature discrimination, spoken word recognition, and sentence comprehension will be discussed. [Work supported by NIH/NIDCD Research Grant No. R01DC00064 and NIH/NIDCD Training Grant No. T32DC00012.]
Does Vowel Inventory Density Affect Vowel-to-Vowel Coarticulation?
ERIC Educational Resources Information Center
Mok, Peggy P. K.
2013-01-01
This study tests the output constraints hypothesis that languages with a crowded phonemic vowel space would allow less vowel-to-vowel coarticulation than languages with a sparser vowel space to avoid perceptual confusion. Mandarin has fewer vowel phonemes than Cantonese, but their allophonic vowel spaces are similarly crowded. The hypothesis…
Fogerty, Daniel; Ahlstrom, Jayne B.; Bologna, William J.; Dubno, Judy R.
2015-01-01
This study investigated how single-talker modulated noise impacts consonant and vowel cues to sentence intelligibility. Younger normal-hearing, older normal-hearing, and older hearing-impaired listeners completed speech recognition tests. All listeners received spectrally shaped speech matched to their individual audiometric thresholds to ensure sufficient audibility, with the exception of a second younger listener group who received spectral shaping that matched the mean audiogram of the hearing-impaired listeners. Results demonstrated minimal declines in intelligibility for older listeners with normal hearing and more evident declines for older hearing-impaired listeners, possibly related to impaired temporal processing. A correlational analysis suggests a common underlying ability to process information during vowels that is predictive of speech-in-modulated-noise abilities. In contrast, the ability to use consonant cues appears specific to the particular characteristics of the noise and interruption. Performance declines for older listeners were mostly confined to consonant conditions. Spectral shaping accounted for the primary contributions of audibility. However, comparison with the young spectral controls who received identical spectral shaping suggests that this procedure may reduce wideband temporal modulation cues due to frequency-specific amplification that affected high-frequency consonants more than low-frequency vowels. These spectral changes may impact speech intelligibility in certain modulation masking conditions. PMID:26093436
Brennan, Marc A; Lewis, Dawna; McCreery, Ryan; Kopun, Judy; Alexander, Joshua M
2017-10-01
Nonlinear frequency compression (NFC) can improve the audibility of high-frequency sounds by lowering them to a frequency where audibility is better; however, this lowering results in spectral distortion. Consequently, performance is a combination of the effects of increased access to high-frequency sounds and the detrimental effects of spectral distortion. Previous work has demonstrated positive benefits of NFC on speech recognition when NFC is set to improve audibility while minimizing distortion. However, the extent to which NFC impacts listening effort is not well understood, especially for children with sensorineural hearing loss (SNHL). To examine the impact of NFC on recognition and listening effort for speech in adults and children with SNHL. Within-subject, quasi-experimental study. Participants listened to amplified nonsense words that were (1) frequency-lowered using NFC, (2) low-pass filtered at 5 kHz to simulate the restricted bandwidth (RBW) of conventional hearing aid processing, or (3) low-pass filtered at 10 kHz to simulate extended bandwidth (EBW) amplification. Fourteen children (8-16 yr) and 14 adults (19-65 yr) with mild-to-severe SNHL. Participants listened to speech processed by a hearing aid simulator that amplified input signals to fit a prescriptive target fitting procedure. Participants were blinded to the type of processing. Participants' responses to each nonsense word were analyzed for accuracy and verbal-response time (VRT; listening effort). A multivariate analysis of variance and linear mixed model were used to determine the effect of hearing-aid signal processing on nonsense word recognition and VRT. Both children and adults identified the nonsense words and initial consonants better with EBW and NFC than with RBW. The type of processing did not affect the identification of the vowels or final consonants. There was no effect of age on recognition of the nonsense words, initial consonants, medial vowels, or final consonants. VRT did not change significantly with the type of processing or age. Both adults and children demonstrated improved speech recognition with access to the high-frequency sounds in speech. Listening effort as measured by VRT was not affected by access to high-frequency sounds. American Academy of Audiology
Phoneme Error Pattern by Heritage Speakers of Spanish on an English Word Recognition Test.
Shi, Lu-Feng
2017-04-01
Heritage speakers acquire their native language from home use in their early childhood. As the native language is typically a minority language in the society, these individuals receive their formal education in the majority language and eventually develop greater competency with the majority than their native language. To date, there have not been specific research attempts to understand word recognition by heritage speakers. It is not clear if and to what degree we may infer from evidence based on bilingual listeners in general. This preliminary study investigated how heritage speakers of Spanish perform on an English word recognition test and analyzed their phoneme errors. A prospective, cross-sectional, observational design was employed. Twelve normal-hearing adult Spanish heritage speakers (four men, eight women, 20-38 yr old) participated in the study. Their language background was obtained through the Language Experience and Proficiency Questionnaire. Nine English monolingual listeners (three men, six women, 20-41 yr old) were also included for comparison purposes. Listeners were presented with 200 Northwestern University Auditory Test No. 6 words in quiet. They repeated each word orally and in writing. Their responses were scored by word, word-initial consonant, vowel, and word-final consonant. Performance was compared between groups with Student's t test or analysis of variance. Group-specific error patterns were primarily descriptive, but intergroup comparisons were made using 95% or 99% confidence intervals for proportional data. The two groups of listeners yielded comparable scores when their responses were examined by word, vowel, and final consonant. However, heritage speakers of Spanish misidentified significantly more word-initial consonants and had significantly more difficulty with initial /p, b, h/ than their monolingual peers. The two groups yielded similar patterns for vowel and word-final consonants, but heritage speakers made significantly fewer errors with /e/ and more errors with word-final /p, k/. Data reported in the present study lead to a twofold conclusion. On the one hand, normal-hearing heritage speakers of Spanish may misidentify English phonemes in patterns different from those of English monolingual listeners. Not all phoneme errors can be readily understood by comparing Spanish and English phonology, suggesting that Spanish heritage speakers differ in performance from other Spanish-English bilingual listeners. On the other hand, the absolute number of errors and the error pattern of most phonemes were comparable between English monolingual listeners and Spanish heritage speakers, suggesting that audiologists may assess word recognition in quiet in the same way for these two groups of listeners, if diagnosis is based on words, not phonemes. American Academy of Audiology
Speaker Recognition Through NLP and CWT Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown-VanHoozer, S.A.; Kercel, S.W.; Tucker, R.W.
The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time (<30 seconds) and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "large population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (verbal predicate cues, e.g., see, sound, feel), while the secondary modalities are characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the glottal excitation waveform.
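The wavelet-based line of investigation above can be illustrated with PyWavelets. In the sketch below, the synthetic two-component "vowel" is a stand-in for a real voice waveform, and the wavelet choice (Morlet) and scale range are illustrative assumptions.

    # Continuous wavelet transform of a vowel-like waveform.
    import numpy as np
    import pywt

    fs = 8000
    t = np.arange(0, 0.05, 1 / fs)
    vowel = np.sin(2 * np.pi * 120 * t) + 0.4 * np.sin(2 * np.pi * 700 * t)  # F0 + one formant

    scales = np.arange(1, 64)
    coefs, freqs = pywt.cwt(vowel, scales, "morl", sampling_period=1 / fs)
    # Rows of `coefs` are scale bands; ridges across time mark formant energy,
    # while periodic structure at coarse scales marks glottal excitation.
    print(coefs.shape, freqs[:3])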
ERIC Educational Resources Information Center
Recasens, Daniel
2015-01-01
Purpose: The goal of this study was to ascertain the effect of changes in stress and speech rate on vowel coarticulation in vowel-consonant-vowel sequences. Method: Data on second formant coarticulatory effects as a function of changing /i/ versus /a/ were collected for five Catalan speakers' productions of vowel-consonant-vowel sequences with the…
Koerner, Tess K; Zhang, Yang; Nelson, Peggy B; Wang, Boxiang; Zou, Hui
2017-07-01
This study examined how speech babble noise differentially affected the auditory P3 responses and the associated neural oscillatory activities for consonant and vowel discrimination in relation to segmental- and sentence-level speech perception in noise. The data were collected from 16 normal-hearing participants in a double-oddball paradigm that contained a consonant (/ba/ to /da/) and vowel (/ba/ to /bu/) change in quiet and noise (speech-babble background at a -3 dB signal-to-noise ratio) conditions. Time-frequency analysis was applied to obtain inter-trial phase coherence (ITPC) and event-related spectral perturbation (ERSP) measures in delta, theta, and alpha frequency bands for the P3 response. Behavioral measures included percent correct phoneme detection and reaction time as well as percent correct IEEE sentence recognition in quiet and in noise. Linear mixed-effects models were applied to determine possible brain-behavior correlates. A significant noise-induced reduction in P3 amplitude was found, accompanied by significantly longer P3 latency and decreases in ITPC across all frequency bands of interest. There was a differential effect of noise on consonant discrimination and vowel discrimination in both ERP and behavioral measures, such that noise impacted the detection of the consonant change more than the vowel change. The P3 amplitude and some of the ITPC and ERSP measures were significant predictors of speech perception at segmental- and sentence-levels across listening conditions and stimuli. These data demonstrate that the P3 response with its associated cortical oscillations represents a potential neurophysiological marker for speech perception in noise. Copyright © 2017 Elsevier B.V. All rights reserved.
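Inter-trial phase coherence (ITPC) as used above has a standard definition: band-limit each trial, take the instantaneous phase, and measure how tightly the unit phasors cluster across trials. The sketch below assumes random placeholder epochs and a theta-band filter; it follows the usual ITPC formula rather than the authors' exact pipeline.

    # ITPC = |mean over trials of exp(i * phase)| at each time point.
    import numpy as np
    from scipy.signal import butter, filtfilt, hilbert

    fs = 250
    epochs = np.random.default_rng(2).standard_normal((40, fs))  # 40 trials x 1 s

    b, a = butter(3, [4 / (fs / 2), 8 / (fs / 2)], btype="band")  # theta band
    phase = np.angle(hilbert(filtfilt(b, a, epochs, axis=1), axis=1))

    itpc = np.abs(np.mean(np.exp(1j * phase), axis=0))  # values in [0, 1]
    print(itpc.max())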
Mi, Lin; Tao, Sha; Wang, Wenjing; Dong, Qi; Guan, Jingjing; Liu, Chang
2016-03-01
The purpose of this study was to examine the relationship between English vowel identification and English vowel formant discrimination for native Mandarin Chinese- and native English-speaking listeners. The identification of 12 English vowels was measured with the duration cue preserved or removed. The thresholds of vowel formant discrimination on the F2 of two English vowels, /ʌ/ and /i/, were also estimated using an adaptive-tracking procedure. Native Mandarin Chinese-speaking listeners showed significantly higher thresholds of vowel formant discrimination and lower identification scores than native English-speaking listeners. The duration effect on English vowel identification was similar between native Mandarin Chinese- and native English-speaking listeners. Moreover, regardless of listeners' language background, vowel identification was significantly correlated with vowel formant discrimination for the listeners who were less dependent on duration cues, whereas the correlation between vowel identification and vowel formant discrimination was not significant for the listeners who were highly dependent on duration cues. This study revealed individual variability in using multiple acoustic cues to identify English vowels for both native and non-native listeners. Copyright © 2016 Elsevier B.V. All rights reserved.
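Adaptive tracking of a discrimination threshold is typically run as a transformed staircase. The sketch below implements a common 2-down/1-up variant (which converges on 70.7% correct) with an illustrative simulated listener; the study's actual rule and step sizes may differ.

    # 2-down/1-up staircase for an F2 difference threshold (Hz).
    import numpy as np

    rng = np.random.default_rng(3)
    delta_f, step, reversals, correct_run = 60.0, 0.5, [], 0  # step is a log factor
    last_direction = 0
    while len(reversals) < 8:
        # Simulated listener: more likely correct for larger F2 differences.
        correct = rng.random() < 1 / (1 + np.exp(-(delta_f - 15) / 5))
        correct_run = correct_run + 1 if correct else 0
        if correct_run == 2:                      # two correct -> harder (smaller delta)
            direction, correct_run = -1, 0
        elif not correct:                         # one wrong -> easier (larger delta)
            direction = 1
        else:
            continue                              # one correct: no change yet
        if last_direction and direction != last_direction:
            reversals.append(delta_f)             # record each turnaround
        last_direction = direction
        delta_f *= np.exp(direction * step)
    print("threshold ~", np.mean(reversals[-6:]), "Hz")  # tracks 70.7% correct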
Speech Recognition: Acoustic-Phonetic Knowledge Acquisition and Representation.
1987-09-25
We discuss a framework for an acoustic-phonetic… The release duration is the voice onset time, or VOT. For the purpose of this investigation, alveolar flaps (as in "butter") and glottalized /t/'s… One sentence, which contained a number of semivowels, was said by 6 females and 8 males; the other sentence was said by 7 females… [Supported by a Xerox Fellowship.]
Truong, Son Ngoc; Ham, Seok-Jin; Min, Kyeong-Sik
2014-01-01
In this paper, a neuromorphic crossbar circuit with binary memristors is proposed for speech recognition. Binary memristors, which are based on a filamentary switching mechanism, are more readily available and easier to fabricate than analog memristors, which require rare materials and a more complicated fabrication process. We therefore develop a neuromorphic crossbar circuit using filamentary-switching binary memristors rather than interface-switching analog memristors. The proposed binary memristor crossbar can recognize five vowels from 4-bit, 64-channel inputs. Tested with 2,500 speech samples, the crossbar was verified to recognize 89.2% of them. Statistical simulation estimates that the recognition rate of the binary memristor crossbar degrades only from 89.2% to 80% as the percentage variation in memristance increases from 0% to 15%. In contrast, the analog memristor crossbar loses its recognition rate dramatically, from 96% to 9%, for the same variation in memristance.
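The crossbar computation itself is essentially a matrix-vector product with binary weights. The numpy sketch below, with made-up stored patterns and inputs, illustrates the column-current classification and the memristance-variation test; it is not the authors' circuit model.

    # Binary-memristor crossbar as a matrix-vector product; largest column current wins.
    import numpy as np

    rng = np.random.default_rng(4)
    G = rng.integers(0, 2, size=(64, 5)).astype(float)   # binary conductances, 5 vowel columns

    def classify(x, variation=0.0):
        # Optional multiplicative memristance variation, echoing the robustness test.
        G_noisy = G * (1 + variation * rng.standard_normal(G.shape))
        currents = x @ G_noisy                            # Kirchhoff current summation per column
        return int(np.argmax(currents))

    x = rng.integers(0, 16, 64).astype(float)             # 4-bit, 64-channel input
    print(classify(x), classify(x, variation=0.15))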
Strange, Winifred; Weber, Andrea; Levy, Erika S; Shafiro, Valeriy; Hisagi, Miwako; Nishi, Kanae
2007-08-01
Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF) and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels.
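Discriminant analyses like those reported above can be reproduced schematically with scikit-learn. The sketch below classifies placeholder vowel tokens from F1, F2, and duration; the feature values are invented for illustration.

    # Linear discriminant analysis over spectral/temporal vowel measurements.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(5)
    X = np.column_stack([rng.normal(500, 80, 150),    # F1 (Hz)
                         rng.normal(1500, 250, 150),  # F2 (Hz)
                         rng.normal(120, 25, 150)])   # duration (ms)
    y = rng.integers(0, 3, 150)                       # three vowel categories

    lda = LinearDiscriminantAnalysis().fit(X, y)
    # Classification accuracy approximates how separable the vowel
    # distributions are in this feature space.
    print(lda.score(X, y))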
NASA Astrophysics Data System (ADS)
Fernández Pozo, Rubén; Blanco Murillo, Jose Luis; Hernández Gómez, Luis; López Gonzalo, Eduardo; Alcázar Ramírez, José; Toledano, Doroteo T.
2009-12-01
This study is part of an ongoing collaborative effort between the medical and the signal processing communities to promote research on applying standard Automatic Speech Recognition (ASR) techniques for the automatic diagnosis of patients with severe obstructive sleep apnoea (OSA). Early detection of severe apnoea cases is important so that patients can receive early treatment. Effective ASR-based detection could dramatically cut medical testing time. Working with a carefully designed speech database of healthy and apnoea subjects, we describe an acoustic search for distinctive apnoea voice characteristics. We also study abnormal nasalization in OSA patients by modelling vowels in nasal and nonnasal phonetic contexts using Gaussian Mixture Model (GMM) pattern recognition on speech spectra. Finally, we present experimental findings regarding the discriminative power of GMMs applied to severe apnoea detection. We have achieved an 81% correct classification rate, which is very promising and underpins the interest in this line of inquiry.
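The GMM classification step can be sketched as two class-conditional mixtures compared by log-likelihood. The snippet below uses scikit-learn with placeholder 13-dimensional features (e.g., MFCC-like frames); the authors' actual features and model orders may differ.

    # Two-class GMM detector: fit one mixture per class, decide by likelihood ratio.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(6)
    healthy_feats = rng.normal(0.0, 1.0, (500, 13))   # placeholder spectral frames
    apnoea_feats = rng.normal(0.5, 1.2, (500, 13))

    gmm_h = GaussianMixture(n_components=8, random_state=0).fit(healthy_feats)
    gmm_a = GaussianMixture(n_components=8, random_state=0).fit(apnoea_feats)

    test = rng.normal(0.5, 1.2, (200, 13))            # frames from one test speaker
    llr = gmm_a.score_samples(test).mean() - gmm_h.score_samples(test).mean()
    print("apnoea" if llr > 0 else "healthy")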
Degraded Vowel Acoustics and the Perceptual Consequences in Dysarthria
NASA Astrophysics Data System (ADS)
Lansford, Kaitlin L.
Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria and overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments were conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed that vowel metrics that capture vowel centralization and distinctiveness and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility. The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, an above-chance level of agreement between the nature of vowel misclassifications and misidentification errors was demonstrated for all vowels, suggesting degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.
Soskey, Laura N; Allen, Paul D; Bennetto, Loisa
2017-08-01
One of the earliest observable impairments in autism spectrum disorder (ASD) is a failure to orient to speech and other social stimuli. Auditory spatial attention, a key component of orienting to sounds in the environment, has been shown to be impaired in adults with ASD. Additionally, specific deficits in orienting to social sounds could be related to increased acoustic complexity of speech. We aimed to characterize auditory spatial attention in children with ASD and neurotypical controls, and to determine the effect of auditory stimulus complexity on spatial attention. In a spatial attention task, target and distractor sounds were played randomly in rapid succession from speakers in a free-field array. Participants attended to a central or peripheral location, and were instructed to respond to target sounds at the attended location while ignoring nearby sounds. Stimulus-specific blocks evaluated spatial attention for simple non-speech tones, speech sounds (vowels), and complex non-speech sounds matched to vowels on key acoustic properties. Children with ASD had significantly more diffuse auditory spatial attention than neurotypical children when attending front, indicated by increased responding to sounds at adjacent non-target locations. No significant differences in spatial attention emerged based on stimulus complexity. Additionally, in the ASD group, more diffuse spatial attention was associated with more severe ASD symptoms but not with general inattention symptoms. Spatial attention deficits have important implications for understanding social orienting deficits and atypical attentional processes that contribute to core deficits of ASD. Autism Res 2017, 10: 1405-1416. © 2017 International Society for Autism Research, Wiley Periodicals, Inc.
Temporal and acoustic characteristics of Greek vowels produced by adults with cerebral palsy
NASA Astrophysics Data System (ADS)
Botinis, Antonis; Orfanidou, Ioanna; Fourakis, Marios; Fourakis, Marios
2005-09-01
The present investigation examined the temporal and spectral characteristics of Greek vowels as produced by speakers with intact (NO) versus cerebral palsy affected (CP) neuromuscular systems. Six NO and six CP native speakers of Greek produced the Greek vowels [i, e, a, o, u] in the first syllable of CVCV nonsense words in a short carrier phrase. Stress could be on either the first or second syllable. There were three female and three male speakers in each group. In terms of temporal characteristics, the results showed that: vowels produced by CP speakers were longer than vowels produced by NO speakers; stressed vowels were longer than unstressed vowels; vowels produced by female speakers were longer than vowels produced by male speakers. In terms of spectral characteristics the results showed that the vowel space of the CP speakers was smaller than that of the NO speakers. This is similar to the results recently reported by Liu et al. [J. Acoust. Soc. Am. 117, 3879-3889 (2005)] for CP speakers of Mandarin. There was also a reduction of the acoustic vowel space defined by unstressed vowels, but this reduction was much more pronounced in the vowel productions of CP speakers than NO speakers.
Children's discrimination of vowel sequences
NASA Astrophysics Data System (ADS)
Coady, Jeffry A.; Kluender, Keith R.; Evans, Julia
2003-10-01
Children's ability to discriminate sequences of steady-state vowels was investigated. Vowels (as in "beet," "bat," "bought," and "boot") were synthesized at durations of 40, 80, 160, 320, 640, and 1280 ms. Four different vowel sequences were created by concatenating different orders of vowels for each duration, separated by 10-ms intervening silences. Thus, sequences differed in vowel order and duration (rate). Sequences were 12 s in duration, with amplitude ramped linearly over the first and last 2 s. Sequence pairs included both same trials (identical sequences) and different trials (sequences with vowels in different orders). Sequences with vowels of equal duration were presented on individual trials. Children aged 7;0 to 10;6 listened to pairs of sequences (with 100 ms between sequences) and responded whether the sequences sounded the same or different. Results indicate that children are best able to discriminate sequences of intermediate-duration vowels, typical of a conversational speaking rate. Children were less accurate with both shorter and longer vowels. Results are discussed in terms of auditory processing (shortest vowels) and memory (longest vowels). [Research supported by NIDCD DC-05263, DC-04072, and DC-005650.]
NASA Astrophysics Data System (ADS)
Liu, Huei-Mei; Tsao, Feng-Ming; Kuhl, Patricia K.
2005-06-01
The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ using their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correct vowel and word identification were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas compared to ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy compose a smaller acoustic space that results in shrunken intervowel perceptual distances for listeners.
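The vowel working space area used above is the area of the polygon spanned by the corner vowels in F1-F2 space. The sketch below computes it with the shoelace formula; the formant values are illustrative.

    # Vowel space area from corner-vowel formants via the shoelace formula.
    import numpy as np

    def polygon_area(points):
        x, y = np.asarray(points, dtype=float).T
        return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

    # (F1, F2) in Hz for /i/, /a/, /u/: a triangle for the three corner vowels
    corners = [(300, 2300), (750, 1300), (350, 900)]
    print(polygon_area(corners), "Hz^2")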
Cross-language categorization of French and German vowels by naive American listeners.
Strange, Winifred; Levy, Erika S; Law, Franzo F
2009-09-01
American English (AE) speakers' perceptual assimilation of 14 North German (NG) and 9 Parisian French (PF) vowels was examined in two studies using citation-form disyllables (study 1) and sentences with vowels surrounded by labial and alveolar consonants in multisyllabic nonsense words (study 2). Listeners categorized multiple tokens of each NG and PF vowel as most similar to selected AE vowels and rated their category "goodness" on a nine-point Likert scale. Front, rounded vowels were assimilated primarily to back AE vowels, despite their acoustic similarity to front AE vowels. In study 1, they were considered poorer exemplars of AE vowels than were NG and PF back, rounded vowels; in study 2, front and back, rounded vowels were perceived as similar to each other. Assimilation of some front, unrounded and back, rounded NG and PF vowels varied with language, speaking style, and consonantal context. Differences in perceived similarity often could not be predicted from context-specific cross-language spectral similarities. Results suggest that listeners can access context-specific, phonetic details when listening to citation-form materials, but assimilate non-native vowels on the basis of context-independent phonological equivalence categories when processing continuous speech. Results are interpreted within the Automatic Selective Perception model of speech perception.
Speaker-Sex Discrimination for Voiced and Whispered Vowels at Short Durations.
Smith, David R R
2016-01-01
Whispered vowels, produced with no vocal fold vibration, lack the periodic temporal fine structure which in voiced vowels underlies the perceptual attribute of pitch (a salient auditory cue to speaker sex). Voiced vowels possess no temporal fine structure at very short durations (below two glottal cycles). The prediction was that speaker-sex discrimination performance for whispered and voiced vowels would be similar at very short durations but that, as stimulus duration increases, voiced-vowel performance would improve relative to whispered-vowel performance as pitch information becomes available. This pattern of results was shown for women's but not for men's voices. A whispered vowel needs to have a duration three times longer than a voiced vowel before listeners can reliably tell whether it was spoken by a man or a woman (∼30 ms vs. ∼10 ms). Listeners were half as sensitive to information about speaker sex when it was carried by whispered rather than voiced vowels.
The influence of different native language systems on vowel discrimination and identification
NASA Astrophysics Data System (ADS)
Kewley-Port, Diane; Bohn, Ocke-Schwen; Nishi, Kanae
2005-04-01
The ability to identify the vowel sounds of a language reliably depends on the ability to discriminate between vowels at a more sensory level. This study examined how the complexity of the vowel systems of three native languages (L1) influenced listeners' perception of American English (AE) vowels. AE has a fairly complex vowel system with 11 monophthongs. In contrast, Japanese has only 5 spectrally different vowels, while Swedish has 9 and Danish has 12. Six listeners from each L1, each with less than 4 months of exposure to English-speaking environments, participated. Their performance in two tasks was compared to that of 6 AE listeners. As expected, there were large differences in a linguistic identification task using 4 confusable AE low vowels. Japanese listeners performed quite poorly compared to listeners with more complex L1 vowel systems. Thresholds for formant discrimination for the 3 groups were very similar to those of native AE listeners. Thus it appears that sensory abilities for discriminating vowels are only slightly affected by native vowel systems, and that vowel confusions occur at a more central, linguistic level. [Work supported by funding from NIHDCD-02229 and the American-Scandinavian Foundation.]
Discrimination of synthesized English vowels by American and Korean listeners
NASA Astrophysics Data System (ADS)
Yang, Byunggon
2004-05-01
This study explored the discrimination of synthesized English vowel pairs by 27 American and Korean listeners, male and female. The average formant values of nine monophthongs produced by ten American English male speakers were employed to synthesize the vowels. Subjects were then instructed to respond to AX discrimination tasks in which the standard vowel was followed by a comparison vowel whose formant values were incremented or decremented from the original. The highest and lowest formant values accepted as the same vowel quality were collected and compared to examine patterns of vowel discrimination. Results showed that the American and Korean groups discriminated the vowel pairs almost identically, and the center formant frequencies between their high and low boundaries fell almost exactly on those of the standards. In addition, the acceptable range of the same vowel quality was similar across the language and gender groups. The acceptable thresholds of each vowel formed an oval, maintaining perceptual contrast with adjacent vowels. Pedagogical implications of these findings are discussed.
Statistical properties of Chinese phonemic networks
NASA Astrophysics Data System (ADS)
Yu, Shuiyuan; Liu, Haitao; Xu, Chunshan
2011-04-01
The study of the properties of speech sound systems is of great significance in understanding the human cognitive mechanism and the working principles of speech sound systems. Some properties of speech sound systems, such as listener-oriented and talker-oriented features, have been unveiled through statistical studies of phonemes in human languages and research on the interrelations between human articulatory gestures and the corresponding acoustic parameters. Treating all the phonemes of a speech sound system as a coherent whole, our research, which focuses on the dynamic properties of speech sound systems in operation, investigates statistical parameters of Chinese phoneme networks built from real text and dictionaries. The findings are as follows: phonemic networks have high connectivity degrees and short average distances; the degrees follow a normal distribution and the weighted degrees follow a power-law distribution; vowels enjoy higher priority than consonants in the actual operation of speech sound systems; and the phonemic networks are highly robust against both targeted attacks and random errors. In addition, to investigate the structural properties of a speech sound system, a statistical study of dictionaries was conducted, which shows the higher frequency of shorter words and syllables and a tendency for longer words to be composed of shorter syllables. From these structural and dynamic properties one can draw the following conclusion: the static structure of a speech sound system tends to promote communication efficiency and save articulatory effort, while the dynamic operation of the system gives preference to reliable transmission and easy recognition. In short, a speech sound system is an effective, efficient, and reliable communication system optimized in many aspects.
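A minimal sketch of the kind of network statistics reported, computed with networkx on a toy phoneme graph whose nodes, edges, and weights are invented for illustration:

```python
import networkx as nx

# Toy weighted phoneme network: nodes are phonemes, edge weights count
# co-occurrences in running text (all values invented).
G = nx.Graph()
G.add_weighted_edges_from([("a", "n", 12), ("a", "m", 7), ("n", "i", 9),
                           ("i", "m", 4), ("u", "n", 3), ("u", "a", 5)])

avg_degree = sum(d for _, d in G.degree()) / G.number_of_nodes()
avg_distance = nx.average_shortest_path_length(G)   # "short average distances"
weighted_degrees = dict(G.degree(weight="weight"))   # inputs to a power-law fit
print(avg_degree, avg_distance, weighted_degrees)
```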
Regional dialect variation in the vowel systems of typically developing children
Jacewicz, Ewa; Fox, Robert Allen; Salmons, Joseph
2015-01-01
Purpose To investigate regional dialect variation in the vowel systems of normally developing 8- to 12-year-old children. Method Thirteen vowels in isolated h_d words were produced by 94 children and 93 adults, males and females. All participants spoke American English and were born and raised in one of three distinct dialect regions in the United States: western North Carolina (Southern dialect), central Ohio (Midland dialect), and southeastern Wisconsin (Northern Midwestern dialect). Acoustic analysis included formant frequencies (F1 and F2) measured at five equidistant time points in each vowel and formant movement (trajectory length). Results Children's productions showed many dialect-specific features comparable to those in adult speakers, both in vowel dispersion patterns and in formant movement. Differences were also found, including systemic vowel changes, significant monophthongization of selected vowels, and greater formant movement in diphthongs. Conclusions The acoustic results provide evidence for regional distinctiveness in children's vowel systems. Children acquire not only the systemic relations among vowels but also their dialect-specific patterns of formant dynamics. By directing attention to regional variation in the production of American English vowels, this work may aid in understanding and interpreting the development of vowel categories and vowel systems in children. PMID:20966384
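Measuring formants at five equidistant time points and summing the change between adjacent points yields the trajectory-length measure used here. A minimal sketch with an invented formant track, shown for F1 alone:

```python
import numpy as np

# Dense F1 track (Hz) for one vowel, e.g. from a formant analysis tool
# (values invented); sample it at five equidistant time points.
f1_track = np.array([430, 455, 490, 530, 560, 575, 570, 550, 520, 500, 480.0])
idx = np.round(np.linspace(0, len(f1_track) - 1, 5)).astype(int)  # 0%..100% in 25% steps
f1_points = f1_track[idx]

# Formant movement (trajectory length) along F1 is the summed absolute change
# between adjacent measurement points; with F2 included, each section becomes
# a Euclidean length in the F1-by-F2 plane.
tl_f1 = np.abs(np.diff(f1_points)).sum()
print(f1_points, tl_f1)
```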
NASA Astrophysics Data System (ADS)
Strange, Winifred; Bohn, Ocke-Schwen; Nishi, Kanae; Trent, Sonja A.
2005-09-01
Strange et al. [J. Acoust. Soc. Am. 115, 1791-1807 (2004)] reported that North German (NG) front-rounded vowels in hVp syllables were acoustically intermediate between front and back American English (AE) vowels. However, AE listeners perceptually assimilated them as poor exemplars of back AE vowels. In this study, speaker- and context-independent cross-language discriminant analyses of NG and AE vowels produced in CVC syllables (C=labial, alveolar, velar stops) in sentences showed that NG front-rounded vowels fell within AE back-vowel distributions, due to the "fronting" of AE back vowels in alveolar/velar contexts. NG [ɪ, e, ɛ, ɔ] were located relatively "higher" in acoustic vowel space than their AE counterparts and varied in cross-language similarity across consonantal contexts. In a perceptual assimilation task, naive listeners classified NG vowels in terms of native AE categories and rated their goodness on a 7-point scale (very foreign to very English sounding). Both front- and back-rounded NG vowels were perceptually assimilated overwhelmingly to back AE categories and judged equally good exemplars. Perceptual assimilation patterns did not vary with context, and were not always predictable from acoustic similarity. These findings suggest that listeners adopt a context-independent strategy when judging the cross-language similarity of vowels produced and presented in continuous speech contexts.
Identification and Multiplicity of Double Vowels in Cochlear Implant Users
ERIC Educational Resources Information Center
Kwon, Bomjun J.; Perry, Trevor T.
2014-01-01
Purpose: The present study examined cochlear implant (CI) users' perception of vowels presented concurrently (i.e., "double vowels") to further our understanding of auditory grouping in electric hearing. Method: Identification of double vowels and single vowels was measured with 10 CI subjects. Fundamental frequencies (F0s) of…
Alispahic, Samra; Mulak, Karen E.; Escudero, Paola
2017-01-01
Research suggests that the size of the second language (L2) vowel inventory relative to the native (L1) inventory may affect the discrimination and acquisition of L2 vowels. Models of non-native and L2 vowel perception stipulate that naïve listeners' non-native and L2 perceptual patterns may be predicted by the relationship in vowel inventory size between the L1 and the L2. Specifically, having a smaller L1 vowel inventory than the L2 impedes L2 vowel perception, while having a larger one often facilitates it. However, the Second Language Linguistic Perception (L2LP) model specifies that it is the L1–L2 acoustic relationships that predict non-native and L2 vowel perception, regardless of L1 vowel inventory. To test the effects of vowel inventory size vs. acoustic properties on non-native vowel perception, we compared XAB discrimination and categorization of five Dutch vowel contrasts between monolinguals whose L1 contains more (Australian English) or fewer (Peruvian Spanish) vowels than Dutch. No effect of language background was found, suggesting that L1 inventory size alone did not account for performance. Instead, participants in both language groups were more accurate in discriminating contrasts that were predicted to be perceptually easy based on L1–L2 acoustic relationships, and were less accurate for contrasts likewise predicted to be difficult. Further, cross-language discriminant analyses predicted listeners' categorization patterns, which in turn predicted listeners' discrimination difficulty. Our results show that listeners with larger vowel inventories appear to activate multiple native categories as reflected in lower accuracy scores for some Dutch vowels, while listeners with a smaller vowel inventory seem to have higher accuracy scores for those same vowels. In line with the L2LP model, these findings demonstrate that L1–L2 acoustic relationships better predict non-native and L2 perceptual performance and that inventory size alone is not a good predictor for cross-language perceptual difficulties. PMID:28191001
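The cross-language discriminant analyses mentioned here classify L2 tokens into L1 vowel categories from their acoustics. A minimal sketch using scikit-learn's linear discriminant analysis, with all token values invented:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Train on L1 vowel tokens (F1, F2 in Hz; duration in ms; values invented),
# then classify an L2 token into L1 categories to predict assimilation.
X_l1 = np.array([[300, 2300, 90], [310, 2250, 95],    # L1 /i/
                 [700, 1300, 110], [720, 1350, 105],  # L1 /a/
                 [320, 800, 100], [330, 850, 95.0]])  # L1 /u/
y_l1 = ["i", "i", "a", "a", "u", "u"]

lda = LinearDiscriminantAnalysis().fit(X_l1, y_l1)
X_l2 = np.array([[400.0, 1900.0, 100.0]])  # one unfamiliar L2 token
print(lda.predict(X_l2), lda.predict_proba(X_l2))
```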
Conrad, Markus; Carreiras, Manuel; Tamm, Sascha; Jacobs, Arthur M
2009-04-01
Over the last decade, there has been increasing evidence for syllabic processing during visual word recognition. If syllabic effects prove to be independent from orthographic redundancy, this would seriously challenge the ability of current computational models to account for the processing of polysyllabic words. Three experiments are presented to disentangle effects of the frequency of syllabic units and orthographic segments in lexical decision. In Experiment 1 the authors obtained an inhibitory syllable frequency effect that was unaffected by the presence or absence of a bigram trough at the syllable boundary. In Experiments 2 and 3 an inhibitory effect of initial syllable frequency but a facilitative effect of initial bigram frequency emerged when manipulating 1 of the 2 measures and controlling for the other in Spanish words starting with consonant-vowel syllables. The authors conclude that effects of syllable frequency and letter-cluster frequency are independent and arise at different processing levels of visual word recognition. Results are discussed within the framework of an interactive activation model of visual word recognition.
Li, Tianhao; Fu, Qian-Jie
2011-08-01
(1) To investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users; (2) To examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. VGD was measured using two talker sets with different inter-gender fundamental frequencies (F(0)), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Eleven postlingually deaf CI users. The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments.
Stages of functional processing and the bihemispheric recognition of Japanese Kana script.
Yoshizaki, K
2000-04-01
Two experiments were carried out to examine the effects of functional steps on the benefits of interhemispheric integration. The purpose of Experiment 1 was to investigate the validity of the Banich (1995a) model, in which the benefits of interhemispheric processing increase as the task involves more functional steps. Sixteen right-handed subjects were given two types of Hiragana-Katakana script-matching tasks. One was the Name Identity (NI) task, and the other was the vowel matching (VM) task, which involved more functional steps than the NI task. The VM task required subjects to decide whether or not a pair of Katakana-Hiragana scripts shared a common vowel. In both tasks, a pair of Kana scripts (Katakana-Hiragana scripts) was tachistoscopically presented either in a unilateral visual field or across the bilateral visual fields, with one letter in each visual field. A bilateral visual fields advantage (BFA) was found in both tasks, and its size did not differ between the tasks; these findings did not support the Banich model. The purpose of Experiment 2 was to examine the effects of an imbalanced processing load between the hemispheres on the benefits of interhemispheric integration. To manipulate the balance of processing load across the hemispheres, the revised vowel matching (r-VM) task was developed by amending the VM task. The r-VM task was the same as the VM task in Experiment 1, except that a script having only a vowel sound was presented as one member of the pair of Kana scripts. Twenty-four right-handed subjects were given the r-VM and NI tasks. The results showed that although a BFA emerged in the NI task, it did not in the r-VM task. These results suggest that the balance of processing load between the hemispheres influences bilateral hemispheric processing.
Changes in Reported Sexual Orientation Following US States Recognition of Same-Sex Couples
Corliss, Heather L.; Spiegelman, Donna; Williams, Kerry; Austin, S. Bryn
2016-01-01
Objectives. To compare changes in self-reported sexual orientation of women living in states with any recognition of same-sex relationships (e.g., hospital visitation, domestic partnerships) with those of women living in states without such recognition. Methods. We calculated the likelihood of women in the Nurses’ Health Study II (n = 69 790) changing their reported sexual orientation between 1995 and 2009. Results. We used data from the Nurses’ Health Study II and found that living in a state with same-sex relationship recognition was associated with changing one’s reported sexual orientation, particularly from heterosexual to sexual minority. Individuals who reported being heterosexual in 1995 were 30% more likely to report a minority orientation (i.e., bisexual or lesbian) in 2009 (risk ratio = 1.30; 95% confidence interval = 1.05, 1.61) if they lived in a state with any recognition of same-sex relationships compared with those who lived in a state without such recognition. Conclusions. Policies recognizing same-sex relationships may encourage women to report a sexual minority orientation. Future research is needed to clarify how other social and legal policies may affect sexual orientation self-reports. PMID:27736213
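For reference, the reported effect size is a risk ratio. A minimal sketch of the crude (unadjusted) risk-ratio arithmetic with a Wald 95% confidence interval, using invented counts; the study's own estimates came from regression models adjusting for covariates:

```python
import math

# 2x2 table (counts invented for illustration):
a, n1 = 120, 3000   # reported minority orientation / total, recognition states
b, n0 = 300, 9750   # same, states without recognition
rr = (a / n1) / (b / n0)
se_log = math.sqrt(1/a - 1/n1 + 1/b - 1/n0)  # SE of log(RR)
lo, hi = rr * math.exp(-1.96 * se_log), rr * math.exp(1.96 * se_log)
print(round(rr, 2), round(lo, 2), round(hi, 2))  # 1.3 1.06 1.6
```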
Vowels in infant-directed speech: More breathy and more variable, but not clearer.
Miyazawa, Kouki; Shinya, Takahito; Martin, Andrew; Kikuchi, Hideaki; Mazuka, Reiko
2017-09-01
Infant-directed speech (IDS) is known to differ from adult-directed speech (ADS) in a number of ways, and it has often been argued that some of these IDS properties facilitate infants' acquisition of language. An influential study in support of this view is Kuhl et al. (1997), which found that vowels in IDS are produced with expanded first and second formants (F1/F2) on average, indicating that the vowels are acoustically further apart in IDS than in ADS. These results have been interpreted to mean that the way vowels are produced in IDS makes infants' task of learning vowel categories easier. The present paper revisits this interpretation by means of a thorough analysis of IDS vowels using a large-scale corpus of Japanese natural utterances. We will show that the expansion of F1/F2 values does occur in spontaneous IDS even when the vowels' prosodic position, lexical pitch accent, and lexical bias are accounted for. When IDS vowels are compared to carefully read speech (CS) by the same mothers, however, larger variability among IDS vowel tokens means that the acoustic distances among vowels are farther apart only in CS, but not in IDS when compared to ADS. Finally, we will show that IDS vowels are significantly more breathy than ADS or CS vowels. Taken together, our results demonstrate that even though expansion of formant values occurs in spontaneous IDS, this expansion cannot be interpreted as an indication that the acoustic distances among vowels are farther apart, as is the case in CS. Instead, we found that IDS vowels are characterized by breathy voice, which has been associated with the communication of emotional affect.
Yang, Jing; Brown, Emily; Fox, Robert A.; Xu, Li
2015-01-01
The present study examined the acoustic features of vowel production in Mandarin-speaking children with cochlear implants (CIs). The subjects included 14 native Mandarin-speaking, prelingually deafened children with CIs (2.9–8.3 years old) and 60 age-matched, normal-hearing (NH) children (3.1–9.0 years old). Each subject produced a list of monosyllables containing seven Mandarin vowels: [i, a, u, y, ɤ, ʅ, ɿ]. Midpoint F1 and F2 of each vowel token were extracted and normalized to eliminate the effects of different vocal tract sizes. Results showed that the CI children produced significantly longer vowels and less compact vowel categories than the NH children did. The CI children's acoustic vowel space was reduced due to a retracted production of the vowel [i]. The vowel space area showed a strong negative correlation with age at implantation (r = −0.80). The analysis of acoustic distance showed that the CI children produced corner vowels [a, u] similarly to the NH children, but other vowels (e.g., [ʅ, ɿ]) differently from the NH children, which suggests that CI children generally follow a developmental path of vowel acquisition similar to that of NH children. These findings highlight the importance of early implantation and have implications in clinical aural habilitation in young children with CIs. PMID:26627755
Malaysian English: An Instrumental Analysis of Vowel Contrasts
ERIC Educational Resources Information Center
Pillai, Stefanie; Don, Zuraidah Mohd.; Knowles, Gerald; Tang, Jennifer
2010-01-01
This paper makes an instrumental analysis of English vowel monophthongs produced by 47 female Malaysian speakers. The focus is on the distribution of Malaysian English vowels in the vowel space, and the extent to which there is phonetic contrast between traditionally paired vowels. The results indicate that, like neighbouring varieties of English,…
Perception of Vowel Length by Japanese- and English-Learning Infants
ERIC Educational Resources Information Center
Mugitani, Ryoko; Pons, Ferran; Fais, Laurel; Dietrich, Christiane; Werker, Janet F.; Amano, Shigeaki
2009-01-01
This study investigated vowel length discrimination in infants from 2 language backgrounds, Japanese and English, in which vowel length is either phonemic or nonphonemic. Experiment 1 revealed that English 18-month-olds discriminate short and long vowels although vowel length is not phonemically contrastive in English. Experiments 2 and 3 revealed…
Acoustic and Durational Properties of Indian English Vowels
ERIC Educational Resources Information Center
Maxwell, Olga; Fletcher, Janet
2009-01-01
This paper presents findings of an acoustic phonetic analysis of vowels produced by speakers of English as a second language from northern India. The monophthongal vowel productions of a group of male speakers of Hindi and male speakers of Punjabi were recorded, and acoustic phonetic analyses of vowel formant frequencies and vowel duration were…
Acoustic characteristics of the vowel systems of six regional varieties of American English
NASA Astrophysics Data System (ADS)
Clopper, Cynthia G.; Pisoni, David B.; de Jong, Kenneth
2005-09-01
Previous research by speech scientists on the acoustic characteristics of American English vowel systems has typically focused on a single regional variety, despite decades of sociolinguistic research demonstrating the extent of regional phonological variation in the United States. In the present study, acoustic measures of duration and first and second formant frequencies were obtained from five repetitions of 11 different vowels produced by 48 talkers representing both genders and six regional varieties of American English. Results revealed consistent variation due to region of origin, particularly with respect to the production of low vowels and high back vowels. The Northern talkers produced shifted low vowels consistent with the Northern Cities Chain Shift, the Southern talkers produced fronted back vowels consistent with the Southern Vowel Shift, and the New England, Midland, and Western talkers produced the low back vowel merger. These findings indicate that the vowel systems of American English are better characterized in terms of the region of origin of the talkers than in terms of a single set of idealized acoustic-phonetic baselines of "General" American English and provide benchmark data for six regional varieties.
Shi, Lu-Feng; Koenig, Laura L
2016-09-01
Nonnative listeners have difficulty recognizing English words due to underdeveloped acoustic-phonetic and/or lexical skills. The present study used Boothroyd and Nittrouer's (1988) j factor to tease apart these two components of word recognition. Participants included 15 native English and 29 native Russian listeners. Fourteen and 15 of the Russian listeners reported English (ED) and Russian (RD) to be their dominant language, respectively. Listeners were presented with 119 consonant-vowel-consonant real and nonsense words in speech-spectrum noise at +6 dB SNR. Responses were scored for word and phoneme recognition, the logarithmic quotient of which yielded j. Word and phoneme recognition was comparable between native and ED listeners but poorer in RD listeners. Analysis of j indicated less effective use of lexical information in RD than in native and ED listeners. Lexical processing was strongly correlated with the length of residence in the United States. Language background is important for nonnative word recognition. Lexical skills can be regarded as nativelike in ED nonnative listeners. Compromised word recognition in ED listeners is unlikely to be a result of poor lexical processing. Performance should be interpreted with caution for listeners dominant in their first language, whose word recognition is affected by both lexical and acoustic-phonetic factors.
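The j factor treats a word as j independently recognized parts, estimated as the logarithmic quotient of word and phoneme scores. A minimal sketch with invented proportions:

```python
import math

def j_factor(p_word, p_phoneme):
    """Boothroyd & Nittrouer's j: the effective number of independently
    recognized parts per word, j = log(p_word) / log(p_phoneme)."""
    return math.log(p_word) / math.log(p_phoneme)

# Illustrative proportions correct for CVC items (values invented):
print(j_factor(p_word=0.40, p_phoneme=0.70))  # ~2.6: phonemes nearly independent
print(j_factor(p_word=0.55, p_phoneme=0.70))  # ~1.7: lexical knowledge lowers j
```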
Dynamic spectral structure specifies vowels for children and adults
Nittrouer, Susan
2008-01-01
When it comes to making decisions regarding vowel quality, adults seem to weight dynamic syllable structure more strongly than static structure, although disagreement exists over the nature of the most relevant kind of dynamic structure: spectral change intrinsic to the vowel or structure arising from movements between consonant and vowel constrictions. Results have been even less clear regarding the signal components children use in making vowel judgments. In this experiment, listeners of four different ages (adults, and 3-, 5-, and 7-year-old children) were asked to label stimuli that sounded either like steady-state vowels or like CVC syllables which sometimes had middle sections masked by coughs. Four vowel contrasts were used, crossed for type (front/back or closed/open) and consonant context (strongly or only slightly constraining of vowel tongue position). All listeners recognized vowel quality with high levels of accuracy in all conditions, but children were disproportionately hampered by strong coarticulatory effects when only steady-state formants were available. Results clarified past studies, showing that dynamic structure is critical to vowel perception for all aged listeners, but particularly for young children, and that it is the dynamic structure arising from vocal-tract movement between consonant and vowel constrictions that is most important. PMID:17902868
Colloquial Arabic vowels in Israel: a comparative acoustic study of two dialects.
Amir, Noam; Amir, Ofer; Rosenhouse, Judith
2014-10-01
This study explores the acoustic properties of the vowel systems of two dialects of colloquial Arabic spoken in Israel. One dialect is spoken in the Galilee region in the north of Israel, and the other is spoken in the Triangle (Muthallath) region, in central Israel. These vowel systems have five short and five long vowels /i, i:, e, e:, a, a:, o, o:, u, u:/. Twenty men and twenty women from each region were included, uttering 30 vowels each. All speakers were adult Muslim native speakers of these two dialects. The studied vowels were uttered in non-pharyngeal and non-laryngeal environments in the context of CVC words, embedded in a carrier sentence. The acoustic parameters studied were the first two formants, F0, and duration. Results revealed that long vowels were approximately twice as long as short vowels and differed also in their formant values. The two dialects diverged mainly in the short vowels rather than in the long ones. An overlap was found between the two short vowel pairs /i/-/e/ and /u/-/o/. This study demonstrates the existence of dialectal differences in the colloquial Arabic vowel systems, underlining the need for further research into the numerous additional dialects found in the region.
Cross-dialectal variation in formant dynamics of American English vowels
Fox, Robert Allen; Jacewicz, Ewa
2009-01-01
This study aims to characterize the nature of the dynamic spectral change in vowels in three distinct regional varieties of American English spoken in western North Carolina, central Ohio, and southern Wisconsin. The vowels /ɪ, ɛ, e, æ, aɪ/ were produced by 48 women for a total of 1920 utterances and were contained in words of the structure /bVts/ and /bVdz/ in sentences that elicited nonemphatic and emphatic vowels. Measurements made at the vowel target (i.e., the central 60% of the vowel) produced a set of acoustic parameters which included position and movement in the F1 by F2 space, vowel duration, amount of spectral change [measured as vector length (VL) and trajectory length (TL)], and spectral rate of change. Results revealed expected variation in formant dynamics as a function of phonetic factors (vowel emphasis and consonantal context). However, for each vowel and for each measure employed, dialect was a strong source of variation in vowel-inherent spectral change. In general, the dialect-specific nature and amount of spectral change can be characterized quite effectively by position and movement in the F1 by F2 space, vowel duration, TL (but not VL, which underestimates formant movement), and spectral rate of change. PMID:19894839
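The VL/TL distinction is easy to make concrete: VL is the straight-line distance between the first and last measurement points, while TL sums the section lengths along the way, which is why VL underestimates movement whenever the trajectory curves. A sketch with invented formant values:

```python
import numpy as np

# F1/F2 (Hz) at five equidistant points across the vowel target (invented).
f1 = np.array([480.0, 540.0, 590.0, 610.0, 620.0])
f2 = np.array([1900.0, 1750.0, 1600.0, 1480.0, 1400.0])

# Vector length (VL): one straight line from the first to the last point.
vl = np.hypot(f1[-1] - f1[0], f2[-1] - f2[0])

# Trajectory length (TL): the sum of the four section lengths; TL >= VL.
tl = np.hypot(np.diff(f1), np.diff(f2)).sum()
print(vl, tl)
```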
The Vowel Harmony in the Sinhala Language
ERIC Educational Resources Information Center
Petryshyn, Ivan
2005-01-01
The Sinhala language is characterized by the melodic shifty stress or its essence, the opposition between long and short vowels, the Ablaut variants of the vowels and the syllabic alphabet which, of course, might impact the vowel harmony and can be a feature of all the leveled Indo-European languages. The vowel harmony is a well-known concept in…
An EMA/EPG Study of Vowel-to-Vowel Articulation across Velars in Southern British English
ERIC Educational Resources Information Center
Fletcher, Janet
2004-01-01
Recent studies have attested that the extent of transconsonantal vowel-to-vowel coarticulation is at least partly dependent on degree of prosodic accentuation, in languages like English. A further important factor is the mutual compatibility of consonant and vowel gestures associated with the segments in question. In this study two speakers of…
Chládková, Kateřina; Escudero, Paola; Lipski, Silvia C
2013-09-01
In some languages (e.g. Czech), changes in vowel duration affect word meaning, while in others (e.g. Spanish) they do not. Yet for other languages (e.g. Dutch), the linguistic role of vowel duration remains unclear. To reveal whether Dutch represents vowel length in its phonology, we compared auditory pre-attentive duration processing in native and non-native vowels across Dutch, Czech, and Spanish. Dutch duration sensitivity patterned with Czech but was larger than Spanish in the native vowel, while it was smaller than Czech and Spanish in the non-native vowel. An interpretation of these findings suggests that in Dutch, duration is used phonemically but it might be relevant for the identity of certain native vowels only. Furthermore, the finding that Spanish listeners are more sensitive to duration in non-native than in native vowels indicates that a lack of duration differences in one's native language could be beneficial for second-language learning.
Sonority contours in word recognition
NASA Astrophysics Data System (ADS)
McLennan, Sean
2003-04-01
Contrary to the Generativist distinction between competence and performance, which asserts that speech or perception errors are due to random, nonlinguistic factors, it seems likely that errors are principled and possibly governed by some of the same constraints as language. A preliminary investigation of errors modeled after the child's "Chain Whisper" game (a degraded-stimulus task) suggests that a significant number of recognition errors can be characterized as an improvement in syllable sonority contour toward the linguistically least-marked, voiceless-stop-plus-vowel syllable. An independent study of sonority contours showed that approximately half of the English lexicon can be uniquely identified by contour alone. Additionally, "sororities" (groups of words that share a single sonority contour) surprisingly show no correlation with familiarity or frequency in either size or membership. Together these results imply that sonority contours may be an important factor in word recognition and in defining word "neighborhoods." Moreover, they suggest that linguistic markedness constraints may be more prevalent in performance-related phenomena than previously accepted.
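A minimal sketch of mapping words to sonority contours and estimating how much of a lexicon is uniquely identified by contour alone; the sonority scale and toy lexicon are invented, and published scales differ in detail:

```python
from collections import Counter

# Toy sonority scale (values invented; published scales differ).
SONORITY = {"p": 1, "t": 1, "k": 1, "s": 3, "m": 4, "n": 4, "l": 5,
            "i": 7, "u": 7, "e": 7, "o": 7, "a": 8}

def contour(word):
    """Map a phoneme string to its sonority contour."""
    return tuple(SONORITY[p] for p in word)

# Words sharing a contour form a "sorority"; the share of words whose
# contour is unique estimates how identifying the contour alone is.
lexicon = ["pat", "tap", "kit", "man", "nap"]
counts = Counter(contour(w) for w in lexicon)
print(sum(counts[contour(w)] == 1 for w in lexicon) / len(lexicon))  # 0.6
```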
Activity Recognition Invariant to Sensor Orientation with Wearable Motion Sensors.
Yurtman, Aras; Barshan, Billur
2017-08-09
Most activity recognition studies that employ wearable sensors assume that the sensors are attached at pre-determined positions and orientations that do not change over time. Since this is not the case in practice, it is of interest to develop wearable systems that operate invariantly to sensor position and orientation. We focus on invariance to sensor orientation and develop two alternative transformations to remove the effect of absolute sensor orientation from the raw sensor data. We test the proposed methodology in activity recognition with four state-of-the-art classifiers using five publicly available datasets containing various types of human activities acquired by different sensor configurations. While the ordinary activity recognition system cannot handle incorrectly oriented sensors, the proposed transformations allow the sensors to be worn at any orientation at a given position on the body, and achieve nearly the same activity recognition performance as the ordinary system for which the sensor units are not rotatable. The proposed techniques can be applied to existing wearable systems without much effort, by simply transforming the time-domain sensor data at the pre-processing stage.
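A minimal sketch of one way to make accelerometer features invariant to sensor orientation, assuming gravity dominates the mean of the signal; this illustrates the general idea rather than reproducing the authors' two transformations:

```python
import numpy as np

def orientation_invariant(acc):
    """acc: (n_samples, 3) accelerometer data in sensor coordinates.
    Returns per-sample features that do not depend on sensor rotation."""
    acc = np.asarray(acc, float)
    norm = np.linalg.norm(acc, axis=1)        # magnitude: rotation-invariant
    g = acc.mean(axis=0)                      # estimated gravity direction
    g = g / np.linalg.norm(g)
    vertical = acc @ g                        # component along gravity
    horizontal = np.sqrt(np.maximum(norm**2 - vertical**2, 0.0))
    return np.column_stack([norm, vertical, horizontal])

# Rotating the raw data leaves the transformed features unchanged:
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) + [0, 0, 9.81]
R = np.linalg.qr(rng.normal(size=(3, 3)))[0]  # a random orthogonal matrix
print(np.allclose(orientation_invariant(X), orientation_invariant(X @ R.T)))
```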
Effect of Vowel Identity and Onset Asynchrony on Concurrent Vowel Identification
ERIC Educational Resources Information Center
Hedrick, Mark S.; Madix, Steven G.
2009-01-01
Purpose: The purpose of the current study was to determine the effects of vowel identity and temporal onset asynchrony on identification of vowels overlapped in time. Method: Fourteen listeners with normal hearing, with a mean age of 24 years, participated. The listeners were asked to identify both of a pair of 200-ms vowels (referred to as…
We're Not in Kansas Anymore: The TOTO Strategy for Decoding Vowel Pairs
ERIC Educational Resources Information Center
Meese, Ruth Lyn
2016-01-01
Vowel teams such as vowel digraphs present a challenge to struggling readers. Some researchers assert that phonics generalizations such as the "two vowels go walking and the first one does the talking" rule do not hold often enough to be reliable for children. Others suggest that some vowel teams are highly regular and that children can…
The Spelling of Vowels Is Influenced by Australian and British English Dialect Differences
ERIC Educational Resources Information Center
Kemp, Nenagh
2009-01-01
Two experiments examined the influence of dialect on the spelling of vowel sounds. British and Australian children (6 to 8 years) and university students wrote words whose unstressed vowel sound is spelled i or e and pronounced /I/ or /schwa/. Participants often (mis)spelled these vowel sounds as they pronounced them. When vowels were pronounced…
The Developmental Process of Vowel Integration as Found in Children in Grades 1-3.
ERIC Educational Resources Information Center
Bentz, Darrell; Szymczuk, Mike
A study was designed to investigate the auditory-visual integrative abilities of primary grade children for five long vowels and five short vowels. The Vowel Integration Test (VIT), composed of 35 nonsense words having all the long and short vowel sounds, was administered to students in 64 schools over a period of two years. Students' indications…
Acoustic characteristics of the vowel systems of six regional varieties of American English
Clopper, Cynthia G.; Pisoni, David B.; de Jong, Kenneth
2012-01-01
Previous research by speech scientists on the acoustic characteristics of American English vowel systems has typically focused on a single regional variety, despite decades of sociolinguistic research demonstrating the extent of regional phonological variation in the United States. In the present study, acoustic measures of duration and first and second formant frequencies were obtained from five repetitions of 11 different vowels produced by 48 talkers representing both genders and six regional varieties of American English. Results revealed consistent variation due to region of origin, particularly with respect to the production of low vowels and high back vowels. The Northern talkers produced shifted low vowels consistent with the Northern Cities Chain Shift, the Southern talkers produced fronted back vowels consistent with the Southern Vowel Shift, and the New England, Midland, and Western talkers produced the low back vowel merger. These findings indicate that the vowel systems of American English are better characterized in terms of the region of origin of the talkers than in terms of a single set of idealized acoustic-phonetic baselines of “General” American English and provide benchmark data for six regional varieties. PMID:16240825
Comparing Measures of Voice Quality From Sustained Phonation and Continuous Speech.
Gerratt, Bruce R; Kreiman, Jody; Garellek, Marc
2016-10-01
The question of which type of utterance, a sustained vowel or continuous speech, is best for voice quality analysis has been extensively studied but with equivocal results. This study examines whether previously reported differences derive from the articulatory and prosodic factors occurring in continuous speech versus sustained phonation. Speakers with voice disorders sustained vowels and read sentences. Vowel samples were excerpted from the steadiest portion of each vowel in the sentences. In addition to sustained and excerpted vowels, a third set of stimuli was created by shortening sustained vowel productions to match the duration of vowels excerpted from continuous speech. Acoustic measures were made on the stimuli, and listeners judged the severity of vocal quality deviation. Sustained vowels and those extracted from continuous speech contain essentially the same acoustic and perceptual information about vocal quality deviation. Perceived and/or measured differences between continuous speech and sustained vowels derive largely from voice source variability across segmental and prosodic contexts and not from variations in vocal fold vibration in the quasisteady portion of the vowels. Approaches to voice quality assessment using continuous speech samples average across utterances and may not adequately quantify the variability they are intended to assess.
Are vowel errors influenced by consonantal context in the speech of persons with aphasia?
NASA Astrophysics Data System (ADS)
Gelfer, Carole E.; Bell-Berti, Fredericka; Boyle, Mary
2004-05-01
The literature suggests that vowels and consonants may be affected differently in the speech of persons with conduction aphasia (CA) or nonfluent aphasia with apraxia of speech (AOS). Persons with CA have shown similar error rates across vowels and consonants, while those with AOS have shown more errors for consonants than vowels. These data have been interpreted to suggest that consonants have greater gestural complexity than vowels. However, recent research [M. Boyle et al., Proc. International Cong. Phon. Sci., 3265-3268 (2003)] does not support this interpretation: persons with AOS and CA both had a high proportion of vowel errors, and vowel errors almost always occurred in the context of consonantal errors. To examine the notion that vowels are inherently less complex than consonants and are differentially affected in different types of aphasia, vowel production in different consonantal contexts for speakers with AOS or CA was examined. The target utterances, produced in carrier phrases, were bVC and bV syllables, allowing us to examine whether vowel production is influenced by consonantal context. Listener judgments were obtained for each token, and error productions were grouped according to the intended utterance and error type. Acoustical measurements were made from spectrographic displays.
LEARNING NONADJACENT DEPENDENCIES IN PHONOLOGY: TRANSPARENT VOWELS IN VOWEL HARMONY
Finley, Sara
2015-01-01
Nonadjacent dependencies are an important part of the structure of language. While the majority of syntactic and phonological processes occur in a local domain, several processes appear to apply at a distance, posing a challenge for theories of linguistic structure. This article addresses one of the most common nonadjacent phenomena in phonology: transparent vowels in vowel harmony. Vowel harmony occurs when adjacent vowels are required to share the same phonological feature value (e.g., V[+F] C V[+F]). However, transparent vowels create a second-order nonadjacent pattern because agreement between two vowels can 'skip' the transparent neutral vowel in addition to consonants (e.g., V[+F] C VT[-F] C V[+F], where VT is the transparent vowel). Adults are shown to display initial learning biases against second-order nonadjacency in experiments that use an artificial grammar learning paradigm. Experiments 1-3 show that adult learners fail to learn the second-order long-distance dependency created by the transparent vowel (as compared to a control condition). In experiments 4-5, training in terms of overall exposure as well as the frequency of relevant transparent items was increased. With adequate exposure, learners reliably generalize to novel words containing transparent vowels. The experiments suggest that learners are sensitive to the structure of phonological representations, even when learning occurs at a relatively rapid pace. PMID:26146423
Whitfield, Jason A; Dromey, Christopher; Palmer, Panika
2018-05-17
The purpose of this study was to examine the effect of speech intensity on acoustic and kinematic vowel space measures and conduct a preliminary examination of the relationship between kinematic and acoustic vowel space metrics calculated from continuously sampled lingual marker and formant traces. Young adult speakers produced 3 repetitions of 2 different sentences at 3 different loudness levels. Lingual kinematic and acoustic signals were collected and analyzed. Acoustic and kinematic variants of several vowel space metrics were calculated from the formant frequencies and the position of 2 lingual markers. Traditional metrics included triangular vowel space area and the vowel articulation index. Acoustic and kinematic variants of sentence-level metrics based on the articulatory-acoustic vowel space and the vowel space hull area were also calculated. Both acoustic and kinematic variants of the sentence-level metrics significantly increased with an increase in loudness, whereas no statistically significant differences in traditional vowel-point metrics were observed for either the kinematic or acoustic variants across the 3 loudness conditions. In addition, moderate-to-strong relationships between the acoustic and kinematic variants of the sentence-level vowel space metrics were observed for the majority of participants. These data suggest that both kinematic and acoustic vowel space metrics that reflect the dynamic contributions of both consonant and vowel segments are sensitive to within-speaker changes in articulation associated with manipulations of speech intensity.
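A minimal sketch of acoustic variants of the metrics named here, with all formant values invented: the triangular vowel space area and the vowel articulation index use corner-vowel formants, while the hull area is computed over continuously sampled F1/F2 points:

```python
import numpy as np
from scipy.spatial import ConvexHull

# Corner-vowel formants (F1, F2 in Hz; values invented).
F = {"i": (300.0, 2300.0), "a": (750.0, 1300.0), "u": (320.0, 800.0)}

def triangular_vsa(F):
    """Triangular vowel space area (Hz^2) via the shoelace formula."""
    (f1i, f2i), (f1a, f2a), (f1u, f2u) = F["i"], F["a"], F["u"]
    return 0.5 * abs(f2i * (f1a - f1u) + f2a * (f1u - f1i) + f2u * (f1i - f1a))

def vai(F):
    """Vowel articulation index, commonly (F2i + F1a) / (F1i + F1u + F2u + F2a)."""
    (f1i, f2i), (f1a, f2a), (f1u, f2u) = F["i"], F["a"], F["u"]
    return (f2i + f1a) / (f1i + f1u + f2u + f2a)

# Sentence-level hull area over sampled (F1, F2) points (invented):
pts = np.array([[400, 1900], [550, 1600], [700, 1300], [350, 900], [320, 2100.0]])
hull_area = ConvexHull(pts).volume  # .volume is the enclosed area in 2-D
print(triangular_vsa(F), vai(F), hull_area)
```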
ERIC Educational Resources Information Center
Khan, Mohamed Fazlulla
2013-01-01
L1 habits often tend to interfere with the process of learning a second language. The vowel habits of Arab learners of English are one such interference. Arabic orthography is such that certain vowels indicated by diacritics are often omitted, since an experienced reader of Arabic knows, by habit, the exact vowel sound in each phonetic…
A comparison of vowel normalization procedures for language variation research
NASA Astrophysics Data System (ADS)
Adank, Patti; Smits, Roel; van Hout, Roeland
2004-11-01
An evaluation of vowel normalization procedures for the purpose of studying language variation is presented. The procedures were compared on how effectively they (a) preserve phonemic information, (b) preserve information about the talker's regional background (or sociolinguistic information), and (c) minimize anatomical/physiological variation in acoustic representations of vowels. Recordings were made for 80 female talkers and 80 male talkers of Dutch. These talkers were stratified according to their gender and regional background. The normalization procedures were applied to measurements of the fundamental frequency and the first three formant frequencies for a large set of vowel tokens. The normalization procedures were evaluated through statistical pattern analysis. The results show that normalization procedures that use information across multiple vowels ("vowel-extrinsic" information) to normalize a single vowel token performed better than those that include only information contained in the vowel token itself ("vowel-intrinsic" information). Furthermore, the results show that normalization procedures that operate on individual formants performed better than those that use information across multiple formants (e.g., "formant-extrinsic" F2-F1).
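Lobanov z-score normalization is a standard member of the vowel-extrinsic, formant-intrinsic class described here. A minimal sketch (token values invented):

```python
import numpy as np

def lobanov(formants):
    """Vowel-extrinsic, formant-intrinsic normalization: z-score each formant
    separately, using one talker's mean and SD across all vowel tokens."""
    f = np.asarray(formants, float)  # shape (n_tokens, n_formants)
    return (f - f.mean(axis=0)) / f.std(axis=0)

# F1/F2 (Hz) for a few of one talker's vowel tokens (values invented):
tokens = [[300, 2250], [700, 1300], [320, 850], [500, 1700]]
print(lobanov(tokens))
```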
Acoustic Analysis of Nasal Vowels in Monguor Language
NASA Astrophysics Data System (ADS)
Zhang, Hanbin
2017-09-01
The purpose of the study is to analyze the spectral characteristics and acoustic features of the nasal vowels [ɑ̃] and [ɔ̃] in the Monguor language. On the basis of an acoustic parameter database of Monguor speech, the study finds that five main zero-pole pairs appear for the nasal vowel [ɑ̃] and two zero-pole pairs appear for the nasal vowel [ɔ̃]. The results of regression analysis demonstrate that the duration of the nasal vowel [ɑ̃] or [ɔ̃] can be predicted from its F1, F2, and F3, respectively.
Fogerty, Daniel
2014-01-01
The present study investigated the importance of overall segment amplitude and intrinsic segment amplitude modulation of consonants and vowels to sentence intelligibility. Sentences were processed according to three conditions that replaced consonant or vowel segments with noise matched to the long-term average speech spectrum. Segments were replaced with (1) low-level noise that distorted the overall sentence envelope, (2) segment-level noise that restored the overall syllabic amplitude modulation of the sentence, and (3) segment-modulated noise that further restored faster temporal envelope modulations during the vowel. Results from the first experiment demonstrated an incremental benefit with increasing resolution of the vowel temporal envelope. However, amplitude modulations of replaced consonant segments had a comparatively minimal effect on overall sentence intelligibility scores. A second experiment selectively noise-masked preserved vowel segments in order to equate overall performance of consonant-replaced sentences to that of the vowel-replaced sentences. Results demonstrated no significant effect of restoring consonant modulations during the interrupting noise when existing vowel cues were degraded. A third experiment demonstrated greater perceived sentence continuity with the preservation or addition of vowel envelope modulations. Overall, results support previous investigations demonstrating the importance of vowel envelope modulations to the intelligibility of interrupted sentences. PMID:24606291
Perceptual invariance of coarticulated vowels over variations in speaking rate.
Stack, Janet W; Strange, Winifred; Jenkins, James J; Clarke, William D; Trent, Sonja A
2006-04-01
This study examined the perception and acoustics of a large corpus of vowels spoken in consonant-vowel-consonant syllables produced in citation-form (lists) and spoken in sentences at normal and rapid rates by a female adult. Listeners correctly categorized the speaking rate of sentence materials as normal or rapid (2% errors) but did not accurately classify the speaking rate of the syllables when they were excised from the sentences (25% errors). In contrast, listeners accurately identified the vowels produced in sentences spoken at both rates when presented the sentences and when presented the excised syllables blocked by speaking rate or randomized. Acoustical analysis showed that formant frequencies at syllable midpoint for vowels in sentence materials showed "target undershoot" relative to citation-form values, but little change over speech rate. Syllable durations varied systematically with vowel identity, speaking rate, and voicing of final consonant. Vowel-inherent-spectral-change was invariant in direction of change over rate and context for most vowels. The temporal location of maximum F1 frequency further differentiated spectrally adjacent lax and tense vowels. It was concluded that listeners were able to utilize these rate- and context-independent dynamic spectrotemporal parameters to identify coarticulated vowels, even when sentential information about speaking rate was not available.
Neural Representation of Concurrent Vowels in Macaque Primary Auditory Cortex
Micheyl, Christophe; Steinschneider, Mitchell
2016-01-01
Successful speech perception in real-world environments requires that the auditory system segregate competing voices that overlap in frequency and time into separate streams. Vowels are major constituents of speech and are composed of frequencies (harmonics) that are integer multiples of a common fundamental frequency (F0). The pitch and identity of a vowel are determined by its F0 and spectral envelope (formant structure), respectively. When two spectrally overlapping vowels differing in F0 are presented concurrently, they can be readily perceived as two separate "auditory objects" with pitches at their respective F0s. A difference in pitch between two simultaneous vowels provides a powerful cue for their segregation, which in turn, facilitates their individual identification. The neural mechanisms underlying the segregation of concurrent vowels based on pitch differences are poorly understood. Here, we examine neural population responses in macaque primary auditory cortex (A1) to single and double concurrent vowels (/a/ and /i/) that differ in F0 such that they are heard as two separate auditory objects with distinct pitches. We find that neural population responses in A1 can resolve, via a rate-place code, lower harmonics of both single and double concurrent vowels. Furthermore, we show that the formant structures, and hence the identities, of single vowels can be reliably recovered from the neural representation of double concurrent vowels. We conclude that A1 contains sufficient spectral information to enable concurrent vowel segregation and identification by downstream cortical areas. PMID:27294198
Relationships between alexithymia, affect recognition, and empathy after traumatic brain injury.
Neumann, Dawn; Zupan, Barbra; Malec, James F; Hammond, Flora
2014-01-01
To determine (1) alexithymia, affect recognition, and empathy differences in participants with and without traumatic brain injury (TBI); (2) the amount of affect recognition variance explained by alexithymia; and (3) the amount of empathy variance explained by alexithymia and affect recognition. Sixty adults with moderate-to-severe TBI; 60 age- and gender-matched controls. Participants were evaluated for alexithymia (difficulty identifying feelings, difficulty describing feelings, and externally-oriented thinking); facial and vocal affect recognition; and affective and cognitive empathy (empathic concern and perspective-taking, respectively). Participants with TBI had significantly higher alexithymia, poorer facial and vocal affect recognition, and lower empathy scores. For TBI participants, facial and vocal affect recognition variances were significantly explained by alexithymia (12% and 8%, respectively); however, the majority of the variance was accounted for by externally-oriented thinking alone. Affect recognition and alexithymia significantly accounted for 16.5% of the variance in cognitive empathy. Again, the majority of the variance was primarily explained by externally-oriented thinking. Affect recognition and alexithymia did not explain affective empathy. Results suggest that people who have a tendency to avoid thinking about emotions (externally-oriented thinking) are more likely to have problems recognizing others' emotions and assuming others' points of view. Clinical implications are discussed.
ERIC Educational Resources Information Center
Karins, A. Krisjanis
1995-01-01
Investigates variable deletion of short vowels in word-final unstressed syllables in Latvian spoken in Riga. Affected vowels were almost always inflectional endings and results indicated that internal phonological and prosodic factors (especially distance from main word stress) were the strongest constraints on vowel deletion, along with the…
Vowel reduction in word-final position by early and late Spanish-English bilinguals.
Byers, Emily; Yavas, Mehmet
2017-01-01
Vowel reduction is a prominent feature of American English, as well as other stress-timed languages. As a phonological process, vowel reduction neutralizes multiple vowel quality contrasts in unstressed syllables. For bilinguals whose native language is not characterized by large spectral and durational differences between tonic and atonic vowels, systematically reducing unstressed vowels to the central vowel space can be problematic. Failure to maintain this pattern of stressed-unstressed syllables in American English is one key element that contributes to a "foreign accent" in second language speakers. Reduced vowels, or "schwas," have also been identified as particularly vulnerable to the co-articulatory effects of adjacent consonants. The current study examined the effects of adjacent sounds on the spectral and temporal qualities of schwa in word-final position. Three groups of English-speaking adults were tested: Miami-based monolingual English speakers, early Spanish-English bilinguals, and late Spanish-English bilinguals. Subjects performed a reading task to examine their schwa productions in fluent speech when schwas were preceded by consonants from various points of articulation. Results indicated that monolingual English and late Spanish-English bilingual groups produced targeted vowel qualities for schwa, whereas early Spanish-English bilinguals lacked homogeneity in their vowel productions. This finding extends prior claims that schwa is targetless for F2 position, from native speakers to highly proficient bilingual speakers. Though spectral qualities lacked homogeneity for early Spanish-English bilinguals, early bilinguals produced schwas with near native-like vowel duration. In contrast, late bilinguals produced schwas with significantly longer durations than English monolinguals or early Spanish-English bilinguals. Our results suggest that the temporal properties of a language are better integrated into second language phonologies than spectral qualities. Finally, we examined the role of nonstructural variables (e.g., linguistic history measures) in predicting native-like vowel duration. These factors included: age of L2 learning, amount of L1 use, and self-reported bilingual dominance. Our results suggested that the sociolinguistic factors that predicted native-like reduced vowel duration differed from those that predicted native-like vowel qualities across multiple phonetic environments. PMID:28384234
Identification and discrimination of Spanish front vowels
NASA Astrophysics Data System (ADS)
Castellanos, Isabel; Lopez-Bascuas, Luis E.
2004-05-01
The idea that vowels are perceived less categorically than consonants is widely accepted. Ades [Psychol. Rev. 84, 524-530 (1977)] tried to explain this fact on the basis of the Durlach and Braida [J. Acoust. Soc. Am. 46, 372-383 (1969)] theory of intensity resolution. Since vowels seem to cover a broader perceptual range, context-coding noise for vowels should be greater than for consonants, leading to less categorical performance on vocalic segments. However, relatively recent work by Macmillan et al. [J. Acoust. Soc. Am. 84, 1262-1280 (1988)] has cast doubt on the assumption of different perceptual ranges for vowels and consonants, even though context variance is acknowledged to be greater for the former. One possibility is that context variance increases as the number of long-term phonemic categories increases. To test this hypothesis we focused on Spanish as the target language. Spanish has fewer vowel categories than English, and the implication is that Spanish vowels will be perceived more categorically. Identification and discrimination experiments were conducted on a synthetic /i/-/e/ continuum, and the obtained functions were examined to assess whether Spanish vowels are perceived more categorically than English vowels. The results are discussed in the context of different theories of speech perception.
Adult perceptions of phonotactic violations in Japanese
NASA Astrophysics Data System (ADS)
Fais, Laurel; Kajikawa, Sachiyo; Werker, Janet; Amano, Shigeaki
2004-05-01
Adult Japanese speakers "hear" epenthetic vowels in productions of Japanese-like words that violate the canonical CVCVCV form by containing internal consonant clusters (CVCCV) [Dupoux et al., J. Exp. Psychol. 25, 1568-1578 (1999)]. Given this finding, this research examined how Japanese adults rated the goodness of Japanese-like words produced without a vowel in the final syllable (CVC), and words produced without vowels in the penultimate and final syllables (CVCC). Furthermore, in some of these contexts, voiceless vowels may appear in fluent, casual Japanese productions, especially in the Kanto dialect, and in some, such voiceless vowels may not appear. Results indicate that both Kanto and Kinki speakers rated CVC productions for contexts in which voiceless vowels are not allowed as the worst; they rated CVC and CVCC contexts in which voiceless vowel productions are allowed as better. In these latter contexts, the CVC words, which result from the loss of one, final, vowel, are judged to be better than the CVCC words, which result from the loss of two (final and penultimate) vowels. These results mirror the relative seriousness of the phonotactic violations and indicate listeners have tacit knowledge of these regularities in their language.
Catalan speakers' perception of word stress in unaccented contexts.
Ortega-Llebaria, Marta; del Mar Vanrell, Maria; Prieto, Pilar
2010-01-01
In unaccented contexts, formant frequency differences related to vowel reduction constitute a consistent cue to word stress in English, whereas in languages such as Spanish that have no systematic vowel reduction, stress perception is based on duration and intensity cues. This article examines the perception of word stress by speakers of Central Catalan, in which, due to its vowel reduction patterns, words either alternate stressed open vowels with unstressed mid-central vowels as in English or contain no vowel quality cues to stress, as in Spanish. Results show that Catalan listeners perceive stress based mainly on duration cues in both word types. Other cues pattern together with duration to make stress perception more robust. However, no single cue is absolutely necessary and trading effects compensate for a lack of differentiation in one dimension by changes in another dimension. In particular, speakers identify longer mid-central vowels as more stressed than shorter open vowels. These results and those obtained in other stress-accent languages provide cumulative evidence that word stress is perceived independently of pitch accents by relying on a set of cues with trading effects so that no single cue, including formant frequency differences related to vowel reduction, is absolutely necessary for stress perception.
Newman, R A; Emanuel, F W
1991-08-01
This study was designed to investigate the effects of vocal f0 on vowel spectral noise level (SNL) and perceived vowel roughness for subjects in high- and low-pitch voice categories. The subjects were 40 adult singers (10 each of sopranos, altos, tenors, and basses). Each produced the vowel /a/ in isolation at a comfortable speaking pitch, and at each of seven assigned pitches spaced at whole-tone intervals over a musical octave within his or her singing pitch range. The eight /a/ productions were repeated by each subject on a second test day. The SNL differences between repeated test samples (different days) were not statistically significant for any subject group. For the vowel samples produced at a comfortable pitch, a relatively large SNL was associated with samples phonated by the subjects of each sex with the relatively low singing pitch range. Regarding the vowel samples produced at the assigned-pitch levels, it was found that both vowel SNL and perceived vowel roughness decreased as test-pitch level was raised over a range of one octave. The relationship between vocal pitch and either vowel roughness or SNL approached linearity for each of the four subject groups.
Revisiting the Canadian English vowel space
NASA Astrophysics Data System (ADS)
Hagiwara, Robert
2005-04-01
In order to fill a need for experimental-acoustic baseline measurements of Canadian English vowels, a database is currently being constructed in Winnipeg, Manitoba. The database derives from multiple repetitions of fifteen English vowels (eleven standard monophthongs, syllabic /r/ and three standard diphthongs) in /hVd/ and /hVt/ contexts, as spoken by multiple speakers. Frequencies of the first four formants are taken from three timepoints in every vowel token (25, 50, and 75% of vowel duration). Preliminary results (from five men and five women) confirm some features characteristic of Canadian English, but call others into question. For instance the merger of low back vowels appears to be complete for these speakers, but the result is a lower-mid and probably rounded vowel rather than the low back unround vowel often described. With these data Canadian Raising can be quantified as an average 200 Hz or 1.5 Bark downward shift in the frequency of F1 before voiceless /t/. Analysis of the database will lead to a more accurate picture of the Canadian English vowel system, as well as provide a practical and up-to-date point of reference for further phonetic and sociophonetic comparisons.
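Since the raising shift is reported both in Hz and in Bark, a small sketch of one common Hz-to-Bark conversion follows (Traunmüller's approximation; the abstract does not state which formula the author used, so this choice is an assumption):

```python
def hz_to_bark(f_hz: float) -> float:
    """Traunmüller's (1990) approximation of the Bark scale."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Hypothetical F1 values: a 200-Hz downward shift before voiceless /t/
f1_plain, f1_raised = 800.0, 600.0
print(hz_to_bark(f1_plain) - hz_to_bark(f1_raised))  # about 1.5 Bark
```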
Shin, Yu-Jeong; Kim, Yongsoo; Kim, Hyun-Gi
2017-12-01
Nasalance is used to evaluate velopharyngeal incompetence in clinical diagnoses using a nasometer. The aim of this study is to find the nasalance differences between Vietnamese cleft palate children and Korean cleft palate children by measuring the nasalance of five oral vowels. Ten Vietnamese cleft palate children after surgery, three Vietnamese children as a control group, and ten age-matched Korean cleft palate children after surgery participated in this experiment. In place of a Korean control group, the standard values of the Korean version of the simplified nasometric assessment procedures (kSNAP) were used. The results are as follows: (1) the highest nasalance score among the Vietnamese normal vowels is for the low vowel /a/, whereas for Korean normal vowels it is the high vowel /i/. (2) The average nasalance score of Korean cleft palate vowels is 18% higher than that of Vietnamese cleft palate vowels. Nasalance scores of over 45% were found for the vowels /e/ and /i/ in Vietnamese cleft palate patients and for /i/, /o/, and /u/ in Korean cleft palate patients. These different nasalance scores for the same vowels appear to reflect an ethnic difference between Vietnamese and Korean cleft palate children.
Schiff, Rachel
2012-12-01
The present study explored the speed, accuracy, and reading comprehension of vowelized versus unvowelized scripts among 126 native Hebrew speaking children in second, fourth, and sixth grades. Findings indicated that second graders read and comprehended vowelized scripts significantly more accurately and more quickly than unvowelized scripts, whereas among fourth and sixth graders reading of unvowelized scripts developed to a greater degree than the reading of vowelized scripts. An analysis of the mediation effect for children's mastery of vowelized reading speed and accuracy on their mastery of unvowelized reading speed and comprehension revealed that in second grade, reading accuracy of vowelized words mediated the reading speed and comprehension of unvowelized scripts. In the fourth grade, accuracy in reading both vowelized and unvowelized words mediated the reading speed and comprehension of unvowelized scripts. By sixth grade, accuracy in reading vowelized words offered no mediating effect, either on reading speed or comprehension of unvowelized scripts. The current outcomes thus suggest that young Hebrew readers undergo a scaffolding process, where vowelization serves as the foundation for building initial reading abilities and is essential for successful and meaningful decoding of unvowelized scripts.
Development of detection and recognition of orientation of geometric and real figures.
Stein, N L; Mandler, J M
1975-06-01
Black and white kindergarten and second-grade children were tested for accuracy of detection and recognition of orientation and location changes in pictures of real-world and geometric figures. No differences were found in accuracy of recognition between the 2 kinds of pictures, but patterns of verbalization differed on specific transformations. Although differences in accuracy were found between kindergarten and second grade on an initial recognition task, practice on a matching-to-sample task eliminated differences on a second recognition task. Few ethnic differences were found on accuracy of recognition, but significant differences were found in amount of verbal output on specific transformations. For both groups, mention of orientation changes was markedly reduced when location changes were present.
Secondary iris recognition method based on local energy-orientation feature
NASA Astrophysics Data System (ADS)
Huo, Guang; Liu, Yuanning; Zhu, Xiaodong; Dong, Hongxing
2015-01-01
This paper proposes a secondary iris recognition method based on local features. First, an energy-orientation feature (EOF) is extracted from the iris with a two-dimensional Gabor filter and used in a first recognition pass based on a similarity threshold, which divides the iris database into two classes: a correctly recognized class and a class still to be recognized. Samples in the first class are accepted; samples in the second are transformed by histogram into an energy-orientation histogram feature (EOHF) and passed to a second recognition pass using the chi-square distance. Experiments show that the proposed method achieves a higher correct recognition rate than comparable iris recognition algorithms.
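A minimal sketch of the chi-square distance used in the second recognition pass, in one common histogram form (the paper's exact binning and normalization are not given, so those details are assumptions):

```python
import numpy as np

def chi_square_distance(h1: np.ndarray, h2: np.ndarray, eps: float = 1e-12) -> float:
    """Chi-square distance between two orientation histograms."""
    h1 = h1 / (h1.sum() + eps)   # normalize to unit mass
    h2 = h2 / (h2.sum() + eps)
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

# Toy example: compare a probe EOHF against an enrolled template
probe = np.array([12.0, 30.0, 18.0, 40.0])
template = np.array([10.0, 28.0, 22.0, 40.0])
print(chi_square_distance(probe, template))  # smaller = more similar
```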
Articulatory characteristics of Hungarian ‘transparent’ vowels
Benus, Stefan; Gafos, Adamantios I.
2007-01-01
Using a combination of magnetometry and ultrasound, we examined the articulatory characteristics of the so-called ‘transparent’ vowels [iː], [i], and [eː] in Hungarian vowel harmony. Phonologically, transparent vowels are front, but they can be followed by either front or back suffixes. However, a finer look reveals an underlying phonetic coherence in two respects. First, transparent vowels in back harmony contexts show a less advanced (more retracted) tongue body posture than phonemically identical vowels in front harmony contexts: e.g. [i] in buli-val is less advanced than [i] in bili-vel. Second, transparent vowels in monosyllabic stems selecting back suffixes are also less advanced than phonemically identical vowels in stems selecting front suffixes: e.g. [iː] in ír, taking back suffixes, compared to [iː] of hír, taking front suffixes, is less advanced when these stems are produced in bare form (no suffixes). We thus argue that the phonetic degree of tongue body horizontal position correlates with the phonological alternation in suffixes. A hypothesis that emerges from this work is that a plausible phonetic basis for transparency can be found in quantal characteristics of the relation between articulation and acoustics of transparent vowels. More broadly, the proposal is that the phonology of transparent vowels is better understood when their phonological patterning is studied together with their articulatory and acoustic characteristics. PMID:18389086
Harnsberger, James D.; Svirsky, Mario A.; Kaiser, Adam R.; Pisoni, David B.; Wright, Richard; Meyer, Ted A.
2012-01-01
Cochlear implant (CI) users differ in their ability to perceive and recognize speech sounds. Two possible reasons for such individual differences may lie in their ability to discriminate formant frequencies or to adapt to the spectrally shifted information presented by cochlear implants, a basalward shift related to the implant’s depth of insertion in the cochlea. In the present study, we examined these two alternatives using a method-of-adjustment (MOA) procedure with 330 synthetic vowel stimuli varying in F1 and F2 that were arranged in a two-dimensional grid. Subjects were asked to label the synthetic stimuli that matched ten monophthongal vowels in visually presented words. Subjects then provided goodness ratings for the stimuli they had chosen. The subjects’ responses to all ten vowels were used to construct individual perceptual “vowel spaces.” If CI users fail to adapt completely to the basalward spectral shift, then the formant frequencies of their vowel categories should be shifted lower in both F1 and F2. However, with one exception, no systematic shifts were observed in the vowel spaces of CI users. Instead, the vowel spaces differed from one another in the relative size of their vowel categories. The results suggest that differences in formant frequency discrimination may account for the individual differences in vowel perception observed in cochlear implant users. PMID:11386565
Cross-linguistic studies of children’s and adults’ vowel spaces
Chung, Hyunju; Kong, Eun Jong; Edwards, Jan; Weismer, Gary; Fourakis, Marios; Hwang, Youngdeok
2012-01-01
This study examines cross-linguistic variation in the location of shared vowels in the vowel space across five languages (Cantonese, American English, Greek, Japanese, and Korean) and three age groups (2-year-olds, 5-year-olds, and adults). The vowels /a/, /i/, and /u/ were elicited in familiar words using a word repetition task. The productions of target words were recorded and transcribed by native speakers of each language. For correctly produced vowels, first and second formant frequencies were measured. In order to remove the effect of vocal tract size on these measurements, a normalization approach that calculates distance and angular displacement from the speaker centroid was adopted. Language-specific differences in the location of shared vowels in the formant values as well as the shape of the vowel spaces were observed for both adults and children. PMID:22280606
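A minimal sketch of the normalization described here, re-expressing each vowel token as a distance and an angular displacement from the speaker's own centroid in F1/F2 space; whether the study applies any prior frequency-scale transform is not stated, so none is assumed:

```python
import numpy as np

def centroid_normalize(f1: np.ndarray, f2: np.ndarray):
    """Distance and angle of each (F1, F2) token from the speaker centroid.

    Removes overall vocal-tract-size differences so children's and adults'
    vowel spaces can be compared on a common footing.
    """
    c1, c2 = f1.mean(), f2.mean()        # speaker centroid
    d1, d2 = f1 - c1, f2 - c2
    distance = np.hypot(d1, d2)          # Euclidean distance from centroid
    angle = np.arctan2(d2, d1)           # angular displacement (radians)
    return distance, angle

# Toy tokens of /a/, /i/, /u/ from one speaker (Hz)
f1 = np.array([850.0, 310.0, 330.0])
f2 = np.array([1350.0, 2400.0, 880.0])
print(centroid_normalize(f1, f2))
```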
McArdle, Rachel; Wilson, Richard H
2008-06-01
To analyze the 50% correct recognition data from the Wilson et al (this issue) study, obtained from 24 listeners with normal hearing, and to examine whether acoustic, phonetic, or lexical variables can predict recognition performance for monosyllabic words presented in speech-spectrum noise. The specific variables are as follows: (a) acoustic variables (i.e., effective root-mean-square sound pressure level, duration), (b) phonetic variables (i.e., consonant features such as manner, place, and voicing for initial and final phonemes; vowel phonemes), and (c) lexical variables (i.e., word frequency, word familiarity, neighborhood density, neighborhood frequency). This descriptive, correlational study examined the influence of acoustic, phonetic, and lexical variables on speech-recognition-in-noise performance. Regression analysis demonstrated that 45% of the variance in the 50% point was accounted for by acoustic and phonetic variables, whereas only 3% of the variance was accounted for by lexical variables. These findings suggest that monosyllabic word recognition in noise is more dependent on bottom-up processing than on top-down processing. The results suggest that when speech-in-noise testing is used in a pre- and post-hearing-aid-fitting format, the use of monosyllabic words may be sensitive to changes in audibility resulting from amplification.
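The variance-partitioning logic can be illustrated with a small synthetic sketch: fit one regression on acoustic-plus-phonetic predictors and another on lexical predictors, then compare R² values. Everything below (data, predictor names, effect sizes) is fabricated purely to show the computation, not the study's data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 120  # hypothetical number of monosyllabic words

acoustic = rng.normal(size=(n, 2))   # stand-ins for RMS level, duration
phonetic = rng.normal(size=(n, 3))   # stand-ins for manner/place/voicing codes
lexical = rng.normal(size=(n, 2))    # stand-ins for frequency, density

# Synthetic 50% points driven mostly by acoustic/phonetic structure
snr50 = 2.0 * acoustic[:, 0] + 1.5 * phonetic[:, 1] + rng.normal(size=n)

def r_squared(X, y):
    """Proportion of variance accounted for by an OLS fit."""
    return sm.OLS(y, sm.add_constant(X)).fit().rsquared

print(r_squared(np.hstack([acoustic, phonetic]), snr50))  # large share
print(r_squared(lexical, snr50))                          # near zero
```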
Acoustic correlates of sexual orientation and gender-role self-concept in women's speech.
Kachel, Sven; Simpson, Adrian P; Steffens, Melanie C
2017-06-01
Compared to studies of male speakers, relatively few studies have investigated acoustic correlates of sexual orientation in women. The present investigation focuses on shedding more light on intra-group variability in lesbians and straight women by using a fine-grained analysis of sexual orientation and collecting data on psychological characteristics (e.g., gender-role self-concept). For a large-scale women's sample (overall n = 108), recordings of spontaneous and read speech were analyzed for median fundamental frequency and acoustic vowel space features. Two studies showed no acoustic differences between lesbians and straight women, but there was evidence of acoustic differences within sexual orientation groups. Intra-group variability in median f0 was found to depend on the exclusivity of sexual orientation; F1 and F2 in /iː/ (study 1) and median f0 (study 2) were acoustic correlates of gender-role self-concept, at least for lesbians. Other psychological characteristics (e.g., sexual orientation of female friends) were also reflected in lesbians' speech. Findings suggest that acoustic features indexicalizing sexual orientation can only be successfully interpreted in combination with a fine-grained analysis of psychological characteristics.
Mandarin compound vowels produced by prelingually deafened children with cochlear implants.
Yang, Jing; Xu, Li
2017-06-01
Compound vowels including diphthongs and triphthongs have complex, dynamic spectral features. The production of compound vowels by children with cochlear implants (CIs) has not been studied previously. The present study examined the dynamic features of compound vowels in native Mandarin-speaking children with CIs. Fourteen prelingually deafened children with CIs (aged 2.9-8.3 years) and 14 age-matched, normal-hearing (NH) children produced monosyllables containing six Mandarin compound vowels (i.e., /aɪ/, /aʊ/, /uo/, /iɛ/, /iaʊ/, /ioʊ/). The frequency values of the first two formants were measured at nine equidistant time points over the course of the vowel duration. All formant frequency values were normalized and then used to calculate vowel trajectory length and overall spectral rate of change. The results revealed that the CI children produced significantly longer durations for all six compound vowels. The CI children's ability to produce formant movement for the compound vowels varied considerably. Some CI children produced relatively static formant trajectories for certain diphthongs, whereas others produced certain vowels with greater formant movement than did the NH children. As a group, the CI children roughly followed the NH children in the pattern of magnitude of formant movement, but they showed a slower rate of formant change than did the NH children. The findings suggested that prelingually deafened children with CIs, during the early stage of speech acquisition, had not established appropriate targets and articulatory coordination for compound vowel productions. This preliminary study may shed light on rehabilitation of prelingually deafened children with CIs.
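A minimal sketch of the two dynamic measures named here, trajectory length and overall spectral rate of change, computed from formant samples at the nine time points. Summing Euclidean steps between successive (F1, F2) points is a standard formulation, but whether it matches the study's exact computation is an assumption.

```python
import numpy as np

def trajectory_metrics(f1, f2, duration_s):
    """Vowel trajectory length (VTL) and overall spectral rate of change.

    f1, f2: normalized formant values at equally spaced time points
    (nine in the study); duration_s: vowel duration in seconds.
    """
    steps = np.hypot(np.diff(f1), np.diff(f2))  # movement between samples
    vtl = steps.sum()                           # total trajectory length
    return vtl, vtl / duration_s                # length and rate of change

# Toy trajectory for a diphthong-like vowel (already normalized units)
f1 = np.linspace(0.8, 0.3, 9)
f2 = np.linspace(0.4, 0.9, 9)
print(trajectory_metrics(f1, f2, duration_s=0.25))
```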
Donaldson, Gail S.; Dawson, Patricia K.; Borden, Lamar Z.
2010-01-01
Objectives Previous studies have confirmed that current steering can increase the number of discriminable pitches available to many CI users; however, the ability to perceive additional pitches has not been linked to improved speech perception. The primary goals of this study were to determine (1) whether adult CI users can achieve higher levels of spectral-cue transmission with a speech processing strategy that implements current steering (Fidelity120) than with a predecessor strategy (HiRes) and, if so, (2) whether the magnitude of improvement can be predicted from individual differences in place-pitch sensitivity. A secondary goal was to determine whether Fidelity120 supports higher levels of speech recognition in noise than HiRes. Design A within-subjects repeated measures design evaluated speech perception performance with Fidelity120 relative to HiRes in 10 adult CI users. Subjects used the novel strategy (either HiRes or Fidelity120) for 8 weeks during the main study; a subset of five subjects used Fidelity120 for 3 additional months following the main study. Speech perception was assessed for the spectral cues related to vowel F1 frequency (Vow F1), vowel F2 frequency (Vow F2) and consonant place of articulation (Con PLC); overall transmitted information for vowels (Vow STIM) and consonants (Con STIM); and sentence recognition in noise. Place-pitch sensitivity was measured for electrode pairs in the apical, middle and basal regions of the implanted array using a psychophysical pitch-ranking task. Results With one exception, there was no effect of strategy (HiRes vs. Fidelity120) on the speech measures tested, either during the main study (n=10) or after extended use of Fidelity120 (n=5). The exception was a small but significant advantage for HiRes over Fidelity120 for the Con STIM measure during the main study. Examination of individual subjects' data revealed that 3 of 10 subjects demonstrated improved perception of one or more spectral cues with Fidelity120 relative to HiRes after 8 weeks or longer experience with Fidelity120. Another 3 subjects exhibited initial decrements in spectral cue perception with Fidelity120 at the 8 week time point; however, evidence from one subject suggested that such decrements may resolve with additional experience. Place-pitch thresholds were inversely related to improvements in Vow F2 perception with Fidelity120 relative to HiRes. However, no relationship was observed between place-pitch thresholds and the other spectral measures (Vow F1 or Con PLC). Conclusions Findings suggest that Fidelity120 supports small improvements in the perception of spectral speech cues in some Advanced Bionics CI users; however, many users show no clear benefit. Benefits are more likely to occur for vowel spectral cues (related to F1 and F2 frequency) than for consonant spectral cues (related to place of articulation). There was an inconsistent relationship between place-pitch sensitivity and improvements in spectral cue perception with Fidelity120 relative to HiRes. This may partly reflect the small number of sites at which place-pitch thresholds were measured. Contrary to some previous reports, there was no clear evidence that Fidelity120 supports improved sentence recognition in noise. PMID:21084987
Vowel Acoustics in Dysarthria: Mapping to Perception
ERIC Educational Resources Information Center
Lansford, Kaitlin L.; Liss, Julie M.
2014-01-01
Purpose: The aim of the present report was to explore whether vowel metrics, demonstrated to distinguish dysarthric and healthy speech in a companion article (Lansford & Liss, 2014), are able to predict human perceptual performance. Method: Vowel metrics derived from vowels embedded in phrases produced by 45 speakers with dysarthria were…
ERIC Educational Resources Information Center
Zee, Eric
A phonetic study of vowel devoicing in the Shanghai dialect of Chinese explored the phonetic conditions under which the high, closed vowels and the apical vowel in Shanghai are most likely to become devoiced. The phonetic conditions may be segmental or suprasegmental. Segmentally, the study sought to determine whether a certain type of pre-vocalic…
Palatalization and Intrinsic Prosodic Vowel Features in Russian
ERIC Educational Resources Information Center
Ordin, Mikhail
2011-01-01
The presented study is aimed at investigating the interaction of palatalization and intrinsic prosodic features of the vowel in CVC (consonant+vowel+consonant) syllables in Russian. The universal nature of intrinsic prosodic vowel features was confirmed with the data from the Russian language. It was found that palatalization of the consonants…
Structural Generalizations over Consonants and Vowels in 11-Month-Old Infants
ERIC Educational Resources Information Center
Pons, Ferran; Toro, Juan M.
2010-01-01
Recent research has suggested consonants and vowels serve different roles during language processing. While statistical computations are preferentially made over consonants but not over vowels, simple structural generalizations are easily made over vowels but not over consonants. Nevertheless, the origins of this asymmetry are unknown. Here we…
A Vowel Is a Vowel: Generalizing Newly Learned Phonotactic Constraints to New Contexts
ERIC Educational Resources Information Center
Chambers, Kyle E.; Onishi, Kristine H.; Fisher, Cynthia
2010-01-01
Adults can learn novel phonotactic constraints from brief listening experience. We investigated the representations underlying phonotactic learning by testing generalization to syllables containing new vowels. Adults heard consonant-vowel-consonant study syllables in which particular consonants were artificially restricted to the onset or coda…
Vowel Intelligibility in Children with and without Dysarthria: An Exploratory Study
ERIC Educational Resources Information Center
Levy, Erika S.; Leone, Dorothy; Moya-Gale, Gemma; Hsu, Sih-Chiao; Chen, Wenli; Ramig, Lorraine O.
2016-01-01
Children with dysarthria due to cerebral palsy (CP) present with decreased vowel space area and reduced word intelligibility. Although a robust relationship exists between vowel space and word intelligibility, little is known about the intelligibility of vowels in this population. This exploratory study investigated the intelligibility of American…
Relationship between consonant recognition in noise and hearing threshold.
Yoon, Yang-soo; Allen, Jont B; Gooler, David M
2012-04-01
Although poorer understanding of speech in noise by listeners who are hearing-impaired (HI) is known not to be directly related to audiometric hearing threshold, HT(f), grouping HI listeners by HT(f) is widely practiced. In this article, the relationship between consonant recognition and HT(f) is considered over a range of signal-to-noise ratios (SNRs). Confusion matrices (CMs) from 25 HI ears were generated in response to 16 consonant-vowel syllables presented at 6 different SNRs. Individual differences scaling (INDSCAL) was applied to both feature-based matrices and CMs in order to evaluate the relationship between HT(f) and consonant recognition among HI listeners. The results showed no predictive relationship between the percent error scores (Pe) and HT(f) across SNRs. The multiple regression models showed that HT(f) accounted for 39% of the total variance in the slopes of the Pe. Feature-based INDSCAL analysis showed consistent grouping of listeners across SNRs, but not in terms of HT(f). Nor did CM-based INDSCAL analysis reveal a systematic relationship between the measures across SNRs. HT(f) did not account for the majority of the variance (39%) in consonant recognition in noise when the complete body of the CM was considered.
Li, Tianhao; Fu, Qian-Jie
2013-01-01
Objectives (1) To investigate whether voice gender discrimination (VGD) could be a useful indicator of the spectral and temporal processing abilities of individual cochlear implant (CI) users; (2) To examine the relationship between VGD and speech recognition with CI when comparable acoustic cues are used for both perception processes. Design VGD was measured using two talker sets with different inter-gender fundamental frequencies (F0), as well as different acoustic CI simulations. Vowel and consonant recognition in quiet and noise were also measured and compared with VGD performance. Study sample Eleven postlingually deaf CI users. Results The results showed that (1) mean VGD performance differed for different stimulus sets, (2) VGD and speech recognition performance varied among individual CI users, and (3) individual VGD performance was significantly correlated with speech recognition performance under certain conditions. Conclusions VGD measured with selected stimulus sets might be useful for assessing not only pitch-related perception, but also spectral and temporal processing by individual CI users. In addition to improvements in spectral resolution and modulation detection, the improvement in higher modulation frequency discrimination might be particularly important for CI users in noisy environments. PMID:21696330
Speechant: A Vowel Notation System to Teach English Pronunciation
ERIC Educational Resources Information Center
dos Reis, Jorge; Hazan, Valerie
2012-01-01
This paper introduces a new vowel notation system aimed at aiding the teaching of English pronunciation. This notation system, designed as an enhancement to orthographic text, was designed to use concepts borrowed from the representation of musical notes and is also linked to the acoustic characteristics of vowel sounds. Vowel timbre is…
Talker Differences in Clear and Conversational Speech: Acoustic Characteristics of Vowels
ERIC Educational Resources Information Center
Ferguson, Sarah Hargus; Kewley-Port, Diane
2007-01-01
Purpose: To determine the specific acoustic changes that underlie improved vowel intelligibility in clear speech. Method: Seven acoustic metrics were measured for conversational and clear vowels produced by 12 talkers--6 who previously were found (S. H. Ferguson, 2004) to produce a large clear speech vowel intelligibility effect for listeners with…
Audiovisual Perception of Congruent and Incongruent Dutch Front Vowels
ERIC Educational Resources Information Center
Valkenier, Bea; Duyne, Jurriaan Y.; Andringa, Tjeerd C.; Baskent, Deniz
2012-01-01
Purpose: Auditory perception of vowels in background noise is enhanced when combined with visually perceived speech features. The objective of this study was to investigate whether the influence of visual cues on vowel perception extends to incongruent vowels, in a manner similar to the McGurk effect observed with consonants. Method:…
Stress-Induced Acoustic Variation in L2 and L1 Spanish Vowels.
Romanelli, Sofía; Menegotto, Andrea; Smyth, Ron
2018-05-28
We assessed the effect of lexical stress on the duration and quality of Spanish word-final vowels /a, e, o/ produced by American English late intermediate learners of L2 Spanish, as compared to those of native L1 Argentine Spanish speakers. Participants read 54 real words ending in /a, e, o/, with either final or penultimate lexical stress, embedded in a text and a word list. We measured vowel duration and both F1 and F2 frequencies at 3 temporal points. Stressed vowels were longer than unstressed vowels, in Spanish L1 and L2. L1 and L2 Spanish stressed /a/ and /e/ had higher F1 values than their unstressed counterparts. Only the L2 speakers showed evidence of rising offglides for /e/ and /o/. The L2 and L1 Spanish vowel space was compressed in the absence of stress. Lexical stress affected the vowel quality of L1 and L2 Spanish vowels. We provide an up-to-date account of the formant trajectories of Argentine River Plate Spanish word-final /a, e, o/ and offer experimental support to the claim that stress affects the quality of Spanish vowels in word-final contexts.
Peters, Jörg; Heeringa, Wilbert J; Schoormann, Heike E
2017-08-01
The present study compares the acoustic realization of Saterland Frisian, Low German, and High German vowels by trilingual speakers in the Saterland. The Saterland is a rural municipality in northwestern Germany. It offers the unique opportunity to study trilingualism with languages that differ both by their vowel inventories and by external factors, such as their social status and the autonomy of their speech communities. The objective of the study was to examine whether the trilingual speakers differ in their acoustic realizations of vowel categories shared by the three languages and whether those differences can be interpreted as effects of either the differences in the vowel systems or of external factors. Monophthongs produced in a /hVt/ frame revealed that High German vowels show the most divergent realizations in terms of vowel duration and formant frequencies, whereas Saterland Frisian and Low German vowels show small differences. These findings suggest that vowels of different languages are likely to share the same phonological space when the speech communities largely overlap, as is the case with Saterland Frisian and Low German, but may resist convergence if at least one language is shared with a larger, monolingual speech community, as is the case with High German.
Vowel Space Characteristics of Speech Directed to Children With and Without Hearing Loss
Wieland, Elizabeth A.; Burnham, Evamarie B.; Kondaurova, Maria; Bergeson, Tonya R.
2015-01-01
Purpose This study examined vowel characteristics in adult-directed (AD) and infant-directed (ID) speech to children with hearing impairment who received cochlear implants or hearing aids compared with speech to children with normal hearing. Method Mothers' AD and ID speech to children with cochlear implants (Study 1, n = 20) or hearing aids (Study 2, n = 11) was compared with mothers' speech to controls matched on age and hearing experience. The first and second formants of vowels /i/, /ɑ/, and /u/ were measured, and vowel space area and dispersion were calculated. Results In both studies, vowel space was modified in ID compared with AD speech to children with and without hearing loss. Study 1 showed larger vowel space area and dispersion in ID compared with AD speech regardless of infant hearing status. The pattern of effects of ID and AD speech on vowel space characteristics in Study 2 was similar to that in Study 1, but depended partly on children's hearing status. Conclusion Given previously demonstrated associations between expanded vowel space in ID compared with AD speech and enhanced speech perception skills, this research supports a focus on vowel pronunciation in developing intervention strategies for improving speech-language skills in children with hearing impairment. PMID:25658071
Effects of Levodopa on Vowel Articulation in Patients with Parkinson's Disease.
Okada, Yukihiro; Murata, Miho; Toda, Tatsushi
2016-04-27
The effects of levodopa on articulatory dysfunction in patients with Parkinson's disease remain inconclusive. This study aimed to investigate the effects of levodopa on isolated vowel articulation and motor performance in patients with moderate to severe Parkinson's disease, excluding speech fluctuations caused by dyskinesia. Twenty-one patients (14 males and 7 females) and 21 age- and sex-matched healthy subjects were enrolled. Together with motor assessment, the patients phonated five Japanese isolated vowels (/a/, /i/, /u/, /e/, and /o/) 20 times before and 1 h after levodopa treatment. We performed frequency analysis of each vowel and measured the first and second formants. From these formants we constructed the pentagonal vowel space area, which serves as a good indicator of articulatory dysfunction for vowels. In control subjects, only speech samples were analyzed. To investigate the sequential relationship between plasma levodopa concentrations, motor performance, and acoustic measurements after treatment, entire drug cycle tests were performed in 4 patients. The pentagonal vowel space area was significantly expanded, together with motor amelioration, after levodopa treatment, although the expansion did not reach the vowel space area of control subjects. Drug cycle tests revealed that sequential increases or decreases in plasma levodopa levels after treatment correlated well with expansion or contraction of the vowel space area and improvement or deterioration of motor manifestations. Levodopa expanded the vowel space area and ameliorated motor performance, suggesting that dysfunctions in vowel articulation and motor performance in patients with Parkinson's disease are based on dopaminergic pathology.
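The pentagonal vowel space area is the area of the polygon whose corners are the five vowels' (F1, F2) points; a minimal sketch using the shoelace formula, with hypothetical formant values (illustrating the computation, not the study's numbers):

```python
import numpy as np

def vowel_space_area(f1, f2) -> float:
    """Polygon area (shoelace formula) in Hz^2.

    f1, f2: formant coordinates of the corner vowels listed in order
    around the polygon.
    """
    x, y = np.asarray(f2, float), np.asarray(f1, float)
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Hypothetical values (Hz), ordered /i, e, a, o, u/ around the pentagon
f1 = [300.0, 450.0, 850.0, 480.0, 320.0]
f2 = [2300.0, 2000.0, 1400.0, 950.0, 800.0]
print(vowel_space_area(f1, f2))
```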
Scharinger, Mathias; Monahan, Philip J; Idsardi, William J
2016-03-01
While previous research has established that language-specific knowledge influences early auditory processing, it is still controversial as to what aspects of speech sound representations determine early speech perception. Here, we propose that early processing primarily depends on information propagated top-down from abstractly represented speech sound categories. In particular, we assume that mid-vowels (as in 'bet') exert weaker top-down effects than high-vowels (as in 'bit') because of their less specific (default) tongue height position as compared to either high- or low-vowels (as in 'bat'). We tested this assumption in a magnetoencephalography (MEG) study where we contrasted mid- and high-vowels, as well as low- and high-vowels, in a passive oddball paradigm. Overall, significant differences between deviants and standards indexed reliable mismatch negativity (MMN) responses between 200 and 300 ms post-stimulus onset. MMN amplitudes differed in the mid/high-vowel contrasts and were significantly reduced when a mid-vowel standard was followed by a high-vowel deviant, extending previous findings. Furthermore, mid-vowel standards showed reduced oscillatory power in the pre-stimulus beta-frequency band (18-26 Hz), compared to high-vowel standards. We take this as converging evidence for linguistic category structure exerting top-down influences on auditory processing. The findings are interpreted within the linguistic model of underspecification and the neuropsychological predictive coding framework.
Deep learning with coherent nanophotonic circuits
NASA Astrophysics Data System (ADS)
Shen, Yichen; Harris, Nicholas C.; Skirlo, Scott; Prabhu, Mihika; Baehr-Jones, Tom; Hochberg, Michael; Sun, Xin; Zhao, Shijie; Larochelle, Hugo; Englund, Dirk; Soljačić, Marin
2017-07-01
Artificial neural networks are computational network models inspired by signal processing in the brain. These models have dramatically improved performance for many machine-learning tasks, including speech and image recognition. However, today's computing hardware is inefficient at implementing neural networks, in large part because much of it was designed for von Neumann computing schemes. Significant effort has been made towards developing electronic architectures tuned to implement artificial neural networks that exhibit improved computational speed and accuracy. Here, we propose a new architecture for a fully optical neural network that, in principle, could offer an enhancement in computational speed and power efficiency over state-of-the-art electronics for conventional inference tasks. We experimentally demonstrate the essential part of the concept using a programmable nanophotonic processor featuring a cascaded array of 56 programmable Mach-Zehnder interferometers in a silicon photonic integrated circuit and show its utility for vowel recognition.
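Each programmable Mach-Zehnder interferometer in the cascade realizes a 2x2 unitary set by two phase shifters, and meshes of such 2x2 blocks can compose the larger unitary matrices that a neural-network layer needs (the Reck/Clements decompositions). A minimal numerical sketch under one common phase convention, which may differ from the paper's:

```python
import numpy as np

BS = np.array([[1, 1j], [1j, 1]]) / np.sqrt(2)  # ideal 50:50 beamsplitter

def phase(t: float) -> np.ndarray:
    """Phase shift applied to the upper waveguide."""
    return np.diag([np.exp(1j * t), 1.0])

def mzi(theta: float, phi: float) -> np.ndarray:
    """2x2 transfer matrix of one MZI: phi sets the input phase,
    theta (between the two beamsplitters) sets the splitting ratio."""
    return BS @ phase(theta) @ BS @ phase(phi)

U = mzi(0.7, 1.3)
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True: unitary (lossless)
```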
Morton, L L
1994-08-01
Identifying disabilities in word-attack, word-recognition, or reading comprehension allowed for four categories of reading disability: (1) reading comprehension only (RC), (2) word-attack plus comprehension (WA+RC), (3) word-attack, word-recognition, and comprehension (WA+WR+RC), and (4) word-attack but not comprehension (WA-RC). Along with age-matched controls (AMC) and developmental-delay controls (DDC), the disabled were tested on a directed-attention dichotic task using consonant-vowel combinations. Laterality results for each place of articulation (i.e., bilabial, alveolar, and velar) selectively attested to greater left hemisphere involvement or engagement for the RC group and greater right hemisphere involvement or engagement for the WA+RC group. Performance of the other two disabled groups was consistent with less efficient right hemisphere involvement or callosal transfer. Implications for theory, research, and remediation are discussed.
Vowel Space Characteristics and Vowel Identification Accuracy
ERIC Educational Resources Information Center
Neel, Amy T.
2008-01-01
Purpose: To examine the relation between vowel production characteristics and intelligibility. Method: Acoustic characteristics of 10 vowels produced by 45 men and 48 women from the J. M. Hillenbrand, L. A. Getty, M. J. Clark, and K. Wheeler (1995) study were examined and compared with identification accuracy. Global (mean f0, F1, and F2;…
Lip Movements for an Unfamiliar Vowel: Mandarin Front Rounded Vowel Produced by Japanese Speakers
ERIC Educational Resources Information Center
Saito, Haruka
2016-01-01
Purpose: The study was aimed at investigating what kind of lip positions are selected by Japanese adult participants for an unfamiliar Mandarin rounded vowel /y/ and if their lip positions are related to and/or differentiated from those for their native vowels. Method: Videotaping and post hoc tracking measurements for lip positions, namely…
The Role of Consonant/Vowel Organization in Perceptual Discrimination
ERIC Educational Resources Information Center
Chetail, Fabienne; Drabs, Virginie; Content, Alain
2014-01-01
According to a recent hypothesis, the CV pattern (i.e., the arrangement of consonant and vowel letters) constrains the mental representation of letter strings, with each vowel or vowel cluster being the core of a unit. Six experiments with the same/different task were conducted to test whether this structure is extracted prelexically. In the…
A Componential Approach to Training Reading Skills.
1983-03-17
Vowel Harmony Is a Basic Phonetic Rule of the Turkic Languages
ERIC Educational Resources Information Center
Shoibekova, Gaziza B.; Odanova, Sagira A.; Sultanova, Bibigul M.; Yermekova, Tynyshtyk N.
2016-01-01
The present study comprehensively analyzes vowel harmony as an important phonetic rule in Turkic languages. Recent changes in the vowel harmony potential of Turkic sounds caused by linguistic and extra-linguistic factors were described. Vowels in the Kazakh, Turkish, and Uzbek language were compared. The way this or that phoneme sounded in the…
Vowel Formant Values in Hearing and Hearing-Impaired Children: A Discriminant Analysis
ERIC Educational Resources Information Center
Ozbic, Martina; Kogovsek, Damjana
2010-01-01
Hearing-impaired speakers show changes in vowel production and formant pitch and variability, as well as more cases of overlapping between vowels and more restricted formant space, than hearing speakers; consequently their speech is less intelligible. The purposes of this paper were to determine the differences in vowel formant values between 32…
Biomechanically Preferred Consonant-Vowel Combinations Fail to Appear in Adult Spoken Corpora
ERIC Educational Resources Information Center
Whalen, D. H.; Giulivi, Sara; Nam, Hosung; Levitt, Andrea G.; Halle, Pierre; Goldstein, Louis M.
2012-01-01
Certain consonant/vowel (CV) combinations are more frequent than would be expected from the individual C and V frequencies alone, both in babbling and, to a lesser extent, in adult language, based on dictionary counts: Labial consonants co-occur with central vowels more often than chance would dictate; coronals co-occur with front vowels, and…
Study of acoustic correlates associate with emotional speech
NASA Astrophysics Data System (ADS)
Yildirim, Serdar; Lee, Sungbok; Lee, Chul Min; Bulut, Murtaza; Busso, Carlos; Kazemzadeh, Ebrahim; Narayanan, Shrikanth
2004-10-01
This study investigates the acoustic characteristics of four different emotions expressed in speech. The aim is to obtain detailed acoustic knowledge on how a speech signal is modulated by changes from neutral to a certain emotional state. Such knowledge is necessary for automatic emotion recognition and classification and for emotional speech synthesis. Speech data obtained from two semi-professional actresses are analyzed and compared. Each subject produces 211 sentences with four different emotions: neutral, sad, angry, and happy. We analyze changes in temporal and acoustic parameters such as magnitude and variability of segmental duration, fundamental frequency, and the first three formant frequencies as a function of emotion. Acoustic differences among the emotions are also explored with mutual information computation, multidimensional scaling, and acoustic likelihood comparison with normal speech. Results indicate that speech associated with anger and happiness is characterized by longer duration, shorter interword silence, and higher pitch and rms energy with wider ranges. Sadness is distinguished from other emotions by lower rms energy and longer interword silence. Interestingly, the difference in formant pattern between [happiness/anger] and [neutral/sadness] is better reflected in back vowels such as /a/ (as in 'father') than in front vowels. Detailed results on intra- and interspeaker variability will be reported.
Morton, L L; Siegel, L S
1991-02-01
Twenty reading comprehension-disabled (CD) and 20 reading comprehension and word recognition-disabled (CWRD), right-handed male children were matched with 20 normal-achieving age-matched controls and 20 normal-achieving reading level-matched controls and tested for left ear report on dichotic listening tasks using digits and consonant-vowel combinations (CVs). Left ear report for CVs and digits did not correlate for any of the groups. Both reading-disabled groups showed lower left ear report on digits. On CVs the CD group showed a high left ear report but only when there were no priming precursors, such as directions to attend right first and to process digits first. Priming effects interfered with the processing of both digits and CVs. Theoretically, the CWRD group seems to be characterized by a depressed right hemisphere, whereas the CD group may have a more labile right hemisphere, perhaps tending to overengagement for CV tasks but vulnerable to situational precursors in the form of priming effects. Implications extend to (1) subtyping practices in research with the learning-disabled, (2) inferences drawn from studies using different dichotic stimuli, and (3) the neuropsychology of reading disorders.
Voice recognition through phonetic features with Punjabi utterances
NASA Astrophysics Data System (ADS)
Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.
2017-07-01
This paper deals with perception and disorders of speech in the context of the Punjabi language. Given the importance of voice identification, various parameters of speaker identification have been studied. The speech material was recorded with a tape recorder in normal and disguised modes of utterance. From the recorded speech materials, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of normal and disguised speech of seven subjects is reported. Fundamental frequency (F0) at similar locations, plosive duration for certain phonemes, and amplitude ratio (A1:A2) were compared in normal and disguised speech. It was found that the formant frequencies of normal and disguised speech remain almost identical only if they are compared at positions of the same vowel quality and quantity. If the vowel is more close or more open in the disguised utterance, the formant frequencies change relative to the normal utterance. The amplitude ratio (A1:A2) was found to be speaker-dependent and remained unchanged in disguised utterances, although this value may shift if cross-sectioning is not done at the same location.
The Effect of Lexical Content on Dichotic Speech Recognition in Older Adults.
Findlen, Ursula M; Roup, Christina M
2016-01-01
Age-related auditory processing deficits have been shown to negatively affect speech recognition for older adult listeners. In contrast, older adults gain benefit from their ability to make use of semantic and lexical content of the speech signal (i.e., top-down processing), particularly in complex listening situations. Assessment of auditory processing abilities among aging adults should take into consideration semantic and lexical content of the speech signal. The purpose of this study was to examine the effects of lexical and attentional factors on dichotic speech recognition performance characteristics for older adult listeners. A repeated measures design was used to examine differences in dichotic word recognition as a function of lexical and attentional factors. Thirty-five older adults (61-85 yr) with sensorineural hearing loss participated in this study. Dichotic speech recognition was evaluated using consonant-vowel-consonant (CVC) word and nonsense CVC syllable stimuli administered in the free recall, directed recall right, and directed recall left response conditions. Dichotic speech recognition performance for nonsense CVC syllables was significantly poorer than performance for CVC words. Dichotic recognition performance varied across response condition for both stimulus types, which is consistent with previous studies on dichotic speech recognition. Inspection of individual results revealed that five listeners demonstrated an auditory-based left ear deficit for one or both stimulus types. Lexical content of stimulus materials affects performance characteristics for dichotic speech recognition tasks in the older adult population. The use of nonsense CVC syllable material may provide a way to assess dichotic speech recognition performance while potentially lessening the effects of lexical content on performance (i.e., measuring bottom-up auditory function both with and without top-down processing).
Frustration in the pattern formation of polysyllabic words
NASA Astrophysics Data System (ADS)
Hayata, Kazuya
2016-12-01
A novel frustrated system is given for the analysis of (m + 1)-syllabled vocal sounds for languages with an m-vowel system, where the number of vowel qualities is assumed to be m (m > 2). The necessary and sufficient condition for observing the sound frustration is that the configuration of m vowels in an m-syllabled word has a preference for the ‘repulsive’ type, in which no vowel is duplicated. For languages that meet this requirement, no (m + 1)-syllabled word can in principle select this type, because at most m different vowels are available and the duplicated use of an identical vowel is therefore inevitable. For languages showing a preference for the ‘attractive’ type, where an identical vowel aggregates within a word, no such conflict arises. In this paper, we first elucidate for Arabic, with m = 3, how the conflicting situation is resolved, using a statistical approach based on chi-square testing. In addition to the conventional three-vowel system, analyses are also made for Russian, where a polysyllabic word contains both a stressed and an indeterminate vowel. The statistical analyses show that the selection scheme for quadrisyllabic configurations depends strongly on the part of speech as well as the gender of nouns. To emphasize the relevance to the sound model of binary oppositions, analyzed results for Greek verbs are also given.
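The counting argument above lends itself to a compact illustration. The sketch below (hypothetical corpus counts, uniform null) classifies trisyllabic vowel configurations of a three-vowel system as repulsive (all vowels distinct) or attractive (some vowel repeated) and applies the chi-square test the abstract mentions; the counts and the uniform null are illustrative assumptions, not the paper's data.

```python
from itertools import product
from scipy.stats import chisquare

# Classify 3-syllable words of a 3-vowel system as "repulsive" (all vowels
# distinct) or "attractive" (at least one vowel repeated), then test the
# observed split against the proportions expected under random assignment.
vowels = ["a", "i", "u"]
configs = list(product(vowels, repeat=3))             # 27 possible V-patterns
n_repulsive = sum(len(set(c)) == 3 for c in configs)  # 3! = 6 patterns
p_repulsive = n_repulsive / len(configs)              # 6/27 under the null

observed = [412, 588]    # hypothetical corpus counts: [repulsive, attractive]
total = sum(observed)
expected = [total * p_repulsive, total * (1 - p_repulsive)]

chi2, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.4g}")  # small p => non-random preference
```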
Roy, Nelson; Nissen, Shawn L; Dromey, Christopher; Sapir, Shimon
2009-01-01
In a preliminary study, we documented significant changes in formant transitions associated with successful manual circumlaryngeal treatment (MCT) of muscle tension dysphonia (MTD), suggesting improvement in speech articulation. The present study explores further the effects of MTD on vowel articulation by means of additional vowel acoustic measures. Pre- and post-treatment audio recordings of 111 women with MTD were analyzed acoustically using two measures: vowel space area (VSA) and vowel articulation index (VAI), constructed from the first (F1) and second (F2) formants of four point vowels /a, i, ae, u/ extracted from eight words within a standard reading passage. Pairwise t-tests revealed significant increases in both VSA and VAI, confirming that successful treatment of MTD is associated with vowel space expansion. Although MTD is considered a voice disorder, its treatment with MCT appears to positively affect vocal tract dynamics. While the precise mechanism underlying vowel space expansion remains unknown, improvements may be related to lowering of the larynx, expanding the oropharyngeal space, and improving articulatory movements. The reader will be able to: (1) describe possible articulatory changes associated with successful treatment of muscle tension dysphonia; (2) describe two acoustic methods to assess vowel centralization and decentralization; and (3) understand the basis for viewing muscle tension dysphonia as a disorder not solely confined to the larynx.
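For readers unfamiliar with the two measures, a minimal sketch follows. The VSA is computed here with the shoelace formula over corner-vowel F1-F2 coordinates, and the VAI uses one common formulation from the literature based on /i/, /u/, and /a/; the formant values are hypothetical, not the study's data.

```python
import numpy as np

def vowel_space_area(f1, f2):
    """Shoelace area of the polygon traced by corner vowels in F1-F2 space.
    f1, f2: formant values (Hz) ordered around the polygon's perimeter."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    return 0.5 * abs(np.dot(f1, np.roll(f2, -1)) - np.dot(f2, np.roll(f1, -1)))

def vai(f1_i, f2_i, f1_u, f2_u, f1_a, f2_a):
    """Vowel articulation index, one common formulation: larger values
    indicate a more expanded (less centralized) vowel space."""
    return (f2_i + f1_a) / (f1_i + f1_u + f2_u + f2_a)

# Hypothetical formants (Hz) for /i, ae, a, u/ in perimeter order
f1 = [310, 660, 750, 370]
f2 = [2500, 1900, 1200, 950]
print(f"VSA = {vowel_space_area(f1, f2):.0f} Hz^2")
print(f"VAI = {vai(310, 2500, 370, 950, 750, 1200):.3f}")
```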
Cross-language comparisons of contextual variation in the production and perception of vowels
NASA Astrophysics Data System (ADS)
Strange, Winifred
2005-04-01
In the last two decades, a considerable amount of research has investigated second-language (L2) learners' problems with the perception and production of non-native vowels. Most studies have been conducted using stimuli in which the vowels are produced and presented in simple, citation-form (list) monosyllabic or disyllabic utterances. In my laboratory, we have investigated the spectral (static/dynamic formant patterns) and temporal (syllable duration) variation in vowel productions as a function of speech style (list/sentence utterances), speaking rate (normal/rapid), sentence focus (narrow focus/post-focus), and phonetic context (voicing/place of surrounding consonants). Data will be presented for a set of languages that include large and small vowel inventories; stress-, syllable-, and mora-timed prosody; and variation in the phonological/phonetic function of vowel length, diphthongization, and palatalization. Results show language-specific patterns of contextual variation that affect the cross-language acoustic similarity of vowels. Research on cross-language patterns of perceived phonetic similarity by naive listeners suggests that listeners' knowledge of native-language (L1) patterns of contextual variation influences their L1/L2 similarity judgments and, subsequently, their discrimination of L2 contrasts. Implications of these findings for assessing L2 learners' perception of vowels and for developing laboratory training procedures to improve L2 vowel perception will be discussed. [Work supported by NIDCD.]
NASA Astrophysics Data System (ADS)
Gilichinskaya, Yana D.; Hisagi, Miwako; Law, Franzo F.; Berkowitz, Shari; Ito, Kikuyo
2005-04-01
Contextual variability of vowels in three languages with large vowel inventories was examined previously. Here, variability of vowels in two languages with small inventories (Russian, Japanese) was explored. Vowels were produced by three female speakers of each language in four contexts: (Vba) disyllables and in 3-syllable nonsense words (gaC1VC2a) embedded within carrier sentences; contexts included bilabial stops (bVp) in normal rate sentences and alveolar stops (dVt) in both normal and rapid rate sentences. Dependent variables were syllable durations and formant frequencies at syllable midpoint. Results showed very little variation across consonant and rate conditions in formants for /i/ in both languages. Japanese short /u, o, a/ showed fronting (F2 increases) in alveolar context relative to labial context (1.3-2.0 Barks), which was more pronounced in rapid sentences. Fronting of Japanese long vowels was less pronounced (0.3 to 0.9 Barks). Japanese long/short vowel ratios varied with speaking style (syllables versus sentences) and speaking rate. All Russian vowels except /i/ were fronted in alveolar vs labial context (1.1-3.1 Barks) but showed little change in either spectrum or duration with speaking rate. Comparisons of these patterns of variability with American English, French and German vowel results will be discussed.
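The Bark figures above can be reproduced from raw formant values with a standard critical-band conversion. The sketch below uses Traunmüller's (1990) approximation, one common choice (the abstract does not state which conversion was used), with hypothetical F2 values.

```python
def hz_to_bark(f_hz):
    """Traunmueller (1990) critical-band rate approximation (one common choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Hypothetical F2 midpoints (Hz) for a short /u/ in labial vs. alveolar context
f2_labial, f2_alveolar = 1100.0, 1500.0
fronting = hz_to_bark(f2_alveolar) - hz_to_bark(f2_labial)
print(f"F2 fronting = {fronting:.2f} Bark")  # falls in the reported 1.3-2.0 Bark range
```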
The Influence of Working Memory on Reading Comprehension in Vowelized versus Non-Vowelized Arabic
ERIC Educational Resources Information Center
Elsayyad, Hossam; Everatt, John; Mortimore, Tilly; Haynes, Charles
2017-01-01
Unlike English, short vowel sounds in Arabic are represented by diacritics rather than letters. According to the presence and absence of these vowel diacritics, the Arabic script can be considered more or less transparent in comparison with other orthographies. The purpose of this study was to investigate the contribution of working memory to…
ERIC Educational Resources Information Center
Shport, Irina A.
2018-01-01
Purpose: The goal of this study was to test whether fronting and lengthening of lax vowels influence the perception of femininity in listeners whose dialect is characterized as already having relatively fronted and long lax vowels in male and female speech. Method: Sixteen English words containing the /? ? ? ?/ vowels were produced by a male…
Textual Input Enhancement for Vowel Blindness: A Study with Arabic ESL Learners
ERIC Educational Resources Information Center
Alsadoon, Reem; Heift, Trude
2015-01-01
This study explores the impact of textual input enhancement on the noticing and intake of English vowels by Arabic L2 learners of English. Arabic L1 speakers are known to experience "vowel blindness," commonly defined as a difficulty in the textual decoding and encoding of English vowels due to an insufficient decoding of the word form.…
The Role of Vowels in Reading Semitic Scripts: Data from Arabic and Hebrew.
ERIC Educational Resources Information Center
Abu-Rabia, Salim
2001-01-01
Investigates the effect of vowels and context on reading accuracy of skilled adult native Arabic speakers in Arabic and in Hebrew, their second language. Reveals a significant effect for vowels and for context across all reading conditions in Arabic and Hebrew. Finds that the vowelized texts in Arabic and the pointed and unpointed texts in Hebrew…
Early integration of vowel and pitch processing: a mismatch negativity study.
Lidji, Pascale; Jolicoeur, Pierre; Kolinsky, Régine; Moreau, Patricia; Connolly, John F; Peretz, Isabelle
2010-04-01
Several studies have explored the processing specificity of music and speech, but only a few have addressed the processing autonomy of their fundamental components: pitch and phonemes. Here, we examined the additivity of the mismatch negativity (MMN) indexing the early interactions between vowels and pitch when sung. Event-related potentials (ERPs) were recorded while participants heard frequent sung vowels and rare stimuli deviating in pitch only, in vowel only, or in both pitch and vowel. The task was to watch a silent movie while ignoring the sounds. All three types of deviants elicited both an MMN and a P3a ERP component. The observed MMNs were of similar amplitude for the three types of deviants and the P3a was larger for double deviants. The MMNs to deviance in vowel and deviance in pitch were not additive. The underadditivity of the MMN responses suggests that vowel and pitch differences are processed by interacting neural networks. The results indicate that vowel and pitch are processed as integrated units, even at a pre-attentive level. Music-processing specificity thus rests on more complex dimensions of music and speech. 2009 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
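The additivity logic can be made concrete: compute each deviant-minus-standard difference wave, average it over an MMN window, and compare the double-deviant amplitude with the sum of the single-deviant amplitudes. The sketch below runs on synthetic waveforms; the waveform shapes, window, and sampling rate are illustrative assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 500                        # sampling rate (Hz) for the synthetic ERPs
t = np.arange(-0.1, 0.5, 1 / fs)

def synth_diff_wave(amp):
    """Synthetic deviant-minus-standard difference wave with an
    MMN-like negativity peaking near 150 ms (illustration only)."""
    return -amp * np.exp(-((t - 0.15) ** 2) / (2 * 0.03 ** 2)) + rng.normal(0, 0.1, t.size)

mmn_pitch = synth_diff_wave(2.0)
mmn_vowel = synth_diff_wave(2.0)
mmn_double = synth_diff_wave(2.5)   # underadditive: less than 2.0 + 2.0

win = (t >= 0.10) & (t <= 0.20)     # MMN measurement window
amp = lambda w: w[win].mean()
print(f"double: {amp(mmn_double):.2f} uV, "
      f"sum of singles: {amp(mmn_pitch) + amp(mmn_vowel):.2f} uV")
# If |double| is reliably smaller than |pitch + vowel|, the responses are
# underadditive, consistent with interacting neural generators.
```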
Intrinsic fundamental frequency of vowels is moderated by regional dialect
Jacewicz, Ewa; Fox, Robert Allen
2015-01-01
There has been a long-standing debate whether the intrinsic fundamental frequency (IF0) of vowels is an automatic consequence of articulation or whether it is independently controlled by speakers to perceptually enhance vowel contrasts along the height dimension. This paper provides evidence from regional variation in American English that IF0 difference between high and low vowels is, in part, controlled and varies across dialects. The sources of this F0 control are socio-cultural and cannot be attributed to differences in the vowel inventory size. The socially motivated enhancement was found only in prosodically prominent contexts. PMID:26520352
ERIC Educational Resources Information Center
Georgeton, Laurianne; Antolík, Tanja Kocjancic; Fougeron, Cécile
2016-01-01
Purpose: Phonetic variation due to domain initial strengthening was investigated with respect to the acoustic and articulatory distinctiveness of vowels within a subset of the French oral vowel system /i, e, ?, a, o, u/, organized along 4 degrees of height for the front vowels and 2 degrees of backness at the close and midclose height levels.…
Jafari, Narges; Yadegari, Fariba; Jalaie, Shohreh
2016-11-01
Vowel production is in essence auditorily controlled; hence, the role of auditory feedback in vowel production is very important. The purpose of this study was to compare formant frequencies and vowel space in Persian-speaking deaf children with cochlear implants (CI), hearing-impaired children with hearing aids (HA), and their normal-hearing (NH) peers. A total of 40 prelingually hearing-impaired children and 20 NH children participated in this study. Participants were native Persian speakers. The average first formant frequency (F1) and second formant frequency (F2) of the six vowels were measured using Praat software (version 5.1.44). One-way analysis of variance (ANOVA) was used to analyze the differences between the three groups. The mean value of F1 for vowel /i/ was significantly different, both between CI and NH children and between HA and NH groups (F(2, 57) = 9.229, P < 0.001). For vowel /a/, the mean value of F1 was significantly different between HA and NH groups (F(2, 57) = 3.707, P < 0.05). Regarding the second formant frequency, a post hoc Tukey test revealed that the differences were between HA and NH children (P < 0.05). F2 for vowel /o/ was significantly different (F(2, 57) = 4.572, P < 0.05). Also, the mean value of F2 for vowel /a/ was significantly different (F(2, 57) = 3.184, P < 0.05). About 1 year after implantation, the formants shift closer to those of NH listeners, who tend to have more expanded vowel spaces than hearing-impaired listeners with hearing aids, probably because CI has a subtly positive impact on the place of articulation of vowels. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
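As a point of reference, the group comparison reported above has the form of a one-way ANOVA over three independent groups of 20, which yields the F(2, 57) degrees of freedom quoted. A minimal sketch with hypothetical F1 data:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
# Hypothetical F1 values (Hz) for vowel /i/ in three groups of 20 children
f1_ci = rng.normal(420, 40, 20)   # cochlear implant
f1_ha = rng.normal(450, 40, 20)   # hearing aid
f1_nh = rng.normal(370, 40, 20)   # normal hearing

f_stat, p = f_oneway(f1_ci, f1_ha, f1_nh)
print(f"F(2, 57) = {f_stat:.3f}, p = {p:.4g}")
# A significant F would be followed by post hoc pairwise tests
# (e.g., Tukey HSD) to locate which group pair differs.
```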
Effects of gender on the production of emphasis in Jordanian Arabic: A sociophonetic study
NASA Astrophysics Data System (ADS)
Abudalbuh, Mujdey D.
Emphasis, or pharyngealization, is a distinctive phonetic phenomenon and a phonemic feature of Semitic languages such as Arabic and Hebrew. The goal of this study is to investigate the effect of gender on the production of emphasis in Jordanian Arabic as manifested on the consonants themselves as well as on the adjacent vowels. To this end, 22 speakers of Jordanian Arabic, 12 males and 10 females, participated in a production experiment where they produced monosyllabic minimal CVC pairs contrasted on the basis of the presence of a word-initial plain or emphatic consonant. Several acoustic parameters were measured including Voice Onset Time (VOT), friction duration, the spectral mean of the friction noise, vowel duration and the formant frequencies (F1-F3) of the target vowels. The results of this study indicated that VOT is a reliable acoustic correlate of emphasis in Jordanian Arabic only for voiceless stops whose emphatic VOT was significantly shorter than their plain VOT. Also, emphatic fricatives were shorter than plain fricatives. Emphatic vowels were found to be longer than plain vowels. Overall, the results showed that emphatic vowels were characterized by a raised F1 at the onset and midpoint of the vowel, lowered F2 throughout the vowel, and raised F3 at the onset and offset of the vowel relative to the corresponding values of the plain vowels. Finally, results using Nearey's (1978) normalization algorithm indicated that emphasis was more acoustically evident in the speech of males than in the speech of females in terms of the F-pattern. The results are discussed from a sociolinguistic perspective in light of the previous literature and the notion of linguistic feminism.
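Nearey's normalization, cited above, is essentially a log-mean transform. The sketch below shows one common single-speaker formulation (the 1978 procedure has several variants, and the formant values here are hypothetical):

```python
import numpy as np

def nearey_normalize(formants_hz):
    """Nearey-style log-mean normalization (one common single-speaker
    formulation): subtract the speaker's grand mean log formant.
    formants_hz: array of shape (n_tokens, n_formants) for ONE speaker."""
    logf = np.log(np.asarray(formants_hz, dtype=float))
    return logf - logf.mean()

# Hypothetical F1-F3 (Hz) for one speaker's plain vs. emphatic vowel tokens
plain = [[650, 1700, 2600], [640, 1750, 2650]]
emphatic = [[720, 1250, 2750], [700, 1300, 2800]]
print(nearey_normalize(plain))
print(nearey_normalize(emphatic))   # lowered F2 survives normalization
```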
A narrow band pattern-matching model of vowel perception
NASA Astrophysics Data System (ADS)
Hillenbrand, James M.; Houde, Robert A.
2003-02-01
The purpose of this paper is to propose and evaluate a new model of vowel perception which assumes that vowel identity is recognized by a template-matching process involving the comparison of narrow band input spectra with a set of smoothed spectral-shape templates that are learned through ordinary exposure to speech. In the present simulation of this process, the input spectra are computed over a sufficiently long window to resolve individual harmonics of voiced speech. Prior to template creation and pattern matching, the narrow band spectra are amplitude equalized by a spectrum-level normalization process, and the information-bearing spectral peaks are enhanced by a "flooring" procedure that zeroes out spectral values below a threshold function consisting of a center-weighted running average of spectral amplitudes. Templates for each vowel category are created simply by averaging the narrow band spectra of like vowels spoken by a panel of talkers. In the present implementation, separate templates are used for men, women, and children. The pattern matching is implemented with a simple city-block distance measure given by the sum of the channel-by-channel differences between the narrow band input spectrum (level-equalized and floored) and each vowel template. Spectral movement is taken into account by computing the distance measure at several points throughout the course of the vowel. The input spectrum is assigned to the vowel template that results in the smallest difference accumulated over the sequence of spectral slices. The model was evaluated using a large database consisting of 12 vowels in /h
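The pattern-matching step itself reduces to a few lines. The sketch below implements the city-block distance accumulated over spectral slices, as described in the abstract above, with tiny random arrays standing in for the model's level-equalized, floored narrow band spectra:

```python
import numpy as np

def classify_vowel(input_spectra, templates):
    """City-block template matching over a sequence of spectral slices.
    input_spectra: (n_slices, n_channels) level-equalized, floored spectra.
    templates: dict mapping vowel label -> (n_slices, n_channels) template.
    Returns the label whose template gives the smallest accumulated distance."""
    def distance(label):
        return np.abs(input_spectra - templates[label]).sum()
    return min(templates, key=distance)

# Hypothetical 3-slice, 5-channel spectra for two vowel templates
rng = np.random.default_rng(2)
templates = {"i": rng.random((3, 5)), "a": rng.random((3, 5))}
token = templates["i"] + rng.normal(0, 0.05, (3, 5))  # noisy /i/-like token
print(classify_vowel(token, templates))  # expected: 'i'
```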
Graf, M; Kaping, D; Bülthoff, H H
2005-03-01
How do observers recognize objects after spatial transformations? Recent neurocomputational models have proposed that object recognition is based on coordinate transformations that align memory and stimulus representations. If the recognition of a misoriented object is achieved by adjusting a coordinate system (or reference frame), then recognition should be facilitated when the object is preceded by a different object in the same orientation. In the two experiments reported here, two objects were presented in brief masked displays that were in close temporal contiguity; the objects were in either congruent or incongruent picture-plane orientations. Results showed that naming accuracy was higher for congruent than for incongruent orientations. The congruency effect was independent of superordinate category membership (Experiment 1) and was found for objects with different main axes of elongation (Experiment 2). The results indicate congruency effects for common familiar objects even when they have dissimilar shapes. These findings are compatible with models in which object recognition is achieved by an adjustment of a perceptual coordinate system.
Orientation-Invariant Object Recognition: Evidence from Repetition Blindness
ERIC Educational Resources Information Center
Harris, Irina M.; Dux, Paul E.
2005-01-01
The question of whether object recognition is orientation-invariant or orientation-dependent was investigated using a repetition blindness (RB) paradigm. In RB, the second occurrence of a repeated stimulus is less likely to be reported, compared to the occurrence of a different stimulus, if it occurs within a short time of the first presentation.…
Vowel Development in Babbling of Typically Developing 6- to 12-Month-Old Persian-Learning Infants.
Fotuhi, Mina; Yadegari, Fariba; Teymouri, Robab
2017-10-01
Pre-linguistic vocalizations including early consonants, vowels, and their combinations into syllables are considered important predictors of speech and language development. The purpose of this study was to examine vowel development in the babbling of normally developing Persian-learning infants. Eight typically developing 6- to 8-month-old Persian-learning infants (3 boys and 5 girls) participated in this 4-month longitudinal descriptive-analytic study. A weekly 30-60-minute audio and video recording was obtained at home from the infants' comfort-state vocalizations and mother-child interactions. A total of 74:02:03 (hours:minutes:seconds) of vocalizations was phonetically transcribed. Seven vowels comprising /i/, /e/, /a/, /u/, /o/, /ɑ/, and /ə/ were identified in the babblings. Inter-rater reliability was obtained for 20% of vocalizations. The data were analyzed by repeated measures ANOVA and Pearson's correlation coefficient using SPSS software version 20. The results showed that the two vowels /a/ (46.04) and /e/ (23.60) were produced with the highest mean frequencies of occurrence. Regarding the front/back dimension, front vowels were the most prominent (71.87); in terms of height, low (46.78) and mid (32.45) vowels occurred maximally. Good inter-rater reliability was obtained (0.99, P < .01). The increased frequency of occurrence of the low and mid front vowels in the current study is consistent with previous studies on the emergence of vowels in pre-linguistic vocalizations in other languages.
Orienting to face expression during encoding improves men's recognition of own gender faces.
Fulton, Erika K; Bulluck, Megan; Hertzog, Christopher
2015-10-01
It is unclear why women have superior episodic memory of faces, but the benefit may be partially the result of women engaging in superior processing of facial expressions. Therefore, we hypothesized that orienting instructions to attend to facial expression at encoding would significantly improve men's memory of faces and possibly reduce gender differences. We directed 203 college students (122 women) to study 120 faces under instructions to orient to either the person's gender or their emotional expression. They later took a recognition test of these faces by either judging whether they had previously studied the same person or that person with the exact same expression; the latter test evaluated recollection of specific facial details. Orienting to facial expressions during encoding significantly improved men's recognition of own-gender faces and eliminated the advantage that women had for male faces under gender orienting instructions. Although gender differences in spontaneous strategy use when orienting to faces cannot fully account for gender differences in face recognition, orienting men to facial expression during encoding is one way to significantly improve their episodic memory for male faces. Copyright © 2015 Elsevier B.V. All rights reserved.
Exceptionality in vowel harmony
NASA Astrophysics Data System (ADS)
Szeredi, Daniel
Vowel harmony has been of great interest in phonological research. It has been widely accepted that vowel harmony is a phonetically natural phenomenon, which means that it is a common pattern because it provides advantages to the speaker in articulation and to the listener in perception. Exceptional patterns have proved to be a challenge to the phonetically grounded analysis, as they, by their nature, introduce phonetically disadvantageous surface sequences consisting of harmonically different vowels. Such forms are found, for example, in the Finnish stem tuoli 'chair' or in the Hungarian suffixed form hi:d-hoz 'to the bridge', both word forms containing a mix of front and back vowels. Evidence has recently been presented that there might be a phonetic-level explanation for some exceptional patterns: vowels participating in irregular stems (like the vowel [i] in the Hungarian stem hi:d 'bridge' above) may differ in some small phonetic detail from vowels in regular stems. The main question has not been raised, though: does this phonetic detail matter for speakers? Would they use these minor differences when they have to categorize a new word as regular or irregular? A different recent trend explains morphophonological exceptionality by looking at the phonotactic regularities characteristic of classes of stems based on their morphological behavior. Studies have shown that speakers are aware of these regularities and use them as cues when they have to decide which class a novel stem belongs to. These sublexical phonotactic regularities have already been shown to be present in some exceptional patterns of vowel harmony, but many questions remain open: how is learning the static generalization linked to learning the allomorph-selection facet of vowel harmony? How much does the effect of consonants on vowel harmony matter, compared with the effect of vowel-to-vowel correspondences? This dissertation aims to test these two ideas -- that speakers use phonetic cues and/or that they use sublexical phonotactic regularities in categorizing stems as regular or irregular -- and to answer the more detailed questions, such as the effect of consonantal patterns on exceptional patterns and the link between allomorph selection and static phonotactic generalizations. The phonetic hypothesis is tested on the Hungarian antiharmonicity pattern (stems with front vowels consistently selecting back suffixes, as in the example hi:d-hoz 'to the bridge' above), and the results indicate that while there may be some small phonetic differences between vowels in regular and irregular stems, speakers do not use these, or even enhanced, differences when they have to categorize stems. The sublexical hypothesis is tested and confirmed by looking at the disharmonicity pattern in Finnish. In Finnish, stems that contain both back and certain front vowels are frequent and perfectly grammatical, as in the example tuoli 'chair' above, while the mixing of back and some other front vowels is very rare and mostly confined to loanwords like amatoori 'amateur'. It will be seen that speakers do use sublexical phonotactic regularities to decide on the acceptability of novel stems, but certain patterns that are phonetically or phonologically more natural (vowel-to-vowel correspondences) seem to matter much more than other effects (like consonantal effects).
Finally, a computational account will be given on how exceptionality might be learned by speakers by using maximum entropy grammars available in the literature to simulate the acquisition of the Finnish disharmonicity pattern. It will be shown that in order to clearly model the overall behavior on the exact pattern, the learner has to have access not only to the lexicon, but also to the allomorph selection patterns in the language.
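As a rough illustration of that final point, a maximum entropy grammar assigns each candidate a probability proportional to the exponential of its negative weighted violation score, and learning fits the constraint weights to observed selection frequencies. The sketch below is a toy two-candidate, two-constraint version (the constraints, frequencies, and learning rate are hypothetical; real models use many constraints and often restrict weights to be non-negative):

```python
import numpy as np

# Toy MaxEnt grammar: two suffix candidates for a disharmonic stem.
# Rows are candidates, columns are per-constraint violation counts.
violations = np.array([[0.0, 1.0],   # candidate 1: front-vowel suffix
                       [1.0, 0.0]])  # candidate 2: back-vowel suffix
observed = np.array([0.2, 0.8])      # hypothetical selection frequencies

w = np.zeros(2)                      # constraint weights
for _ in range(2000):
    p = np.exp(-violations @ w)
    p /= p.sum()                     # candidate probabilities under the grammar
    grad = (p - observed) @ violations   # dLogLik/dw = E_model[v] - E_data[v]
    w += 0.1 * grad                      # gradient ascent on log-likelihood
print("weights:", w.round(2), "predicted:", p.round(3))
```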
Bizley, Jennifer K; Walker, Kerry M M; King, Andrew J; Schnupp, Jan W H
2013-01-01
Spectral timbre is an acoustic feature that enables human listeners to determine the identity of a spoken vowel. Despite its importance to sound perception, little is known about the neural representation of sound timbre and few psychophysical studies have investigated timbre discrimination in non-human species. In this study, ferrets were positively conditioned to discriminate artificial vowel sounds in a two-alternative-forced-choice paradigm. Animals quickly learned to discriminate the vowel sound /u/ from /ε/ and were immediately able to generalize across a range of voice pitches. They were further tested in a series of experiments designed to assess how well they could discriminate these vowel sounds under different listening conditions. First, a series of morphed vowels was created by systematically shifting the location of the first and second formant frequencies. Second, the ferrets were tested with single formant stimuli designed to assess which spectral cues they could be using to make their decisions. Finally, vowel discrimination thresholds were derived in the presence of noise maskers presented from either the same or a different spatial location. These data indicate that ferrets show robust vowel discrimination behavior across a range of listening conditions and that this ability shares many similarities with human listeners.
Liu, Chang; Jin, Su-Hyun
2015-11-01
This study investigated whether native listeners processed speech differently from non-native listeners in a speech detection task. Detection thresholds of Mandarin Chinese and Korean vowels and non-speech sounds in noise, frequency selectivity, and the nativeness of Mandarin Chinese and Korean vowels were measured for Mandarin Chinese- and Korean-native listeners. The two groups of listeners exhibited similar non-speech sound detection and frequency selectivity; however, the Korean listeners had better detection thresholds of Korean vowels than Chinese listeners, while the Chinese listeners performed no better at Chinese vowel detection than the Korean listeners. Moreover, thresholds predicted from an auditory model highly correlated with behavioral thresholds of the two groups of listeners, suggesting that detection of speech sounds not only depended on listeners' frequency selectivity, but also might be affected by their native language experience. Listeners evaluated their native vowels with higher nativeness scores than non-native listeners. Native listeners may have advantages over non-native listeners when processing speech sounds in noise, even without the required phonetic processing; however, such native speech advantages might be offset by Chinese listeners' lower sensitivity to vowel sounds, a characteristic possibly resulting from their sparse vowel system and their greater cognitive and attentional demands for vowel processing.
Typological Asymmetries in Round Vowel Harmony: Support from Artificial Grammar Learning
Finley, Sara
2012-01-01
Providing evidence for the universal tendencies of patterns in the world’s languages can be difficult, as it is impossible to sample all possible languages, and linguistic samples are subject to interpretation. However, experimental techniques such as artificial grammar learning paradigms make it possible to uncover the psychological reality of claimed universal tendencies. This paper addresses learning of phonological patterns (systematic tendencies in the sounds in language). Specifically, I explore the role of phonetic grounding in learning round harmony, a phonological process in which words must contain either all round vowels ([o, u]) or all unround vowels ([i, e]). The phonetic precursors to round harmony are such that mid vowels ([o, e]), which receive the greatest perceptual benefit from harmony, are most likely to trigger harmony. High vowels ([i, u]), however, are cross-linguistically less likely to trigger round harmony. Adult participants were exposed to a miniature language that contained a round harmony pattern in which the harmony source triggers were either high vowels ([i, u]) (poor harmony source triggers) or mid vowels ([o, e]) (ideal harmony source triggers). Only participants who were exposed to the ideal mid vowel harmony source triggers were successfully able to generalize the harmony pattern to novel instances, suggesting that perception and phonetic naturalness play a role in learning. PMID:23264713
Phonetic Modification of Vowel Space in Storybook Speech to Infants up to 2 Years of Age
Burnham, Evamarie B.; Wieland, Elizabeth A.; Kondaurova, Maria V.; McAuley, J. Devin; Bergeson, Tonya R.
2015-01-01
Purpose A large body of literature has indicated vowel space area expansion in infant-directed (ID) speech compared with adult-directed (AD) speech, which may promote language acquisition. The current study tested whether this expansion occurs in storybook speech read to infants at various points during their first 2 years of life. Method In 2 studies, mothers read a storybook containing target vowels in ID and AD speech conditions. Study 1 was longitudinal, with 11 mothers recorded when their infants were 3, 6, and 9 months old. Study 2 was cross-sectional, with 48 mothers recorded when their infants were 3, 9, 13, or 20 months old (n = 12 per group). The 1st and 2nd formants of vowels /i/, /ɑ/, and /u/ were measured, and vowel space area and dispersion were calculated. Results Across both studies, 1st and/or 2nd formant frequencies shifted systematically for /i/ and /u/ vowels in ID compared with AD speech. No difference in vowel space area or dispersion was found. Conclusions The results suggest that a variety of communication and situational factors may affect phonetic modifications in ID speech, but that vowel space characteristics in speech to infants stay consistent across the first 2 years of life. PMID:25659121
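Dispersion, as used above, is commonly computed as the mean distance of vowel tokens from the F1-F2 centroid. A minimal sketch with hypothetical corner-vowel means:

```python
import numpy as np

def vowel_space_dispersion(f1, f2):
    """Mean Euclidean distance of vowel tokens from the F1-F2 centroid,
    a simple dispersion measure often reported alongside vowel space area."""
    pts = np.column_stack([f1, f2]).astype(float)
    return np.linalg.norm(pts - pts.mean(axis=0), axis=1).mean()

# Hypothetical /i, a, u/ formant means (Hz) in ID vs. AD storybook speech
print(vowel_space_dispersion([350, 850, 380], [2700, 1300, 900]))   # ID
print(vowel_space_dispersion([380, 800, 420], [2500, 1350, 1000]))  # AD
```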
Carey, Daniel; Miquel, Marc E.; Evans, Bronwen G.; Adank, Patti; McGettigan, Carolyn
2017-01-01
Imitating speech necessitates the transformation from sensory targets to vocal tract motor output, yet little is known about the representational basis of this process in the human brain. Here, we address this question by using real-time MR imaging (rtMRI) of the vocal tract and functional MRI (fMRI) of the brain in a speech imitation paradigm. Participants trained on imitating a native vowel and a similar nonnative vowel that required lip rounding. Later, participants imitated these vowels and an untrained vowel pair during separate fMRI and rtMRI runs. Univariate fMRI analyses revealed that regions including left inferior frontal gyrus were more active during sensorimotor transformation (ST) and production of nonnative vowels, compared with native vowels; further, ST for nonnative vowels activated somatomotor cortex bilaterally, compared with ST of native vowels. Using test representational similarity analysis (RSA) models constructed from participants’ vocal tract images and from stimulus formant distances, we found that RSA searchlight analyses of fMRI data showed either type of model could be represented in somatomotor, temporal, cerebellar, and hippocampal neural activation patterns during ST. We thus provide the first evidence of widespread and robust cortical and subcortical neural representation of vocal tract and/or formant parameters, during prearticulatory ST. PMID:28334401
NASA Astrophysics Data System (ADS)
Mattock, Karen; Rvachew, Susan; Polka, Linda; Turner, Sara
2005-04-01
It is well established that normally developing infants typically enter the canonical babbling stage of production between 6 and 8 months of age. However, whether the linguistic environment affects babbling, either in terms of the phonetic inventory of vowels produced by infants [Oller & Eilers (1982)] or the acoustics of vowel formants [Boysson-Bardies et al. (1989)], is controversial. The spontaneous speech of 42 Canadian English- and Canadian French-learning infants aged 8 to 11, 12 to 15, and 16 to 18 months was recorded and digitized to yield a total of 1253 vowels that were spectrally analyzed and statistically compared for differences in first and second formant frequencies. Language-specific influences on vowel acoustics were hypothesized. Preliminary results reveal changes in formant frequencies as a function of age and language background. There is evidence of decreases over age in the F1 values of French but not English infants' vowels, and decreases over age in the F2 values of English but not French infants' vowels. The notion of an age-related shift in infants' attention to language-specific acoustic features, and the implications of this for early vocal development as well as for the production of Canadian English and Canadian French vowels, will be discussed.
Ha, Seunghee; Jung, Seungeun; Koh, Kyung S
2018-06-01
The purpose of this study was to determine whether test-retest nasalance score variability differs between Korean children with and without cleft palate (CP) and whether vowel context influences variability in nasalance scores. Thirty-four 3- to 5-year-old children with and without CP participated in the study. Three 8-syllable speech stimuli devoid of nasal consonants were used for data collection. Each stimulus was loaded with high, low, or mixed vowels, respectively. All participants were asked to repeat the speech stimuli twice after the examiner, and an immediate test-retest nasalance score was assessed with no headgear change. Children with CP exhibited a significantly greater absolute difference in nasalance scores than children without CP. Variability in nasalance scores differed significantly by vowel context, and the high vowel sentence showed a significantly larger difference in nasalance scores than the low vowel sentence. The cumulative frequencies indicated that, for children with CP in the high vowel sentence, only 8 of 17 (47%) repeated nasalance scores were within 5 points. Test-retest nasalance score variability was greater for children with CP than for children without CP, and there was greater variability for the high vowel sentence in both groups. Copyright © 2018 Elsevier B.V. All rights reserved.
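Nasalance itself is conventionally the ratio of nasal to total (nasal plus oral) acoustic energy, expressed as a percentage. The sketch below computes it from the RMS amplitude of two hypothetical channel signals and forms a test-retest difference; the signals and the broadband RMS estimate are illustrative assumptions (nasometers compute the score over filtered bands).

```python
import numpy as np

def nasalance_score(oral, nasal):
    """Percent nasalance as commonly defined for nasometry:
    nasal energy / (nasal + oral energy) * 100, here from RMS amplitude."""
    rms = lambda x: np.sqrt(np.mean(np.asarray(x, float) ** 2))
    n, o = rms(nasal), rms(oral)
    return 100.0 * n / (n + o)

# Hypothetical oral/nasal channel signals for two repetitions of a stimulus
rng = np.random.default_rng(3)
rep1 = nasalance_score(rng.normal(0, 1.0, 16000), rng.normal(0, 0.25, 16000))
rep2 = nasalance_score(rng.normal(0, 1.0, 16000), rng.normal(0, 0.30, 16000))
print(f"test-retest difference: {abs(rep1 - rep2):.1f} points")
```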
Garadat, Soha N.; Zwolan, Teresa A.; Pfingst, Bryan E.
2013-01-01
Previous studies in our laboratory showed that temporal acuity as assessed by modulation detection thresholds (MDTs) varied across activation sites and that this site-to-site variability was subject specific. Using two 10-channel MAPs, the previous experiments showed that processor MAPs that had better across-site mean (ASM) MDTs yielded better speech recognition than MAPs with poorer ASM MDTs tested in the same subject. The current study extends our earlier work on developing more optimal fitting strategies to test the feasibility of using a site-selection approach in the clinical domain. This study examined the hypothesis that revising the clinical speech processor MAP for cochlear implant (CI) recipients by turning off selected sites that have poorer temporal acuity and reallocating frequencies to the remaining electrodes would lead to improved speech recognition. Twelve CI recipients participated in the experiments. We found that site selection procedure based on MDTs in the presence of a masker resulted in improved performance on consonant recognition and recognition of sentences in noise. In contrast, vowel recognition was poorer with the experimental MAP than with the clinical MAP, possibly due to reduced spectral resolution when sites were removed from the experimental MAP. Overall, these results suggest a promising path for improving recipient outcomes using personalized processor-fitting strategies based on a psychophysical measure of temporal acuity. PMID:23881208
Speaking rate effects on locus equation slope.
Berry, Jeff; Weismer, Gary
2013-11-01
A locus equation describes a 1st order regression fit to a scatter of vowel steady-state frequency values predicting vowel onset frequency values. Locus equation coefficients are often interpreted as indices of coarticulation. Speaking rate variations with a constant consonant-vowel form are thought to induce changes in the degree of coarticulation. In the current work, the hypothesis that locus slope is a transparent index of coarticulation is examined through the analysis of acoustic samples of large-scale, nearly continuous variations in speaking rate. Following the methodological conventions for locus equation derivation, data pooled across ten vowels yield locus equation slopes that are mostly consistent with the hypothesis that locus equations vary systematically with coarticulation. Comparable analyses between different four-vowel pools reveal variations in the locus slope range and changes in locus slope sensitivity to rate change. Analyses across rate but within vowels are substantially less consistent with the locus hypothesis. Taken together, these findings suggest that the practice of vowel pooling exerts a non-negligible influence on locus outcomes. Results are discussed within the context of articulatory accounts of locus equations and the effects of speaking rate change.
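Deriving a locus equation is a one-line regression once onset and steady-state F2 values have been measured. A minimal sketch with hypothetical consonant-vowel tokens:

```python
import numpy as np

# Locus equation: regress F2 at vowel onset on F2 at the vowel steady state,
# pooled over tokens. Hypothetical /d/-context tokens (Hz).
f2_mid = np.array([2300, 1900, 1500, 1100, 900], dtype=float)    # steady state
f2_onset = np.array([2100, 1850, 1600, 1400, 1300], dtype=float)

slope, intercept = np.polyfit(f2_mid, f2_onset, 1)   # 1st-order fit
print(f"F2onset = {slope:.2f} * F2mid + {intercept:.0f}")
# Slopes near 1 are read as strong CV coarticulation (onset tracks the vowel);
# slopes near 0 as weak coarticulation (onset anchored at a consonant locus).
```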
Ho, Michael R; Pezdek, Kathy
2016-06-01
The cross-race effect (CRE) describes the finding that same-race faces are recognized more accurately than cross-race faces. According to social-cognitive theories of the CRE, processes of categorization and individuation at encoding account for differential recognition of same- and cross-race faces. Recent face memory research has suggested that similar but distinct categorization and individuation processes also occur postencoding, at recognition. Using a divided-attention paradigm, in Experiments 1A and 1B we tested and confirmed the hypothesis that distinct postencoding categorization and individuation processes occur during the recognition of same- and cross-race faces. Specifically, postencoding configural divided-attention tasks impaired recognition accuracy more for same-race than for cross-race faces; on the other hand, for White (but not Black) participants, postencoding featural divided-attention tasks impaired recognition accuracy more for cross-race than for same-race faces. A social categorization paradigm used in Experiments 2A and 2B tested the hypothesis that the postencoding in-group or out-group social orientation to faces affects categorization and individuation processes during the recognition of same-race and cross-race faces. Postencoding out-group orientation to faces resulted in categorization for White but not for Black participants. This was evidenced by White participants' impaired recognition accuracy for same-race but not for cross-race out-group faces. Postencoding in-group orientation to faces had no effect on recognition accuracy for either same-race or cross-race faces. The results of Experiments 2A and 2B suggest that this social orientation facilitates White but not Black participants' individuation and categorization processes at recognition. Models of recognition memory for same-race and cross-race faces need to account for processing differences that occur at both encoding and recognition.
Visual feedback in stuttering therapy
NASA Astrophysics Data System (ADS)
Smolka, Elzbieta
1997-02-01
The aim of this paper is to present results concerning the influence of visual echo and reverberation on the speech of stutterers. The influence of visual stimuli has been compared with that of acoustic and visual-acoustic stimuli. Following this, methods of implementing visual feedback with the aid of electroluminescent diodes driven by speech signals are presented. The concept of a computerized visual echo based on the acoustic recognition of Polish syllabic vowels is also presented. All the research and trials carried out at our center, aside from cognitive aims, generally aim at the development of new speech correctors to be utilized in stuttering therapy.
A Wavelet Model for Vocalic Speech Coarticulation
1994-10-01
This report models vocalic coarticulation as a channel that performs a transformation from a control speech state (input) to an effected speech state (output). Specifically, a vowel produced in isolation is transformed into an effected, coarticulated vowel. The channel is characterized by taking the wavelet transform of the effected vowel's signal, using the control vowel's signal as the mother wavelet. A practical experiment is conducted to evaluate the coarticulation channel using samples of real speech.
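How such a data-derived "mother wavelet" might be applied is sketched below: the effected vowel is correlated with time-scaled, normalized copies of the control vowel. This is an assumed reading of the report's method (the scaling, normalization, and correlation details are guesses), using toy sinusoids in place of recorded vowels.

```python
import numpy as np
from scipy.signal import resample

def matched_wavelet_transform(effected, control, scales):
    """Correlate the effected vowel with time-scaled copies of the control
    vowel used as a data-derived 'mother wavelet'. Each scaled copy is made
    zero-mean and unit-norm before correlation (an illustrative choice)."""
    rows = []
    for s in scales:
        w = resample(control, max(2, int(len(control) * s)))
        w = (w - w.mean()) / (np.linalg.norm(w) + 1e-12)
        rows.append(np.correlate(effected, w, mode="same"))
    return np.array(rows)   # shape: (n_scales, n_samples)

# Toy signals standing in for recorded vowels
t = np.linspace(0, 0.1, 1600)
control = np.sin(2 * np.pi * 120 * t)       # isolated "control" vowel
effected = np.sin(2 * np.pi * 132 * t)      # coarticulation-shifted token
coeffs = matched_wavelet_transform(effected, control, scales=[0.5, 0.75, 1.0])
print(coeffs.shape, coeffs.max().round(2))
```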
Green, K P; Gerdeman, A
1995-12-01
Two experiments examined the impact of a discrepancy in vowel quality between the auditory and visual modalities on the perception of a syllable-initial consonant. One experiment examined the effect of such a discrepancy on the McGurk effect by cross-dubbing auditory /bi/ tokens onto visual /ga/ articulations (and vice versa). A discrepancy in vowel category significantly reduced the magnitude of the McGurk effect and changed the pattern of responses. A 2nd experiment investigated the effect of such a discrepancy on the speeded classification of the initial consonant. Mean reaction times to classify the tokens increased when the vowel information was discrepant between the 2 modalities but not when the vowel information was consistent. These experiments indicate that the perceptual system is sensitive to cross-modal discrepancies in the coarticulatory information between a consonant and its following vowel during phonetic perception.
Differential processing of consonants and vowels in lexical access through reading.
New, Boris; Araújo, Verónica; Nazzi, Thierry
2008-12-01
Do consonants and vowels have the same importance during reading? Recently, it has been proposed that consonants play a more important role than vowels for language acquisition and adult speech processing. This proposal has started receiving developmental support from studies showing that infants are better at processing specific consonantal than vocalic information while learning new words. This proposal also received support from adult speech processing. In our study, we directly investigated the relative contributions of consonants and vowels to lexical access while reading by using a visual masked-priming lexical decision task. Test items were presented following four different primes: identity (e.g., for the word joli, joli), unrelated (vabu), consonant-related (jalu), and vowel-related (vobi). Priming was found for the identity and consonant-related conditions, but not for the vowel-related condition. These results establish the privileged role of consonants during lexical access while reading.
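The prime types are easy to picture programmatically. The sketch below builds consonant-related and vowel-related primes by letter substitution; the replacement letters are hypothetical, and the study's actual stimuli were matched on many more properties (length, frequency, bigram legality).

```python
VOWELS = set("aeiouy")

def make_primes(word, repl_consonants="vbzd", repl_vowels="aouie"):
    """Keep one letter class and replace the other, in the spirit of the
    masked-priming design (naive sketch: no check that a replacement
    letter differs from the original, no lexical matching)."""
    c_iter, v_iter = iter(repl_consonants), iter(repl_vowels)
    cons_related = "".join(ch if ch not in VOWELS else next(v_iter) for ch in word)
    vowel_related = "".join(ch if ch in VOWELS else next(c_iter) for ch in word)
    return cons_related, vowel_related

print(make_primes("joli"))  # ('jalo', 'vobi'); cf. the paper's jalu / vobi
```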
Orientation estimation of anatomical structures in medical images for object recognition
NASA Astrophysics Data System (ADS)
Bağci, Ulaş; Udupa, Jayaram K.; Chen, Xinjian
2011-03-01
Recognition of anatomical structures is an important step in model-based medical image segmentation. It provides a pose estimate of the objects and information about roughly "where" the objects are in the image, distinguishing them from other object-like entities. In [1], we presented a general method of model-based multi-object recognition to assist in segmentation (delineation) tasks. It exploits the pose relationship that can be encoded, via the concept of ball scale (b-scale), between the binary training objects and their associated grey images. The goal was to place the model, in a single shot, close to the right pose (position, orientation, and scale) in a given image so that the model boundaries fall in the close vicinity of object boundaries in the image. Unlike position and scale parameters, we observe that orientation parameters require more attention when estimating the pose of the model, as even small differences in orientation parameters can lead to inappropriate recognition. Motivated by the non-Euclidean nature of the pose information, we propose in this paper the use of non-Euclidean metrics to estimate the orientation of anatomical structures for more accurate recognition and segmentation. We statistically analyze and evaluate the following metrics for orientation estimation: Euclidean, Log-Euclidean, Root-Euclidean, Procrustes Size-and-Shape, and mean Hermitian metrics. The results show that the mean Hermitian and Cholesky decomposition metrics provide more accurate orientation estimates than the other Euclidean and non-Euclidean metrics.
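For instance, the Log-Euclidean metric compares symmetric positive-definite matrices through their matrix logarithms, which also yields a closed-form mean. A minimal sketch with hypothetical 2x2 orientation descriptors:

```python
import numpy as np
from scipy.linalg import logm, expm

def log_euclidean_distance(A, B):
    """Log-Euclidean distance between symmetric positive-definite matrices:
    Frobenius norm of the difference of their matrix logarithms."""
    return np.linalg.norm(logm(A) - logm(B), ord="fro")

def log_euclidean_mean(mats):
    """Log-Euclidean mean: expm of the average of the matrix logarithms."""
    return expm(np.mean([logm(M) for M in mats], axis=0))

# Hypothetical 2x2 SPD scatter matrices describing two object orientations
A = np.array([[4.0, 1.0], [1.0, 2.0]])
B = np.array([[3.0, -0.5], [-0.5, 2.5]])
print(round(log_euclidean_distance(A, B), 3))
print(log_euclidean_mean([A, B]).round(3))
```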
A speech processing study using an acoustic model of a multiple-channel cochlear implant
NASA Astrophysics Data System (ADS)
Xu, Ying
1998-10-01
A cochlear implant is an electronic device designed to provide sound information for adults and children who have bilateral profound hearing loss. The task of representing speech signals as electrical stimuli is central to the design and performance of cochlear implants. Studies have shown that current speech-processing strategies provide significant benefits to cochlear implant users. However, the evaluation and development of speech-processing strategies have been complicated by hardware limitations and large variability in user performance. To alleviate these problems, an acoustic model of a cochlear implant with the SPEAK strategy is implemented in this study, in which a set of acoustic stimuli whose psychophysical characteristics are as close as possible to those produced by a cochlear implant are presented to normal-hearing subjects. To test the effectiveness and feasibility of this acoustic model, a psychophysical experiment was conducted to match the performance of a normal-hearing listener using model-processed signals to that of a cochlear implant user. Good agreement was found between an implanted patient and an age-matched normal-hearing subject in a dynamic signal discrimination experiment, indicating that this acoustic model is a reasonably good approximation of a cochlear implant with the SPEAK strategy. The acoustic model was then used to examine the potential of the SPEAK strategy in terms of its temporal and frequency encoding of speech. It was hypothesized that better temporal and frequency encoding of speech could be accomplished with higher stimulation rates and a larger number of activated channels. Vowel and consonant recognition tests were conducted on normal-hearing subjects using speech tokens processed by the acoustic model, with different combinations of stimulation rate and number of activated channels. The results showed that vowel recognition was best at 600 pps and 8 activated channels, but further increases in stimulation rate and channel number were not beneficial. Manipulations of stimulation rate and number of activated channels did not appreciably affect consonant recognition. These results suggest that overall speech performance may improve by appropriately increasing stimulation rate and number of activated channels. Future revision of this acoustic model is necessary to provide more accurate amplitude representation of speech.
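Acoustic models of implant processing are often built as channel vocoders. The sketch below is a generic noise-band vocoder in that spirit, not the study's exact SPEAK implementation; the band edges, filter order, and envelope extraction are illustrative choices.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(x, fs, n_channels=8, lo=200.0, hi=7000.0):
    """Generic noise-band vocoder: filter the signal into log-spaced bands,
    extract each band's envelope, and use it to modulate noise filtered
    into the same band."""
    edges = np.geomspace(lo, hi, n_channels + 1)
    rng = np.random.default_rng(0)
    noise = rng.normal(0, 1, len(x))
    out = np.zeros_like(x, dtype=float)
    for lo_f, hi_f in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo_f, hi_f], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))           # band envelope
        carrier = sosfiltfilt(sos, noise)     # band-limited noise carrier
        out += env * carrier
    return out / (np.max(np.abs(out)) + 1e-12)

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
speech_like = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 3 * t))
y = noise_vocoder(speech_like, fs)
print(y.shape)
```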
Vowel category dependence of the relationship between palate height, tongue height, and oral area.
Hasegawa-Johnson, Mark; Pizza, Shamala; Alwan, Abeer; Cha, Jul Setsu; Haker, Katherine
2003-06-01
This article evaluates intertalker variance of oral area, logarithm of the oral area, tongue height, and formant frequencies as a function of vowel category. The data consist of coronal magnetic resonance imaging (MRI) sequences and acoustic recordings of 5 talkers, each producing 11 different vowels. Tongue height (left, right, and midsagittal), palate height, and oral area were measured in 3 coronal sections anterior to the oropharyngeal bend and were subjected to multivariate analysis of variance, variance ratio analysis, and regression analysis. The primary finding of this article is that oral area (between palate and tongue) showed less intertalker variance during production of vowels with an oral place of articulation (palatal and velar vowels) than during production of vowels with a uvular or pharyngeal place of articulation. Although oral area variance is place dependent, percentage variance (log area variance) is not place dependent. Midsagittal tongue height in the molar region was positively correlated with palate height during production of palatal vowels, but not during production of nonpalatal vowels. Taken together, these results suggest that small oral areas are characterized by relatively talker-independent vowel targets and that meeting these talker-independent targets is important enough that each talker adjusts his or her own tongue height to compensate for talker-dependent differences in constriction anatomy. Computer simulation results are presented to demonstrate that these results may be explained by an acoustic control strategy: When talkers with very different anatomical characteristics try to match talker-independent formant targets, the resulting area variances are minimized near the primary vocal tract constriction.
Reading Arabic Texts: Effects of Text Type, Reader Type and Vowelization.
ERIC Educational Resources Information Center
Abu-Rabia, Salim
1998-01-01
Investigates the effect of vowels on reading accuracy in Arabic orthography. Finds that vowels had a significant effect on reading accuracy of poor and skilled readers in reading each of four kinds of texts. (NH)
Auditory temporal-order processing of vowel sequences by young and elderly listeners.
Fogerty, Daniel; Humes, Larry E.; Kewley-Port, Diane
2010-01-01
This project focused on the individual differences underlying observed variability in temporal processing among older listeners. Four measures of vowel temporal-order identification were completed by young (N=35; 18–31 years) and older (N=151; 60–88 years) listeners. Experiments used forced-choice, constant-stimuli methods to determine the smallest stimulus onset asynchrony (SOA) between brief (40 or 70 ms) vowels that enabled identification of a stimulus sequence. Four words (pit, pet, pot, and put) spoken by a male talker were processed to serve as vowel stimuli. All listeners identified the vowels in isolation with better than 90% accuracy. Vowel temporal-order tasks included the following: (1) monaural two-item identification, (2) monaural four-item identification, (3) dichotic two-item vowel identification, and (4) dichotic two-item ear identification. Results indicated that older listeners had more variability and performed poorer than young listeners on vowel-identification tasks, although a large overlap in distributions was observed. Both age groups performed similarly on the dichotic ear-identification task. For both groups, the monaural four-item and dichotic two-item tasks were significantly harder than the monaural two-item task. Older listeners’ SOA thresholds improved with additional stimulus exposure and shorter dichotic stimulus durations. Individual differences of temporal-order performance among the older listeners demonstrated the influence of cognitive measures, but not audibility or age. PMID:20370033
Quantitative and descriptive comparison of four acoustic analysis systems: vowel measurements.
Burris, Carlyn; Vorperian, Houri K; Fourakis, Marios; Kent, Ray D; Bolt, Daniel M
2014-02-01
This study examines accuracy and comparability of 4 trademarked acoustic analysis software packages (AASPs): Praat, WaveSurfer, TF32, and CSL by using synthesized and natural vowels. Features of AASPs are also described. Synthesized and natural vowels were analyzed using each of the AASP's default settings to secure 9 acoustic measures: fundamental frequency (F0), formant frequencies (F1-F4), and formant bandwidths (B1-B4). The discrepancy between the software measured values and the input values (synthesized, previously reported, and manual measurements) was used to assess comparability and accuracy. Basic AASP features are described. Results indicate that Praat, WaveSurfer, and TF32 generate accurate and comparable F0 and F1-F4 data for synthesized vowels and adult male natural vowels. Results varied by vowel for women and children, with some serious errors. Bandwidth measurements by AASPs were highly inaccurate as compared with manual measurements and published data on formant bandwidths. Values of F0 and F1-F4 are generally consistent and fairly accurate for adult vowels and for some child vowels using the default settings in Praat, WaveSurfer, and TF32. Manipulation of default settings yields improved output values in TF32 and CSL. Caution is recommended especially before accepting F1-F4 results for children and B1-B4 results for all speakers.
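For comparison with the packages tested above, Praat's analyses can also be scripted from Python through the parselmouth package. A minimal sketch ("vowel.wav" is a hypothetical recording; default analysis settings apply, which is exactly the condition the study evaluated):

```python
# Requires the parselmouth package, a Python interface to Praat's engine.
import numpy as np
import parselmouth

snd = parselmouth.Sound("vowel.wav")
mid = snd.duration / 2                      # measure at the vowel midpoint

pitch = snd.to_pitch()                      # default pitch settings
f0 = pitch.selected_array["frequency"]
print("median F0:", np.median(f0[f0 > 0]))  # ignore unvoiced (0 Hz) frames

formants = snd.to_formant_burg()            # Burg-method formant tracking
for n in (1, 2, 3, 4):
    print(f"F{n} at midpoint: {formants.get_value_at_time(n, mid):.0f} Hz")
```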
NASA Astrophysics Data System (ADS)
Ertmer, David Joseph
1994-01-01
The effectiveness of vowel production training that incorporated direct instruction in combination with spectrographic models and feedback was assessed for two children who exhibited profound hearing impairment. A multiple-baseline design across behaviors, with replication across subjects, was implemented to determine whether vowel production accuracy improved following the introduction of treatment. Listener judgments of vowel correctness were obtained during the baseline, training, and follow-up phases of the study. Data were analyzed through visual inspection of changes in levels of accuracy, changes in trends of accuracy, and changes in variability of accuracy within and across phases. One subject showed significant improvement on all three trained vowel targets; the second subject improved on the first trained target only (Kolmogorov-Smirnov two-sample test). Performance trends during training sessions suggest that continued treatment would have resulted in further improvement for both subjects. Vowel duration, fundamental frequency, and the frequency locations of the first and second formants were measured before and after training. Acoustic analysis revealed highly individualized changes in the frequency locations of F1 and F2. Vowels that received the most training were maintained at higher levels than those introduced later in training. Some generalization of practiced vowel targets to untrained words was observed in both subjects. A bias toward judging productions as "correct" was observed for both subjects during self-evaluation tasks using spectrographic feedback.
Schiff, Rachel; Katzir, Tami; Shoshan, Noa
2013-07-01
The present study examined the effects of orthographic transparency on reading ability of children with dyslexia in two Hebrew scripts. The study explored the reading accuracy and speed of vowelized and unvowelized Hebrew words of fourth-grade children with dyslexia. A comparison was made to typically developing readers of two age groups: a group matched by chronological age and a group of children who are 2 years younger, presumably at the end of the reading acquisition process. An additional purpose was to investigate the role of vowelization in the reading ability of unvowelized script among readers with dyslexia in an attempt to assess whether vowelization plays a mediating role for reading speed of unvowelized scripts. The present study found no significant differences in reading accuracy and speed between vowelized and unvowelized scripts among fourth-grade readers with dyslexia. The reading speed of fourth-graders with dyslexia was similar to typically developing second-graders for both the vowelized and unvowelized words. However, fourth-grade children with dyslexia performed lower than the typically developing second-graders in the reading accuracy of vowelized script. Furthermore, for readers with dyslexia, accuracy in reading both vowelized and unvowelized words mediated the reading speed of unvowelized scripts. These results may be a sign that Hebrew-speaking children with dyslexia have severe difficulties that prevent them from developing strategies for more efficient reading.
Production and perception of whispered vowels
NASA Astrophysics Data System (ADS)
Kiefte, Michael
2005-09-01
Information normally associated with pitch, such as intonation, can still be conveyed in whispered speech despite the absence of voicing. For example, it is possible to whisper the question "You are going today?" without any syntactic information to distinguish this sentence from a simple declarative. It has been shown that pitch change in whispered speech is correlated with the simultaneous raising or lowering of several formants [e.g., M. Kiefte, J. Acoust. Soc. Am. 116, 2546 (2004)]. However, spectral peak frequencies associated with formants have been identified as important correlates of vowel identity. Spectral peak frequencies may therefore serve two roles in the perception of whispered speech: to indicate both vowel identity and intended pitch. Data will be presented to examine the relative importance of several acoustic properties, including spectral peak frequencies and spectral shape parameters, in both the production and perception of whispered vowels. Speakers were asked to phonate and whisper vowels at three different pitches across a range of roughly a musical fifth. It will be shown that relative spectral change is preserved within vowels across intended pitches in whispered speech. In addition, several models of vowel identification by listeners will be presented. [Work supported by SSHRC.]
Prosodic domain-initial effects on the acoustic structure of vowels
NASA Astrophysics Data System (ADS)
Fox, Robert Allen; Jacewicz, Ewa; Salmons, Joseph
2003-10-01
In the process of language change, vowels tend to shift in "chains," leading to reorganizations of entire vowel systems over time. A long research tradition has described such patterns, but little is understood about what factors motivate such shifts. Drawing data from changes in progress in American English dialects, the broad hypothesis is tested that changes in vowel systems are related to prosodic organization and stress patterns. Changes in vowels under greater prosodic prominence correlate directly with, and likely underlie, historical patterns of shift. This study examines acoustic characteristics of vowels at initial edges of prosodic domains [Fougeron and Keating, J. Acoust. Soc. Am. 101, 3728-3740 (1997)]. The investigation is restricted to three distinct prosodic levels: utterance (sentence-initial), phonological phrase (strong branch of a foot), and syllable (weak branch of a foot). The predicted changes in the vowels /e/ and /ɛ/ in two American English dialects (from Ohio and Wisconsin) are examined along a set of acoustic parameters: duration, formant frequencies (including dynamic changes over time), and fundamental frequency (F0). In addition to traditional methodology, which elicits list-like intonation, a design is adapted to examine prosodic patterns in more typical sentence intonations. [Work partially supported by NIDCD R03 DC005560-01.]
Nishi, Kanae; Kewley-Port, Diane
2008-01-01
Purpose: Nishi and Kewley-Port (2007) trained Japanese listeners to perceive nine American English monophthongs and showed that a protocol using all nine vowels (fullset) produced better results than one using only the three more difficult vowels (subset). The present study extended the target population to Koreans and examined whether protocols combining the two stimulus sets would provide more effective training. Method: Three groups of five Korean listeners were trained on American English vowels for nine days using one of three protocols: fullset only, first three days on subset then six days on fullset, or first six days on fullset then three days on subset. Participants' performance was assessed by pre- and post-training tests, as well as by a mid-training test. Results: (1) Fullset training was also effective for Koreans; (2) no advantage was found for the two combined protocols over the fullset-only protocol; and (3) sustained "non-improvement" was observed for training using one of the combined protocols. Conclusions: In using subsets for training American English vowels, care should be taken not only in the selection of subset vowels but also in the order in which subsets are trained. PMID:18664694
Smith, David R R; Walters, Thomas C; Patterson, Roy D
2007-12-01
A recent study [Smith and Patterson, J. Acoust. Soc. Am. 118, 3177-3186 (2005)] demonstrated that both the glottal-pulse rate (GPR) and the vocal-tract length (VTL) of vowel sounds have a large effect on the perceived sex and age (or size) of a speaker. The vowels for all of the "different" speakers in that study were synthesized from recordings of the sustained vowels of one adult male speaker. This paper presents a follow-up study in which a range of vowels were synthesized from recordings of four different speakers--an adult man, an adult woman, a young boy, and a young girl--to determine whether the sex and age of the original speaker would have an effect upon listeners' judgments of whether a vowel was spoken by a man, woman, boy, or girl, after they were equated for GPR and VTL. The sustained vowels of the four speakers were scaled to produce the same combinations of GPR and VTL, which covered the entire range normally encountered in everyday life. The results show that listeners readily distinguish children from adults based on their sustained vowels but that they struggle to distinguish the sex of the speaker.
A comprehensive three-dimensional cortical map of vowel space.
Scharinger, Mathias; Idsardi, William J; Poe, Samantha
2011-12-01
Mammalian cortex is known to contain various kinds of spatial encoding schemes for sensory information including retinotopic, somatosensory, and tonotopic maps. Tonotopic maps are especially interesting for human speech sound processing because they encode linguistically salient acoustic properties. In this study, we mapped the entire vowel space of a language (Turkish) onto cortical locations by using the magnetic N1 (M100), an auditory-evoked component that peaks approximately 100 msec after auditory stimulus onset. We found that dipole locations could be structured into two distinct maps, one for vowels produced with the tongue positioned toward the front of the mouth (front vowels) and one for vowels produced in the back of the mouth (back vowels). Furthermore, we found spatial gradients in lateral-medial, anterior-posterior, and inferior-superior dimensions that encoded the phonetic, categorical distinctions between all the vowels of Turkish. Statistical model comparisons of the dipole locations suggest that the spatial encoding scheme is not entirely based on acoustic bottom-up information but crucially involves featural-phonetic top-down modulation. Thus, multiple areas of excitation along the unidimensional basilar membrane are mapped into higher dimensional representations in auditory cortex.
Mechanisms of Vowel Variation in African American English.
Holt, Yolanda Feimster
2018-02-15
This research explored mechanisms of vowel variation in African American English by comparing 2 geographically distant groups of African American and White American English speakers for participation in the African American Shift and the Southern Vowel Shift. Thirty-two male (African American: n = 16, White American controls: n = 16) lifelong residents of cities in eastern and western North Carolina produced heed, hid, heyd, head, had, hod, hawed, whod, hood, hoed, hide, howed, hoyd, and heard 3 times each in random order. Formant frequency, duration, and acoustic analyses were completed for the vowels /i, ɪ, e, ɛ, æ, ɑ, ɔ, u, ʊ, o, aɪ, aʊ, oɪ, ɝ/ produced in the listed words. African American English speakers show vowel variation. In the west, the African American English speakers are participating in the Southern Vowel Shift and hod fronting of the African American Shift. In the east, neither the African American English speakers nor their White peers are participating in the Southern Vowel Shift. The African American English speakers show limited participation in the African American Shift. The results provide evidence of regional and socio-ethnic variation in African American English in North Carolina.
An Acoustic Study of Vowels Produced by Alaryngeal Speakers in Taiwan.
Liao, Jia-Shiou
2016-11-01
This study investigated the acoustic properties of 6 Taiwan Southern Min vowels produced by 10 laryngeal speakers (LA), 10 speakers with a pneumatic artificial larynx (PA), and 8 esophageal speakers (ES). Each of the 6 monophthongs of Taiwan Southern Min (/i, e, a, ɔ, u, ə/) was represented by a Taiwan Southern Min character and appeared randomly on a list 3 times (6 Taiwan Southern Min characters × 3 repetitions = 18 tokens). Each Taiwan Southern Min character in this study has the same syllable structure, /V/, and all were read with tone 1 (high and level). Acoustic measurements of the 1st formant, 2nd formant, and 3rd formant were taken for each vowel. Then, vowel space areas (VSAs) enclosed by /i, a, u/ were calculated for each group of speakers. The Euclidean distance between vowels in the pairs /i, a/, /i, u/, and /a, u/ was also calculated and compared across the groups. PA and ES have higher 1st or 2nd formant values than LA for each vowel. The distance is significantly shorter between vowels in the corner vowel pairs /i, a/ and /i, u/. PA and ES have a significantly smaller VSA compared with LA. In accordance with previous studies, alaryngeal speakers have higher formant frequency values than LA because they have a shortened vocal tract as a result of their total laryngectomy. Furthermore, the resonance frequencies are inversely related to the length of the vocal tract (on the basis of the assumption of the source filter theory). PA and ES have a smaller VSA and shorter distances between corner vowels compared with LA, which may be related to speech intelligibility. This hypothesis needs further support from future study.
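Two of the derived measures above are simple functions of the corner-vowel formant means: the vowel space area is the area of the /i, a, u/ triangle in the F1 x F2 plane, and pairwise separation is the Euclidean distance between vowels in that plane. A minimal sketch with illustrative (not the study's) values:

```python
# Minimal sketch: vowel space area (shoelace formula) and Euclidean
# distances between corner vowels. Formant values (Hz) are illustrative.
import math

corners = {"i": (300, 2300), "a": (750, 1300), "u": (350, 800)}  # (F1, F2)

def euclidean(v1, v2):
    (f1a, f2a), (f1b, f2b) = corners[v1], corners[v2]
    return math.hypot(f1a - f1b, f2a - f2b)

def vsa(p, q, r):
    """Area of the triangle spanned by three (F1, F2) points."""
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

for a, b in [("i", "a"), ("i", "u"), ("a", "u")]:
    print(f"/{a}, {b}/ distance: {euclidean(a, b):.0f} Hz")
print(f"VSA: {vsa(*corners.values()):.0f} Hz^2")
```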
Japanese Listeners' Perceptions of Phonotactic Violations
ERIC Educational Resources Information Center
Fais, Laurel; Kajikawa, Sachiyo; Werker, Janet; Amano, Shigeaki
2005-01-01
The canonical form for Japanese words is (Consonant)Vowel(Consonant)Vowel... However, a regular process of high vowel devoicing between voiceless consonants and word-finally after voiceless consonants results in consonant clusters and word-final consonants, apparent violations of that phonotactic pattern. We investigated Japanese…
Maurer, D; Hess, M; Gross, M
1996-12-01
Theoretic investigations of the "source-filter" model have indicated a pronounced acoustic interaction of glottal source and vocal tract. Empirical investigations of formant pattern variations apart from changes in vowel identity have demonstrated a direct relationship between the fundamental frequency and the patterns. As a consequence of both findings, independence of phonation and articulation may be limited in the speech process. Within the present study, possible interdependence of phonation and phoneme was investigated: vocal fold vibrations and larynx position for vocalizations of different vowels in a healthy man and woman were examined by high-speed light-intensified digital imaging. We found 1) different movements of the vocal folds for vocalizations of different vowel identities within one speaker and at similar fundamental frequency, and 2) constant larynx position within vocalization of one vowel identity, but different positions for vocalizations of different vowel identities. A possible relationship between the vocal fold vibrations and the phoneme is discussed.
Introduction to the Special Issue on Advancing Methods for Analyzing Dialect Variation.
Clopper, Cynthia G
2017-07-01
Documenting and analyzing dialect variation is traditionally the domain of dialectology and sociolinguistics. However, modern approaches to acoustic analysis of dialect variation have their roots in Peterson and Barney's [(1952). J. Acoust. Soc. Am. 24, 175-184] foundational work on the acoustic analysis of vowels that was published in the Journal of the Acoustical Society of America (JASA) over 6 decades ago. Although Peterson and Barney (1952) were not primarily concerned with dialect variation, their methods laid the groundwork for the acoustic methods that are still used by scholars today to analyze vowel variation within and across languages. In more recent decades, a number of methodological advances in the study of vowel variation have been published in JASA, including work on acoustic vowel overlap and vowel normalization. The goal of this special issue was to honor that tradition by bringing together a set of papers describing the application of emerging acoustic, articulatory, and computational methods to the analysis of dialect variation in vowels and beyond.
Now you hear it, now you don't: vowel devoicing in Japanese infant-directed speech.
Fais, Laurel; Kajikawa, Sachiyo; Amano, Shigeaki; Werker, Janet F
2010-03-01
In this work, we examine a context in which a conflict arises between two roles that infant-directed speech (IDS) plays: making language structure salient and modeling the adult form of a language. Vowel devoicing in fluent adult Japanese creates violations of the canonical Japanese consonant-vowel word structure pattern by systematically devoicing particular vowels, yielding surface consonant clusters. We measured vowel devoicing rates in a corpus of infant- and adult-directed Japanese speech, for both read and spontaneous speech, and found that the mothers in our study preserve the fluent adult form of the language and mask underlying phonological structure by devoicing vowels in infant-directed speech at virtually the same rates as those for adult-directed speech. The results highlight the complex interrelationships among the modifications to adult speech that comprise infant-directed speech, and that form the input from which infants begin to build the eventual mature form of their native language.
Speaking fundamental frequency and vowel formant frequencies: effects on perception of gender.
Gelfer, Marylou Pausewang; Bennett, Quinn E
2013-09-01
The purpose of the present study was to investigate the contribution of vowel formant frequencies to gender identification in connected speech, the distinctiveness of vowel formants in males versus females, and how ambiguous speaking fundamental frequencies (SFFs) and vowel formants might affect perception of gender. A multivalent experimental design was used. Speaker subjects (eight tall males, eight short females, and seven males and seven females of "middle" height) were recorded saying two carrier phrases, to elicit the vowels /i/ and /ɑ/, and a sentence. The gender/height groups were selected to (presumably) maximize formant differences between some groups (tall vs. short) and minimize differences between others (middle height). Each subject's samples were digitally altered to distinct SFFs (116, 145, 155, 165, and 207 Hz) to represent SFFs typical of average males, average females, and an ambiguous range. Listeners judged the gender of each randomized altered speech sample. Results indicated that female speakers were perceived as female even with an SFF in the typical male range. For male speakers, gender perception was less accurate at SFFs of 165 Hz and higher. Although the ranges of vowel formants overlapped considerably between genders, significant differences in the formant frequencies of males and females were seen. Vowel formants appeared to be important to the perception of gender, especially for SFFs in the range of 145-165 Hz; however, formants may be a more salient cue in connected speech than in isolated vowels or syllables. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
Gender identification from high-pass filtered vowel segments: the use of high-frequency energy.
Donai, Jeremy J; Lass, Norman J
2015-10-01
The purpose of this study was to examine the use of high-frequency information for making gender identity judgments from high-pass filtered vowel segments produced by adult speakers. Specifically, the effect of removing lower-frequency spectral detail (i.e., F3 and below) from vowel segments via high-pass filtering was evaluated. Thirty listeners (ages 18-35) with normal hearing participated in the experiment. A within-subjects design was used to measure gender identification for six 250-ms vowel segments (/æ/, /ɪ /, /ɝ/, /ʌ/, /ɔ/, and /u/), produced by ten male and ten female speakers. The results of this experiment demonstrated that despite the removal of low-frequency spectral detail, the listeners were accurate in identifying speaker gender from the vowel segments, and did so with performance significantly above chance. The removal of low-frequency spectral detail reduced gender identification by approximately 16 % relative to unfiltered vowel segments. Classification results using linear discriminant function analyses followed the perceptual data, using spectral and temporal representations derived from the high-pass filtered segments. Cumulatively, these findings indicate that normal-hearing listeners are able to make accurate perceptual judgments regarding speaker gender from vowel segments with low-frequency spectral detail removed via high-pass filtering. Therefore, it is reasonable to suggest the presence of perceptual cues related to gender identity in the high-frequency region of naturally produced vowel signals. Implications of these findings and possible mechanisms for performing the gender identification task from high-pass filtered stimuli are discussed.
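As a rough illustration of this kind of pipeline, the sketch below high-pass filters a segment and classifies gender with a linear discriminant function; the 3.5-kHz cutoff, the band-energy features, and the stand-in signals are assumptions for illustration, not the study's exact methods:

```python
# Minimal sketch: high-pass filtering plus linear discriminant analysis.
# Cutoff, features, and stand-in "audio" are illustrative assumptions.
import numpy as np
from scipy.signal import butter, sosfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

fs = 44100
sos = butter(8, 3500, btype="highpass", fs=fs, output="sos")  # remove ~F3 and below

def features(segment):
    """Log energy in a few high-frequency bands of the filtered segment."""
    spectrum = np.abs(np.fft.rfft(sosfilt(sos, segment))) ** 2
    freqs = np.fft.rfftfreq(len(segment), 1 / fs)
    return [np.log(spectrum[(freqs >= lo) & (freqs < hi)].sum() + 1e-12)
            for lo, hi in [(4000, 6000), (6000, 9000), (9000, 13000)]]

rng = np.random.default_rng(0)
segments = [rng.standard_normal(int(0.25 * fs)) for _ in range(20)]  # stand-ins
labels = ["male"] * 10 + ["female"] * 10

lda = LinearDiscriminantAnalysis().fit([features(s) for s in segments], labels)
print(lda.predict([features(segments[0])]))
```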
Immediate effects of anticipatory coarticulation in spoken-word recognition
Salverda, Anne Pier; Kleinschmidt, Dave; Tanenhaus, Michael K.
2014-01-01
Two visual-world experiments examined listeners' use of pre-word-onset anticipatory coarticulation in spoken-word recognition. Experiment 1 established the shortest lag with which information in the speech signal influences eye-movement control, using stimuli such as “The … ladder is the target”. With a neutral token of the definite article preceding the target word, saccades to the referent were not more likely than saccades to an unrelated distractor until 200–240 ms after the onset of the target word. In Experiment 2, utterances contained definite articles which contained natural anticipatory coarticulation pertaining to the onset of the target word (“The ladder … is the target”). A simple Gaussian classifier was able to predict the initial sound of the upcoming target word from formant information from the first few pitch periods of the article's vowel. With these stimuli, effects of speech on eye-movement control began about 70 ms earlier than in Experiment 1, suggesting rapid use of anticipatory coarticulation. The results are interpreted as support for “data explanation” approaches to spoken-word recognition. Methodological implications for visual-world studies are also discussed. PMID:24511179
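The "simple Gaussian classifier" idea can be sketched with Gaussian naive Bayes over coarticulated formant features; the (F1, F2) values and labels below are fabricated for illustration and are not the study's data or exact model:

```python
# Minimal sketch: Gaussian classifier predicting the upcoming word's initial
# sound from formants of the preceding article's vowel. Data are fabricated.
import numpy as np
from sklearn.naive_bayes import GaussianNB

X_train = np.array([[520, 1650], [510, 1700], [500, 1850],   # tokens before /l/
                    [540, 1250], [530, 1200], [515, 1300]])  # tokens before /b/
y_train = ["l", "l", "l", "b", "b", "b"]

clf = GaussianNB().fit(X_train, y_train)   # one Gaussian per class and feature

# Posterior over the upcoming initial sound, given a new token of "the".
print(clf.classes_, clf.predict_proba([[525, 1750]]))
```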
Good Phonic Generalizations for Decoding
ERIC Educational Resources Information Center
Gates, Louis
2007-01-01
An exhaustive analysis of 88,641 individual letters and letter combinations within 16,928 words drawn from the Zeno, et al. word list unveiled remarkable phonic transparency. The individual letter and letter combinations sorted into just six general categories: three basic categories of vowels (single vowels, vowel digraphs, and final…
Perception of resyllabification in French.
Gaskell, M Gareth; Spinelli, Elsa; Meunier, Fanny
2002-07-01
In three experiments, we examined the effects of phonological resyllabification processes on the perception of French speech. Enchainment involves the resyllabification of a word-final consonant across a syllable boundary (e.g., in chaque avion, the /k/ crosses the syllable boundary to become syllable initial). Liaison involves a further process of realization of a latent consonant, alongside resyllabification (e.g., the /t/ in petit avion). If the syllable is a dominant unit of perception in French (Mehler, Dommergues, Frauenfelder, & Segui, 1981), these processes should cause problems for recognition of the following word. A cross-modal priming experiment showed no cost attached to either type of resyllabification in terms of reduced activation of the following word. Furthermore, word- and sequence-monitoring experiments again showed no cost and suggested that the recognition of vowel-initial words may be facilitated when they are preceded by a word that had undergone resyllabification through enchainment or liaison. We examine the sources of information that could underpin facilitation and propose a refinement of the syllable's role in the perception of French speech.
Visual speech information: a help or hindrance in perceptual processing of dysarthric speech.
Borrie, Stephanie A
2015-03-01
This study investigated the influence of visual speech information on perceptual processing of neurologically degraded speech. Fifty listeners identified spastic dysarthric speech under both audio (A) and audiovisual (AV) conditions. Condition comparisons revealed that the addition of visual speech information enhanced processing of the neurologically degraded input in terms of (a) acuity (percent phonemes correct) of vowels and consonants and (b) recognition (percent words correct) of predictive and nonpredictive phrases. Listeners exploited stress-based segmentation strategies more readily in AV conditions, suggesting that the perceptual benefit associated with adding visual speech information to the auditory signal (the AV advantage) has both segmental and suprasegmental origins. Results also revealed that the magnitude of the AV advantage can be predicted, to some degree, by the extent to which an individual utilizes syllabic stress cues to inform word recognition in AV conditions. Findings inform the development of a listener-specific model of speech perception that applies to processing of dysarthric speech in everyday communication contexts.
Enhancing Vowel Discrimination Using Constructed Spelling
ERIC Educational Resources Information Center
Stewart, Katherine; Hayashi, Yusuke; Saunders, Kathryn
2010-01-01
In a computerized task, an adult with intellectual disabilities learned to construct consonant-vowel-consonant words in the presence of corresponding spoken words. During the initial assessment, the participant demonstrated high accuracy on one word group (containing the vowel-consonant units "it" and "un") but low accuracy on the other group…
Interspeaker Variability in Hard Palate Morphology and Vowel Production
ERIC Educational Resources Information Center
Lammert, Adam; Proctor, Michael; Narayanan, Shrikanth
2013-01-01
Purpose: Differences in vocal tract morphology have the potential to explain interspeaker variability in speech production. The potential acoustic impact of hard palate shape was examined in simulation, in addition to the interplay among morphology, articulation, and acoustics in real vowel production data. Method: High-front vowel production from…
Acoustic Analysis on the Palatalized Vowels of Modern Mongolian
ERIC Educational Resources Information Center
Bulgantamir, Sangidkhorloo
2015-01-01
In Modern Mongolian, the palatalized vowels [a?, ??, ??] before palatalized consonants are considered phoneme allophones by most scholars. Nevertheless, these palatalized vowels have distinctive features that can be demonstrated with minimal pairs; the question remains open and has not been studied in depth. The purpose of this…
Effects of Long-Term Tracheostomy on Spectral Characteristics of Vowel Production.
ERIC Educational Resources Information Center
Kamen, Ruth Saletsky; Watson, Ben C.
1991-01-01
Eight preschool children who underwent tracheotomy during the prelingual period were compared to matched controls on a variety of speech measures. Children with tracheotomies showed reduced acoustic vowel space, suggesting they were limited in their ability to produce extreme vocal tract configurations for vowels postdecannulation. Oral motor…
Mechanisms of Vowel Variation in African American English
ERIC Educational Resources Information Center
Holt, Yolanda Feimster
2018-01-01
Purpose: This research explored mechanisms of vowel variation in African American English by comparing 2 geographically distant groups of African American and White American English speakers for participation in the African American Shift and the Southern Vowel Shift. Method: Thirty-two male (African American: n = 16, White American controls: n =…
Children Use Vowels to Help Them Spell Consonants
ERIC Educational Resources Information Center
Hayes, Heather; Treiman, Rebecca; Kessler, Brett
2006-01-01
English spelling is highly inconsistent in terms of simple sound-to-spelling correspondences but is more consistent when context is taken into account. For example, the choice between "ch" and "tch" is determined by the preceding vowel ("coach," "roach" vs. "catch," "hatch"). We investigated children's sensitivity to vowel context when spelling…
Influences of Tone on Vowel Articulation in Mandarin Chinese
ERIC Educational Resources Information Center
Shaw, Jason A.; Chen, Wei-rong; Proctor, Michael I.; Derrick, Donald
2016-01-01
Purpose: Models of speech production often abstract away from shared physiology in pitch control and lingual articulation, positing independent control of tone and vowel units. We assess the validity of this assumption in Mandarin Chinese by evaluating the stability of lingual articulation for vowels across variation in tone. Method:…
NON-GRAMMATICAL APOPHONY IN ENGLISH.
ERIC Educational Resources Information Center
WESCOTT, ROGER W.
An apophone may be defined generally as a polysyllabic vowel sequence such that each contained vowel is lower or more retracted than the vowel which precedes it; "sing, sang, sung" and "clink, clank, clunk" are examples in English. For nearly every case of grammatical apophony in English there is a non-grammatical (yet…
Perceptual Adaptation of Voice Gender Discrimination with Spectrally Shifted Vowels
ERIC Educational Resources Information Center
Li, Tianhao; Fu, Qian-Jie
2011-01-01
Purpose: To determine whether perceptual adaptation improves voice gender discrimination of spectrally shifted vowels and, if so, which acoustic cues contribute to the improvement. Method: Voice gender discrimination was measured for 10 normal-hearing subjects, during 5 days of adaptation to spectrally shifted vowels, produced by processing the…
Vowel Representations in the Invented Spellings of Spanish-English Bilingual Kindergartners
ERIC Educational Resources Information Center
Raynolds, Laura B.; Uhry, Joanna K.; Brunner, Jessica
2013-01-01
The study compared the invented spelling of vowels in kindergarten native Spanish speaking children with that of English monolinguals. It examined whether, after receiving phonics instruction for short vowels, the spelling of native Spanish-speaking kindergartners would contain phonological errors that were influenced by their first language.…
Bite Block Vowel Production in Apraxia of Speech
ERIC Educational Resources Information Center
Jacks, Adam
2008-01-01
Purpose: This study explored vowel production and adaptation to articulatory constraints in adults with acquired apraxia of speech (AOS) plus aphasia. Method: Five adults with acquired AOS plus aphasia and 5 healthy control participants produced the vowels /ɪ/, /ɛ/, and /æ/ in four word-length conditions in unconstrained and bite block…
Cross-modal associations in synaesthesia: Vowel colours in the ear of the beholder.
Moos, Anja; Smith, Rachel; Miller, Sam R; Simmons, David R
2014-01-01
Human speech conveys many forms of information, but for some exceptional individuals (synaesthetes), listening to speech sounds can automatically induce visual percepts such as colours. In this experiment, grapheme-colour synaesthetes and controls were asked to assign colours, or shades of grey, to different vowel sounds. We then investigated whether the acoustic content of these vowel sounds influenced participants' colour and grey-shade choices. We found that both colour and grey-shade associations varied systematically with vowel changes. The colour effect was significant for both participant groups, but significantly stronger and more consistent for synaesthetes. Because not all vowel sounds that we used are "translatable" into graphemes, we conclude that acoustic-phonetic influences co-exist with established graphemic influences in the cross-modal correspondences of both synaesthetes and non-synaesthetes.
Early sound symbolism for vowel sounds.
Spector, Ferrinne; Maurer, Daphne
2013-01-01
Children and adults consistently match some words (e.g., kiki) to jagged shapes and other words (e.g., bouba) to rounded shapes, providing evidence for non-arbitrary sound-shape mapping. In this study, we investigated the influence of vowels on sound-shape matching in toddlers, using four contrasting pairs of nonsense words differing in vowel sound (/i/ as in feet vs. /o/ as in boat) and four rounded-jagged shape pairs. Crucially, we used reduplicated syllables (e.g., kiki vs. koko) rather than confounding vowel sound with consonant context and syllable variability (e.g., kiki vs. bouba). Toddlers consistently matched words with /o/ to rounded shapes and words with /i/ to jagged shapes (p < 0.01). The results suggest that there may be naturally biased correspondences between vowel sound and shape.
Vowel change across three age groups of speakers in three regional varieties of American English
Jacewicz, Ewa; Fox, Robert A.; Salmons, Joseph
2011-01-01
This acoustic study examines sound (vowel) change in apparent time across three successive generations of 123 adult female speakers ranging in age from 20 to 65 years old, representing three regional varieties of American English, typical of western North Carolina, central Ohio and southeastern Wisconsin. A set of acoustic measures characterized the dynamic nature of formant trajectories, the amount of spectral change over the course of vowel duration and the position of the spectral centroid. The study found a set of systematic changes to /ɪ, ɛ, æ/ including positional changes in the acoustic space (mostly lowering of the vowels) and significant variation in formant dynamics (increased monophthongization). This common sound change is evident in both emphatic (articulated clearly) and nonemphatic (casual) productions and occurs regardless of dialect-specific vowel dispersions in the vowel space. The cross-generational and cross-dialectal patterns of variation found here support an earlier report by Jacewicz, Fox, and Salmons (2011) which found this recent development in these three dialect regions in isolated citation-form words. While confirming the new North American Shift in different styles of production, the study underscores the importance of addressing the stress-related variation in vowel production in a careful and valid assessment of sound change. PMID:22125350
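One standard metric for the amount of spectral change over the course of vowel duration is formant trajectory length, the summed Euclidean distance between successive (F1, F2) measurement points; whether this exact metric was used here is not stated in the abstract. A minimal sketch with illustrative values:

```python
# Minimal sketch: formant trajectory length over a five-point (F1, F2)
# trajectory. Values (Hz) are illustrative only.
import math

trajectory = [(430, 2250), (460, 2180), (500, 2090), (540, 2010), (560, 1950)]

length = sum(math.hypot(f1b - f1a, f2b - f2a)
             for (f1a, f2a), (f1b, f2b) in zip(trajectory, trajectory[1:]))
print(f"trajectory length: {length:.0f} Hz")
```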
Locus equations and coarticulation in three Australian languages.
Graetzer, Simone; Fletcher, Janet; Hajek, John
2015-02-01
Locus equations were applied to F2 data for bilabial, alveolar, retroflex, palatal, and velar plosives in three Australian languages. In addition, F2 variance at the vowel-consonant boundary, and, by extension, consonantal coarticulatory sensitivity, was measured. The locus equation slopes revealed that there were place-dependent differences in the magnitude of vowel-to-consonant coarticulation. As in previous studies, the non-coronal (bilabial and velar) consonants tended to be associated with the highest slopes, palatal consonants tended to be associated with the lowest slopes, and alveolar and retroflex slopes tended to be low to intermediate. Similarly, F2 variance measurements indicated that non-coronals displayed greater coarticulatory sensitivity to adjacent vowels than did coronals. Thus, both the magnitude of vowel-to-consonant coarticulation and the magnitude of consonantal coarticulatory sensitivity were seen to vary inversely with the magnitude of consonantal articulatory constraint. The findings indicated that, unlike results reported previously for European languages such as English, anticipatory vowel-to-consonant coarticulation tends to exceed carryover coarticulation in these Australian languages. Accordingly, on the F2 variance measure, consonants tended to be more sensitive to the coarticulatory effects of the following vowel. Prosodic prominence of vowels was a less significant factor in general, although certain language-specific patterns were observed.
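A locus equation is a linear regression of F2 at the consonant-vowel boundary (F2 onset) on F2 at the vowel midpoint, fitted per consonant place; slopes near 1 indicate strong vowel-to-consonant coarticulation, slopes near 0 strong articulatory constraint. A minimal sketch with illustrative values:

```python
# Minimal sketch: fitting a locus equation F2_onset = k * F2_vowel + c.
# Formant values (Hz) are illustrative, not the study's data.
import numpy as np

f2_vowel = np.array([900, 1200, 1500, 1900, 2300])   # F2 at vowel midpoint
f2_onset = np.array([1150, 1330, 1540, 1800, 2060])  # F2 at the CV boundary

k, c = np.polyfit(f2_vowel, f2_onset, 1)
print(f"F2_onset = {k:.2f} * F2_vowel + {c:.0f}  (slope indexes coarticulation)")
```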
Baudonck, Nele; Van Lierde, K; Dhooge, I; Corthals, P
2011-01-01
The purpose of this study was to compare vowel productions by deaf cochlear implant (CI) children, hearing-impaired hearing aid (HA) children and normal-hearing (NH) children. 73 children [mean age: 9;14 years (years;months)] participated: 40 deaf CI children, 34 moderately to profoundly hearing-impaired HA children and 42 NH children. For the 3 corner vowels [a], [i] and [u], F(1), F(2) and the intrasubject SD were measured using the Praat software. Spectral separation between these vowel formants and vowel space were calculated. The significant effects in the CI group all pertain to a higher intrasubject variability in formant values, whereas the significant effects in the HA group all pertain to lower formant values. Both hearing-impaired subgroups showed a tendency toward greater intervowel distances and vowel space. Several subtle deviations in the vowel production of deaf CI children and hearing-impaired HA children could be established, using a well-defined acoustic analysis. CI children as well as HA children in this study tended to overarticulate, which hypothetically can be explained by a lack of auditory feedback and an attempt to compensate it by proprioceptive feedback during articulatory maneuvers. Copyright © 2010 S. Karger AG, Basel.
Kondaurova, Maria V; Francis, Alexander L
2008-12-01
Two studies explored the role of native language use of an acoustic cue, vowel duration, in both native and non-native contexts in order to test the hypothesis that non-native listeners' reliance on vowel duration instead of vowel quality to distinguish the English tense/lax vowel contrast could be explained by the role of duration as a cue in native phonological contrasts. In the first experiment, native Russian, Spanish, and American English listeners identified stimuli from a beat/bit continuum varying in nine perceptually equal spectral and duration steps. English listeners relied predominantly on spectrum, but showed some reliance on duration. Russian and Spanish speakers relied entirely on duration. In the second experiment, three tests examined listeners' use of vowel duration in native contrasts. Duration was equally important for the perception of lexical stress for all three groups. However, English listeners relied more on duration as a cue to postvocalic consonant voicing than did native Spanish or Russian listeners, and Spanish listeners relied on duration more than did Russian listeners. Results suggest that, although allophonic experience may contribute to cross-language perceptual patterns, other factors such as the application of statistical learning mechanisms and the influence of language-independent psychoacoustic proclivities cannot be ruled out.
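Relative reliance on the two cues in such identification data is often summarized by regressing responses on the spectral and duration step indices; the sketch below uses simulated responses (not the study's data or analysis) merely to show the form of such an estimate:

```python
# Minimal sketch: cue weights from logistic regression of simulated
# "beat" responses on spectral and duration step indices (1-9 each).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
spectral = rng.integers(1, 10, 500)
duration = rng.integers(1, 10, 500)
logit = 1.2 * (spectral - 5) + 0.3 * (duration - 5)    # English-like listener
response = rng.random(500) < 1 / (1 + np.exp(-logit))  # True = "beat"

model = LogisticRegression().fit(np.column_stack([spectral, duration]), response)
w_spec, w_dur = model.coef_[0]
print(f"cue weights: spectral {w_spec:.2f}, duration {w_dur:.2f}")
```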
The effect of L1 orthography on non-native vowel perception.
Escudero, Paola; Wanrooij, Karin
2010-01-01
Previous research has shown that orthography influences the learning and processing of spoken non-native words. In this paper, we examine the effect of L1 orthography on non-native sound perception. In Experiment 1, 204 Spanish learners of Dutch and a control group of 20 native speakers of Dutch were asked to classify Dutch vowel tokens by choosing from auditorily presented options, in one task, and from the orthographic representations of Dutch vowels, in a second task. The results show that vowel categorization varied across tasks: the most difficult vowels in the purely auditory task were the easiest in the orthographic task and, conversely, vowels with a relatively high success rate in the purely auditory task were poorly classified in the orthographic task. The results of Experiment 2 with 22 monolingual Peruvian Spanish listeners replicated the main results of Experiment 1 and confirmed the existence of orthographic effects. Together, the two experiments show that when listening to auditory stimuli only, native speakers of Spanish have great difficulty classifying certain Dutch vowels, regardless of the amount of experience they may have with the Dutch language. Importantly, the pairing of auditory stimuli with orthographic labels can help or hinder Spanish listeners' sound categorization, depending on the specific sound contrast.
Acoustic characteristics of different target vowels during the laryngeal telescopy.
Shu, Min-Tsan; Lee, Kuo-Shen; Chang, Chin-Wen; Hsieh, Li-Chun; Yang, Cheng-Chien
2014-10-01
The aim of this study was to investigate the acoustic characteristics of target vowels phonated by persons with normal voice while undergoing laryngeal telescopy. The acoustic characteristics were compared to quantify any differences and to gauge their impact on phonation function. Thirty-four male subjects aged 20-39 years with normal voice were included in this study. The target vowels were /i/ and /ɛ/. Voice samples were recorded under natural phonation and during laryngeal telescopy. The acoustic analysis included the parameters of fundamental frequency, jitter, shimmer and noise-to-harmonic ratio. The sound of the target vowel /ɛ/ was perceived as identical in more than 90% of the subjects by the examiner and the speech-language pathologist during the telescopy. Both /i/ and /ɛ/ showed significant differences when compared with the results under natural phonation. There was no significant difference between /i/ and /ɛ/ during the telescopy. The present study showed that changing the target vowel during laryngeal telescopy makes no significant difference in the acoustic characteristics. The results suggest that the phonation mechanism was not affected significantly by different vowels during the telescopy. This study may suggest that, under the principle of comfortable phonation, introduction of the target vowels /i/ and /ɛ/ is practical. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Cantin, Léo; Prud'Homme, Michel; Langlois, Mélanie
2015-01-01
Purpose. To investigate the impact of deep brain stimulation of the subthalamic nucleus (STN DBS) and levodopa intake on vowel articulation in dysarthric speakers with Parkinson's disease (PD). Methods. Vowel articulation was assessed in seven Quebec French speakers diagnosed with idiopathic PD who underwent STN DBS. Assessments were conducted on- and off-medication, first prior to surgery and then 1 year later. All recordings were made on-stimulation. Vowel articulation was measured using acoustic vowel space and formant centralization ratio. Results. Compared to the period before surgery, vowel articulation was reduced after surgery when patients were off-medication, while it was better on-medication. The impact of levodopa intake on vowel articulation changed with STN DBS: before surgery, levodopa impaired articulation, while it no longer had a negative effect after surgery. Conclusions. These results indicate that while STN DBS could lead to a direct deterioration in articulation, it may indirectly improve it by reducing the levodopa dose required to manage motor symptoms. These findings suggest that, with respect to speech production, STN DBS and levodopa intake cannot be investigated separately because the two are intrinsically linked. Along with motor symptoms, speech production should be considered when optimizing therapeutic management of patients with PD. PMID:26558134
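The formant centralization ratio is conventionally computed from the corner-vowel formants (after Sapir et al., 2010); assuming that standard definition, where higher values indicate more centralized vowels:

```latex
% Formant centralization ratio (FCR), assuming the standard definition.
\mathrm{FCR} = \frac{F2_{/u/} + F2_{/a/} + F1_{/i/} + F1_{/u/}}{F2_{/i/} + F1_{/a/}}
```

The acoustic vowel space, in turn, is typically taken as the area of the polygon spanned by the corner vowels in the F1 x F2 plane.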
The contribution of waveform interactions to the perception of concurrent vowels.
Assmann, P F; Summerfield, Q
1994-01-01
Models of the auditory and phonetic analysis of speech must account for the ability of listeners to extract information from speech when competing voices are present. When two synthetic vowels are presented simultaneously and monaurally, listeners can exploit cues provided by a difference in fundamental frequency (F0) between the vowels to help determine their phonemic identities. Three experiments examined the effects of stimulus duration on the perception of such "double vowels." Experiment 1 confirmed earlier findings that a difference in F0 provides a smaller advantage when the duration of the stimulus is brief (50 ms rather than 200 ms). With brief stimuli, there may be insufficient time for attentional mechanisms to switch from the "dominant" member of the pair to the "nondominant" vowel. Alternatively, brief segments may restrict the availability of cues that are distributed over the time course of a longer segment of a double vowel. In experiment 1, listeners did not perform better when the same 50-ms segment was presented four times in succession (with 100-ms silent intervals) rather than only once, suggesting that limits on attention switching do not underlie the duration effect. However, performance improved in some conditions when four successive 50-ms segments were extracted from the 200-ms double vowels and presented in sequence, again with 100-ms silent intervals. Similar improvements were observed in experiment 2 between performance with the first 50-ms segment and one or more of the other three segments when the segments were presented individually. Experiment 3 demonstrated that part of the improvement observed in experiments 1 and 2 could be attributed to waveform interactions that either reinforce or attenuate harmonics that lie near vowel formants. Such interactions were beneficial only when the difference in F0 was small (0.25-1 semitone). These results are compatible with the idea that listeners benefit from small differences in F0 by performing a sequence of analyses of different time segments of a double vowel to determine where the formants of the constituent vowels are best defined.
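The waveform-interaction account can be illustrated by synthesis: when two harmonic complexes differ in F0 by a fraction of a semitone, pairs of near-coincident harmonics drift in and out of phase, so the energy near a formant is alternately reinforced and attenuated across successive time segments. A minimal, purely illustrative sketch:

```python
# Minimal sketch: beating between the harmonics of two complexes whose
# F0s differ by ~0.25 semitone; RMS varies across 50-ms segments.
import numpy as np

fs = 16000
t = np.arange(int(0.2 * fs)) / fs            # a 200-ms "double vowel"

def harmonic_complex(f0, n_harmonics=20):
    return sum(np.sin(2 * np.pi * f0 * k * t) for k in range(1, n_harmonics + 1))

mix = harmonic_complex(100.0) + harmonic_complex(101.5)

for i in range(4):                           # four successive 50-ms segments
    seg = mix[i * fs // 20:(i + 1) * fs // 20]
    print(f"segment {i + 1}: RMS = {np.sqrt(np.mean(seg ** 2)):.2f}")
```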
Auditory orientation in crickets: Pattern recognition controls reactive steering
NASA Astrophysics Data System (ADS)
Poulet, James F. A.; Hedwig, Berthold
2005-10-01
Many groups of insects are specialists in exploiting sensory cues to locate food resources or conspecifics. To achieve orientation, bees and ants analyze the polarization pattern of the sky, male moths orient along the females' odor plume, and cicadas, grasshoppers, and crickets use acoustic signals to locate singing conspecifics. In comparison with olfactory and visual orientation, where learning is involved, auditory processing underlying orientation in insects appears to be more hardwired and genetically determined. In each of these examples, however, orientation requires a recognition process identifying the crucial sensory pattern to interact with a localization process directing the animal's locomotor activity. Here, we characterize this interaction. Using a sensitive trackball system, we show that, during cricket auditory behavior, the recognition process that is tuned toward the species-specific song pattern controls the amplitude of auditory evoked steering responses. Females perform small reactive steering movements toward any sound patterns. Hearing the male's calling song increases the gain of auditory steering within 2-5 s, and the animals even steer toward nonattractive sound patterns inserted into the species-specific pattern. This gain control mechanism in the auditory-to-motor pathway allows crickets to pursue species-specific sound patterns temporarily corrupted by environmental factors and may reflect the organization of recognition and localization networks in insects.
Ahmad, Riaz; Naz, Saeeda; Afzal, Muhammad Zeshan; Amin, Sayed Hassan; Breuel, Thomas
2015-01-01
The presence of a large number of unique shapes called ligatures in cursive languages, along with variations due to scaling, orientation and location, provides one of the most challenging pattern recognition problems. Recognition of the large number of ligatures is often a complicated task in oriental languages such as Pashto, Urdu, Persian and Arabic. Research on cursive script recognition often ignores the fact that scaling, orientation, location and font variations are common in printed cursive text; therefore, these variations are not included in image databases or in experimental evaluations. This research uncovers challenges faced by Arabic cursive script recognition in a holistic framework by considering Pashto as a test case, because the Pashto language has a larger alphabet set than Arabic, Persian and Urdu. A database containing 8000 images of 1000 unique ligatures with scaling, orientation and location variations is introduced. In this article, a feature space based on the scale invariant feature transform (SIFT), along with a segmentation framework, is proposed for overcoming the above-mentioned challenges. The experimental results show a significantly improved performance of the proposed scheme over traditional feature extraction techniques such as principal component analysis (PCA). PMID:26368566
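A minimal sketch of the SIFT step, extracting scale-, rotation-, and location-robust descriptors from a ligature image with OpenCV (the file name is a placeholder; segmentation and classification are omitted):

```python
# Minimal sketch: SIFT keypoints and 128-dimensional descriptors for a
# ligature image. "ligature.png" is a placeholder path.
import cv2

image = cv2.imread("ligature.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)

# Keypoint location, scale, and orientation make the descriptors robust to
# the scaling, orientation, and location variations described above.
print(f"{len(keypoints)} keypoints, descriptors shape: {descriptors.shape}")
```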
Discrimination of Phonemic Vowel Length by Japanese Infants
ERIC Educational Resources Information Center
Sato, Yutaka; Sogabe, Yuko; Mazuka, Reiko
2010-01-01
Japanese has a vowel duration contrast as one component of its language-specific phonemic repertory to distinguish word meanings. It is not clear, however, how a sensitivity to vowel duration can develop in a linguistic context. In the present study, using the visual habituation-dishabituation method, the authors evaluated infants' abilities to…
Cross-Linguistic Differences in the Immediate Serial Recall of Consonants versus Vowels
ERIC Educational Resources Information Center
Kissling, Elizabeth M.
2012-01-01
The current study investigated native English and native Arabic speakers' phonological short-term memory for sequences of consonants and vowels. Phonological short-term memory was assessed in immediate serial recall tasks conducted in Arabic and English for both groups. Participants (n = 39) heard series of six consonant-vowel syllables and wrote…
ERIC Educational Resources Information Center
Knobel, Mark; Caramazza, Alfonso
2007-01-01
Caramazza et al. [Caramazza, A., Chialant, D., Capasso, R., & Miceli, G. (2000). Separable processing of consonants and vowels. "Nature," 403(6768), 428-430.] report two patients who exhibit a double dissociation between consonants and vowels in speech production. The patterning of this double dissociation cannot be explained by appealing to…
The Effects of Surgical Rapid Maxillary Expansion (SRME) on Vowel Formants
ERIC Educational Resources Information Center
Sari, Emel; Kilic, Mehmet Akif
2009-01-01
The objective of this study was to investigate the effect of surgical rapid maxillary expansion (SRME) on vowel production. The subjects included 12 patients, whose speech were considered perceptually normal, that had undergone surgical RME for expansion of a narrow maxilla. They uttered the following Turkish vowels, ([a], [[epsilon
ERIC Educational Resources Information Center
Emerich, Giang Huong
2012-01-01
In this dissertation, I provide a new analysis of the Vietnamese vowel system as a system with fourteen monophthongs and nineteen diphthongs based on phonetic and phonological data. I propose that these Vietnamese contour vowels, /ie/, /ɯɤ/ and /uo/, should be grouped with these eleven monophthongs /i e ɛ a ɐ ? ? ɯ…
Children's Perception of Conversational and Clear American-English Vowels in Noise
ERIC Educational Resources Information Center
Leone, Dorothy; Levy, Erika S.
2015-01-01
Purpose: Much of a child's day is spent listening to speech in the presence of background noise. Although accurate vowel perception is important for listeners' accurate speech perception and comprehension, little is known about children's vowel perception in noise. "Clear speech" is a speech style frequently used by talkers in the…
Criteria for the Segmentation of Vowels on Duplex Oscillograms.
ERIC Educational Resources Information Center
Naeser, Margaret A.
This paper develops criteria for the segmentation of vowels on duplex oscillograms. Previous vowel duration studies have primarily used sound spectrograms. The use of duplex oscillograms, rather than sound spectrograms, permits faster production (real time) at less expense (adding machine paper may be used). The speech signal can be more spread…
ERIC Educational Resources Information Center
Roy, Nelson; Nissen, Shawn L.; Dromey, Christopher; Sapir, Shimon
2009-01-01
In a preliminary study, we documented significant changes in formant transitions associated with successful manual circumlaryngeal treatment (MCT) of muscle tension dysphonia (MTD), suggesting improvement in speech articulation. The present study explores further the effects of MTD on vowel articulation by means of additional vowel acoustic…
ERIC Educational Resources Information Center
Schiff, Rachel
2012-01-01
The present study explored the speed, accuracy, and reading comprehension of vowelized versus unvowelized scripts among 126 native Hebrew speaking children in second, fourth, and sixth grades. Findings indicated that second graders read and comprehended vowelized scripts significantly more accurately and more quickly than unvowelized scripts,…
ERIC Educational Resources Information Center
Becker-Kristal, Roy
2010-01-01
This dissertation examines the relationship between the structural, phonemic properties of vowel inventories and their acoustic phonetic realization, with particular focus on the adequacy of Dispersion Theory, which maintains that inventories are structured so as to maximize perceptual contrast between their component vowels. In order to assess…
The Sound of Mute Vowels in Auditory Word-Stem Completion
ERIC Educational Resources Information Center
Beland, Renee; Prunet, Jean-Francois; Peretz, Isabelle
2009-01-01
Some studies have argued that orthography can influence speakers when they perform oral language tasks. Words containing a mute vowel provide well-suited stimuli to investigate this phenomenon because mute vowels, such as the second "e" in "vegetable", are present orthographically but absent phonetically. Using an auditory word-stem completion…
Cross-modal associations in synaesthesia: Vowel colours in the ear of the beholder
Moos, Anja; Smith, Rachel; Miller, Sam R.; Simmons, David R.
2014-01-01
Human speech conveys many forms of information, but for some exceptional individuals (synaesthetes), listening to speech sounds can automatically induce visual percepts such as colours. In this experiment, grapheme–colour synaesthetes and controls were asked to assign colours, or shades of grey, to different vowel sounds. We then investigated whether the acoustic content of these vowel sounds influenced participants' colour and grey-shade choices. We found that both colour and grey-shade associations varied systematically with vowel changes. The colour effect was significant for both participant groups, but significantly stronger and more consistent for synaesthetes. Because not all vowel sounds that we used are “translatable” into graphemes, we conclude that acoustic–phonetic influences co-exist with established graphemic influences in the cross-modal correspondences of both synaesthetes and non-synaesthetes. PMID:25469218
Yoon, Ji Hye; Jeong, Yong
2018-01-01
Background and Purpose: Korean-speaking patients with a brain injury may show agraphia that differs from that of English-speaking patients due to the unique features of Hangul syllabic writing. Each grapheme in Hangul must be arranged from left to right and/or top to bottom within a square space to form a syllable, which requires greater visuospatial abilities than writing the letters of an alphabetic writing system. Among the Hangul grapheme positions within a syllable, the position of a vowel is important because it determines the writing direction and the whole configuration in Korean syllabic writing. Given the visuospatial characteristics of the Hangul vowel, individuals with early-onset Alzheimer's disease (EOAD) may experience a discrepancy between the difficulty of writing Hangul vowels and that of consonants, owing to the prominent visuospatial dysfunction caused by parietal lesions. Methods: Eighteen patients with EOAD and 18 age- and education-matched healthy adults participated in this study. The participants were requested to listen to and write 30 monosyllabic characters that consisted of an initial consonant, medial vowel, and final consonant with a one-to-one phoneme-to-grapheme correspondence. We measured the writing time for each grapheme, the pause time between writing the initial consonant and the medial vowel (P1), and the pause time between writing the medial vowel and the final consonant (P2). Results: All grapheme writing and pause times were significantly longer in the EOAD group than in the controls. P1 was also significantly longer than P2 in the EOAD group. Conclusions: Patients with EOAD might require a higher judgment ability and a longer processing time for determining the visuospatial grapheme position before writing medial vowels. This finding suggests that a longer pause time before writing medial vowels is an early marker of visuospatial dysfunction in patients with EOAD. PMID:29504296
Jafari, Narges; Drinnan, Michael; Mohamadi, Reyhane; Yadegari, Fariba; Nourbakhsh, Mandana; Torabinezhad, Farhad
2016-05-01
Normal-hearing (NH) acuity and auditory feedback control are crucial for human voice production and articulation. The lack of auditory feedback in individuals with profound hearing impairment changes their vowel production. The purpose of this study was to compare Persian vowel production in deaf children with cochlear implants (CIs) with that in NH children. The participants were 20 children (12 girls and 8 boys) aged 5 years 1 month to 9 years. All patients had congenital hearing loss and received a multichannel CI at an average age of 3 years; they had at least 6 months of experience with their current device. The control group consisted of 20 NH children (12 girls and 8 boys) aged 5 to 9 years. The two groups were matched by age. Participants were native Persian speakers who were asked to produce the vowels /i/, /e/, /ӕ/, /u/, /o/, and /a/. The average first formant frequency (F1) and second formant frequency (F2) of the six vowels were measured using Praat software (Version 5.1.44, Boersma & Weenink, 2012). Independent samples t tests were conducted to assess the differences in F1 and F2 values and in the area of the vowel space between the two groups. Mean F1 values were higher in CI children; the differences in F1 for the vowels /i/ and /a/, and in F2 for the vowels /a/ and /o/, were significant (P < 0.05). The changes in F1 and F2 showed a centralized vowel space for CI children. F1 is increased in CI children, probably because they tend to overarticulate. We hypothesize that this is due to the lack of auditory feedback: hearing-impaired children attempt to compensate via proprioceptive feedback during articulation. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
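Formant measurements like those reported above are straightforward to reproduce. Below is a minimal Python sketch assuming the parselmouth bindings to Praat's Burg formant tracker; the file name and settings are illustrative, not those of the study.

import parselmouth

# Load a recorded vowel token (hypothetical file) and run Praat's Burg formant tracker
snd = parselmouth.Sound("vowel_a.wav")
formants = snd.to_formant_burg(max_number_of_formants=5, maximum_formant=5500.0)

# Sample F1 and F2 at the temporal midpoint of the token, a common measurement point
t_mid = snd.duration / 2
f1 = formants.get_value_at_time(1, t_mid)
f2 = formants.get_value_at_time(2, t_mid)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz")

A maximum-formant ceiling around 5500 Hz is a typical assumption for children's and women's voices; per-vowel F1/F2 means measured this way are what feed a vowel space area comparison.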
Analysis of Acoustic Features in Speakers with Cognitive Disorders and Speech Impairments
NASA Astrophysics Data System (ADS)
Saz, Oscar; Simón, Javier; Rodríguez, W. Ricardo; Lleida, Eduardo; Vaquero, Carlos
2009-12-01
This work presents the results of an analysis of the acoustic features (formants and the three suprasegmental features: tone, intensity, and duration) of vowel production in a group of 14 young speakers with different kinds of speech impairment due to physical and cognitive disorders. A corpus of unimpaired children's speech is used to determine reference values for these features in speakers without any speech impairment within the same domain as the impaired speakers, namely 57 isolated words. The signal processing to extract formant and pitch values is based on a Linear Prediction Coefficient (LPC) analysis of the segments labeled as vowels by a Hidden Markov Model (HMM) based Viterbi forced alignment. Intensity and duration are likewise derived from the outcome of the automated segmentation. The main conclusion of the work is that intelligibility of vowel production is lowered in impaired speakers even when the vowel is perceived as correct by human labelers. The decrease in intelligibility is due to a 30% increase in confusability in the formant map, a 50% reduction in the discriminative power of energy between stressed and unstressed vowels, and a 50% increase in the standard deviation of vowel length. On the other hand, impaired speakers keep good control of tone in the production of stressed and unstressed vowels.
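The LPC-based formant estimation described here follows a standard recipe: fit LPC coefficients to a windowed vowel segment and read formant candidates off the angles of the polynomial roots. A generic sketch under assumed settings (16 kHz audio, a rule-of-thumb model order, a hypothetical segment file), not the authors' exact pipeline:

import numpy as np
import librosa

sr = 16000
# Hypothetical vowel segment extracted via the HMM-based forced alignment
y, _ = librosa.load("vowel_segment.wav", sr=sr)
a = librosa.lpc(y * np.hamming(len(y)), order=int(2 + sr / 1000))

# Formant candidates are the angles of the LPC roots in the upper half-plane
roots = [r for r in np.roots(a) if np.imag(r) > 0]
freqs = sorted(np.angle(roots) * sr / (2 * np.pi))
print("Formant candidates (Hz):", [round(f) for f in freqs[:3]])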
Oh, Grace E.; Guion-Anderson, Susan; Aoyama, Katsura; Flege, James E.; Akahane-Yamada, Reiko; Yamada, Tsuneo
2011-01-01
The effect of age of acquisition on first- and second-language vowel production was investigated. Eight English vowels were produced by Native Japanese (NJ) adults and children as well as by age-matched Native English (NE) adults and children. Productions were recorded shortly after the NJ participants’ arrival in the USA and then one year later. In agreement with previous investigations [Aoyama, et al., J. Phon. 32, 233–250 (2004)], children were able to learn more, leading to higher accuracy than adults in a year’s time. Based on the spectral quality and duration comparisons, NJ adults had more accurate production at Time 1, but showed no improvement over time. The NJ children’s productions, however, showed significant differences from the NE children’s for English “new” vowels /ɪ/, /ε/, /ɑ/, /ʌ/ and /ʊ/ at Time 1, but produced all eight vowels in a native-like manner at Time 2. An examination of NJ speakers’ productions of Japanese /i/, /a/, /u/ over time revealed significant changes for the NJ Child Group only. Japanese /i/ and /a/ showed changes in production that can be related to second language (L2) learning. The results suggest that L2 vowel production is affected importantly by age of acquisition and that there is a dynamic interaction, whereby the first and second language vowels affect each other. PMID:21603058
Electrophysiological correlates of grapheme-phoneme conversion.
Huang, Koongliang; Itoh, Kosuke; Suwazono, Shugo; Nakada, Tsutomu
2004-08-19
The cortical processes underlying grapheme-phoneme conversion were investigated by event-related potentials (ERPs). The task consisted of silent reading or vowel-matching of three Japanese hiragana characters, each representing a consonant-vowel syllable. At earlier latencies, typical components of the visual ERP, namely P1 (110 ms), N1 (170 ms), and P2 (300 ms), were elicited in the temporo-occipital area for both tasks as well as for the control task (observing the orthographic shapes of three Korean characters). Following these earlier components, two sustained negativities were identified. The earlier sustained negativity, referred to here as SN1, was found in both the silent-reading and vowel-matching tasks but not in the control task. The scalp distribution of SN1 was over the left occipito-temporal area, with maximum amplitude over O1. The amplitude of SN1 was larger in the vowel-matching task than in the silent-reading task, consistent with previous reports that ERP amplitude correlates with task difficulty. SN2, the later sustained negativity, was observed only in the vowel-matching task. The scalp distribution of SN2 was over the midsagittal centro-parietal area, with maximum amplitude over Cz. Elicitation of SN2 in the vowel-matching task suggests that this task requires a wider range of neural activity, exceeding the conventionally established language-processing areas.
Delgado-Hernandez, J
2017-02-01
Acoustic analysis is a tool that provides objective data on the changes of speech in dysarthria. The aim was to evaluate, in ataxic dysarthria, the relationship of the vowel space area (VSA), the formant centralization ratio (FCR), and the mean of the primary distances with speech intelligibility. A sample of fourteen Spanish speakers, ten with dysarthria and four controls, was used. The values of the first and second formants in 140 vowels extracted from 140 words were analyzed. Seven listeners and a verbal-stimulus identification task were used to calculate the level of intelligibility. The dysarthric subjects showed less contrast between mid and high vowels and between back vowels, and significant differences from control subjects in the VSA, the FCR, and the mean of the primary distances (p = 0.007, 0.005, and 0.030, respectively). Regression analyses show a relationship of the VSA and the mean of the primary distances with the level of speech intelligibility (r = 0.60 and 0.74, respectively). Subjects with ataxic dysarthria produce vowels with lower contrast and greater centralization. The acoustic measures studied in this preliminary work have high sensitivity in the detection of dysarthria, but only the VSA and the mean of the primary distances provide information on the severity of this type of speech disturbance.
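For reference, the two metrics compared here have standard definitions: the FCR (as proposed by Sapir and colleagues) is a ratio of centralizing over decentralizing formant values, and the triangular VSA is the area spanned by the corner vowels /i/, /a/, /u/ in F1-F2 space:

FCR = \frac{F2_{/u/} + F2_{/a/} + F1_{/i/} + F1_{/u/}}{F2_{/i/} + F1_{/a/}}

VSA = \frac{1}{2}\,\bigl| F1_{/i/}\,(F2_{/a/} - F2_{/u/}) + F1_{/a/}\,(F2_{/u/} - F2_{/i/}) + F1_{/u/}\,(F2_{/i/} - F2_{/a/}) \bigr|

Vowel centralization shrinks the VSA and raises the FCR, which is why the two measures move in opposite directions as articulatory undershoot increases.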
ERIC Educational Resources Information Center
Pons, Ferran; Albareda-Castellot, Barbara; Sebastian-Galles, Nuria
2012-01-01
Vowels with extreme articulatory-acoustic properties act as natural referents. Infant perceptual asymmetries point to an underlying bias favoring these referent vowels. However, as language experience is gathered, distributional frequency of speech sounds could modify this initial bias. The perception of the /i/-/e/ contrast was explored in 144…
ERIC Educational Resources Information Center
Kriengwatana, Buddhamas Pralle; Escudero, Paola
2017-01-01
Purpose: This study tested an assumption of the Natural Referent Vowel (Polka & Bohn, 2011) framework, namely, that directional asymmetries in adult vowel perception can be influenced by language experience. Method: Data from participants reported in Escudero and Williams (2014) were analyzed. Spanish participants categorized the Dutch vowels…
ERIC Educational Resources Information Center
Asadi, Ibrahim A.; Khateb, Asaid
2017-01-01
This study examined the orthographic transparency of Arabic by investigating the contribution of phonological awareness (PA), vocabulary, and Rapid Automatized Naming (RAN) to reading vowelized and unvowelized words. The results from first and second grade children showed that PA contribution was similar in the vowelized and unvowelized…
Vowel Acoustic Space Development in Children: A Synthesis of Acoustic and Anatomic Data
ERIC Educational Resources Information Center
Vorperian, Houri K.; Kent, Ray D.
2007-01-01
Purpose: This article integrates published acoustic data on the development of vowel production. Age specific data on formant frequencies are considered in the light of information on the development of the vocal tract (VT) to create an anatomic-acoustic description of the maturation of the vowel acoustic space for English. Method: Literature…
ERIC Educational Resources Information Center
Bernstein, Stuart E.
2009-01-01
A descriptive study of vowel spelling errors made by children first diagnosed with dyslexia (n = 79) revealed that phonological errors, such as "bet" for "bat", outnumbered orthographic errors, such as "bate" for "bait". These errors were more frequent in nonwords than words, suggesting that lexical context helps with vowel spelling. In a second…
ERIC Educational Resources Information Center
Gogate, Lakshmi J.; Bahrick, Lorraine E.
1998-01-01
Investigated 7-month-olds' ability to relate vowel sounds with objects when intersensory redundancy was present versus absent. Found that infants detected a mismatch in the vowel-object pairs in the moving-synchronous condition but not in the still or moving-asynchronous conditions, demonstrating that temporal synchrony between vocalizations and…
Now You Hear It, Now You Don't: Vowel Devoicing in Japanese Infant-Directed Speech
ERIC Educational Resources Information Center
Fais, Laurel; Kajikawa, Sachiyo; Amano, Shigeaki; Werker, Janet F.
2010-01-01
In this work, we examine a context in which a conflict arises between two roles that infant-directed speech (IDS) plays: making language structure salient and modeling the adult form of a language. Vowel devoicing in fluent adult Japanese creates violations of the canonical Japanese consonant-vowel word structure pattern by systematically…
Perceptual Training of Second-Language Vowels: Does Musical Ability Play a Role?
ERIC Educational Resources Information Center
Ghaffarvand Mokari, Payam; Werner, Stefan
2018-01-01
The present study attempts to extend the research on the effects of phonetic training on the production and perception of second-language (L2) vowels. We also examined whether success in learning L2 vowels through high-variability intensive phonetic training is related to the learners' general musical abilities. Forty Azerbaijani learners of…
Toward a Systematic Evaluation of Vowel Target Events across Speech Tasks
ERIC Educational Resources Information Center
Kuo, Christina
2011-01-01
The core objective of this study was to examine whether acoustic variability of vowel production in American English, across speaking tasks, is systematic. Ten male speakers who spoke a relatively homogeneous Wisconsin dialect produced eight monophthong vowels (in hVd and CVC contexts) in four speaking tasks, including clear-speech, citation form,…
ERIC Educational Resources Information Center
Steinbrink, Claudia; Groth, Katarina; Lachmann, Thomas; Riecker, Axel
2012-01-01
This fMRI study investigated phonological vs. auditory temporal processing in developmental dyslexia by means of a German vowel length discrimination paradigm (Groth, Lachmann, Riecker, Muthmann, & Steinbrink, 2011). Behavioral and fMRI data were collected from dyslexics and controls while performing same-different judgments of vowel duration in…
Multichannel Compression: Effects of Reduced Spectral Contrast on Vowel Identification
ERIC Educational Resources Information Center
Bor, Stephanie; Souza, Pamela; Wright, Richard
2008-01-01
Purpose: To clarify if large numbers of wide dynamic range compression channels provide advantages for vowel identification and to measure its acoustic effects. Methods: Eight vowels produced by 12 talkers in the /hVd/ context were compressed using 1, 2, 4, 8, and 16 channels. Formant contrast indices (mean formant peak minus mean formant trough;…
Quantitative and Descriptive Comparison of Four Acoustic Analysis Systems: Vowel Measurements
ERIC Educational Resources Information Center
Burris, Carlyn; Vorperian, Houri K.; Fourakis, Marios; Kent, Ray D.; Bolt, Daniel M.
2014-01-01
Purpose: This study examines accuracy and comparability of 4 trademarked acoustic analysis software packages (AASPs): Praat, WaveSurfer, TF32, and CSL by using synthesized and natural vowels. Features of AASPs are also described. Method: Synthesized and natural vowels were analyzed using each of the AASP's default settings to secure 9…
Morphological Parsing and the Use of Segmentation Cues in Reading Finnish Compounds
ERIC Educational Resources Information Center
Bertram, Raymond; Pollatsek, Alexander; Hyona, Jukka
2004-01-01
This eye movement study investigated the use of two types of segmentation cues in processing long Finnish compounds. The cues were related to the vowel quality properties of the constituents and properties of the consonant starting the second constituent. In Finnish, front vowels never appear with back vowels in a lexeme, but different quality…
The Rise and Fall of Unstressed Vowel Reduction in the Spanish of Cusco, Peru: A Sociophonetic Study
ERIC Educational Resources Information Center
Delforge, Ann Marie
2009-01-01
This dissertation describes the phonetic characteristics of a phenomenon that has previously been denominated "unstressed vowel reduction" in Andean Spanish based on the spectrographic analysis of 40,556 unstressed vowels extracted from the conversational speech of 150 residents of the city of Cusco, Peru. Results demonstrate that this…
Early and Late Spanish-English Bilingual Adults' Perception of American English Vowels
ERIC Educational Resources Information Center
Baigorri, Miriam
2016-01-01
Increasing numbers of Hispanic immigrants are entering the US (US Census Bureau, 2011) and are learning American English (AE) as a second language (L2). Many may experience difficulty in understanding AE. Accurate perception of AE vowels is important because vowels carry a large part of the speech signal (Kewley-Port, Burkle, & Lee, 2007). The…
Phonological Representations in Children's Native and Non-native Lexicon
ERIC Educational Resources Information Center
Simon, Ellen; Sjerps, Matthias J.; Fikkert, Paula
2014-01-01
This study investigated the phonological representations of vowels in children's native and non-native lexicons. Two experiments were mispronunciation tasks (i.e., a vowel in words was substituted by another vowel from the same language). These were carried out by Dutch-speaking 9-12-year-old children and Dutch-speaking adults, in their…
Characteristics of the Lax Vowel Space in Dysarthria
ERIC Educational Resources Information Center
Tjaden, Kris; Rivera, Deanna; Wilding, Gregory; Turner, Greg S.
2005-01-01
It has been hypothesized that lax vowels may be relatively unaffected by dysarthria, owing to the reduced vocal tract shapes required for these phonetic events (G. S. Turner, K. Tjaden, & G. Weismer, 1995). It also has been suggested that lax vowels may be especially susceptible to speech mode effects (M. A. Picheny, N. I. Durlach, & L. D. Braida,…
Sayles, Mark; Stasiak, Arkadiusz; Winter, Ian M.
2015-01-01
The auditory system typically processes information from concurrently active sound sources (e.g., two voices speaking at once), in the presence of multiple delayed, attenuated and distorted sound-wave reflections (reverberation). Brainstem circuits help segregate these complex acoustic mixtures into “auditory objects.” Psychophysical studies demonstrate a strong interaction between reverberation and fundamental-frequency (F0) modulation, leading to impaired segregation of competing vowels when segregation is on the basis of F0 differences. Neurophysiological studies of complex-sound segregation have concentrated on sounds with steady F0s, in anechoic environments. However, F0 modulation and reverberation are quasi-ubiquitous. We examine the ability of 129 single units in the ventral cochlear nucleus (VCN) of the anesthetized guinea pig to segregate the concurrent synthetic vowel sounds /a/ and /i/, based on temporal discharge patterns under closed-field conditions. We address the effects of added real-room reverberation, F0 modulation, and the interaction of these two factors, on brainstem neural segregation of voiced speech sounds. A firing-rate representation of single-vowels' spectral envelopes is robust to the combination of F0 modulation and reverberation: local firing-rate maxima and minima across the tonotopic array code vowel-formant structure. However, single-vowel F0-related periodicity information in shuffled inter-spike interval distributions is significantly degraded in the combined presence of reverberation and F0 modulation. Hence, segregation of double-vowels' spectral energy into two streams (corresponding to the two vowels), on the basis of temporal discharge patterns, is impaired by reverberation; specifically when F0 is modulated. All unit types (primary-like, chopper, onset) are similarly affected. These results offer neurophysiological insights to perceptual organization of complex acoustic scenes under realistically challenging listening conditions. PMID:25628545
Verbal short-term memory as an articulatory system: evidence from an alternative paradigm.
Cheung, Him; Wooltorton, Lana
2002-01-01
In a series of experiments, the role of articulatory rehearsal in verbal short-term memory was examined via a shadowing-plus-recall paradigm. In this paradigm, subjects shadowed a word target presented closely after an auditory memory list before they recalled the list. The phonological relationship between the shadowing target and the final item on the memory list was manipulated. Experiments 1 and 2 demonstrated that targets sounding similar to the list-final memory item generally took longer to shadow than unrelated targets. This inhibitory effect of phonological relatedness was more pronounced with tense- than lax-vowel pseudoword recall lists. The interaction between vowel tenseness and phonological relatedness was replicated in Experiment 3 using shorter lists of real words. In Experiment 4, concurrent articulation was applied during list learning to block rehearsal; consequently, neither the phonological relatedness effect nor its interaction with vowel tenseness emerged. Experiments 5 and 6 manipulated the occurrence frequencies and lexicality of the recall items, respectively, instead of vowel tenseness. Unlike vowel tenseness, these non-articulatory memory factors failed to interact with the phonological relatedness effect. Experiment 7 orthogonally manipulated the vowel tenseness and frequencies of the recall items; slowing in shadowing times due to phonological relatedness was modulated by vowel tenseness but not frequency. Taken together, these results suggest that under the present paradigm, the modifying effect of vowel tenseness on the magnitude of slowing in shadowing due to phonological relatedness is indicative of a prominent articulatory component in verbal short-term retention. The shadowing-plus-recall approach avoids confounding overt recall with internal memory processing, an inherent problem of the traditional immediate serial recall and span tasks.
Brockmann, Meike; Drinnan, Michael J; Storck, Claudio; Carding, Paul N
2011-01-01
The aims of this study were to examine vowel and gender effects on jitter and shimmer in a typical clinical voice task while correcting for the confounding effects of voice sound pressure level (SPL) and fundamental frequency (F0). Furthermore, the relative effect sizes of vowel, gender, voice SPL, and F0 were assessed, and recommendations for clinical measurements were derived. In this cross-sectional single-cohort study, 57 healthy adults (28 women, 29 men) aged 20-40 years were investigated. Three phonations of /a/, /o/, and /i/ at "normal" voice loudness were analyzed using Praat. The effects of vowel, gender, voice SPL, and F0 on jitter and shimmer were assessed using descriptive and inferential (analysis of covariance) statistics. The effect sizes were determined with the eta-squared statistic. Vowel, gender, voice SPL, and F0 each had significant effects on jitter, on shimmer, or on both. Voice SPL was the most important factor, whereas vowel, gender, and F0 effects were comparatively small. Because men had systematically higher voice SPL, the gender effects on jitter and shimmer were smaller when correcting for SPL and F0. Surprisingly, in clinical assessments, voice SPL has the single biggest impact on jitter and shimmer. Vowel and gender effects were clinically important, whereas fundamental frequency had a relatively small influence. Phonations at a predefined voice SPL (80 dB minimum) and a fixed vowel (/a/) would enhance measurement reliability. Furthermore, gender-specific thresholds applying these guidelines should be established. However, the efficiency of these measures should be verified and tested with patients. Copyright © 2011 The Voice Foundation. All rights reserved.
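As a point of reference, the local jitter and shimmer measures that Praat reports are cycle-to-cycle perturbation ratios defined over the N consecutive pitch periods T_i and their peak amplitudes A_i:

\text{jitter}_{local} = \frac{\frac{1}{N-1}\sum_{i=1}^{N-1} \lvert T_i - T_{i+1} \rvert}{\frac{1}{N}\sum_{i=1}^{N} T_i},
\qquad
\text{shimmer}_{local} = \frac{\frac{1}{N-1}\sum_{i=1}^{N-1} \lvert A_i - A_{i+1} \rvert}{\frac{1}{N}\sum_{i=1}^{N} A_i}

Because both are ratios over per-cycle estimates, anything that destabilizes period and amplitude tracking (such as low voice SPL) inflates them, consistent with SPL being the dominant factor here.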
NASA Astrophysics Data System (ADS)
Miyahara, Tomoko; Koda, Ai; Sekiguchi, Rikuko; Amemiya, Toshihiko
In this study, we investigated the nature of cross-modal associations between colors and vowels. In Experiment 1, we examined the patterns of synesthetic correspondence between colors and vowels in a perceptual similarity experiment. The results were as follows: red was chosen for /a/, yellow for /i/, and blue for /o/ significantly more often than for any other vowel. Interestingly, this pattern of correspondence is similar to the pattern of colored hearing reported by synesthetes. In Experiment 2, we investigated the robustness of these cross-modal associations using an implicit association test (IAT). A clear congruence effect was found: participants responded faster in congruent conditions (/i/ and yellow, /o/ and blue) than in incongruent conditions (/i/ and blue, /o/ and yellow). This result suggests that the weak synesthesia between vowels and colors in non-synesthetes is not merely a matter of conscious choice but reflects underlying implicit associations.
The Prosodic Licensing of Coda Consonants in Early Speech: Interactions with Vowel Length
ERIC Educational Resources Information Center
Miles, Kelly; Yuen, Ivan; Cox, Felicity; Demuth, Katherine
2016-01-01
English has a word-minimality requirement that all open-class lexical items must contain at least two moras of structure, forming a bimoraic foot (Hayes, 1995). Thus, a word with either a long vowel, or a short vowel and a coda consonant, satisfies this requirement. This raises the question of when and how young children might learn this…
ERIC Educational Resources Information Center
Demirezen, Mehmet
2017-01-01
In North American English (NAE) and British English, [æ] and [?] are open vowel phonemes which are articulated by a speaker easily without a build-up of air pressure. Among all English vowels, the greatest problem for most Turkish majors of English is the discrimination of [æ] and [?]. In English, [æ] is called the "short a" or ash,…
An Index of Phonic Patterns by Vowel Types. AVKO "Great Idea" Reprint Series No. 622.
ERIC Educational Resources Information Center
McCabe, Don
Intended for the use of teachers or diagnosticians, this booklet presents charts that list various phonic patterns, word families, or "rimes" associated with specific vowel patterns. Lists in the booklet are arranged according to the 14 basic vowel phonemes in English (including long a, long e, long i, long aw, short ah, and short u).…
ERIC Educational Resources Information Center
Tomita, Kaoru; Yamada, Jun; Takatsuka, Shigenobu
2010-01-01
This study investigated how Japanese-speaking learners of English pronounce the three point vowels /i/, /u/, and /a/ appearing in the first and second monosyllabic words of English noun phrases, and the schwa /ə/ appearing in English disyllabic words. First and second formant (F1 and F2) values were measured for four Japanese…
Effects of Short- and Long-Term Changes in Auditory Feedback on Vowel and Sibilant Contrasts
ERIC Educational Resources Information Center
Lane, Harlan; Matthies, Melanie L.; Guenther, Frank H.; Denny, Margaret; Perkell, Joseph S.; Stockmann, Ellen; Tiede, Mark; Vick, Jennell; Zandipour, Majid
2007-01-01
Purpose: To assess the effects of short- and long-term changes in auditory feedback on vowel and sibilant contrasts and to evaluate hypotheses arising from a model of speech motor planning. Method: The perception and production of vowel and sibilant contrasts were measured in 8 postlingually deafened adults prior to activation of their cochlear…
Hiatus Resolution in Spanish: An Experimental Study
ERIC Educational Resources Information Center
Souza, Benjamin J.
2010-01-01
In Spanish, adjacent vowels across and within word boundaries are either in hiatus or form a diphthong. Generally, when either of the unstressed high vowels /i/ and /u/ appears next to any of the other vowels /e/, /a/, or /o/, the result is a diphthong (i.e., "puerta" 'door' > [pwer.ta], "miel" 'honey' > [mjel], and so on). All…
Vowel Height Allophony and Dorsal Place Contrasts in Cochabamba Quechua.
Gallagher, Gillian
2016-01-01
This paper reports on the results of two studies investigating the role of allophony in cueing phonemic contrasts. In Cochabamba Quechua, the uvular-velar place distinction is often cued by additional differences in the height of the surrounding vowels. An acoustic study documents the lowering effect of a preceding tautomorphemic or a following heteromorphemic uvular on the high vowels /i u/. A discrimination study finds that vowel height is a significant cue to the velar-uvular place contrast. These findings support a view of contrasts as collections of distinguishing properties, as opposed to oppositions in a single distinctive feature. © 2016 S. Karger AG, Basel.
Perceptual prothesis in native Spanish speakers
NASA Astrophysics Data System (ADS)
Theodore, Rachel M.; Schmidt, Anna M.
2003-04-01
Previous research suggests a perceptual bias exists for native phonotactics [D. Massaro and M. Cohen, Percept. Psychophys. 34, 338-348 (1983)] such that listeners report nonexistent segments when listening to stimuli that violate native phonotactics [E. Dupoux, K. Kakehi, Y. Hirose, C. Pallier, and J. Mehler, J. Exp. Psychol.: Human Percept. Perform. 25, 1568-1578 (1999)]. This study investigated how native-language experience affects second language processing, focusing on how native Spanish speakers perceive the English clusters /st/, /sp/, and /sk/, which represent phonotactically illegal forms in Spanish. To preserve native phonotactics, Spanish speakers often produce prothetic vowels before English words beginning with /s/ clusters. Is the influence of native phonotactics also present in the perception of illegal clusters? A stimuli continuum ranging from no vowel (e.g., ``sku'') to a full vowel (e.g., ``esku'') before the cluster was used. Four final vowel contexts were used for each cluster, resulting in 12 sCV and 12 VsCV nonword endpoints. English and Spanish listeners were asked to discriminate between pairs differing in vowel duration and to identify the presence or absence of a vowel before the cluster. Results will be discussed in terms of implications for theories of second language speech perception.
Variability in English vowels is comparable in articulation and acoustics
Noiray, Aude; Iskarous, Khalil; Whalen, D. H.
2014-01-01
The nature of the links between speech production and perception has been the subject of longstanding debate. The present study investigated the articulatory parameter of tongue height and the acoustic F1-F0 difference for the phonological distinction of vowel height in American English front vowels. Multiple repetitions of /i, ɪ, e, ε, æ/ in [(h)Vd] sequences were recorded in seven adult speakers. Articulatory (ultrasound) and acoustic data were collected simultaneously to provide a direct comparison of variability in vowel production in both domains. Results showed idiosyncratic patterns of articulation for contrasting the three front vowel pairs /i-ɪ/, /e-ε/ and /ε-æ/ across subjects, with the degree of variability in vowel articulation comparable to that observed in the acoustics for all seven participants. However, contrary to what was expected, some speakers showed reversals for tongue height for /ɪ/-/e/ that was also reflected in acoustics with F1 higher for /ɪ/ than for /e/. The data suggest the phonological distinction of height is conveyed via speaker-specific articulatory-acoustic patterns that do not strictly match features descriptions. However, the acoustic signal is faithful to the articulatory configuration that generated it, carrying the crucial information for perceptual contrast. PMID:25101144
Vowel space development in a child acquiring English and Spanish from birth
NASA Astrophysics Data System (ADS)
Andruski, Jean; Kim, Sahyang; Nathan, Geoffrey; Casielles, Eugenia; Work, Richard
2005-04-01
To date, research on bilingual first language acquisition has tended to focus on the development of higher levels of language, with relatively few analyses of the acoustic characteristics of bilingual infants' and children's speech. Since monolingual infants begin to show perceptual divisions of vowel space that resemble adult native speakers' divisions by about 6 months of age [Kuhl et al., Science 255, 606-608 (1992)], bilingual children's vowel production may provide evidence of their awareness of language differences relatively early during language development. This paper will examine the development of vowel categories in a child whose mother is a native speaker of Castilian Spanish and whose father is a native speaker of American English. Each parent speaks to the child only in her/his native language. For this study, recordings made at the ages of 2;5 and 2;10 were analyzed and F1-F2 measurements were made of vowels from the stressed syllables of content words. The development of vowel space is compared across ages within each language, and across languages at each age. In addition, the child's productions are compared with the mother's and father's vocalic productions, which provide the predominant input in Spanish and English, respectively.
NASA Astrophysics Data System (ADS)
Lin, Yeong-Fen Emily
This thesis investigates the source-vowel interaction from the point of view of perception. Major objectives include identifying the acoustic correlates of breathy voice and characterizing the interdependent relationship between the perception of vowel identity and breathiness. Two experiments were conducted to achieve these objectives. In the first experiment, voice samples from one control group and seven patient groups were compared. The control group consisted of five female and five male adults. The ten normals were recruited to perform a sustained vowel phonation task with constant pitch and loudness. The voice samples of seventy patients were retrieved from a hospital database, with vowels extracted from sentences repeated by patients at their habitual pitch and loudness. The seven patient groups were defined by unique combinations of the patients' measures of mean flow rate and glottal resistance. Eighteen acoustic variables were treated with a three-way (Gender x Group x Vowel) ANOVA. Parameters showing a significant female-male difference as well as group differences, especially those between the presumed breathy group and the other groups, were identified as relevant to the distinction of breathy voice. As a result, F1-F3 amplitude difference and slope were found to be most effective in distinguishing breathy voice. Other acoustic correlates of breathy voice included F1 bandwidth, RMS-H1 amplitude difference, and F1-F2 amplitude difference and slope. In the second experiment, a formant synthesizer was used to generate vowel stimuli with varying spectral tilt and F1 bandwidth. Thirteen native American English speakers made dissimilarity judgements on paired stimuli in terms of vowel identity and breathiness. Listeners' perceptual vowel spaces were found to be affected by changes in the acoustic correlates of breathy voice. The threshold for detecting a change of vocal quality in the breathiness domain was also found to be vowel-dependent.
Schenk, Barbara S; Baumgartner, Wolf Dieter; Hamzavi, Jafar Sasan
2003-12-01
The most obvious and best documented changes in the speech of postlingually deafened speakers are in rate, fundamental frequency, and volume (energy). These changes are due to the lack of auditory feedback. But auditory feedback affects more than the suprasegmental parameters of speech. The aim of this study was to determine the change at the segmental level of speech in terms of vowel formants. Twenty-three postlingually deafened and 18 normally hearing speakers were recorded reading a German text. The frequencies of the first and second formants and the vowel spaces of selected vowels in a word-in-context condition were compared. All first formant frequencies (F1) of the postlingually deafened speakers were significantly different from those of the normally hearing people. The values of F1 were higher for the vowels /e/ (418±61 Hz compared with 359±52 Hz, P=0.006) and /o/ (459±58 Hz compared with 390±45 Hz, P=0.0003) and lower for /a/ (765±115 Hz compared with 851±146 Hz, P=0.038). The second formant frequency (F2) showed a significant change only for the vowel /e/ (2016±347 Hz compared with 2279±250 Hz, P=0.012). The postlingually deafened people were divided into two subgroups according to duration of deafness (shorter/longer than 10 years of deafness). There was no significant difference in formant changes between the two groups. Our report thus demonstrates that auditory feedback also affects segmental features of the speech of postlingually deafened people.
The impact of inverted text on visual word processing: An fMRI study.
Sussman, Bethany L; Reddigari, Samir; Newman, Sharlene D
2018-06-01
Visual word recognition has been studied for decades. One question that has received limited attention is how different text presentation orientations disrupt word recognition. By examining how word recognition processes may be disrupted by different text orientations it is hoped that new insights can be gained concerning the process. Here, we examined the impact of rotating and inverting text on the neural network responsible for visual word recognition, focusing primarily on a region of the occipito-temporal cortex referred to as the visual word form area (VWFA). A lexical decision task was employed in which words and pseudowords were presented in one of three orientations (upright, rotated or inverted). The results demonstrate that inversion caused the greatest disruption of visual word recognition processes. Both rotated and inverted text elicited increased activation in spatial attention regions within the right parietal cortex. However, inverted text recruited phonological and articulatory processing regions within the left inferior frontal and left inferior parietal cortices. Finally, the VWFA was found to not behave similarly to the fusiform face area in that unusual text orientations resulted in increased activation and not decreased activation. It is hypothesized here that the VWFA activation is modulated by feedback from linguistic processes. Copyright © 2018 Elsevier Inc. All rights reserved.
Normative data for the Maryland CNC Test.
Mendel, Lisa Lucks; Mustain, William D; Magro, Jessica
2014-09-01
The Maryland consonant-vowel nucleus-consonant (CNC) Test is routinely used in Veterans Administration medical centers, yet there is a paucity of published normative data for this test. The purpose of this study was to provide information on the means and distribution of word-recognition scores on the Maryland CNC Test as a function of degree of hearing loss for a veteran population. A retrospective, descriptive design was used. The sample consisted of records from veterans who had Compensation and Pension (C&P) examinations at a Veterans Administration medical center (N = 1,760 ears). Audiometric records of veterans who had C&P examinations during a 10 yr period were reviewed, and the pure-tone averages (PTA4) at four frequencies (1000, 2000, 3000, and 4000 Hz) were documented. The maximum word-recognition score (PBmax) was determined from the performance-intensity functions obtained using the Maryland CNC Test. Correlations were computed between PBmax and PTA4. A wide range of word-recognition scores was obtained at all levels of PTA4 for this population. In addition, a strong negative correlation between PBmax and PTA4 was observed, indicating that as PTA4 increased, PBmax decreased. Word-recognition scores decreased significantly as hearing loss increased beyond a mild hearing loss. Although threshold was influenced by age, no statistically significant relationship was found between word-recognition score and the age of the participants. Results from this study provide normative data in table and figure format to assist audiologists in interpreting patient results on the Maryland CNC Test for a veteran population. These results provide a quantitative method for audiologists to use to interpret word-recognition scores based on pure-tone hearing loss. American Academy of Audiology.
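The PTA4 used here is simply the arithmetic mean of the audiometric thresholds T(f), in dB HL, at the four documented frequencies:

PTA_4 = \tfrac{1}{4}\bigl(T(1000) + T(2000) + T(3000) + T(4000)\bigr)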
Shimokura, Ryota; Akasaka, Sakie; Nishimura, Tadashi; Hosoi, Hiroshi; Matsui, Toshie
2017-02-01
Some Japanese monosyllables contain consonants that are not easily discernible for individuals with sensorineural hearing loss. However, the acoustic features that make these monosyllables difficult to discern have not been clearly identified. Here, this study used the autocorrelation function (ACF), which can capture temporal features of signals, to clarify the factors influencing speech intelligibility. For each monosyllable, five factors extracted from the ACF [Φ(0): total energy; τ1 and φ1: delay time and amplitude of the maximum peak; τe: effective duration; Wφ(0): spectral centroid], voice onset time, the speech intelligibility index, and loudness level were compared with the percentage of correctly perceived articulations (144 ears) obtained for 50 Japanese vowel and consonant-vowel monosyllables produced by one female speaker. Results showed that the median effective duration [(τe)med] was strongly correlated with the percentage of correctly perceived articulations of the consonants (r = 0.87, p < 0.01). (τe)med values were computed from running ACFs as the time lag at which the magnitude of the logarithmic ACF envelope had decayed to -10 dB. Effective duration is a measure of temporal pattern persistence, i.e., the duration over which the waveform maintains a stable pattern. The authors postulate that low recognition ability is related to degraded perception of temporal fluctuation patterns.
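A minimal sketch of the effective-duration computation described above, assuming a normalized ACF whose log envelope is scanned for the -10 dB decay point; the simple first-crossing rule and window handling are assumptions, not the study's exact procedure.

import numpy as np

def effective_duration(x, sr):
    """Estimate tau_e: the lag (in s) at which the normalized ACF envelope decays to -10 dB."""
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    acf = acf / acf[0]                      # normalize so phi(0) = 1 (0 dB)
    env_db = 10 * np.log10(np.maximum(np.abs(acf), 1e-12))
    below = np.nonzero(env_db <= -10.0)[0]  # first lag at or below -10 dB
    return below[0] / sr if below.size else len(x) / sr

Computed over running windows across a monosyllable (e.g., hops of 10-30 ms), the median of the per-window values gives (τe)med.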
ERIC Educational Resources Information Center
Darcy, Isabelle; Dekydtspotter, Laurent; Sprouse, Rex A.; Glover, Justin; Kaden, Christiane; McGuire, Michael; Scott, John H. G.
2012-01-01
It is well known that adult US-English-speaking learners of French experience difficulties acquiring the high /y/-/u/ and mid /œ/-/ɔ/ front vs. back rounded vowel contrasts in French. This study examines the acquisition of these French vowel contrasts at two levels: phonetic categorization and lexical representations. An ABX categorization task…
Vowel Development in an Emergent Mandarin-English Bilingual Child: A Longitudinal Study
ERIC Educational Resources Information Center
Yang, Jing; Fox, Robert A.; Jacewicz, Ewa
2015-01-01
This longitudinal case study documents the emergence of bilingualism in a young monolingual Mandarin boy on the basis of an acoustic analysis of his vowel productions recorded via a picture-naming task over 20 months following his enrollment in an all-English (L2) preschool at the age of 3;7. The study examined (1) his initial L2 vowel space, (2)…
ERIC Educational Resources Information Center
Han, Jeong-Im; Hwang, Jong-Bai; Choi, Tae-Hwan
2011-01-01
The purpose of this study was to evaluate the acquisition of non-contrastive phonetic details of a second language. Reduced vowels in English are realized as a schwa or barred-i depending on their phonological contexts, but Korean has no reduced vowels. Two groups of Korean learners of English who differed according to the experience of residence…
ERIC Educational Resources Information Center
Tomaschek, Fabian; Truckenbrodt, Hubert; Hertrich, Ingo
2013-01-01
Recent experiments showed that the perception of vowel length by German listeners exhibits the characteristics of categorical perception. The present study sought to find the neural activity reflecting categorical vowel length and the short-long boundary by examining the processing of non-contrastive durations and categorical length using MEG.…
Oriented attachment by enantioselective facet recognition in millimeter-sized gypsum crystals.
Viedma, Cristóbal; Cuccia, Louis A; McTaggart, Alicia; Kahr, Bart; Martin, Alexander T; McBride, J Michael; Cintas, Pedro
2016-09-22
Crystal growth by oriented attachment involves the spontaneous self-assembly of adjoining crystals with common crystallographic orientations. Herein, we report the oriented attachment of gypsum crystals on agitation to form stereoselective mesoscale aggregates.
Effects of blocking and presentation on the recognition of word and nonsense syllables in noise
NASA Astrophysics Data System (ADS)
Benkí, José R.
2003-10-01
Listener expectations may have significant effects on spoken word recognition, modulating word similarity effects from the lexicon. This study investigates the effect of blocking by lexical status on the recognition of word and nonsense syllables in noise. 240 phonemically matched word and nonsense CVC syllables [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-108 (1988)] were presented to listeners at different S/N ratios for identification. In the mixed condition, listeners were presented with blocks containing both words and nonwords, while listeners in the blocked condition were presented with the trials in blocks containing either words or nonwords. The targets were presented in isolation with 50 ms of preceding and following noise. Preliminary results indicate no effect of blocking on accuracy for either word or nonsense syllables; results from neighborhood density analyses will be presented. Consistent with previous studies, a j-factor analysis indicates that words are perceived as containing at least 0.5 fewer independent units than nonwords in both conditions. Relative to previous work on syllables presented in a frame sentence [Benkí, J. Acoust. Soc. Am. 113, 1689-1705 (2003)], initial consonants were perceived significantly less accurately, while vowels and final consonants were perceived at comparable rates.
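The j-factor analysis referred to here (Boothroyd and Nittrouer, 1988) models whole-item recognition probability as a power of part recognition probability, so j estimates the number of independently perceived units:

p_{whole} = p_{part}^{\,j} \quad\Longrightarrow\quad j = \frac{\log p_{whole}}{\log p_{part}}

For CVC nonwords j approaches 3 (three independently perceived phonemes), while lexical constraint lowers j for real words, which is what the reported reduction of at least 0.5 units reflects.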
Novel probabilistic neuroclassifier
NASA Astrophysics Data System (ADS)
Hong, Jiang; Serpen, Gursel
2003-09-01
A novel probabilistic potential function neural network classifier algorithm to deal with classes which are multi-modally distributed and formed from sets of disjoint pattern clusters is proposed in this paper. The proposed classifier has a number of desirable properties which distinguish it from other neural network classifiers. A complete description of the algorithm in terms of its architecture and the pseudocode is presented. Simulation analysis of the newly proposed neuro-classifier algorithm on a set of benchmark problems is presented. Benchmark problems tested include IRIS, Sonar, Vowel Recognition, Two-Spiral, Wisconsin Breast Cancer, Cleveland Heart Disease and Thyroid Gland Disease. Simulation results indicate that the proposed neuro-classifier performs consistently better for a subset of problems for which other neural classifiers perform relatively poorly.
Different Timescales for the Neural Coding of Consonant and Vowel Sounds
Perez, Claudia A.; Engineer, Crystal T.; Jakkamsetti, Vikram; Carraway, Ryan S.; Perry, Matthew S.
2013-01-01
Psychophysical, clinical, and imaging evidence suggests that consonant and vowel sounds have distinct neural representations. This study tests the hypothesis that consonant and vowel sounds are represented on different timescales within the same population of neurons by comparing behavioral discrimination with neural discrimination based on activity recorded in rat inferior colliculus and primary auditory cortex. Performance on 9 vowel discrimination tasks was highly correlated with neural discrimination based on spike count and was not correlated when spike timing was preserved. In contrast, performance on 11 consonant discrimination tasks was highly correlated with neural discrimination when spike timing was preserved and not when spike timing was eliminated. These results suggest that in the early stages of auditory processing, spike count encodes vowel sounds and spike timing encodes consonant sounds. These distinct coding strategies likely contribute to the robust nature of speech sound representations and may help explain some aspects of developmental and acquired speech processing disorders. PMID:22426334
Mapping the Speech Code: Cortical Responses Linking the Perception and Production of Vowels
Schuerman, William L.; Meyer, Antje S.; McQueen, James M.
2017-01-01
The acoustic realization of speech is constrained by the physical mechanisms by which it is produced. Yet for speech perception, the degree to which listeners utilize experience derived from speech production has long been debated. In the present study, we examined how sensorimotor adaptation during production may affect perception, and how this relationship may be reflected in early vs. late electrophysiological responses. Participants first performed a baseline speech production task, followed by a vowel categorization task during which EEG responses were recorded. In a subsequent speech production task, half the participants received shifted auditory feedback, leading most to alter their articulations. This was followed by a second, post-training vowel categorization task. We compared changes in vowel production to both behavioral and electrophysiological changes in vowel perception. No differences in phonetic categorization were observed between groups receiving altered or unaltered feedback. However, exploratory analyses revealed correlations between vocal motor behavior and phonetic categorization. EEG analyses revealed correlations between vocal motor behavior and cortical responses in both early and late time windows. These results suggest that participants' recent production behavior influenced subsequent vowel perception. We suggest that the change in perception can be best characterized as a mapping of acoustics onto articulation. PMID:28439232
The Korean Prevocalic Palatal Glide: A Comparison with the Russian Glide and Palatalization.
Suh, Yunju; Hwang, Jiwon
2016-01-01
Phonetic studies of the Korean prevocalic glides have often suggested that they are shorter in duration than those of languages like English, and lack a prolonged steady state. In addition, the formant frequencies of the Korean labiovelar glide are reported to be greatly influenced by the following vowel. In this study the Korean prevocalic palatal glide is investigated vis-à-vis the two phonologically similar configurations of another language - the glide /j/ and the secondary palatalization of Russian, with regard to the inherent duration of the glide component, F2 trajectory, vowel-to-glide coarticulation and glide-to-vowel coarticulation. It is revealed that the Korean palatal glide is closer to the Russian palatalization in duration and F2 trajectory, indicating a lack of steady state, and to the Russian segmental glide in the vowel-to-glide coarticulation degree. When the glide-to-vowel coarticulation is considered, the Korean palatal glide is distinguished from both Russian categories. The results suggest that both the Korean palatal glide and the Russian palatalization involve significant articulatory overlap, the former with the vowel and the latter with the consonant. Phonological implications of such a difference in coarticulation pattern are discussed, as well as the comparison between the Korean labiovelar and palatal glides. © 2016 S. Karger AG, Basel.
Comesaña, Montserrat; Soares, Ana P; Marcet, Ana; Perea, Manuel
2016-11-01
In skilled adult readers, transposed-letter effects (jugde-JUDGE) are greater for consonant than for vowel transpositions. These differences are often attributed to phonological rather than orthographic processing. To examine this issue, we employed a scenario in which phonological involvement varies as a function of reading experience: A masked priming lexical decision task with 50-ms primes in adult and developing readers. Indeed, masked phonological priming at this prime duration has been consistently reported in adults, but not in developing readers (Davis, Castles, & Iakovidis, 1998). Thus, if consonant/vowel asymmetries in letter position coding with adults are due to phonological influences, transposed-letter priming should occur for both consonant and vowel transpositions in developing readers. Results with adults (Experiment 1) replicated the usual consonant/vowel asymmetry in transposed-letter priming. In contrast, no signs of an asymmetry were found with developing readers (Experiments 2-3). However, Experiments 1-3 did not directly test the existence of phonological involvement. To study this question, Experiment 4 manipulated the phonological prime-target relationship in developing readers. As expected, we found no signs of masked phonological priming. Thus, the present data favour an interpretation of the consonant/vowel dissociation in letter position coding as due to phonological rather than orthographic processing. © 2016 The British Psychological Society.
A mathematical model of vowel identification by users of cochlear implants
Sagi, Elad; Meyer, Ted A.; Kaiser, Adam R.; Teoh, Su Wooi; Svirsky, Mario A.
2010-01-01
A simple mathematical model is presented that predicts vowel identification by cochlear implant users based on these listeners' resolving power for the mean locations of first, second, and/or third formant energies along the implanted electrode array. This psychophysically based model provides hypotheses about the mechanism cochlear implant users employ to encode and process the input auditory signal to extract information relevant for identifying steady-state vowels. Using one free parameter, the model predicts most of the patterns of vowel confusions made by users of different cochlear implant devices and stimulation strategies, who show widely different levels of speech perception (from near chance to near perfect). Furthermore, the model can predict results from the literature, such as the frequency-mapping study of Skinner et al. [(1995). Ann. Otol. Rhinol. Laryngol. 104, 307–311] and the general trend in the vowel results of Zeng and Galvin's [(1999). Ear Hear. 20, 60–74] studies of output electrical dynamic range reduction. The implementation of the model presented here is specific to vowel identification by cochlear implant users, but the framework of the model is more general. Computational models such as the one presented here can be useful for advancing knowledge about speech perception in hearing-impaired populations, and for providing a guide for clinical research and clinical practice. PMID:20136228
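The modeling approach summarized above lends itself to a compact computational sketch. The following is a minimal, hypothetical illustration rather than the authors' implementation: each vowel is represented by the mean electrode locations of its F1 and F2 energies, perceptual smearing is the single free parameter (a Gaussian SD in electrode units), and the confusion matrix is predicted by Monte Carlo nearest-template classification. The electrode locations used here are invented for illustration.

```python
import numpy as np

# Hypothetical mean formant locations (in electrode units) for four vowels;
# real values would come from a subject's frequency-to-electrode map.
vowel_locs = {
    "i": (2.0, 10.5), "a": (5.5, 7.0), "u": (2.5, 5.0), "ae": (6.0, 9.0),
}

def predict_confusions(vowel_locs, sigma, n_trials=10000, seed=0):
    """Predict a vowel confusion matrix from formant-place resolving power.

    sigma is the single free parameter: the SD (in electrode units) of
    Gaussian noise added to the perceived F1/F2 places before the listener
    picks the nearest stored vowel template.
    """
    rng = np.random.default_rng(seed)
    labels = list(vowel_locs)
    templates = np.array([vowel_locs[v] for v in labels])
    conf = np.zeros((len(labels), len(labels)))
    for i, v in enumerate(labels):
        noisy = templates[i] + rng.normal(0.0, sigma, size=(n_trials, 2))
        # Nearest template in (F1-place, F2-place) space wins.
        d = ((noisy[:, None, :] - templates[None, :, :]) ** 2).sum(-1)
        picks = d.argmin(1)
        conf[i] = np.bincount(picks, minlength=len(labels)) / n_trials
    return labels, conf

labels, conf = predict_confusions(vowel_locs, sigma=1.2)
print(labels, conf.round(2), sep="\n")
```

Fitting sigma to one subject's observed confusion matrix (e.g., by minimizing squared error) would mirror the one-free-parameter fitting described in the abstract.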
Chen, Fei; Loizou, Philipos C.
2012-01-01
Recent evidence suggests that spectral change, as measured by cochlea-scaled entropy (CSE), predicts speech intelligibility better than the information carried by vowels or consonants in sentences. Motivated by this finding, the present study investigates whether intelligibility indices implemented to include segments marked with significant spectral change better predict speech intelligibility in noise than measures that include all phonetic segments paying no attention to vowels/consonants or spectral change. The prediction of two intelligibility measures [normalized covariance measure (NCM), coherence-based speech intelligibility index (CSII)] is investigated using three sentence-segmentation methods: relative root-mean-square (RMS) levels, CSE, and traditional phonetic segmentation of obstruents and sonorants. While the CSE method makes no distinction between spectral changes occurring within vowels/consonants, the RMS-level segmentation method places more emphasis on the vowel-consonant boundaries wherein the spectral change is often most prominent, and perhaps most robust, in the presence of noise. Higher correlation with intelligibility scores was obtained when including sentence segments containing a large number of consonant-vowel boundaries than when including segments with highest entropy or segments based on obstruent/sonorant classification. These data suggest that in the context of intelligibility measures the type of spectral change captured by the measure is important. PMID:22559382
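The RMS-level segmentation idea is easy to prototype: score short frames by their RMS level relative to the whole-sentence RMS and retain only frames in a chosen relative-level band before computing an intelligibility index. The sketch below assumes illustrative values for frame length and the level band; the study's exact binning is not reproduced here.

```python
import numpy as np

def rms_level_segments(x, fs, frame_ms=16.0, band_db=(-10.0, 0.0)):
    """Return a boolean mask of frames whose relative RMS level (in dB,
    re: the whole-sentence RMS) falls inside band_db.

    Frame length and the level band are illustrative assumptions; an
    intelligibility measure would then be computed over masked frames only.
    """
    n = int(fs * frame_ms / 1000)
    n_frames = len(x) // n
    frames = x[: n_frames * n].reshape(n_frames, n)
    frame_rms = np.sqrt((frames ** 2).mean(1) + 1e-12)
    ref_rms = np.sqrt((x ** 2).mean() + 1e-12)
    rel_db = 20 * np.log10(frame_rms / ref_rms)
    return (rel_db >= band_db[0]) & (rel_db < band_db[1])

# Example: keep mid-level frames of a synthetic "sentence".
fs = 16000
x = np.random.randn(fs * 2) * np.linspace(0.2, 1.0, fs * 2)
mask = rms_level_segments(x, fs)
print(f"{mask.mean():.0%} of frames fall in the mid-level band")
```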
NASA Astrophysics Data System (ADS)
Seifart, Frank; Meyer, Julien; Grawunder, Sven; Dentel, Laure
2018-04-01
Many drum communication systems around the world transmit information by emulating tonal and rhythmic patterns of spoken languages in sequences of drumbeats. Their rhythmic characteristics, in particular, have not been systematically studied so far, although understanding them offers a rare opportunity for original insight into the basic units of speech rhythm as selected by natural speech practices directly based on beats. Here, we analyse a corpus of Bora drum communication from the northwest Amazon, a practice now threatened with extinction. We show that four rhythmic units are encoded in the length of pauses between beats. We argue that these units correspond to vowel-to-vowel intervals with different numbers of consonants and vowel lengths. By contrast, aligning beats with syllables, morae, or vowel length alone yields inconsistent results. Moreover, we also show that Bora drummed messages conventionally select rhythmically distinct markers to further distinguish words. The two phonological tones represented in drummed speech encode only a few lexical contrasts. Rhythm thus appears to contribute crucially to the intelligibility of drummed Bora. Our study provides novel evidence for the role of rhythmic structures composed of vowel-to-vowel intervals in the complex puzzle concerning the redundancy and distinctiveness of acoustic features embedded in speech.
Feedforward and feedback control in apraxia of speech: effects of noise masking on vowel production.
Maas, Edwin; Mailend, Marja-Liisa; Guenther, Frank H
2015-04-01
This study was designed to test two hypotheses about apraxia of speech (AOS) derived from the Directions Into Velocities of Articulators (DIVA) model (Guenther et al., 2006): the feedforward system deficit hypothesis and the feedback system deficit hypothesis. The authors used noise masking to minimize auditory feedback during speech. Six speakers with AOS and aphasia, 4 with aphasia without AOS, and 2 groups of speakers without impairment (younger and older adults) participated. Acoustic measures of vowel contrast, variability, and duration were analyzed. Younger, but not older, speakers without impairment showed significantly reduced vowel contrast with noise masking. Relative to older controls, the AOS group showed longer vowel durations overall (regardless of masking condition) and a greater reduction in vowel contrast under masking conditions. There were no significant differences in variability. Three of the 6 speakers with AOS demonstrated the group pattern. Speakers with aphasia without AOS did not differ from controls in contrast, duration, or variability. The greater reduction in vowel contrast with masking noise for the AOS group is consistent with the feedforward system deficit hypothesis but not with the feedback system deficit hypothesis; however, effects were small and not present in all individual speakers with AOS. Theoretical implications and alternative interpretations of these findings are discussed.
Tsukada, Kimiko; Hirata, Yukari; Roengpitya, Rungpat
2014-06-01
The purpose of this research was to compare the perception of Japanese vowel length contrasts by 4 groups of listeners who differed in their familiarity with length contrasts in their first language (L1; i.e., American English, Italian, Japanese, and Thai). Of the 3 nonnative groups, native Thai listeners were expected to outperform American English and Italian listeners, because vowel length is contrastive in their L1. Native Italian listeners were expected to demonstrate a higher level of accuracy for length contrasts than American English listeners, because the former are familiar with consonant (but not vowel) length contrasts (i.e., singleton vs. geminate) in their L1. A 2-alternative forced-choice AXB discrimination test that included 125 trials was administered to all the participants, and the listeners' discrimination accuracy (d') was reported. As expected, Japanese listeners were more accurate than all 3 nonnative groups in their discrimination of Japanese vowel length contrasts. The 3 nonnative groups did not differ from one another in their discrimination accuracy despite varying experience with length contrasts in their L1. Only Thai listeners were more accurate in their length discrimination when the target vowel was long than when it was short. Being familiar with vowel length contrasts in L1 may affect the listeners' cross-language perception, but it does not guarantee that their L1 experience automatically results in efficient processing of length contrasts in unfamiliar languages. The extent of success may be related to how length contrasts are phonetically implemented in listeners' L1.
NASA Astrophysics Data System (ADS)
Monroe, Roberta Lynn
The intrinsic fundamental frequency effect among vowels is a vocalic phenomenon of adult speech in which high vowels have higher fundamental frequencies than low vowels. Acoustic investigations of children's speech have shown that variability of the speech signal decreases as children's ages increase. Fundamental frequency measures have been suggested as an indirect metric for the development of laryngeal stability and coordination. Studies of the intrinsic fundamental frequency effect have been conducted among 8- and 9-year-old children and in infants. The present study investigated this effect among 2- and 4-year-old children. Eight 2-year-old and eight 4-year-old children produced four vowels, /æ/, /i/, /u/, and /a/, in CVC syllables. Three measures of fundamental frequency were taken: mean fundamental frequency, the intra-utterance standard deviation of the fundamental frequency, and the extent to which the cycle-to-cycle pattern of the fundamental frequency was predicted by a linear trend. An analysis of variance was performed to compare the two age groups, the four vowels, and the earlier and later repetitions of the CVC syllables. A significant difference between the two age groups was detected using the intra-utterance standard deviation of the fundamental frequency. Mean fundamental frequencies and linear trend analysis showed that voicing of the preceding consonant determined the statistical significance of the age-group comparisons. Statistically significant differences among the fundamental frequencies of the four vowels were not detected for either age group.
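The three fundamental frequency measures used in this study are straightforward to compute from a cycle-to-cycle F0 track. A minimal sketch (the F0 extraction itself is assumed to be done elsewhere; the example track is invented):

```python
import numpy as np

def f0_measures(f0_track):
    """Given cycle-to-cycle F0 estimates (Hz) for one utterance, return
    the mean F0, the intra-utterance SD, and the proportion of variance
    in the cycle-to-cycle pattern explained by a linear trend (R^2)."""
    f0 = np.asarray(f0_track, dtype=float)
    t = np.arange(len(f0))
    slope, intercept = np.polyfit(t, f0, 1)
    fitted = slope * t + intercept
    ss_res = ((f0 - fitted) ** 2).sum()
    ss_tot = ((f0 - f0.mean()) ** 2).sum()
    return f0.mean(), f0.std(ddof=1), 1 - ss_res / ss_tot

mean_f0, sd_f0, r2 = f0_measures([312, 315, 310, 308, 305, 303, 301, 298])
print(f"mean={mean_f0:.1f} Hz, SD={sd_f0:.1f} Hz, linear-trend R^2={r2:.2f}")
```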
Speech evaluation after palatal augmentation in patients undergoing glossectomy.
de Carvalho-Teles, Viviane; Sennes, Luiz Ubirajara; Gielow, Ingrid
2008-10-01
To assess, in patients undergoing glossectomy, the influence of the palatal augmentation prosthesis on speech intelligibility and the acoustic spectrographic characteristics of the formants of oral vowels in Brazilian Portuguese, specifically the first 3 formants (F1 [/a,e,u/], F2 [/o,ó,u/], and F3 [/a,ó/]). Speech was evaluated with and without a palatal augmentation prosthesis using blinded, randomized listener judgments at a tertiary referral center. Thirty-six patients (33 men and 3 women) aged 30 to 80 (mean [SD], 53.9 [10.5]) years underwent glossectomy (14, total glossectomy; 12, total glossectomy and partial mandibulectomy; 6, hemiglossectomy; and 4, subtotal glossectomy) with use of the augmentation prosthesis for at least 3 months before inclusion in the study. Outcome measures were spontaneous speech intelligibility (assessed by expert listeners using a 4-category scale) and spectrographic formant assessment. We found a statistically significant improvement of spontaneous speech intelligibility and the average number of correctly identified syllables with the use of the prosthesis (P < .05). Statistically significant differences occurred for the F1 values of the vowels /a,e,u/; for F2 values, there was a significant difference for the vowels /o,ó,u/; and for F3 values, there was a significant difference for the vowels /a,ó/ (P < .001). The palatal augmentation prosthesis improved the intelligibility of spontaneous speech and syllables for patients who underwent glossectomy. It also increased the F2 and F3 values for all vowels and the F1 values for the vowels /o,ó,u/. This effect brought the values of many vowel formants closer to normal.
2015-01-01
Vowels provide the acoustic foundation of communication through speech and song, but little is known about how the brain orchestrates their production. Positron emission tomography was used to study regional cerebral blood flow (rCBF) during sustained production of the vowel /a/. Acoustic and blood flow data from 13 normal, right-handed, native speakers of American English were analyzed to identify CBF patterns that predicted the stability of the first and second formants of this vowel. Formants are bands of resonance frequencies that provide vowel identity and contribute to voice quality. The results indicated that formant stability was directly associated with blood flow increases and decreases in both left- and right-sided brain regions. Secondary brain regions (those associated with the regions predicting formant stability) were more likely to have an indirect negative relationship with first formant variability, but an indirect positive relationship with second formant variability. These results are not definitive maps of vowel production, but they do suggest that the level of motor control necessary to produce stable vowels is reflected in the complexity of an underlying neural system. These results also extend a systems approach to functional image analysis, previously applied to normal and ataxic speech rate, that is based solely on identifying patterns of brain activity associated with specific performance measures. Understanding the complex relationships between multiple brain regions and the acoustic characteristics of vocal stability may provide insight into the pathophysiology of the dysarthrias, vocal disorders, and other speech changes in neurological and psychiatric disorders. PMID:25295385
Masapollo, Matthew; Polka, Linda; Ménard, Lucie
2016-03-01
To learn to produce speech, infants must effectively monitor and assess their own speech output. Yet very little is known about how infants perceive speech produced by an infant, which has higher voice pitch and formant frequencies compared to adult or child speech. Here, we tested whether pre-babbling infants (at 4-6 months) prefer listening to vowel sounds with infant vocal properties over vowel sounds with adult vocal properties. A listening preference favoring infant vowels may derive from their higher voice pitch, which has been shown to attract infant attention in infant-directed speech (IDS). In addition, infants' nascent articulatory abilities may induce a bias favoring infant speech given that 4- to 6-month-olds are beginning to produce vowel sounds. We created infant and adult /i/ ('ee') vowels using a production-based synthesizer that simulates the act of speaking in talkers at different ages and then tested infants across four experiments using a sequential preferential listening task. The findings provide the first evidence that infants preferentially attend to vowel sounds with infant voice pitch and/or formants over vowel sounds with no infant-like vocal properties, supporting the view that infants' production abilities influence how they process infant speech. The findings with respect to voice pitch also reveal parallels between IDS and infant speech, raising new questions about the role of this speech register in infant development. Research exploring the underpinnings and impact of this perceptual bias can expand our understanding of infant language development. © 2015 John Wiley & Sons Ltd.
Acoustic analyses of thyroidectomy-related changes in vowel phonation.
Solomon, Nancy Pearl; Awan, Shaheen N; Helou, Leah B; Stojadinovic, Alexander
2012-11-01
Changes in vocal function that can occur after thyroidectomy were tracked with acoustic analyses of sustained vowel productions. The purpose was to determine which time-based or spectral/cepstral-based measures of two vowels were able to detect voice changes over time in patients undergoing thyroidectomy. The design was a prospective, longitudinal, observational clinical trial. Voice samples of sustained /ɑ/ and /i/ recorded from 70 adults before and approximately 2 weeks, 3 months, and 6 months after thyroid surgery were analyzed for jitter, shimmer, harmonics-to-noise ratio (HNR), cepstral peak prominence (CPP), low-to-high ratio of spectral energy (L/H ratio), and the standard deviations of CPP and L/H ratio. Three trained listeners rated vowel and sentence productions for the four data collection sessions for each participant. For analysis purposes, participants were categorized post hoc according to voice outcome (VO) at their first postthyroidectomy assessment session. Shimmer, HNR, and CPP differed significantly across sessions; follow-up analyses revealed the strongest effect for CPP. CPP for /ɑ/ and /i/ differed significantly between groups of participants with normal versus negative (adverse) VO and between the pre- and 2-week postthyroidectomy sessions for the negative VO group. HNR, CPP, and L/H ratio differed across vowels, but both /ɑ/ and /i/ were similarly effective in tracking voice changes over time and differentiating VO groups. This study indicated that shimmer, HNR, and CPP determined from vowel productions can be used to track changes in voice over time as patients undergo and subsequently recover from thyroid surgery, with CPP being the strongest variable for this purpose. Evidence did not clearly reveal whether acoustic voice evaluations should include both /ɑ/ and /i/ vowels, but they should specify which vowel is used to allow for comparisons across studies and multiple clinical assessments. Copyright © 2012 The Voice Foundation. All rights reserved.
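Of the acoustic measures above, cepstral peak prominence is the least standard to compute by hand. The sketch below follows one common formulation: the cepstrum of a windowed frame expressed in dB, with CPP taken as the height of the cepstral peak above a regression line over the plausible-F0 quefrency range. Window choice, dB convention, and search range vary across implementations and are assumptions here.

```python
import numpy as np

def cpp(frame, fs, f0_range=(60.0, 300.0)):
    """Cepstral peak prominence (dB) of one windowed vowel frame.

    The cepstrum is the inverse transform of the log-magnitude spectrum;
    CPP is the height of its peak (within the plausible F0 quefrency band)
    above a line fit by linear regression over that same band.
    """
    frame = frame * np.hanning(len(frame))
    log_spec = 20 * np.log10(np.abs(np.fft.fft(frame)) + 1e-12)
    ceps = np.real(np.fft.ifft(log_spec))        # real cepstrum
    ceps_db = 20 * np.log10(np.abs(ceps) + 1e-12)
    quef = np.arange(len(frame)) / fs            # quefrency in seconds
    lo, hi = int(fs / f0_range[1]), int(fs / f0_range[0])
    coef = np.polyfit(quef[lo:hi], ceps_db[lo:hi], 1)
    trend = np.polyval(coef, quef[lo:hi])
    peak = np.argmax(ceps_db[lo:hi])
    return ceps_db[lo:hi][peak] - trend[peak]

# Example: a crude vowel-like signal (100 Hz pulse train with decay).
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
signal = np.sign(np.sin(2 * np.pi * 100 * t)) * np.exp(-5 * t)
print(f"CPP ~ {cpp(signal, fs):.1f} dB")
```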
Rødvik, Arne Kirkhorn; von Koss Torkildsen, Janne; Wie, Ona Bø; Storaker, Marit Aarvaag; Silvola, Juha Tapio
2018-04-17
The purpose of this systematic review and meta-analysis was to establish a baseline of the vowel and consonant identification scores in prelingually and postlingually deaf users of multichannel cochlear implants (CIs) tested with consonant-vowel-consonant and vowel-consonant-vowel nonsense syllables. Six electronic databases were searched for peer-reviewed articles reporting consonant and vowel identification scores in CI users measured by nonsense words. Relevant studies were independently assessed and screened by 2 reviewers. Consonant and vowel identification scores were presented in forest plots and compared between studies in a meta-analysis. Forty-seven articles with 50 studies, including 647 participants (581 postlingually deaf and 66 prelingually deaf), met the inclusion criteria of this study. The mean performance on vowel identification tasks was higher for the postlingually deaf CI users (76.8%; N = 5) than for the prelingually deaf CI users (67.7%; N = 1). The mean performance on consonant identification tasks was likewise higher for the postlingually deaf CI users (58.4%; N = 44) than for the prelingually deaf CI users (46.7%; N = 6). The most common consonant confusions were found between those with the same manner of articulation (/k/ as /t/, /m/ as /n/, and /p/ as /t/). However, the differences between the scores for prelingually and postlingually deaf CI users were not statistically significant. The consonants that were incorrectly identified were typically confused with other consonants with the same acoustic properties, namely, voicing, duration, nasality, and silent gaps. A univariate metaregression model, although not statistically significant, indicated that duration of implant use in postlingually deaf adults predicts a substantial portion of their consonant identification ability. As there is no ceiling effect, a nonsense syllable identification test may be a useful addition to the standard test battery in audiology clinics when assessing the speech perception of CI users.
The Glasgow Voice Memory Test: Assessing the ability to memorize and recognize unfamiliar voices.
Aglieri, Virginia; Watson, Rebecca; Pernet, Cyril; Latinus, Marianne; Garrido, Lúcia; Belin, Pascal
2017-02-01
One thousand one hundred and twenty subjects, as well as a developmental phonagnosic subject (KH) and age-matched controls, performed the Glasgow Voice Memory Test, which assesses the ability to encode and immediately recognize, through an old/new judgment, both unfamiliar voices (delivered as vowels, making language requirements minimal) and bell sounds. The inclusion of non-vocal stimuli allows the detection of significant dissociations between the two categories (vocal vs. non-vocal stimuli). The distributions of accuracy and sensitivity scores (d') reflected a wide range of individual differences in voice recognition performance in the population. As expected, KH showed a dissociation between the recognition of voices and bell sounds, her performance being significantly poorer than that of matched controls for voices but not for bells. By providing normative data from a large sample and by testing a developmental phonagnosic subject, we demonstrated that the Glasgow Voice Memory Test, available online and accessible from all over the world, can be a valid screening tool (~5 min) for preliminary detection of potential cases of phonagnosia and of "super recognizers" for voices.
Analysis of Spanish consonant recognition in 8-talker babble.
Moreno-Torres, Ignacio; Otero, Pablo; Luna-Ramírez, Salvador; Garayzábal Heinze, Elena
2017-05-01
This paper presents the results of a closed-set recognition task for 80 Spanish consonant-vowel sounds (16 C × 5 V, spoken by 2 talkers) in 8-talker babble (-6, -2, +2 dB). A ranking of resistance to noise was obtained using the signal detection d' measure, and confusion patterns were analyzed using a graphical method (confusion graphs). The resulting ranking indicated the existence of three resistance groups: (1) high resistance: /ʧ, s, ʝ/; (2) mid resistance: /r, l, m, n/; and (3) low resistance: /t, θ, x, ɡ, b, d, k, f, p/. Confusions involved mostly place of articulation and voicing errors, and occurred especially among consonants in the same resistance group. Three perceptual confusion groups were identified: the three low-energy fricatives (i.e., /f, θ, x/), the six stops (i.e., /p, t, k, b, d, ɡ/), and three consonants with clear formant structure (i.e., /m, n, l/). The factors underlying consonant resistance and confusion patterns are discussed. The results are compared with data from other languages.
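The signal detection measure behind the resistance ranking can be sketched generically: d' is the difference between the z-transformed hit and false-alarm rates, with a standard correction so that extreme proportions stay finite. The per-consonant bookkeeping and the example counts below are assumptions, not the study's data.

```python
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate), with the common 1/(2N)
    correction so that rates of exactly 0 or 1 stay finite."""
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    h = min(max(hits / n_signal, 1 / (2 * n_signal)), 1 - 1 / (2 * n_signal))
    f = min(max(false_alarms / n_noise, 1 / (2 * n_noise)), 1 - 1 / (2 * n_noise))
    return norm.ppf(h) - norm.ppf(f)

# Example: a consonant correctly reported 45/60 times, and falsely
# reported 30/900 times when another consonant was presented.
print(f"d' = {d_prime(45, 15, 30, 870):.2f}")
```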
Shuai, Lan; Malins, Jeffrey G
2017-02-01
Despite being one of the most highly influential models of spoken word recognition, the TRACE model has yet to be extended to consider tonal languages such as Mandarin Chinese. A key reason for this is that the model in its current state does not encode lexical tone. In this report, we present a modified version of the jTRACE model in which we built on its existing architecture to code for Mandarin phonemes and tones. Units are coded in a way that is meant to capture the similarity in timing of access to vowel and tone information that has been observed in previous studies of Mandarin spoken word recognition. We validated the model by first simulating a recent experiment that had used the visual world paradigm to investigate how native Mandarin speakers process monosyllabic Mandarin words (Malins & Joanisse, 2010). We then subsequently simulated two psycholinguistic phenomena: (1) differences in the timing of resolution of tonal contrast pairs, and (2) the interaction between syllable frequency and tonal probability. In all cases, the model gave rise to results comparable to those of published data with human subjects, suggesting that it is a viable working model of spoken word recognition in Mandarin. It is our hope that this tool will be of use to practitioners studying the psycholinguistics of Mandarin Chinese and will help inspire similar models for other tonal languages, such as Cantonese and Thai.
The effect of implied orientation derived from verbal context on picture recognition.
Stanfield, R A; Zwaan, R A
2001-03-01
Perceptual symbol systems assume an analogue relationship between a symbol and its referent, whereas amodal symbol systems assume an arbitrary relationship between a symbol and its referent. According to perceptual symbol theories, the complete representation of an object, called a simulation, should reflect physical characteristics of the object. Amodal theories, in contrast, do not make this prediction. We tested the hypothesis, derived from perceptual symbol theories, that people mentally represent the orientation of an object implied by a verbal description. Orientation (vertical-horizontal) was manipulated by having participants read a sentence that implicitly suggested a particular orientation for an object. Then recognition latencies to pictures of the object in each of the two orientations were measured. Pictures matching the orientation of the object implied by the sentence were responded to faster than pictures that did not match the orientation. This finding is interpreted as offering support for theories positing perceptual symbol systems.
NASA Astrophysics Data System (ADS)
Zhang, Qiang; Li, Jiafeng; Zhuo, Li; Zhang, Hui; Li, Xiaoguang
2017-12-01
Color is one of the most stable attributes of vehicles and is often used as a valuable cue in important applications. Complex environmental factors, such as illumination, weather, and noise, produce considerable diversity in the visual characteristics of vehicle color, making vehicle color recognition in complex environments a challenging task. State-of-the-art methods roughly take the whole image for color recognition, but many parts of the image, such as car windows, wheels, and background, contain no color information, which has a negative impact on recognition accuracy. In this paper, a novel vehicle color recognition method using local vehicle-color saliency detection and dual-orientational dimensionality reduction of convolutional neural network (CNN) deep features is proposed. The novelty of the proposed method includes two parts: (1) a local vehicle-color saliency detection method is proposed to determine the vehicle color region of the vehicle image and exclude the influence of non-color regions on recognition accuracy; (2) a dual-orientational dimensionality reduction strategy is designed to greatly reduce the dimensionality of the deep features learnt from the CNN, which mitigates the storage and computational burden of subsequent processing while improving recognition accuracy. Furthermore, a linear support vector machine is adopted as the classifier, trained on the dimensionality-reduced features to obtain the recognition model. Experimental results on a public dataset demonstrate that the proposed method achieves recognition performance superior to state-of-the-art methods.
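As a rough illustration of the second part of the pipeline, the sketch below applies a generic two-directional (row-wise and column-wise) PCA-style reduction to 2D feature maps and feeds the result to a linear SVM. This is a stand-in in the spirit of (2D)²PCA, not the paper's exact dual-orientational strategy, and all dimensions, parameters, and data are invented.

```python
import numpy as np
from sklearn.svm import LinearSVC

def fit_two_directional_pca(feature_maps, r, k):
    """Learn row- and column-projection matrices from 2D feature maps
    (shape: n_samples x h x w), reducing along both orientations."""
    X = np.asarray(feature_maps, dtype=float)
    M = X.mean(0)
    Xc = X - M
    g_col = np.einsum('nhw,nhv->wv', Xc, Xc) / len(X)  # column scatter (w x w)
    g_row = np.einsum('nhw,ngw->hg', Xc, Xc) / len(X)  # row scatter (h x h)
    W = np.linalg.eigh(g_col)[1][:, -k:]  # top-k column directions
    U = np.linalg.eigh(g_row)[1][:, -r:]  # top-r row directions
    return M, U, W

def project(feature_maps, M, U, W):
    """Reduce each map to U^T (X - M) W and flatten for the classifier."""
    Xc = np.asarray(feature_maps, dtype=float) - M
    return np.einsum('hr,nhw,wk->nrk', U, Xc, W).reshape(len(Xc), -1)

# Toy demo with random "deep feature maps" and two color classes.
rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 14, 14)) + np.linspace(0, 1, 14)
labels = rng.integers(0, 2, 200)
feats[labels == 1] += 0.5                     # synthetic class difference
M, U, W = fit_two_directional_pca(feats, r=4, k=4)
clf = LinearSVC().fit(project(feats, M, U, W), labels)
print("train accuracy:", clf.score(project(feats, M, U, W), labels))
```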
Effects of prosodic boundary on /aC/ sequences: articulatory results
NASA Astrophysics Data System (ADS)
Tabain, Marija
2003-05-01
This study presents EMA (electromagnetic articulography) data on articulation of the vowel /a/ at different prosodic boundaries in French. Three speakers of metropolitan French produced utterances containing the vowel /a/, preceded by /t/ and followed by one of six consonants /b, d, ɡ, f, s, ʃ/ (three stops and three fricatives), with different prosodic boundaries intervening between the /a/ and the six different consonants. The prosodic boundaries investigated are the Utterance, the Intonational phrase, the Accentual phrase, and the Word. Data for the Tongue Tip, Tongue Body, and Jaw are presented. The articulatory data presented here were recorded at the same time as the acoustic data presented in Tabain [J. Acoust. Soc. Am. 113, 516-531 (2003)]. Analyses show that there is a strong effect on peak displacement of the vowel according to the prosodic hierarchy, with the stronger prosodic boundaries inducing a much lower Tongue Body and Jaw position than the weaker prosodic boundaries. Durations of both the opening movement into and the closing movement out of the vowel are also affected. Peak velocity of the articulatory movements is also examined, and, contrary to results for phrase-final lengthening, it is found that peak velocity of the opening movement into the vowel tends to increase with the higher prosodic boundaries, together with the increased magnitude of the movement between the consonant and the vowel. Results for the closing movement out of the vowel and into the consonant are not so clear. Since one speaker shows evidence of utterance-level articulatory declension, it is suggested that the competing constraints of articulatory declension and prosodic effects might explain some previous results on phrase-final lengthening.
Deme, Andrea
2017-03-01
High-pitched sung vowels may be considered phonetically "underspecified" because of (i) the tuning of F1 to f0 that accompanies pitch raising and (ii) the wide harmonic spacing of the voice source, which results in undersampling of the vocal tract transfer function. Therefore, sung vowel intelligibility is expected to decrease as f0 increases. Based on the literature of speech perception, it is often suggested that sung vowels are better perceived if uttered in consonantal (CVC) context than in isolation, even at high f0. The results for singing, however, are contradictory. In the present study, we further investigate this question. We compare vowel identification in sense and nonsense CVC sequences and show that the positive effect of the context disappears if the number of legal choices in a perception test is similar in both conditions, meaning that any positive effect of the CVC context may only stem from the smaller number of possible responses, i.e., from higher probabilities. Additionally, it is also tested whether training in production (i.e., singing training) may lead to a perceptual advantage of singers over nonsingers in the identification of high-pitched sung vowels. The results show no advantage of this kind. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Effects of stimulus response compatibility on covert imitation of vowels.
Adank, Patti; Nuttall, Helen; Bekkering, Harold; Maegherman, Gwijde
2018-03-13
When we observe someone else speaking, we tend to automatically activate the corresponding speech motor patterns. When listening, we therefore covertly imitate the observed speech. Simulation theories of speech perception propose that covert imitation of speech motor patterns supports speech perception. Covert imitation of speech has been studied with interference paradigms, including the stimulus-response compatibility (SRC) paradigm. The SRC paradigm measures covert imitation by comparing articulation of a prompt following exposure to a distracter. Responses tend to be faster for congruent than for incongruent distracters, thus showing evidence of covert imitation. Simulation accounts propose a key role for covert imitation in speech perception. However, covert imitation has thus far only been demonstrated for a select class of speech sounds, namely consonants, and it is unclear whether covert imitation extends to vowels. In two experiments, we aimed to demonstrate that covert imitation effects, as measured with the SRC paradigm, extend to vowels. We examined whether covert imitation occurs for vowels in a consonant-vowel-consonant context in visual, audio, and audiovisual modalities. We presented the prompt at four time points to examine how covert imitation varied over the distracter's duration. The results of both experiments clearly demonstrated covert imitation effects for vowels, thus supporting simulation theories of speech perception. Covert imitation was not affected by stimulus modality and was maximal for later time points.
A Formant Range Profile for Singers.
Titze, Ingo R; Maxfield, Lynn M; Walker, Megan C
2017-05-01
Vowel selection is important in differentiating between singing styles. The timbre of the vocal instrument, which is related to its frequency spectrum, is governed by both the glottal sound source and the vowel choices made by singers. Consequently, the ability to modify the vowel space is a measure of how successfully a singer can maintain a desired timbre across a range of pitches. Formant range profiles were produced as a means of quantifying this ability. Seventy-seven subjects (including trained and untrained vocalists) participated, producing vowels with three intended mouth shapes: (1) neutral or speech-like, (2) megaphone-shaped (wide open mouth), and (3) inverted-megaphone-shaped (widened oropharynx with moderate mouth opening). The first and second formant frequencies (F1 and F2) were estimated with fry phonation for each shape and values were plotted in F1-F2 space. By taking four vowels of a quadrangle /i, æ, a, u/, the resulting area was quantified in kHz² (kHz squared) as a measure of the subject's ability to modify their vocal tract for spectral differences. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
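The quadrangle area measure is simple to reproduce: take each corner vowel's (F1, F2) point in kHz and apply the shoelace formula. A minimal sketch with invented formant values:

```python
def vowel_space_area(points_khz):
    """Shoelace area (kHz^2) of a polygon whose vertices are (F1, F2)
    pairs in kHz, given in order around the quadrangle."""
    area = 0.0
    n = len(points_khz)
    for i in range(n):
        x1, y1 = points_khz[i]
        x2, y2 = points_khz[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

# Illustrative (not measured) F1/F2 values for /i, ae, a, u/ in kHz.
corners = [(0.30, 2.30), (0.66, 1.70), (0.75, 1.10), (0.35, 0.90)]
print(f"FRP area ~ {vowel_space_area(corners):.2f} kHz^2")
```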
Lip movements affect infants' audiovisual speech perception.
Yeung, H Henny; Werker, Janet F
2013-05-01
Speech is robustly audiovisual from early in infancy. Here we show that audiovisual speech perception in 4.5-month-old infants is influenced by sensorimotor information related to the lip movements they make while chewing or sucking. Experiment 1 consisted of a classic audiovisual matching procedure, in which two simultaneously displayed talking faces (visual [i] and [u]) were presented with a synchronous vowel sound (audio /i/ or /u/). Infants' looking patterns were selectively biased away from the audiovisual matching face when the infants were producing lip movements similar to those needed to produce the heard vowel. Infants' looking patterns returned to those of a baseline condition (no lip movements, looking longer at the audiovisual matching face) when they were producing lip movements that did not match the heard vowel. Experiment 2 confirmed that these sensorimotor effects interacted with the heard vowel, as looking patterns differed when infants produced these same lip movements while seeing and hearing a talking face producing an unrelated vowel (audio /a/). These findings suggest that the development of speech perception and speech production may be mutually informative.
Sparseness of vowel category structure: Evidence from English dialect comparison
Scharinger, Mathias; Idsardi, William J.
2014-01-01
Current models of speech perception tend to emphasize either fine-grained acoustic properties or coarse-grained abstract characteristics of speech sounds. We argue for a particular kind of 'sparse' vowel representation and provide new evidence that these representations account for the successful access of the corresponding categories. In an auditory semantic priming experiment, American English listeners made lexical decisions on targets (e.g. load) preceded by semantically related primes (e.g. pack). Changes of the prime vowel that crossed a vowel-category boundary (e.g. peck) were not treated as a tolerable variation, as assessed by a lack of priming, although the phonetic categories of the two different vowels considerably overlap in American English. Compared to the outcome of the same experiment with New Zealand English listeners, where such prime variations were tolerated, our experiment supports the view that phonological representations are important in guiding the mapping process from the acoustic signal to an abstract mental representation. Our findings are discussed with regard to current models of speech perception and recent findings from brain imaging research. PMID:24653528
Lexical reorganization in Brazilian Portuguese: an articulatory study
Meireles, A. R.; Barbosa, P. A.
2008-01-01
This work, which is couched in the theoretical framework of Articulatory Phonology, deals with the influence of speech rate on the change/variation from antepenultimate-stress words to penultimate-stress words in Brazilian Portuguese. Both acoustic and articulatory (EMMA) studies were conducted. On the acoustic side, results show different patterns of post-stressed vowel reduction according to word type: some words reduced their medial post-stressed vowels more than their final post-stressed vowels, and others showed the reverse pattern. On the articulatory side, results show that the coarticulation degree of the post-stressed consonants increases with speech rate. Also, articulatory measurements using a measure called the proportional consonantal interval (PCI) showed that this measure is influenced by word type; three different groups of words were found according to their PCI. These results show how dynamical aspects influenced by speech rate increase are related to the lexical process of change/variation from antepenultimate-stress words to penultimate ones. PMID:19885366
Coordinate Transformations in Object Recognition
ERIC Educational Resources Information Center
Graf, Markus
2006-01-01
A basic problem of visual perception is how human beings recognize objects after spatial transformations. Three central classes of findings have to be accounted for: (a) Recognition performance varies systematically with orientation, size, and position; (b) recognition latencies are sequentially additive, suggesting analogue transformation…
NASA Astrophysics Data System (ADS)
Fernández, Ariel; Ferrari, José A.
2017-05-01
Pattern recognition and feature extraction are image processing applications of great interest in defect inspection and robot vision, among others. In comparison to purely digital methods, the attractiveness of optical processors for pattern recognition lies in their highly parallel operation and real-time processing capability. This work presents an optical implementation of the generalized Hough transform (GHT), a well-established technique for recognition of geometrical features in binary images. Detection of a geometric feature under the GHT is accomplished by mapping the original image to an accumulator space; the large computational requirements of this mapping make the optical implementation an attractive alternative to digital-only methods. We explore an optical setup where the transformation is obtained and the size and orientation parameters can be controlled, allowing for dynamic scale- and orientation-variant pattern recognition. A compact system for the above purposes results from the use of an electrically tunable lens for scale control and a pupil mask implemented on a high-contrast spatial light modulator for orientation/shape variation of the template. Real-time operation can also be achieved. In addition, by thresholding the GHT and optically inverse transforming, the previously detected features of interest can be extracted.
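The accumulator-space mapping at the heart of the GHT can be sketched digitally in a few lines. Edge points and gradient angles are assumed to come from a separate edge detector, and this is the classic digital formulation rather than the optical implementation explored in the paper.

```python
import numpy as np

def build_r_table(template_edges, grad_angles, ref_point, n_bins=32):
    """R-table: for each quantized gradient angle, the displacement
    vectors from template edge points to the shape's reference point."""
    table = [[] for _ in range(n_bins)]
    for (y, x), ang in zip(template_edges, grad_angles):
        b = int(((ang % (2 * np.pi)) / (2 * np.pi)) * n_bins) % n_bins
        table[b].append((ref_point[0] - y, ref_point[1] - x))
    return table

def ght_votes(image_edges, grad_angles, table, shape, n_bins=32):
    """Accumulate votes; peaks mark candidate reference-point locations."""
    acc = np.zeros(shape, dtype=np.int32)
    for (y, x), ang in zip(image_edges, grad_angles):
        b = int(((ang % (2 * np.pi)) / (2 * np.pi)) * n_bins) % n_bins
        for dy, dx in table[b]:
            ry, rx = y + dy, x + dx
            if 0 <= ry < shape[0] and 0 <= rx < shape[1]:
                acc[ry, rx] += 1
    return acc

# Toy demo: a 3-point "template" detected at an offset in an "image".
t_edges = [(0, 0), (0, 4), (4, 2)]
t_angles = [0.0, 1.0, 2.0]
table = build_r_table(t_edges, t_angles, ref_point=(2, 2))
i_edges = [(10, 10), (10, 14), (14, 12)]           # same shape, shifted
acc = ght_votes(i_edges, t_angles, table, shape=(20, 20))
print(np.unravel_index(acc.argmax(), acc.shape))   # -> (12, 12)
```

Scale and orientation variants, handled optically in the paper, would add dimensions to this accumulator in the digital formulation.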
NASA Astrophysics Data System (ADS)
Several articles addressing topics in speech research are presented. The topics include: exploring the functional significance of physiological tremor: a biospectroscopic approach; differences between experienced and inexperienced listeners to deaf speech; a language-oriented view of reading and its disabilities; phonetic factors in letter detection; categorical perception; short-term recall by deaf signers of American Sign Language; a common basis for auditory sensory storage in perception and immediate memory; phonological awareness and verbal short-term memory; initiation versus execution time during manual and oral counting by stutterers; trading relations in the perception of speech by five-year-old children; the role of the strap muscles in pitch lowering; phonetic validation of distinctive features; consonants and syllable boundaries; and vowel information in postvocalic frictions.
Smits, Cas; Merkus, Paul; Festen, Joost M.; Goverts, S. Theo
2017-01-01
Not all of the variance in speech-recognition performance of cochlear implant (CI) users can be explained by biographic and auditory factors. In normal-hearing listeners, linguistic and cognitive factors determine most of speech-in-noise performance. The current study specifically explored the influence of visually measured lexical-access ability, compared with other cognitive factors, on the speech recognition of 24 postlingually deafened CI users. Speech-recognition performance was measured with monosyllables in quiet (consonant-vowel-consonant [CVC]), sentences-in-noise (SIN), and digit-triplets in noise (DIN). In addition to a composite variable of lexical-access ability (LA), measured with a lexical-decision test (LDT) and word-naming task, vocabulary size, working-memory capacity (Reading Span test [RSpan]), and a visual analogue of the SIN test (text reception threshold test) were measured. The DIN test was used to correct for auditory factors in SIN thresholds by taking the difference between SIN and DIN: SRTdiff. Correlation analyses revealed that duration of hearing loss (dHL) was related to SIN thresholds. Better working-memory capacity was related to SIN and SRTdiff scores. LDT reaction time was positively correlated with SRTdiff scores. No significant relationships were found for CVC or DIN scores with the predictor variables. Regression analyses showed that together with dHL, RSpan explained 55% of the variance in SIN thresholds. When controlling for auditory performance, LA, LDT, and RSpan separately explained, together with dHL, respectively 37%, 36%, and 46% of the variance in SRTdiff outcome. The results suggest that poor verbal working-memory capacity and to a lesser extent poor lexical-access ability limit speech-recognition ability in listeners with a CI. PMID:29205095
Sex differences in face processing are mediated by handedness and sexual orientation.
Brewster, Paul W H; Mullin, Caitlin R; Dobrin, Roxana A; Steeves, Jennifer K E
2011-03-01
Previous research has demonstrated sex differences in face processing at both neural and behavioural levels. The present study examined the role of handedness and sexual orientation as mediators of this effect. We compared the performance of LH (left-handed) and RH (right-handed) heterosexual and homosexual male and female participants on a face recognition memory task. Our main findings were that homosexual males have better face recognition memory than both heterosexual males and homosexual women. We also demonstrate better face processing in women than in men. Finally, LH heterosexual participants had better face recognition than LH homosexual participants and also tended to be better than RH heterosexual participants. These findings are consistent with differences in the organisation and laterality of face-processing mechanisms as a function of sex, handedness, and sexual orientation.
Language comprehenders retain implied shape and orientation of objects.
Pecher, Diane; van Dantzig, Saskia; Zwaan, Rolf A; Zeelenberg, René
2009-06-01
According to theories of embodied cognition, language comprehenders simulate sensorimotor experiences to represent the meaning of what they read. Previous studies have shown that picture recognition is better if the object in the picture matches the orientation or shape implied by a preceding sentence. In order to test whether strategic imagery may explain previous findings, language comprehenders first read a list of sentences in which objects were mentioned. Only once the complete list had been read was recognition memory tested with pictures. Recognition performance was better if the orientation or shape of the object matched that implied by the sentence, both immediately after reading the complete list of sentences and after a 45-min delay. These results suggest that previously found match effects were not due to strategic imagery and show that details of sensorimotor simulations are retained over longer periods.
ERIC Educational Resources Information Center
ten Holt, G. A.; van Doorn, A. J.; de Ridder, H.; Reinders, M. J. T.; Hendriks, E. A.
2009-01-01
We present the results of an experiment on lexical recognition of human sign language signs in which the available perceptual information about handshape and hand orientation was manipulated. Stimuli were videos of signs from Sign Language of the Netherlands (SLN). The videos were processed to create four conditions: (1) one in which neither…
Digitized Speech Characteristics in Patients with Maxillectomy Defects.
Elbashti, Mahmoud E; Sumita, Yuka I; Hattori, Mariko; Aswehlee, Amel M; Taniguchi, Hisashi
2017-12-06
Accurate evaluation of speech characteristics through formant frequency measurement is important for proper speech rehabilitation in patients after maxillectomy. This study aimed to evaluate the utility of digital acoustic analysis and vowel pentagon space for the prediction of speech ability after maxillectomy, by comparing the acoustic characteristics of vowel articulation in three classes of maxillectomy defects. Aramany's classifications I, II, and IV were used to group 27 male patients after maxillectomy. Digital acoustic analysis of five Japanese vowels, /a/, /e/, /i/, /o/, and /u/, was performed using a speech analysis system. First formant (F1) and second formant (F2) frequencies were calculated using an autocorrelation method. Data were plotted on an F1-F2 plane for each patient, and the F1 and F2 ranges were calculated. The vowel pentagon spaces were also determined. One-way ANOVA was applied to compare all results between the three groups. Class II maxillectomy patients had a significantly higher F2 range than did Class I and Class IV patients (p = 0.002). In contrast, there was no significant difference in the F1 range between the three classes. The vowel pentagon spaces were significantly larger in Class II maxillectomy patients than in Class I and Class IV patients (p = 0.014). The results of this study indicate that the acoustic characteristics of maxillectomy patients are affected by the defect area. This finding may provide information for obturator design based on vowel articulation and defect class. © 2017 by the American College of Prosthodontists.
Acoustic and Perceptual Analyses of Adductor Spasmodic Dysphonia in Mandarin-speaking Chinese.
Chen, Zhipeng; Li, Jingyuan; Ren, Qingyi; Ge, Pingjiang
2018-02-12
The objective of this study was to examine the perceptual structure and acoustic characteristics of speech of patients with adductor spasmodic dysphonia (ADSD) in Mandarin. In this case-control study, perceptual and acoustic analyses were performed for patients with ADSD (N = 20) and a control group (N = 20) of Mandarin-Chinese speakers. For both subgroups, a sustained vowel and connected speech samples were obtained, and the differences in perceptual and acoustic parameters between the two subgroups were assessed. For acoustic assessment, the percentage of phonatory breaks (PBs) in connected reading and the percentages of aperiodic segments and frequency shifts (FSs) in vowel and reading productions were significantly worse in patients with ADSD than in controls, as were the mean harmonics-to-noise ratio and the fundamental frequency standard deviation of the vowel. For perceptual evaluation, the ratings of speech and vowel in patients with ADSD were significantly higher than in controls. The percentage of aberrant acoustic events (PB, FS, and aperiodic segment), the fundamental frequency standard deviation, and the mean harmonics-to-noise ratio were significantly correlated with the perceptual rating of the vowel and reading productions. The perceptual and acoustic parameters of connected vowel and reading in patients with ADSD are worse than those in normal controls and can validly and reliably estimate the dysphonia of ADSD in Mandarin-speaking Chinese. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
Larson, Eric; Terry, Howard P; Canevari, Margaux M; Stepp, Cara E
2013-01-01
Human-machine interface (HMI) designs offer the possibility of improving quality of life for patient populations as well as augmenting normal user function. Despite pragmatic benefits, auditory feedback for HMI control remains underutilized, in part due to observed limitations in effectiveness. The goal of this study was to determine the extent to which categorical speech perception could be used to improve an auditory HMI. Using surface electromyography, 24 healthy speakers of American English participated in 4 sessions to learn to control an HMI using auditory feedback (provided via vowel synthesis). Participants trained on 3 targets in sessions 1-3 and were tested on 3 novel targets in session 4. An "established categories with text cues" group of eight participants was trained and tested on auditory targets corresponding to standard American English vowels using auditory and text target cues. An "established categories without text cues" group of eight participants was trained and tested on the same targets using only auditory cuing of target vowel identity. A "new categories" group of eight participants was trained and tested on targets that corresponded to vowel-like sounds not part of American English. Analyses of user performance revealed significant effects of session and group (established categories groups vs. the new categories group), and a trend for an interaction between session and group. Results suggest that auditory feedback can be effectively used for HMI operation when paired with established categorical (native vowel) targets with an unambiguous cue.
Biomechanically Preferred Consonant-Vowel Combinations Fail to Appear in Adult Spoken Corpora
Whalen, D. H.; Giulivi, Sara; Nam, Hosung; Levitt, Andrea G.; Hallé, Pierre; Goldstein, Louis M.
2012-01-01
Certain consonant/vowel (CV) combinations are more frequent than would be expected from the individual C and V frequencies alone, both in babbling and, to a lesser extent, in adult language, based on dictionary counts: Labial consonants co-occur with central vowels more often than chance would dictate; coronals co-occur with front vowels, and velars with back vowels (Davis & MacNeilage, 1994). Plausible biomechanical explanations have been proposed, but it is also possible that infants are mirroring the frequency of the CVs that they hear. As noted, previous assessments of adult language were based on dictionaries; these “type” counts are incommensurate with the babbling measures, which are necessarily “token” counts. We analyzed the tokens in two spoken corpora for English, two for French and one for Mandarin. We found that the adult spoken CV preferences correlated with the type counts for Mandarin and French, not for English. Correlations between the adult spoken corpora and the babbling results had all three possible outcomes: significantly positive (French), uncorrelated (Mandarin), and significantly negative (English). There were no correlations of the dictionary data with the babbling results when we consider all nine combinations of consonants and vowels. The results indicate that spoken frequencies of CV combinations can differ from dictionary (type) counts and that the CV preferences apparent in babbling are biomechanically driven and can ignore the frequencies of CVs in the ambient spoken language. PMID:23420980
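The "more frequent than expected" comparison reduces to an observed-to-expected ratio over a consonant-class by vowel-class token count table, where the expectation assumes independence of C and V frequencies. The count matrix below is invented for illustration:

```python
import numpy as np

# Invented token counts: rows = labial, coronal, velar consonant classes;
# columns = front, central, back vowel classes.
counts = np.array([[120, 260, 130],
                   [480, 300, 220],
                   [ 90, 110, 190]], dtype=float)

row = counts.sum(axis=1, keepdims=True)
col = counts.sum(axis=0, keepdims=True)
expected = row * col / counts.sum()   # independence baseline
oe_ratio = counts / expected          # >1: CV pair occurs above chance
print(oe_ratio.round(2))
```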
Characterizing resonant component in speech: A different view of tracking fundamental frequency
NASA Astrophysics Data System (ADS)
Dong, Bin
2017-05-01
Inspired by the nonlinearity, nonstationarity, and modulations of speech, the Hilbert-Huang Transform and cyclostationarity analysis are employed in sequence to investigate speech resonance in vowels. Cyclostationarity analysis is not applied directly to the target vowel, but to its intrinsic mode functions one by one. Thanks to the equivalence between the fundamental frequency in speech and the cyclic frequency in cyclostationarity analysis, the modulation intensity distributions of the intrinsic mode functions provide much information for estimating the fundamental frequency. To highlight the relationship between frequency and time, the pseudo-Hilbert spectrum is proposed here to replace the Hilbert spectrum. Contrasting the pseudo-Hilbert spectra and the modulation intensity distributions of the intrinsic mode functions shows that there is usually one intrinsic mode function that acts as the fundamental component of the vowel. Furthermore, the fundamental frequency of the vowel can be determined by tracing the pseudo-Hilbert spectrum of its fundamental component along the time axis. The latter method is more robust for estimating the fundamental frequency in the presence of nonlinear components. Two vowels, [a] and [i], taken from the speech database FAU Aibo Emotion Corpus, are used to validate the above findings.
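Tracing a fundamental component along the time axis can be sketched with an analytic-signal approach: once an intrinsic mode function has been identified as the fundamental component (the empirical mode decomposition step is assumed to be done elsewhere), its instantaneous frequency follows from the derivative of the unwrapped Hilbert phase. The chirp below is a synthetic stand-in for such a component.

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_f0(imf, fs):
    """Instantaneous frequency (Hz) of a (presumed) fundamental-component
    IMF, from the unwrapped phase of its analytic signal."""
    phase = np.unwrap(np.angle(hilbert(imf)))
    return np.diff(phase) * fs / (2 * np.pi)

# Example: a chirp-like "fundamental component" gliding 110 -> 150 Hz.
fs = 16000
t = np.arange(int(0.3 * fs)) / fs
imf = np.sin(2 * np.pi * (110 * t + (40 / (2 * 0.3)) * t ** 2))
f0 = instantaneous_f0(imf, fs)
print(f"start ~ {np.median(f0[:800]):.0f} Hz, end ~ {np.median(f0[-800:]):.0f} Hz")
```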
DIMENSION-BASED STATISTICAL LEARNING OF VOWELS
Liu, Ran; Holt, Lori L.
2015-01-01
Speech perception depends on long-term representations that reflect regularities of the native language. However, listeners rapidly adapt when speech acoustics deviate from these regularities due to talker idiosyncrasies such as foreign accents and dialects. To better understand these dual aspects of speech perception, we probe native English listeners’ baseline perceptual weighting of two acoustic dimensions (spectral quality and vowel duration) towards vowel categorization and examine how they subsequently adapt to an “artificial accent” that deviates from English norms in the correlation between the two dimensions. At baseline, listeners rely relatively more on spectral quality than vowel duration to signal vowel category, but duration nonetheless contributes. Upon encountering an “artificial accent” in which the spectral-duration correlation is perturbed relative to English language norms, listeners rapidly down-weight reliance on duration. Listeners exhibit this type of short-term statistical learning even in the context of nonwords, confirming that lexical information is not necessary to this form of adaptive plasticity in speech perception. Moreover, learning generalizes to both novel lexical contexts and acoustically-distinct altered voices. These findings are discussed in the context of a mechanistic proposal for how supervised learning may contribute to this type of adaptive plasticity in speech perception. PMID:26280268
Lip Movement Exaggerations During Infant-Directed Speech
Green, Jordan R.; Nip, Ignatius S. B.; Wilson, Erin M.; Mefferd, Antje S.; Yunusova, Yana
2011-01-01
Purpose Although a growing body of literature has identified the positive effects of visual speech on speech and language learning, oral movements of infant-directed speech (IDS) have rarely been studied. This investigation used 3-dimensional motion capture technology to describe how mothers modify their lip movements when talking to their infants. Method Lip movements were recorded from 25 mothers as they spoke to their infants and other adults. Lip shapes were analyzed for differences across speaking conditions. The maximum fundamental frequency, duration, acoustic intensity, and first and second formant frequencies of each vowel also were measured. Results Lip movements were significantly larger during IDS than during adult-directed speech, although the exaggerations were vowel specific. All of the vowels produced during IDS were characterized by an elevated vocal pitch and a slowed speaking rate when compared with vowels produced during adult-directed speech. Conclusion The pattern of lip-shape exaggerations did not provide support for the hypothesis that mothers produce exemplar visual models of vowels during IDS. Future work is required to determine whether the observed increases in vertical lip aperture engender visual and acoustic enhancements that facilitate the early learning of speech. PMID:20699342
Formant frequencies in Middle Eastern singers.
Hamdan, Abdul-latif; Tabri, Dollen; Deeb, Reem; Rifai, Hani; Rameh, Charbel; Fuleihan, Nabil
2008-01-01
This work was conducted to describe the formant frequencies in a group of Middle Eastern singers and to look for the presence of the singer's formant described in operatic singers. A total of 13 Middle Eastern singers were enrolled in this study: 5 men and 8 women. Descriptive analysis was performed to report the various formants (F1, F2, F3, and F4) in both speaking and singing. The Wilcoxon test was used to compare the means of the formants under both conditions. For both sexes combined, for the /a/ vowel, F1 in singing was significantly lower than F1 in speaking (P = .05) and F3 in singing was significantly higher than F3 in speaking (P = .046). For the /u/ vowel, only F2 in singing was significantly higher than F2 in speaking (P = .012). For the /i/ vowel, both F2 and F3 in singing were significantly lower than F2 and F3 in speaking, respectively (P = .006 and .012). There was no clustering of the formants in any of the Middle Eastern sung vowels. Formant frequencies for the vowels /a/, /i/, and /u/ differ between Middle Eastern singing and speaking. The singer's formant was absent.
Evaluating acoustic speaker normalization algorithms: evidence from longitudinal child data.
Kohn, Mary Elizabeth; Farrington, Charlie
2012-03-01
Speaker vowel formant normalization, a technique that controls for variation introduced by physical differences between speakers, is necessary in variationist studies to compare speakers of different ages, genders, and physiological makeup in order to understand non-physiological variation patterns within populations. Many algorithms have been established to reduce variation introduced into vocalic data from physiological sources. The lack of real-time studies tracking the effectiveness of these normalization algorithms from childhood through adolescence inhibits exploration of child participation in vowel shifts. This analysis compares normalization techniques applied to data collected from ten African American children across five time points. Linear regressions compare the reduction in variation attributable to age and gender for each speaker for the vowels BEET, BAT, BOT, BUT, and BOAR. A normalization technique is successful if it maintains variation attributable to a reference sociolinguistic variable, while reducing variation attributable to age. Results indicate that normalization techniques which rely on both a measure of central tendency and range of the vowel space perform best at reducing variation attributable to age, although some variation attributable to age persists after normalization for some sections of the vowel space. © 2012 Acoustical Society of America
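One widely used algorithm of the kind compared in such studies is Lobanov's speaker-intrinsic z-score normalization, which centers and scales each formant within speaker so that vocal-tract-size differences shrink while relative vowel positions survive. A minimal sketch follows; the column names and example values are illustrative, and this is not necessarily the technique that performed best here.

```python
# Sketch: Lobanov z-score normalization of formants, per speaker.
import pandas as pd

def lobanov_normalize(df, speaker_col="speaker", formant_cols=("F1", "F2")):
    out = df.copy()
    for f in formant_cols:
        grp = out.groupby(speaker_col)[f]
        out[f + "_z"] = (out[f] - grp.transform("mean")) / grp.transform("std")
    return out

data = pd.DataFrame({
    "speaker": ["s1", "s1", "s2", "s2"],
    "vowel":   ["BEET", "BOT", "BEET", "BOT"],
    "F1":      [310, 780, 420, 990],     # Hz; a child's formants run higher
    "F2":      [2300, 1150, 2900, 1400],
})
normalized = lobanov_normalize(data)    # F1_z/F2_z now comparable across ages
```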
Complex auditory behaviour emerges from simple reactive steering
NASA Astrophysics Data System (ADS)
Hedwig, Berthold; Poulet, James F. A.
2004-08-01
The recognition and localization of sound signals is fundamental to acoustic communication. Complex neural mechanisms are thought to underlie the processing of species-specific sound patterns even in animals with simple auditory pathways. In female crickets, which orient towards the male's calling song, current models propose pattern recognition mechanisms based on the temporal structure of the song. Furthermore, it is thought that localization is achieved by comparing the output of the left and right recognition networks, which then directs the female to the pattern that most closely resembles the species-specific song. Here we show, using a highly sensitive method for measuring the movements of female crickets, that during both walking and flying, each sound pulse of the communication signal releases a rapid steering response. Thus auditory orientation emerges from reactive motor responses to individual sound pulses. Although the reactive motor responses are not based on the song structure, a pattern recognition process may modulate the gain of the responses on a longer timescale. These findings are relevant to concepts of insect auditory behaviour and to the development of biologically inspired robots performing cricket-like auditory orientation.
ERIC Educational Resources Information Center
Rochette, Claude; Simard, Claude
A study of the phonetic combination of a constrictive consonant (specifically, [f], [v], and [r]) and a vowel in French using x-ray and oscillograph technology focused on the speed and process of articulation between the consonant and the vowel. The study considered aperture size, nasality, labiality, and accent. Articulation of a total of 407…
Infant word recognition: Insights from TRACE simulations☆
Mayor, Julien; Plunkett, Kim
2014-01-01
The TRACE model of speech perception (McClelland & Elman, 1986) is used to simulate results from the infant word recognition literature, to provide a unified, theoretical framework for interpreting these findings. In a first set of simulations, we demonstrate how TRACE can reconcile apparently conflicting findings suggesting, on the one hand, that consonants play a pre-eminent role in lexical acquisition (Nespor, Peña & Mehler, 2003; Nazzi, 2005), and on the other, that there is a symmetry in infant sensitivity to vowel and consonant mispronunciations of familiar words (Mani & Plunkett, 2007). In a second series of simulations, we use TRACE to simulate infants’ graded sensitivity to mispronunciations of familiar words as reported by White and Morgan (2008). An unexpected outcome is that TRACE fails to demonstrate graded sensitivity for White and Morgan’s stimuli unless the inhibitory parameters in TRACE are substantially reduced. We explore the ramifications of this finding for theories of lexical development. Finally, TRACE mimics the impact of phonological neighbourhoods on early word learning reported by Swingley and Aslin (2007). TRACE offers an alternative explanation of these findings in terms of mispronunciations of lexical items rather than imputing word learning to infants. Together these simulations provide an evaluation of Developmental (Jusczyk, 1993) and Familiarity (Metsala, 1999) accounts of word recognition by infants and young children. The findings point to a role for both theoretical approaches whereby vocabulary structure and content constrain infant word recognition in an experience-dependent fashion, and highlight the continuity in the processes and representations involved in lexical development during the second year of life. PMID:24493907
Whitfield, Jason A; Goberman, Alexander M
2014-01-01
Individuals with Parkinson disease (PD) often exhibit decreased range of movement secondary to the disease process, which has been shown to affect articulatory movements. A number of investigations have failed to find statistically significant differences between control and disordered groups, and between speaking conditions, using traditional vowel space area measures. The purpose of the current investigation was to evaluate both between-group (PD versus control) and within-group (habitual versus clear) differences in articulatory function using a novel vowel space measure, the articulatory-acoustic vowel space (AAVS). The novel AAVS is calculated from continuously sampled formant trajectories of connected speech. In the current study, habitual and clear speech samples from twelve individuals with PD along with habitual control speech samples from ten neurologically healthy adults were collected and acoustically analyzed. In addition, a group of listeners completed perceptual rating of speech clarity for all samples. Individuals with PD were perceived to exhibit decreased speech clarity compared to controls. Similarly, the novel AAVS measure was significantly lower in individuals with PD. In addition, the AAVS measure significantly tracked changes between the habitual and clear conditions that were confirmed by perceptual ratings. In the current study, the novel AAVS measure is shown to be sensitive to disease-related group differences and within-person changes in articulatory function of individuals with PD. Additionally, these data confirm that individuals with PD can modulate the speech motor system to increase articulatory range of motion and speech clarity when given a simple prompt. The reader will be able to (i) describe articulatory behavior observed in the speech of individuals with Parkinson disease; (ii) describe traditional measures of vowel space area and how they relate to articulation; (iii) describe a novel measure of vowel space, the articulatory-acoustic vowel space and its relationship to articulation and the perception of speech clarity. Copyright © 2014 Elsevier Inc. All rights reserved.
Baer-Henney, Dinah; Kügler, Frank; van de Vijver, Ruben
2015-09-01
Using the artificial language paradigm, we studied the acquisition of morphophonemic alternations with exceptions by 160 German adult learners. We tested the acquisition of two types of alternations in two regularity conditions while additionally varying length of training. In the first alternation, a vowel harmony, backness of the stem vowel determines backness of the suffix. This process is grounded in substance (phonetic motivation), and this universal phonetic factor bolsters learning a generalization. In the second alternation, tenseness of the stem vowel determines backness of the suffix vowel. This process is not based in substance, but it reflects a phonotactic property of German and our participants benefit from this language-specific factor. We found that learners use both cues, while substantive bias surfaces mainly in the most unstable situation. We show that language-specific and universal factors interact in learning. Copyright © 2014 Cognitive Science Society, Inc.
On the Role of Cognitive Abilities in Second Language Vowel Learning.
Ghaffarvand Mokari, Payam; Werner, Stefan
2018-03-01
This study investigated the role of different cognitive abilities (inhibitory control, attention control, phonological short-term memory [PSTM], and acoustic short-term memory [AM]) in second language (L2) vowel learning. The participants were 40 Azerbaijani learners of Standard Southern British English. Their perception of L2 vowels was tested through a perceptual discrimination task before and after five sessions of high-variability phonetic training. Inhibitory control was significantly correlated with gains from training in the discrimination of L2 vowel pairs. However, there were no significant correlations between attention control, AM, PSTM, and gains from training. These findings suggest a potential role for inhibitory control in L2 phonological learning. We suggest that inhibitory control facilitates the processing of L2 sounds by allowing learners to ignore interfering information from the L1 during training, leading to better L2 segmental learning.
NASA Astrophysics Data System (ADS)
Pei, Xiaomei; Barbour, Dennis L.; Leuthardt, Eric C.; Schalk, Gerwin
2011-08-01
Several stories in the popular media have speculated that it may be possible to infer from the brain which word a person is speaking or even thinking. While recent studies have demonstrated that brain signals can give detailed information about actual and imagined actions, such as different types of limb movements or spoken words, concrete experimental evidence for the possibility to 'read the mind', i.e. to interpret internally-generated speech, has been scarce. In this study, we found that it is possible to use signals recorded from the surface of the brain (electrocorticography) to discriminate the vowels and consonants embedded in spoken and in imagined words, and we defined the cortical areas that held the most information about discrimination of vowels and consonants. The results shed light on the distinct mechanisms associated with production of vowels and consonants, and could provide the basis for brain-based communication using imagined speech.
Body Emotion Recognition Disproportionately Depends on Vertical Orientations during Childhood
ERIC Educational Resources Information Center
Balas, Benjamin; Auen, Amanda; Saville, Alyson; Schmidt, Jamie
2018-01-01
Children's ability to recognize emotional expressions from faces and bodies develops during childhood. However, the low-level features that support accurate body emotion recognition during development have not been well characterized. This is in marked contrast to facial emotion recognition, which is known to depend upon specific spatial frequency…
Ramírez, Fernando M
2018-05-01
Viewpoint-invariant face recognition is thought to be subserved by a distributed network of occipitotemporal face-selective areas that, except for the human anterior temporal lobe, have been shown to also contain face-orientation information. This review begins by highlighting the importance of bilateral symmetry for viewpoint-invariant recognition and face-orientation perception. Then, monkey electrophysiological evidence is surveyed describing key tuning properties of face-selective neurons (including neurons bimodally tuned to mirror-symmetric face views), followed by studies combining functional magnetic resonance imaging (fMRI) and multivariate pattern analyses to probe the representation of face-orientation and identity information in humans. Altogether, neuroimaging studies suggest that face identity is gradually disentangled from face-orientation information along the ventral visual processing stream. The evidence seems to diverge, however, regarding the prevalent form of tuning of neural populations in human face-selective areas. In this context, caveats possibly leading to erroneous inferences regarding mirror-symmetric coding are exposed, including the need to distinguish angular from Euclidean distances when interpreting multivariate pattern analyses. On this basis, this review argues that evidence from the fusiform face area is best explained by a view-sensitive code reflecting head angular disparity, consistent with a role of this area in face-orientation perception. Finally, the importance of explicit models relating neural properties to large-scale signals is stressed.
Gutschalk, Alexander; Uppenkamp, Stefan; Riedel, Bernhard; Bartsch, Andreas; Brandt, Tobias; Vogt-Schaden, Marlies
2015-12-01
Based on results from functional imaging, cortex along the superior temporal sulcus (STS) has been suggested to subserve phoneme and pre-lexical speech perception. For vowel classification, both superior temporal plane (STP) and STS areas have been suggested relevant. Lesion of bilateral STS may conversely be expected to cause pure word deafness and possibly also impaired vowel classification. Here we studied a patient with bilateral STS lesions caused by ischemic strokes and relatively intact medial STPs to characterize the behavioral consequences of STS loss. The patient showed severe deficits in auditory speech perception, whereas his speech production was fluent and communication by written speech was grossly intact. Auditory-evoked fields in the STP were within normal limits on both sides, suggesting that major parts of the auditory cortex were functionally intact. Further studies showed that the patient had normal hearing thresholds and only mild disability in tests for telencephalic hearing disorder. Prominent deficits were discovered in an auditory-object classification task, where the patient performed four standard deviations below the control group. In marked contrast, performance in a vowel-classification task was intact. Auditory evoked fields showed enhanced responses for vowels compared to matched non-vowels within normal limits. Our results are consistent with the notion that cortex along STS is important for auditory speech perception, although it does not appear to be entirely speech specific. Formant analysis and single vowel classification, however, appear to be already implemented in auditory cortex on the STP. Copyright © 2015 Elsevier Ltd. All rights reserved.
On the assimilation-discrimination relationship in American English adults’ French vowel learning1
Levy, Erika S.
2009-01-01
A quantitative “cross-language assimilation overlap” method for testing predictions of the Perceptual Assimilation Model (PAM) was implemented to compare results of a discrimination experiment with the listeners’ previously reported assimilation data. The experiment examined discrimination of Parisian French (PF) front rounded vowels ∕y∕ and ∕œ∕. Three groups of American English listeners differing in their French experience (no experience [NoExp], formal experience [ModExp], and extensive formal-plus-immersion experience [HiExp]) performed discrimination of PF ∕y-u∕, ∕y-o∕, ∕œ-o∕, ∕œ-u∕, ∕y-i∕, ∕y-ɛ∕, ∕œ-ɛ∕, ∕œ-i∕, ∕y-œ∕, ∕u-i∕, and ∕a-ɛ∕. Vowels were in bilabial ∕rabVp∕ and alveolar ∕radVt∕ contexts. More errors were found for PF front vs back rounded vowel pairs (16%) than for PF front unrounded vs rounded pairs (2%). Overall, ModExp listeners did not perform more accurately (11% errors) than NoExp listeners (13% errors). Extensive immersion experience, however, was associated with fewer errors (3%) than formal experience alone, although discrimination of PF ∕y-u∕ remained relatively poor (12% errors) for HiExp listeners. More errors occurred on pairs involving front vs back rounded vowels in alveolar context (20% errors) than in bilabial (11% errors). Significant correlations were revealed between listeners’ assimilation overlap scores and their discrimination errors, suggesting that the PAM may be extended to second-language (L2) vowel learning. PMID:19894844
Maternal Vocal Feedback to 9-Month-Old Infant Siblings of Children with ASD
Talbott, Meagan R.; Nelson, Charles A.; Tager-Flusberg, Helen
2016-01-01
Infant siblings of children with autism spectrum disorder display differences in early language and social communication skills beginning as early as the first year of life. While environmental influences on early language development are well documented in other infant populations, they have received relatively little attention inside of the infant sibling context. In this study, we analyzed home video diaries collected prospectively as part of a longitudinal study of infant siblings. Infant vowel and consonant-vowel vocalizations and maternal language-promoting and non-promoting verbal responses were scored for 30 infant siblings and 30 low risk control infants at 9 months of age. Analyses evaluated whether infant siblings or their mothers exhibited differences from low risk dyads in vocalization frequency or distribution, and whether mothers’ responses were associated with other features of the high risk context. Analyses were conducted with respect to both initial risk group and preliminary outcome classification. Overall, we found no differences in infants’ consonant-vowel vocalizations, the frequency of overall maternal utterances, or the distribution of mothers’ response types. Both groups of infants produced more vowel than consonant-vowel vocalizations, and both groups of mothers responded to consonant-vowel vocalizations with more language-promoting than non-promoting responses. These results indicate that as a group, mothers of high risk infants provide equally high quality linguistic input to their infants in the first year of life and suggest that impoverished maternal linguistic input does not contribute to high risk infants’ initial language difficulties. Implications for intervention strategies are also discussed. PMID:26174704
Yu, Yan H; Shafer, Valerie L; Sussman, Elyse S
2018-01-01
Speech perception behavioral research suggests that rates of sensory memory decay are dependent on stimulus properties at more than one level (e.g., acoustic level, phonemic level). The neurophysiology of sensory memory decay rate has rarely been examined in the context of speech processing. In a lexical tone study, we showed that long-term memory representation of lexical tone slows the decay rate of sensory memory for these tones. Here, we tested the hypothesis that long-term memory representation of vowels slows the rate of auditory sensory memory decay in a similar way to that of lexical tone. Event-related potential (ERP) responses were recorded to Mandarin non-words contrasting the vowels /i/ vs. /u/ and /y/ vs. /u/ from first-language (L1) Mandarin and L1 American English participants under short and long interstimulus interval (ISI) conditions (short ISI: an average of 575 ms; long ISI: an average of 2675 ms). Results revealed poorer discrimination of the vowel contrasts for English listeners than Mandarin listeners, but with different patterns for behavioral perception and neural discrimination. As predicted, English listeners showed the poorest discrimination and identification for the vowel contrast /y/ vs. /u/, and poorer performance in the long ISI condition. In contrast to Yu et al. (2017), however, we found no effect of ISI reflected in the neural responses, specifically the mismatch negativity (MMN), P3a, and late negativity (LN) ERP amplitudes. We did see a language group effect, with Mandarin listeners generally showing larger MMN and English listeners showing larger P3a. The behavioral results revealed that native language experience plays a role in echoic sensory memory trace maintenance, but the failure to find an effect of ISI on the ERP results suggests that vowel and lexical tone memory traces decay at different rates. Highlights: We examined the interaction between auditory sensory memory decay and language experience. We compared MMN, P3a, LN, and behavioral responses at short vs. long interstimulus intervals. We found that, unlike for lexical tone contrasts, the MMN, P3a, and LN responses to vowel contrasts are not influenced by lengthening the ISI to 2.6 s. We also found that the English listeners discriminated the non-native vowel contrast with lower accuracy under the long ISI condition.
Lexical statistics of competition in L2 versus L1 listening
NASA Astrophysics Data System (ADS)
Cutler, Anne
2005-09-01
Spoken-word recognition involves multiple activation of alternative word candidates and competition between these alternatives. Phonemic confusions in L2 listening increase the number of potentially active words, thus slowing word recognition by adding competitors. This study used a 70,000-word English lexicon backed by frequency statistics from a 17,900,000-word corpus to assess the competition increase resulting from two representative phonemic confusions, one vocalic (ae/E) and one consonantal (r/l), in L2 versus L1 listening. The first analysis involved word embedding. Embedded words (cat in cattle, rib in ribbon) cause competition, which phonemic confusion can increase (cat in kettle, rib in liberty). The average increase in the number of embedded words was 59.6% and 48.3%, respectively, for the two confusions. The second analysis involved temporary ambiguity. Even when no embeddings are present, multiple alternatives are possible: para- can become parrot, paradise, etc., but also pallet, palace given /r/-/l/ confusion. Phoneme confusions (vowel or consonant) in first or second position in the word approximately doubled the number of activated candidates; confusions later in the word increased activation by, on average, 53% and 42%. Overall, such confusions significantly increase competition for L2 compared with L1 listeners.
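The embedding analysis can be illustrated with a short sketch: count the lexical items embedded in a carrier word's phoneme string, then recount after collapsing a confusable pair such as /r/-/l/, as an L2 listener might. The toy lexicon and one-symbol-per-phoneme encoding are illustrative assumptions, not the study's 70,000-word lexicon.

```python
# Sketch: embedded-word competitors with and without an /r/-/l/ confusion.
LEXICON = {"cat", "cattle", "kettle", "rib", "ribbon", "liberty", "lib"}

def embedded_words(carrier, lexicon):
    """Words embedded in `carrier`, excluding the carrier itself."""
    subs = {carrier[i:j] for i in range(len(carrier))
            for j in range(i + 2, len(carrier) + 1)}
    return (subs & lexicon) - {carrier}

def collapse(word, pair=("r", "l")):
    return word.replace(pair[1], pair[0])  # merge the confusable pair

def embedded_with_confusion(carrier, lexicon, pair=("r", "l")):
    by_collapsed = {}
    for w in lexicon:
        by_collapsed.setdefault(collapse(w, pair), set()).add(w)
    c = collapse(carrier, pair)
    hits = set()
    for i in range(len(c)):
        for j in range(i + 2, len(c) + 1):
            hits |= by_collapsed.get(c[i:j], set())
    return hits - {carrier}

print(embedded_words("liberty", LEXICON))           # {'lib'}
print(embedded_with_confusion("liberty", LEXICON))  # {'lib', 'rib'}
```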
Analytic study of the Tadoma method: background and preliminary results.
Norton, S J; Schultz, M C; Reed, C M; Braida, L D; Durlach, N I; Rabinowitz, W M; Chomsky, C
1977-09-01
Certain deaf-blind persons have been taught, through the Tadoma method of speechreading, to use vibrotactile cues from the face and neck to understand speech. This paper reports the results of preliminary tests of the speechreading ability of one adult Tadoma user. The tests were of four major types: (1) discrimination of speech stimuli; (2) recognition of words in isolation and in sentences; (3) interpretation of prosodic and syntactic features in sentences; and (4) comprehension of written (Braille) and oral speech. Words in highly contextual environments were much better perceived than were words in low-context environments. Many of the word errors involved phonemic substitutions which shared articulatory features with the target phonemes, with a higher error rate for vowels than consonants. Relative to performance on word-recognition tests, performance on some of the discrimination tests was worse than expected. Perception of sentences appeared to be mildly sensitive to rate of talking and to speaker differences. Results of the tests on perception of prosodic and syntactic features, while inconclusive, indicate that many of the features tested were not used in interpreting sentences. On an English comprehension test, a higher score was obtained for items administered in Braille than through oral presentation.
Comparative Study of Nonlinear Time Warping Techniques in Isolated Word Speech Recognition Systems
1981-06-17
All modules are loaded under a flexible, research-oriented supervisor, "Cicada". Cicada allows for the integration of experimental ideas and extensions... evaluate alternate recognition methods. More detailed information about Cicada can be found in [7]. In the following we limit our discussion to the design of...
[Figure: Cicada, a flexible research-oriented supervisor, comprising a digital signal processing front end, reference templates, and a matching module.]
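The nonlinear time warping at the heart of the methods compared in this report is dynamic time warping (DTW): a test token's feature frames are nonlinearly aligned to each reference template, and the word with the minimum warped distance wins. A generic DTW sketch follows; the report's specific local-constraint variants are not reproduced.

```python
# Sketch: isolated-word recognition by DTW template matching.
import numpy as np

def dtw_distance(test, template):
    """test, template: (n_frames, n_dims) arrays of feature vectors."""
    n, m = len(test), len(template)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(test[i - 1] - template[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # length-normalized warped distance

def recognize(test, templates):
    """templates: dict mapping word label -> template frame array."""
    return min(templates, key=lambda w: dtw_distance(test, templates[w]))
```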
Auditory cortical change detection in adults with Asperger syndrome.
Lepistö, Tuulia; Nieminen-von Wendt, Taina; von Wendt, Lennart; Näätänen, Risto; Kujala, Teija
2007-03-06
The present study investigated whether auditory deficits reported in children with Asperger syndrome (AS) are also present in adulthood. To this end, event-related potentials (ERPs) were recorded from adults with AS for duration, pitch, and phonetic changes in vowels, and for acoustically matched non-speech stimuli. These subjects had enhanced mismatch negativity (MMN) amplitudes particularly for pitch and duration deviants, indicating enhanced sound-discrimination abilities. Furthermore, as reflected by the P3a, their involuntary orienting was enhanced for changes in non-speech sounds, but tended to be deficient for changes in speech sounds. The results are consistent with those reported earlier in children with AS, except for the duration-MMN, which was diminished in children and enhanced in adults.
Palmprint Recognition Across Different Devices.
Jia, Wei; Hu, Rong-Xiang; Gui, Jie; Zhao, Yang; Ren, Xiao-Ming
2012-01-01
In this paper, the problem of Palmprint Recognition Across Different Devices (PRADD) is investigated, which has not been well studied so far. Since there is no publicly available PRADD image database, we created a non-contact PRADD image database containing 12,000 grayscale images captured from 100 subjects using three devices, i.e., one digital camera and two smart-phones. Due to the non-contact image acquisition used, rotation and scale changes between different images captured from the same palm are inevitable. We propose a robust method to calculate the palm width, which can be effectively used for scale normalization of palmprints. On this PRADD image database, we evaluate the recognition performance of three different methods: a subspace learning method, a correlation method, and an orientation coding based method. Experimental results show that orientation coding based methods achieved promising recognition performance for PRADD.
Amengual, Mark
2016-01-01
The present study examines cognate effects in the phonetic production and processing of the Catalan back mid-vowel contrast (/o/-/ɔ/) by 24 early and highly proficient Spanish-Catalan bilinguals in Majorca (Spain). Participants completed a picture-naming task and a forced-choice lexical decision task in which they were presented with either words (e.g., /bɔsk/ "forest") or non-words based on real words, but with the alternate mid-vowel pair in stressed position ((*)/bosk/). The same cognate and non-cognate lexical items were included in the production and lexical decision experiments. The results indicate that even though these early bilinguals maintained the back mid-vowel contrast in their productions, they had great difficulties identifying non-words and real words based on the identity of the Catalan mid-vowel. The analyses revealed language dominance and cognate effects: Spanish-dominants exhibited higher error rates than Catalan-dominants, and production and lexical decision accuracy were also affected by cognate status. The present study contributes to the discussion of the organization of early bilinguals' dominant and non-dominant sound systems, and proposes that exemplar theoretic approaches can be extended to include bilingual lexical connections that account for the interactions between the phonetic and lexical levels of early bilingual individuals.
Ménard, Lucie; Polak, Marek; Denny, Margaret; Burton, Ellen; Lane, Harlan; Matthies, Melanie L; Marrone, Nicole; Perkell, Joseph S; Tiede, Mark; Vick, Jennell
2007-06-01
This study investigates the effects of speaking condition and auditory feedback on vowel production by postlingually deafened adults. Thirteen cochlear implant users produced repetitions of nine American English vowels prior to implantation, and at one month and one year after implantation. There were three speaking conditions (clear, normal, and fast), and two feedback conditions after implantation (implant processor turned on and off). Ten normal-hearing controls were also recorded once. Vowel contrasts in the formant space (expressed in mels) were larger in the clear than in the fast condition, both for controls and for implant users at all three time samples. Implant users also produced differences in duration between clear and fast conditions that were in the range of those obtained from the controls. In agreement with prior work, the implant users had contrast values lower than did the controls. The implant users' contrasts were larger with hearing on than off and improved from one month to one year postimplant. Because the controls and implant users responded similarly to a change in speaking condition, it is inferred that auditory feedback, although demonstrably important for maintaining normative values of vowel contrasts, is not needed to maintain the distinctiveness of those contrasts in different speaking conditions.
Emotions in freely varying and mono-pitched vowels, acoustic and EGG analyses.
Waaramaa, Teija; Palo, Pertti; Kankare, Elina
2015-12-01
Vocal emotions are expressed either by speech or singing. The difference is that in singing the pitch is predetermined while in speech it may vary freely. It was of interest to study whether there were voice quality differences between freely varying and mono-pitched vowels expressed by professional actors. Given their profession, actors have to be able to express emotions both by speech and singing. Electroglottogram and acoustic analyses of emotional utterances embedded in expressions of freely varying vowels [a:], [i:], [u:] (96 samples) and mono-pitched protracted vowels (96 samples) were studied. Contact quotient (CQEGG) was calculated using 35%, 55%, and 80% threshold levels. Three different threshold levels were used in order to evaluate their effects on emotions. Genders were studied separately. The results suggested significant gender differences for CQEGG 80% threshold level. SPL, CQEGG, and F4 were used to convey emotions, but to a lesser degree, when F0 was predetermined. Moreover, females showed fewer significant variations than males. Both genders used more hypofunctional phonation type in mono-pitched utterances than in the expressions with freely varying pitch. The present material warrants further study of the interplay between CQEGG threshold levels and formant frequencies, and listening tests to investigate the perceptual value of the mono-pitched vowels in the communication of emotions.
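The contact quotient computation described above can be sketched directly: within each glottal cycle of the EGG signal, the contact phase is the time the cycle-normalized amplitude stays above the criterion level (35%, 55%, or 80%), and CQEGG is that time divided by the cycle period. Peak-picking cycle detection and the 500 Hz F0 ceiling are simplifying assumptions.

```python
# Sketch: CQEGG from an electroglottogram, under the assumptions above.
import numpy as np
from scipy.signal import find_peaks

def cq_egg(egg, fs, threshold=0.35):
    peaks, _ = find_peaks(egg, distance=int(fs / 500))  # assume F0 < 500 Hz
    cqs = []
    for start, end in zip(peaks[:-1], peaks[1:]):       # one cycle per pair
        cycle = egg[start:end]
        lo, hi = cycle.min(), cycle.max()
        if hi - lo <= 0:
            continue
        level = lo + threshold * (hi - lo)              # criterion amplitude
        cqs.append(np.mean(cycle > level))              # contact fraction
    return float(np.mean(cqs)) if cqs else np.nan

# Compare the three threshold levels on the same token:
# cq35, cq55, cq80 = (cq_egg(egg, fs, t) for t in (0.35, 0.55, 0.80))
```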
Allen, J S; Miller, J L
1999-10-01
Two speech production experiments tested the validity of the traditional method of creating voice-onset-time (VOT) continua for perceptual studies in which the systematic increase in VOT across the continuum is accompanied by a concomitant decrease in the duration of the following vowel. In experiment 1, segmental durations were measured for matched monosyllabic words beginning with either a voiced stop (e.g., big, duck, gap) or a voiceless stop (e.g., pig, tuck, cap). Results from four talkers showed that the change from voiced to voiceless stop produced not only an increase in VOT, but also a decrease in vowel duration. However, the decrease in vowel duration was consistently less than the increase in VOT. In experiment 2, results from four new talkers replicated these findings at two rates of speech, as well as highlighted the contrasting temporal effects on vowel duration of an increase in VOT due to a change in syllable-initial voicing versus a change in speaking rate. It was concluded that the traditional method of creating VOT continua for perceptual experiments, although not perfect, approximates natural speech by capturing the basic trade-off between VOT and vowel duration in syllable-initial voiced versus voiceless stop consonants.
Altvater-Mackensen, Nicole; Mani, Nivedita; Grossmann, Tobias
2016-02-01
Recent studies suggest that infants' audiovisual speech perception is influenced by articulatory experience (Mugitani et al., 2008; Yeung & Werker, 2013). The current study extends these findings by testing if infants' emerging ability to produce native sounds in babbling impacts their audiovisual speech perception. We tested 44 6-month-olds on their ability to detect mismatches between concurrently presented auditory and visual vowels and related their performance to their productive abilities and later vocabulary size. Results show that infants' ability to detect mismatches between auditory and visually presented vowels differs depending on the vowels involved. Furthermore, infants' sensitivity to mismatches is modulated by their current articulatory knowledge and correlates with their vocabulary size at 12 months of age. This suggests that-aside from infants' ability to match nonnative audiovisual cues (Pons et al., 2009)-their ability to match native auditory and visual cues continues to develop during the first year of life. Our findings point to a potential role of salient vowel cues and productive abilities in the development of audiovisual speech perception, and further indicate a relation between infants' early sensitivity to audiovisual speech cues and their later language development. PsycINFO Database Record (c) 2016 APA, all rights reserved.
Sound Symbolism in the Languages of Australia
Haynie, Hannah; Bowern, Claire; LaPalombara, Hannah
2014-01-01
The notion that linguistic forms and meanings are related only by convention and not by any direct relationship between sounds and semantic concepts is a foundational principle of modern linguistics. Though the principle generally holds across the lexicon, systematic exceptions have been identified. These “sound symbolic” forms have been identified in lexical items and linguistic processes in many individual languages. This paper examines sound symbolism in the languages of Australia. We conduct a statistical investigation of the evidence for several common patterns of sound symbolism, using data from a sample of 120 languages. The patterns examined here include the association of meanings denoting “smallness” or “nearness” with front vowels or palatal consonants, and the association of meanings denoting “largeness” or “distance” with back vowels or velar consonants. Our results provide evidence for the expected associations of vowels and consonants with meanings of “smallness” and “proximity” in Australian languages. However, the patterns uncovered in this region are more complicated than predicted. Several sound-meaning relationships are only significant for segments in prominent positions in the word, and the prevailing mapping between vowel quality and magnitude meaning cannot be characterized by a simple link between gradients of magnitude and vowel F2, contrary to the claims of previous studies. PMID:24752356
Oliveira Barrichelo, V M; Heuer, R J; Dean, C M; Sataloff, R T
2001-09-01
Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.
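Both measurements reported here can be sketched briefly: a long-term average spectrum (LTAS) of the reading passage via Welch averaging, and the relative energy in the singer's formant/speaker's ring region versus the full band. The 2-4 kHz ring band and the 8 kHz analysis ceiling are assumptions for illustration, not the study's exact settings.

```python
# Sketch: LTAS and speaker's-ring energy ratio via Welch averaging.
import numpy as np
from scipy.signal import welch

def ltas(signal, fs, nperseg=2048):
    freqs, psd = welch(signal, fs=fs, nperseg=nperseg)
    return freqs, 10 * np.log10(psd + 1e-12)            # dB

def ring_energy_ratio(signal, fs, band=(2000.0, 4000.0)):
    freqs, psd = welch(signal, fs=fs, nperseg=2048)
    total = psd[(freqs >= 0) & (freqs <= 8000)].sum()   # assumed full band
    ring = psd[(freqs >= band[0]) & (freqs <= band[1])].sum()
    return ring / (total + 1e-12)
```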
Vowel selection and its effects on perturbation and nonlinear dynamic measures.
Maccallum, Julia K; Zhang, Yu; Jiang, Jack J
2011-01-01
Acoustic analysis of voice is typically conducted on recordings of sustained vowel phonation. This study applied perturbation and nonlinear dynamic analyses to the vowels /a/, /i/, and /u/ in order to determine vowel selection effects on analysis. Forty subjects (20 males and 20 females) with normal voices participated in recording. Traditional parameters of fundamental frequency, signal-to-noise ratio, percent jitter, and percent shimmer were calculated for the signals using CSpeech. Nonlinear dynamic parameters of correlation dimension and second-order entropy were also calculated. Perturbation analysis results were largely incongruous in this study and in previous research. Fundamental frequency results corroborated previous work, indicating higher fundamental frequency for /i/ and /u/ and lower fundamental frequency for /a/. Signal-to-noise ratio results showed that /i/ and /u/ have greater harmonic levels than /a/. Results of nonlinear dynamic analysis suggested that more complex activity may be evident in /a/ than in /i/ or /u/. Percent jitter and percent shimmer may not be useful for description of acoustic differences between vowels. Fundamental frequency, signal-to-noise ratio, and nonlinear dynamic parameters may be applied to characterize /a/ as having lower frequency, higher noise, and greater nonlinear components than /i/ and /u/. Copyright © 2010 S. Karger AG, Basel.
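The perturbation measures named above have simple definitions once per-cycle periods and peak amplitudes are extracted: percent jitter is the mean absolute cycle-to-cycle period difference relative to the mean period, and percent shimmer is the analogue for amplitude. A minimal sketch follows; cycle extraction itself is assumed done (e.g., pitch-synchronously).

```python
# Sketch: percent jitter and percent shimmer from per-cycle sequences.
import numpy as np

def percent_jitter(periods):
    periods = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def percent_shimmer(amplitudes):
    amps = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(amps))) / np.mean(amps)

# e.g., a steady 200 Hz voice with small cycle-to-cycle variation:
periods = 0.005 + np.random.normal(0, 2e-5, 100)   # seconds per cycle
amps = 1.0 + np.random.normal(0, 0.01, 100)
print(percent_jitter(periods), percent_shimmer(amps))
```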
Zhu, Yiqing; Wayland, Ratree
2017-01-01
We investigated categorical perception of rising and falling pitch contours by tonal and non-tonal listeners. Specifically, we determined minimum durations needed to perceive both contours and compared to those of production, how stimuli duration affects their perception, whether there is an intrinsic F0 effect, and how first language background, duration, directions of pitch and vowel quality interact with each other. Continua of fundamental frequency on different vowels with 9 duration values were created for identification and discrimination tasks. Less time is generally needed to effectively perceive a pitch direction than to produce it. Overall, tonal listeners’ perception is more categorical than non-tonal listeners. Stimuli duration plays a critical role for both groups, but tonal listeners showed a stronger duration effect, and may benefit more from the extra time in longer stimuli for context-coding, consistent with the multistore model of categorical perception. Within a certain range of semitones, tonal listeners also required shorter stimulus duration to perceive pitch direction changes than non-tonal listeners. Finally, vowel quality plays a limited role and only interacts with duration in perceiving falling pitch directions. These findings further our understanding on models of categorical perception, the relationship between speech perception and production, and the interaction between the perception of tones and vowel quality. PMID:28671991
Vowel reduction across tasks for male speakers of American English.
Kuo, Christina; Weismer, Gary
2016-07-01
This study examined acoustic variation of vowels within speakers across speech tasks. The overarching goal of the study was to understand within-speaker variation as one index of the range of normal speech motor behavior for American English vowels. Ten male speakers of American English performed four speech tasks including citation form sentence reading with a clear-speech style (clear-speech), citation form sentence reading (citation), passage reading (reading), and conversational speech (conversation). Eight monophthong vowels in a variety of consonant contexts were studied. Clear-speech was operationally defined as the reference point for describing variation. Acoustic measures associated with the conventions of vowel targets were obtained and examined. These included temporal midpoint formant frequencies for the first three formants (F1, F2, and F3) and the derived Euclidean distances in the F1-F2 and F2-F3 planes. Results indicated that reduction toward the center of the F1-F2 and F2-F3 planes increased in magnitude across the tasks in the order of clear-speech, citation, reading, and conversation. The cross-task variation was comparable for all speakers despite fine-grained individual differences. The characteristics of systematic within-speaker acoustic variation across tasks have potential implications for the understanding of the mechanisms of speech motor control and motor speech disorders.
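The reduction metric used here can be sketched directly: take formant values at each vowel's temporal midpoint and compute per-token Euclidean distances from the center of the F1-F2 plane, so that task-related reduction appears as shrinking mean distance. The input format and example values below are illustrative.

```python
# Sketch: Euclidean distances in the F1-F2 plane as a reduction measure.
import numpy as np

def f1f2_distances(tokens):
    """tokens: list of (F1, F2) midpoint measurements in Hz (or mels)."""
    pts = np.asarray(tokens, dtype=float)
    center = pts.mean(axis=0)                  # center of the vowel space
    return np.linalg.norm(pts - center, axis=1)

clear = [(300, 2350), (750, 1200), (350, 850)]      # /i/-, /a/-, /u/-like
conversation = [(380, 2100), (650, 1300), (420, 1000)]
# Smaller mean distance in conversation indicates centralization:
print(f1f2_distances(clear).mean(), f1f2_distances(conversation).mean())
```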
Perceptual effects of dialectal and prosodic variation in vowels
NASA Astrophysics Data System (ADS)
Fox, Robert Allen; Jacewicz, Ewa; Hatcher, Kristin; Salmons, Joseph
2005-09-01
As was reported earlier [Fox et al., J. Acoust. Soc. Am. 114, 2396 (2003)], certain vowels in the Ohio and Wisconsin dialects of American English are shifting in different directions. In addition, we have found that the spectral characteristics of these vowels (e.g., duration and formant frequencies) changed systematically under varying degrees of prosodic prominence, with somewhat different changes occurring within each dialect. The question addressed in the current study is whether naive listeners from these two dialects are sensitive to both the dialect variations and to the prosodically induced spectral differences. Listeners from Ohio and Wisconsin listened to the stimulus tokens [beIt] and [bɛt] produced in each of three prosodic contexts (representing three different levels of prominence). These words were produced by speakers from Ohio or from Wisconsin (none of the listeners were also speakers). Listeners identified the stimulus tokens in terms of vowel quality and indicated whether each was a good, fair, or poor exemplar of that phonetic category. Results showed that both phonetic quality decisions and goodness ratings were systematically and significantly affected by speaker dialect, listener dialect, and prosodic context. Implications for the source and nature of ongoing vowel changes in these two dialects will be discussed. [Work partially supported by NIDCD R03 DC005560-01.]
NASA Astrophysics Data System (ADS)
Munson, Benjamin; Deboe, Nancy
2003-10-01
A recent study (Pierrehumbert, Bent, Munson, and Bailey, submitted) found differences in vowel production between people who are lesbian, bisexual, or gay (LBG) and people who are not. The specific differences (more fronted /u/ and /a/ in the non-LB women; an overall more-contracted vowel space in the non-gay men) were not amenable to an interpretation based on simple group differences in vocal-tract geometry. Rather, they suggested that differences were either due to group differences in some other skill, such as motor control or phonological encoding, or learned. This paper expands on this research by examining vowel production, speech-motor control (measured by diadochokinetic rates), and phonological encoding (measured by error rates in a tongue-twister task) in people who are LBG and people who are not. Analyses focus on whether the findings of Pierrehumbert et al. (submitted) are replicable, and whether group differences in vowel production are related to group differences in speech-motor control or phonological encoding. To date, 20 LB women, 20 non-LB women, 7 gay men, and 7 non-gay men have participated. Preliminary analyses suggest that there are no group differences in speech motor control or phonological encoding, suggesting that the earlier findings of Pierrehumbert et al. reflected learned behaviors.
Chinese character recognition based on Gabor feature extraction and CNN
NASA Astrophysics Data System (ADS)
Xiong, Yudian; Lu, Tongwei; Jiang, Yongyuan
2018-03-01
As an important application in the field of text line recognition and office automation, Chinese character recognition has become an important subject of pattern recognition. However, due to the large number of Chinese characters and the complexity of their structure, Chinese character recognition is highly challenging. In order to solve this problem, this paper proposes a method for printed Chinese character recognition based on Gabor feature extraction and a Convolutional Neural Network (CNN). The main steps are preprocessing, feature extraction, and training and classification. First, the gray-scale Chinese character image is binarized and normalized to reduce the redundancy of the image data. Second, each image is convolved with Gabor filters at different orientations, and feature maps for eight orientations of the Chinese character are extracted. Third, the feature maps from the Gabor filters and the original image are convolved with learned kernels, and the results of the convolution are the input to the pooling layer. Finally, the feature vector is used for classification and recognition. In addition, the generalization capacity of the network is improved by Dropout. The experimental results show that this method can effectively extract the characteristics of Chinese characters and recognize them.
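A minimal sketch of the described pipeline: convolve the binarized character image with a Gabor bank at eight orientations, stack the responses with the original image as input channels, and classify with a small CNN regularized by dropout. Layer sizes, the 64x64 input, and the class count are illustrative assumptions, not the paper's exact architecture.

```python
# Sketch: Gabor feature extraction feeding a small CNN, per the pipeline.
import numpy as np
import cv2
import torch
import torch.nn as nn

def gabor_bank(ksize=11, sigma=3.0, lambd=6.0):
    thetas = [k * np.pi / 8 for k in range(8)]      # eight orientations
    return [cv2.getGaborKernel((ksize, ksize), sigma, t, lambd, 0.5, 0)
            for t in thetas]

def gabor_features(img):                            # img: (H, W) float32 [0,1]
    maps = [cv2.filter2D(img, -1, k) for k in gabor_bank()]
    return np.stack([img] + maps)                   # (9, H, W): original + 8

class CharCNN(nn.Module):
    def __init__(self, n_classes=1000):             # class count assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(9, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Dropout(0.5), nn.Flatten(),
            nn.Linear(64 * 16 * 16, n_classes),     # assumes 64x64 input
        )
    def forward(self, x):
        return self.net(x)

img = np.random.rand(64, 64).astype(np.float32)     # stand-in character image
x = torch.from_numpy(gabor_features(img)).unsqueeze(0)
logits = CharCNN()(x)
```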
Weber, Theresa; Chandrasekaran, Vijayanand; Stamer, Insa; Thygesen, Mikkel B; Terfort, Andreas; Lindhorst, Thisbe K
2014-12-22
The surface recognition in many biological systems is guided by the interaction of carbohydrate-specific proteins (lectins) with carbohydrate epitopes (ligands) located within the unordered glycoconjugate layer (glycocalyx) of cells. Thus, for recognition, the respective ligand has to reorient for a successful matching event. Herein, we present for the first time a model system, in which only the orientation of the ligand is altered in a controlled manner without changing the recognition quality of the ligand itself. The key for this orientational control is the embedding into an interfacial system and the use of a photoswitchable mechanical joint, such as azobenzene. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Hur, Taeho; Bang, Jaehun; Kim, Dohyeong; Banos, Oresti; Lee, Sungyoung
2017-04-23
Activity recognition through smartphones has been proposed for a variety of applications. The orientation of the smartphone has a significant effect on the recognition accuracy; thus, researchers generally propose using features invariant to orientation or displacement to achieve this goal. However, those features reduce the capability of the recognition system to differentiate among some specific commuting activities (e.g., bus and subway) that normally involve similar postures. In this work, we recognize those activities by analyzing the vibrations of the vehicle in which the user is traveling. We extract natural vibration features of buses and subways to distinguish between them and address the confusion that can arise because the activities are both static in terms of user movement. We use the gyroscope to fix the accelerometer to the direction of gravity to achieve an orientation-free use of the sensor. We also propose a correction algorithm to increase the accuracy when used in free living conditions and a battery saving algorithm to consume less power without reducing performance. Our experimental results show that the proposed system can adequately recognize each activity, yielding better accuracy in the detection of bus and subway activities than existing methods.
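The orientation-free step can be sketched as follows: estimate gravity from the accelerometer with a low-pass filter, then split each sample into a component along gravity and a horizontal magnitude, so vibration features no longer depend on how the phone is carried. The complementary-filter constant is an assumption, and the authors' gyroscope fusion is simplified away here.

```python
# Sketch: gravity-aligned decomposition of accelerometer samples.
import numpy as np

def gravity_aligned(acc, alpha=0.98):
    """acc: (n, 3) accelerometer samples. Returns (vertical, horizontal)."""
    g = np.zeros(3)
    vert, horiz = [], []
    for a in acc:
        g = alpha * g + (1 - alpha) * a            # low-pass gravity estimate
        g_unit = g / (np.linalg.norm(g) + 1e-9)
        v = float(a @ g_unit)                      # component along gravity
        h = float(np.linalg.norm(a - v * g_unit))  # magnitude orthogonal to it
        vert.append(v)
        horiz.append(h)
    return np.array(vert), np.array(horiz)

# Vibration features (e.g., variance, dominant frequency) can then be
# computed on the vertical channel to separate bus from subway rides.
```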
Roy, Nelson; Mazin, Alqhazo; Awan, Shaheen N
2014-03-01
Distinguishing muscle tension dysphonia (MTD) from adductor spasmodic dysphonia (ADSD) can be difficult. Unlike MTD, ADSD is described as "task-dependent," implying that dysphonia severity varies depending upon the demands of the vocal task, with connected speech thought to be more symptomatic than sustained vowels. This study used an acoustic index of dysphonia severity (i.e., the Cepstral Spectral Index of Dysphonia [CSID]) to: 1) assess the value of "task dependency" to distinguish ADSD from MTD, and to 2) examine associations between the CSID and listener ratings. Case-Control Study. CSID estimates of dysphonia severity for connected speech and sustained vowels of patients with ADSD (n = 36) and MTD (n = 45) were compared. The diagnostic precision of task dependency (as evidenced by differences in CSID-estimated dysphonia severity between connected speech and sustained vowels) was examined. In ADSD, CSID-estimated severity for connected speech (M = 39. 2, SD = 22.0) was significantly worse than for sustained vowels (M = 29.3, SD = 21.9), [P = .020]. Whereas in MTD, no significant difference in CSID-estimated severity was observed between connected speech (M = 55.1, SD = 23.8) and sustained vowels (M = 50.0, SD = 27.4), [P = .177]. CSID evidence of task dependency correctly identified 66.7% of ADSD cases (sensitivity) and 64.4% of MTD cases (specificity). CSID and listener ratings were significantly correlated. Task dependency in ADSD, as revealed by differences in acoustically-derived estimates of dysphonia severity between connected speech and sustained vowel production, is a potentially valuable diagnostic marker. © 2013 The American Laryngological, Rhinological and Otological Society, Inc.
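The CSID itself is a multivariate model, but its chief ingredient is a cepstral peak measure; shown below is a minimal sketch of cepstral peak prominence (CPP), the height of the cepstral peak in the plausible-pitch quefrency range above a regression baseline, as an illustration of the family of measures involved. The pitch range and whole-cepstrum regression baseline are conventional choices, not the CSID's exact recipe.

```python
# Sketch: cepstral peak prominence for one analysis frame.
import numpy as np

def cpp(frame, fs, f0_range=(60.0, 330.0)):
    """frame: e.g., 40 ms of voiced speech (even length assumed)."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    log_spec = np.log(spectrum + 1e-12)
    cepstrum = np.abs(np.fft.irfft(log_spec))
    quef = np.arange(len(cepstrum)) / fs                  # quefrency (s)
    lo, hi = 1.0 / f0_range[1], 1.0 / f0_range[0]         # pitch region
    region = (quef >= lo) & (quef <= hi)
    peak_idx = int(np.argmax(np.where(region, cepstrum, -np.inf)))
    # Linear regression over the cepstrum as the comparison baseline.
    coeffs = np.polyfit(quef, cepstrum, 1)
    baseline = np.polyval(coeffs, quef[peak_idx])
    return cepstrum[peak_idx] - baseline                  # CPP (log units)
```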
Cui, Xiaoyu; Gao, Chuanji; Zhou, Jianshe; Guo, Chunyan
2016-09-28
It has been widely shown that recognition memory includes two distinct retrieval processes: familiarity and recollection. Many studies have shown that recognition memory can be facilitated when there is a perceptual match between the studied and the tested items. Most event-related potential studies have explored the perceptual match effect on familiarity on the basis of the hypothesis that the specific event-related potential component associated with familiarity is the FN400 (300-500 ms mid-frontal effect). However, it is currently unclear whether the FN400 indexes familiarity or conceptual implicit memory. In addition, on the basis of the findings of a previous study, the so-called perceptual manipulations in previous studies may also involve some conceptual alterations. Therefore, we sought to determine the influence of perceptual manipulation by color changes on recognition memory when the perceptual or the conceptual processes were emphasized. Specifically, different instructions (perceptually or conceptually oriented) were provided to the participants. Behaviorally, color changes significantly affected overall recognition memory, and congruent items were recognized with a higher accuracy rate than incongruent items in both tasks, but no corresponding neural changes were found. Despite the evident familiarity shown in the two tasks (the behavioral performance of recognition memory was much higher than chance level), the FN400 effect was found in the conceptually oriented task but not in the perceptually oriented task. Notably, then, the FN400 effect was not induced even though the behavioral effect of color manipulation on recognition memory was evident, as in previous studies. Our findings of an FN400 effect for the conceptual but not the perceptual condition support the explanation that the FN400 effect indexes conceptual implicit memory.
Assessing Vowel Centralization in Dysarthria: A Comparison of Methods
ERIC Educational Resources Information Center
Fletcher, Annalise R.; McAuliffe, Megan J.; Lansford, Kaitlin L.; Liss, Julie M.
2017-01-01
Purpose: The strength of the relationship between vowel centralization measures and perceptual ratings of dysarthria severity has varied considerably across reports. This article evaluates methods of acoustic-perceptual analysis to determine whether procedural changes can strengthen the association between these measures. Method: Sixty-one…
Verbal Modification via Visual Display
ERIC Educational Resources Information Center
Richmond, Edmun B.; Wallace-Childers, La Donna
1977-01-01
The inability of foreign language students to produce acceptable approximations of new vowel sounds initiated a study to devise a real-time visual display system whereby the students could match vowel production to a visual pedagogical model. The system used amateur radio equipment and a standard oscilloscope. (CHK)
The Adolescent Dyslexic: Strategies for Spelling.
ERIC Educational Resources Information Center
Stirling, Eileen
1989-01-01
The spelling difficulties of the adolescent dyslexic student are described, and techniques are presented to provide the student with the tools needed to cope with spelling requirements, including the study of vowel sounds, doubling the consonant following a short vowel, root words, and laws of probabilities. (JDD)
Theoretical Aspects of Speech Production.
ERIC Educational Resources Information Center
Stevens, Kenneth N.
1992-01-01
This paper on speech production in children and youth with hearing impairments summarizes theoretical aspects, including the speech production process, sound sources in the vocal tract, vowel production, and consonant production. Examples of spectra for several classes of vowel and consonant sounds in simple syllables are given. (DB)
Pattern recognition: A basis for remote sensing data analysis
NASA Technical Reports Server (NTRS)
Swain, P. H.
1973-01-01
The theoretical basis for the pattern-recognition-oriented algorithms used in the multispectral data analysis software system is discussed. A model of a general pattern recognition system is presented. The receptor or sensor is usually a multispectral scanner. For each ground resolution element the receptor produces n numbers or measurements corresponding to the n channels of the scanner.
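The abstract does not name the decision rule, but the classic choice in this multispectral literature is a per-class Gaussian maximum-likelihood classifier applied to the n-channel measurement vector of each ground resolution element. A minimal sketch under that assumption:

```python
import numpy as np

def train_gaussian_ml(samples_by_class):
    """samples_by_class: {label: (m, n) array of n-channel pixel vectors}.
    Fits one multivariate Gaussian per class."""
    stats = {}
    for label, X in samples_by_class.items():
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        stats[label] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return stats

def classify(pixel, stats):
    """Assign one ground resolution element (an n-vector of channel
    measurements) to the class with maximum Gaussian log-likelihood."""
    best, best_ll = None, -np.inf
    for label, (mu, cov_inv, logdet) in stats.items():
        d = pixel - mu
        ll = -0.5 * (logdet + d @ cov_inv @ d)
        if ll > best_ll:
            best, best_ll = label, ll
    return best
```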
ERIC Educational Resources Information Center
Keller, Heidi; Yovsi, Relindis; Borke, Joern; Kärtner, Joscha; Jensen, Henning; Papaligoura, Zaira
2004-01-01
This study relates parenting of 3-month-old children to children's self-recognition and self-regulation at 18 to 20 months. As hypothesized, observational data revealed differences in the sociocultural orientations of the 3 cultural samples' parenting styles and in toddlers' development of self-recognition and self-regulation. Children of…
Optical Pattern Recognition With Self-Amplification
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang
1994-01-01
In optical pattern recognition system with self-amplification, no reference beam used in addressing mode. Polarization of laser beam and orientation of photorefractive crystal chosen to maximize photorefractive effect. Intensity of recognition signal is orders of magnitude greater than other optical correlators. Apparatus regarded as real-time or quasi-real-time optical pattern recognizer with memory and reprogrammability.
Ting, Hua-Nong; Chia, See-Yan; Manap, Hany Hazfiza; Ho, Ai-Hui; Tiu, Kian-Yean; Abdul Hamid, Badrulzaman
2012-07-01
This study investigated the fundamental frequency (F(0)) and perturbation measures of sustained vowels in 360 native Malaysian Malay children aged between 7 and 12 years using acoustical analysis. Praat software (Boersma and Weenink, University of Amsterdam, The Netherlands) was used to analyze the F(0) and perturbation measures of the sustained vowels. Statistical analyses were conducted to determine the significant differences in F(0) and perturbation measures across the vowels, sex, and age groups. The mean F(0) of Malaysian Malay male and female children were reported at 240±34.88 and 254.48±23.35 Hz, respectively. The jitter (Jitt), relative average perturbation (RAP), five-point period perturbation quotient (PPQ5), shimmer (Shim), and 11-point amplitude perturbation quotient (APQ11) of Malaysian male children were reported at 0.43±0.26%, 0.25±0.16%, 0.26±0.15%, 2.48±1.61%, and 1.75±1.04%, respectively. As for female children, the Jitt, RAP, PPQ5, Shim, and APQ11 were reported at 0.42±0.22%, 0.25±0.14%, 0.25±0.13%, 2.47±1.53%, and 1.75±1.10%, respectively. No significant differences in F(0) were reported across the Malay vowels for both males and females. Malay females had significantly higher F(0) than Malay males at the ages of 8, 10, and 12 years. Malaysian Malay children showed a nonsystematic decrement in F(0) across the age groups. Significant differences in F(0) were found across the age groups. Significant differences in perturbation measures were observed across the vowels in certain age groups of Malay males and females. Generally, no significant differences in perturbation measures between the sexes were observed in any of the age groups and vowels. No significant differences in any of the perturbation measures across the age groups were reported in either Malaysian Malay male or female children. Copyright © 2012 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
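For readers unfamiliar with these measures, the sketch below gives approximate textbook definitions of Jitt, RAP, and Shim computed from extracted pitch periods and cycle peak amplitudes; Praat's exact implementations differ in detail.

```python
import numpy as np

def jitter_local(periods):
    """Jitt (%): mean absolute difference between consecutive pitch
    periods, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(p))) / p.mean()

def rap(periods):
    """Relative average perturbation (%): deviation of each period from
    its 3-point local average, relative to the mean period."""
    p = np.asarray(periods, dtype=float)
    smooth = (p[:-2] + p[1:-1] + p[2:]) / 3.0
    return 100.0 * np.mean(np.abs(p[1:-1] - smooth)) / p.mean()

def shimmer_local(amplitudes):
    """Shim (%): like local jitter, but computed on the peak amplitudes
    of successive glottal cycles."""
    a = np.asarray(amplitudes, dtype=float)
    return 100.0 * np.mean(np.abs(np.diff(a))) / a.mean()
```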
Recognition Of Complex Three Dimensional Objects Using Three Dimensional Moment Invariants
NASA Astrophysics Data System (ADS)
Sadjadi, Firooz A.
1985-01-01
A technique for the recognition of complex three dimensional objects is presented. The complex 3-D objects are represented in terms of their 3-D moment invariants, algebraic expressions that remain invariant independent of the 3-D objects' orientations and locations in the field of view. The technique of 3-D moment invariants has been used successfully for simple 3-D object recognition in the past. In this work we have extended this method for the representation of more complex objects. Two complex objects are represented digitally; their 3-D moment invariants have been calculated, and then the invariancy of these 3-D invariant moment expressions is verified by changing the orientation and the location of the objects in the field of view. The results of this study have significant impact on 3-D robotic vision, 3-D target recognition, scene analysis and artificial intelligence.
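The paper's full set of invariants is not reproduced in the abstract, but the second-order members of this family can be sketched directly: the central second-moment matrix of an object changes only by an orthogonal conjugation under rotation, so its trace, sum of pairwise eigenvalue products, and determinant are invariant to both position and orientation. A sketch for a voxelized object:

```python
import numpy as np

def second_order_invariants(vox):
    """Second-order 3-D moment invariants (J1, J2, J3) of an intensity
    grid; unchanged under translation and rotation of the object.
    Higher-order invariants used in the paper are not reproduced here."""
    coords = np.argwhere(vox > 0).astype(float)
    w = vox[vox > 0].astype(float)
    c = np.average(coords, axis=0, weights=w)        # centroid -> translation invariance
    d = coords - c
    # central second-moment matrix M[i, j] = sum_k w_k * d_ki * d_kj
    M = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(axis=0)
    J1 = np.trace(M)                                 # sum of eigenvalues
    J2 = 0.5 * (np.trace(M) ** 2 - np.trace(M @ M))  # pairwise eigenvalue products
    J3 = np.linalg.det(M)                            # product of eigenvalues
    return J1, J2, J3
```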
Lepistö, T; Silokallio, S; Nieminen-von Wendt, T; Alku, P; Näätänen, R; Kujala, T
2006-10-01
Language development is delayed and deviant in individuals with autism, but proceeds quite normally in those with Asperger syndrome (AS). We investigated auditory-discrimination and orienting in children with AS using an event-related potential (ERP) paradigm that was previously applied to children with autism. ERPs were measured to pitch, duration, and phonetic changes in vowels and to corresponding changes in non-speech sounds. Active sound discrimination was evaluated with a sound-identification task. The mismatch negativity (MMN), indexing sound-discrimination accuracy, showed right-hemisphere dominance in the AS group, but not in the controls. Furthermore, the children with AS had diminished MMN-amplitudes and decreased hit rates for duration changes. In contrast, their MMN to speech pitch changes was parietally enhanced. The P3a, reflecting involuntary orienting to changes, was diminished in the children with AS for speech pitch and phoneme changes, but not for the corresponding non-speech changes. The children with AS differ from controls with respect to their sound-discrimination and orienting abilities. The results of the children with AS are relatively similar to those earlier obtained from children with autism using the same paradigm, although these clinical groups differ markedly in their language development.
Parkinson's disease and the effect of lexical factors on vowel articulation.
Watson, Peter J; Munson, Benjamin
2008-11-01
Lexical factors (i.e., word frequency and phonological neighborhood density) influence speech perception and production. It is unknown whether these factors are affected by Parkinson's disease (PD). Ten men with PD and ten healthy men read CVC words (varying orthogonally in word frequency and density) aloud while being audio-recorded. Acoustic analysis was performed on the duration and Bark-scaled F1-F2 values of the vowels contained in the words. Vowel space was larger for low-frequency words from dense neighborhoods than from sparse ones for both groups. However, the participants with PD did not show an effect of density on dispersion for high-frequency words.
Addressing Phonological Questions with Ultrasound
ERIC Educational Resources Information Center
Davidson, Lisa
2005-01-01
Ultrasound can be used to address unresolved questions in phonological theory. To date, some studies have shown that results from ultrasound imaging can shed light on how differences in phonological elements are implemented. Phenomena that have been investigated include transitional schwa, vowel coalescence, and transparent vowels. A study of…
Vowel perception by noise masked normal-hearing young adults
NASA Astrophysics Data System (ADS)
Richie, Carolyn; Kewley-Port, Diane; Coughlin, Maureen
2005-08-01
This study examined vowel perception by young normal-hearing (YNH) adults, in various listening conditions designed to simulate mild-to-moderate sloping sensorineural hearing loss. YNH listeners were individually age- and gender-matched to young hearing-impaired (YHI) listeners tested in a previous study [Richie et al., J. Acoust. Soc. Am. 114, 2923-2933 (2003)]. YNH listeners were tested in three conditions designed to create equal audibility with the YHI listeners: a low signal level with and without a simulated hearing loss, and a high signal level with a simulated hearing loss. Listeners discriminated changes in synthetic vowel tokens /ɪ e ɛ ʌ æ/ when F1 or F2 varied in frequency. Comparison of YNH with YHI results failed to reveal significant differences between groups in terms of performance on vowel discrimination, in conditions of similar audibility created by using noise masking to elevate the hearing thresholds of the YNH listeners and frequency-specific gain for the YHI listeners. Further, analysis of learning curves suggests that while the YHI listeners completed an average of 46% more test blocks than YNH listeners, the YHI achieved a level of discrimination similar to that of the YNH within the same number of blocks. Apparently, when age and gender are closely matched between young hearing-impaired and normal-hearing adults, performance on vowel tasks may be explained by audibility alone.
Amengual, Mark
2016-01-01
The present study examines cognate effects in the phonetic production and processing of the Catalan back mid-vowel contrast (/o/-/ɔ/) by 24 early and highly proficient Spanish-Catalan bilinguals in Majorca (Spain). Participants completed a picture-naming task and a forced-choice lexical decision task in which they were presented with either words (e.g., /bɔsk/ “forest”) or non-words based on real words, but with the alternate mid-vowel pair in stressed position (*/bosk/). The same cognate and non-cognate lexical items were included in the production and lexical decision experiments. The results indicate that even though these early bilinguals maintained the back mid-vowel contrast in their productions, they had great difficulties identifying non-words and real words based on the identity of the Catalan mid-vowel. The analyses revealed language dominance and cognate effects: Spanish-dominants exhibited higher error rates than Catalan-dominants, and production and lexical decision accuracy were also affected by cognate status. The present study contributes to the discussion of the organization of early bilinguals' dominant and non-dominant sound systems, and proposes that exemplar theoretic approaches can be extended to include bilingual lexical connections that account for the interactions between the phonetic and lexical levels of early bilingual individuals. PMID:27199849
Effect of cognitive load on articulation rate and formant frequencies during simulator flights.
Huttunen, Kerttu H; Keränen, Heikki I; Pääkkönen, Rauno J; Päivikki Eskelinen-Rönkä, R; Leino, Tuomo K
2011-03-01
This study explored how three types of intensive cognitive load typical of military aviation (load on situation awareness, information processing, or decision-making) affect speech. The utterances of 13 male military pilots were recorded during simulated combat flights. Articulation rate was calculated from the speech samples, and the first formant (F1) and second formant (F2) were tracked from first-syllable short vowels in pre-defined phoneme environments. Articulation rate was found to correlate negatively (albeit with low coefficients) with loads on situation awareness and decision-making, but not with changes in F1 or F2. Changes were seen in the spectrum of the vowels: the mean F1 of front vowels usually increased and their mean F2 decreased as a function of cognitive load, and both F1 and F2 of back vowels increased. The strongest associations were seen between the three types of cognitive load and F1 and F2 changes in back vowels. Because fluent and clear radio speech communication is vital to safety in aviation, and temporal and spectral changes may affect speech intelligibility, careful use of standard aviation phraseology and training in the production of clear speech during a high level of cognitive load are important measures that diminish the probability of possible misunderstandings. © 2011 Acoustical Society of America
Levy, Erika S.
2009-01-01
Recent research has called for an examination of perceptual assimilation patterns in second-language speech learning. This study examined the effects of language learning and consonantal context on perceptual assimilation of Parisian French (PF) front rounded vowels /y/ and /œ/ by American English (AE) learners of French. AE listeners differing in their French language experience (no experience, formal instruction, formal-plus-immersion experience) performed an assimilation task involving PF /y, œ, u, o, i, ɛ, a/ in bilabial /rabVp/ and alveolar /radVt/ contexts, presented in phrases. PF front rounded vowels were assimilated overwhelmingly to back AE vowels. For PF /œ/, assimilation patterns differed as a function of language experience and consonantal context. However, PF /y/ revealed no experience effect in alveolar context. In bilabial context, listeners with extensive experience assimilated PF /y/ to /ju/ less often than listeners with no or only formal experience, a pattern predicting the poorest /u-y/ discrimination for the most experienced group. An "internal consistency" analysis indicated that responses were most consistent with extensive language experience and in bilabial context. Acoustical analysis revealed that acoustical similarities among PF vowels alone cannot explain context-specific assimilation patterns. Instead it is suggested that native-language allophonic variation influences context-specific perceptual patterns in second-language learning. PMID:19206888
The role of linguistic experience on the perception of phonation
NASA Astrophysics Data System (ADS)
Esposito, Christina
2005-09-01
This study investigates the role linguistic experience has on the perception of phonation and the acoustic properties that correlate with this perception. Listeners from three languages (Gujarati, which contrasts breathy versus modal vowels, Italian, which has no breathiness, and English, which has allophonic breathiness) participated in two tasks. In the Visual Sort task, breathy and modal vowels from a variety of languages (e.g., Chong, Mon, etc.) were presented as icons on a computer screen. Subjects sorted the icons (the stimuli) into two groups based on perceived similarity of the talkers' voices. In the multidimensional scaling task, listeners heard pairs of Mazatec vowels, and moved an on-screen slider to indicate the perceived similarity of the vowels in each pair. Results will show that judgments were more uniform across subjects who had a breathy category present in their native language(s), and varied across subjects who lacked a breathy category. The perceived similarity among the stimuli will correlate with a measurable acoustic property (H1-H2, H1-A3, H1-A1 or H1-A2, the average of H1-H2 compared to A1, and A2-A3). It is hypothesized that H1-H2 will be the most salient acoustic property for Gujarati listeners because this correlates with their production of breathy vowels.
Benders, Titia
2013-12-01
Exaggeration of the vowel space in infant-directed speech (IDS) is well documented for English, but not consistently replicated in other languages or for other speech-sound contrasts. A second attested, but less discussed, pattern of change in IDS is an overall rise of the formant frequencies, which may reflect an affective speaking style. The present study investigates longitudinally how Dutch mothers change their corner vowels, voiceless fricatives, and pitch when speaking to their infant at 11 and 15 months of age. In comparison to adult-directed speech (ADS), Dutch IDS has a smaller vowel space, higher second and third formant frequencies in the vowels, and a higher spectral frequency in the fricatives. The formants of the vowels and the spectral frequency of the fricatives are raised more strongly for infants at 11 than at 15 months, while pitch is more extreme in IDS to 15-month-olds. These results show that enhanced positive affect is the main factor influencing Dutch mothers' realisation of speech sounds in IDS, especially to younger infants. This study provides evidence that mothers' expression of emotion in IDS can influence the realisation of speech sounds, and that the loss or gain of speech clarity may be secondary effects of affect. Copyright © 2013 Elsevier Inc. All rights reserved.
Henry, Kenneth S; Amburgey, Kassidy N; Abrams, Kristina S; Idrobo, Fabio; Carney, Laurel H
2017-10-01
Vowels are complex sounds with four to five spectral peaks known as formants. The frequencies of the two lowest formants, F1 and F2, are sufficient for vowel discrimination. Behavioral studies show that many birds and mammals can discriminate vowels. However, few studies have quantified thresholds for formant-frequency discrimination. The present study examined formant-frequency discrimination in budgerigars (Melopsittacus undulatus) and humans using stimuli with one or two formants and a constant fundamental frequency of 200 Hz. Stimuli had spectral envelopes similar to natural speech and were presented with random level variation. Thresholds were estimated for frequency discrimination of F1, F2, and simultaneous F1 and F2 changes. The same two-down, one-up tracking procedure and single-interval, two-alternative task were used for both species. Formant-frequency discrimination thresholds were as sensitive in budgerigars as in humans and followed the same patterns across all conditions. Thresholds expressed as percent frequency difference were higher for F1 than for F2, and were unchanged between stimuli with one or two formants. Thresholds for simultaneous F1 and F2 changes indicated that discrimination was based on combined information from both formant regions. Results were consistent with previous human studies and show that budgerigars provide an exceptionally sensitive animal model of vowel feature discrimination.
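The two-down, one-up rule mentioned here is a standard adaptive staircase that converges on roughly 70.7% correct (Levitt, 1971). A minimal sketch with assumed geometric step sizes and a threshold taken as the mean of the final reversals; the step factor and stopping rule are illustrative, not necessarily the paper's:

```python
def two_down_one_up(respond, start=50.0, factor=2.0, floor=0.1, n_reversals=8):
    """Track a discrimination threshold. `respond(delta)` runs one trial
    at difference `delta` and returns True if the response was correct.
    Delta shrinks after two consecutive correct trials and grows after
    each error, converging near the 70.7%-correct point."""
    delta, run, direction, reversals = start, 0, 0, []
    while len(reversals) < n_reversals:
        if respond(delta):
            run += 1
            if run == 2:                       # two down
                run = 0
                if direction == +1:
                    reversals.append(delta)    # peak -> reversal
                direction = -1
                delta = max(floor, delta / factor)
        else:
            run = 0                            # one up
            if direction == -1:
                reversals.append(delta)        # valley -> reversal
            direction = +1
            delta *= factor
    return sum(reversals[-6:]) / len(reversals[-6:])

# usage with a simulated listener whose true threshold is ~5 units:
import random
listener = lambda d: random.random() < (0.95 if d > 5 else 0.55)
print(two_down_one_up(listener))
```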
Kim, Kwang S; Max, Ludo
2014-01-01
To estimate the contributions of feedforward vs. feedback control systems in speech articulation, we analyzed the correspondence between initial and final kinematics in unperturbed tongue and jaw movements for consonant-vowel (CV) and vowel-consonant (VC) syllables. If movement extents and endpoints are highly predictable from early kinematic information, then the movements were most likely completed without substantial online corrections (feedforward control); if the correspondence between early kinematics and final amplitude or position is low, online adjustments may have altered the planned trajectory (feedback control) (Messier and Kalaska, 1999). Five adult speakers produced CV and VC syllables with high, mid, or low vowels while movements of the tongue and jaw were tracked electromagnetically. The correspondence between the kinematic parameters peak acceleration or peak velocity and movement extent as well as between the articulators' spatial coordinates at those kinematic landmarks and movement endpoint was examined both for movements across different target distances (i.e., across vowel height) and within target distances (i.e., within vowel height). Taken together, results suggest that jaw and tongue movements for these CV and VC syllables are mostly under feedforward control but with feedback-based contributions. One type of feedback-driven compensatory adjustment appears to regulate movement duration based on variation in peak acceleration. Results from a statistical model based on multiple regression are presented to illustrate how the relative strength of these feedback contributions can be estimated.
Flower orientation enhances pollen transfer in bilaterally symmetrical flowers.
Ushimaru, Atushi; Dohzono, Ikumi; Takami, Yasuoki; Hyodo, Fujio
2009-07-01
Zygomorphic flowers are usually more complex than actinomorphic flowers and are more likely to be visited by specialized pollinators. Complex zygomorphic flowers tend to be oriented horizontally. It is hypothesized that a horizontal flower orientation ensures effective pollen transfer by facilitating pollinator recognition (the recognition-facilitation hypothesis) and/or pollinator landing (the landing-control hypothesis). To examine these two hypotheses, we altered the angle of Commelina communis flowers and examined the efficiency of pollen transfer, as well as the behavior of their visitors. We exposed unmanipulated (horizontal-), upward-, and downward-facing flowers to syrphid flies (mostly Episyrphus balteatus), which are natural visitors to C. communis. The frequency of pollinator approaches and landings, as well as the amount of pollen deposited by E. balteatus, decreased for the downward-facing flowers, supporting both hypotheses. The upward-facing flowers received the same numbers of approaches and landings as the unmanipulated flowers, but experienced more illegitimate landings. In addition, the visitors failed to touch the stigmas or anthers on the upward-facing flowers, leading to reduced pollen export and receipt, and supporting the landing-control hypothesis. Collectively, our data suggested that the horizontal orientation of zygomorphic flowers enhances pollen transfer by both facilitating pollinator recognition and controlling pollinator landing position. These findings suggest that zygomorphic flowers which deviate from a horizontal orientation may have lower fitness because of decreased pollen transfer.
Notes on the Changing Nature of Popular French.
ERIC Educational Resources Information Center
Curnow, Maureen Cheney
In this paper, examples of a variety of phonological, orthographic, and morphological changes in current popular French are noted. They include: dropping of silent vowels in spelling, particularly in advertising and product names; changes in the pronunciation of vowels due to manipulation for product names; combinations of otherwise…
A Comprehensive Three-Dimensional Cortical Map of Vowel Space
ERIC Educational Resources Information Center
Scharinger, Mathias; Idsardi, William J.; Poe, Samantha
2011-01-01
Mammalian cortex is known to contain various kinds of spatial encoding schemes for sensory information including retinotopic, somatosensory, and tonotopic maps. Tonotopic maps are especially interesting for human speech sound processing because they encode linguistically salient acoustic properties. In this study, we mapped the entire vowel space…
Auditory Spectral Integration in the Perception of Static Vowels
ERIC Educational Resources Information Center
Fox, Robert Allen; Jacewicz, Ewa; Chang, Chiung-Yun
2011-01-01
Purpose: To evaluate potential contributions of broadband spectral integration in the perception of static vowels. Specifically, can the auditory system infer formant frequency information from changes in the intensity weighting across harmonics when the formant itself is missing? Does this type of integration produce the same results in the lower…
Concept of Tone in Mandarin Revisited: A Perceptual Study on Tonal Coarticulation.
ERIC Educational Resources Information Center
Shen, Xiaonan Susan; Lin, Maocan
1991-01-01
Examination of the perceptibility of carryover coarticulatory perturbations occurring at syllabic vowels in Mandarin Chinese suggests that, in connected speech, a portion of fundamental frequency at intertonemic onset is perturbed, including initial voiced consonants and vowels, and that the perturbations result from preservative as well as…
Representation of Sound Categories in Auditory Cortical Maps
ERIC Educational Resources Information Center
Guenther, Frank H.; Nieto-Castanon, Alfonso; Ghosh, Satrajit S.; Tourville, Jason A.
2004-01-01
Functional magnetic resonance imaging (fMRI) was used to investigate the representation of sound categories in human auditory cortex. Experiment 1 investigated the representation of prototypical (good) and nonprototypical (bad) examples of a vowel sound. Listening to prototypical examples of a vowel resulted in less auditory cortical activation…
ERIC Educational Resources Information Center
Moon, Gui-Sun
A discussion of the nasal harmony of Aguaruna, a language of the Jivaroan family in South America, approaches the subject from the viewpoint of generative phonology. This theory of phonology proposes an underlying nasal consonant, later deleted, that accounts for vowel nasalization. Complex rules that suppose a complex system of vowel and…
Dynamic Articulation of Vowels.
ERIC Educational Resources Information Center
Morgan, Willie B.
1979-01-01
A series of exercises and a theory of vowel descriptions can help minimize speakers' problems of excessive tension, awareness of tongue height, and tongue retraction. Eight exercises to provide Forward Facial Stretch neutralize tensions in the face and vocal resonator and their effect on the voice. Three experiments in which sounds are repeated…
Hemispheric Differences in the Effects of Context on Vowel Perception
ERIC Educational Resources Information Center
Sjerps, Matthias J.; Mitterer, Holger; McQueen, James M.
2012-01-01
Listeners perceive speech sounds relative to context. Contextual influences might differ over hemispheres if different types of auditory processing are lateralized. Hemispheric differences in contextual influences on vowel perception were investigated by presenting speech targets and both speech and non-speech contexts to listeners' right or left…
Acoustic Correlates of Emphatic Stress in Central Catalan
ERIC Educational Resources Information Center
Nadeu, Marianna; Hualde, Jose Ignacio
2012-01-01
A common feature of public speech in Catalan is the placement of prominence on lexically unstressed syllables ("emphatic stress"). This paper presents an acoustic study of radio speech data. Instances of emphatic stress were perceptually identified. Within-word comparison between vowels with emphatic stress and vowels with primary lexical stress…
Comparing Identification of Standardized and Regionally Valid Vowels
ERIC Educational Resources Information Center
Wright, Richard; Souza, Pamela
2012-01-01
Purpose: In perception studies, it is common to use vowel stimuli from standardized recordings or synthetic stimuli created using values from well-known published research. Although the use of standardized stimuli is convenient, unconsidered dialect and regional accent differences may introduce confounding effects. The goal of this study was to…
The Prosodic Evolution of West Slavic in the Context of the Neo-Acute Stress
ERIC Educational Resources Information Center
Feldstein, Ronald F.
1975-01-01
Because of neo-acute stress--or transferred acute stress--long vowel prosody in West Slavic had a special evolution. Two kinds of long vowel evolution are examined. The nature of transitionality across Slavic territory from tonal opposition to distinctive stress placement is pointed out. (SC)
Comparison of Nasal Acceleration and Nasalance across Vowels
ERIC Educational Resources Information Center
Thorp, Elias B.; Virnik, Boris T.; Stepp, Cara E.
2013-01-01
Purpose: The purpose of this study was to determine the performance of normalized nasal acceleration (NNA) relative to nasalance as estimates of nasalized versus nonnasalized vowel and sentence productions. Method: Participants were 18 healthy speakers of American English. NNA was measured using a custom sensor, and nasalance was measured using…
Formant Amplitude of Children with Down's Syndrome.
ERIC Educational Resources Information Center
Pentz, Arthur L., Jr.
1987-01-01
The sustained vowel sounds of 14 noninstitutionalized 7- to 10-year-old children with Down's syndrome were analyzed acoustically for vowel formant amplitude levels. The subjects with Down's syndrome had formant amplitude intensity levels significantly lower than those of a similar group of speakers without Down's syndrome. (Author/DB)
Status Report on Speech Research, July-December 1990.
ERIC Educational Resources Information Center
Studdert-Kennedy, Michael, Ed.
One of a series of semiannual reports, this publication contains 13 articles which report the status and progress of studies on the nature of speech, instrumentation for its investigation, and practical applications. Articles and their authors are as follows: "The Role of Contrast in Limiting Vowel-to-Vowel Coarticulation in Different…
Luo, Jiebo; Boutell, Matthew
2005-05-01
Automatic image orientation detection for natural images is a useful, yet challenging research topic. Humans use scene context and semantic object recognition to identify the correct image orientation. However, it is difficult for a computer to perform the task in the same way because current object recognition algorithms are extremely limited in their scope and robustness. As a result, existing orientation detection methods were built upon low-level vision features such as spatial distributions of color and texture. Discrepant detection rates have been reported for these methods in the literature. We have developed a probabilistic approach to image orientation detection via confidence-based integration of low-level and semantic cues within a Bayesian framework. Our current accuracy is 90 percent for unconstrained consumer photos, impressive given the findings of a psychophysical study conducted recently. The proposed framework is an attempt to bridge the gap between computer and human vision systems and is applicable to other problems involving semantic scene content understanding.
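The confidence-based integration is described only at a high level; under a naive conditional-independence assumption it reduces to multiplying per-cue likelihoods over the four candidate rotations and normalizing. A toy sketch (the cue values are invented for illustration):

```python
import numpy as np

def fuse_orientation(cue_likelihoods, priors=None):
    """Posterior over the four candidate rotations (0, 90, 180, 270 deg),
    combining cues as if conditionally independent given orientation."""
    post = np.ones(4) if priors is None else np.asarray(priors, dtype=float).copy()
    for lik in cue_likelihoods:          # e.g., color layout, texture, face detector
        post = post * np.asarray(lik, dtype=float)
    return post / post.sum()

low_level = [0.4, 0.2, 0.2, 0.2]         # weak evidence from color/texture
semantic = [0.7, 0.1, 0.1, 0.1]          # strong evidence from a detected face
print(fuse_orientation([low_level, semantic]))   # argmax -> 0 degrees
```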
Sex, Sexual Orientation, and Identification of Positive and Negative Facial Affect
ERIC Educational Resources Information Center
Rahman, Qazi; Wilson, Glenn D.; Abrahams, Sharon
2004-01-01
Sex and sexual orientation related differences in processing of happy and sad facial emotions were examined using an experimental facial emotion recognition paradigm with a large sample (N=240). Analysis of covariance (controlling for age and IQ) revealed that women (irrespective of sexual orientation) had faster reaction times than men for…
Suemitsu, Atsuo; Dang, Jianwu; Ito, Takayuki; Tiede, Mark
2015-10-01
Articulatory information can support learning or remediating pronunciation of a second language (L2). This paper describes an electromagnetic articulometer-based visual-feedback approach using an articulatory target presented in real-time to facilitate L2 pronunciation learning. This approach trains learners to adjust articulatory positions to match targets for a L2 vowel estimated from productions of vowels that overlap in both L1 and L2. Training of Japanese learners for the American English vowel /æ/ that included visual training improved its pronunciation regardless of whether audio training was also included. Articulatory visual feedback is shown to be an effective method for facilitating L2 pronunciation learning.
Pattern recognition and feature extraction with an optical Hough transform
NASA Astrophysics Data System (ADS)
Fernández, Ariel
2016-09-01
Pattern recognition and localization along with feature extraction are image processing applications of great interest in defect inspection and robot vision, among others. In comparison to purely digital methods, the attractiveness of optical processors for pattern recognition lies in their highly parallel operation and real-time processing capability. This work presents an optical implementation of the generalized Hough transform (GHT), a well-established technique for the recognition of geometrical features in binary images. Detection of a geometric feature under the GHT is accomplished by mapping the original image to an accumulator space; the large computational requirements for this mapping make the optical implementation an attractive alternative to digital-only methods. Starting from the integral representation of the GHT, it is possible to devise an optical setup where the transformation is obtained, and the size and orientation parameters can be controlled, allowing for dynamic scale- and orientation-variant pattern recognition. A compact system for the above purposes results from the use of an electrically tunable lens for scale control and a rotating pupil mask for orientation variation, implemented on a high-contrast spatial light modulator (SLM). Real-time operation (as limited by the frame rate of the device used to capture the GHT) can also be achieved, allowing for the processing of video sequences. Besides, by thresholding of the GHT (with the aid of another SLM) and inverse transforming (which is optically achieved in the incoherent system under an appropriate focusing setting), the previously detected features of interest can be extracted.
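As a point of reference for what the optical accumulator computes, a digital GHT uses an R-table: template edge points, indexed by gradient direction, vote for a candidate reference point. A compact sketch (scale and rotation, which the optical system sweeps with the tunable lens and rotating pupil, would here require iterating over transformed R-tables):

```python
import numpy as np

def build_r_table(template_edges, template_dirs, centre, n_bins=36):
    """For each quantized gradient direction, store the displacement from
    the edge point to a chosen reference centre."""
    table = {b: [] for b in range(n_bins)}
    for (y, x), theta in zip(template_edges, template_dirs):
        b = int(np.floor(theta / (2 * np.pi) * n_bins)) % n_bins
        table[b].append((centre[0] - y, centre[1] - x))
    return table

def ght_accumulate(image_edges, image_dirs, table, shape, n_bins=36):
    """Each image edge point votes for candidate reference points; peaks
    in the accumulator mark detected instances of the template."""
    acc = np.zeros(shape, dtype=np.int32)
    for (y, x), theta in zip(image_edges, image_dirs):
        b = int(np.floor(theta / (2 * np.pi) * n_bins)) % n_bins
        for dy, dx in table[b]:
            ry, rx = y + dy, x + dx
            if 0 <= ry < shape[0] and 0 <= rx < shape[1]:
                acc[ry, rx] += 1
    return acc
```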
Representations of Spectral Differences between Vowels in Tonotopic Regions of Auditory Cortex
ERIC Educational Resources Information Center
Fisher, Julia
2017-01-01
This work examines the link between low-level cortical acoustic processing and higher-level cortical phonemic processing. Specifically, using functional magnetic resonance imaging, it looks at 1) whether or not the vowels [alpha] and [i] are distinguishable in regions of interest defined by the first two resonant frequencies (formants) of those…
Articulatory Movements during Vowels in Speakers with Dysarthria and Healthy Controls
ERIC Educational Resources Information Center
Yunusova, Yana; Weismer, Gary; Westbury, John R.; Lindstrom, Mary J.
2008-01-01
Purpose: This study compared movement characteristics of markers attached to the jaw, lower lip, tongue blade, and dorsum during production of selected English vowels by normal speakers and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson disease (PD). The study asked the following questions: (a) Are movement…
Vowel Harmony in Palestinian Arabic: A Metrical Perspective.
ERIC Educational Resources Information Center
Abu-Salim, I. M.
1987-01-01
The autosegmental rule of vowel harmony (VH) in Palestinian Arabic is shown to be constrained simultaneously by metrical and segmental boundaries. The indicative prefix bi- is no longer an exception to VH if a structure is assumed that disallows the prefix from sharing a foot with the stem, consequently blocking VH. (Author/LMO)
ERIC Educational Resources Information Center
Kambuziya, Aliyeh Kord-e Zafaranlu; Dehghan, Masoud
2011-01-01
This paper investigates the epenthesis process in Persian in order to draw some conclusions about vowel and consonant insertion in the Persian lexicon. The survey is closely related to the description of epenthetic consonants and the conditions in which these consonants are used. Since no word in Persian may begin with a vowel, hiatus can't be…
ERIC Educational Resources Information Center
Fagan, Mary K.; Doveikis, Kate N.
2017-01-01
Purpose: This study tested proposals that maternal verbal responses shape infant vocal development, proposals based in part on evidence that infants modified their vocalizations to match mothers' experimentally manipulated vowel or consonant-vowel responses to most (i.e., 70%-80%) infant vocalizations. We tested the proposal in ordinary rather…
Neural Correlates in the Processing of Phoneme-Level Complexity in Vowel Production
ERIC Educational Resources Information Center
Park, Haeil; Iverson, Gregory K.; Park, Hae-Jeong
2011-01-01
We investigated how articulatory complexity at the phoneme level is manifested neurobiologically in an overt production task. fMRI images were acquired from young Korean-speaking adults as they pronounced bisyllabic pseudowords in which we manipulated phonological complexity defined in terms of vowel duration and instability (viz., COMPLEX:…
Spatial Frequency Requirements and Gaze Strategy in Visual-Only and Audiovisual Speech Perception
ERIC Educational Resources Information Center
Wilson, Amanda H.; Alsius, Agnès; Paré, Martin; Munhall, Kevin G.
2016-01-01
Purpose: The aim of this article is to examine the effects of visual image degradation on performance and gaze behavior in audiovisual and visual-only speech perception tasks. Method: We presented vowel-consonant-vowel utterances visually filtered at a range of frequencies in visual-only, audiovisual congruent, and audiovisual incongruent…
Lingual Electromyography Related to Tongue Movements in Swedish Vowel Production.
ERIC Educational Resources Information Center
Hirose, Hajime; And Others
1979-01-01
In order to investigate the articulatory dynamics of the tongue in the production of Swedish vowels, electromyographic (EMG) and X-ray microbeam studies were performed on a native Swedish subject. The EMG signals were used to obtain average indication of the muscle activity of the tongue as a function of time. (NCR)
Vowel Harmony: A Variable Rule in Brazilian Portuguese.
ERIC Educational Resources Information Center
Bisol, Leda
1989-01-01
Examines vowel harmony in the "Gaucho dialect" of the Brazilian state of Rio Grande do Sul. Informants from four areas of the state were studied: the capital city (Porto Alegre), the border region with Uruguay, and two areas of the interior populated by descendants of nineteenth-century immigrants from Europe, mainly Germans and…
Markedness in the Perception of L2 English Consonant Clusters
ERIC Educational Resources Information Center
AlMahmoud, Mahmoud S.
2011-01-01
The central goal of this dissertation is to explore the relative perceptibility of vowel epenthesis in English onset clusters by second language learners whose native language is averse to onset clusters. The dissertation examines how audible vowel epenthesis in different onset clusters is, whether this perceptibility varies from one cluster to…
Vowel and Consonant Lessening: A Study of Articulating Reductions and Their Relations to Genders
ERIC Educational Resources Information Center
Lin, Grace Hui Chin; Chien, Paul Shih Chieh
2011-01-01
Using English as a global communicating tool makes Taiwanese people have to speak in English in diverse international situations. However, consonants and vowels in English are not all effortless for them to articulate. This phonological reduction study explores concepts about phonological (articulating system) approximation. From Taiwanese folks'…
ERIC Educational Resources Information Center
Menke, Mandy R.
2015-01-01
Language immersion students' lexical, syntactic, and pragmatic competencies are well documented, yet their phonological skill has remained relatively unexplored. This study investigates the Spanish vowel productions of a cross-sectional sample of 35 one-way Spanish immersion students. Learner productions were analyzed acoustically and compared to…
Adult Second Language Learning of Spanish Vowels
ERIC Educational Resources Information Center
Cobb, Katherine; Simonet, Miquel
2015-01-01
The present study reports on the findings of a cross-sectional acoustic study of the production of Spanish vowels by three different groups of speakers: 1) native Spanish speakers; 2) native English intermediate learners of Spanish; and 3) native English advanced learners of Spanish. In particular, we examined the production of the five Spanish…
The Development of an Automatic Dialect Classification Test. Final Report.
ERIC Educational Resources Information Center
Willis, Clodius
These experiments investigated and described intra-subject, inter-subject, and inter-group variation in perception of synthetic vowels as well as the possibility that inter-group differences reflect dialect differences. Two tests were made covering the full phonetic range of English vowels. In two other tests subjects chose between one of two…
Sound Change and Mobility in Los Angeles
ERIC Educational Resources Information Center
Johnson, Lawrence
1975-01-01
Deals with the shift of the low-back vowel as in 'caught' to a low-central vowel as in 'cot' thereby merging such pairs as caught/cot, dawn/Don, and stalk/stock. The causes and the sociolinguistic implications of this shift are discussed. The majority of the informants were from West Los Angeles. (TL)
Effects of Computer System and Vowel Loading on Measures of Nasalance
ERIC Educational Resources Information Center
Awan, Shaheen N.; Omlor, Kristin; Watts, Christopher R.
2011-01-01
Purpose: The purpose of this study was to determine similarities and differences in nasalance scores observed with different computerized nasalance systems in the context of vowel-loaded sentences. Methodology: Subjects were 46 Caucasian adults with no perceived hyper-or hyponasality. Nasalance scores were obtained using the Nasometer 6200 (Kay…
Speech Research Status Report, January-June 1994.
ERIC Educational Resources Information Center
Fowler, Carol A., Ed.
This publication (one of a series) contains 14 articles which report the status and progress of studies on the nature of speech, instruments for its investigation, and practical applications. Articles include: "The Universality of Intrinsic F0 of Vowels" (D. H. Whalen and Andrea G. Levitt); "Intrinsic F0 of Vowels in the Babbling of 6-,…
Sociophonetic Variations in Korean Constituent Final "-Ko" and "-To"
ERIC Educational Resources Information Center
Yi, So Young L.
2015-01-01
The purpose of this dissertation is to examine (i) linguistic and extralinguistic factors that influence vowel raising of /o/ in constituent-final "-ko" and "-to" in Seoul Korean and (ii) listeners' perceptions of this vowel raising and social meanings of the raised variant. The analyses are based on production data collected…
Exploration of Acoustic Features for Automatic Vowel Discrimination in Spontaneous Speech
ERIC Educational Resources Information Center
Tyson, Na'im R.
2012-01-01
In an attempt to understand what acoustic/auditory feature sets motivated transcribers towards certain labeling decisions, I built machine learning models that were capable of discriminating between canonical and non-canonical vowels excised from the Buckeye Corpus. Specifically, I wanted to model when the dictionary form and the transcribed-form…
Vowelling and semantic priming effects in Arabic.
Mountaj, Nadia; El Yagoubi, Radouane; Himmi, Majid; Lakhdar Ghazal, Faouzi; Besson, Mireille; Boudelaa, Sami
2015-01-01
In the present experiment we used a semantic judgment task with Arabic words to determine whether semantic priming effects are found in the Arabic language. Moreover, we took advantage of the specificity of the Arabic orthographic system, which is characterized by a shallow (i.e., vowelled words) and a deep orthography (i.e., unvowelled words), to examine the relationship between orthographic and semantic processing. Results showed faster Reaction Times (RTs) for semantically related than unrelated words with no difference between vowelled and unvowelled words. By contrast, Event Related Potentials (ERPs) revealed larger N1 and N2 components to vowelled words than unvowelled words suggesting that visual-orthographic complexity taxes the early word processing stages. Moreover, semantically unrelated Arabic words elicited larger N400 components than related words thereby demonstrating N400 effects in Arabic. Finally, the Arabic N400 effect was not influenced by orthographic depth. The implications of these results for understanding the processing of orthographic, semantic, and morphological structures in Modern Standard Arabic are discussed. Copyright © 2014 Elsevier B.V. All rights reserved.
Stop and Fricative Devoicing in European Portuguese, Italian and German.
Pape, Daniel; Jesus, Luis M T
2015-06-01
This paper describes a cross-linguistic production study of devoicing for European Portuguese (EP), Italian, and German. We recorded all stops and fricatives in four vowel contexts and two word positions. We computed the devoicing of the time-varying patterns throughout the stop and fricative duration. Our results show that regarding devoicing behaviour, EP is more similar to German than Italian. While Italian shows almost no devoicing of all phonologically voiced consonants, both EP and German show strong and consistent devoicing through the entire consonant. Differences in consonant position showed no effect for EP and Italian, but were significantly different for German. The height of the vowel context had an effect for German and EP. For EP, we showed that a more posterior place of articulation and low vowel context lead to significantly more devoicing. However, in contrast to German, we could not find an influence of consonant position on devoicing. The high devoicing for all phonologically voiced stops and fricatives and the vowel context influence are a surprising new result. With respect to voicing maintenance, EP is more like German than other Romance languages.
Formant discrimination in noise for isolated vowels
NASA Astrophysics Data System (ADS)
Liu, Chang; Kewley-Port, Diane
2004-11-01
Formant discrimination for isolated vowels presented in noise was investigated for normal-hearing listeners. Discrimination thresholds for F1 and F2, for the seven American English vowels /i, ɪ, ɛ, æ, ʌ, ɑ, u/, were measured under two types of noise, long-term speech-shaped noise (LTSS) and multitalker babble, and also under quiet listening conditions. Signal-to-noise ratios (SNR) varied from -4 to +4 dB in steps of 2 dB. All three factors, formant frequency, signal-to-noise ratio, and noise type, had significant effects on vowel formant discrimination. Significant interactions among the three factors showed that threshold-frequency functions depended on SNR and noise type. The thresholds at the lowest levels of SNR were highly elevated by a factor of about 3 compared to those in quiet. The masking functions (threshold vs SNR) were well described by a negative exponential over F1 and F2 for both LTSS and babble noise. Speech-shaped noise was a slightly more effective masker than multitalker babble, presumably reflecting small benefits (1.5 dB) due to the temporal variation of the babble.
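The negative-exponential masking function reported here is straightforward to fit. A sketch with invented threshold values (not the paper's data), assuming the form threshold(SNR) = a·exp(-b·SNR) + c:

```python
import numpy as np
from scipy.optimize import curve_fit

def neg_exponential(snr, a, b, c):
    # Elevated thresholds at low SNR, decaying toward the asymptote c
    return a * np.exp(-b * snr) + c

snr = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])      # dB
thr = np.array([60.0, 45.0, 35.0, 28.0, 24.0])   # illustrative thresholds (Hz)
(a, b, c), _ = curve_fit(neg_exponential, snr, thr, p0=(30.0, 0.3, 20.0))
print(a, b, c)
```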
Formants and musical harmonics matching in Brazilian lied
NASA Astrophysics Data System (ADS)
Raposo de Medeiros, Beatriz
2004-05-01
This paper reports a comparison of the formant patterns of speech and singing. Measurements of the first three formants were made on the stable portion of the vowels. The main finding of the study is an acoustic effect that can be described as the matching of the vowel formants to the harmonics of the sung note (A flat, 420 Hz). For example, for the vowel [a], F1 generally matched with the second harmonic (840 Hz) and F2 with the third harmonic. This finding is complementary to that of Sundberg (1977) according to which the higher the fundamental frequency of the musical note, e.g., 700 Hz, the more the mandible is lowered causing the elevation of the first formant of the sung vowel. As Sundberg himself named this phenomenon, there is a matching between the first formant and the phonation frequency, causing an increase in the sound energy. The present study establishes that the matching affects not only F1 but also F2 and F3. This finding will be discussed in connection with other manoeuvres (e.g., tongue movements) used by singers.
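The matching is easy to check numerically: harmonics of the sung note lie at integer multiples of 420 Hz, and a measured formant "matches" when it falls within some tolerance of one of them. A small sketch (the 6% tolerance is an assumption):

```python
def nearest_harmonic(formant_hz, f0=420.0, n_harmonics=10, tol=0.06):
    """Return (harmonic number, harmonic frequency) closest to a measured
    formant, or None if no harmonic lies within the relative tolerance."""
    n = min(range(1, n_harmonics + 1), key=lambda k: abs(k * f0 - formant_hz))
    if abs(n * f0 - formant_hz) <= tol * formant_hz:
        return n, n * f0
    return None

# e.g., an [a] with F1 near 850 Hz sits on the 2nd harmonic of A-flat (840 Hz)
print(nearest_harmonic(850.0))   # -> (2, 840.0)
```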
Psychophysics of complex auditory and speech stimuli
NASA Astrophysics Data System (ADS)
Pastore, Richard E.
1993-10-01
A major focus of the primary project is the use of different procedures to provide converging evidence on the nature of perceptual spaces for speech categories. Completed research examined initial voiced consonants, with results providing strong evidence that different stimulus properties may cue a phoneme category in different vowel contexts. Thus, /b/ is cued by a rising second formant (F2) with the vowel /a/, requires both F2 and F3 to be rising with /i/, and is independent of the release burst for these vowels. Furthermore, cues for phonetic contrasts are not necessarily symmetric, and the strong dependence of prior speech research on classification procedures may have led to errors. Thus, the opposite (falling F2 and F3) transitions lead to somewhat ambiguous percepts (i.e., not /b/) which may be labeled consistently (as /d/ or /g/) but require a release burst to achieve high category quality and similarity to category exemplars. Ongoing research is examining cues in other vowel contexts and using procedures to evaluate the nature of the interaction between cues for categories of both speech and music.
Informational Masking Effects on Neural Encoding of Stimulus Onset and Acoustic Change.
Niemczak, Christopher E; Vander Werff, Kathy R
2018-05-18
Recent investigations using cortical auditory evoked potentials have shown masker-dependent effects on sensory cortical processing of speech information. Background noise maskers consisting of other people talking are particularly difficult for speech recognition. Behavioral studies have related this to perceptual masking, or informational masking, beyond just the overlap of the masker and target at the auditory periphery. The aim of the present study was to use cortical auditory evoked potentials to examine how maskers (i.e., continuous speech-shaped noise [SSN] and multi-talker babble) affect the cortical sensory encoding of speech information at an obligatory level of processing. Specifically, cortical responses to vowel onset and formant change were recorded under different background noise conditions presumed to represent varying amounts of energetic or informational masking. The hypothesis was that, even at this obligatory cortical level of sensory processing, we would observe larger effects on the amplitude and latency of the onset and change components as the amount of informational masking increased across background noise conditions. Onset and change responses were recorded to a vowel change from /u-i/ in young adults under four conditions: quiet, continuous SSN, eight-talker (8T) babble, and two-talker (2T) babble. Repeated measures analyses by noise condition were conducted on amplitude, latency, and response area measurements to determine the differential effects of these noise conditions, designed to represent increasing and varying levels of informational and energetic masking, on the cortical neural representation of vowel onset and acoustic change response waveforms. All noise conditions significantly reduced onset N1 and P2 amplitudes, onset N1-P2 peak to peak amplitudes, as well as both onset and change response area compared with quiet conditions. Further, all amplitude and area measures were significantly reduced for the two babble conditions compared with continuous SSN. However, there were no significant differences in peak amplitude or area for either onset or change responses between the two different babble conditions (eight versus two talkers). Mean latencies for all onset peaks were delayed for noise conditions compared with quiet. However, in contrast to the amplitude and area results, differences in peak latency between SSN and the babble conditions did not reach statistical significance. These results support the idea that while background noise maskers generally reduce amplitude and increase latency of speech-sound evoked cortical responses, the type of masking has a significant influence. Speech babble maskers (eight talkers and two talkers) have a larger effect on the obligatory cortical response to speech sound onset and change compared with purely energetic continuous SSN maskers, which may be attributed to informational masking effects. Neither the neural responses to the onset nor the vowel change, however, were sensitive to the hypothesized increase in the amount of informational masking between speech babble maskers with two talkers compared with eight talkers.
ERIC Educational Resources Information Center
Turner, Franklin Dickerson
2012-01-01
The author examined the effectiveness of 2 fluency-oriented reading programs on improving reading fluency for an ethnically diverse sample of second-grade students. The first approach is Fluency-Oriented Reading Instruction (S. A. Stahl & K. Heubach, 2005), which incorporates the repeated reading of a grade-level text over the course of an…
Han, Feifei
2017-01-01
While some first language (L1) reading models suggest that inefficient word recognition and small working memory tend to inhibit higher-level comprehension processes; the Compensatory Encoding Model maintains that slow word recognition and small working memory do not normally hinder reading comprehension, as readers are able to operate metacognitive strategies to compensate for inefficient word recognition and working memory limitation as long as readers process a reading task without time constraint. Although empirical evidence is accumulated for support of the Compensatory Encoding Model in L1 reading, there is lack of research for testing of the Compensatory Encoding Model in foreign language (FL) reading. This research empirically tested the Compensatory Encoding Model in English reading among Chinese college English language learners (ELLs). Two studies were conducted. Study one focused on testing whether reading condition varying time affects the relationship between word recognition, working memory, and reading comprehension. Students were tested on a computerized English word recognition test, a computerized Operation Span task, and reading comprehension in time constraint and non-time constraint reading. The correlation and regression analyses showed that the strength of association was much stronger between word recognition, working memory, and reading comprehension in time constraint than that in non-time constraint reading condition. Study two examined whether FL readers were able to operate metacognitive reading strategies as a compensatory way of reading comprehension for inefficient word recognition and working memory limitation in non-time constraint reading. The participants were tested on the same computerized English word recognition test and Operation Span test. They were required to think aloud while reading and to complete the comprehension questions. The think-aloud protocols were coded for concurrent use of reading strategies, classified into language-oriented strategies, content-oriented strategies, re-reading, pausing, and meta-comment. The correlation analyses showed that while word recognition and working memory were only significantly related to frequency of language-oriented strategies, re-reading, and pausing, but not with reading comprehension. Jointly viewed, the results of the two studies, complimenting each other, supported the applicability of the Compensatory Encoding Model in FL reading with Chinese college ELLs. PMID:28522984
Shao, Jing; Huang, Xunan
2017-01-01
Congenital amusia is a lifelong disorder of fine-grained pitch processing in music and speech. However, it remains unclear whether amusia is a pitch-specific deficit, or whether it affects frequency/spectral processing more broadly, such as the perception of formant frequency in vowels, apart from pitch. In this study, in order to illuminate the scope of the deficits, we compared the performance of 15 Cantonese-speaking amusics and 15 matched controls on the categorical perception of sound continua in four stimulus contexts: lexical tone, pure tone, vowel, and voice onset time (VOT). Whereas the lexical tone, pure tone, and vowel continua rely on frequency/spectral processing, the VOT continuum depends on duration/temporal processing. We found that the amusic participants performed similarly to controls in all stimulus contexts in the identification task, in terms of across-category boundary location and boundary width. However, the amusic participants performed systematically worse than controls in discriminating stimuli in the three contexts that depend on frequency/spectral processing (lexical tone, pure tone, and vowel), whereas they performed normally when discriminating duration differences (VOT). These findings suggest that the deficit of amusia is probably not pitch specific, but affects frequency/spectral processing more broadly. Furthermore, there appeared to be differences in the impairment of frequency/spectral discrimination between speech and nonspeech contexts. The amusic participants showed less benefit in between-category discrimination than controls in speech contexts (lexical tone and vowel), suggesting reduced categorical perception; on the other hand, they performed worse than controls across the board, in both between- and within-category discrimination, in the nonspeech context (pure tone), suggesting impaired general auditory processing. These differences imply that the frequency/spectral-processing deficit may be manifested differentially in speech and nonspeech contexts in amusics: as a deficit of higher-level phonological processing for speech sounds, and as a deficit of lower-level auditory processing for nonspeech sounds. PMID:28829808
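The identification measures reported above (boundary location and width) are conventionally obtained by fitting a logistic function to identification proportions along each continuum. A minimal sketch under that assumption; the 7-step continuum and response proportions are invented for illustration.

```python
# Fit a logistic function to identification proportions along a continuum,
# then read off boundary location (50% point) and boundary width.
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, x0, k):
    """Proportion of 'category B' responses; x0 = boundary, k = slope."""
    return 1.0 / (1.0 + np.exp(-k * (x - x0)))

steps = np.arange(1, 8)                      # 7-step continuum (made up)
p_b = np.array([0.02, 0.05, 0.15, 0.50, 0.85, 0.96, 0.99])

(x0, k), _ = curve_fit(logistic, steps, p_b, p0=[4.0, 1.0])

# One common width definition: distance between the 25% and 75% points,
# which for this parameterization equals 2*ln(3)/k.
width = 2 * np.log(3) / k
print(f"boundary location = {x0:.2f} steps, boundary width = {width:.2f} steps")
```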
Vaden, Kenneth I; Kuchinsky, Stefanie E; Keren, Noam I; Harris, Kelly C; Ahlstrom, Jayne B; Dubno, Judy R; Eckert, Mark A
2011-11-01
The left inferior frontal gyrus (LIFG) exhibits increased responsiveness when people listen to words composed of speech sounds that frequently co-occur in the English language (Vaden, Piquado, & Hickok, 2011), termed high phonotactic frequency (Vitevitch & Luce, 1998). The current experiment aimed to further characterize the relation of phonotactic frequency to LIFG activity by manipulating word intelligibility in participants of varying age. Thirty-six native English speakers, 19-79 years old (mean=50.5, sd=21.0), indicated with a button press whether they recognized 120 binaurally presented consonant-vowel-consonant words during a sparse sampling fMRI experiment (TR=8 s). Word intelligibility was manipulated by low-pass filtering (cutoff frequencies of 400 Hz, 1000 Hz, 1600 Hz, and 3150 Hz). Group analyses revealed a significant positive correlation between phonotactic frequency and LIFG activity, which was unaffected by age and hearing thresholds. A region of interest analysis revealed that the relation between phonotactic frequency and LIFG activity was significantly strengthened for the most intelligible words (low-pass cutoff at 3150 Hz). These results suggest that the responsiveness of the left inferior frontal cortex to phonotactic frequency reflects the downstream impact of word recognition rather than support of word recognition, at least when there are no speech production demands.
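The intelligibility manipulation is a standard low-pass filtering operation. A minimal sketch using a zero-phase Butterworth filter at the study's four cutoff frequencies; the filter order, sample rate, and the noise stand-in for a recorded word are assumptions.

```python
# Low-pass filter speech at a given cutoff to degrade intelligibility.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def low_pass(signal, fs, cutoff_hz, order=8):
    """Zero-phase Butterworth low-pass filter."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)

fs = 22050                                    # sample rate (Hz), assumed
t = np.arange(0, 0.5, 1 / fs)
word = np.random.default_rng(2).normal(size=t.size)  # stand-in for a CVC word

for cutoff in (400, 1000, 1600, 3150):        # cutoffs used in the study
    filtered = low_pass(word, fs, cutoff)
    print(cutoff, float(np.std(filtered)))    # residual energy after filtering
```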
NASA Astrophysics Data System (ADS)
Mosko, J. D.; Stevens, K. N.; Griffin, G. R.
1983-08-01
Acoustical analyses were conducted of words produced by four speakers in a motion stress-inducing situation. The aim of the analyses was to document the kinds of changes that occur in the vocal utterances of speakers who are exposed to motion stress and to comment on the implications of these results for the design and development of voice interactive systems. The speakers differed markedly in the types and magnitudes of the changes that occurred in their speech. For some speakers, the stress-inducing experimental condition caused an increase in fundamental frequency, changes in the pattern of vocal fold vibration, shifts in vowel production, and changes in the relative amplitudes of sounds containing turbulence noise. All speakers showed greater variability in the experimental condition than in the more relaxed control situation. The variability was manifested in the acoustical characteristics of individual phonetic elements, particularly in unstressed syllables. The kinds of changes and variability observed serve to emphasize the limitations of speech recognition systems based on template matching of patterns that are stored in the system during a training phase. There is a need for a better understanding of these phonetic modifications and for developing ways of incorporating knowledge about these changes within a speech recognition system.
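One of the acoustical measurements such analyses typically rest on is fundamental frequency. A minimal sketch, assuming a simple autocorrelation estimator applied to a single voiced frame; the synthetic 120 Hz frame and all parameters are illustrative, not drawn from the study.

```python
# Estimate fundamental frequency (F0) of one voiced frame by autocorrelation.
import numpy as np

def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0):
    """Return the autocorrelation-peak F0 estimate for one frame, in Hz."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[frame.size - 1:]
    lo, hi = int(fs / f0_max), int(fs / f0_min)   # lag search range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

fs = 16000
t = np.arange(0, 0.04, 1 / fs)                    # one 40 ms frame
frame = np.sin(2 * np.pi * 120 * t) + 0.3 * np.sin(2 * np.pi * 240 * t)
print(f"estimated F0: {estimate_f0(frame, fs):.1f} Hz")   # ~120 Hz
```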
Kim, Seongjung; Kim, Jongman; Ahn, Soonjae; Kim, Youngho
2018-04-18
Deaf people use sign or finger languages for communication, but these methods of communication are highly specialized, and the resulting communication restrictions can expose deaf people to social inequalities and financial losses. In this study, we developed a finger language recognition algorithm based on an ensemble artificial neural network (E-ANN) using an armband system with 8-channel electromyography (EMG) sensors. The algorithm comprised signal acquisition, filtering, segmentation, feature extraction, and an E-ANN-based classifier, and was evaluated on the Korean finger language (14 consonants, 17 vowels, and 7 numbers) in 17 subjects. The E-ANN was evaluated as a function of the number of classifiers (1 to 10) and the size of the training data set (50 to 1500 samples). The accuracy of the E-ANN-based classifier was obtained by 5-fold cross-validation and compared with an artificial neural network (ANN)-based classifier. As the number of classifiers (1 to 8) and the size of the training data (50 to 300) increased, the average accuracy of the E-ANN-based classifier increased and its standard deviation decreased. The optimal E-ANN comprised eight classifiers trained on 300 samples, and its accuracy was significantly higher than that of the general ANN.
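The classifier stage lends itself to a short sketch. Below is a minimal ensemble-of-ANNs illustration in the spirit of the E-ANN described above: eight small networks trained on bootstrap resamples and combined by majority vote. The feature dimensions, network sizes, and random data are assumptions, not the paper's pipeline.

```python
# Ensemble of small neural networks over EMG-style feature vectors,
# combined by majority vote. Data and dimensions are illustrative.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
n_classes, n_features = 38, 32          # 14 consonants + 17 vowels + 7 numbers
X = rng.normal(size=(300, n_features))  # stand-in for per-segment EMG features
y = rng.integers(0, n_classes, size=300)

# Train an ensemble of 8 ANNs on bootstrap samples of the training data.
ensemble = []
for i in range(8):
    idx = rng.integers(0, len(X), size=len(X))
    net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=i)
    ensemble.append(net.fit(X[idx], y[idx]))

def predict_majority(ensemble, X_new):
    """Majority vote across the ensemble's per-network predictions."""
    votes = np.stack([net.predict(X_new) for net in ensemble])  # (8, n)
    return np.array([np.bincount(col).argmax() for col in votes.T])

print(predict_majority(ensemble, X[:5]))
```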
Brochier, Tim; McDermott, Hugh J; McKay, Colette M
2017-06-01
In order to improve speech understanding for cochlear implant users, it is important to maximize the transmission of temporal information. The combined effects of stimulation rate and presentation level on temporal information transfer and speech understanding remain unclear. The present study systematically varied presentation level (60, 50, and 40 dBA) and stimulation rate [500 and 2400 pulses per second per electrode (pps)] in order to observe how the effect of rate on speech understanding changes at different presentation levels. Speech recognition in quiet and in noise, and acoustic amplitude modulation detection thresholds (AMDTs), were measured with acoustic stimuli presented to speech processors via direct audio input (DAI). With the 500 pps processor, results showed significantly better performance for consonant-nucleus-consonant words in quiet, and a reduced effect of noise on sentence recognition. However, no rate or level effect was found for AMDTs, perhaps partly because of amplitude compression in the sound processor. AMDTs were found to be strongly correlated with the effect of noise on sentence perception at low levels. These results indicate that AMDTs, at least when measured with the CP910 Freedom speech processor via DAI, explain between-subject variance in speech understanding, but do not explain within-subject variance across rates and levels.
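An AMDT task rests on stimuli like the following. A minimal sketch of sinusoidally amplitude-modulated noise at a given modulation depth; an adaptive tracker would adjust the depth toward the just-detectable value. The modulation rate, sample rate, and depths here are illustrative assumptions, not the study's parameters.

```python
# Generate sinusoidally amplitude-modulated noise for an AM detection task.
import numpy as np

def am_noise(duration_s, fs, mod_rate_hz, depth):
    """Noise carrier with sinusoidal amplitude modulation (0 <= depth <= 1)."""
    rng = np.random.default_rng(4)
    t = np.arange(0, duration_s, 1 / fs)
    carrier = rng.normal(size=t.size)
    envelope = 1.0 + depth * np.sin(2 * np.pi * mod_rate_hz * t)
    return envelope * carrier

fs = 44100
for depth in (0.5, 0.25, 0.125):        # e.g. one step per adaptive reversal
    stim = am_noise(0.5, fs, mod_rate_hz=8, depth=depth)
    # Depth is conventionally reported as 20*log10(m) in dB.
    print(f"depth={depth:.3f}, 20*log10(m) = {20 * np.log10(depth):.1f} dB")
```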
Aesthetic judgement of orientation in modern art.
Mather, George
2012-01-01
When creating an artwork, the artist makes a decision regarding the orientation at which the work is to be hung, based on their aesthetic judgement and the message conveyed by the piece. Is the impact or aesthetic appeal of a work diminished when it is hung at an incorrect orientation? To investigate this question, Experiment 1 asked whether naïve observers can identify the correct orientation (as defined by the artist) of 40 modern artworks, some of which are entirely abstract. Eighteen participants were shown 40 paintings in a series of trials. Each trial presented all four cardinal orientations on a computer screen, and the participant was asked to select the orientation that was most attractive or meaningful. Results showed that the correct orientation was selected in 48% of trials on average, significantly above the 25% chance level but well below perfect performance. A second experiment investigated the extent to which the 40 paintings contained recognisable content, which may have mediated orientation judgements. Recognition rates varied from 0% for seven of the paintings to 100% for five paintings. Orientation judgements in Experiment 1 correlated significantly with "meaningful" content judgements in Experiment 2: 42% of the variance in orientation judgements in Experiment 1 was shared with recognition of meaningful content in Experiment 2. For the seven paintings in which no meaningful content at all was detected, 41% of the variance in orientation judgements was shared with variance in a physical measure of image content, the slope of the Fourier amplitude spectrum. For some paintings, orientation judgements were quite consistent despite a lack of meaningful content. The origin of these orientation judgements remains to be identified.
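The physical measure mentioned above, the Fourier amplitude spectrum slope, can be computed as the slope of log amplitude against log spatial frequency over the radially averaged 2-D spectrum. A minimal sketch, assuming a grayscale image array; the white-noise test image (expected slope near 0, versus roughly -1 for natural images) is only a stand-in.

```python
# Slope of the radially averaged log amplitude spectrum of an image.
import numpy as np

def amplitude_spectrum_slope(img):
    """Slope of log(amplitude) vs log(frequency), radially averaged."""
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    amp = np.abs(f)
    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)
    # Mean amplitude at each integer radius (skip DC, stay below Nyquist).
    radii = np.arange(1, min(h, w) // 2)
    radial = np.array([amp[r == k].mean() for k in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(radial), 1)
    return slope

img = np.random.default_rng(5).normal(size=(128, 128))  # white noise: slope ~ 0
print(f"spectrum slope: {amplitude_spectrum_slope(img):.2f}")
```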
Xu, Xuemiao; Jin, Qiang; Zhou, Le; Qin, Jing; Wong, Tien-Tsin; Han, Guoqiang
2015-01-01
We propose a novel biometric recognition method that identifies the inner knuckle print (IKP). It is robust enough to confront uncontrolled lighting conditions, pose variations, and low imaging quality. Such robustness is crucial for application on portable devices equipped with consumer-level cameras. We achieve this robustness by two means. First, we propose a novel feature extraction scheme that highlights the salient structure and suppresses incorrect and/or unwanted features. The extracted IKP features retain simple geometry and morphology and reduce the interference of illumination. Second, to counteract the deformation induced by different hand orientations, we propose a novel structure-context descriptor based on local statistics. To the best of our knowledge, we are the first to simultaneously consider illumination invariance and deformation tolerance for appearance-based low-resolution hand biometrics. Previous works used more restrictive settings, making strong assumptions about either the illumination conditions or the hand orientation. Extensive experiments demonstrate that our method outperforms state-of-the-art methods in terms of recognition accuracy, especially under uncontrolled lighting conditions and flexible hand orientations. PMID:25686317
Crowding by a single bar: probing pattern recognition mechanisms in the visual periphery.
Põder, Endel
2014-11-06
Whereas visual crowding does not greatly affect the detection of the presence of simple visual features, it heavily inhibits combining them into recognizable objects. Still, crowding effects have rarely been directly related to general pattern recognition mechanisms. In this study, pattern recognition mechanisms in the visual periphery were probed using a single crowding feature. Observers had to identify the orientation of a rotated T presented briefly at a peripheral location. Adjacent to the target, a single bar was presented. The bar was either horizontal or vertical and located in a random direction from the target. Such a crowding bar turns out to have very strong and regular effects on the identification of target orientation. The observer's responses are determined by the approximate relative positions of basic visual features; exact image-based similarity to the target is not important. A version of the "standard model" of object recognition with second-order features explains the main regularities of the data.
An Analysis of the Most Frequently Occurring Words in Spoken American English.
ERIC Educational Resources Information Center
Plant, Geoff
1999-01-01
A study analyzed the frequency of occurrence of consonants, vowels, and diphthongs, the syllabic structure of the words, and the segmental structure of the 311 monosyllabic words among the 500 words that occur most frequently in English. Three manners of articulation accounted for nearly 75 percent of all consonant occurrences: stops, semi-vowels, and nasals.…
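For illustration, a toy version of this kind of tally: count phoneme occurrences over a small word list and report their shares by manner of articulation. The mini-lexicon and phoneme coding are hypothetical, not the study's materials.

```python
# Tally phoneme occurrences in a word list and group by manner of articulation.
from collections import Counter

# word -> phoneme sequence (toy ARPAbet-style transcriptions, assumed)
lexicon = {
    "cat": ["K", "AE", "T"],
    "man": ["M", "AE", "N"],
    "win": ["W", "IH", "N"],
    "stop": ["S", "T", "AA", "P"],
}
manner = {"K": "stop", "T": "stop", "P": "stop", "M": "nasal",
          "N": "nasal", "W": "semivowel", "S": "fricative"}

counts = Counter(p for phones in lexicon.values() for p in phones
                 if p in manner)                 # consonants only
total = sum(counts.values())
by_manner = Counter()
for phoneme, n in counts.items():
    by_manner[manner[phoneme]] += n
for m, n in by_manner.most_common():
    print(f"{m}: {n / total:.0%}")
```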
Vowel Diphthongs. Fun with Phonics! Book 10. Grades 1-2.
ERIC Educational Resources Information Center
Daniel, Claire
This book provides hands-on activities for grades 1-2 that make phonics instruction easy and fun for teachers and children in the classroom. The book offers methods for practice, reinforcement, and assessment of phonics skills. A poem is used to introduce the phonics element of this book, vowel diphthongs. The poem is duplicated so children can…
Mismatch Responses to Lexical Tone, Initial Consonant, and Vowel in Mandarin-Speaking Preschoolers
ERIC Educational Resources Information Center
Lee, Chia-Ying; Yen, Huei-ling; Yeh, Pei-wen; Lin, Wan-Hsuan; Cheng, Ying-Ying; Tzeng, Yu-Lin; Wu, Hsin-Chi
2012-01-01
The present study investigates how age, phonological saliency, and deviance size affect the presence of mismatch negativity (MMN) and positive mismatch response (P-MMR). This work measured the auditory mismatch responses to Mandarin lexical tones, initial consonants, and vowels in 4- to 6-year-old preschoolers using the multiple-deviant oddball…
Level 2 Foundation Units. Key Stage 3: National Strategy.
ERIC Educational Resources Information Center
Department for Education and Skills, London (England).
These foundation units are aimed at pupils working within Level 2 on entry to Year 7. They are designed to remind pupils what they know and to take them forward. The units will also teach phonics knowledge, from consonant-vowel-consonant (CVC) words to long vowel phonemes. The writing units focus on developing the following skills: understanding what a…
The Impact of Contrastive Stress on Vowel Acoustics and Intelligibility in Dysarthria
ERIC Educational Resources Information Center
Connaghan, Kathryn P.; Patel, Rupal
2017-01-01
Purpose: To compare vowel acoustics and intelligibility in words produced with and without contrastive stress by speakers with spastic (mixed-spastic) dysarthria secondary to cerebral palsy (DYS[subscript CP]) and healthy controls (HCs). Method: Fifteen participants (9 men, 6 women; age M = 42 years) with DYS[subscript CP] and 15 HCs (9 men, 6…
The Breadth of Coarticulatory Units in Children and Adults
ERIC Educational Resources Information Center
Goffman, Lisa; Smith, Anne; Heisler, Lori; Ho, Michael
2008-01-01
Purpose: To assess, in children and adults, the breadth of coarticulatory movements associated with a single rounded vowel. Method: Upper and lower lip movements were recorded from 8 young adults and 8 children (aged 4-5 years). A single rounded versus unrounded vowel was embedded in the medial position of pairs of 7-word/7-syllable sentences.…