Hoonhorst, Ingrid; Colin, Cecile; Markessis, Emily; Radeau, Monique; Deltenre, Paul; Serniclaes, Willy
By examining voice onset time (VOT) discrimination in 4- and 8-month-olds raised in a French-speaking environment, the current study addresses the role played by linguistic experience in reshaping initial perceptual abilities. Results showed that the language-general -30- and +30-ms VOT boundaries are better…
Eadie, Tanya L; Doyle, Philip C; Hansen, Kerry; Beaudin, Paul G
The objectives of this prospective and exploratory study are to determine: (1) naïve listener preference for gender in tracheoesophageal (TE) speech when speech severity is controlled; (2) the accuracy of identifying TE speaker gender; (3) the effects of gender identification on judgments of speech acceptability (ACC) and naturalness (NAT); and (4) the acoustic basis of ACC and NAT judgments. Six male and six female adult TE speakers were matched for speech severity. Twenty naïve listeners made auditory-perceptual judgments of speech samples in three listening sessions. First, listeners performed preference judgments using a paired comparison paradigm. Second, listeners made judgments of speaker gender, speech ACC, and NAT using rating scales. Last, listeners made ACC and NAT judgments when speaker gender was provided coincidentally. Duration, frequency, and spectral measures were performed. No significant differences were found for preference of male or female speakers. All male speakers were accurately identified, but only two of six female speakers were accurately identified. Significant interactions were found between gender and listening condition (gender known) for NAT and ACC judgments. Males were judged more natural when gender was known; female speakers were judged less natural and less acceptable when gender was known. Regression analyses revealed that judgments of female speakers were best predicted with duration measures when gender was unknown, but with spectral measures when gender was known; judgments of males were best predicted with spectral measures. Naïve listeners have difficulty identifying the gender of female TE speakers. Listeners show no preference for speaker gender, but when gender is known, female speakers are least acceptable and natural. The nature of the perceptual task may affect the acoustic basis of listener judgments.
Wertsch, J V
Results are reported for an experiment which examined the influence of listener perception of speaker intention on sentence recognition. Given the same passage and recognition sentences, subjects displayed different false recognition patterns of test items depending on which of two speakers with opposing viewpoints the passage was attributed to. It is argued that the reconstructive process of memory is based on information from the context (e.g., the speaker's perceived intentions) as well as on the actual words used. Retention of different aspects of a message is seen to rely on information from different sources. Specifically, the results of the study indicate that retention of meaning involving the speaker's predictions, opinions, etc., is influenced by the listener's perception of the speaker.
Colaco, Dora; Mineiro, Ana; Leal, Gabriela; Castro-Caldas, Alexandre
Literature suggests that illiterate subjects are unaware of the phonological structure of language. This fact may influence the characteristics of aphasic speech, namely the structure of paraphasias. A battery of tests was developed for this study to be used with aphasic subjects (literate and illiterate), in order to explore this topic in more…
Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik
Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker's age. Here, we report two experiments on age estimation by "naïve" listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers' natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60-65 years) speakers in comparison with younger (20-25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40-45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed.
PMID: 26236259
Kroll, R M; Hood, S B
Fourteen stutterers and 14 normal speakers read two passages differing in information value under two different conditions. Condition I provided subjects with a priori knowledge regarding the experimental limits and requirements. Condition II withheld such knowledge. Results indicate that adaptation curves for both stutterers and normal speakers were influenced by the information value of the reading passage. Less adaptation was observed with the high information than with the low information passage. The task presentation variable differentiated stutterers from normal speakers. When a priori instructions were provided to stutterers, the adaptation curve assumed a smooth, decelerating course. When a priori instructions were withheld, the curve deviated from the expected course. For normal speakers, identical adaptation trends were observed whether or not a priori instructions were provided. Stuttering adaptation is a function of both linguistic and situational variables; normal nonfluency adaptation is primarily a function of linguistic variables. Theoretical, experimental, and clinical implications are offered.
This paper examines the influence of age of immersion and proficiency in a second language on speech movement consistency in both a first and a second language. Ten monolingual speakers of English and 20 Bengali-English bilinguals (10 with low L2 proficiency and 10 with high L2 proficiency) participated. Lip movement variability was assessed based…
Weirich, Melanie; Fuchs, Susanne
Purpose: The purpose of this study was to further explore speaker-specific realizations of the /s/-/ʃ/ contrast in German in relation to individual differences in palate shape. Method: Two articulatory experiments were carried out with German native speakers. In the first experiment, 4 monozygotic and 2 dizygotic twin pairs…
Schoot, Lotte; Heyselaar, Evelien; Hagoort, Peter; Segaert, Katrien
The way we talk can influence how we are perceived by others. Whereas previous studies have started to explore the influence of social goals on syntactic alignment, in the current study, we additionally investigated whether syntactic alignment effectively influences conversation partners’ perception of the speaker. To this end, we developed a novel paradigm in which we can measure the effect of social goals on the strength of syntactic alignment for one participant (primed participant), while simultaneously obtaining usable social opinions about them from their conversation partner (the evaluator). In Study 1, participants’ desire to be rated favorably by their partner was manipulated by assigning pairs to a Control (i.e., primed participants did not know they were being evaluated) or Evaluation context (i.e., primed participants knew they were being evaluated). Surprisingly, results showed no significant difference in the strength with which primed participants aligned their syntactic choices with their partners’ choices. In a follow-up study, we used a Directed Evaluation context (i.e., primed participants knew they were being evaluated and were explicitly instructed to make a positive impression). However, again, there was no evidence supporting the hypothesis that participants’ desire to impress their partner influences syntactic alignment. With respect to the influence of syntactic alignment on perceived likeability by the evaluator, a negative relationship was reported in Study 1: the more primed participants aligned their syntactic choices with their partner, the more that partner decreased their likeability rating after the experiment. However, this effect was not replicated in the Directed Evaluation context of Study 2. In other words, our results do not support the conclusion that speakers’ desire to be liked affects how much they align their syntactic choices with their partner, nor is there convincing evidence that there is a reliable…
Yoneyama, Kiyoko; Munson, Benjamin
This study examined whether the influence of listeners' language proficiency on L2 speech recognition is affected by the structure of the lexicon. The specific experiment examined the effect of word frequency (WF) and phonological neighborhood density (PND) on word recognition in native speakers of English and second-language (L2) speakers of English whose first language was Japanese. The stimuli included English words produced by a native speaker of English and English words produced by a native speaker of Japanese (i.e., with Japanese-accented English). The experiment was inspired by the finding of Imai, Flege, and Walley [(2005). J. Acoust. Soc. Am. 117, 896-907] that the influence of talker accent on speech intelligibility for L2 learners of English whose L1 is Spanish varies as a function of words' PND. In the current study, significant interactions between stimulus accentedness and listener group on the accuracy and speed of spoken word recognition were found, as were significant effects of PND and WF on word-recognition accuracy. However, no significant three-way interaction among stimulus talker, listener group, and PND on either measure was found. Results are discussed in light of recent findings on cross-linguistic differences in the nature of the effects of PND on L2 phonological and lexical processing.
Lang, James M.
While a good class with a guest speaker requires plenty of advance preparation, the real clincher is for the teacher to create a tight fit between the course objectives and the speaker's purpose in being there. The speaker has to play an essential role in fulfilling the learning objectives of the course; if that doesn't happen, the students will…
Staudte, Maria; Crocker, Matthew W; Heloir, Alexis; Kipp, Michael
Previous research has shown that listeners follow speaker gaze to mentioned objects in a shared environment to ground referring expressions, both for human and robot speakers. What is less clear is whether the benefit of speaker gaze is due to the inference of referential intentions (Staudte and Crocker, 2011) or simply the (reflexive) shifts in visual attention. That is, is gaze special in how it affects simultaneous utterance comprehension? In four eye-tracking studies we directly contrast speech-aligned speaker gaze of a virtual agent with a non-gaze visual cue (arrow). Our findings show that both cues similarly direct listeners' attention and that listeners can benefit in utterance comprehension from both cues. Only when they are similarly precise, however, does this equality extend to incongruent cueing sequences: that is, even when the cue sequence does not match the concurrent sequence of spoken referents can listeners benefit from gaze as well as arrows. The results suggest that listeners are able to learn a counter-predictive mapping of both cues to the sequence of referents. Thus, gaze and arrows can in principle be applied with equal flexibility and efficiency during language comprehension.
Hayes-Harb, Rachel; Durham, Kristie
Native English speakers experience difficulty acquiring Arabic emphatic consonants. Arabic language textbooks have suggested that learners focus on adjacent vowels for cues to these consonants; however, the utility of such a strategy has not been empirically tested. This study investigated the perception of Arabic emphatic-plain contrasts by means…
Lukkarila, Päivi; Laukkanen, Anne-Maria; Palo, Pertti
This study examines the relationship between voice quality and speech-based personality assessment of Finnish-speaking female speakers. Five Finnish-speaking female subjects recorded a text passage with eight different vocal qualities. Samples that passed the preselection test for the voice qualities were played to 50 Finnish-speaking listeners, who reported speaker impressions on a scale of 18 opposite trait pairs. Voices produced with forward placement received assessments of femininity and friendliness. Speakers reading with backward placement were considered less feminine, while breathy voice evoked assessments of emotionality and implausibility. Tense phonation as well as creakiness, nasality, and denasality gave rise to numerous negative notions. The results suggest that voice stereotypes are partly universal and partly culture-dependent.
Hayes-Harb, Rachel; Cheng, Hui-Wen
The role of written input in second language (L2) phonological and lexical acquisition has received increased attention in recent years. Here we investigated two factors that may moderate the influence of orthography on L2 word form learning: (i) whether the writing system is shared by the native language and the L2, and (ii) if the writing system is shared, whether the relevant grapheme-phoneme correspondences are also shared. The acquisition of Mandarin via the Pinyin and Zhuyin writing systems provides an ecologically valid opportunity to explore these factors. We first asked whether there is a difference in native English speakers' ability to learn Pinyin and Zhuyin grapheme-phoneme correspondences. In Experiment 1, native English speakers assigned to either Pinyin or Zhuyin groups were exposed to Mandarin words belonging to one of two conditions: in the "congruent" condition, the Pinyin forms are possible English spellings for the auditory words (e.g., <nai> for [nai]); in the "incongruent" condition, the Pinyin forms involve a familiar grapheme representing a novel phoneme (e.g., <xiu> for [ɕiou]). At test, participants were asked to indicate whether auditory and written forms matched; in the crucial trials, the written forms from training (e.g., <xiu>) were paired with possible English pronunciations of the Pinyin written forms (e.g., [ziou]). Experiment 2 was identical to Experiment 1 except that participants additionally saw pictures depicting word meanings during the exposure phase, and at test were asked to match auditory forms with the pictures. In both experiments the Zhuyin group outperformed the Pinyin group due to the Pinyin group's difficulty with "incongruent" items. A third experiment confirmed that the groups did not differ in their ability to perceptually distinguish the relevant Mandarin consonants (e.g., [ɕ]) from the foils (e.g., [z]), suggesting that the findings of Experiments 1 and 2 can be attributed to the effects…
Lai, Vicky Tzuyin; Boroditsky, Lera
In this paper we examine whether experience with spatial metaphors for time has an influence on people's representation of time. In particular we ask whether spatio-temporal metaphors can have both chronic and immediate effects on temporal thinking. In Study 1, we examine the prevalence of ego-moving representations for time in Mandarin speakers, English speakers, and Mandarin-English (ME) bilinguals. As predicted by observations in linguistic analyses, we find that Mandarin speakers are less likely to take an ego-moving perspective than are English speakers. Further, we find that ME bilinguals tested in English are less likely to take an ego-moving perspective than are English monolinguals (an effect of L1 on meaning-making in L2), and also that ME bilinguals tested in Mandarin are more likely to take an ego-moving perspective than are Mandarin monolinguals (an effect of L2 on meaning-making in L1). These findings demonstrate that habits of metaphor use in one language can influence temporal reasoning in another language, suggesting the metaphors can have a chronic effect on patterns in thought. In Study 2 we test Mandarin speakers using either horizontal or vertical metaphors in the immediate context of the task. We find that Mandarin speakers are more likely to construct front-back representations of time when understanding front-back metaphors, and more likely to construct up-down representations of time when understanding up-down metaphors. These findings demonstrate that spatio-temporal metaphors can also have an immediate influence on temporal reasoning. Taken together, these findings demonstrate that the metaphors we use to talk about time have both immediate and long-term consequences for how we conceptualize and reason about this fundamental domain of experience.
PMID: 23630505
Examines the art of professional verse speaking and suggests the creation of standards for the training of specialized speakers. Available from: Speech and Drama, Anthony Jackman, Editor, 205 Ashby Road, Loughborough, Leics LE11 3AD. Subscription Rates: non-members (USA) $6.00 p.a.; singles $2.50 surface post free. (MH)
Chakraborty, Rahul; Shanmugam, Ramalingam
The study explored the influence of second language proficiency on the kinematic duration of single words. Participants produced real and novel words with variable stress targets (e.g., trochaic and iambic) embedded in first language (L1) and second language (L2) sentence frames. Participants were monolingual English speakers (n=10) and Bengali-English bilinguals with early exposure to English (n=10) and late exposure to English (n=10). Bengali was the L1 and English was the L2 for all 20 bilingual participants. Duration of lip movements for the target real and novel words was analysed. Results suggest that kinematic duration of single words was not influenced by speakers' L2 proficiency. However, L2 proficiency influenced foreign accent ratings for the real words, but not the novel words. Kinematic duration and perception of accent were not correlated, which may imply that accent reduction is not always a direct consequence of shorter word duration.
Recent research provides evidence that individuals shift in their perception of variants depending on social characteristics attributed to the speaker. This paper reports on a speech perception experiment designed to test the degree to which the age attributed to a speaker influences the perception of vowels undergoing a chain shift. As a result of the shift, speakers from different generations produce different variants from one another. Results from the experiment indicate that a speaker's perceived age can influence vowel categorization in the expected direction. However, only older participants are influenced by perceived speaker age. This suggests that social characteristics attributed to a speaker affect speech perception differently depending on the salience of the relationship between the variant and the characteristic. The results also provide evidence of an unexpected interaction between the sex of the participant and the sex of the stimulus. The interaction is interpreted as an effect of the participants' previous exposure to male and female speakers. The results are analyzed under an exemplar model of speech production and perception where social information is indexed to acoustic information and the weight of the connection varies depending on the perceived salience of sociophonetic trends.
author eliminated the inter-speaker variability of the source by using an electrolarynx. There were 10 male speakers and 10 female speakers. The...speakers were instructed how to produce non-phonated speech using an electrolarynx. For both male and female speakers, the fundamental frequency of...the electrolarynx was 85 Hz with a jitter of ±3 Hz. The experimental task was to listen to two voices and decide if the voices were the same or...
to other methods that use a pooled reference model, this technique normalizes the training speech from multiple reference speakers to a single com...training the reference hidden Markov model (HMM). Our usual probabilistic spectrum transformation can be applied to the reference HMM to model a new...trained phonetic hidden Markov models of a single reference speaker so that they were appropriate for a new (target) speaker. This method reduced the...
van Rossum, M. A.; van As-Brooks, C. J.; Hilgers, F. J. M.; Roozen, M.
Glottal stops are conveyed by an abrupt constriction at the level of the glottis. Tracheoesophageal (TE) speakers are known to have poor control over the new voice source (neoglottis), and this might influence the production of "glottal" stops. This study investigated how TE speakers realized "glottal" stops in abutting words…
The Research Triangle Park Speakers Bureau page is a free resource that schools, universities, and community groups in the Raleigh-Durham-Chapel Hill, N.C. area can use to request speakers and find educational resources.
Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S
Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm, linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical…
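The core idea above, clustering utterances by the cosine distance between their supervector representations rather than the Euclidean distance, can be sketched in a few lines. The snippet below is a toy illustration, not the authors' actual pipeline: the short lists stand in for high-dimensional GMM mean supervectors, and a simple greedy centroid scheme (a hypothetical simplification, with an assumed `threshold` parameter) stands in for the full clustering algorithm.

```python
import math

def cosine_distance(u, v):
    """Cosine distance between two supervectors: 1 minus cosine similarity.
    Depends only on the direction of the vectors, not their magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def cluster(supervectors, threshold=0.5):
    """Greedy clustering sketch: assign each utterance supervector to the
    nearest existing cluster centroid if it lies within `threshold`,
    otherwise start a new cluster. Returns one cluster label per input."""
    centroids, labels = [], []
    for v in supervectors:
        if centroids:
            dists = [cosine_distance(v, c) for c in centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= threshold:
                labels.append(best)
                # Update the centroid as a running mean (simplified).
                c = centroids[best]
                centroids[best] = [(a + b) / 2 for a, b in zip(c, v)]
                continue
        centroids.append(list(v))
        labels.append(len(centroids) - 1)
    return labels
```

For example, `cluster([[1.0, 0.0], [2.0, 0.1], [0.0, 1.0]])` groups the first two vectors together despite their different magnitudes, because they point in nearly the same direction; a Euclidean criterion would be sensitive to that magnitude difference, which is the motivation the abstract gives for preferring the cosine metric in supervector space.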
Kersten, Alan W.; Meissner, Christian A.; Lechuga, Julia; Schwartz, Bennett L.; Albrechtsen, Justin S.; Iglesias, Adam
Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual…
Imai, Mutsumi; Schalk, Lennart; Saalbach, Henrik; Okada, Hiroyuki
Grammatical gender is independent of biological sex for the majority of animal names (e.g., any giraffe, be it male or female, is grammatically treated as feminine). However, there is apparent semantic motivation for grammatical gender classes, especially in mapping human terms to gender. This research investigated whether this motivation affects deductive inference in native German speakers. We compared German with Japanese speakers (a language without grammatical gender) when making inferences about sex-specific biological properties. We found that German speakers tended to erroneously draw inferences when the sex in the premise and grammatical gender of the target animal agreed. An over-generalization of the grammar-semantics mapping was found even when the sex of the target was explicitly indicated. However, these effects occurred only when gender-marking articles accompanied the nouns. These results suggest that German speakers project sex-specific biological properties onto gender-marking articles but not onto conceptual representations of animals per se.
Weirich, Melanie; Lancia, Leonardo; Brunner, Jana
The purpose of this study is to examine and compare the amount of inter-speaker variability in the articulation of monozygotic twin pairs (MZ), dizygotic twin pairs (DZ), and pairs of unrelated twins with the goal of examining in greater depth the influence of physiology on articulation. Physiological parameters are assumed to be very similar in MZ twin pairs in contrast to DZ twin pairs or unrelated speakers, and it is hypothesized that the speaker specific shape of articulatory looping trajectories of the tongue is at least partly dependent on biomechanical properties and the speaker's individual physiology. By means of electromagnetic articulography (EMA), inter-speaker variability in the looping trajectories of the tongue back during /VCV/ sequences is analyzed. Results reveal similar looping patterns within MZ twin pairs but in DZ pairs differences in the shape of the loop, the direction of the upward and downward movement, and the amount of horizontal sliding movement at the palate are found.
The issues surrounding native speakers (NSs) and nonnative speakers (NNSs) as teachers (NESTs and NNESTs, respectively) in the field of teaching English to speakers of other languages (TESOL) are a current topic of interest. In many contexts, the native speaker of English is viewed as the model teacher, thus putting the NEST into a position of…
Schimke, Sarah; Colonna, Saveria
This study investigates the influence of grammatical role and discourse-level cues on the interpretation of different pronominal forms in native speakers of French, native speakers of Turkish, and Turkish learners of French. In written questionnaires, we found that native speakers of French were influenced by discourse-level cues when interpreting…
Chen, Sally; Fon, Janice
The devoicing rule for liquids after voiceless aspirated stops presents difficulty for L2 English learners. This study investigates Mandarin speakers' perception of the rule with regard to age and environment of initial exposure. Three initial exposure ages were included in this study: kindergarten, elementary school, and junior high school. Except for the last group, each group was further divided by exposure environment: Taiwan and the U.S. Five groups were thus included in total, comprising 25 subjects. Stimuli were pseudo-words in an SLV structure, recorded by one native English speaker and one non-native English speaker. Half of the stimuli began with a voiceless stop, and the other half began with a voiced stop. The native speaker devoiced the liquids accordingly, while the non-native speaker produced only voiced liquids. Preliminary results showed that, in general, L2 learners took longer to perceive devoiced liquids. Listeners with earlier exposure were more likely to respond faster to native speech, while late learners responded better to non-native speech.
Brown, Amanda; Gullberg, Marianne
Whereas most research in SLA assumes the relationship between the first language (L1) and the second language (L2) to be unidirectional, this study investigates the possibility of a bidirectional relationship. We examine the domain of manner of motion, in which monolingual Japanese and English speakers differ both in speech and gesture. Parallel…
A study of the translation process compared the decisions that native speakers (experts) and non-native speakers (non-experts) made that influenced resulting translations. Subjects were 40 students, graduate students, and faculty in a university foreign language department. English language proficiency was measured for native speakers by using the…
Erway, Ella Anderson
Scholars agree that listening is an active rather than a passive process. The listening which makes people achieve higher scores on current listening tests is "second speaker" listening or active participation in the encoding of the message. Most of the instructional suggestions in listening curriculum guides are based on this concept. In terms of…
Chambers, M. M.
This paper reviews some of the speaker ban cases that were tested in U.S. district courts. The cases discussed are: (1) the attempt by University of North Carolina administrators to ban Herbert Aptheker (an avowed Communist) from speaking on campus; (2) the class action of the Chicago Circle campus of the University of Illinois brought before a…
application the required resources are provided by the phone itself. Speaker recognition can be used in many areas, such as: • homeland security: airport security, strengthening of national borders, travel documents, visas; • enterprise-wide network security infrastructures; • secure electronic
Safran, Stephen P.; Safran, Joan S.
Determined whether a videotaped presentation by a speaker who is blind would more positively influence attitude change and information retention than would a presentation by a sighted speaker. Findings suggested that there were no significant main effects for either presenter or pretest conditions on the measures. (Author/BL)
Walshe, Margaret; Miller, Nick; Leahy, Margaret; Murray, Aisling
Background: Many factors influence listener perception of dysarthric speech. Final consensus on the role of gender and listener experience is still to be reached. The speaker's perception of his/her speech has largely been ignored. Aims: (1) To compare speaker and listener perception of the intelligibility of dysarthric speech; (2) to explore the…
Rosensweig, R E; Hirota, Y; Tsuda, S; Raj, K
This work validates a method for increasing the radial restoring force on the voice coil in audio speakers containing ferrofluid. In addition, a study is made of factors influencing splash loss of the ferrofluid due to shock. Ferrohydrodynamic analysis is employed throughout to model behavior, and predictions are compared to experimental data.
Kong, Anthony Pak-Hin; Law, Sam-Po; Wat, Watson Ka-Chun; Lai, Christy
The use of co-verbal gestures is common in human communication and has been reported to assist word retrieval and to facilitate verbal interactions. This study systematically investigated the impact of aphasia severity, integrity of semantic processing, and hemiplegia on the use of co-verbal gestures, with reference to gesture forms and functions, by 131 normal speakers, 48 individuals with aphasia and their controls. All participants were native Cantonese speakers. It was found that the severity of aphasia and verbal-semantic impairment was associated with significantly more co-verbal gestures. However, there was no relationship between right-sided hemiplegia and gesture employment. Moreover, significantly more gestures were employed by the speakers with aphasia, but about 10% of them did not gesture. Among those who used gestures, content-carrying gestures, including iconic, metaphoric, deictic gestures, and emblems, served the function of enhancing language content and providing information additional to the language content. As for the non-content carrying gestures, beats were used primarily for reinforcing speech prosody or guiding speech flow, while non-identifiable gestures were associated with assisting lexical retrieval or with no specific functions. The above findings would enhance our understanding of the use of various forms of co-verbal gestures in aphasic discourse production and their functions. Speech-language pathologists may also refer to the current annotation system and the results to guide clinical evaluation and remediation of gestures in aphasia. PMID:26186256
Zhang, Xujin; Samuel, Arthur G.; Liu, Siyun
Previous research has found that a speaker's native phonological system has a great influence on perception of another language. In three experiments, we tested the perception and representation of Mandarin phonological contrasts by Guangzhou Cantonese speakers, and compared their performance to that of native Mandarin speakers. Despite their rich…
Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka
This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person in a known set of speakers, i.e., to determine whether the voice of an unknown speaker (the wanted person) belongs to a group of reference speakers from the voice database. One requirement was to develop a text-independent system, meaning the wanted person is classified regardless of content and language. A multilayer neural network has been used for speaker identification in this research. An artificial neural network (ANN) needs parameters to be set, such as the activation function of neurons, the steepness of activation functions, the learning rate, the maximum number of iterations, and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by the parameter settings, and different tasks require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings. The goal was to find parameters for the neural network with the highest precision and shortest validation time. The input data of the neural network are Mel-frequency cepstral coefficients (MFCCs), which describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. The data were split into training, testing, and validation sets (70%, 15%, and 15%). The result of the research described in this article is a set of parameter settings for the multilayer neural network for four speakers.
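The classification stage described above can be illustrated with a minimal sketch (not the authors' implementation: the network sizes, learning rate, iteration count, and the synthetic stand-ins for MFCC vectors below are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for 12-dimensional MFCC vectors from 4 speakers:
# each speaker's frames cluster around a distinct mean (hypothetical data).
n_per, dim, n_spk = 50, 12, 4
means = rng.normal(0, 5, size=(n_spk, dim))
X = np.vstack([m + 0.3 * rng.normal(size=(n_per, dim)) for m in means])
y = np.repeat(np.arange(n_spk), n_per)
Y = np.eye(n_spk)[y]                           # one-hot speaker targets

# One hidden layer with sigmoid activations, softmax output.
H = 16                                          # hidden neurons (a tunable parameter)
W1 = 0.1 * rng.normal(size=(dim, H)); b1 = np.zeros(H)
W2 = 0.1 * rng.normal(size=(H, n_spk)); b2 = np.zeros(n_spk)
lr = 0.5                                        # learning rate (another parameter)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

for _ in range(1000):                           # maximum number of iterations
    h = 1.0 / (1.0 + np.exp(-(X @ W1 + b1)))    # hidden activations
    p = softmax(h @ W2 + b2)                    # per-speaker probabilities
    d2 = (p - Y) / len(X)                       # cross-entropy gradient at output
    d1 = (d2 @ W2.T) * h * (1.0 - h)            # backprop through the sigmoid
    W2 -= lr * h.T @ d2; b2 -= lr * d2.sum(axis=0)
    W1 -= lr * X.T @ d1; b1 -= lr * d1.sum(axis=0)

acc = (p.argmax(axis=1) == y).mean()            # training accuracy
```

Varying `H`, `lr`, and the iteration count against accuracy and validation time is the kind of parameter sweep the article describes.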
Reilly, Kevin J; Spencer, Kristie A
The present study investigated the effects of sequence complexity, defined in terms of phonemic similarity and phonotactic probability, on the timing and accuracy of serial ordering for speech production in healthy speakers and speakers with either hypokinetic or ataxic dysarthria. Sequences were comprised of strings of consonant-vowel (CV) syllables with each syllable containing the same vowel, /a/, paired with a different consonant. High complexity sequences contained phonemically similar consonants, and sounds and syllables that had low phonotactic probabilities; low complexity sequences contained phonemically dissimilar consonants and high probability sounds and syllables. Sequence complexity effects were evaluated by analyzing speech error rates and within-syllable vowel and pause durations. This analysis revealed that speech error rates were significantly higher and speech duration measures were significantly longer during production of high complexity sequences than during production of low complexity sequences. Although speakers with dysarthria produced longer overall speech durations than healthy speakers, the effects of sequence complexity on error rates and speech durations were comparable across all groups. These findings indicate that the duration and accuracy of processes for selecting items in a speech sequence are influenced by their phonemic similarity and/or phonotactic probability. Moreover, this robust complexity effect is present even in speakers with damage to subcortical circuits involved in serial control for speech.
Smith, David R. R.; Patterson, Roy D.
Glottal-pulse rate (GPR) and vocal-tract length (VTL) are both related to speaker size and sex; however, it is unclear how they interact to determine our perception of speaker size and sex. Experiments were designed to measure the relative contribution of GPR and VTL to judgements of speaker size and sex. Vowels were scaled to represent people with different GPRs and VTLs, including many well beyond the normal population values. In a single interval, two response rating paradigm, listeners judged the size (using a 7-point scale) and sex/age of the speaker (man, woman, boy, or girl) of these scaled vowels. Results from the size-rating experiments show that VTL has a much greater influence upon judgements of speaker size than GPR. Results from the sex-categorization experiments show that judgements of speaker sex are influenced about equally by GPR and VTL for vowels with normal GPR and VTL values. For abnormal combinations of GPR and VTL, where low GPRs are combined with short VTLs, VTL has more influence than GPR in sex judgements. [Work supported by the UK MRC (G9901257) and the German Volkswagen Foundation (VWF 1/79 783).]
Fox, S. E.; Griswold, J.
The Arctic Visiting Speakers (AVS) Series funds researchers and other arctic experts to travel and share their knowledge in communities where they might not otherwise connect. Speakers cover a wide range of arctic research topics and can address a variety of audiences including K-12 students, graduate and undergraduate students, and the general public. Host applications are accepted on an on-going basis, depending on funding availability. Applications need to be submitted at least 1 month prior to the expected tour dates. Interested hosts can choose speakers from an online Speakers Bureau or invite a speaker of their choice. Preference is given to individuals and organizations hosting speakers that reach a broad audience and the general public. AVS tours are encouraged to span several days, allowing ample time for interactions with faculty, students, local media, and community members. Applications for both domestic and international visits will be considered. Applications for international visits should involve participation of more than one host organization and must include either a US-based speaker or a US-based organization. This is a small but important program that educates the public about Arctic issues. There have been 27 tours since 2007 that have impacted communities across the globe, including Gatineau, Quebec, Canada; St. Petersburg, Russia; Piscataway, New Jersey; Cordova, Alaska; Nuuk, Greenland; Elizabethtown, Pennsylvania; Oslo, Norway; Inari, Finland; Borgarnes, Iceland; San Francisco, California; and Wolcott, Vermont, to name a few. Tours have included lectures to K-12 schools, college and university students, tribal organizations, Boy Scout troops, science center and museum patrons, and the general public. With approximately 300 attendees at each AVS tour, roughly 4,100 people have been reached since 2007. The expectations for each tour are extremely manageable. Hosts must submit a schedule of events and a tour summary to be posted online
Hancock, Adrienne; Colton, Lindsey; Douglas, Fiacre
Intonation is commonly addressed in voice and communication feminization therapy, yet empirical evidence of gender differences for intonation is scarce and rarely do studies examine how it relates to gender perception of transgender speakers. This study examined intonation of 12 males, 12 females, six female-to-male, and 14 male-to-female transgender speakers describing a Norman Rockwell image. Several intonation measures were compared between biological gender groups, between perceived gender groups, and between male-to-female (MTF) speakers who were perceived as male, female, or ambiguous gender. Speakers with a larger percentage of utterances with upward intonation and a larger utterance semitone range were perceived as female by listeners, despite no significant differences between the actual intonation of the four gender groups. MTF speakers who do not pass as female appear to use less upward and more downward intonations than female and passing MTF speakers. Intonation has potential for use in transgender communication therapy because it can influence perception to some degree.
The role of interactional feedback has been a critical area of second language acquisition (SLA) research for decades and while findings suggest interactional feedback can facilitate SLA, the extent of its influence can vary depending on a number of factors, including the native language of those involved in communication. Although studies have…
Wiggins, H. V.; Fahnestock, J.
The Arctic Visiting Speakers Program (AVS) is a program of the Arctic Research Consortium of the U.S. (ARCUS) and funded by the National Science Foundation. AVS provides small grants to researchers and other Arctic experts to travel and share their knowledge in communities where they might not otherwise connect. The program aims to: initiate and encourage arctic science education in communities with little exposure to arctic research; increase collaboration among the arctic research community; nurture communication between arctic researchers and community residents; and foster arctic science education at the local level. Individuals, community organizations, and academic organizations can apply to host a speaker. Speakers cover a wide range of arctic topics and can address a variety of audiences including K-12 students, graduate and undergraduate students, and the general public. Preference is given to tours that reach broad and varied audiences, especially those targeted to underserved populations. Between October 2000 and July 2013, AVS supported 114 tours spanning 9 different countries, including tours in 23 U.S. states. Tours over the past three and a half years have connected Arctic experts with over 6,600 audience members. Post-tour evaluations show that AVS consistently rates high for broadening interest and understanding of arctic issues. AVS provides a case study for how face-to-face interactions between arctic scientists and general audiences can produce high-impact results. Further information can be found at: http://www.arcus.org/arctic-visiting-speakers.
This article identifies some discursive processes by which White, middle-class, native-English-speaking, U.S.-born college students draw on a monolingualist ideology and position themselves and others within a language-race-nationality matrix. These processes construct the speakers' Whiteness and nativeness in English as unmarked and normal; mark…
Atkinson, Mark; Kirby, Simon; Smith, Kenny
A learner’s linguistic input is more variable if it comes from a greater number of speakers. Higher speaker input variability has been shown to facilitate the acquisition of phonemic boundaries, since data drawn from multiple speakers provides more information about the distribution of phonemes in a speech community. It has also been proposed that speaker input variability may have a systematic influence on individual-level learning of morphology, which can in turn influence the group-level characteristics of a language. Languages spoken by larger groups of people have less complex morphology than those spoken in smaller communities. While a mechanism by which the number of speakers could have such an effect is yet to be convincingly identified, differences in speaker input variability, which is thought to be larger in larger groups, may provide an explanation. By hindering the acquisition, and hence faithful cross-generational transfer, of complex morphology, higher speaker input variability may result in structural simplification. We assess this claim in two experiments which investigate the effect of such variability on language learning, considering its influence on a learner’s ability to segment a continuous speech stream and acquire a morphologically complex miniature language. We ultimately find no evidence to support the proposal that speaker input variability influences language learning and so cannot support the hypothesis that it explains how population size determines the structural properties of language. PMID:26057624
Hilliard, Caitlin; Cook, Susan Wagner
Communication is shaped both by what we are trying to say and by whom we are saying it to. We examined whether and how shared information influences the gestures speakers produce along with their speech. Unlike prior work examining effects of common ground on speech and gesture, we examined a situation in which some speakers have the same amount…
This study is concerned with the acquisition of English verb transitivity by native speakers of Japanese. Both a verb's semantic class (Levin, 1993; Pinker, 1989) and its frequency (Ambridge et al., 2008) have been proposed to influence the acquisition of verbs in L1. For example, verbs whose meaning entails change-of-location or…
Cook, Susan Wagner; Tanenhaus, Michael K.
We explored how speakers and listeners use hand gestures as a source of perceptual-motor information during naturalistic communication. After solving the Tower of Hanoi task either with real objects or on a computer, speakers explained the task to listeners. Speakers' hand gestures, but not their speech, reflected properties of the particular…
Ferreira, V.S.; Slevc, L.R.; Rogers, E.S.
Three experiments assessed how speakers avoid linguistically and nonlinguistically ambiguous expressions. Speakers described target objects (a flying mammal, bat) in contexts including foil objects that caused linguistic (a baseball bat) and nonlinguistic (a larger flying mammal) ambiguity. Speakers sometimes avoided linguistic-ambiguity, and they…
Mani, Nivedita; Schneider, Signe
Visual cues from the speaker's face, such as the discriminable mouth movements used to produce speech sounds, improve discrimination of these sounds by adults. The speaker's face, however, provides more information than just the mouth movements used to produce speech--it also provides a visual indexical cue of the identity of the speaker. The…
Theodore, Rachel M.; Schmidt, Anna M.
Previous research suggests a perceptual bias exists for native phonotactics [D. Massaro and M. Cohen, Percept. Psychophys. 34, 338-348 (1983)] such that listeners report nonexistent segments when listening to stimuli that violate native phonotactics [E. Dupoux, K. Kakehi, Y. Hirose, C. Pallier, and J. Mehler, J. Exp. Psychol.: Human Percept. Perform. 25, 1568-1578 (1999)]. This study investigated how native-language experience affects second language processing, focusing on how native Spanish speakers perceive the English clusters /st/, /sp/, and /sk/, which represent phonotactically illegal forms in Spanish. To preserve native phonotactics, Spanish speakers often produce prothetic vowels before English words beginning with /s/ clusters. Is the influence of native phonotactics also present in the perception of illegal clusters? A stimuli continuum ranging from no vowel (e.g., ``sku'') to a full vowel (e.g., ``esku'') before the cluster was used. Four final vowel contexts were used for each cluster, resulting in 12 sCV and 12 VsCV nonword endpoints. English and Spanish listeners were asked to discriminate between pairs differing in vowel duration and to identify the presence or absence of a vowel before the cluster. Results will be discussed in terms of implications for theories of second language speech perception.
Kersten, Alan W; Meissner, Christian A; Lechuga, Julia; Schwartz, Bennett L; Albrechtsen, Justin S; Iglesias, Adam
Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual Spanish/English speakers tested in a Spanish-speaking context at sorting novel, animated objects and events into categories on the basis of manner of motion, an attribute that is prominently marked in English but not in Spanish. In contrast, English and Spanish speakers performed similarly at classifying on the basis of path, an attribute that is prominently marked in both languages. Similar results were obtained regardless of whether categories were labeled by novel words or numbered, suggesting that an English-speaking tendency to focus on manner of motion is a general phenomenon and not limited to word learning. Effects of age of acquisition of English were also observed on the performance of bilinguals, with early bilinguals performing similarly in the 2 language contexts and later bilinguals showing greater contextual variation.
Segment classification; comb filtering; co-channel speaker separation algorithms; test results; pitch deviation method of assigning separated segments. A method is developed such that, given a segment of co-channel speech separated into a "stronger" and a "weaker" segment, the correct assignment of these separated
Acoustic-to-articulatory inversion, the determination of articulatory parameters from acoustic signals, is a difficult but important problem for many speech processing applications, such as automatic speech recognition (ASR) and computer aided pronunciation training (CAPT). In recent years, several approaches have been successfully implemented for speaker dependent models with parallel acoustic and kinematic training data. However, in many practical applications inversion is needed for new speakers for whom no articulatory data is available. In order to address this problem, this dissertation introduces a novel speaker adaptation approach called Parallel Reference Speaker Weighting (PRSW), based on parallel acoustic and articulatory Hidden Markov Models (HMM). This approach uses a robust normalized articulatory space and palate referenced articulatory features combined with speaker-weighted adaptation to form an inversion mapping for new speakers that can accurately estimate articulatory trajectories. The proposed PRSW method is evaluated on the newly collected Marquette electromagnetic articulography -- Mandarin Accented English (EMA-MAE) corpus using 20 native English speakers. Cross-speaker inversion results show that given a good selection of reference speakers with consistent acoustic and articulatory patterns, the PRSW approach gives good speaker independent inversion performance even without kinematic training data.
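The reference-speaker-weighting idea behind PRSW can be sketched schematically (a minimal illustration, not the dissertation's method: the real system operates on parallel acoustic-articulatory HMMs, whereas the linear per-speaker maps, the distance-based weighting, and all data below are invented assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n_ref, ac_dim, art_dim = 5, 8, 4

# Hypothetical stand-ins: each reference speaker contributes a known linear
# acoustic-to-articulatory mapping and a mean acoustic vector.
ref_maps = [rng.normal(size=(art_dim, ac_dim)) for _ in range(n_ref)]
ref_acoustic_means = rng.normal(size=(n_ref, ac_dim))

def prsw_invert(acoustic_frames, temperature=1.0):
    """Estimate articulatory trajectories for a new speaker by weighting
    reference speakers' inversion maps by acoustic similarity."""
    # Weight each reference by how close the new speaker's average
    # acoustics are to that reference's acoustic mean.
    mu = acoustic_frames.mean(axis=0)
    d = np.linalg.norm(ref_acoustic_means - mu, axis=1)
    w = np.exp(-d / temperature)
    w /= w.sum()                              # convex weights over references
    # Apply the weighted combination of reference maps frame by frame.
    M = sum(wi * Mi for wi, Mi in zip(w, ref_maps))
    return acoustic_frames @ M.T, w

# A new speaker constructed to be acoustically close to reference 2,
# for whom no articulatory (kinematic) data is assumed to exist.
frames = ref_acoustic_means[2] + 0.1 * rng.normal(size=(30, ac_dim))
traj, w = prsw_invert(frames)
```

The key design point carried over from the abstract is that the inversion mapping for an unseen speaker is assembled entirely from reference speakers' models, so no kinematic training data from the new speaker is needed.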
Saquib, Zia; Salam, Nirmala; Nair, Rekha P.; Pandey, Nipun; Joshi, Akanksha
Human listeners are capable of identifying a speaker, over the telephone or from an entryway out of sight, by listening to the speaker's voice. Achieving this intrinsically human capability is a major challenge for voice biometrics. Like human listeners, voice biometrics uses the features of a person's voice to ascertain the speaker's identity. The best-known commercialized form of voice biometrics is the Speaker Recognition System (SRS). Speaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voice. This literature survey paper gives a brief introduction to SRS, and then discusses the general architecture of SRS, biometric standards relevant to voice/speech, typical applications of SRS, and current research in speaker recognition systems. We have also surveyed various approaches to SRS.
In the 1960s, Glenn Research Center developed a magnetized fluid to draw rocket fuel into spacecraft engines while in space. Sony has incorporated the technology into its line of slim speakers by using the fluid as a liquid stand-in for the speaker's dampers, which prevent the speaker from blowing out while adding stability. The fluid helps to deliver more volume and hi-fidelity sound while reducing distortion.
Turk, Oytun; Arslan, Levent M
This paper focuses on the importance of source speaker selection for a weighted codebook mapping based voice conversion algorithm. First, the dependency on source speakers is evaluated in a subjective listening test using 180 different source-target pairs from a database of 20 speakers. Subjective scores for similarity to target speaker's voice and quality are obtained. Statistical analysis of scores confirms the dependence of performance on source speakers for both male-to-male and female-to-female transformations. A source speaker selection algorithm is devised given a target speaker and a set of source speaker candidates. For this purpose, an artificial neural network (ANN) is trained that learns the regression between a set of acoustical distance measures and the subjective scores. The estimated scores are used in source speaker ranking. The average cross-correlation coefficient between rankings obtained from median subjective scores and rankings estimated by the algorithm is 0.84 for similarity and 0.78 for quality in male-to-male transformations. The results for female-to-female transformations were less reliable with a cross-correlation value of 0.58 for both similarity and quality.
Mirkovic, Bojana; Bleichner, Martin G.; De Vos, Maarten; Debener, Stefan
Target speaker identification is essential for speech enhancement algorithms in assistive devices aimed toward helping the hearing impaired. Several recent studies have reported that target speaker identification is possible through electroencephalography (EEG) recordings. If the EEG system could be reduced to acceptable size while retaining the signal quality, hearing aids could benefit from the integration with concealed EEG. To compare the performance of a multichannel around-the-ear EEG system with high-density cap EEG recordings, an envelope tracking algorithm was applied in a competitive speaker paradigm. The data from 20 normal hearing listeners were concurrently collected from the traditional state-of-the-art laboratory wired EEG system and a wireless mobile EEG system with two bilaterally-placed around-the-ear electrode arrays (cEEGrids). The results show that the cEEGrid ear-EEG technology captured neural signals that allowed the identification of the attended speaker above chance-level, with 69.3% accuracy, while cap-EEG signals resulted in the accuracy of 84.8%. Further analyses investigated the influence of ear-EEG signal quality and revealed that the envelope tracking procedure was unaffected by variability in channel impedances. We conclude that the quality of concealed ear-EEG recordings as acquired with the cEEGrid array has potential to be used in the brain-computer interface steering of hearing aids. PMID:27512364
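The final decision step of an envelope tracking approach can be sketched simply (all signals below are synthetic assumptions; in real systems the "reconstructed" envelope is estimated from multichannel EEG by a trained spatio-temporal decoder, which is omitted here):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000  # samples of slow speech-envelope signals

def smooth(x, k=50):
    """Crude low-pass smoothing to mimic slow speech envelopes."""
    return np.convolve(x, np.ones(k) / k, mode="same")

# Two competing speakers' speech envelopes (synthetic).
env_a = smooth(rng.normal(size=n))
env_b = smooth(rng.normal(size=n))

# Pretend the EEG-derived envelope tracks speaker A plus noise.
reconstructed = env_a + 0.5 * smooth(rng.normal(size=n))

def attended_speaker(recon, envs):
    """Pick the speaker whose speech envelope correlates best with the
    envelope reconstructed from the listener's EEG."""
    corrs = [np.corrcoef(recon, e)[0, 1] for e in envs]
    return int(np.argmax(corrs)), corrs

idx, corrs = attended_speaker(reconstructed, [env_a, env_b])
```

Accuracy figures like the 69.3% and 84.8% reported above come from repeating this kind of correlation-based decision over many trials and listeners.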
Fairley, Michael S.
This paper presents a case study of an episode in a conversation between a native English speaker (the female director of an English language school) and a non-native English speaker (a student apparently with minimal language skills) in which the native speaker is engaged in an extended telling of seemingly crucial information. The troublesome…
This presentation will provide standards upon which any attempt to meet the challenge of identifying speakers by voice should be based. It is organized into a model based on (i) application of a rigorous research program validating the system, (ii) an upgrading of the organization of the speaker identification (SI) area, and (iii) exploitation of new technology. The second part of the presentation will describe an illustrative speech/voice approach to SI development. This effort is also based on an extensive corpus of research. It is suggested that application of the cited standards, plus the illustrative model, will permit reasonable progress to be made. Finally, a number of procedural recommendations are made; they should enhance the efficacy of the proposed approach.
Beasley, Christopher; Torres-Harding, Susan; Pedersen, Paula J.
Recent societal trends indicate more tolerance for homosexuality, but prejudice remains on college campuses. Speaker panels are commonly used in classrooms as a way to educate students about sexual diversity and decrease negative attitudes toward sexual diversity. The advent of computer-delivered instruction presents a unique opportunity to broaden the impact of traditional speaker panels. The current investigation examined the influence of an interactive "virtual" gay and lesbian speaker panel on cognitive, affective, and behavioral homonegativity. Findings suggest the computer-administered panel lowers homonegativity, particularly affective-experiential homonegativity. The implications of these findings for research and practice are discussed. PMID:23646036
Sabbagh, Mark A.; Shafman, Dana
Preschool children typically do not learn words from ignorant or unreliable speakers. Here, we examined the mechanism by which these learning failures occur by modifying the comprehension test procedure that measures word learning. Following lexical training by a knowledgeable or ignorant speaker, 48 preschool-aged children were asked either a…
Investigates the self-reported reading habits and levels of ability in reading of ten heritage speakers of Spanish enrolled in Spanish classes at Purdue University. Results warrant more explicit focus on form instruction and activation of background knowledge for heritage speakers. (Author/VWL)
Herbeck, Dale A.
In this article, the author examines the newly revised speakers policy in Boston College. The revised policy, defended by administrators as being consistent with past practice, differs in two important respects from the speakers policy it replaced. Lest the scope of this unfortunate policy be exaggerated, it is important to note that the policy…
Zäske, Romi; Schweinberger, Stefan R; Kawahara, Hideki
While adaptation to complex auditory stimuli has traditionally been reported for linguistic properties of speech, the present study demonstrates non-linguistic high-level aftereffects in the perception of voice identity, following adaptation to voices or faces of personally familiar speakers. In Exp. 1, prolonged exposure to speaker A's voice biased the perception of identity-ambiguous voice morphs between speakers A and B towards speaker B (and vice versa). Significantly biased voice identity perception was also observed in Exp. 2 when adaptors were videos of speakers' silently articulating faces, although effects were reduced in magnitude relative to those seen in Exp. 1. By contrast, adaptation to an unrelated speaker C elicited an intermediate proportion of speaker A identifications in both experiments. While crossmodal aftereffects on auditory identification (Exp. 2) dissipated rapidly, unimodal aftereffects (Exp. 1) were still measurable a few minutes after adaptation. These novel findings suggest contrastive coding of voice identity in long-term memory, with at least two perceptual mechanisms of voice identity adaptation: one related to auditory coding of voice characteristics, and another related to multimodal coding of familiar speaker identity.
Papafragou, Anna; Fairchild, Sarah; Cohen, Matthew L; Friedberg, Carlyn
During communication, hearers try to infer the speaker's intentions to be able to understand what the speaker means. Nevertheless, whether (and how early) preschoolers track their interlocutors' mental states is still a matter of debate. Furthermore, there is disagreement about how children's ability to consult a speaker's belief in communicative contexts relates to their ability to track someone's belief in non-communicative contexts. Here, we study young children's ability to successfully acquire a word from a speaker with a false belief; we also assess the same children's success on a traditional false belief attribution task. We show that the ability to consult the epistemic state of a speaker during word learning develops between the ages of three and five. We also show that false belief understanding in word-learning contexts proceeds similarly to standard belief-attribution contexts when the tasks are equated. Our data offer evidence for the development of mind-reading abilities during language acquisition.
Marno, Hanna; Guellai, Bahia; Vidal, Yamil; Franzoi, Julia; Nespor, Marina; Mehler, Jacques
From the first moments of their life, infants show a preference for their native language, as well as for speakers with whom they share the same language. This preference appears to have broad consequences in various domains later on, supporting group affiliations and collaborative actions in children. Here, we propose that infants’ preference for native speakers of their language also serves a further purpose, specifically allowing them to efficiently acquire culture-specific knowledge via social learning. By selectively attending to informants who are native speakers of their language and who probably also share the same cultural background with the infant, young learners can maximize the possibility of acquiring cultural knowledge. To test whether infants would preferentially attend to the information they receive from a speaker of their native language, we familiarized 12-month-old infants with a native and a foreign speaker, and then presented them with movies where each of the speakers silently gazed toward unfamiliar objects. At test, infants’ looking behavior to the two objects alone was measured. Results revealed that infants preferred to look longer at the object presented by the native speaker. Strikingly, the effect was replicated also with 5-month-old infants, indicating an early development of such preference. These findings provide evidence that young infants pay more attention to the information presented by a person with whom they share the same language. This selectivity can serve as a basis for efficient social learning by influencing how infants allocate attention between potential sources of information in their environment. PMID:27536263
Human conversational participants depend upon the ability of their partners to recognize their intentions, so that those partners may respond appropriately. In such interactions, the speaker encodes his intentions about the hearer's response in a variety of sentence types. Instead of telling the hearer what to do, the speaker may just state his goals, and expect a response that meets these goals at least part way. This paper presents a new model for recognizing the speaker's intended meaning in determining a response. It shows that this recognition makes use of the speaker's plan, his beliefs about the domain and about the hearer's relevant capacities. 12 references.
Jacewicz, Ewa; Fox, Robert Allen; Wei, Lai
This study characterizes the speech tempo (articulation rate, excluding pauses) of two distinct varieties of American English taking into account both between-speaker and within-speaker variation. Each of 192 speakers from Wisconsin (the northern variety) and from North Carolina (the southern variety), men and women, ranging in age from children to old adults, read a set of sentences and produced a spontaneous unconstrained talk. Articulation rate in spontaneous speech was modeled using fixed-mixed effects analyses. The models explored the effects of the between-speaker factors dialect, age and gender and included each phrase and its length as a source of both between- and within-speaker variation. The major findings are: (1) Wisconsin speakers speak significantly faster and produce shorter phrases than North Carolina speakers; (2) speech tempo changes across the lifespan, being fastest for individuals in their 40s; (3) men speak faster than women and this effect is not related to the length of phrases they produce. Articulation rate in reading was slower than in speaking and the effects of gender and age also differed in reading and spontaneous speech. The effects of dialect in reading remained the same, showing again that Wisconsin speakers had faster articulation rates than did North Carolina speakers.
Jhanwar, Nitin; Raina, Ajay K.
Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. However, for large speaker databases, their high computational run-time limits their use in online or real-time speaker identification situations. Two-stage identification systems, in which the database is partitioned into clusters based on some proximity criteria and only a single-cluster GMM is run in every test, have been suggested in literature to speed up the identification process. However, most clustering algorithms used have shown limited success, apparently because the clustering and GMM feature spaces used are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram that captures frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics like cepstral coefficient, spectral slopes, and so forth. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction of error in overall speaker identification.
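The two-stage idea can be sketched in a few lines. The following is a toy illustration, not the paper's system: a scalar stage-1 feature stands in for the pitch-correlogram clustering statistic, and a single spherical Gaussian per speaker stands in for a full GMM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each of 12 "speakers" is modelled by one Gaussian mean
# (a stand-in for a full GMM), plus a scalar stage-1 feature standing
# in for a pitch-correlogram statistic.
n_speakers, dim, n_clusters = 12, 8, 3
means = rng.normal(0.0, 5.0, (n_speakers, dim))
stage1_feat = np.arange(n_speakers) * 10.0      # well-separated proxy feature

# Partition speakers into clusters by the stage-1 feature.
order = np.argsort(stage1_feat)
clusters = np.array_split(order, n_clusters)
centroids = np.array([stage1_feat[c].mean() for c in clusters])

def log_lik(x, mu):
    """Log-likelihood of x under a unit-variance spherical Gaussian at mu."""
    return -0.5 * np.sum((x - mu) ** 2)

def identify(x, x_stage1):
    """Stage 1: pick the nearest cluster; stage 2: score only its members."""
    c = int(np.argmin(np.abs(centroids - x_stage1)))
    members = clusters[c]
    scores = [log_lik(x, means[s]) for s in members]
    return int(members[int(np.argmax(scores))]), len(members)

# Simulate a test utterance from speaker 5: its model mean plus noise.
true_id = 5
x = means[true_id] + rng.normal(0.0, 0.1, dim)
pred, models_scored = identify(x, stage1_feat[true_id])
```

Because only one cluster's models are scored in stage 2, search cost drops roughly by the number of clusters, which is the source of the run-time advantage reported above.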
Hou, Limin; Wang, Shuozhong
This paper describes an application of fractal dimensions to speech processing and speaker identification. There are several dimensions that can be used to characterize speech signals, such as the box dimension, the correlation dimension, etc. We are mainly concerned with the generalized dimensions of speech signals as they provide more information than individual dimensions. Generalized dimensions of arbitrary orders are used in speaker identification in this work. Based on the experimental data, an artificial phase space is generated, and smooth behavior of the correlation integral is obtained in a straightforward and accurate analysis. Using the dimension D(2) derived from the correlation integral, the generalized dimension D(q) of an arbitrary order q is calculated. Moreover, experiments applying the generalized dimensions to speaker identification have been carried out. A Chinese-language speech corpus dedicated to speaker recognition, PKU-SRSC, recorded by Peking University, was used in the experiments. The results are compared to a baseline speaker identification system that uses MFCC features. Experimental results have indicated the usefulness of fractal dimensions in characterizing a speaker's identity.
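As a rough illustration of the correlation-integral machinery referred to above, here is a minimal Grassberger-Procaccia estimate of D(2), the starting point for the generalized D(q); the embedding parameters, radius range, and test signal are illustrative choices, not those of the study.

```python
import numpy as np

def correlation_dimension(signal, emb_dim=2, delay=1):
    """Estimate the correlation dimension D(2) of a 1-D signal via the
    Grassberger-Procaccia correlation integral: delay-embed the signal,
    count the fraction C(r) of point pairs closer than r, and fit the
    slope of log C(r) against log r over a range of small radii."""
    n = len(signal) - (emb_dim - 1) * delay
    # Delay embedding: row j = (x[j], x[j+delay], ..., x[j+(m-1)*delay]).
    emb = np.column_stack([signal[i * delay: i * delay + n]
                           for i in range(emb_dim)])
    # All pairwise distances (O(n^2) memory, fine for small n).
    diff = emb[:, None, :] - emb[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(n, k=1)]
    # Fit in the small-radius scaling region (5th-25th distance percentiles).
    radii = np.logspace(np.log10(np.percentile(dists, 5)),
                        np.log10(np.percentile(dists, 25)), 10)
    log_c = np.log([(dists < r).mean() for r in radii])
    slope, _ = np.polyfit(np.log(radii), log_c, 1)
    return slope

# Points sampled from a smooth 1-D trajectory should give D(2) close to 1.
t = np.linspace(0.0, 10.0, 400)
d2 = correlation_dimension(np.sin(t) + 0.5 * t)
```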
Polio, Charlene; Gass, Susan M.
Because interaction gives language learners an opportunity to modify their speech upon a signal of noncomprehension, it should also have a positive effect on native speakers' (NS) comprehension of nonnative speakers (NNS). This study shows that interaction does help NSs comprehend NNSs, contradicting the claims of an earlier study that found no…
Polio, Charlene; Gass, Susan; Chapin, Laura
Implicit negative feedback has been shown to facilitate SLA, and the extent to which such feedback is given is related to a variety of task and interlocutor variables. The background of a native speaker (NS), in terms of amount of experience in interactions with nonnative speakers (NNSs), has been shown to affect the quantity of implicit negative…
Previous research shows that American learners of Japanese (AJs) tend to differ from native Japanese speakers in their compliment responses (CRs). Yokota (1986) and Shimizu (2009) have reported that AJs tend to respond more negatively than native Japanese speakers. It has also been reported that AJs' CRs tend to lack the use of avoidance or…
Bae, Eun Young; Oh, Sun-Young
Within the theoretical and methodological framework of Conversation Analysis, the present study explores the nature of the native speaker (NS) and nonnative speaker (NNS) identities in repair practices of English conversation. It has identified and analyzed in detail repair sequences in the data and has also conducted quantitative analyses in…
Byers-Heinlein, Krista; Behrend, Douglas A; Said, Lyakout Mohamed; Girgis, Helana; Poulin-Dubois, Diane
Past research has shown that young monolingual children exhibit language-based social biases: they prefer native language to foreign language speakers. The current research investigated how children's language preferences are influenced by their own bilingualism and by a speaker's bilingualism. Monolingual and bilingual 4- to 6-year-olds heard pairs of adults (a monolingual and a bilingual, or two monolinguals) and chose the person with whom they wanted to be friends. Whether they were from a largely monolingual or a largely bilingual community, monolingual children preferred monolingual to bilingual speakers, and native language to foreign language speakers. In contrast, bilingual children showed similar affiliation with monolingual and bilingual speakers, as well as for monolingual speakers using their dominant versus non-dominant language. Exploratory analyses showed that individual bilinguals displayed idiosyncratic patterns of preference. These results reveal that language-based preferences emerge from a complex interaction of factors, including preference for in-group members, avoidance of out-group members, and characteristics of the child as they relate to the status of the languages within the community. Moreover, these results have implications for bilingual children's social acceptance by their peers.
Smith, David R. R.; Patterson, Roy D.
Glottal-pulse rate (GPR) and vocal-tract length (VTL) are related to the size, sex, and age of the speaker but it is not clear how the two factors combine to influence our perception of speaker size, sex, and age. This paper describes experiments designed to measure the effect of the interaction of GPR and VTL upon judgements of speaker size, sex, and age. Vowels were scaled to represent people with a wide range of GPRs and VTLs, including many well beyond the normal range of the population, and listeners were asked to judge the size and sex/age of the speaker. The judgements of speaker size show that VTL has a strong influence upon perceived speaker size. The results for the sex and age categorization (man, woman, boy, or girl) show that, for vowels with GPR and VTL values in the normal range, judgements of speaker sex and age are influenced about equally by GPR and VTL. For vowels with abnormal combinations of low GPRs and short VTLs, the VTL information appears to decide the sex/age judgement.
Ives, D. Timothy; Smith, David R. R.; Patterson, Roy D.
The length of the vocal tract is correlated with speaker size and, so, speech sounds have information about the size of the speaker in a form that is interpretable by the listener. A wide range of different vocal tract lengths exist in the population and humans are able to distinguish speaker size from the speech. Smith et al. [J. Acoust. Soc. Am. 117, 305–318 (2005)] presented vowel sounds to listeners and showed that the ability to discriminate speaker size extends beyond the normal range of speaker sizes which suggests that information about the size and shape of the vocal tract is segregated automatically at an early stage in the processing. This paper reports an extension of the size discrimination research using a much larger set of speech sounds, namely, 180 consonant-vowel and vowel-consonant syllables. Despite the pronounced increase in stimulus variability, there was actually an improvement in discrimination performance over that supported by vowel sounds alone. Performance with vowel-consonant syllables was slightly better than with consonant-vowel syllables. These results support the hypothesis that information about the length of the vocal tract is segregated at an early stage in auditory processing. PMID:16419826
Bornkessel-Schlesewsky, Ina; Krauspenhaar, Sylvia; Schlesewsky, Matthias
Evidence is accruing that, in comprehending language, the human brain rapidly integrates a wealth of information sources, including the reader or hearer's knowledge about the world and even his/her current mood. However, little is known to date about how language processing in the brain is affected by the hearer's knowledge about the speaker. Here, we investigated the impact of social attributions to the speaker by measuring event-related brain potentials while participants watched videos of three speakers uttering true or false statements pertaining to politics or general knowledge: a top political decision maker (the German Federal Minister of Finance at the time of the experiment), a well-known media personality, and an unidentifiable control speaker. False versus true statements engendered an N400 followed by a late positivity, with the N400 (150-450 ms) constituting the earliest observable response to message-level meaning. Crucially, however, the N400 was modulated by the combination of speaker and message: for false versus true political statements, an N400 effect was only observable for the politician, but not for either of the other two speakers; for false versus true general knowledge statements, an N400 was engendered by all three speakers. We interpret this result as demonstrating that the neurophysiological response to message-level meaning is immediately influenced by the social status of the speaker and whether he/she has the power to bring about the state of affairs described.
Bishop, Jason; Keating, Patricia
How are listeners able to identify whether the pitch of a brief isolated sample of an unknown voice is high or low in the overall pitch range of that speaker? Does the speaker's voice quality convey crucial information about pitch level? Results and statistical models of two experiments that provide answers to these questions are presented. First, listeners rated the pitch levels of vowels taken over the full pitch ranges of male and female speakers. The absolute f0 of the samples was by far the most important determinant of listeners' ratings, but with some effect of the sex of the speaker. Acoustic measures of voice quality had only a very small effect on these ratings. This result suggests that listeners have expectations about f0s for average speakers of each sex, and judge voice samples against such expectations. Second, listeners judged speaker sex for the same speech samples. Again, absolute f0 was the most important determinant of listeners' judgments, but now voice quality measures also played a role. Thus it seems that pitch level judgments depend on voice quality mostly indirectly, through its information about sex. Absolute f0 is the most important information for deciding both pitch level and speaker sex.
Opinions differ on the importance of the native speaker's concept for language teaching and testing. This Commentary maintains that it is important and seeks to explain why. Three types of grammar are distinguished, the individual's, the community's and the human faculty of language. For first language teaching and testing it is the community's…
Baptista, B. O.
Describes a study that compares Chomsky and Halle's main stress rule with Guierre's stress rules to discover which rules lead to the same word stress placement that native speakers would give to totally unfamiliar words. Only five of Chomsky and Halle's rules were as consistently followed as Guierre's suffix rules. (SED)
Gilbert, William H.
Before teachers can decide how to teach writing to nonstandard dialect speakers, they should determine whether college students can in fact learn to command a second dialect (in this case, Standard English), as well as the most effective way to provide access to command of Standard English while educating the public about the values of nonstandard…
In the verbal linguistic systems, the target for English learners in China is educated native speaker accuracy. The target for more socially embedded interchange is yet to be established. Its basis needs to be formed from "what members of the target culture consider appropriate for foreigners and attitudes of learners themselves"…
Edwards, Susan; Knott, Raymond
Reports on research to develop a descriptive framework capable of revealing relevant linguistic features of aphasic speech. Spontaneous speech samples collected from aphasic and normal speakers in dyadic conversational settings and from monologic picture descriptions are transcribed; lexical, phrasal and clausal elements are coded and quantified.…
Li, Yingli; O'Boyle, Michael W.
In this study we examine how native language, sex, and college major interact to influence accuracy and preferred strategy when performing mental rotation (MR). Native monolingual Chinese and English speakers rotated 3-D shapes while maintaining a concurrent verbal or spatial memory load. For English speakers, male physical science majors were…
Brown-VanHoozer, A.; Kercel, S. W.; Tucker, R. W.
The objective of this research is to develop a system capable of identifying speakers on wiretaps from a large database (>500 speakers) with a short search time duration (<30 seconds), and with better than 90% accuracy. Much previous research in speaker recognition has led to algorithms that produced encouraging preliminary results, but were overwhelmed when applied to populations of more than a dozen or so different speakers. The authors are investigating a solution to the "huge population" problem by seeking two completely different kinds of characterizing features. These features are extracted using the techniques of Neuro-Linguistic Programming (NLP) and the continuous wavelet transform (CWT). NLP extracts precise neurological, verbal and non-verbal information, and assimilates the information into useful patterns. These patterns are based on specific cues demonstrated by each individual, and provide ways of determining congruency between verbal and non-verbal cues. The primary NLP modalities are characterized through word spotting (or verbal predicate cues, e.g., see, sound, feel, etc.), while the secondary modalities would be characterized through the speech transcription used by the individual. This has the practical effect of reducing the size of the search space, and greatly speeding up the process of identifying an unknown speaker. The wavelet-based line of investigation concentrates on using vowel phonemes and non-verbal cues, such as tempo. The rationale for concentrating on vowels is that there are a limited number of vowel phonemes, and at least one of them usually appears in even the shortest of speech segments. Using the fast CWT algorithm, the details of both the formant frequency and the glottal excitation characteristics can be easily extracted from voice waveforms. The differences in the glottal excitation waveforms as well as the formant frequency are evident in the CWT output. More significantly, the CWT reveals significant detail of the
Johnson, Kenneth R.
In this article, pedagogical problems in adapting second language teaching techniques for teaching standard English to speakers of Ebonics are discussed. Suggestions for improving teacher training programs are made. (Author/MC)
Meister, Hartmut; Fürsen, Katrin; Streicher, Barbara; Lang-Roth, Ruth; Walger, Martin
Purpose: The focus of this study was to examine the influence of fundamental frequency (F0) and vocal tract length (VTL) modifications on speaker gender recognition in cochlear implant (CI) recipients for different stimulus types. Method: Single words and sentences were manipulated using isolated or combined F0 and VTL cues. Using an 11-point…
This study surveyed members of the Association of International Educators and community volunteers to find out how international student speaker programs actually work. An international student speaker program provides speakers (from the university foreign student population) for community organizations and schools. The results of the survey (49…
Bimbot, Frédéric; Bonastre, Jean-François; Fredouille, Corinne; Gravier, Guillaume; Magrin-Chagnolleau, Ivan; Meignier, Sylvain; Merlin, Teva; Ortega-García, Javier; Petrovska-Delacrétaz, Dijana; Reynolds, Douglas A.
This paper presents an overview of a state-of-the-art text-independent speaker verification system. First, an introduction proposes a modular scheme of the training and test phases of a speaker verification system. Then, the speech parameterization most commonly used in speaker verification, namely cepstral analysis, is detailed. Gaussian mixture modeling, which is the speaker modeling technique used in most systems, is then explained. A few speaker modeling alternatives, namely neural networks and support vector machines, are mentioned. Normalization of scores is then explained, as this is a very important step for dealing with real-world data. The evaluation of a speaker verification system is then detailed, and the detection error trade-off (DET) curve is explained. Several extensions of speaker verification are then enumerated, including speaker tracking and segmentation by speakers. Then, some applications of speaker verification are proposed, including on-site applications, remote applications, applications related to structuring audio information, and games. Issues concerning the forensic area are then recalled, as we believe it is very important to inform people about the actual performance and limitations of speaker verification systems. This paper concludes by giving a few research trends in speaker verification for the next couple of years.
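To make the front end concrete, here is a minimal real-cepstrum computation for a single frame. It is a simplified stand-in for the mel-warped cepstral analysis such systems typically use, and the frame parameters and synthetic signal are illustrative assumptions.

```python
import numpy as np

def cepstral_features(frame, n_coeffs=12):
    """Real cepstrum of one speech frame: window -> |FFT| -> log -> inverse FFT.

    Low-order cepstral coefficients summarise the spectral envelope
    (vocal-tract shape) and are the kind of front-end feature that a
    speaker-verification system feeds to a GMM.
    """
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    log_spec = np.log(spectrum + 1e-10)          # floor avoids log(0)
    cepstrum = np.fft.irfft(log_spec)
    return cepstrum[1:n_coeffs + 1]              # drop c0 (overall energy)

# A synthetic 25 ms frame at 16 kHz: a 120 Hz tone plus a little noise.
sr, dur = 16000, 0.025
t = np.arange(int(sr * dur)) / sr
noise = 0.01 * np.random.default_rng(1).normal(size=t.size)
feats = cepstral_features(np.sin(2 * np.pi * 120 * t) + noise)
```

In a full system, per-frame vectors like `feats` would be pooled over an utterance and scored against speaker-specific Gaussian mixture models.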
Bohnenkamp, Todd A.; Stowell, Talena; Hesse, Joy; Wright, Simon
Speakers who use an electrolarynx following a total laryngectomy no longer require pulmonary support for speech. Subsequently, chest wall movements may be affected; however, chest wall movements in these speakers are not well defined. The purpose of this investigation was to evaluate speech breathing in speakers who use an electrolarynx during…
Kim, Sunae; Kalish, Charles W.; Harris, Paul L.
Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…
Lee, Chao-Yang; Zhang, Yu
The effect of speaker variability on accessing the form and meaning of spoken words was evaluated in two short-term priming experiments. In the repetition priming experiment, participants listened to repeated or unrelated prime-target pairs, in which the prime and target were produced by the same speaker or different speakers. The results showed…
Ma, Lili; Woolley, Jacqueline D.
This research explores whether young children are sensitive to speaker gender when learning novel information from others. Four- and 6-year-olds ("N" = 144) chose between conflicting statements from a male versus a female speaker (Studies 1 and 3) or decided which speaker (male or female) they would ask (Study 2) when learning about the functions…
Dellwo, Volker; Leemann, Adrian; Kolly, Marie-José
Between-speaker variability of acoustically measurable speech rhythm [%V, ΔV(ln), ΔC(ln), and Δpeak(ln)] was investigated when within-speaker variability of (a) articulation rate and (b) linguistic structural characteristics was introduced. To study (a), 12 speakers of Standard German read seven lexically identical sentences under five different intended tempo conditions (very slow, slow, normal, fast, very fast). To study (b), 16 speakers of Zurich Swiss German produced 16 spontaneous utterances each (256 in total) for which transcripts were made and then read by all speakers (4096 sentences; 16 speaker × 256 sentences). Between-speaker variability was tested using analysis of variance with repeated measures on within-speaker factors. Results revealed strong and consistent between-speaker variability while within-speaker variability as a function of articulation rate and linguistic characteristics was typically not significant. It was concluded that between-speaker variability of acoustically measurable speech rhythm is strong and robust against various sources of within-speaker variability. Idiosyncratic articulatory movements were found to be the most plausible factor explaining between-speaker differences.
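The interval-based metrics named above are easy to state concretely. The sketch below uses one common set of definitions (%V as the vocalic proportion of utterance duration, and the Δ(ln) measures as standard deviations of log interval durations); Δpeak(ln) is omitted, and the toy durations are invented for illustration.

```python
import math

def rhythm_metrics(intervals):
    """Compute %V, deltaV(ln), and deltaC(ln) from a list of
    ('V' or 'C', duration-in-seconds) intervals.

    %V is the proportion of utterance duration that is vocalic; the
    delta(ln) measures are standard deviations of log interval
    durations, which makes them largely rate-independent.
    """
    v = [d for kind, d in intervals if kind == 'V']
    c = [d for kind, d in intervals if kind == 'C']
    pct_v = 100.0 * sum(v) / (sum(v) + sum(c))

    def sd_ln(xs):
        logs = [math.log(x) for x in xs]
        m = sum(logs) / len(logs)
        return math.sqrt(sum((l - m) ** 2 for l in logs) / len(logs))

    return pct_v, sd_ln(v), sd_ln(c)

# Toy utterance: alternating consonantal and vocalic intervals.
utterance = [('C', 0.08), ('V', 0.12), ('C', 0.10), ('V', 0.12),
             ('C', 0.08), ('V', 0.16)]
pct_v, dv_ln, dc_ln = rhythm_metrics(utterance)
```

A useful property, and the reason log-based measures suit the within-speaker tempo manipulation described above: uniformly doubling every duration (speaking twice as slowly) leaves all three values unchanged.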
Lee, Soomin; Shimomura, Yoshihiro; Katsuura, Tetsuo
In recent years, parametric speakers have been used in various settings. In our previous studies, we verified that the physiological burden of the sound of a parametric speaker placed 2.6 m from the subjects was lower than that of a general speaker. However, nothing has yet been demonstrated about the effects of the sound of a parametric speaker at shorter distances between the speaker and the human body. Therefore, we studied this effect on physiological functions and task performance. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-minute quiet period as a baseline, a 30-minute mental task period with general speakers or parametric speakers, and a 20-minute recovery period. We measured electrocardiogram (ECG), photoplethysmogram (PTG), electroencephalogram (EEG), and systolic and diastolic blood pressure. Four experiments, crossing a speaker condition (general speaker vs. parametric speaker) with a distance condition (0.3 m and 1.0 m), were conducted at the same time of day on separate days. To examine the effects of speaker and distance, three-way repeated measures ANOVAs (speaker factor x distance factor x time factor) were conducted. In conclusion, we found that the physiological responses did not differ significantly between the speaker conditions or the distance conditions. Meanwhile, the physiological burden increased over time independently of speaker condition and distance condition. In summary, the effects of the parametric speaker observed at a distance of 2.6 m were not obtained at distances of 1 m or less.
Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.
Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…
Casas, Rachel; Guzmán-Vélez, Edmarie; Cardona-Rodriguez, Javier; Rodriguez, Nayra; Quiñones, Gabriela; Izaguirre, Borja; Tranel, Daniel
The primary objective of this study was to investigate empirically whether using an interpreter to conduct neuropsychological testing of monolingual Spanish speakers affects test scores. Participants included 40 neurologically normal Spanish speakers with limited English proficiency, aged 18-65 years (M = 39.7, SD = 13.9), who completed the Vocabulary, Similarities, Block Design, and Matrix Reasoning subtests of the Wechsler Adult Intelligence Scale-III in two counterbalanced conditions: with and without an interpreter. Results indicated that interpreter use significantly increased scores on Vocabulary and Similarities. However, scores on Block Design and Matrix Reasoning did not differ depending on whether or not an interpreter was used. In addition, the findings suggested a trend toward higher variability in scores when an interpreter was used to administer Vocabulary and Similarities; this trend was not observed for Block Design or Matrix Reasoning. Together, the results indicate that interpreter use may significantly affect scores on some tests commonly used in neuropsychological practice, with this influence being greater for verbally mediated tests. Additional research is needed to identify the types of tests that may be most affected as well as the factors that contribute to these effects. In the meantime, neuropsychologists are encouraged to avoid interpreter use whenever practically possible, particularly for tests with high demands on interpreter abilities and skills, with tests that have not been appropriately adapted and translated into the patient's target language, and with interpreters who are not trained professionals.
Knowles, Kristen K; Little, Anthony C
In recent years, the perception of social traits in faces and voices has received much attention. Facial and vocal masculinity are linked to perceptions of trustworthiness; however, while feminine faces are generally considered to be trustworthy, vocal trustworthiness is associated with masculinized vocal features. Vocal traits such as pitch and formants have previously been associated with perceived social traits such as trustworthiness and dominance, but the link between these measurements and perceptions of cooperativeness has yet to be examined. In Experiment 1, cooperativeness ratings of male and female voices were examined against four vocal measurements: fundamental frequency (F0), pitch variation (F0-SD), formant dispersion (Df), and formant position (Pf). Feminine pitch traits (F0 and F0-SD) and masculine formant traits (Df and Pf) were associated with higher cooperativeness ratings. In Experiment 2, manipulated voices with feminized F0 were found to be more cooperative than voices with masculinized F0, among both male and female speakers, confirming our results from Experiment 1. Feminine pitch qualities may indicate an individual who is friendly and non-threatening, while masculine formant qualities may reflect an individual who is socially dominant or prestigious, and the perception of these associated traits may influence the perceived cooperativeness of the speakers.
Bahr, Ruth Huntley
While few individuals would argue that vocal cues can signal a person's identity, it is difficult to specify exactly which parameter(s) provide the most salient information for speaker identification. Previous literature has suggested that speaking fundamental frequency, long-term spectra, vowel formant frequencies, and speech tempo can provide speaker-specific information. However, investigations focused on automatic speaker identification have provided less than satisfactory results. These findings could be related to how each acoustic parameter is measured or, more probably, to the idea that these acoustic parameters interact in specific ways that may be more obvious in the perceptual realm and may vary across speaking situations. To further complicate matters, individuals may speak more than one language or use multiple dialects. Little is known about the effect of code switching on voice production and identification. The purpose of this presentation is to present some of the relevant literature on voice recognition and factors related to misidentification. The role of intraspeaker variability will be discussed with a special emphasis on bilingualism and bidialectalism. Implications for voice production in augmentative and alternative communication devices will be described.
Hollien, Harry; Didla, Grace; Harnsberger, James D; Hollien, Keith A
Once forensic speaker identification (SI) was recognized as an entity, it was predicted that valid computer based identification systems would quickly become a reality. This has not happened and the review to follow will provide some of the reasons why. Notable among them are (1) the sharp underestimation of its complexity and (2) its confounding with speaker verification (SV). Consideration of these (and related) issues will be followed by a brief history about how the need for SI developed and some of the responses to the problem. Since much of the SI development preceded the structuring of appropriate standards, the recommended stop-gap response described here is based on somewhat uncoordinated, but extensive, research. The product of that effort will be reviewed and organized into a platform which supports SI procedures consistent with the forensic model. Also discussed are the standards which have been established, their impact on SI development and its present limitations. How the cited approach interacts both with progress in verification and the developing SI machine-based identification systems also will be considered. Finally, a few suggestions will be made that should assist in upgrading the effectiveness of aural perceptual speaker identification (AP SI).
Hung, Wan-Yu; Patrycia, Ferninda; Yow, W Q
Past research has investigated how children use different sources of information such as social cues and word-learning heuristics to infer referential intents. The present research explored how children weigh and use some of these cues to make referential inferences. Specifically, we examined how switching between languages known (familiar) or unknown (unfamiliar) to a child would influence his or her choice of cue to interpret a novel label in a challenging disambiguation task, where a pointing cue was pitted against the mutual exclusivity (ME) principle. Forty-eight 3- and 4-year-old English-Mandarin bilingual children listened to a story told either in English only (No-Switch), English and Mandarin (Familiar-Switch), English and Japanese (Unfamiliar-Switch), or English and English-sounding nonsense sentences (Nonsense-Switch). They were then asked to select an object (from a pair of familiar and novel objects) after hearing a novel label paired with the speaker's point at the familiar object, e.g., "Can you give me the blicket?" Results showed that children in the Familiar-Switch condition were more willing to relax ME to follow the speaker's point to pick the familiar object than those in the Unfamiliar-Switch condition, who were more likely to pick the novel object. No significant differences were found between the other conditions. Further analyses revealed that children in the Unfamiliar-Switch condition looked at the speaker longer than children in the other conditions when the switch happened. Our findings suggest that children weigh speakers' referential cues and word-learning heuristics differently in different language contexts while taking into account their communicative history with the speaker. There are important implications for general education and other learning efforts, such as designing learning games so that the history of credibility with the user is maintained and how learning may be best scaffolded in a helpful and trusting environment.
Campeanu, Sandra; Craik, Fergus I M; Alain, Claude
A speaker's voice occupies a central role as a cornerstone of auditory social interaction. Here, we review the evidence suggesting that a speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates which underlie it. Evidence from behavioral and electrophysiological studies suggests that while specific voice reinstatement (i.e., same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In terms of explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory, rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms. However, the presence of voice effects appears to depend greatly on the task employed. Using a word identification task, perceptual similarity between study and test conditions is, as for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect. While voice effects have been observed when white noise is used at both study and test, using multi-talker babble does not confer the same results. In terms of neuroimaging research, modulations characterizing an implicit memory effect…
Wise, Kevin; Haake, Monica
In this article, the authors describe steps on how to develop a high-impact activity in which students build, test, and improve their own "coffee can" speakers to observe firsthand how loudspeakers work to convert electrical energy to sound. The activity is appropriate for students in grades three to six and lends itself best to students…
Robenalt, Clarice; Goldberg, Adele E.
When native speakers judge the acceptability of novel sentences, they appear to implicitly take competing formulations into account, judging novel sentences with a readily available alternative formulation to be less acceptable than novel sentences with no competing alternative. Moreover, novel sentences with a competing alternative are more…
McDonald, Malcolm W.
The work done on this project this summer has been geared toward setting up the necessary infrastructure and planning to support the operation of an effective speaker outreach program. The program has been given the name, NASA AMBASSADORS. Also, individuals who become participants in the program will be known as "NASA AMBASSADORS". This summer project has been conducted by the joint efforts of this author and those of Professor George Lebo who will be issuing a separate report. The description in this report will indicate that the NASA AMBASSADOR program operates largely on the contributions of volunteers, with the assistance of persons at the Marshall Space Flight Center (MSFC). The volunteers include participants in the various summer programs hosted by MSFC as well as members of the NASA Alumni League. The MSFC summer participation programs include: the Summer Faculty Fellowship Program for college and university professors, the Science Teacher Enrichment Program for middle- and high-school teachers, and the NASA ACADEMY program for college and university students. The NASA Alumni League members are retired NASA employees, scientists, and engineers. The MSFC offices which will have roles in the operation of the NASA AMBASSADORS include the Educational Programs Office and the Public Affairs Office. It is possible that still other MSFC offices may become integrated into the operation of the program. The remainder of this report will establish the operational procedures which will be necessary to sustain the NASA AMBASSADOR speaker outreach program.
Chen, Ke; Salman, Ahmad
Speech signals convey various yet mixed information ranging from linguistic to speaker-specific information. However, most acoustic representations characterize all the different kinds of information as a whole, which could hinder either a speech or a speaker recognition (SR) system from producing better performance. In this paper, we propose a novel deep neural architecture (DNA) especially for learning speaker-specific characteristics from mel-frequency cepstral coefficients, an acoustic representation commonly used in both speech recognition and SR, which results in a speaker-specific overcomplete representation. In order to learn intrinsic speaker-specific characteristics, we come up with an objective function consisting of contrastive losses in terms of speaker similarity/dissimilarity and data reconstruction losses used as regularization to normalize the interference of non-speaker-related information. Moreover, we employ a hybrid learning strategy for learning the parameters of the deep neural networks: i.e., local yet greedy layerwise unsupervised pretraining for initialization and global supervised learning for the ultimate discriminative goal. With four Linguistic Data Consortium (LDC) benchmarks and two non-English corpora, we demonstrate that our overcomplete representation is robust in characterizing various speakers, no matter whether their utterances have been used in training our DNA, and highly insensitive to text and languages spoken. Extensive comparative studies suggest that our approach yields favorable results in speaker verification and segmentation. Finally, we discuss several issues concerning our proposed approach.
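The objective described in this abstract — a contrastive loss over speaker similarity/dissimilarity plus a reconstruction loss as regularization — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the hinge-margin form of the dissimilarity term, the function names, and the weighting factor `alpha` are all assumptions.

```python
import math

def l2_dist(a, b):
    # Euclidean distance between two embedding vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    # Pull same-speaker embeddings together; push different-speaker
    # embeddings apart, up to a margin (standard hinge form).
    d = l2_dist(emb_a, emb_b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2

def reconstruction_loss(frame, reconstructed):
    # Squared error between an input MFCC frame and its reconstruction,
    # used as a regularizer so the network does not discard everything
    # that is not speaker-related.
    return sum((x - y) ** 2 for x, y in zip(frame, reconstructed))

def total_loss(pairs, alpha=0.1):
    # pairs: iterable of (emb_a, emb_b, same_speaker, frame, recon).
    # alpha trades off the contrastive term against the reconstruction
    # regularizer; its value here is arbitrary.
    loss = 0.0
    for emb_a, emb_b, same, frame, recon in pairs:
        loss += contrastive_loss(emb_a, emb_b, same)
        loss += alpha * reconstruction_loss(frame, recon)
    return loss / len(pairs)
```

In a full system these losses would drive gradient updates of the deep network's parameters after layerwise unsupervised pretraining; the sketch only shows the shape of the objective itself.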
Kiss, Miklos; Cristescu, Tamara; Fink, Martina; Wittmann, Marc
Neuropsychological studies in brain-injured patients with aphasia and children with specific language-learning deficits have shown the dependence of language comprehension on auditory processing abilities, i.e. the detection of temporal order. An impairment of temporal-order perception can be simulated by time reversing segments of the speech signal. In our study, we investigated how different lengths of time-reversed segments in speech influenced comprehension in ten native German speakers and ten participants who had acquired German as a second language. Results show that native speakers were still able to understand the distorted speech at segment lengths of 50 ms, whereas non-native speakers could only identify sentences with reversed intervals of 32 ms duration. These differences in performance can be explained by different levels of semantic and lexical proficiency. Our method of temporally-distorted speech offers a new approach to assess language skills that indirectly taps into the lexical and semantic competence of non-native speakers.
Ruf, Helena T.
This dissertation investigates syntactic priming in second language (L2) development among three speaker populations: (1) less proficient L2 speakers; (2) advanced L2 speakers; and (3) L1 speakers. Using confederate scripting, this study examines how German speakers choose certain word orders in locative constructions (e.g., "Auf dem Tisch…
Middleton, Erica L.; Schwartz, Myrna F.
We investigated the influence of phonological neighborhood density (PND) on the performance of aphasic speakers whose naming impairments differentially implicate phonological or semantic stages of lexical access. A word comes from a dense phonological neighborhood if many words sound like it. Limited evidence suggests that higher density facilitates naming in aphasic speakers, as it does in healthy speakers. Using well controlled stimuli, Experiment 1 confirmed the influence of PND on accuracy and phonological error rates in two aphasic speakers with phonological processing deficits. In Experiments 2 and 3, we extended the investigation to an aphasic speaker who is prone to semantic errors, indicating a semantic deficit and/or a deficit in the mapping from semantics to words. This individual had higher accuracy, and fewer semantic errors, in naming targets from high versus low density neighborhoods. It is argued that the results provide strong support for interactive approaches to lexical access, where reverberatory feedback between word- and phoneme-level lexical representations not only facilitates phonological level processes but also privileges the selection of a target word over its semantic competitors. PMID:21718214
Hemphill, Leaunda S.; Hemphill, Hoyet H.
This study investigated the impact of virtual guest speakers facilitating asynchronous discussions. The setting was an online instructional technology course with 16 graduate students and two guest speakers. The research reports the quantity and level of critical thinking of the students and guests. Each posting was coded for frequency and…
Vongphoe, Michael; Zeng, Fan-Gang
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
Gut, Ulrike; Pillai, Stefanie
Various researchers have shown that second language (L2) speakers have difficulties with marking information structure in English prosodically: They deviate from native speakers not only in terms of pitch accent placement (Grosser, 1997; Gut, 2009; Ramírez Verdugo, 2002) and the type of pitch accent they produce (Wennerstrom, 1994, 1998) but also…
Hübscher, Iris; Esteve-Gibert, Núria; Igualada, Alfonso; Prieto, Pilar
This study investigates 3- to 5-year-old children's sensitivity to lexical, intonational and gestural information in the comprehension of speaker uncertainty. Most previous studies on children's understanding of speaker certainty and uncertainty across languages have focused on the comprehension of lexical markers, and little is known about the…
Kagan, Olga; Friedman, Debra
This study explored the possibility of using an ACTFL oral proficiency interview (OPI) to assess the spoken proficiency of heritage language speakers of Russian for the purpose of placing them in Russian language classes. The authors also considered whether the norm of an educated native speaker could be used as a valid reference point for Russian…
Bonilha, Heather Shaw; Deliyski, Dimitar D.; Gerlach, Terri Treman
Purpose: To ascertain the amount of phase asymmetry of the vocal fold vibration in normophonic speakers via visualization techniques and compare findings for habitual and pressed phonations. Method: Fifty-two normophonic speakers underwent stroboscopy and high-speed videoendoscopy (HSV). The HSV images were further processed into 4 visual…
Language learners' language experience is predicted to have a significant effect on their accurate perception of foreign language sounds (Flege, 1995). At the suprasegmental level, there is still a debate regarding whether tone language speakers are better able to perceive foreign lexical tones than non-tone language speakers (e.g., Lee et al.,…
Holliday, Adrian; Aboshiha, Pamela
There is now general acceptance that the traditional "nonnative speaker" label for teachers of English is problematic on sociolinguistic grounds and can be the source of employment discrimination. However, there continues to be disagreement regarding how far there is a prejudice against "nonnative speaker" teachers which is deep and sustained and…
This article reports on a study contrasting 41 native speakers (NSs) and 38 non-native speakers (NNSs) of English from two short initial teacher training courses, the Cambridge Certificate in English Language Teaching to Adults and the Trinity College London CertTESOL. After a brief history and literature review, I present findings on teachers'…
Gillis, Randall L.; Nilsen, Elizabeth S.
Knowledge transfer is most effective when speakers provide good quality (in addition to accurate) information. Two studies investigated whether preschool- (4-5 years old) and school-age (6-7 years old) children prefer speakers who provide sufficient information over those who provide insufficient (yet accurate) information. Children were provided…
Montgomery County Public Schools, Rockville, MD. Dept. of Adult Education.
This study guide was prepared to assist trained teachers of English to speakers of other languages (ESOL) who work with students at the beginning and intermediate levels. These teachers have had graduate courses in descriptive linguistics, phonology, syntax, morphology, and methodology of teaching English to speakers of other languages. The guide…
Thompson, Amy S.; Fioramonte, Amy
A sizable body of literature has been established surrounding native speaker teachers versus nonnative speaker teachers of English. Presently, a paucity of research exists related to teachers working with languages other than English. In an attempt to fill this research gap, this qualitative research study presents the experiences of novice…
Lee, James J.; Pinker, Steven
Speakers often do not state requests directly but employ innuendos such as "Would you like to see my etchings?" Though such indirectness seems puzzlingly inefficient, it can be explained by a theory of the "strategic speaker", who seeks plausible deniability when he or she is uncertain of whether the hearer is cooperative or…
Pinker, Steven; Birdsong, David
Two studies elicited native speaker and nonnative speaker judgments regarding preferred word order of the idioms known as "freezes." The results support the notion that rules of frozen word order are psychologically real and reflect universal language rules. (Author/AM)
Carvalho, Ana M.; Freire, Juliana Luna; da Silva, Antonio J. B.
Portuguese is the sixth-most-spoken native language in the world, with approximately 240,000,000 speakers. Within the United States, there is a growing demand for K-12 language programs to engage the community of Portuguese heritage speakers. According to the 2000 U.S. census, 85,000 school-age children speak Portuguese at home. As a result, more…
The goal of the study is to analyze the morphological processing of real and novel verb forms by heritage speakers of Russian in order to determine whether it differs from that of native (L1) speakers and second language (L2) learners; if so, how it is different; and which factors may guide the acquisition process. The experiment involved three…
This digest describes some of the issues involved in the Spanish language learning experiences of heritage Spanish speakers, the largest population of heritage language speakers in the United States. It describes ways in which educators can facilitate these students' language development through a better understanding of their language learning…
This paper reports on a comparative study of pauses made by L2 learners and native speakers of English while narrating picture stories. The comparison is based on the number of pauses and total amount of silence in the middle and at the end of clauses in the performance of 40 native speakers and 40 L2 learners of English. The results of the…
Ma, Joan K.-Y.; Whitehill, Tara; Cheung, Katherine S.-K.
Background: Dysprosody is a common feature in speakers with hypokinetic dysarthria. However, speech prosody varies across different types of speech materials. This raises the question of what is the most appropriate speech material for the evaluation of dysprosody. Aims: To characterize the prosodic impairment in Cantonese speakers with…
Kreitewolf, Jens; Gaudrain, Etienne; von Kriegstein, Katharina
Understanding speech from different speakers is a sophisticated process, particularly because the same acoustic parameters convey important information about both the speech message and the person speaking. How the human brain accomplishes speech recognition under such conditions is unknown. One view is that speaker information is discarded at early processing stages and not used for understanding the speech message. An alternative view is that speaker information is exploited to improve speech recognition. Consistent with the latter view, previous research identified functional interactions between the left- and the right-hemispheric superior temporal sulcus/gyrus, which process speech- and speaker-specific vocal tract parameters, respectively. Vocal tract parameters are one of the two major acoustic features that determine both speaker identity and speech message (phonemes). Here, using functional magnetic resonance imaging (fMRI), we show that a similar interaction exists for glottal fold parameters between the left and right Heschl's gyri. Glottal fold parameters are the other main acoustic feature that determines speaker identity and speech message (linguistic prosody). The findings suggest that interactions between left- and right-hemispheric areas are specific to the processing of different acoustic features of speech and speaker, and that they represent a general neural mechanism when understanding speech from different speakers.
Mandrekar, Ishan; Prevelakis, Vassilis; Turner, David Michael
The authors have developed the "Ethernet Speaker" (ES), a network-enabled single board computer embedded into a conventional audio speaker. Audio streams are transmitted in the local area network using multicast packets, and the ES can select any one of them and play it back. A key requirement for the ES is that it must be capable of playing any…
Cuenca, Maria Heliodora; Barrio, Marina M.; Anaya, Pablo; Establier, Carmelo
The purpose of this investigation is to explore the use by excellent Spanish oesophageal speakers of acoustic cues to mark syllabic stress. The speech material consisted of five pairs of disyllabic words which differed only in stress position. A total of 44 oesophageal and 9 laryngeal speakers were recorded and a computerised designed "ad…
Whitehill, Tara; Chau, Cynthia
Many speakers with repaired cleft palate have reduced intelligibility, but there are limitations with current procedures for assessing intelligibility. The aim of this study was to construct a single-word intelligibility test for speakers with cleft palate. The test used a multiple-choice identification format, and was based on phonetic contrasts…
Kidd, Celeste; White, Katherine S.; Aslin, Richard N.
The ability to infer the referential intentions of speakers is a crucial part of learning a language. Previous research has uncovered various contextual and social cues that children may use to do this. Here we provide the first evidence that children also use speech disfluencies to infer speaker intention. Disfluencies (e.g. filled pauses "uh"…
Makhoul, John I.
The feasibility and limitations of speaker adaptation in improving the performance of a "fixed" (speaker-independent) automatic speech recognition system were examined. A fixed vocabulary of 55 syllables is used in the recognition system which contains 11 stops and fricatives and five tense vowels. The results of an experiment on speaker…
Hodgson, Kevin Michael
Although the paradigm shift towards English as an International Language (EIL) has been generally accepted within the academic community, a valorization of native speaker norms continues to be prevalent among many non-native speakers (NNSs). Through data drawn from a qualitative questionnaire and proficiency assessment results (TOEIC), this mixed…
Delisle, Helga H.
Presents and analyzes two studies designed to test native speaker reaction to certain types of errors that speakers of English make when learning German. Aim was to establish the role of the medium, spoken or written, in evaluation of errors. Results show overall ratings of errors in written and spoken language are similar, although with…
McRee, Annie-Laurie; Madsen, Nikki; Eisenberg, Marla E.
This study, using data from a statewide survey (n = 332), examined teachers' practices regarding the inclusion of guest speakers to cover sexuality content. More than half of teachers (58%) included guest speakers. In multivariate analyses, teachers who taught high school, had professional preparation in health education, or who received…
Ramsey, Richard David
Teaching native and nonnative English speakers together in the same classroom can be accomplished well only by a teacher who is sensitive to the concerns of the nonnative students. Personal interviews conducted with nonnative speakers indicated that the most recurrent out-of-class problems include separation from friends and family, distinctions…
Blomgren, Michael; Goberman, Alexander M.
The goal of this study was to evaluate stuttering frequency across a multidimensional (2 x 2) hierarchy of speech performance tasks. Specifically, this study examined the interaction between changes in length of utterance and levels of speech rate stability. Forty-four adult male speakers participated in the study (22 stuttering speakers and 22…
This study aims at describing types and usages of deixis in the speech of Jordanian Urban Arabic native speakers. The present study was conducted in different settings in which the researcher's family members, friends, colleagues, and acquaintances took part. Data of the study were collected through observing the spontaneous speech of native speakers of…
Due to the global momentum of English as a Lingua Franca (ELF), Anglophones may perceive that there is less urgency for them to learn other languages than for speakers of other languages to learn English. The monolingual expectations of English speakers are evidenced not only in Anglophone countries but also abroad. This study reports on the…
Kinzler, Katherine D.; Corriveau, Kathleen H.; Harris, Paul L.
Across two experiments, preschool-aged children demonstrated selective learning of non-linguistic information from native-accented rather than foreign-accented speakers. In Experiment 1, children saw videos of a native- and a foreign-accented speaker of English who each spoke for 10 seconds, and then silently demonstrated different functions with…
Smith, David R R
A man, woman or child saying the same vowel do so with very different voices. The auditory system solves the complex problem of extracting what the man, woman or child has said despite substantial differences in the acoustic properties of their voices. Much of the acoustic variation between the voices of men and women is due to changes in the underlying anatomical mechanisms for producing speech. If the auditory system knew the sex of the speaker then it could potentially correct for speaker sex related acoustic variation, thus facilitating vowel recognition. This study measured the minimum stimulus duration necessary to accurately discriminate whether a brief vowel segment was spoken by a man or woman, and the minimum stimulus duration necessary to accurately recognise what vowel was spoken. Results showed that reliable vowel recognition precedes reliable speaker sex discrimination, thus questioning the use of speaker sex information in compensating for speaker sex related acoustic variation in the voice. Furthermore, the pattern of performance across experiments, where the fundamental frequency and formant frequency information of speakers' voices were systematically varied, was markedly different depending on whether the task was speaker-sex discrimination or vowel recognition. This argues for there being little relationship between perception of speaker sex (indexical information) and perception of what has been said (linguistic information) at short durations.
Vongphoe, Michael; Zeng, Fan-Gang
Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional, and social status. Our purpose in this study is to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of the cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but only to one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
Roelofs, Ardi; Verhoef, Kim
Phonological encoding is the process by which speakers retrieve phonemic segments for morphemes from memory and use the segments to assemble phonological representations of words to be spoken. When conversing in one language, bilingual speakers have to resist the temptation of encoding word forms using the phonological rules and representations of…
Mason, John S. D.; Evans, Nicholas W. D.; Stapert, Robert; Auckenthaler, Roland
Text-independent speaker recognition systems such as those based on Gaussian mixture models (GMMs) do not include time sequence information (TSI) within the model itself. The level of importance of TSI in speaker recognition is an interesting question and one addressed in this paper. Recent work has shown that the utilisation of higher-level information such as idiolect, pronunciation, and prosodics can be useful in reducing speaker recognition error rates. In accordance with these developments, the aim of this paper is to show that as more data become available, the basic GMM can be enhanced by utilising TSI, even in a text-independent mode. This paper presents experimental work incorporating TSI into the conventional GMM. The resulting system, known as the segmental mixture model (SMM), embeds dynamic time warping (DTW) into a GMM framework. Results are presented on the 2000-speaker SpeechDat Welsh database which show improved speaker recognition performance with the SMM.
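The GMM baseline that the SMM extends can be made concrete with a toy scorer: each enrolled speaker is a mixture model, and an utterance is assigned to whichever model gives its frames the highest total log-likelihood, with no use of frame order (which is exactly the TSI the SMM reintroduces). A minimal sketch with invented single-component models and diagonal covariances; none of the names or values come from the paper:

```python
import numpy as np

def diag_gmm_loglik(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.
    frames: (T, D); weights: (K,); means, variances: (K, D)."""
    diff = frames[:, None, :] - means[None, :, :]                    # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)  # (K,)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None], axis=2)     # (T, K)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + log_exp
    # logsumexp over components, then sum per-frame scores over the utterance;
    # note the score is invariant to any reordering of the frames
    m = log_comp.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))))

# Toy "enrolled speakers": single-component models centred at 0 and at 5
rng = np.random.default_rng(0)
frames = rng.normal(0.0, 1.0, size=(50, 4))        # utterance near model A
w = np.array([1.0])
model_a = (w, np.zeros((1, 4)), np.ones((1, 4)))
model_b = (w, np.full((1, 4), 5.0), np.ones((1, 4)))
score_a = diag_gmm_loglik(frames, *model_a)
score_b = diag_gmm_loglik(frames, *model_b)
best = "A" if score_a > score_b else "B"
```

Because the frame sum ignores ordering, shuffling the utterance's frames leaves the score unchanged, which is why DTW-based alignment inside the SMM can add discriminative information.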
Chabal, Sarah; Marian, Viorica
Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct linguistic input, showing that language is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. PMID:26030171
Salvata, Caden; Blumstein, Sheila E; Myers, Emily B
The current study explored how listeners map the variable acoustic input onto a common sound structure representation while being able to retain phonetic detail to distinguish among the identity of talkers. An adaptation paradigm was utilized to examine areas which showed an equal neural response (equal release from adaptation) to phonetic change when spoken by the same speaker and when spoken by two different speakers, and insensitivity (failure to show release from adaptation) when the same phonetic input was spoken by a different speaker. Neural areas which showed speaker invariance were located in the anterior portion of the middle superior temporal gyrus bilaterally. These findings provide support for the view that speaker normalization processes allow for the translation of a variable speech input to a common abstract sound structure. That this process appears to occur early in the processing stream, recruiting temporal structures, suggests that this mapping takes place prelexically, before sound structure input is mapped on to lexical representations.
Keintz, Connie K.; Bunton, Kate; Hoit, Jeannette D.
Purpose: To examine the influence of visual information on speech intelligibility for a group of speakers with dysarthria associated with Parkinson's disease. Method: Eight speakers with Parkinson's disease and dysarthria were recorded while they read sentences. Speakers performed a concurrent manual task to facilitate typical speech production.…
Choi, Lee Jin
This qualitative study of English Korean bilinguals explores the ways in which they legitimize themselves as "good" bilinguals in relation to the discourse of native-speakerism. I first survey the essentialist discourse of native speakerism still prevalent in the field of English language teaching and learning despite the growing…
Chiang, Belinda; Pochtrager, Fran
This study investigated the ways in which Chinese-born speakers of English and American-born speakers of English differed or were similar in their responses to compliments on: (1) ability; (2) appearance; and (3) possessions. Subjects were 15 Chinese and 15 American individuals, controlled for gender and status. Subjects were asked to write their…
Kaland, Constantijn; Swerts, Marc; Krahmer, Emiel
The present research investigates what drives the prosodic marking of contrastive information. For example, a typically developing speaker of a Germanic language like Dutch generally refers to a pink car as a "PINK car" (accented words in capitals) when a previously mentioned car was red. The main question addressed in this paper is whether contrastive intonation is produced with respect to the speaker's or (also) the listener's perspective on the preceding discourse. Furthermore, this research investigates the production of contrastive intonation by typically developing speakers and speakers with autism. The latter group is investigated because people with autism are argued to have difficulties accounting for another person's mental state and exhibit difficulties in the production and perception of accentuation and pitch range. To this end, utterances with contrastive intonation are elicited from both groups and analyzed in terms of function and form of prosody using production and perception measures. Contrary to expectations, typically developing speakers and speakers with autism produce functionally similar contrastive intonation as both groups account for both their own and their listener's perspective. However, typically developing speakers use a larger pitch range and are perceived as speaking more dynamically than speakers with autism, suggesting differences in their use of prosodic form.
Foreign-accented speakers are generally regarded as less educated, less reliable and less interesting than native speakers and tend to be associated with cultural stereotypes of their country of origin. This discrimination against foreign accents has, however, been discussed mainly using accented English in English-speaking countries. This study…
This study examined the quality of three English vowels and their Korean counterpart vowels by measuring F1/F2 frequencies and investigating how the different vowel qualities influenced consonant voicing. Participants were six native speakers (NS) of English and six NS of Korean who were graduate students at a large U.S. university. F1/F2…
Velázquez, Isabel; Garrido, Marisol; Millán, Mónica
This article presents the results of an analysis of reported interlocutors in Spanish in a group of heritage speakers (HS), in three communities of the US Midwest. Participants were college-aged bilinguals developing their own personal and professional networks outside the direct influence of their parents. Responses are compared with those from…
de Bres, Julia
It has been claimed that the success of minority language policy initiatives may only be achievable if at least some degree of 'tolerability' of these initiatives is secured among majority language speakers. There has, however, been little consideration in the language planning literature of what practical approaches might be used to influence the…
Kwon, Heak-bong; Song, Young-jun; Chang, Un-dong; Ahn, Jae-hyeong
In this paper, we propose a system that detects the current speaker in multi-speaker videoconferencing by using lip motion. First, the system detects the face and lip region of each of the candidate speakers using face color and shape information. Then, to detect the current speaker, it calculates the change in the lip region between the current frame and the previous frame. To close up on the detected current speaker, we used two CCD cameras: one is a general CCD camera, the other a PTZ camera controlled through an RS-232C serial port. Experimental results show that the proposed system is capable of detecting the face of the current speaker in a video feed with more than three people, regardless of the orientation of the faces. With this system, it takes only 4 to 5 seconds to zoom in on the speaker from the initial reference image. It is also a more efficient image transmission system for applications such as video conferencing and internet broadcasting because it offers a close-up face image at a resolution of 320x240 while at the same time providing a whole background image.
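The frame-differencing step the abstract describes, picking the candidate whose lip region changed most between consecutive frames, can be sketched with synthetic image data. The ROI layout, threshold, and array shapes below are all illustrative, not taken from the paper:

```python
import numpy as np

def detect_current_speaker(prev_frame, curr_frame, lip_rois, threshold=0.0):
    """Return the index of the candidate whose lip region changed most
    between consecutive frames, or None if no change exceeds the threshold."""
    changes = []
    for top, bottom, left, right in lip_rois:
        roi_diff = np.abs(curr_frame[top:bottom, left:right].astype(float)
                          - prev_frame[top:bottom, left:right].astype(float))
        changes.append(roi_diff.mean())        # mean absolute pixel change
    idx = int(np.argmax(changes))
    return idx if changes[idx] > threshold else None

# Toy 100x100 grayscale frames with two candidates; only candidate 1 moves
prev = np.zeros((100, 100), dtype=np.uint8)
curr = prev.copy()
rois = [(10, 20, 10, 30), (10, 20, 60, 80)]    # (top, bottom, left, right)
curr[10:20, 60:80] = 200                       # simulated lip motion
speaker = detect_current_speaker(prev, curr, rois)
```

In a real system the winning index would then drive the PTZ camera toward that candidate's face; a threshold above zero guards against declaring a speaker when nobody's lips are moving.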
Bressmann, Tim; Radovanovic, Bojana; Harper, Susan; Klaiman, Paula; Fisher, David; Kulkarni, Gajanan V
Many speakers with cleft palate develop atypical consonant productions, especially for pressure consonants such as plosives, fricatives, and affricates. The present study investigated the nature of nasal sound errors. The participants were eight female and three male speakers with cleft palate between the ages of 6 and 20. Speakers were audio-recorded, and midsagittal tongue movement was captured with ultrasound. The speakers repeated vowel-consonant-vowel sequences with the vowels /α/, /i/, and /u/ and the alveolar and velar nasal consonants /n/ and /ŋ/. The productions were reviewed by three listeners. The participants showed a variety of different placement errors and insertions of plosives, as well as liquid productions. There was considerable error variability between and within speakers, often related to the different vowel contexts. Three speakers co-produced click sounds. The study demonstrated the wide variety of sound errors that some speakers with cleft palate may demonstrate for nasal sounds. Nasal sounds, ideally in different vowel contexts, should be included in articulation screenings for speakers with cleft palate, perhaps more than is currently the case.
Koenig, Laura L.; Mencl, W. Einar; Lucero, Jorge C.
This study investigates cross-speaker differences in the factors that predict voicing thresholds during abduction-adduction gestures in six normal women. Measures of baseline airflow, pulse amplitude, subglottal pressure, and fundamental frequency were made at voicing offset and onset during intervocalic /h/, produced in varying vowel environments and at different loudness levels, and subjected to relational analyses to determine which factors were most strongly related to the timing of voicing cessation or initiation. The data indicate that (a) all speakers showed differences between voicing offsets and onsets, but the degree of this effect varied across speakers; (b) loudness and vowel environment have speaker-specific effects on the likelihood of devoicing during /h/; and (c) baseline flow measures significantly predicted times of voicing offset and onset in all participants, but other variables contributing to voice timing differed across speakers. Overall, the results suggest that individual speakers have unique methods of achieving phonatory goals during running speech. These data contribute to the literature on individual differences in laryngeal function, and serve as a means of evaluating how well laryngeal models can reproduce the range of voicing behavior used by speakers during running speech tasks.
Michelas, Amandine; Portes, Cristel; Champagne-Lavau, Maud
Recent studies on a variety of languages have shown that a speaker's commitment to the propositional content of his or her utterance can be encoded, among other strategies, by pitch accent types. Since prior research mainly relied on lexical-stress languages, our understanding of how speakers of a non-lexical-stress language encode speaker commitment is limited. This paper explores the contribution of the last pitch accent of an intonation phrase to convey speaker commitment in French, a language that has stress at the phrasal level as well as a restricted set of pitch accents. In a production experiment, participants had to produce sentences in two pragmatic contexts: unbiased questions (the speaker had no particular belief with respect to the expected answer) and negatively biased questions (the speaker believed the proposition to be false). Results revealed that negatively biased questions consistently exhibited an additional unaccented F0 peak in the preaccentual syllable (an H+!H* pitch accent) while unbiased questions were often realized with a rising pattern across the accented syllable (an H* pitch accent). These results provide evidence that pitch accent types in French can signal the speaker's belief about the certainty of the proposition expressed in French. It also has implications for the phonological model of French intonation.
Holmes, Kevin J; Moty, Kelsey; Regier, Terry
The spatial relation of support has been regarded as universally privileged in nonlinguistic cognition and immune to the influence of language. English, but not Korean, obligatorily distinguishes support from nonsupport via basic spatial terms. Despite this linguistic difference, previous research suggests that English and Korean speakers show comparable nonlinguistic sensitivity to the support/nonsupport distinction. Here, using a paradigm previously found to elicit cross-language differences in color discrimination, we provide evidence for a difference in sensitivity to support/nonsupport between native English speakers and native Korean speakers who were late English learners and tested in a context that privileged Korean. Whereas the former group showed categorical perception (CP) when discriminating spatial scenes capturing the support/nonsupport distinction, the latter did not. An additional group of native Korean speakers (relatively early English learners tested in an English-salient context) patterned with the native English speakers in showing CP for support/nonsupport. These findings suggest that obligatory marking of support/nonsupport in one's native language can affect nonlinguistic sensitivity to this distinction, contra earlier findings, but that such sensitivity may also depend on aspects of language background and the immediate linguistic context.
Jiang, Xiaoming; Pell, Marc D
In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by
Thomson, Ron I.
One of the most influential models of second language (L2) speech perception and production [Flege, Speech Perception and Linguistic Experience (York, Baltimore, 1995) pp. 233-277] argues that during initial stages of L2 acquisition, perceptual categories sharing the same or nearly the same acoustic space as first language (L1) categories will be processed as members of that L1 category. Previous research has generally been limited to testing these claims on binary L2 contrasts, rather than larger portions of the perceptual space. This study examines the development of 10 English vowel categories by 20 Mandarin L1 learners of English. Imitation of English vowel stimuli by these learners, at 6 data collection points over the course of one year, were recorded. Using a statistical pattern recognition model, these productions were then assessed against native speaker norms. The degree to which the learners' perception/production shifted toward the target English vowels and the degree to which they matched L1 categories in ways predicted by theoretical models are discussed. The results of this experiment suggest that previous claims about perceptual assimilation of L2 categories to L1 categories may be too strong.
Garon, J E
Presentation skills are vital to clinical systems managers. This article covers four steps to successful presentations: 1) tailoring for an audience, 2) organizing a presentation, 3) mastering presentation techniques, and 4) creating effective visual aids. Tailoring for the audience entails learning about the audience and matching the presentation to their knowledge, educational level, and interests. Techniques to curry favor with an audience include: establishing common ground, relating through universal experiences, and pushing "hot buttons." Tasks involved in organizing the presentation for maximum audience interest begin with arranging the key points in a transparent organizational scheme. Audience attention is sustained using "hooks," such as graphics, anecdotes, humor, and quotations. Basic presentation techniques include appropriate rehearsal, effective eye contact with an audience, and anxiety-reducing strategies. Visual aids include flip charts, slides, transparencies, and computer presentations. Criteria for selecting the type of visual aids are delineated based on audience size and type of presentation, along with respective advantages and disadvantages. The golden rule for presentations is "Never show a slide for which you have to apologize." Rules to maximize visibility and effectiveness, including use of standard templates, sans serif fonts, dark backgrounds with light letters, mixed cases, and effective graphics, ensure that slides or projected computer images are clear and professional. Taken together, these strategies will enhance the delivery of the presentation and decrease the speaker's anxiety.
Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro
Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have necessarily been replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with a set of features derived from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on a mobile phone network database recorded under non-controlled acoustic conditions. PMID:26442245
Li, Dongdong; Yang, Yingchun
In the field of information security, voice is one of the most important parts of biometrics. Especially with the development of voice communication through the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, a voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper deals with this problem by introducing a cost-sensitive learning technology to reweight the probability of test affective utterances at the pitch envelope level, which can enhance robustness in emotion-dependent speaker recognition effectively. Based on that technology, a new architecture for the recognition system as well as its components is proposed in this paper. An experiment conducted on the Mandarin Affective Speech Corpus shows that an improvement of 8% in identification rate over traditional speaker recognition is achieved. PMID:24999492
Cuenca, María Heliodora; Barrio, Marina M; Anaya, Pablo; Establier, Carmelo
The purpose of this investigation is to explore the use of acoustic cues by excellent Spanish oesophageal speakers to mark syllabic stress. The speech material consisted of five pairs of disyllabic words which differed only in stress position. A total of 44 oesophageal and 9 laryngeal speakers were recorded, and a computerised perceptual test designed ad hoc was run in order to assess the accurate realisation of stress. The items produced by the eight excellent oesophageal speakers with the highest accuracy levels in the perception experiment were analysed acoustically with Praat, to be compared with the laryngeal control group. Measures of duration, fundamental frequency, spectral balance and overall intensity were taken for each target vowel and syllable. Results revealed that excellent Spanish oesophageal speakers were able to retain appropriate acoustic relations between stressed and unstressed syllables. Although spectral balance emerged as a strong cue for syllabic stress in both voicing modes, a different hierarchy of acoustic cues in each voicing mode was found.
Williams, W. J.; Bossemeyer, R. W.
Time-frequency analysis has previously been successfully applied to characterize and quantify a variety of acoustic signals, including marine mammal sounds. In this research, time-frequency analysis is applied to human speech signals in an effort to reveal signal structure salient to the biometric speaker verification challenge. Prior approaches to speaker verification have relied upon signal processing analysis such as linear prediction or weighted cepstrum spectral representations of segments of speech, and classification techniques based on stochastic pattern matching. The authors believe that the classification of the identity of a speaker based on time-frequency representations of short-time events occurring in speech could have substantial advantages. Using these ideas, a speaker verification algorithm was developed and has been refined over the past several years. In this presentation, the authors describe the testing of the algorithm using a large speech database, the results obtained, and recommendations for further improvements.
Caballero Morales, Santiago Omar; Cox, Stephen J.
Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.
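The confusion-matrix idea above can be illustrated as a toy noisy-channel decoder: given the recognizer's phone output, choose the word that maximizes a language-model prior times the probability of the observed phones under the speaker's confusion matrix. Every phone, matrix value, lexicon entry, and prior below is invented for illustration and is far simpler than the paper's metamodels or WFST cascade:

```python
import numpy as np

phones = ["p", "b", "t"]
idx = {p: i for i, p in enumerate(phones)}
# conf[i, j] = P(recognizer outputs phone j | speaker intended phone i);
# a dysarthric speaker's intended /b/ here often surfaces as /p/
conf = np.array([
    [0.6, 0.3, 0.1],   # intended /p/
    [0.2, 0.7, 0.1],   # intended /b/
    [0.1, 0.1, 0.8],   # intended /t/
])
lexicon = {"pat": ["p", "t"], "bat": ["b", "t"]}   # toy word -> phone sequence
prior = {"pat": 0.2, "bat": 0.8}                   # toy language-model prior

def word_likelihood(word, observed):
    """P(observed phones | intended word), assuming independent confusions."""
    seq = lexicon[word]
    if len(seq) != len(observed):
        return 0.0
    lik = 1.0
    for intended, obs in zip(seq, observed):
        lik *= conf[idx[intended], idx[obs]]
    return lik

observed = ["p", "t"]   # recognizer output for one utterance
# Noisy-channel decision: prior * likelihood. The confusion matrix lets the
# decoder recover "bat" even though the raw output matches "pat".
best = max(lexicon, key=lambda w: prior[w] * word_likelihood(w, observed))
```

Here P("pat") * lik = 0.2 * 0.48 = 0.096 while P("bat") * lik = 0.8 * 0.16 = 0.128, so the decoder corrects the phone-level error, which is the intuition behind modeling rather than adapting to the speaker's errors.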
Investigated how English and Japanese speakers syllabify two-syllable English words and nonwords with single intervocalic consonants. Results are discussed in light of linguistic and psycholinguistic theories of syllabification. (Author/VWL)
Sell, Gregory; Suied, Clara; Elhilali, Mounya; Shamma, Shihab
Listeners' ability to discriminate unfamiliar voices is often susceptible to the effects of manipulations of acoustic characteristics of the utterances. This vulnerability was quantified within a task in which participants determined if two utterances were spoken by the same or different speakers. Results of this task were analyzed in relation to a set of historical and novel parameters in order to hypothesize the role of those parameters in the decision process. Listener performance was first measured in a baseline task with unmodified stimuli, and then compared to responses with resynthesized stimuli under three conditions: (1) normalized mean-pitch; (2) normalized duration; and (3) normalized linear predictive coefficients (LPCs). The results of these experiments suggest that perceptual speaker discrimination is robust to acoustic changes, though mean-pitch and LPC modifications are more detrimental to a listener's ability to successfully identify same or different speaker pairings. However, this susceptibility was also found to be partially dependent on the specific speaker and utterances.
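The mean-pitch normalization condition can be approximated, in its simplest form, as shifting each F0 contour so its mean matches a common target while preserving the contour's shape. The actual study used resynthesis; this sketch captures only the arithmetic idea, with invented contour values:

```python
def normalize_mean_pitch(f0_contour, target_mean):
    """Shift an F0 contour (Hz) so its mean equals target_mean,
    preserving the contour's shape (relative F0 movements)."""
    shift = target_mean - sum(f0_contour) / len(f0_contour)
    return [f0 + shift for f0 in f0_contour]

contour = [180.0, 200.0, 220.0]                # mean 200 Hz
shifted = normalize_mean_pitch(contour, 150.0)  # now centred on 150 Hz
```

Because the shift is additive, the 20 Hz steps between successive points survive; only the speaker-identifying mean is removed, which is what makes the condition diagnostic of how much listeners rely on mean pitch.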
Hanulíková, Adriana; Carreiras, Manuel
An important property of speech is that it explicitly conveys features of a speaker's identity such as age or gender. This event-related potential (ERP) study examined the effects of social information provided by a speaker's gender, i.e., the conceptual representation of gender, on subject-verb agreement. Despite numerous studies on agreement, little is known about syntactic computations generated by speaker characteristics extracted from the acoustic signal. Slovak is well suited to investigate this issue because it is a morphologically rich language in which agreement involves features for number, case, and gender. Grammaticality of a sentence can be evaluated by checking a speaker's gender as conveyed by his/her voice. We examined how conceptual information about speaker gender, which is not syntactic but rather social and pragmatic in nature, is interpreted for the computation of agreement patterns. ERP responses to verbs disagreeing with the speaker's gender (e.g., a sentence including a masculine verbal inflection spoken by a female person 'the neighbors were upset because I (∗)stoleMASC plums') elicited a larger early posterior negativity compared to correct sentences. When the agreement was purely syntactic and did not depend on the speaker's gender, a disagreement between a formally marked subject and the verb inflection (e.g., the womanFEM (∗)stoleMASC plums) resulted in a larger P600 preceded by a larger anterior negativity compared to the control sentences. This result is in line with proposals according to which the recruitment of non-syntactic information such as the gender of the speaker results in N400-like effects, while formally marked syntactic features lead to structural integration as reflected in a LAN/P600 complex.
[Figure residue: spectrogram and cochleagram panels with multi-ratio training curves for networks of 512, 1024, and 1536 hidden units.] Adaptation of gender-pair-dependent DNNs and multi-ratio training are introduced later to relax constraints; a factorial hidden Markov model (FHMM) is used. Keywords: deep neural networks, speaker-dependent modeling, speaker adaptation, multi-condition training, factorial hidden Markov model. OSU Dept. of Computer Science and…
[Report-form residue.] Title: Speaker Clustering for a Mixture of Singing and Reading (Preprint); contract number FA8750-09… References include J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen, R. Schwartz, and A. Srivastava, "Speech and language technologies for audio indexing." Speaker clustering is based on reading and singing speech samples for each speaker. As a speaking style, singing introduces changes in the time-frequency structure of a…
Honorof, Douglas N.; Whalen, D. H.
Fundamental frequency (F0) is used for many purposes in speech, but its linguistic significance is based on its relation to the speaker's range, not its absolute value. While it may be that listeners can gauge a specific pitch relative to a speaker's range by recognizing it from experience, whether they can do the same for an unfamiliar voice is an open question. The present experiment explored that question. Twenty native speakers of English (10 male, 10 female) produced the vowel /ɑ/ with a spoken (not sung) voice quality at varying pitches within their own ranges. Listeners then judged, without familiarization or context, where each isolated F0 lay within each speaker's range. Correlations were high both for the entire range (0.721) and for the range minus the extremes (0.609). Correlations were somewhat higher when the F0s were related to the range of all the speakers, either separated by sex (0.830) or pooled (0.848), but several factors discussed here may help account for this pattern. Regardless, the present data provide strong support for the hypothesis that listeners are able to locate an F0 reliably within a range without external context or prior exposure to a speaker's voice.
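The judgment listeners made, locating an F0 within a speaker's range, corresponds to a simple normalized position: the same relative position maps to very different absolute frequencies for different voices. A sketch with invented range values (the study did not publish these particular figures):

```python
def relative_position(f0, f0_min, f0_max):
    """Location of an F0 within a speaker's range, on a 0-1 scale."""
    return (f0 - f0_min) / (f0_max - f0_min)

# The same relative position corresponds to different absolute F0s:
male = relative_position(150.0, 100.0, 200.0)      # mid-range for this voice
female = relative_position(275.0, 200.0, 350.0)    # mid-range for this voice
```

A listener who reports both tokens as "mid-range" despite the 125 Hz absolute difference is doing exactly the normalization the experiment probes.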
Holt, Yolanda Feimster; Jacewicz, Ewa; Fox, Robert Allen
Purpose Atypical duration of speech segments can signal a speech disorder. This study examined variation in vowel duration in African American English (AAE) relative to White American English (WAE) speakers living in the same dialect region in the South in order to characterize the nature of systematic variation between the two groups. The goal was to establish whether segmental durations in minority populations differ from the well-established patterns in mainstream populations. Method Participants were 32 AAE and 32 WAE speakers differing in age who, in their childhood, attended either segregated (older speakers) or integrated (younger speakers) public schools. Speech materials consisted of 14 vowels produced in hVd-frame. Results AAE vowels were significantly longer than WAE vowels. Vowel duration did not differ as a function of age. The temporal tense-lax contrast was minimized for AAE relative to WAE. Female vowels were significantly longer than male vowels for both AAE and WAE. Conclusions African Americans should be expected to produce longer vowels relative to White speakers in a common geographic area. These longer durations are not deviant but represent a typical feature of AAE. This finding has clinical importance in guiding assessments of speech disorders in AAE speakers. PMID:25951511
A complex tone composed of only higher-order harmonics typically elicits a pitch percept equivalent to the tone's missing fundamental frequency (f0). When judging the direction of residue pitch change between two such tones, however, listeners may have completely opposite perceptual experiences depending on whether they are biased to perceive changes based on the overall spectrum or the missing f0 (harmonic spacing). Individual differences in residue pitch change judgments are reliable and have been associated with musical experience and functional neuroanatomy. Tone languages put greater pitch processing demands on their speakers than non-tone languages, and we investigated whether these lifelong differences in linguistic pitch processing affect listeners' bias for residue pitch. We asked native tone language speakers and native English speakers to perform a pitch judgment task for two tones with missing fundamental frequencies. Given tone pairs with ambiguous pitch changes, listeners were asked to judge the direction of pitch change, where the direction of their response indicated whether they attended to the overall spectrum (exhibiting a spectral bias) or the missing f0 (exhibiting a fundamental bias). We found that tone language speakers are significantly more likely to perceive pitch changes based on the missing f0 than English speakers. These results suggest that tone-language speakers' privileged experience with linguistic pitch fundamentally tunes their basic auditory processing.
Monetta, Laura; Cheang, Henry S; Pell, Marc D
The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease (PD), with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than healthy control (HC) participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).
Sobel, David M; Macris, Deanna M
Many studies suggest that preschoolers rely on individuals' histories of generating accurate lexical information when learning novel lexical information from them. The present study examined whether children used a speaker's accuracy about one kind of linguistic knowledge to make inferences about another kind of linguistic knowledge, focusing specifically on syntax and the lexicon. In Experiment 1, we presented children with 2 live speakers who were lexically accurate, but differed in their appropriate use of subject-verb agreement. Older 4-year-olds, but not younger 4-year-olds, relied on the accurate speaker to learn new labels for novel objects. We suggest only the older 4-year-olds might have registered that the inaccurate speaker was an unreliable source of novel linguistic information. In Experiment 2, 4-year-olds observed 2 speakers whose utterances were syntactically accurate, but differed in lexical accuracy. All 4-year-olds used the speakers' accuracy to guide how they learned novel lexical information and novel irregular plurals, but not how they learned novel irregular past tense forms that children often regularize. These results suggest that when learning novel syntactic information, reliability information might interact with underlying linguistic regularity.
Lee, Jimin; Shaiman, Susan; Weismer, Gary
This study examined the relationship (1) between acoustic vowel space and the corresponding tongue kinematic vowel space and (2) between formant frequencies (F1 and F2) and tongue x-y coordinates for the same time sampling point. Thirteen healthy female adults participated in this study. Electromagnetic articulography and synchronized acoustic recordings were utilized to obtain vowel acoustic and tongue kinematic data across ten speech tasks. Intra-speaker analyses showed that for 10 of the 13 speakers the acoustic vowel space was moderately to highly correlated with tongue kinematic vowel space; much weaker correlations were obtained for inter-speaker analyses. Correlations of individual formants with tongue positions showed that F1 varied strongly with tongue position variations in the y dimension, whereas F2 was correlated in equal magnitude with variations in the x and y positions. For within-speaker analyses, the size of the acoustic vowel space is likely to provide a reasonable inference of size of the tongue working space for most speakers; unfortunately there is no a priori, obvious way to identify the speakers for whom the covariation is not significant. A second conclusion is that F1 variations reflect tongue height, but F2 is a much more complex reflection of tongue variation in both dimensions.
Boumil, Marcia M; Cutrell, Emily S; Lowney, Kathleen E; Berman, Harris A
Pharmaceutical companies routinely engage physicians, particularly those with prestigious academic credentials, to deliver "educational" talks to groups of physicians in the community to help market the company's brand-name drugs. Although presented as educational, and even though they provide educational content, these events are intended to influence decisions about drug selection in ways that are not based on the suitability and effectiveness of the product, but on the prestige and persuasiveness of the speaker. A number of state legislatures and most academic medical centers have attempted to restrict physician participation in pharmaceutical marketing activities, though most restrictions are not absolute and have proven difficult to enforce. This article reviews the literature on why Speakers' Bureaus have become a lightning rod for academic/industry conflicts of interest and examines the arguments of those who defend physician participation. It considers whether the restrictions on Speakers' Bureaus are consistent with principles of academic freedom and concludes with the legal and institutional efforts to manage industry speaking.
Gao, Yuan; Low, Renae; Jin, Putai; Sweller, John
Using a cognitive load theory approach, we investigated the effects of speaker variability when individuals are learning to understand English as a foreign language (EFL) spoken by foreign-accented speakers. The use of multiple, Indian-accented speakers was compared to that of a single speaker for Chinese EFL learners with a higher or lower…
Herman, Julia; Cote, Nicole Gilbert; Reilly, Lenore; Binder, Katherine S.
The goal of this study was to compare the literacy skills of adult native English and native Spanish ABE speakers. Participants were 169 native English speakers and 124 native Spanish speakers recruited from five prior research projects. The results showed that the native Spanish speakers were less skilled on morphology and passage comprehension…
Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina
Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on
McDonald, Malcolm W.
The National Aeronautics and Space Administration (NASA) and, in particular, the Marshall Space Flight Center (MSFC) have played pivotal roles in the advancement of space exploration and space-related science and discovery since the early 1960s. Many of the extraordinary accomplishments and advancements of NASA and MSFC have gone largely unheralded to the general public, though they often border on the miraculous. This lack of suitable and deserved announcement of these "miracles" seems to have occurred because NASA engineers and scientists are inclined to regard extraordinary accomplishment as a normal course of events. The goal in this project has been to determine an effective structure and mechanism for communicating to the general public the extent to which our investment in our US civilian space program, NASA, is, in fact, a very wise investment. The project has involved discerning important messages of truth which beg to be conveyed to the public. It also sought to identify MSFC personnel who are particularly effective as messengers or communicators. A third aspect of the project was to identify particular target audiences who would appreciate knowing the facts about their NASA investment. The intent is to incorporate the results into the formation of an effective, proactive MSFC speakers bureau. A corollary accomplishment for the summer was participation in the formation of an educational outreach program known as NASA Ambassadors. NASA Ambassadors are chosen from the participants in the various MSFC summer programs including: Summer Faculty Fellowship Program (SFFP), Science Teacher Enrichment Program (STEP), Community College Enrichment Program (CCEP), Joint Venture (JOVE) program, and the NASA Academy program. NASA Ambassadors agree to make pre-packaged NASA-related presentations to non-academic audiences in their home communities. The packaged presentations were created by a small cadre of participants from the 1996 MSFC summer programs, volunteering
to represent this speaker/phoneme combination. For the individual speaker experiments, we chose model order for each speaker/phoneme combination in...separately trained a model on each speaker/phoneme combination. In phoneme-class (PC) classifier training, we grouped all speakers of a given phoneme into a...single phoneme. When the subspace is limited, CSIS may be able to find a better statistical model of the distribution. A second piece of evidence that
So, Wing Chee; Kita, Sotaro; Goldin-Meadow, Susan
Previous research has found that iconic gestures (i.e., gestures that depict the actions, motions or shapes of entities) identify referents that are also lexically specified in the co-occurring speech produced by proficient speakers. This study examines whether concrete deictic gestures (i.e., gestures that point to physical entities) bear a different kind of relation to speech, and whether this relation is influenced by the language proficiency of the speakers. Two groups of speakers who had different levels of English proficiency were asked to retell a story in English. Their speech and gestures were transcribed and coded. Our findings showed that proficient speakers produced concrete deictic gestures for referents that were not specified in speech, and iconic gestures for referents that were specified in speech, suggesting that these two types of gestures bear different kinds of semantic relations with speech. In contrast, less proficient speakers produced concrete deictic gestures and iconic gestures whether or not referents were lexically specified in speech. Thus, both type of gesture and proficiency of speaker need to be considered when accounting for how gesture and speech are used in a narrative context. PMID:23337950
McDevitt, Teresa M.; Carroll, Marcalee
Speakers' ages and intentions were examined as influences on children's evaluations of orally presented messages. A total of 112 third-grade children listened to one of four speakers, two women and two 9-year-old girls, presenting the same essays on videotape. Half of the children assigned to each of the four speakers were told that the speaker…
Kamourieh, Salwa; Braga, Rodrigo M.; Leech, Robert; Newbould, Rexford D.; Malhotra, Paresh; Wise, Richard J. S.
Remembering what a speaker said depends on attention. During conversational speech, the emphasis is on working memory, but listening to a lecture encourages episodic memory encoding. With simultaneous interference from background speech, the need for auditory vigilance increases. We recreated these context-dependent demands on auditory attention in 2 ways. The first was to require participants to attend to one speaker in either the absence or presence of a distracting background speaker. The second was to alter the task demand, requiring either an immediate or delayed recall of the content of the attended speech. Across 2 fMRI studies, common activated regions associated with segregating attended from unattended speech were the right anterior insula and adjacent frontal operculum (aI/FOp), the left planum temporale, and the precuneus. In contrast, activity in a ventral right frontoparietal system was dependent on both the task demand and the presence of a competing speaker. Additional multivariate analyses identified other domain-general frontoparietal systems, where activity increased during attentive listening but was modulated little by the need for speech stream segregation in the presence of 2 speakers. These results make predictions about impairments in attentive listening in different communicative contexts following focal or diffuse brain pathology. PMID:25596592
Walton, Julie Hart
Sustained /a/ sounds were tape recorded from 50 adult male African-American and 50 adult male European-American speakers. A one-second acoustic sample was extracted from the mid-portion of each sustained vowel. Vowel samples from each African-American subject were randomly paired with those from European-American subjects. A one-second inter-stimulus interval of silence separated the two voices in the pair; the order of the voices in each pair was randomly selected. When presented with a tape of the 50 voice pairs, listeners could determine the race of the speaker with 60% accuracy. An acoustic analysis of the voices revealed that African-American speakers had a tendency toward greater frequency perturbation, significantly greater amplitude perturbation, and a significantly lower harmonics-to-noise ratio than the European-American speakers. An analysis of the listeners' responses revealed that the listeners may have relied on a combination of increased frequency perturbation, increased amplitude perturbation, and a lower harmonics-to-noise ratio to identify the African-American speakers.
Brunskog, Jonas; Gade, Anders Christian; Bellester, Gaspar Payá; Calbo, Lilian Reig
Teachers often suffer from health problems related to their voice. These problems are related to their working environment, including the acoustics of the lecture rooms. However, there is a lack of studies linking the room acoustic parameters to the voice produced by the speaker. In this pilot study, the main goals are to investigate whether objectively measurable parameters of the rooms can be related to an increase in the voice sound power produced by speakers and to the speakers' subjective judgments about the rooms. In six different rooms with different sizes, reverberation times, and other physical attributes, the sound power level produced by six speakers was measured. Objective room acoustic parameters were measured in the same rooms, including reverberation time and room gain, and questionnaires were handed out to people who had experience talking in the rooms. It is found that in different rooms significant changes in the sound power produced by the speaker can be found. It is also found that these changes mainly have to do with the size of the room and to the gain produced by the room. To describe this quality, a new room acoustic quantity called "room gain" is proposed.
Whitehill, Tara L
Most theoretical models of dysarthria have been developed based on research using individuals speaking English or other Indo-European languages. Studies of individuals with dysarthria speaking other languages can allow investigation into the universality of such models, and the interplay between language-specific and language-universal aspects of dysarthria. In this article, studies of Cantonese- and Mandarin-Chinese speakers with dysarthria are reviewed. The studies focused on 2 groups of speakers: those with cerebral palsy and those with Parkinson's disease. Key findings are compared with similar studies of English speakers. Since Chinese is tonal in nature, the impact of dysarthria on lexical tone has received considerable attention in the literature. The relationship between tone [which involves fundamental frequency (F(0)) control at the syllable level] and intonation (involving F(0) control at the sentential level) has received more recent attention. Many findings for Chinese speakers with dysarthria support earlier findings for English speakers, thus affirming the language-universal aspect of dysarthria. However, certain differences, which can be attributed to the distinct phonologies of Cantonese and Mandarin, highlight the language-specific aspects of the condition.
Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and the non-stationary behavior of the overall background noise. Many single-channel noise reduction algorithms exist but are limited in that the more the noise is reduced, the more the signal of interest is distorted, because the signal and noise overlap in frequency. Specifically, acoustic background noise causes problems in the area of speaker identification. Recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single-channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see if spatial information provided by microphone arrays could be exploited to aid in speaker identification. The goals are: (1) test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) characterize and compare different multichannel noise reduction algorithms; (3) provide recommendations for using these multichannel algorithms; and (4) ultimately answer the question: can the use of microphone arrays aid in speaker identification?
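The simplest baseline among the multichannel algorithms such a project would compare is delay-and-sum beamforming: time-align each microphone toward the target, then average, so the coherent target adds up while spatially incoherent noise partially cancels. A minimal sketch with synthetic signals and invented per-microphone delays (not the project's algorithms or data):

```python
import numpy as np

# Delay-and-sum beamforming on synthetic data: undo each microphone's
# propagation delay, then average across the array. Averaging n_mics
# independent noise channels reduces incoherent noise power by ~n_mics.

def delay_and_sum(channels, delays_samples):
    """channels: (n_mics, n_samples) array; delays_samples: integer delay per mic."""
    n_mics, n = channels.shape
    out = np.zeros(n)
    for ch, d in zip(channels, delays_samples):
        out += np.roll(ch, -d)  # time-align this mic toward the target
    return out / n_mics

rng = np.random.default_rng(0)
target = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 8000))
delays = [0, 3, 6, 9]  # hypothetical per-mic arrival delays, in samples
mics = np.stack([np.roll(target, d) + 0.5 * rng.standard_normal(8000)
                 for d in delays])
enhanced = delay_and_sum(mics, delays)

noise_in = np.var(mics[0] - target)     # noise power at a single mic
noise_out = np.var(enhanced - target)   # residual noise after beamforming
print(noise_out < noise_in)
```

Real arrays need fractional delays estimated from geometry or cross-correlation; this integer-delay version only shows the spatial-averaging principle.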
Ladich, Friedrich; Wysocki, Lidia Eva
The auditory evoked potential (AEP) recording technique has proved to be a very versatile and successful approach in studying auditory sensitivities in fishes. The AEP protocol introduced by Kenyon, Ladich and Yan in 1998, using an air speaker with the fish positioned at the water surface, gave auditory thresholds in goldfish very close to behavioural values published before. This approach was subsequently modified by several laboratories, raising the question whether speaker choice (air vs. underwater) or the position of subjects affects auditory threshold determination. To answer these questions, the hearing specialist Carassius auratus was measured using an air speaker, an underwater speaker and alternately positioning the fish directly at or 5 cm below the water surface. Mean hearing thresholds obtained using these four different setups varied by 5.6 dB, 3.7 dB and 4 dB at 200 Hz, 500 Hz and 1000 Hz, respectively. Accordingly, pronounced differences in AEP thresholds in goldfish measured in different laboratories reflect factors other than the speaker used and the depth of the test subjects, namely variations in threshold definition, background noise, population differences, or calibration errors.
Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich
The paper describes an experiment with using Gaussian mixture models (GMM) for automatic classification of speaker age and gender. It analyses and compares the influence of different numbers of mixtures and different types of speech features used for GMM gender/age classification. The dependence of the computational complexity on the number of mixtures used is also analysed. Finally, the GMM classification accuracy is compared with the output of conventional listening tests. The results of these objective and subjective evaluations are in agreement.
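The likelihood comparison underlying GMM classification can be illustrated in miniature. The sketch below is not the authors' system: it fits a single Gaussian per class (a degenerate one-component GMM) to an invented scalar feature, mean F0 per speaker, and classifies by whichever class gives the higher log-likelihood:

```python
import math

# Degenerate GMM classifier: one Gaussian per class over one invented
# scalar feature (mean F0 in Hz). The paper's system fits multi-component
# GMMs over spectral features; only the Bayes-decision step is shown here.

def fit_gaussian(values):
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

def log_likelihood(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Invented training data: one mean-F0 value per speaker.
male_model = fit_gaussian([105.0, 118.0, 126.0, 110.0])
female_model = fit_gaussian([198.0, 210.0, 224.0, 205.0])

def classify(f0):
    lm = log_likelihood(f0, *male_model)
    lf = log_likelihood(f0, *female_model)
    return "male" if lm > lf else "female"

print(classify(120.0), classify(215.0))  # male female
```

A full GMM replaces each single Gaussian with a weighted sum of components fitted by expectation-maximization, but the decision rule is the same likelihood comparison.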
This paper presents and evaluates a modular/hybrid connectionist system for speaker identification. Modularity has emerged as a powerful technique for reducing the complexity of connectionist systems, and allowing a priori knowledge to be incorporated into their design. Text-independent speaker identification is an inherently complex task where the amount of training data is often limited. It thus provides an ideal domain to test the validity of the modular/hybrid connectionist approach. To achieve such identification, we develop, in this paper, an architecture based upon the cooperation of several connectionist modules, and a Hidden Markov Model module. When tested on a population of 102 speakers extracted from the DARPA-TIMIT database, perfect identification was obtained.
Mayer, Connor; Gick, Bryan
This study looks at how the conflicting goals of chewing and speech production are reconciled by examining the acoustic and articulatory output of talking while chewing. We consider chewing to be a type of perturbation with regard to speech production, but with some important differences. Ultrasound and acoustic measurements were made while participants chewed gum and produced various utterances containing the sounds /s/, /ʃ/, and /r/. Results show a great deal of individual variation in articulation and acoustics between speakers, but consistent productions and maintenance of relative acoustic distances within speakers. Although chewing interfered with speech production, and this interference manifested itself in a variety of ways across speakers, the objectives of speech production were indirectly achieved within the constraints and variability introduced by individual chewing strategies.
Robinson, E J; Champion, H; Mitchell, P
Children between the ages of 3 years 7 months and 6 years 5 months experienced a contradiction between what they knew or guessed to be inside a box and what they were told by an adult. The authors investigated whether children believed what they were told by asking them to make a final judgment about the box's content. Children tended to believe utterances from speakers who were better informed than they themselves were and to disbelieve those from less well-informed speakers, with no age-related differences. This behavior implies an understanding of the speaker's knowledge and suggests that children can learn from oral input while being appropriately skeptical of its truth. Children also gave explicit knowledge judgments on trials on which no utterances were given. Performance on knowledge trials was less accurate than, and unrelated to, performance on utterance trials. Research on children's developing explicit theory of mind needs to be broadened to include behavioral indexes of understanding the mind.
Pruitt, John S; Jenkins, James J; Strange, Winifred
Perception of second language speech sounds is influenced by one's first language. For example, speakers of American English have difficulty perceiving dental versus retroflex stop consonants in Hindi although English has both dental and retroflex allophones of alveolar stops. Japanese, unlike English, has a contrast similar to Hindi, specifically, the Japanese /d/ versus the flapped /r/ which is sometimes produced as a retroflex. This study compared American and Japanese speakers' identification of the Hindi contrast in CV syllable contexts where C varied in voicing and aspiration. The study then evaluated the participants' increase in identifying the distinction after training with a computer-interactive program. Training sessions progressively increased in difficulty by decreasing the extent of vowel truncation in stimuli and by adding new speakers. Although all participants improved significantly, Japanese participants were more accurate than Americans in distinguishing the contrast on pretest, during training, and on posttest. Transfer was observed to three new consonantal contexts, a new vowel context, and a new speaker's productions. Some abstract aspect of the contrast was apparently learned during training. It is suggested that allophonic experience with dental and retroflex stops may be detrimental to perception of the new contrast.
Nappa, Rebecca; Arnold, Jennifer E
A series of experiments explore the effects of attention-directing cues on pronoun resolution, contrasting four specific hypotheses about the interpretation of ambiguous pronouns he and she: (1) it is driven by grammatical rules, (2) it is primarily a function of social processing of the speaker's intention to communicate, (3) it is modulated by the listener's own egocentric attention, and (4) it is primarily a function of learned probabilistic cues. Experiment 1 demonstrates that pronoun interpretation is guided by the well-known N1 (first-mention) bias, which is also modulated by both the speaker's gaze and pointing gestures. Experiment 2 demonstrates that a low-level visual capture cue has no effect on pronoun interpretation, in contrast with the social cue of pointing. Experiment 3 uses a novel intentional cue: the same attention-capture flash as in Experiment 2, but with instructions that the cue is intentionally created by the speaker. This cue does modulate the N1 bias, demonstrating the importance of information about the speaker's intentions to pronoun resolution. Taken in sum, these findings demonstrate that pronoun resolution is a process best categorized as driven by an appreciation of the speaker's communicative intent, which may be subserved by a sensitivity to predictive cues in the environment.
Zhang, Cuiling; van de Weijer, Joost; Cui, Jingxu
Speech variation of speakers is a crucial issue for speaker recognition and identification, especially in forensic practice. Greater intra-speaker variation is one main reason for incorrect speaker identification in real forensic situations. Understanding the stability of acoustic parameters and their variation in speech is therefore significant for the evaluation of effective parameters for speaker identification. In this paper, all vowels in Standard Chinese, including five monophthongs, eight diphthongs and four triphthongs, were combined with lateral /l/. In all, 15 lateral syllables with different tones for 10 speakers were selected and acoustically analyzed. Central frequencies of the first four formants for each syllable were measured for quantitative comparison of intra- and inter-speaker variation, in order to provide a general idea of speaker variation in Standard Chinese and ultimately to serve forensic applications. Results show that the overall intra-speaker variation is less than the inter-speaker variation to a great extent under laboratory conditions, though occasionally the reverse holds. This supports the basis for forensic speaker identification, namely that intra-speaker variation should be less than inter-speaker variation in many acoustic features, and further validates the feasibility and reliability of forensic speaker identification.
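The premise that intra-speaker variation stays below inter-speaker variation can be made concrete with a simple comparison. The sketch below uses invented F1 measurements (several repetitions of one syllable per speaker), not the paper's data:

```python
import statistics

# Compare intra- vs inter-speaker variation in a formant value.
# F1 values (Hz) below are invented: repeated tokens of one syllable
# for three hypothetical speakers.
f1_by_speaker = {
    "spk1": [310, 318, 305, 322],
    "spk2": [355, 348, 361, 352],
    "spk3": [402, 395, 410, 399],
}

# Intra-speaker variation: mean within-speaker standard deviation.
intra = statistics.mean(statistics.stdev(v) for v in f1_by_speaker.values())

# Inter-speaker variation: standard deviation of the speaker means.
inter = statistics.stdev(statistics.mean(v) for v in f1_by_speaker.values())

print(intra < inter)  # the premise forensic comparison relies on
```

When this inequality fails for a parameter, that parameter is a poor candidate for forensic speaker identification.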
Mayer, Richard E.; Sobko, Kristina; Mautone, Patricia D.
In 2 experiments, learners who were seated at a computer workstation received a narrated animation about lightning formation. Then, they took a retention test, a transfer test, and rated the speaker. The results are consistent with social agency theory, which posits that social cues in multimedia messages can encourage learners to interpret…
This study investigated the effects of first language prosodic transfer on the perception and production of English lexical stress and the relation between stress perception and production by second language learners. To test the effect of Thai tonal distribution rules and stress patterns on native Thai speakers' perception and production of…
Harris, Catherine L.
Bilingual speakers report experiencing stronger emotions when speaking and hearing their first language compared to their second. Does this occur even when a second language is learned early and becomes the dominant language? Spanish-English bilinguals who had grown up in the USA (early learners) or those who were first exposed to English during…
Chen, Yang; Ng, Manwa L; Li, Tie-Shan
The present study tested the postulate that sounds of a foreign language that are familiar are produced less accurately than sounds that are new to second language (L2) learners. The first two formant frequencies (F1 and F2) were obtained from the 11 English monophthong vowels produced by 40 Cantonese-English (CE) bilingual and 40 native American English (AE) monolingual speakers. Based on F1 and F2, compact-diffuse (C-D) and grave-acute (G-A) values and the Euclidean Distance (ED) associated with the English vowels were evaluated and correlated with the perceived amount of accent in the vowels. Results indicated that both male and female CE speakers exhibited different vowel spaces from their AE counterparts. While C-D and G-A values indicated that acquisition of familiar and new vowels did not differ substantially, ED values suggested better performance in CE speakers' productions of familiar vowels than of new vowels. In conclusion, analyses based on spectral measurements of the English vowels produced by CE speakers did not provide favourable evidence for the Speech Learning Model (SLM) proposed by Flege (1995). Nevertheless, for both familiar and new sounds, English back vowels were produced less accurately than English front vowels.
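The Euclidean Distance (ED) measure mentioned in this abstract is simply the straight-line distance between two vowel tokens in the F1-F2 plane. A minimal sketch follows, with hypothetical formant values for the vowel /i/; the paper's actual measurements are not reproduced here.

```python
import math

def euclidean_distance(v1, v2):
    """Distance between two vowel tokens given as (F1, F2) pairs in Hz."""
    (f1a, f2a), (f1b, f2b) = v1, v2
    return math.hypot(f1a - f1b, f2a - f2b)

# Hypothetical mean (F1, F2) in Hz for /i/ produced by a CE speaker
# and by a native AE reference speaker:
ce_i = (320, 2750)
ae_i = (300, 2900)
print(round(euclidean_distance(ce_i, ae_i), 1))
```

On this measure, a smaller distance from the native reference is read as a more accurate production.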
Raman, Ilhan; Weekes, Brendan S.
The Turkish script is characterised by completely transparent bidirectional mappings between orthography and phonology. To date, there has been no reported evidence of acquired dyslexia in Turkish speakers leading to the naive view that reading and writing problems in Turkish are probably rare. We examined the extent to which phonological…
Boroditsky, Lera; Fuhrman, Orly; McCormick, Kelly
Time is a fundamental domain of experience. In this paper we ask whether aspects of language and culture affect how people think about this domain. Specifically, we consider whether English and Mandarin speakers think about time differently. We review all of the available evidence both for and against this hypothesis, and report new data that…
Craith, M. Nic
Examines the Irish language community in Northern Ireland, and questions the validity of the census results of 1991. Particular focus is on the concept of a mother tongue and its relevance for speakers of Irish in the United Kingdom. Discusses measures to improve the status of Irish as a result of the Good Friday Agreement. (Author/VWL)
Beasley, Christopher; Torres-Harding, Susan; Pedersen, Paula J.
Recent societal trends indicate more tolerance for homosexuality, but prejudice remains on college campuses. Speaker panels are commonly used in classrooms as a way to educate students about sexual diversity and decrease negative attitudes toward sexual diversity. The advent of computer-delivered instruction presents a unique opportunity to…
This article addresses teachers who have been paired with native speakers (NSs) who have never taught before, and the frustration, discouragement, and nervousness that teachers can feel as a result. In order to tackle this situation effectively, teachers need to work together with the NSs. Teachers in this scenario…
Brown, Amanda; Gullberg, Marianne
Native speakers show systematic variation in a range of linguistic domains as a function of a variety of sociolinguistic variables. This article addresses native language variation in the context of multicompetence, i.e. knowledge of two languages in one mind (Cook, 1991). Descriptions of motion were elicited from functionally monolingual and…
Mol, Lisette; Krahmer, Emiel; van de Sandt-Koenderman, Mieke
Purpose: To study the independence of gesture and verbal language production. The authors assessed whether gesture can be semantically compensatory in cases of verbal language impairment and whether speakers with aphasia and control participants use similar depiction techniques in gesture. Method: The informativeness of gesture was assessed in 3…
Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from "distance"-based to likelihood-based methods, (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward the purpose of increasing robustness of recognition, including many other additional important techniques not noted above.
The idea of a language-specific articulatory setting (AS), an underlying posture of the articulators during speech, has existed for centuries [Laver, Historiogr. Ling. 5 (1978)], but until recently it had eluded direct measurement. In an analysis of x-ray movies of French and English monolingual speakers, Gick et al. [Phonetica (in press)] link AS to inter-speech posture, allowing measurement of AS without interference from segmental targets during speech, and they give quantitative evidence showing AS to be language-specific. In the present study, ultrasound and Optotrak are used to investigate whether bilingual English-French speakers have two ASs, and whether this varies depending on the mode (monolingual or bilingual) these speakers are in. Specifically, for inter-speech posture of the lips, lip aperture and protrusion are measured using Optotrak. For inter-speech posture of the tongue, tongue root retraction, tongue body and tongue tip height are measured using optically-corrected ultrasound. Segmental context is balanced across the two languages ensuring that the sets of sounds before and after an inter-speech posture are consistent across languages. By testing bilingual speakers, vocal tract morphology across languages is controlled for. Results have implications for L2 acquisition, specifically the teaching and acquisition of pronunciation.
Robinett, Betty Wallace; And Others
The contents of this series (a compilation of papers read at the Teachers of English to Speakers of Other Languages (TESOL) Conference, New York City, March 17-19, 1966) are grouped according to general subject (and authors)--(1) TESOL as a Professional Field (S. Ohannessian, A.H. Marckwardt, G. Capelle, D. Glicksberg), (2) Reports on Special…
English speakers talk and think about Time in terms of physical space. The past is behind us, and the future is in front of us. In this way, we "map" space onto Time. This dissertation addresses the specificity of this physical space, or its topography. Inspired by languages like Yupno (Nunez, et al., 2012) and Bamileke-Dschang (Hyman,…
In response to new theoretical claims and inconclusive empirical findings regarding relative clauses in East Asian languages, this study examined the factors relevant to relative clause production by Korean heritage speakers. Gap position (subject vs. object), animacy (plus or minus animate), and the topicality of head nouns (plus or minus…
Bolger, Patrick A.; Zapata, Gabriela C.
This paper focuses on the dearth of language-processing research addressing Spanish heritage speakers in assimilationist communities. First, we review key offline work on this population, and we then summarize the few psycholinguistic (online) studies that have already been carried out. In an attempt to encourage more such research, in the next…
Wiener, Florence D.; And Others
Normative data for 198 urban children (four to eight years old) who speak Black American English (BAE) were obtained on the Test of Language Development. Results revealed speakers of BAE differed significantly in performance from children on whom test was standardized. Difference in performance was reflected in overall test scores and in…
This study examined the perception of English palatal codas by Korean speakers of English to determine whether perception problems are the source of production problems. In particular, the study first looked at a possible first-language effect on the perception of English palatal codas. Second, a possible perceptual source of vowel epenthesis after English palatal codas was investigated. In addition, individual factors, such as length of residence, TOEFL score, gender, and academic status, were compared to determine whether they affected the varying degrees of perception accuracy. Eleven adult Korean speakers of English as well as three native speakers of English participated in the study. Three sets of a perception test, including identification of minimally different English pseudo- or real words, were carried out. The results showed that, first, the Korean speakers perceived the English codas significantly worse than the Americans did. Second, the study supported the idea that Koreans perceived an extra /i/ after the final affricates because of final release. Finally, none of the individual factors explained the varying degrees of perceptual accuracy. In particular, TOEFL scores and perception test scores showed no statistically significant association.
Foreign language learners purportedly demonstrate intercultural communicative competence in native speaker (NS) chat rooms through self-initiated negotiation sequences, including those triggered by pragmatic issues and cultural content. This study identified and classified one-to-one NS-learner negotiations between intermediate learners and NS of…
Stephens, Elizabeth; Suarez, Sarah; Koenig, Melissa
Testimony provides children with a rich source of knowledge about the world and the people in it. However, testimony is not guaranteed to be veridical, and speakers vary greatly in both knowledge and intent. In this chapter, we argue that children encounter two primary types of conflicts when learning from speakers: conflicts of knowledge and conflicts of interest. We review recent research on children's selective trust in testimony and propose two distinct mechanisms supporting early epistemic vigilance in response to the conflicts associated with speakers. The first section of the chapter focuses on the mechanism of coherence checking, which occurs during the process of message comprehension and facilitates children's comparison of information communicated through testimony to their prior knowledge, alerting them to inaccurate, inconsistent, irrational, and implausible messages. The second section focuses on source-monitoring processes. When children lack relevant prior knowledge with which to evaluate testimonial messages, they monitor speakers themselves for evidence of competence and morality, attending to cues such as confidence, consensus, access to information, prosocial and antisocial behavior, and group membership.
The study investigates strategies and contexts for supporting the literacy development of young, augmented speakers, whose difficulties in literacy learning are not explained by their levels of cognition alone. Indeed, quantitative and qualitative differences exist in their literacy experiences at home and school. In this study, four primary…
Kessler, Luise; Schweinberger, Stefan R.
A speaker’s gaze behaviour can provide perceivers with a multitude of cues which are relevant for communication, thus constituting an important non-verbal interaction channel. The present study investigated whether direct eye gaze of a speaker affects the likelihood of listeners believing truth-ambiguous statements. Participants were presented with videos in which a speaker produced such statements with either direct or averted gaze. The statements were selected through a rating study to ensure that participants were unlikely to know a priori whether they were true or not (e.g., “sniffer dogs cannot smell the difference between identical twins”). Participants indicated in a forced-choice task whether or not they believed each statement. We found that participants were more likely to believe statements by a speaker looking at them directly, compared to a speaker with averted gaze. Moreover, when participants disagreed with a statement, they were slower to do so when the statement was uttered with direct (compared to averted) gaze, suggesting that the process of rejecting a statement as untrue may be inhibited when that statement is accompanied by direct gaze.
Glisan, Eileen W.; Drescher, Victor
A study examined the occurrence of specific grammatical structures (double object pronouns, nominalization with "lo," demonstrative adjectives/pronouns, and possessive adjectives/pronouns) in oral samples of native speaker Spanish and compared the results with the treatment of the structures in six beginning-level college Spanish textbooks.…
Abuom, Tom O.; Shah, Emmah; Bastiaanse, Roelien
For this study, sentence comprehension was tested in Swahili-English bilingual agrammatic speakers. The sentences were controlled for four factors: (1) order of the arguments (base vs. derived); (2) embedding (declarative vs. relative sentences); (3) overt use of the relative pronoun "who"; (4) language (English and Swahili). Two…
This handbook is designed for native English speakers who are preparing to teach English in China. The contents of the handbook are selected based on the findings of face-to-face interviews and a questionnaire survey conducted by the author with experienced native English teachers to China as the partial fulfillment of her Master's in TESOL…
A study examined the use of the ethnic language as it relates to ethnic self-identification in three generations of a bilingual family of Mexican origin in San Antonio (Texas). Family members were speakers of Texas Spanish and English. Two questionnaires and follow-up discussions examined fluency in Spanish and English; language preferences;…
Ludi, Georges; Py, Bernard
The bi/plurilingual person is a unique speaker-hearer who should be studied as such, not always in comparison with the monolingual. Consequently, unilingual linguistic models and perspectives based on the idea that bilingualism is a duplication of competences in two languages (or more) are unsuitable for describing plural practices in multilingual…
24. AIR-CONDITIONING DUCT, WINCH CONTROL BOX, AND SPEAKER AT STATION 85.5 OF MST. FOLDED-UP PLATFORM ON RIGHT OF PHOTO. - Vandenberg Air Force Base, Space Launch Complex 3, Launch Pad 3 East, Napa & Alden Roads, Lompoc, Santa Barbara County, CA
Montrul, Silvina; Sanchez-Walker, Noelia
We report the results of two studies that investigate the factors contributing to non-native-like ability in child and adult heritage speakers by focusing on oral production of Differential Object Marking (DOM), the overt morphological marking of animate direct objects in Spanish. In study 1, 39 school-age bilingual children (ages 6-17) from the…
Aparicio, Frances R.
Problems are discussed and recommendations made for developing programs and materials for Spanish instruction of Spanish speakers. General and specific suggestions are made for material selection and class activities, including spelling, composition, self-editing, dialog writing, transcription, translation, student-teacher conferences, journals,…
Blanco, George M.
This guide provides Texas teachers and administrators with guidelines, goals, instructional strategies, and activities for teaching Spanish to secondary level native speakers. It is based on the principle that the Spanish speaking student is the strongest linguistic and cultural resource to Texas teachers of languages other than English, and one…
Jaber, Maysa; Hussein, Riyad F.
This study is aimed at investigating the rating and intelligibility of different non-native varieties of English, namely French English, Japanese English and Jordanian English by native English speakers and their attitudes towards these foreign accents. To achieve the goals of this study, the researchers used a web-based questionnaire which…
Many studies have shown that verb inflections are difficult to produce for agrammatic aphasic speakers: they are frequently omitted and substituted. The present article gives an overview of our efforts to understand why this is the case. The hypothesis is that grammatical morphology referring to the past is selectively impaired in agrammatic…
Sobel, David M.; Macris, Deanna M.
Many studies suggest that preschoolers rely on individuals' histories of generating accurate lexical information when learning novel lexical information from them. The present study examined whether children used a speaker's accuracy about one kind of linguistic knowledge to make inferences about another kind of linguistic knowledge, focusing…
Pascual Cabo, Diego
This study contributes to current trends of heritage speaker (HS) acquisition research by examining the syntax of psych-predicates in HS Spanish. Broadly defined, psych-predicates communicate states of emotions (e.g., to love) and have traditionally been categorized as belonging to one of three classes: class I--"temere" "to…
Kohn, Mary Elizabeth; Farrington, Charlie
Speaker vowel formant normalization, a technique that controls for variation introduced by physical differences between speakers, is necessary in variationist studies to compare speakers of different ages, genders, and physiological makeup in order to understand non-physiological variation patterns within populations. Many algorithms have been established to reduce variation introduced into vocalic data from physiological sources. The lack of real-time studies tracking the effectiveness of these normalization algorithms from childhood through adolescence inhibits exploration of child participation in vowel shifts. This analysis compares normalization techniques applied to data collected from ten African American children across five time points. Linear regressions compare the reduction in variation attributable to age and gender for each speaker for the vowels BEET, BAT, BOT, BUT, and BOAR. A normalization technique is successful if it maintains variation attributable to a reference sociolinguistic variable, while reducing variation attributable to age. Results indicate that normalization techniques which rely on both a measure of central tendency and range of the vowel space perform best at reducing variation attributable to age, although some variation attributable to age persists after normalization for some sections of the vowel space.
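One widely used vowel-extrinsic technique of the kind this abstract describes (relying on a speaker's central tendency and range) is Lobanov z-score normalization, in which each formant is centred on the speaker's mean and scaled by the speaker's standard deviation. The following is a minimal sketch with hypothetical formant values; the study itself compares several such algorithms, not necessarily this one.

```python
import numpy as np

def lobanov_normalize(f1, f2):
    """Lobanov (z-score) normalization of one speaker's vowel formants.
    Centring on the speaker's mean and scaling by the speaker's standard
    deviation removes physiological differences in vocal-tract size while
    preserving the relative positions of vowels in the speaker's space."""
    f1, f2 = np.asarray(f1, float), np.asarray(f2, float)
    z1 = (f1 - f1.mean()) / f1.std()
    z2 = (f2 - f2.mean()) / f2.std()
    return z1, z2

# Hypothetical mean (F1, F2) values in Hz for the study's five vowel
# classes (BEET, BAT, BOT, BUT, BOAR) from one child speaker:
f1 = [350, 800, 750, 650, 480]
f2 = [2900, 2100, 1300, 1500, 1100]
z1, z2 = lobanov_normalize(f1, f2)
```

After normalization every speaker's formants have mean 0 and unit standard deviation, so residual differences between recordings of the same child at different ages would indicate genuine (e.g., sociolinguistic) change rather than vocal-tract growth.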
Paul, Rhea; Bianchi, Nancy; Augustyn, Amy; Klin, Ami; Volkmar, Fred R.
This paper reports a study of the ability to reproduce stress in a nonsense syllable imitation task by adolescent speakers with autism spectrum disorders (ASD), as compared to typically developing (TD) age-mates. Results are reported for both raters' judgments of the subjects' stress production, as well as acoustic measures of pitch range and…
Explores the potential of the use of voice-recognition technology with second-language speakers of English. Involves the analysis of the output produced by a small group of very competent second-language subjects reading a text into the voice recognition software Dragon Systems "Dragon NaturallySpeaking." (Author/VWL)
Writing teachers need to recognize the special circumstances of culturally displaced students. A specific category of such students are those from the Asian subcontinent, who are not exactly non-native speakers of English, but who do speak non-standard American English. These students occupy a subaltern (marginal) position: they can neither be…
Senay, Ibrahim; Keysar, Boaz
A long and narrow piece of wood is "a bat," "a stick," "a club," or "firewood." In fact, anything can be described from multiple perspectives, each suggesting a different conceptualization. People keep track of how speakers conceptualize things and expect them to describe them similarly in the future. This article demonstrates that these…
Salmelin, R; Schnitzler, A; Schmitz, F; Freund, H J
Ten fluent speakers and nine developmental stutterers read isolated nouns aloud in a delayed reading paradigm. Cortical activation sequences were mapped with a whole-head magnetoencephalography system. The stutterers were mostly fluent in this task. Although the overt performance was essentially identical in the two groups, the cortical activation patterns showed clear differences, both in the evoked responses, time-locked to word presentation and mouth movement onset, and in task-related suppression of 20-Hz oscillations. Within the first 400 ms after seeing the word, processing in fluent speakers advanced from the left inferior frontal cortex (articulatory programming) to the left lateral central sulcus and dorsal premotor cortex (motor preparation). This sequence was reversed in the stutterers, who showed an early left motor cortex activation followed by a delayed left inferior frontal signal. Stutterers thus appeared to initiate motor programmes before preparation of the articulatory code. During speech production, the right motor/premotor cortex generated consistent evoked activation in fluent speakers but was silent in stutterers. On the other hand, suppression of motor cortical 20-Hz rhythm, reflecting task-related neuronal processing, occurred bilaterally in both groups. Moreover, the suppression was right-hemisphere dominant in stutterers, as opposed to left-hemisphere dominant in fluent speakers. Accordingly, the right frontal cortex of stutterers was highly active during speech production but did not generate synchronous time-locked responses. The speech-related 20-Hz suppression concentrated in the mouth area in fluent speakers, but was evident in both the hand and mouth areas in stutterers. These findings may reflect imprecise functional connectivity within the right frontal cortex and incomplete segregation between the adjacent hand and mouth motor representations in stutterers during speech production. A network including the left inferior frontal
This study investigated the acoustic basis of across-utterance, within-speaker variation in speech naturalness for four speakers with dysarthria secondary to Parkinson's disease (PD). Speakers read sentences and produced spontaneous speech. Acoustic measures of fundamental frequency, phrase-final syllable lengthening, intensity and speech rate were obtained. A group of listeners judged speech naturalness using a nine-point Likert scale. Relationships between judgements of speech naturalness and acoustic measures were determined for individual speakers with PD. Relationships among acoustic measures also were quantified. Despite variability between speakers, measures of mean F0, intensity range, articulation rate, average syllable duration, duration of final syllables, vocalic nucleus length of final unstressed syllables and pitch accent of final syllables emerged as possible acoustic variables contributing to within-speaker variations in speech naturalness. Results suggest that acoustic measures correlate with speech naturalness, but in dysarthric speech they depend on the speaker due to the within-speaker variation in speech impairment.
Leemann, Adrian; Kolly, Marie-José; Dellwo, Volker
Everyday experience tells us that it is often possible to identify a familiar speaker solely by his/her voice. Such observations reveal that speakers carry individual features in their voices. The present study examines how suprasegmental temporal features contribute to speaker-individuality. Based on data of a homogeneous group of Zurich German speakers, we conducted an experiment that included speaking style variability (spontaneous vs. read speech) and channel variability (high-quality vs. mobile phone-transmitted speech), both of which are characteristic of forensic casework. Speakers demonstrated high between-speaker variability in both read and spontaneous speech, and low within-speaker variability across the two speaking styles. Results further revealed that distortions of the type introduced by mobile telephony had little effect on suprasegmental temporal characteristics. Given this evidence of speaker-individuality, we discuss suprasegmental temporal features' potential for forensic voice comparison.
This paper explores the apparent contradiction between the valuing and promoting of diverse literacies in most UK HEIs, and the discursive construction of spoken native-speaker English as the medium of good grades and prestige academic knowledge. During group interviews on their experiences of university internationalisation, 38 undergraduate…
Stricker, L. J.
The purpose of this study was to replicate previous research on the construct validity of the paper-based version of the TOEFL and extend it to the computer-based TOEFL. Two samples of Graduate Record Examination (GRE) General Test-takers were used: native speakers of English specially recruited to take the computer-based TOEFL, and ESL…
The German /r/ sound is one of the most difficult sounds for American English (AE) speakers who are learning German as a foreign language to produce. The standard German /r/ variant [/R/] and dialectal variant [R] are achieved by varying the tongue constriction degree, while keeping the place of articulation constant [Schiller and Mooshammer…
De Cat, Cecile; Klepousniotou, Ekaterini; Baayen, R. Harald
The processing of English noun-noun compounds (NNCs) was investigated to identify the extent and nature of differences between the performance of native speakers of English and advanced Spanish and German non-native speakers of English. The study sought to establish whether the word order of the equivalent structure in the non-native speakers' mother tongue (L1) had an influence on their processing of NNCs in their second language (L2), and whether this influence was due to differences in grammatical representation (i.e., incomplete acquisition of the relevant structure) or processing effects. Two mask-primed lexical decision experiments were conducted in which compounds were presented with their constituent nouns in licit vs. reversed order. The first experiment used a speeded lexical decision task with reaction time registration, and the second a delayed lexical decision task with EEG registration. There were no significant group differences in accuracy in the licit word order condition, suggesting that the grammatical representation had been fully acquired by the non-native speakers. However, the Spanish speakers made slightly more errors with the reversed order and had longer response times, suggesting an L1 interference effect (as the reverse order matches the licit word order in Spanish). The EEG data, analyzed with generalized additive mixed models, further supported this hypothesis. The EEG waveform of the non-native speakers was characterized by a slightly later onset N400 in the violation condition (reversed constituent order). Compound frequency predicted the amplitude of the EEG signal for the licit word order for native speakers, but for the reversed constituent order for Spanish speakers—the licit order in their L1—supporting the hypothesis that Spanish speakers are affected by interferences from their L1. The pattern of results for the German speakers in the violation condition suggested a strong conflict arising due to licit constituents being
Bellandese, Mary H.
The purpose of this study was to determine if there was a relationship between fundamental frequency (Fo) and gender identification in standard esophageal (ES) or tracheoesophageal (TE) speakers. Twenty-three male and 20 female ES and TE speakers participated in this study. Recordings of these speakers reading the Rainbow Passage were played to 48…
Gorman, Kristen S.; Gegg-Harrison, Whitney; Marsh, Chelsea R.; Tanenhaus, Michael K.
When referring to named objects, speakers can choose either a name ("mbira") or a description ("that gourd-like instrument with metal strips"); whether the name provides useful information depends on whether the speaker's knowledge of the name is shared with the addressee. But, how do speakers determine what is shared? In 2…
Gilbert, Harvey R.; Ferrand, Carole T.
Respirometric quotients (RQ), the ratio of oral air volume expended to total volume expended, were obtained from the productions of oral and nasal airflow of 10 speakers with cleft palate, with and without their prosthetic appliances, and 10 normal speakers. Cleft palate speakers without their appliances exhibited the lowest RQ values. (Author/DB)
Montrul, Silvina; Davidson, Justin; De La Fuente, Israel; Foote, Rebecca
We examined how age of acquisition in Spanish heritage speakers and L2 learners interacts with implicitness vs. explicitness of tasks in gender processing of canonical and non-canonical ending nouns. Twenty-three Spanish native speakers, 29 heritage speakers, and 33 proficiency-matched L2 learners completed three on-line spoken word recognition…
Anderson-Hsieh, Janet; Koehler, Kenneth
A study investigated the effect of foreign accent and speaking rate on native English speaker comprehension. Three native Chinese speakers and one native speaker of American English read passages at different speaking rates. Comprehension scores showed that an increase in speaking rate and heavily accented English decreased listener comprehension.…
Doerr, Neriko Musha; Kumagai, Yuri
"Heritage language speaker" is a relatively new term to denote minority language speakers who grew up in a household where the language was used or those who have a family, ancestral, or racial connection to the minority language. In research on heritage language speakers, overlap between these 2 definitions is often assumed--that is,…
Andreou, Georgia; Karapetsas, Anargyros; Galantomos, Ioannis
This study investigated the performance of native and non-native speakers of the Modern Greek language on morphology and syntax tasks. Non-native speakers of Greek whose native language was English, a language with strict word order and simple morphology, made more errors and answered more slowly than native speakers on morphology but not…
McGarr, Nancy S.; Raphael, Lawrence J.; Kolia, Betty; Vorperian, Houri K.; Harris, Katherine
Using electropalatography, this study investigated the production of sibilants by four adults who have severe-to-profound hearing loss and four speakers with normal hearing. Each speaker wore a Rion[R] semi-flexible electropalate while producing multiple repetitions of the utterances "see, sue, she, shoe." The speakers' productions were…
Goregliad Fjaellingsdal, Tatiana; Ruigendijk, Esther; Scherbaum, Stefan; Bleichner, Martin G.
Language occurs naturally in conversations. However, the study of the neural underpinnings of language has mainly taken place in single individuals using controlled language material. The interactive elements of a conversation (e.g., turn-taking) are often not part of neurolinguistic setups. The prime reason is the difficulty of combining open, unrestricted conversations with the requirements of neuroimaging. It is necessary to find a trade-off between the naturalness of a conversation and the restrictions imposed by neuroscientific methods to allow for ecologically more valid studies. Here, we attempt to study the effects of a conversational element, namely turn-taking, on linguistic neural correlates, specifically the N400 effect. We focus on the physiological aspect of turn-taking, the speaker-switch, and its effect on the detectability of the N400 effect. The N400 event-related potential reflects expectation violations in a semantic context; the N400 effect describes the difference in N400 amplitude between semantically expected and unexpected items. Sentences with semantically congruent and incongruent final words were presented in two turn-taking modes: (1) reading aloud the first part of the sentence and listening to a speaker-switch for the final word, and (2) listening to the first part of the sentence and to a speaker-switch for the final word. A significant N400 effect was found for both turn-taking modes and was not influenced by the mode itself. However, the mode significantly affected the P200, which was increased for the reading-aloud mode compared to the listening mode. Our results show that an N400 effect can be detected during a speaker-switch. Speech articulation (reading aloud) before the analyzed sentence fragment also did not impede detection of the N400 effect for the final word. The speaker-switch, however, seems to influence earlier components of the electroencephalogram, related to the processing of salient stimuli. We conclude that the N400 can
Sjerps, Matthias J; Mitterer, Holger; McQueen, James M
This study used an active multiple-deviant oddball design to investigate the time-course of normalization processes that help listeners deal with between-speaker variability. Electroencephalograms were recorded while Dutch listeners heard sequences of non-words (standards and occasional deviants). Deviants were [ipapu] or [ɛpapu], and the standard was [I/ɛ-papu], where [I/ɛ] was a vowel that was ambiguous between [ɛ] and [i]. These sequences were presented in two conditions, which differed with respect to the vocal-tract characteristics (i.e., the average first formant frequency, F1) of the [papu] part, but not of the initial vowels [i], [ɛ] or [I/ɛ] (these vowels were thus identical across conditions). Listeners more often detected a shift from [I/ɛ-papu] to [ɛpapu] than from [I/ɛ-papu] to [ipapu] in the high-F1 context condition; the reverse was true in the low-F1 context condition. This shows that listeners' perception of vowels differs depending on the speaker's vocal-tract characteristics, as revealed in the speech surrounding those vowels. Cortical electrophysiological responses reflected this normalization process as early as about 120 ms after vowel onset, which suggests that shifts in perception precede influences due to conscious biases or decision strategies. Listeners' abilities to normalize for speaker vocal-tract properties are to an important extent the result of a process that influences representations of speech sounds early in the speech-processing stream.
Perusco, Andrew; Poder, Natasha; Mohsin, Mohammed; Rikard-Bell, Glenys; Rissel, Chris; Williams, Mandy; Hua, Myna; Millen, Elizabeth; Sabry, Marial; Guirguis, Sanaa
Tobacco control is a health promotion priority, but there is limited evidence on the effectiveness of campaigns targeting culturally and linguistically diverse (CALD) populations. Being the largest population of non-English-speaking smokers residing in New South Wales (NSW), Australia, Arabic-speakers are a priority population for tobacco control. We report findings from baseline and post-intervention cross-sectional telephone surveys evaluating a comprehensive social marketing campaign (SMC) specifically targeting Arabic-speakers residing in south west Sydney, NSW. The project was associated with a decline in self-reported smoking prevalence from 26% at baseline to 20.7% at post (p < 0.05) and an increase in self-reported smoke-free households from 67.1% at baseline to 74.9% at post (p < 0.05). This paper contributes evidence that comprehensive SMCs targeting CALD populations can reduce smoking prevalence and influence smoking norms in CALD populations.
Joshi, Priyanka D; Wakslak, Cheryl J
Audience characteristics often shape communicators' message framing. Drawing from construal level theory, we suggest that when speaking to many individuals, communicators frame messages in terms of superordinate characteristics that focus attention on the essence of the message. On the other hand, when communicating with a single individual, communicators increasingly describe events and actions in terms of their concrete details. Using different communication tasks and measures of construal, we show that speakers communicating with many individuals, compared with 1 person, describe events more abstractly (Study 1), describe themselves as more trait-like (Study 2), and use more desirability-related persuasive messages (Study 3). Furthermore, speakers' motivation to communicate with their audience moderates their tendency to frame messages based on audience size (Studies 3 and 4). This audience-size abstraction effect is eliminated when a large audience is described as homogeneous, suggesting that people use abstract construal strategically in order to connect across a disparate group of individuals (Study 5). Finally, we show that participants' experienced fluency in communication is influenced by the match between message abstraction and audience size (Study 6).
Badalamenti, A; Langs, R
This paper has presented a quantitative/stochastic research approach to aspects of a psychotherapy with an HIV-positive patient. The Box-Jenkins (ARIMA) method for time-series analysis was applied to the duration of speaker roles for a single male patient in his first three sessions with a male therapist. The results indicate a definitive underlying stochastic structure of order (1, 1, 1). This indicates that the sequence of speaker durations was nonstationary (chaotic), while the rate at which the sequence changed (velocity) showed a clear structure reflecting the influence on the system of the prior state, and of the prior and present shocks. The histograms of the frequency of utterances of varying time lengths revealed a second deep structure through the emergence of an exponential characteristic of a Poisson model. The Poisson rate constants--a measure of interruption and holding tendencies--were stable across interviews. Intervention analysis identified epochs in each interview that significantly altered the progression of the time series. A qualitative investigation of these periods of destabilization suggested that sudden allusions to highly charged topics, including the likelihood of a fatal outcome of the patient's illness, and especially unconscious criticisms of the therapist by the patient, created a large share of these interludes. Finally, there is a discussion of the limitations and implications of these findings, including a comparison with similar analyses for 6 psychotherapy consultation sessions.
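The exponential (Poisson) characteristic that the abstract above reports for the utterance-duration histograms can be sketched as follows. This is only an illustration of the fitting idea, not the authors' analysis: for exponentially distributed durations the maximum-likelihood rate constant is the reciprocal of the mean duration, and expected histogram counts follow from the exponential CDF. The duration values below are invented for demonstration.

```python
import math

def poisson_rate(durations):
    """MLE of the rate constant for exponentially distributed
    durations: lambda = n / sum(durations) = 1 / mean duration."""
    return len(durations) / sum(durations)

def exponential_fit(durations, bin_edges):
    """Expected bin counts under the fitted exponential model,
    for comparison with an observed duration histogram."""
    lam = poisson_rate(durations)
    n = len(durations)
    return [n * (math.exp(-lam * lo) - math.exp(-lam * hi))
            for lo, hi in zip(bin_edges, bin_edges[1:])]

# Illustrative made-up speaker-role durations in seconds (hypothetical data).
durations = [1.2, 0.4, 2.5, 0.8, 1.6, 0.3, 3.1, 0.9]
lam = poisson_rate(durations)
expected = exponential_fit(durations, [0.0, 1.0, 2.0, 4.0])
```

A stable rate constant across interviews, as the study reports, would correspond to `lam` changing little from session to session.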
Finlayson, Ian R; Corley, Martin
Disfluency is a characteristic feature of spontaneous human speech, commonly seen as a consequence of problems with production. However, the question remains open as to why speakers are disfluent: Is it a mechanical by-product of planning difficulty, or do speakers use disfluency in dialogue to manage listeners' expectations? To address this question, we present two experiments investigating the production of disfluency in monologue and dialogue situations. Dialogue affected the linguistic choices made by participants, who aligned on referring expressions by choosing less frequent names for ambiguous images where those names had previously been mentioned. However, participants were no more disfluent in dialogue than in monologue situations, and the distribution of types of disfluency used remained constant. Our evidence rules out at least a straightforward interpretation of the view that disfluencies are an intentional signal in dialogue.
Mori, Hiroki; Ohshima, Koh
A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former are represented by vectors with psychologically defined abstract dimensions, and the latter are coded by the Facial Action Coding System. To obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained on the data. The effectiveness of the proposed method was verified by a subjective evaluation test: the Mean Opinion Score for the suitability of the generated facial expressions was 3.86 for the speaker, close to that of hand-made facial expressions.
Ge, Zhenhao; Sharma, Sudhendu R.; Smith, Mark J. T.
Various algorithms for text-independent speaker recognition have been developed over the decades, aiming to improve both accuracy and efficiency. This paper presents a novel PCA/LDA-based approach that is faster than traditional statistical model-based methods and achieves competitive results. First, the performance based on only PCA and only LDA is measured; then a mixed model, taking advantage of both methods, is introduced. A subset of the TIMIT corpus, composed of 200 male speakers, is used for enrollment, validation and testing. The best results achieve 100%, 96% and 95% classification rates at population levels of 50, 100 and 200, using 39-dimensional MFCC features with delta and double delta. These results are based on 12-second text-independent speech for training and 4-second data for testing. They are comparable to conventional MFCC-GMM methods, but require significantly less time to train and operate.
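The PCA stage of such an approach can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the LDA stage and MFCC extraction are omitted, and the function names, subspace dimension, and nearest-centroid decision rule are assumptions made for the sketch.

```python
import numpy as np

def fit_pca(features, n_components):
    """PCA via SVD of the mean-centered feature matrix (rows = frames)."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]          # mean vector and projection basis

def project(x, mean, basis):
    """Project frames into the PCA subspace."""
    return (x - mean) @ basis.T

def enroll(frames_per_speaker, mean, basis):
    """One centroid per enrolled speaker in the PCA subspace."""
    return {spk: project(f, mean, basis).mean(axis=0)
            for spk, f in frames_per_speaker.items()}

def identify(test_frames, centroids, mean, basis):
    """Nearest-centroid decision on the averaged test projection."""
    v = project(test_frames, mean, basis).mean(axis=0)
    return min(centroids, key=lambda s: np.linalg.norm(centroids[s] - v))
```

In practice each row would be a 39-dimensional MFCC+delta+double-delta frame; here any feature matrix of that shape works.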
Weaver, H.J.; Burdick, R.B.
When dynamically testing delicate laser components (e.g., an elliptical glass laser disc), it is often impossible to provide a direct-contact excitation source such as an impact hammer or shaker because of the delicate and/or brittle nature of the material from which the components are constructed. The alternative approach often used in a test of this type is to excite the component with an acoustic speaker. In this paper we describe a small series of tests in which we compare the modal parameters obtained by using a speaker as an excitation source with those obtained on the same object when the excitation was provided by a shaker.
…contains ten examples of each of the spoken digits ("zero" through "nine") for eight different speakers: four male and four female. The speech recordings…there were no overlapping windows. Once the feature vector was determined, the features were level normalized. This was achieved by subtracting each
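The fragment above describes level normalization by subtraction. The record is truncated, but the standard form of this step is per-feature mean subtraction across frames; a minimal pure-Python sketch under that assumption:

```python
def level_normalize(feature_vectors):
    """Level-normalize a list of feature vectors (one per frame) by
    subtracting each feature dimension's mean across all frames,
    leaving every dimension zero-mean."""
    n = len(feature_vectors)
    dims = len(feature_vectors[0])
    means = [sum(v[d] for v in feature_vectors) / n for d in range(dims)]
    return [[v[d] - means[d] for d in range(dims)] for v in feature_vectors]
```

Whether the original report subtracted means, minima, or another level estimate cannot be recovered from the fragment; mean subtraction is the most common choice.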
Allen, Virginia French, Ed.
The contents of this volume, a compilation of papers read at the first conference of TESOL (Teachers of English to Speakers of Other Languages), are grouped according to general subject and authors: (1) TESOL as a Professional Field--A.H. Marckwardt, F.J. Colligan, W.F. Marquardt; (2) Reports on Special Programs--J.E. Officer, R.B. Long, M.C.…
Zhang, Caicai; Peng, Gang; Shao, Jing; Wang, William S-Y
Congenital amusia is a lifelong neurodevelopmental disorder of fine-grained pitch processing. In this fMRI study, we examined the neural bases of congenital amusia in speakers of a tonal language - Cantonese. Previous studies on non-tonal language speakers suggest that the neural deficits of congenital amusia lie in the music-selective neural circuitry in the right inferior frontal gyrus (IFG). However, it is unclear whether this finding generalizes to congenital amusics in tonal languages. Tonal language experience has been reported to shape the neural processing of pitch, which raises the question of how tonal language experience affects the neural bases of congenital amusia. To investigate this question, we examined the neural circuitries sub-serving the processing of relative pitch interval in pitch-matched Cantonese level tone and musical stimuli in 11 Cantonese-speaking amusics and 11 musically intact controls. Cantonese-speaking amusics exhibited abnormal brain activities in a widely distributed neural network during the processing of lexical tone and musical stimuli. Whereas the controls exhibited significant activation in the right superior temporal gyrus (STG) in the lexical tone condition and in the cerebellum regardless of the lexical tone and music conditions, no activation was found in the amusics in those regions, which likely reflects a dysfunctional neural mechanism of relative pitch processing in the amusics. Furthermore, the amusics showed abnormally strong activation of the right middle frontal gyrus and precuneus when the pitch stimuli were repeated, which presumably reflects deficits in attending to repeated pitch stimuli or encoding them into working memory. No significant group difference was found in the right IFG in either the whole-brain analysis or region-of-interest analysis. These findings imply that the neural deficits in tonal language speakers might differ from those in non-tonal language speakers, and overlap partly with the
Kreidler, Carol J., Ed.
The papers in this volume, read at the second national TESOL (Teaching English to Speakers of Other Languages) conference, are grouped by general subject as follows: (1) TESOL as a Professional Field--C.H. Prator, J.M. Cowan, T.W. Russell, J.E. Alatis; (2) Reports on Special Programs--H. Thompson, A.D. Nance, D. Pantell, P. Rojas, R.F. Robinett,…
Ponari, Marta; Rodríguez-Cuadrado, Sara; Vinson, David; Fox, Neil; Costa, Albert; Vigliocco, Gabriella
Effects of emotion on word processing are well established in monolingual speakers. However, studies that have assessed whether affective features of words undergo the same processing in a native and nonnative language have provided mixed results: Studies that have found differences between native language (L1) and second language (L2) processing attributed the difference to the fact that L2 learned late in life would not be processed affectively, because affective associations are established during childhood. Other studies suggest that adult learners show similar effects of emotional features in L1 and L2. Differences in affective processing of L2 words can be linked to age and context of learning, proficiency, language dominance, and degree of similarity between L2 and L1. Here, in a lexical decision task on tightly matched negative, positive, and neutral words, highly proficient English speakers from typologically different L1s showed the same facilitation in processing emotionally valenced words as native English speakers, regardless of their L1, the age of English acquisition, or the frequency and context of English use.
Druks, Judit; Weekes, Brendan Stuart
The convergence hypothesis [Green, D. W. (2003). The neural basis of the lexicon and the grammar in L2 acquisition: The convergence hypothesis. In R. van Hout, A. Hulk, F. Kuiken, & R. Towell (Eds.), The interface between syntax and the lexicon in second language acquisition (pp. 197-218). Amsterdam: John Benjamins] assumes that the neural substrates of language representations are shared between the languages of a bilingual speaker. One prediction of this hypothesis is that neurodegenerative disease should produce parallel deterioration to lexical and grammatical processing in bilingual aphasia. We tested this prediction with a late bilingual Hungarian (first language, L1)-English (second language, L2) speaker J.B. who had nonfluent progressive aphasia (NFPA). J.B. had acquired L2 in adolescence but was premorbidly proficient and used English as his dominant language throughout adult life. Our investigations showed comparable deterioration to lexical and grammatical knowledge in both languages during a one-year period. Parallel deterioration to language processing in a bilingual speaker with NFPA challenges the assumption that L1 and L2 rely on different brain mechanisms as assumed in some theories of bilingual language processing [Ullman, M. T. (2001). The neural basis of lexicon and grammar in first and second language: The declarative/procedural model. Bilingualism: Language and Cognition, 4(1), 105-122].
Feenaughty, Lynda; Tjaden, Kris; Sussman, Joan
This study investigated the acoustic basis of within-speaker, across-utterance variation in sentence intelligibility for 12 speakers with dysarthria secondary to Parkinson's disease (PD). Acoustic measures were also obtained for 12 healthy controls for comparison to speakers with PD. Speakers read sentences using their typical speech style. Acoustic measures of speech rate, articulatory rate, fundamental frequency, sound pressure level and F2 interquartile range (F2 IQR) were obtained. A group of listeners judged sentence intelligibility using a computerized visual-analog scale. Relationships between judgments of intelligibility and acoustic measures were determined for individual speakers with PD. Relationships among acoustic measures were also quantified. Although considerable variability was noted, articulatory rate, fundamental frequency and F2 IQR were most frequently associated with within-speaker variation in sentence intelligibility. Results suggest that diversity among speakers with PD should be considered when interpreting results from group analyses.
Kriengwatana, Buddhamas; Escudero, Paola; ten Cate, Carel
The extent to which human speech perception evolved by taking advantage of predispositions and pre-existing features of vertebrate auditory and cognitive systems remains a central question in the evolution of speech. This paper reviews asymmetries in vowel perception, speaker voice recognition, and speaker normalization in non-human animals – topics that have not been thoroughly discussed in relation to the abilities of non-human animals, but are nonetheless important aspects of vocal perception. Throughout this paper we demonstrate that addressing these issues in non-human animals is relevant and worthwhile because many non-human animals must deal with similar issues in their natural environment. That is, they must also discriminate between similar-sounding vocalizations, determine signaler identity from vocalizations, and resolve signaler-dependent variation in vocalizations from conspecifics. Overall, we find that, although plausible, the current evidence is insufficiently strong to conclude that directional asymmetries in vowel perception are specific to humans, or that non-human animals can use voice characteristics to recognize human individuals. However, we do find some indication that non-human animals can normalize speaker differences. Accordingly, we identify avenues for future research that would greatly improve and advance our understanding of these topics. PMID:25628583
According to general theories, neutral tone is not regarded as an independent tone in Mandarin. Previous research shows that the most important characteristic of the neutral tone is that it does not have a certain target or pitch contour (Lin and Yang, 1980). Namely, its pitch contour is uncertain, and it is weak at the perceptual level. However, those studies ignore variation between dialects. Our study examined the features of the neutral tone in two dialect groups of Mandarin speakers and aimed to determine the effect of dialect difference on the pronunciation of the neutral tone. Our subjects were chosen from two groups of Mandarin speakers: one group from northern Mainland China, and the other from Taiwan. The experiment used a read-aloud procedure: all subjects read a randomized script written in Mandarin, and the whole process was recorded. The preliminary result shows that the dialect difference effect does matter: there is a tendency for the neutral tone to have a certain target for Taiwan Mandarin speakers.
Schafer, Robin J; Constable, R Todd
Proficient, nonnative English-speaking neurosurgery candidates are assessed for language function in English. In order to understand whether it is necessary to also assess native-language function, we examined the influence of native language on the bilingual language network of early, proficient bilinguals using functional magnetic resonance imaging (fMRI). We compared the blood-oxygen-level-dependent (BOLD) response to a language task in well-matched groups of monolingual native English speakers (MLEs), bilingual native English speakers (BLE1s), and bilingual nonnative English speakers (BLE2s). Random effects analysis revealed a significant main effect of group in temporal-parietal regions highly germane to the planning of temporal lobe surgical procedures. To explore how the three groups differed, we examined the average time course in each area of main effect for each group. We found significant differences between the monolinguals and the two bilingual groups. We interpret our results as evidence for a difference between the language network of early bilinguals and that of monolinguals.
Holler, Judith; Kokal, Idil; Toni, Ivan; Hagoort, Peter; Kelly, Spencer D; Özyürek, Aslı
Recipients process information from speech and co-speech gestures, but it is currently unknown how this processing is influenced by the presence of other important social cues, especially gaze direction, a marker of communicative intent. Such cues may modulate neural activity in regions associated either with the processing of ostensive cues, such as eye gaze, or with the processing of semantic information, provided by speech and gesture. Participants were scanned (fMRI) while taking part in triadic communication involving two recipients and a speaker. The speaker uttered sentences that were and were not accompanied by complementary iconic gestures. Crucially, the speaker alternated her gaze direction, thus creating two recipient roles: addressed (direct gaze) vs unaddressed (averted gaze) recipient. The comprehension of Speech&Gesture relative to SpeechOnly utterances recruited middle occipital, middle temporal and inferior frontal gyri, bilaterally. The calcarine sulcus and posterior cingulate cortex were sensitive to differences between direct and averted gaze. Most importantly, Speech&Gesture utterances, but not SpeechOnly utterances, produced additional activity in the right middle temporal gyrus when participants were addressed. Marking communicative intent with gaze direction modulates the processing of speech-gesture utterances in cerebral areas typically associated with the semantic processing of multi-modal communicative acts.
Schuerman, William L; Meyer, Antje; McQueen, James M
In different tasks involving action perception, performance has been found to be facilitated when the presented stimuli were produced by the participants themselves rather than by another participant. These results suggest that the same mental representations are accessed during both production and perception. However, with regard to spoken word perception, evidence also suggests that listeners' representations for speech reflect the input from their surrounding linguistic community rather than their own idiosyncratic productions. Furthermore, speech perception is heavily influenced by indexical cues that may lead listeners to frame their interpretations of incoming speech signals with regard to speaker identity. In order to determine whether word recognition evinces similar self-advantages as found in action perception, it was necessary to eliminate indexical cues from the speech signal. We therefore asked participants to identify noise-vocoded versions of Dutch words that were based on either their own recordings or those of a statistically average speaker. The majority of participants were more accurate for the average speaker than for themselves, even after taking into account differences in intelligibility. These results suggest that the speech representations accessed during perception of noise-vocoded speech are more reflective of the input of the speech community, and hence that speech perception is not necessarily based on representations of one's own speech.
McKain, Danielle R.
The term real world is often used in mathematics education, yet the definition of real-world problems and how to incorporate them in the classroom remains ambiguous. One way real-world connections can be made is through guest speakers. Guest speakers can offer different perspectives and share knowledge about various subject areas, yet the impact…
Jacobson, Rodolfo, Ed.
Suggesting that America should strive for linguistic and cultural pluralism, this special issue gathers in one place the latest thoughts of scholars on topics related to the concept of cultural pluralism, i.e., English to speakers of other languages (ESOL) and standard English to speakers of a nonstandard dialect (SESOD). Kenneth Croft, James Ney,…
Bressmann, Tim; Flowers, Heather; Wong, Willy; Irish, Jonathan C.
The goal of this study was to quantitatively describe aspects of coronal tongue movement in different anatomical regions of the tongue. Four normal speakers and a speaker with partial glossectomy read four repetitions of a metronome-paced poem. Their tongue movement was recorded in four coronal planes using two-dimensional B-mode ultrasound…
Ellis, Elizabeth M.
Teacher linguistic identity has so far mainly been researched in terms of whether a teacher identifies (or is identified by others) as a native speaker (NEST) or nonnative speaker (NNEST) (Moussu & Llurda, 2008; Reis, 2011). Native speakers are presumed to be monolingual, and nonnative speakers, although by definition bilingual, tend to be…
Non-native speakers of English often experience problems in pronunciation as they are learning English, many such problems persisting even when the speaker has achieved a high degree of fluency. Research has shown that for a non-native speaker to sound most natural and intelligible in his or her second language, the speaker must acquire proper…
Chang, Ya-Ling; Hsu, Kuan-Yu; Lee, Chih-Kung
Advances in distributed piezo-electret sensors and actuators facilitate the development of various smart systems, including paper speakers, opto-piezo/electret bio-chips, etc. The array-based loudspeaker system possesses several advantages over conventional coil speakers, such as light weight, flexibility, low power consumption, and directivity. With the understanding that the performance of large-area piezo-electret loudspeakers, or even the transport behavior of a microfluidic biochip, can be tailored by changing their dynamic behaviors, a full-field, real-time, high-resolution, non-contact metrology system was developed. In this paper, the influence of the resonance modes and the transient vibrations of an array-based loudspeaker system on the acoustic effect was measured using a real-time projection moiré metrology system and microphones. To make the paper speaker even more versatile, we combined the photosensitive material TiOPc with the original electret loudspeaker. The vibration of this newly developed opto-electret loudspeaker can be manipulated by illuminating it with different light-intensity patterns. To facilitate the tailoring process of the opto-electret loudspeaker, projection moiré was adopted to measure its vibration. By recording the projected fringes, which are modulated by the contours of the testing sample, the phase-unwrapping algorithm gives a continuous phase distribution that is proportional to the object's height variations. With the aid of the projection moiré metrology system, the vibrations associated with each distinctive light pattern could be characterized. Therefore, we expect that the overall acoustic performance can be improved by finding suitable illumination patterns. In this manuscript, the system performance of the projection moiré and the opto-electret paper speakers was cross-examined and verified by the experimental results obtained.
Ames, Heather; Grossberg, Stephen
Auditory signals of speech are speaker dependent, but representations of language meaning are speaker independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by adaptive resonance theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [Peterson, G. E., and Barney, H.L., J. Acoust. Soc. Am. 24, 175-184 (1952).] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models.
Suh, Jun-Won; Hansen, John H L
In this study, the problem of sparse enrollment data for in-set versus out-of-set speaker recognition is addressed. The challenge here is that both the training speaker data (5 s) and the test material (2-6 s) are of limited duration. The limited enrollment data result in a sparse acoustic model space for the desired speaker model. The focus of this study is on filling these acoustic holes by harvesting neighbor-speaker information to leverage overall system performance. Acoustically similar speakers are selected from a separate available corpus via three different methods of speaker-similarity measurement. The data selected from these acoustically similar speakers are exploited to fill the lack of phone coverage caused by the original sparse enrollment data. The proposed speaker-modeling process mimics the naturally distributed acoustic space of conversational speech. The Gaussian mixture model (GMM) tagging process allows simulated natural conversational speech to be included for in-set speaker modeling, which maintains the original system requirement of text-independent speaker recognition. A human listener evaluation is also performed to compare machine versus human speaker-recognition performance, with machine performance of 95% compared to 72.2% accuracy for human in-set/out-of-set performance. Results show that for extremely sparse train/reference audio streams, human speaker recognition is not nearly as reliable as machine-based speaker recognition. The proposed acoustic hole-filling solution (MRNC) produces an average 7.42% relative improvement over a GMM-Cohort UBM baseline and a 19% relative improvement over the Eigenvoice baseline using the FISHER corpus.
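The neighbor-speaker selection step described above can be illustrated with a simple similarity ranking. The abstract does not specify its three similarity measures, so this sketch uses cosine similarity between per-speaker mean feature vectors as one plausible stand-in; the function names and data are assumptions for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest_speakers(target_mean, corpus_means, k):
    """Rank corpus speakers by similarity of their mean feature
    vectors to the sparsely enrolled target; return the top k,
    whose data could then fill gaps in the target's phone coverage."""
    ranked = sorted(corpus_means,
                    key=lambda s: cosine(target_mean, corpus_means[s]),
                    reverse=True)
    return ranked[:k]
```

In a real system the vectors would summarize each speaker's acoustic model (e.g., GMM statistics) rather than raw means, and the harvested data would be tagged before augmenting the enrollment set.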
Dalton, H.; Shupla, C. B.; Buxner, S.; Shipp, S. S.
Join the new NASA SMD Scientist Speaker's Bureau, an online portal to connect scientists interested in getting involved in E/PO projects (e.g., giving public talks, classroom visits, and virtual connections) with audiences! The Scientist Speaker's Bureau helps educators and institutions connect with NASA scientists who are interested in giving presentations, based upon the topic, logistics, and audience. Aside from name, organization, location, bio, and (optional) photo and website, the information that scientists enter into this database will not be made public; instead, it will be used to help match scientists with the requests being placed. One of the most common ways for scientists to interact with students, adults, and general public audiences is to give presentations about or related to their science. However, most educators do not have a simple way to connect with those planetary scientists, Earth scientists, heliophysicists, and astronomers who are interested and available to speak with their audiences. This system is designed to help meet the need for connecting potential audiences to interested scientists. The information input into the database (availability to travel, willingness to present online or in person, interest in presenting to different age groups and sizes of audience, topics, and more) will be used to help match scientists (you!) with the requests being placed by educators. All NASA-funded Earth and space scientists engaged in active research are invited to fill out the short registration form, including those who are involved in missions, institutes, grants, and those who are using NASA science data in their research, and more. There is particular need for young scientists, such as graduate students and post-doctoral researchers, and women and people of diverse backgrounds. Submit your information at http://www.lpi.usra.edu/education/speaker.
Ng, L C; Gable, T J; Holzrichter, J F
Low-power EM radar-like sensors have made it possible to measure properties of the human speech production system in real time, without acoustic interference. This greatly enhances the quality and quantity of information for many speech-related applications; see Holzrichter, Burnett, Ng, and Lea, J. Acoust. Soc. Am. 103(1), 622 (1998). By combining the glottal EM sensor (GEMS) signal with the acoustic signal, we have demonstrated a nearly 10-fold reduction in error rates in a speaker verification experiment under moderately noisy conditions (-10 dB).
Kreimeyer, Roman; Ludwig, Stefan
We present an automatic acoustic classifier for marine mammals based on human speaker classification methods as an element of a passive acoustic monitoring (PAM) tool. This work is part of the Protection of Marine Mammals (PoMM) project under the framework of the European Defense Agency (EDA) and joined by the Research Department for Underwater Acoustics and Geophysics (FWG), Bundeswehr Technical Centre (WTD 71) and Kiel University. The automatic classification should support sonar operators in the risk mitigation process before and during sonar exercises with a reliable automatic classification result.
Lansford, Kaitlin L; Berisha, Visar; Utianski, Rene L
The current investigation contributes to a perceptual similarity-based approach to dysarthria characterization by utilizing an innovative statistical approach, multinomial logistic regression with sparsity constraints, to identify acoustic features underlying each listener's impressions of speaker similarity. The data-driven approach also permitted an examination of the effect of clinical experience on listeners' impressions of similarity. Listeners, irrespective of level of clinical experience, were found to rely on similar acoustic features during the perceptual sorting task, known as free classification. Overall, the results support the continued advancement of a similarity-based approach to characterizing the communication disorders associated with dysarthria.
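The statistical approach named above, multinomial logistic regression with sparsity constraints, can be sketched with scikit-learn's L1-penalized logistic regression; the data, feature counts, and penalty strength below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Synthetic "similarity sorting" data: 3 listener-defined groups, 10 acoustic
# features, of which only the first 2 actually separate the groups.
n = 300
X = rng.normal(size=(n, 10))
y = (X[:, 0] + 0.8 * X[:, 1] > 0).astype(int) + (X[:, 0] > 1).astype(int)

# The L1 penalty drives coefficients of uninformative features to exactly zero,
# leaving an interpretable, sparse set of predictive acoustic features.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.3, max_iter=5000).fit(X, y)

kept = np.flatnonzero(np.abs(clf.coef_).sum(axis=0) > 1e-6)
print("features retained:", kept)
```

Inspecting which columns survive the penalty is the analogue of identifying the acoustic features underlying each listener's impressions of similarity.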
Schweppe, Judith; Barth, Sandra; Ketzer-Nöltge, Almut; Rummer, Ralf
Verbatim sentence recall is widely used to test the language competence of native and non-native speakers, since it involves comprehension and production of connected speech. However, we assume that, to maintain surface information, sentence recall relies particularly on attentional resources, which differentially affects native and non-native speakers. Since even in near-natives language processing is less automatized than in native speakers, processing a sentence in a foreign language plus retaining its surface may result in a cognitive overload. We contrasted the sentence recall performance of German native speakers with that of highly proficient non-natives. Non-natives recalled the sentences significantly more poorly than the natives, but performed equally well on a cloze test. This implies that sentence recall underestimates the language competence of good non-native speakers in mixed groups with native speakers. The findings also suggest that theories of sentence recall need to consider both its linguistic and its attentional aspects.
Honda, Hidehito; Yamagishi, Kimihiko
Verbal probabilities have directional communicative functions, and most can be categorized as positive (e.g., "it is likely") or negative (e.g., "it is doubtful"). We examined the communicative functions of verbal probabilities based on the reference point hypothesis. According to this hypothesis, listeners are sensitive to and can infer a speaker's reference points based on the speaker's selected directionality. In four experiments (two of which examined speakers' choice of directionality and two of which examined listeners' inferences about a speaker's reference point), we found that listeners could make inferences about speakers' reference points based on the stated directionality of verbal probability. Thus, the directionality of verbal probabilities serves the communicative function of conveying information about a speaker's reference point.
Bozavli, Ebubekir; Gulmez, Recep
The aim of this study is to reveal the effect of FLA (foreign language anxiety) in native/non-native speaker of English classrooms. In this study, two groups of students (90 in total) of whom 38 were in NS (native speaker) class and 52 in NNS (non-native speaker) class taking English as a second language course for 22 hours a week at Erzincan…
Kriengwatana, Buddhamas; Escudero, Paola; Kerkhoven, Anne H.; Cate, Carel ten
Different speakers produce the same speech sound differently, yet listeners are still able to reliably identify the speech sound. How listeners can adjust their perception to compensate for speaker differences in speech, and whether these compensatory processes are unique to humans, is still not fully understood. In this study we compare the ability of humans and zebra finches to categorize vowels despite speaker variation in speech, in order to test the hypothesis that accommodating speaker and gender differences in isolated vowels can be achieved without prior experience with speaker-related variability. Using a behavioral Go/No-go task and identical stimuli, we compared Australian English adults’ (naïve to Dutch) and zebra finches’ (naïve to human speech) ability to categorize /ɪ/ and /ε/ vowels of a novel Dutch speaker after learning to discriminate those vowels from only one other speaker. Experiments 1 and 2 presented vowels of two speakers interspersed or blocked, respectively. Results demonstrate that categorization of vowels is possible without prior exposure to speaker-related variability in speech for zebra finches, and in non-native vowel categories for humans. Therefore, this study is the first to provide evidence for what might be a species-shared auditory bias that may supersede speaker-related information during vowel categorization. It additionally provides behavioral evidence contradicting a prior hypothesis that accommodation of speaker differences is achieved via the use of formant ratios. Therefore, investigations of alternative accounts of vowel normalization that incorporate the possibility of an auditory bias for disregarding inter-speaker variability are warranted.
Hu, Yuxiang; Wang, Min; Lu, Jing; Qiu, Xiaojun
For micro-speakers in a closed box, commonly used nonlinear compensation methods only compensate the distortion caused by the force factor and the stiffness. In this letter, a method to compensate the distortion with consideration of the nonlinear mechanical resistance is proposed based on the feedback linearization criterion. The proposed method is further improved by minimizing the variation of the output power spectrum after compensation. The simulations and experiments show that the total harmonic distortion and the intermodulation distortion of the sound pressure can be reduced significantly with little influence on the sound pressure level.
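The letter quantifies the benefit of compensation by total harmonic distortion (THD). A minimal sketch of measuring THD from an FFT of a test tone, with invented sampling rate and harmonic level (an integer number of periods is used so each harmonic lands on an FFT bin):

```python
import numpy as np

def thd(signal, fs, f0, n_harmonics=5):
    """Total harmonic distortion: RMS harmonic amplitude relative to the
    fundamental, read off an FFT of an integer number of periods."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
    def amp(f):
        return spec[np.argmin(np.abs(freqs - f))]
    fund = amp(f0)
    harm = np.sqrt(sum(amp(k * f0) ** 2 for k in range(2, 2 + n_harmonics)))
    return harm / fund

fs, f0 = 48000, 1000
t = np.arange(4800) / fs                     # exactly 100 periods of f0
clean = np.sin(2 * np.pi * f0 * t)
distorted = clean + 0.05 * np.sin(2 * np.pi * 2 * f0 * t)  # 5% 2nd harmonic

print(thd(clean, fs, f0), thd(distorted, fs, f0))
```

A compensation scheme like the one proposed would be judged by how far it pushes the distorted signal's THD back toward the clean value.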
Hoffman, Michael; Mittelman, Moshe
Professional meetings are an important way of educating, communicating and sharing updated information. After having attended dozens of medical and scientific meetings and conferences, the authors realized that many lectures are not appropriately presented due to a lack of proper speaking skills or inadequate facilities. Although the art of speaking is, to some degree, an inborn talent, speaking skills can be learned, exercised and put into practice. The current article summarizes the authors' thoughts and opinions about what makes a medical or scientific lecture a good or bad one, and it provides suggestions for improving a speaker's performance. Common mistakes are discussed, along with tips and recommendations on how to prepare and provide an interesting, professional, entertaining and attractive lecture. Guidelines are given on how to structure and balance a lecture. A special segment is devoted to speaker-audience relations. A detailed chapter emphasizes rules for preparing good slides. Finally, some recommendations are made regarding facilities in the lecture hall. The authors believe that if the proposed suggestions were applied, more presentations would be successful.
Mencke, E O; Ochsner, G J; Testut, E W
Speech samples (41 CNC monosyllables) of 22 deaf children were analyzed using two distinctive-feature systems, one acoustic and one physiologic. Moderate to high correlations between intelligibility scores by listener judges vs correct feature usage were obtained for positive as well as negative features of both systems. Further, higher correlations between percent-correct feature usage scores vs listener intelligibility scores were observed for phonemes in the initial vs final position-in-word regardless of listener-judge experience, feature system, or presentation mode. These findings suggest that either acoustic or physiologic feature analysis can be employed in describing the articulation of deaf talkers. In general, either of these feature systems also predicts with fair to good accuracy the intelligibility of deaf speakers as judged by either experienced or inexperienced listeners. In view of the appreciably higher correlations obtained between feature use and intelligibility scores in initial compared to final position-in-word, however, caution should be exercised with either of the feature systems studied in predicting the intelligibility of a deaf speaker's final phoneme.
Casserly, Elizabeth D
Feedback perturbation studies of speech acoustics have revealed a great deal about how speakers monitor and control their productions of segmental (e.g., formant frequencies) and non-segmental (e.g., pitch) linguistic elements. The majority of previous work, however, overlooks the role of acoustic feedback in consonant production and makes use of acoustic manipulations that affect either entire utterances or the entire acoustic signal, rather than more temporally and phonetically restricted alterations. This study, therefore, seeks to expand the feedback perturbation literature by examining perturbation of consonant acoustics that is applied in a time-restricted and phonetically specific manner. The spectral center of the alveopalatal fricative [ʃ] produced in vowel-fricative-vowel nonwords was incrementally raised until it reached the potential for [s]-like frequencies, but the characteristics of high-frequency energy outside the target fricative remained unaltered. An "offline," more widely accessible signal processing method was developed to perform this manipulation. The local feedback perturbation resulted in changes to speakers' fricative production that were more variable, idiosyncratic, and restricted than the compensation seen in more global acoustic manipulations reported in the literature. Implications and interpretations of the results, as well as future directions for research based on the findings, are discussed.
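The manipulated quantity in this study is the fricative's spectral center. One common operationalization is the amplitude-weighted mean frequency (spectral centroid); the band-limited noise stand-ins for [ʃ]-like and [s]-like frication below are invented for illustration:

```python
import numpy as np

def spectral_center(frame, fs):
    """Amplitude-weighted mean frequency (spectral centroid) of one frame,
    a common operationalization of a fricative's 'spectral center'."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    return np.sum(freqs * spec) / np.sum(spec)

rng = np.random.default_rng(2)
fs = 22050
noise = rng.normal(size=4096)

def bandpass(x, lo, hi):
    """Crude FFT-masking band-limiter, enough to mimic frication bands."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0
    return np.fft.irfft(spec)

sh_like = bandpass(noise, 2000, 5000)   # lower frication band, [ʃ]-like
s_like = bandpass(noise, 4000, 9000)    # higher frication band, [s]-like
print(spectral_center(sh_like, fs), spectral_center(s_like, fs))
```

Raising the [ʃ] spectral center toward [s]-like values, as the perturbation does, corresponds to shifting this centroid upward while leaving energy outside the fricative untouched.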
Chang, Charles B; Yao, Yao; Haynes, Erin F; Rhodes, Russell
This study tested the hypothesis that heritage speakers of a minority language, due to their childhood experience with two languages, would outperform late learners in producing contrast: language-internal phonological contrast, as well as cross-linguistic phonetic contrast between similar, yet acoustically distinct, categories of different languages. To this end, production of Mandarin and English by heritage speakers of Mandarin was compared to that of native Mandarin speakers and native American English-speaking late learners of Mandarin in three experiments. In experiment 1, back vowels in Mandarin and English were produced distinctly by all groups, but the greatest separation between similar vowels was achieved by heritage speakers. In experiment 2, Mandarin aspirated and English voiceless plosives were produced distinctly by native Mandarin speakers and heritage speakers, who both put more distance between them than late learners. In experiment 3, the Mandarin retroflex and English palato-alveolar fricatives were distinguished by more heritage speakers and late learners than native Mandarin speakers. Thus, overall the hypothesis was supported: across experiments, heritage speakers were found to be the most successful at simultaneously maintaining language-internal and cross-linguistic contrasts, a result that may stem from a close approximation of phonetic norms that occurs during early exposure to both languages.
Suh, Youngjoo; Kim, Hoirin
Support vector machines (SVMs) have proven to be an effective approach to speaker verification. An appropriate selection of the kernel function is a key issue in SVM-based classification. In this letter, a new SVM-based speaker verification method utilizing weighted kernels in the Gaussian mixture model supervector space is proposed. The weighted kernels are derived by using the discriminative training approach, which minimizes speaker verification errors. Experiments performed on the NIST 2008 speaker recognition evaluation task showed that the proposed approach provides substantially improved performance over the baseline kernel-based method.
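The GMM supervector space mentioned above stacks the UBM component means, adapted toward each utterance, into one fixed-length vector that an SVM then classifies. A rough sketch with synthetic two-dimensional features and a simple posterior-weighted re-estimation standing in for MAP adaptation (the letter's weighted-kernel training is not reproduced here):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(3)

def supervector(frames, ubm):
    """Stack per-component means re-estimated toward this utterance's frames
    (a crude stand-in for MAP adaptation) into one fixed-length vector."""
    resp = ubm.predict_proba(frames)              # frame-to-component posteriors
    counts = resp.sum(axis=0)[:, None] + 1e-6
    means = resp.T @ frames / counts              # posterior-weighted means
    return means.ravel()

# A universal background model over 2-D "features".
ubm = GaussianMixture(n_components=4, random_state=0).fit(rng.normal(size=(1000, 2)))

def utterances(center, n):
    return [rng.normal(center, 1.0, size=(80, 2)) for _ in range(n)]

X = [supervector(u, ubm) for u in utterances(+0.7, 20) + utterances(-0.7, 20)]
y = [1] * 20 + [0] * 20   # target speaker vs. impostors

svm = SVC(kernel="linear").fit(X, y)
test = supervector(rng.normal(0.7, 1.0, size=(80, 2)), ubm)
print(svm.predict([test]))
```

The linear kernel here is the unweighted baseline; the proposed method replaces it with kernels whose weights are discriminatively trained to minimize verification errors.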
Background: Ethnobotanical research was carried out with speakers of Iquito, a critically endangered Amazonian language of the Zaparoan family. The study focused on the concept of "dieting" (siyan++ni in Iquito), a practice involving prohibitions considered necessary to the healing process. These restrictions include: 1) foods and activities that can exacerbate illness, 2) environmental influences that conflict with some methods of healing (e.g. steam baths or enemas) and 3) foods and activities forbidden by the spirits of certain powerful medicinal plants. The study tested the following hypotheses: H1 - Each restriction will correlate with specific elements in illness explanatory models and H2 - Illnesses whose explanatory models have personalistic elements will show a greater number and variety of restrictions than those based on naturalistic reasoning. Methods: The work was carried out in 2009 and 2010 in the Alto Nanay region of Peru. In structured interviews, informants gave explanatory models for illness categories, including etiologies, pathophysiologies, treatments and dietary restrictions necessary for 49 illnesses. Seventeen botanical vouchers for species said to have powerful spirits that require diets were also collected. Results: All restrictions found correspond to some aspect of illness explanatory models. Thirty-five percent match up with specific illness etiologies, 53% correspond to particular pathophysiologies, 18% correspond with overall seriousness of the illness and 18% are only found with particular forms of treatment. Diets based on personalistic reasoning have a significantly higher average number of restrictions than those based on naturalistic reasoning. Conclusions: Dieting plays a central role in healing among Iquito speakers. Specific prohibitions can be explained in terms of specific aspects of illness etiologies, pathophysiologies and treatments. Although the Amazonian literature contains few studies focusing on dietary proscriptions
Bortz, M A
The receptive, expressive and pragmatic language abilities of 18-month-old Zulu speakers were assessed in order to obtain preliminary norms. Twenty-five participants of the Birth to Ten cohort study were investigated using parent reports, mother-child and tester-child interactions. Data was transcribed and analysed using nonparametric statistics. Results demonstrated that receptively subjects understood two-part instructions. Expressively, the mean lexicon was 4.12 words and mean length of utterance 0.65. Pragmatically, subjects were functioning on a nonverbal level and exhibited culture-specific items. The results provided information which could enable speech, language and hearing therapists to engage in primary and secondary prevention. An appropriate test battery for these children is discussed.
Soltani, Majid; Ashayeri, Hasan; Modarresi, Yahya; Salavati, Mahyar; Ghomashchi, Hamed
This study was designed to investigate changes in fundamental frequency (F0) across the life span in Persian speakers. Four hundred children and adults were asked to produce a sustained phonation of the vowel /a/ and their voice samples were studied in 10 age groups. F0 was analyzed using the software Praat (Version 5.1.17). The results revealed that (1) the mean F0 in both sexes decreases from childhood to adulthood; (2) significant F0 differences between boys and girls begin at the age of 12 years; and (3) the range of F0 changes over the life span is greater in men (178.38 Hz) than in women (113.57 Hz). These findings provide new data for Persian-speaking children, women, and men and could be beneficial for Iranian speech and language pathologists.
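F0 extraction of the kind Praat performs on sustained vowels can be approximated with an autocorrelation peak search; the synthetic sustained-vowel signal and the search range below are invented for illustration:

```python
import numpy as np

def estimate_f0(x, fs, fmin=75, fmax=500):
    """Fundamental frequency from the autocorrelation peak, the basic idea
    behind autocorrelation-based pitch trackers such as Praat's."""
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag range for the F0 search
    lag = lo + np.argmax(ac[lo:hi])
    return fs / lag

fs = 16000
t = np.arange(int(0.1 * fs)) / fs
# A sustained /a/-like tone: 220 Hz fundamental plus weaker harmonics.
voice = (np.sin(2 * np.pi * 220 * t)
         + 0.4 * np.sin(2 * np.pi * 440 * t)
         + 0.2 * np.sin(2 * np.pi * 660 * t))
print(round(estimate_f0(voice, fs), 1))
```

Averaging such frame-level estimates over a sustained phonation yields the per-speaker mean F0 values compared across the age groups above.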
Recent studies of heritage speakers, many of whom possess incomplete knowledge of their family language, suggest that these speakers may be linguistically superior to second language (L2) learners only in phonology but not in morphosyntax. This study reexamines this claim by focusing on knowledge of clitic pronouns and word order in 24 L2 learners…
Wolfe, Joanna; Shanmugaraj, Nisha; Sipe, Jaclyn
Many communication instructors make allowances for grammatical error in nonnative English speakers' writing, but do businesspeople do the same? We asked 169 businesspeople to comment on three versions of an email with different types of errors. We found that businesspeople do make allowances for errors made by nonnative English speakers,…
Nan, Yun; Sun, Yanan; Peretz, Isabelle
Congenital amusia is a neurogenetic disorder that affects the processing of musical pitch in speakers of non-tonal languages like English and French. We assessed whether this musical disorder exists among speakers of Mandarin Chinese who use pitch to alter the meaning of words. Using the Montreal Battery of Evaluation of Amusia, we tested 117…
Wiener, William R.; Ponchillia, Paul; Joffee, Elga; Rutberg-Kuskin, Judith; Brown, John
Two studies examined the effectiveness of external-speaker announcements in identifying incoming buses to 21 adults with visual impairments, including the placement of external speakers, the ability to understand simultaneous bus announcements, and the speech enhancement of announcements. Announcements could be heard above ambient traffic sounds…
Griffin, Zenzi M.; Oppenheimer, Daniel M.
When describing scenes, speakers gaze at objects while preparing their names (Z. M. Griffin & K. Bock, 2000). In this study, the authors investigated whether gazes to referents occurred in the absence of a correspondence between visual features and word meaning. Speakers gazed significantly longer at objects before intentionally labeling them…
Albirini, Abdulkafi; Benmamoun, Elabbas
This study compares Arabic L1, L2, and heritage speakers' (HS) knowledge of plural formation, which involves concatenative and nonconcatenative modes of derivation. Ninety participants (divided equally among L1, L2, and heritage speakers) completed two oral tasks: a picture naming task (to measure proficiency) and a plural formation task. The…
The focus of this paper is on speakers' rationalisations of their everyday linguistic choices as members of a multilingual academic department in the US. Given the monolingual macro-context, the myriad of native languages spoken by participants, and the professional stake in language competence, the question of how speakers arrive at language…
Sulpizio, Simone; Fasoli, Fabio; Maass, Anne; Paladino, Maria Paola; Vespignani, Francesco; Eyssel, Friederike; Bentler, Dominik
Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity.
This study was a cross-sectional investigation into the request strategies used by Iranian learners of English as a Foreign Language and Australian native speakers of English. The sample involved 96 BA and MA Persian students and 10 native speakers of English. A Discourse Completion Test (DCT) was used to generate data related to the request…
The large-scale continuous migration of speakers from the Anglophone Caribbean to the United States over the past 2 decades has led to an influx of school children who are native speakers of English-lexified Creoles (ELCs). These are oral languages which do not generally occur in formal institutional domains requiring academic registers. Thus,…
Yow, W. Quin; Markman, Ellen M.
Bilingual children regularly face communicative challenges when speakers switch languages. To cope with such challenges, children may attempt to discern a speaker's communicative intent, thereby heightening their sensitivity to nonverbal communicative cues. Two studies examined whether such communication breakdowns increase sensitivity to…
In critical period hypothesis (CPH) research, native speaker (NS) norm groups have often been used to determine whether nonnative speakers (NNSs) were able to score within the NS range of scores. One goal of this article is to investigate what NS samples were used in previous CPH research. The literature review shows that NS control groups tend to…
Blake, Robert J.; Zyzik, Eve C.
Explores the interaction between heritage speakers and second language learners of Spanish in a synchronous computer-assisted learning environment. Students in an intermediate language course were paired with heritage speakers. Transcripts of interactions were examined for points of negotiation. (Author/VWL)
Colombi, M. Cecilia
Heritage language speakers constitute a unique cultural and linguistic resource in the United States while also presenting particular challenges for language educators and language programs. This paper examines the potential of systemic functional linguistics (SFL) in a curriculum for Spanish second language learners/heritage speakers, with…
Yunusova, Yana; Weismer, Gary G.; Lindstrom, Mary J.
Purpose: In this study, the authors classified vocalic segments produced by control speakers (C) and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson's disease (PD); classification was based on movement measures. The researchers asked the following questions: (a) Can vowels be classified on the basis of selected…
Yunusova, Yana; Weismer, Gary; Westbury, John R.; Lindstrom, Mary J.
Purpose: This study compared movement characteristics of markers attached to the jaw, lower lip, tongue blade, and dorsum during production of selected English vowels by normal speakers and speakers with dysarthria due to amyotrophic lateral sclerosis (ALS) or Parkinson disease (PD). The study asked the following questions: (a) Are movement…
Kim, Hoe Kyeung
An online discussion involving text-based computer-mediated communication has great potential for promoting equal participation among non-native speakers of English. Several studies claimed that online discussions could enhance the academic participation of non-native speakers of English. However, there is little research around participation…
Whitehill, Tara L.; Wong, Lina L. -N.
The aim of this study was to investigate the effect of intensive voice therapy on Cantonese speakers with Parkinson's disease. The effect of the treatment on lexical tone was of particular interest. Four Cantonese speakers with idiopathic Parkinson's disease received treatment based on the principles of Lee Silverman Voice Treatment (LSVT).…
To understand the Spanish speaker one must be aware of the many facets of Hispanic culture and the various means of communicating that language and that culture. The two main forms of communication, linguistics and paralinguistics, convey the speaker's behavior and identify him as a member of the Hispanic culture. Linguistic patterns in Spanish…
Kibishi, Hiroshi; Hirabayashi, Kuniaki; Nakagawa, Seiichi
In this paper, we propose a statistical evaluation method of pronunciation proficiency and intelligibility for presentations made in English by native Japanese speakers. We statistically analyzed the actual utterances of speakers to find combinations of acoustic and linguistic features with high correlation between the scores estimated by the…
In an article in "Cognition" [Machery, E., Mallon, R., Nichols, S., & Stich, S. (2004). Semantics, cross-cultural style. Cognition, 92, B1-B12], the authors present data which purport to show that East Asian Cantonese speakers tend to have descriptivist intuitions about the referents of proper names, while Western English speakers tend to have…
From the Federal Register Online via the Government Publishing Office: Vol. 78, No. 211, Thursday, October 31, 2013, Part IV. The President, Proclamation 9046 of October 28, 2013: Death of Thomas S. Foley, Former Speaker of the House of Representatives.
Due to their unique profile as childhood bilinguals whose first language (L1) became weaker than their second language (L2), heritage speakers can shed light on three key issues in bilingualism--timing, input, and cross-linguistic interaction. The heritage speakers of focus in this dissertation are Korean second generation immigrants mainly…
Numerous studies have been conducted to explore issues surrounding non-native speakers (NNS) English teachers and native speaker (NS) teachers which concern, among others, the comparison between the two, the self-perceptions of NNS English teachers and the effectiveness of their teaching, and the students' opinions on and attitudes towards them.…
Bley-Vroman, Robert; Joo, Hye-Ri
Investigates whether native speakers of Korean learning English develop knowledge of the holism effect in the English locative and knowledge of the narrow constraints. Results suggest that when given a ground-object structure, both learners and English native speakers preferentially chose a ground-holism picture. (Author/VWL)
Hustad, Katherine C.
Purpose: This study examined the relationship between listener comprehension and intelligibility scores for speakers with mild, moderate, severe, and profound dysarthria. Relationships were examined across all speakers and their listeners when severity effects were statistically controlled, within severity groups, and within individual speakers…
Kraljic, Tanya; Brennan, Susan E.
Evidence has been mixed on whether speakers spontaneously and reliably produce prosodic cues that resolve syntactic ambiguities. And when speakers do produce such cues, it is unclear whether they do so "for" their addressees (the "audience design" hypothesis) or "for" themselves, as a by-product of planning and articulating utterances. Three…
Jonkers, Roel; Bastiaanse, Roelien
Many studies reveal effects of verb type on verb retrieval, mainly in agrammatic aphasic speakers. In the current study, two factors that might play a role in action naming in anomic aphasic speakers were considered: the conceptual factor instrumentality and the lexical factor name relation to a noun. Instrumental verbs were shown to be better…
Riney, Timothy J.; Takagi, Naoyuki
Investigated the correlation between global foreign accent (GFA) and voice onset time (VOT). VOT values for /p/, /t/, and /k/ were measured at two times, separated by an interval of 42 months. Subjects were 11 Japanese speakers of English as a foreign language; 5 age-matched native speakers of English served as the control group. (Author/VWL)
Alford, Randall L.; Strother, Judith B.
Provides data from a study that sought to determine and compare the attitudes of both native and nonnative speakers of English who listened to specific regional accents of the English spoken in the United States. The groups' judgments differed, and nonnative speakers were better able to perceive differences in regional accents of U.S. English.…
Isurin, Ludmila; Ivanova-Sullivan, Tanya
The present paper looks at the growing population of Russian heritage speakers from a linguistic and psycholinguistic perspective. The study attempts to clarify further the notion of heritage language by comparing the linguistic performance of heritage speakers with that of monolinguals and second language learners. The amount of exposure to…
Sims, Brenda R.; Guice, Stephen
Compares 214 letters of inquiry written by native and nonnative speakers of English to test the assumption that cultural factors beyond language, such as knowledge of business communication practices and cultural expectations, greatly affect communication. Finds that native speakers' letters deviated less from U.S. business communication…
A study of business letters indicates striking differences in the politeness strategies used by native and nonnative English speakers. Nonnative speakers' language tended to be less formal and more direct, and showed avoidance of certain politeness strategies. The findings suggest that even grammatically flawless business writing may be perceived…
Sobel, David M.; Sedivy, Julie; Buchanan, David W.; Hennessy, Rachel
Preschoolers participated in a modified version of the disambiguation task, designed to test whether the pragmatic environment generated by a reliable or unreliable speaker affected how children interpreted novel labels. Two objects were visible to children, while a third was only visible to the speaker (a fact known by the child). Manipulating…
The increased voice levels of fluent speakers and stutterers were analyzed in response to delayed auditory feedback, an Edinburgh masker, and white noise. The results are used to assess auditory feedback monitoring accounts of the speech behavior of fluent speakers and stutterers, with some implications for the treatment of stuttering. (37 references)…
Kong, Anthony Pak-Hin; Weekes, Brendan Stuart
The aim of this article is to illustrate the use of the Bilingual Aphasia Test (BAT) with a Cantonese-Putonghua speaker. We describe G, who is a relatively young Chinese bilingual speaker with aphasia. G's communication abilities in his L2, Putonghua, were impaired following brain damage. This impairment caused specific difficulties in…
Iverson, Paul; Pinet, Melanie; Evans, Bronwen G.
This study examined whether high-variability auditory training on natural speech can benefit experienced second-language English speakers who already are exposed to natural variability in their daily use of English. The subjects were native French speakers who had learned English in school; experienced listeners were tested in England and the less…
Ruecker, Todd; Ives, Lindsey
Over the past few decades, scholars have paid increasing attention to the role of native speakerism in the field of TESOL. Several recent studies have exposed instances of native speakerism in TESOL recruitment discourses published through a variety of media, but none have focused specifically on professional websites advertising programs in…
Bidelman, Gavin M; Hutka, Stefanie; Moreno, Sylvain
Psychophysiological evidence suggests that music and language are intimately coupled such that experience/training in one domain can influence processing required in the other domain. While the influence of music on language processing is now well-documented, evidence of language-to-music effects has yet to be firmly established. Here, using a cross-sectional design, we compared the performance of musicians to that of tone-language (Cantonese) speakers on tasks of auditory pitch acuity, music perception, and general cognitive ability (e.g., fluid intelligence, working memory). While musicians demonstrated superior performance on all auditory measures, comparable perceptual enhancements were observed for Cantonese participants, relative to English-speaking nonmusicians. These results provide evidence that tone-language background is associated with higher auditory perceptual performance for music listening. Musicians and Cantonese speakers also showed superior working memory capacity relative to nonmusician controls, suggesting that in addition to basic perceptual enhancements, tone-language background and music training might also be associated with enhanced general cognitive abilities. Our findings support the notion that tone language speakers and musically trained individuals have higher performance than English-speaking listeners for the perceptual-cognitive processing necessary for basic auditory as well as complex music perception. These results illustrate bidirectional influences between the domains of music and language.
Akhtar, Nameera; Menjivar, Jennifer; Hoicka, Elena; Sabbagh, Mark A
Three- and four-year-olds (N = 144) were introduced to novel labels by an English speaker and a foreign speaker (of Nordish, a made-up language), and were asked to endorse one of the speaker's labels. Monolingual English-speaking children were compared to bilingual children and English-speaking children who were regularly exposed to a language other than English. All children tended to endorse the English speaker's labels when asked 'What do you call this?', but when asked 'What do you call this in Nordish?', children with exposure to a second language were more likely to endorse the foreign label than monolingual and bilingual children. The findings suggest that, at this age, exposure to, but not necessarily immersion in, more than one language may promote the ability to learn foreign words from a foreign speaker.
Kim, Yunjung; Weismer, Gary; Kent, Ray D.
In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that were different from values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]
Obermeier, Christian; Kelly, Spencer D; Gunter, Thomas C
In face-to-face communication, speech is typically enriched by gestures. Clearly, not all people gesture in the same way, and the present study explores whether such individual differences in gesture style are taken into account during the perception of gestures that accompany speech. Participants were presented with one speaker who gestured in a straightforward way and another who also produced self-touch movements. Adding trials with such grooming movements makes the gesture information a much weaker cue compared with the gestures of the non-grooming speaker. The electroencephalogram was recorded as participants watched videos of the individual speakers. Event-related potentials elicited by the speech signal revealed that adding grooming movements attenuated the impact of gesture for this particular speaker. Thus, these data suggest that there is sensitivity to the personal communication style of a speaker and that this affects the extent to which gesture and speech are integrated during language comprehension.
Kriengwatana, Buddhamas; Terry, Josephine; Chládková, Kateřina; Escudero, Paola
Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries. PMID:27309889
Futrell, Richard; Hickey, Tina; Lee, Aldrin; Lim, Eunice; Luchkina, Elena; Gibson, Edward
In communicating events by gesture, participants create codes that recapitulate the patterns of word order in the world's vocal languages (Gibson et al., 2013; Goldin-Meadow, So, Ozyurek, & Mylander, 2008; Hall, Mayberry, & Ferreira, 2013; Hall, Ferreira, & Mayberry, 2014; Langus & Nespor, 2010; and others). Participants most often convey simple transitive events using gestures in the order Subject-Object-Verb (SOV), the most common word order in human languages. When there is a possibility of confusion between subject and object, participants use the order Subject-Verb-Object (SVO). This overall pattern has been explained by positing an underlying cognitive preference for subject-initial, verb-final orders, with the verb-medial SVO order emerging to facilitate robust communication in a noisy channel (Gibson et al., 2013). However, whether the subject-initial and verb-final biases are innate or the result of languages that the participants already know has been unclear, because participants in previous studies all spoke either SVO or SOV languages, which could induce a subject-initial, verb-late bias. Furthermore, the exact manner in which known languages influence gestural orders has been unclear. In this paper we demonstrate that there is a subject-initial and verb-final gesturing bias cross-linguistically by comparing gestures of speakers of the SVO languages English and Russian to those of speakers of the VSO languages Irish and Tagalog. The findings show that subject-initial and verb-final order emerges even in speakers of verb-initial languages, and that interference from these languages takes the form of occasionally gesturing in VSO order, without an additional bias toward other orders. The results provide further support for the idea that improvised gesture is a window into the pressures shaping language formation, independently of the languages that participants already know.
Law, Sam-Po; Kong, Anthony Pak-Hin; Lai, Loretta Wing-Shan; Lai, Christy
Background: Differences in processing nouns and verbs have been investigated intensely in psycholinguistics and neuropsychology in past decades. However, the majority of studies examining retrieval of these word classes have involved tasks of single word stimuli or responses. While the results have provided rich information for addressing issues about grammatical class distinctions, it is unclear whether they have adequate ecological validity for understanding lexical retrieval in connected speech, which characterizes daily verbal communication. Previous investigations comparing retrieval of nouns and verbs in single word production and connected speech have reported either discrepant performance between the two contexts, with word class dissociation present in picture naming but absent in connected speech, or null effects of word class. In addition, word finding difficulties have been found to be less severe in connected speech than in picture naming. However, these studies have failed to match target stimuli of the two word classes and between tasks on psycholinguistic variables known to affect performance in response latency and/or accuracy. Aims: The present study compared lexical retrieval of nouns and verbs in picture naming and connected speech from picture description, procedural description, and story-telling among 19 Chinese speakers with anomic aphasia and their age-, gender-, and education-matched healthy controls, to understand the influence of grammatical class on word production across speech contexts when target items were balanced for confounding variables between word classes and tasks. Methods & Procedures: Elicitation of responses followed the protocol of the AphasiaBank consortium (http://talkbank.org/AphasiaBank/). Target words for confrontation naming were based on well-established naming tests, while those for narrative were drawn from a large database of normal speakers. Selected nouns and verbs in the two contexts were matched for age
Regel, Stefanie; Coulson, Seana; Gunter, Thomas C
An important issue in irony comprehension concerns when and how listeners integrate extra-linguistic and linguistic information to compute the speaker's intended meaning. To assess whether knowledge about the speaker's communicative style impacts the brain response to irony, ERPs were recorded as participants read short passages that ended either with literal or ironic statements made by one of two speakers. The experiment was carried out in two sessions in which each speaker's use of irony was manipulated. In Session 1, 70% of ironic statements were made by the ironic speaker, while the non-ironic speaker expressed 30% of them. For irony by the non-ironic speaker, an increased P600 was observed relative to literal utterances. By contrast, both ironic and literal statements made by the ironic speaker elicited similar P600 amplitudes. In Session 2, conducted 1 day later, both speakers' use of irony was balanced (i.e. 50% ironic, 50% literal). ERPs for Session 2 showed an irony-related P600 for the ironic speaker but not for the non-ironic speaker. Moreover, P200 amplitude was larger for sentences congruent with each speaker's communicative style (i.e. for irony made by the ironic speaker, and for literal statements made by the non-ironic speaker). These findings indicate that pragmatic knowledge about speakers can affect language comprehension 200 ms after the onset of a critical word, as well as neurocognitive processes underlying the later stages of comprehension (500-900 ms post-onset). Thus perceived speakers' characteristics dynamically impact the construction of appropriate interpretations of ironic utterances.
Bley-Vroman, Robert; Yoshinaga, Naoko
Investigates the knowledge of multiple wh-questions such as "Who ate what?" by high-proficiency Japanese speakers of English. Acceptability judgments were obtained on six different types of questions. Acceptability of English examples was rated by native speakers of English; Japanese examples were judged by native speakers of Japanese,…
McNaughton, Stephanie; McDonough, Kim
This exploratory study investigated second language (L2) French speakers' service encounters in the multilingual setting of Montreal, specifically whether switches to English during French service encounters were related to L2 speakers' willingness to communicate or motivation. Over a two-week period, 17 French L2 speakers in Montreal submitted…
Reports on a study investigating how Americans respond to English as a Second Language (ESL) speech depending on the non-native speaker's accent and whether there were errors in the ESL speech. Findings indicate that Americans exhibit different cultural prejudices towards different foreign speakers depending on the accent of the speakers. (50…
De Leon, Phillip L.; McClanahan, Richard D.
In speaker verification (SV) systems that employ a support vector machine (SVM) classifier to make decisions on a supervector derived from Gaussian mixture model (GMM) component mean vectors, a significant portion of the computational load is involved in the calculation of the a posteriori probability of the feature vectors of the speaker under test with respect to the individual component densities of the universal background model (UBM). Further, the calculation of the sufficient statistics for the weight, mean, and covariance parameters derived from these same feature vectors also contributes a substantial amount of processing load to the SV system. In this paper, we propose a method that utilizes clusters of GMM-UBM mixture component densities in order to reduce the computational load required. In the adaptation step we score the feature vectors against the clusters, then calculate the a posteriori probabilities and update the statistics exclusively for mixture components belonging to the appropriate clusters. Each cluster is a grouping of multivariate normal distributions and is modeled by a single multivariate distribution. As such, the set of multivariate normal distributions representing the different clusters also forms a GMM. This GMM is referred to as a hash GMM, which can be considered a lower-resolution representation of the GMM-UBM. The mapping that associates the components of the hash GMM with components of the original GMM-UBM is referred to as a shortlist. This research investigates various methods of clustering the components of the GMM-UBM and forming hash GMMs. Of the five methods presented, one, the Gaussian mixture reduction proposed by Runnalls, easily outperformed the others. This method iteratively reduces the size of a GMM by successively merging pairs of component densities; pairs are selected for merger using a Kullback-Leibler-based metric. Using Runnalls's method of reduction, we were able
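The hash-GMM shortlist idea described in this abstract can be sketched in a few lines: reduce the UBM to a small hash GMM by moment-preserving pairwise merges (the elementary step in Runnalls-style mixture reduction), score each frame against the hash GMM first, and compute posteriors only over the full-mixture components indexed by the best clusters. The toy diagonal-covariance parameters and pairing below are made up for illustration; this is not the authors' implementation.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    # Log density of x under one or more diagonal-covariance Gaussians.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def merge_pair(w1, m1, v1, w2, m2, v2):
    # Moment-preserving merge of two weighted diagonal Gaussians:
    # the merged component matches the pair's total weight, mean, and variance.
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return w, m, v

def shortlist_posteriors(x, ubm_w, ubm_m, ubm_v, hash_m, hash_v, shortlist, top_c=2):
    # Score the small hash GMM first, keep the top_c clusters, then compute
    # posteriors only over the full-UBM components those clusters index.
    cluster_ll = log_gauss_diag(x, hash_m, hash_v)
    best = np.argsort(cluster_ll)[::-1][:top_c]
    idx = np.concatenate([shortlist[c] for c in best])
    ll = np.log(ubm_w[idx]) + log_gauss_diag(x, ubm_m[idx], ubm_v[idx])
    post = np.exp(ll - ll.max())
    return idx, post / post.sum()

# Toy UBM: 8 diagonal Gaussians in 2-D, merged pairwise into a 4-cluster hash GMM.
K, D = 8, 2
ubm_w = np.full(K, 1.0 / K)
ubm_m = np.array([[-6., -6.], [-5., -6.], [-6., 6.], [-5., 6.],
                  [5., -6.], [6., -6.], [5., 6.], [6., 6.]])
ubm_v = np.full((K, D), 0.5)

hash_m, hash_v, shortlist = [], [], []
for a, b in zip(range(0, K, 2), range(1, K, 2)):
    _, m, v = merge_pair(ubm_w[a], ubm_m[a], ubm_v[a], ubm_w[b], ubm_m[b], ubm_v[b])
    hash_m.append(m); hash_v.append(v); shortlist.append(np.array([a, b]))
hash_m, hash_v = np.array(hash_m), np.array(hash_v)

x = ubm_m[7] + 0.1                     # a feature vector near component 7
idx, post = shortlist_posteriors(x, ubm_w, ubm_m, ubm_v, hash_m, hash_v, shortlist)
```

With `top_c=2`, only 4 of the 8 full components are ever evaluated per frame, which is the source of the computational savings the abstract describes; the cost is a small risk of missing a relevant component when the hash GMM is a poor summary of the UBM.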
Kuhlen, Anna K.; Allefeld, Carsten; Haynes, John-Dylan
Cognitive neuroscience has recently begun to extend its focus from the isolated individual mind to two or more individuals coordinating with each other. In this study we uncover a coordination of neural activity between the ongoing electroencephalogram (EEG) of two people—a person speaking and a person listening. The EEG of one set of twelve participants (“speakers”) was recorded while they were narrating short stories. The EEG of another set of twelve participants (“listeners”) was recorded while watching audiovisual recordings of these stories. Specifically, listeners watched the superimposed videos of two speakers simultaneously and were instructed to attend either to one or the other speaker. This allowed us to isolate neural coordination due to processing the communicated content from the effects of sensory input. We find several neural signatures of communication: First, the EEG is more similar among listeners attending to the same speaker than among listeners attending to different speakers, indicating that listeners' EEG reflects content-specific information. Secondly, listeners' EEG activity correlates with the attended speakers' EEG, peaking at a time delay of about 12.5 s. This correlation takes place not only between homologous, but also between non-homologous brain areas in speakers and listeners. A semantic analysis of the stories suggests that listeners coordinate with speakers at the level of complex semantic representations, so-called “situation models”. With this study we link a coordination of neural activity between individuals directly to verbally communicated information. PMID:23060770
Past work has investigated cross-speaker and cross-gender differences in voicing of /h/ in English speakers. The purpose of this study was to see whether a phonetically sophisticated speaker could intentionally alter his /h/ voicing patterns, and, if so, how he would effect any changes. One adult male speaker of American English, a trained phonetician and dialectologist, produced approximately 500 repetitions of intervocalic /h/ in short carrier phrases, with differing vowel contexts and loudness levels. In the first block, the speaker produced the utterances normally (i.e., without specific instructions on /h/ production); in the second, he was explicitly asked to devoice his /h/'s. Results indicated that the incidence of devoiced /h/ increased from 2% in the first block to 69% in the second block. On average, the /h/'s in the second block were produced with higher baseline airflows, indicating more extreme laryngeal abduction. This alone did not account for the speaker's devoicing behavior, however, since the soft condition, which had the lowest peak airflows in the second block, had the most devoicing. Voice source measures will be compared between the two blocks to clarify how the speaker altered his laryngeal setting to achieve more devoicing. [Work supported by NIH.]
Nose, Takashi; Tachibana, Makoto; Kobayashi, Takao
This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.
Mesgarani, Nima; Chang, Edward F
Humans possess a remarkable ability to attend to a single speaker's voice in a multi-talker background. How the auditory system manages to extract intelligible speech under such acoustically complex and adverse listening conditions is not known, and, indeed, it is not clear how attended speech is internally represented. Here, using multi-electrode surface recordings from the cortex of subjects engaged in a listening task with two simultaneous speakers, we demonstrate that population responses in non-primary human auditory cortex encode critical features of attended speech: speech spectrograms reconstructed based on cortical responses to the mixture of speakers reveal the salient spectral and temporal features of the attended speaker, as if subjects were listening to that speaker alone. A simple classifier trained solely on examples of single speakers can decode both attended words and speaker identity. We find that task performance is well predicted by a rapid increase in attention-modulated neural selectivity across both single-electrode and population-level cortical responses. These findings demonstrate that the cortical representation of speech does not merely reflect the external acoustic environment, but instead gives rise to the perceptual aspects relevant for the listener's intended goal.
The goal of this project was to design, model, build, and test a flat panel speaker and frame for a spherical dome structure being made into a simulator. The simulator will be a test bed for evaluating an immersive environment for human interfaces. This project focused on the loudspeakers and a sound diffuser for the dome. The rest of the team worked on an Ambisonics 3D sound system, a video projection system, and a multi-direction treadmill to create the most realistic scene possible. The main programs utilized in this project were Pro-E and COMSOL. Pro-E was used for creating detailed figures for the fabrication of a frame that held a flat panel loudspeaker. The loudspeaker was made from a thin sheet of Plexiglas and four acoustic exciters. COMSOL, a multiphysics finite element analysis simulator, was used to model and evaluate all stages of the loudspeaker, frame, and sound diffuser. Acoustical testing measurements were used to create polar plots from the working prototype, which were then compared to the COMSOL simulations to select the optimal design for the dome. The final goal of the project was to install the flat panel loudspeaker design, in addition to a sound diffuser, on the wall of the dome. After running tests in COMSOL on various speaker configurations, including a warped Plexiglas version, the optimal speaker design included a flat piece of Plexiglas with a rounded frame to match the curvature of the dome. Eight of these loudspeakers will be mounted into an inch and a half of high-performance acoustic insulation, or Thinsulate, that will cover the inside of the dome. The following technical paper discusses these projects and explains the engineering processes used, knowledge gained, and the projected future goals of this project
Doyle, P C; Haaf, R G
The purpose of this project was to investigate the perception of consonant-vowel-consonant (CVC) stimuli produced by four tracheoesophageal (TE) speakers. Stimuli were representative of five phonetic manner classes (stop, fricative, affricate, nasal, and liquid-glide). Twelve naive normal-hearing young adults served as listeners. Stimuli were presented via headphones, and analyses of the data were conducted for individual speakers and for the entire group. The listeners' perceptual judgments were analyzed for each manner of production by phonetic context. Based on statistical analyses of the data obtained, all four speakers were perceived by listeners to produce post-vocalic consonants with significantly better intelligibility.
Hu, Chao; Wang, Qiandong; Short, Lindsey A.; Fu, Genyue
The current study explored the correlation between speakers' Eysenck personality traits and speech spectrum parameters. Forty-six subjects completed the Eysenck Personality Questionnaire. They were instructed to verbally answer the questions shown on a computer screen and their responses were recorded by the computer. Spectrum parameters of /sh/ and /i/ were analyzed by Praat voice software. Formant frequencies of the consonant /sh/ in lying responses were significantly lower than that in truthful responses, whereas no difference existed on the vowel /i/ speech spectrum. The second formant bandwidth of the consonant /sh/ speech spectrum was significantly correlated with the personality traits of Psychoticism, Extraversion, and Neuroticism, and the correlation differed between truthful and lying responses, whereas the first formant frequency of the vowel /i/ speech spectrum was negatively correlated with Neuroticism in both response types. The results suggest that personality characteristics may be conveyed through the human voice, although the extent to which these effects are due to physiological differences in the organs associated with speech or to a general Pygmalion effect is yet unknown. PMID:22439014
Wuerger, Sophie; Xiao, Kaida; Mylonas, Dimitris; Huang, Qingmei; Karatzas, Dimosthenis; Hird, Emily; Paramei, Galina
Observers are faster to detect a target among a set of distracters if the targets and distracters come from different color categories. This cross-boundary advantage seems to be limited to the right visual field, which is consistent with the dominance of the left hemisphere for language processing [Gilbert et al., Proc. Natl. Acad. Sci. USA 103, 489 (2006)]. Here we study whether a similar visual field advantage is found in the color identification task in speakers of Mandarin, a language that uses a logographic system. Forty late Mandarin-English bilinguals performed a blue-green color categorization task, in a blocked design, in their first language (L1: Mandarin) or second language (L2: English). Eleven color singletons ranging from blue to green were presented for 160 ms, randomly in the left visual field (LVF) or right visual field (RVF). Color boundary and reaction times (RTs) at the color boundary were estimated in L1 and L2, for both visual fields. We found that the color boundary did not differ between the languages; RTs at the color boundary, however, were on average more than 100 ms shorter in the English compared to the Mandarin sessions, but only when the stimuli were presented in the RVF. The finding may be explained by the script nature of the two languages: Mandarin logographic characters are analyzed visuospatially in the right hemisphere, which conceivably facilitates identification of color presented to the LVF.
Schäfer, Martina; Robb, Michael P
The purpose of this study was to examine stuttering behavior in German-English bilingual people who stutter (PWS), with particular reference to the frequency of stuttering on content and function words. Fifteen bilingual PWS were sampled who spoke German as the first language (L1) and English as a second language (L2). Conversational speech was sampled in each language and analyzed for the percentage of overall stuttering-like disfluencies and distribution of stuttering on content and function words. Significantly more stuttering was found to occur in L2 compared to L1. Stuttering occurred significantly more often on content words compared to function words in L1. No significant difference between stuttering on function and content words was observed in L2. Examination across L1 and L2 found a significantly greater percentage of stuttering on function words in L2 compared to L1, and a significantly lower percentage of stuttering on content words in L2 compared to L1. The characteristics of stuttering in L2 could not be differentiated on the basis of an L2 proficiency measure. The differences observed in the amount of stuttering between L1 and L2 suggest that stuttering in bilingual speakers is closely related to language dominance, with features of stuttering in L2 indicative of a less developed language system.
Roberson, Debi; Hanley, J Richard; Pak, Hyensou
Categorical perception (CP) is said to occur when a continuum of equally spaced physical changes is perceived as unequally spaced as a function of category membership (Harnad, S. (Ed.) (1987). Psychophysical and cognitive aspects of categorical perception: A critical overview. Cambridge: Cambridge University Press). A common suggestion is that CP for color arises because perception is qualitatively distorted when we learn to categorize a dimension. Contrary to this view, we here report that English speakers show no evidence of lowered discrimination thresholds at the boundaries between blue and green categories even though CP is found at these boundaries in a supra-threshold task. Furthermore, there is no evidence of different discrimination thresholds between individuals from two language groups (English and Korean) who use different color terminology in the blue-green region and have different supra-threshold boundaries. Our participants' just noticeable difference (JND) thresholds suggest that they retain a smooth continuum of perceptual space that is not warped by stretching at category boundaries or by within-category compression. At least for the domain of color, categorical perception appears to be a categorical, but not a perceptual phenomenon.
Gibbs, R W; Nayak, N P; Bolton, J L; Keppel, M E
In three experiments, we examined why some idioms can be lexically altered and still retain their figurative meanings (e.g., John buttoned his lips about Mary can be changed into John fastened his lips about Mary and still mean "John didn't say anything about Mary"), whereas other idioms cannot be lexically altered without losing their figurative meanings (e.g., John kicked the bucket, meaning "John died," loses its idiomatic meaning when changed into John kicked the pail). Our hypothesis was that the lexical flexibility of idioms is determined by speakers' assumptions about the ways in which parts of idioms contribute to their figurative interpretations as a whole. The results of the three experiments indicated that idioms whose individual semantic components contribute to their overall figurative meanings (e.g., go out on a limb) were judged as less disrupted by changes in their lexical items (e.g., go out on a branch) than were nondecomposable idioms (e.g., kick the bucket) when their individual words were altered (e.g., punt the pail). These findings lend support to the idea that both the syntactic productivity and the lexical makeup of idioms are matters of degree, depending on the idioms' compositional properties. This conclusion suggests that idioms do not form a unique class of linguistic items, but share many of the properties of more literal language.
Bidelman, G M; Chung, W-L
Electrophysiological studies demonstrate that the neural coding of pitch is modulated by language experience and the linguistic relevance of the auditory input; both rightward and leftward asymmetries have been observed in the hemispheric specialization for pitch. In music, pitch is encoded using two primary features: contour (patterns of rises and falls) and interval (frequency separation between tones) cues. Recent evoked potential studies demonstrate that these "global" (contour) and "local" (interval) aspects of pitch are processed automatically (but bilaterally) in trained musicians. Here, we examined whether alternate forms of pitch expertise, namely, tone-language experience (i.e., Chinese), influence the early detection of contour and intervallic deviations within ongoing pitch sequences. Neuroelectric mismatch negativity (MMN) potentials were recorded in Chinese speakers and English-speaking nonmusicians in response to continuous pitch sequences with occasional global or local deviations in the ongoing melodic stream. This paradigm allowed us to explore potential cross-language differences in the hemispheric weighting for contour and interval processing of pitch. Chinese speakers showed differential pitch encoding between hemispheres not observed in English listeners; Chinese MMNs revealed a rightward bias for contour processing but a leftward hemispheric laterality for interval processing. In contrast, no asymmetries were observed in the English group. Collectively, our findings suggest tone-language experience sensitizes auditory brain mechanisms for the detection of subtle global/local pitch changes in the ongoing auditory stream and exaggerates functional asymmetries in pitch processing between cerebral hemispheres.
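For readers implementing melodic deviance paradigms of this kind, the contour/interval distinction is easy to operationalize: contour is the direction of each pitch step, interval is its size in semitones. The sketch below is a generic illustration (the melodies and frequencies are invented, not the study's stimuli):

```python
import numpy as np

def contour_and_intervals(f0_hz):
    """'Global' contour (direction of each step) and 'local' intervals
    (step sizes in semitones) of a pitch sequence."""
    semitones = 12 * np.log2(np.asarray(f0_hz, dtype=float))
    steps = np.diff(semitones)
    contour = np.sign(steps).astype(int)  # +1 rise, -1 fall, 0 repeat
    return contour, steps

melody  = [220.0, 246.9, 261.6, 246.9]   # A3 B3 C4 B3
variant = [220.0, 246.9, 277.2, 246.9]   # C4 -> C#4: contour kept, interval changed
c1, i1 = contour_and_intervals(melody)
c2, i2 = contour_and_intervals(variant)
print(np.array_equal(c1, c2))   # True: a "local" deviant preserves contour
print(np.allclose(i1, i2))      # False: the interval pattern differs
```

A "global" (contour) deviant would instead flip the sign of a step, changing both outputs.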
The present study investigates voice onset times (VOTs) to determine if cognates enhance the cross-language phonetic influences in the speech production of a range of Spanish-English bilinguals: Spanish heritage speakers, English heritage speakers, advanced L2 Spanish learners, and advanced L2 English learners. To answer this question, lexical…
Biau, Emmanuel; Torralba, Mireia; Fuentemilla, Lluis; de Diego Balaguer, Ruth; Soto-Faraco, Salvador
Speakers often accompany speech with spontaneous beat gestures in natural spoken communication. These gestures are usually aligned with lexical stress and can modulate the saliency of their affiliate words. Here we addressed the consequences of beat gestures on the neural correlates of speech perception. Previous studies have highlighted the role played by theta oscillations in temporal prediction of speech. We hypothesized that the sight of beat gestures may influence ongoing low-frequency neural oscillations around the onset of the corresponding words. Electroencephalographic (EEG) recordings were acquired while participants watched a continuous, naturally recorded discourse. The phase-locking value (PLV) at word onset was calculated from the EEG for pairs of identical words that had been pronounced with and without a concurrent beat gesture in the discourse. We observed an increase in PLV in the 5-6 Hz theta range as well as a desynchronization in the 8-10 Hz alpha band around the onset of words preceded by a beat gesture. These findings suggest that beats help tune low-frequency oscillatory activity at relevant moments during natural speech perception, providing new insight into how speech and paralinguistic information are integrated.
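For readers who want to reproduce a phase-locking analysis of this kind: the PLV at each time point is the magnitude of the unit phase vector averaged across trials. Below is a minimal NumPy sketch on synthetic "trials", not the study's EEG data; the analytic-signal helper replicates what scipy.signal.hilbert computes:

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT (equivalent to scipy.signal.hilbert)."""
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(X * h)

def phase_locking_value(trials):
    """PLV across trials: |mean of unit phase vectors| per time point.

    trials: (n_trials, n_samples) array of band-pass filtered epochs.
    Returns values in [0, 1]; 1 means perfect phase alignment across trials.
    """
    phases = np.angle(np.vstack([analytic_signal(tr) for tr in trials]))
    return np.abs(np.mean(np.exp(1j * phases), axis=0))

# Demo: identical-phase trials give PLV ~ 1; random phases give low PLV.
t = np.linspace(0, 1, 500, endpoint=False)
rng = np.random.default_rng(0)
locked = np.vstack([np.sin(2 * np.pi * 5 * t) for _ in range(30)])
jittered = np.vstack([np.sin(2 * np.pi * 5 * t + rng.uniform(0, 2 * np.pi))
                      for _ in range(30)])
print(phase_locking_value(locked).mean())    # close to 1
print(phase_locking_value(jittered).mean())  # much lower
```

In practice the epochs would first be band-pass filtered into the band of interest (e.g. 5-6 Hz theta) before the phase is extracted.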
Donai, Jeremy J.; Motiian, Saeid; Doretto, Gianfranco
The high-frequency region of vowel signals (above the third formant or F3) has received little research attention. Recent evidence, however, has documented the perceptual utility of high-frequency information in the speech signal above the traditional frequency bandwidth known to contain important cues for speech and speaker recognition. The purpose of this study was to determine if high-pass filtered vowels could be separated by vowel category and speaker type in a supervised learning framework. Mel frequency cepstral coefficients (MFCCs) were extracted from productions of six vowel categories produced by two male, two female, and two child speakers. Results revealed that the filtered vowels were well separated by vowel category and speaker type using MFCCs from the high-frequency spectrum. This demonstrates the presence of useful information for automated classification from the high-frequency region and is the first study to report findings of this nature in a supervised learning framework. PMID:27588160
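The MFCC-from-high-frequency-spectrum idea can be sketched generically: restrict the mel filterbank to frequencies above a cutoff before taking the DCT of the log energies. This is an illustrative reimplementation, not the authors' feature pipeline; the 3.5 kHz cutoff is an assumed stand-in for "above F3":

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr, fmin, fmax):
    """Triangular mel filters spanning [fmin, fmax] Hz."""
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc_frame(frame, sr, n_filters=26, n_ceps=13, fmin=3500.0):
    """MFCCs of one windowed frame, with the filterbank restricted to
    f > fmin to mimic analysis of the high-pass filtered spectrum."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    fb = mel_filterbank(n_filters, n_fft, sr, fmin, sr / 2)
    log_energy = np.log(fb @ spec + 1e-10)
    # Type-II DCT of the log filterbank energies -> cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ log_energy

sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 4000 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
coeffs = mfcc_frame(frame, sr)
print(coeffs.shape)  # (13,)
```

Feature vectors like these, pooled over frames, could then be fed to any supervised classifier for vowel or speaker-type labels.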
Lass, Norman J.; And Others
Reports a study which shows that subjects can make discriminative judgments of a speaker's height and weight from his tape recorded speech. This ability is not altered by the filtering of the speech signal. (PMJ)
Stuart, Andrew; Kalinowski, Joseph; Rastatter, Michael P.; Lynch, Kerry
This study investigated the effect of short and long auditory feedback delays at two speech rates with normal speakers. Seventeen participants spoke under delayed auditory feedback (DAF) at 0, 25, 50, and 200 ms at normal and fast rates of speech. Significantly more dysfluencies, two to three times as many, were displayed at 200 ms (p<0.05) relative to no delay or the shorter delays. There were significantly more dysfluencies observed at the fast rate of speech (p=0.028). These findings implicate the peripheral feedback system(s) of fluent speakers in the disruptive effects of DAF on normal speech production at long auditory feedback delays. Considering the contrast in fluency/dysfluency exhibited between normal speakers and those who stutter at short and long delays, it appears that speech disruption of normal speakers under DAF is a poor analog of stuttering.
Merriman, William E.; Evey, Julie A.
If after teaching a label for 1 object, a speaker does not name a nearby object, 3-year-olds tend to reject the label for the nearby object (W.E. Merriman, J.M. Marazita, L.H. Jarvis, J.A. Evey-Burkey, and M. Biggins, 1995a). In Studies 1 (5-year-olds) and 3 (3-year-olds), this effect depended on object similarity. In Study 2, when a speaker used…
Suh, Youngjoo; Kim, Hoirin
In this paper, a new discriminative likelihood score weighting technique is proposed for speaker identification. The proposed method employs a discriminative weighting of frame-level log-likelihood scores with acoustic-phonetic classification in the Gaussian mixture model (GMM)-based speaker identification. Experiments performed on the Aurora noise-corrupted TIMIT database showed that the proposed approach provides meaningful performance improvement with an overall relative error reduction of 15.8% over the maximum likelihood-based baseline GMM approach.
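The baseline that this paper improves on, maximum-likelihood GMM speaker identification with summed frame-level log-likelihood scores, can be sketched as follows. The toy models and data are invented; the discriminative, class-dependent weighting itself is only indicated by the `frame_weights` hook (a hypothetical name, not the authors' API):

```python
import numpy as np

def gmm_frame_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    X: (n_frames, dim); weights: (K,); means, variances: (K, dim)."""
    ll = np.empty((len(X), len(weights)))
    for k, (w, mu, var) in enumerate(zip(weights, means, variances)):
        diff = X - mu
        ll[:, k] = (np.log(w)
                    - 0.5 * np.sum(np.log(2 * np.pi * var))
                    - 0.5 * np.sum(diff ** 2 / var, axis=1))
    # log-sum-exp over mixture components, per frame
    m = ll.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(ll - m).sum(axis=1, keepdims=True))).ravel()

def identify(X, speaker_gmms, frame_weights=None):
    """Pick the speaker whose GMM gives the highest (optionally weighted)
    sum of frame log-likelihoods. frame_weights is where a discriminative
    weighting would plug in; uniform weights give the plain ML decision."""
    if frame_weights is None:
        frame_weights = np.ones(len(X))
    scores = [np.dot(frame_weights, gmm_frame_loglik(X, *g))
              for g in speaker_gmms]
    return int(np.argmax(scores))

# Toy demo: two single-component "speaker models"; frames drawn near speaker 1.
rng = np.random.default_rng(1)
spk0 = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
spk1 = (np.array([1.0]), np.array([[3.0, 3.0]]), np.array([[1.0, 1.0]]))
X = rng.normal(loc=3.0, scale=1.0, size=(50, 2))
print(identify(X, [spk0, spk1]))  # 1
```

The paper's contribution corresponds to choosing `frame_weights` discriminatively per acoustic-phonetic class rather than leaving them uniform.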
Knoeferle, Pia; Kreysa, Helene
During comprehension, a listener can rapidly follow a frontally seated speaker’s gaze to an object before its mention, a behavior which can shorten latencies in speeded sentence verification. However, the robustness of gaze-following, its interaction with core comprehension processes such as syntactic structuring, and the persistence of its effects are unclear. In two “visual-world” eye-tracking experiments participants watched a video of a speaker, seated at an angle, describing transitive (non-depicted) actions between two of three Second Life characters on a computer screen. Sentences were in German and had either subjectNP1-verb-objectNP2 or objectNP1-verb-subjectNP2 structure; the speaker either shifted gaze to the NP2 character or was obscured. Several seconds later, participants verified either the sentence referents or their role relations. When participants had seen the speaker’s gaze shift, they anticipated the NP2 character before its mention and earlier than when the speaker was obscured. This effect was more pronounced for SVO than OVS sentences in both tasks. Interactions of speaker gaze and sentence structure were more pervasive in role-relations verification: participants verified the role relations faster for SVO than OVS sentences, and faster when they had seen the speaker shift gaze than when the speaker was obscured. When sentence and template role-relations matched, gaze-following even eliminated the SVO-OVS response-time differences. Thus, gaze-following is robust even when the speaker is seated at an angle to the listener; it varies depending on the syntactic structure and thematic role relations conveyed by a sentence; and its effects can extend to delayed post-sentence comprehension processes. These results suggest that speaker gaze effects contribute pervasively to visual attention and comprehension processes and should thus be accommodated by accounts of situated language comprehension. PMID:23227018
Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou
This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches to speaker diarization under the Multiple Distant Microphone (MDM) conditions in the conference room scenario. Our proposed system consists of six modules: 1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); 2) initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; 3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant microphone channel signals; 4) a speaker clustering algorithm based on a GMM modeling approach; 5) non-speech removal via a speech/non-speech verification mechanism; and 6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieved an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
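Of the modules listed, GCC-PHAT time-delay estimation is the most self-contained and can be sketched compactly. The implementation below is a standard textbook version, not the I2R system's code, and the signal parameters are made up:

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Time delay of `sig` relative to `ref` via GCC-PHAT, in seconds.
    Whitening by |cross-spectrum| keeps only phase information, which
    sharpens the correlation peak in reverberant rooms."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    cc = np.fft.irfft(R / (np.abs(R) + 1e-15), n=n)
    max_shift = n // 2
    # Reorder so index max_shift corresponds to zero lag
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

fs = 16000
rng = np.random.default_rng(0)
src = rng.normal(size=fs)                 # 1 s of noise as "speech"
delay_samples = 40                        # 2.5 ms path difference
mic1 = src
mic2 = np.concatenate((np.zeros(delay_samples), src[:-delay_samples]))
print(gcc_phat(mic2, mic1, fs) * fs)      # ~ 40 samples
```

With a microphone pair of known spacing, the recovered delay converts to a direction-of-arrival estimate, which is what modules 1-3 above build on.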
Nan, Yun; Sun, Yanan; Peretz, Isabelle
Congenital amusia is a neurogenetic disorder that affects the processing of musical pitch in speakers of non-tonal languages like English and French. We assessed whether this musical disorder exists among speakers of Mandarin Chinese who use pitch to alter the meaning of words. Using the Montreal Battery of Evaluation of Amusia, we tested 117 healthy young Mandarin speakers with no self-declared musical problems and 22 individuals who reported musical difficulties and scored two standard deviations below the mean obtained by the Mandarin speakers without amusia. These 22 amusic individuals showed a similar pattern of musical impairment as did amusic speakers of non-tonal languages, by exhibiting a more pronounced deficit in melody than in rhythm processing. Furthermore, nearly half the tested amusics had impairments in the discrimination and identification of Mandarin lexical tones. Six showed marked impairments, displaying what could be called lexical tone agnosia, but had normal tone production. Our results show that speakers of tone languages such as Mandarin may experience musical pitch disorder despite early exposure to speech-relevant pitch contrasts. The observed association between the musical disorder and lexical tone difficulty indicates that the pitch disorder as defining congenital amusia is not specific to music or culture but is rather general in nature.
Wang, Shizhen; Lulich, Steven M; Alwan, Abeer
Speaker normalization typically focuses on inter-speaker variabilities of the supraglottal (vocal tract) resonances, which constitute a major cause of spectral mismatch. Recent studies have shown that the subglottal airways also affect spectral properties of speech sounds, and promising results were reported using the subglottal resonances for speaker normalization. This paper proposes a reliable algorithm to automatically estimate the second subglottal resonance (Sg2) from speech signals. The algorithm is calibrated on children's speech data with simultaneous accelerometer recordings from which Sg2 frequencies can be directly measured. A cross-language study with bilingual Spanish-English children is performed to investigate whether Sg2 frequencies are independent of speech content and language. The study verifies that Sg2 is approximately constant for a given speaker and thus can be a good candidate for limited data speaker normalization and cross-language adaptation. A speaker normalization method using Sg2 is then presented. This method is computationally more efficient than maximum-likelihood based vocal tract length normalization (VTLN), with performance better than VTLN for limited adaptation data and cross-language adaptation. Experimental results confirm that this method performs well in a variety of testing conditions and tasks.
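The normalization step can be illustrated with a generic VTLN-style piecewise-linear frequency warp whose factor is derived from a speaker's measured Sg2. All numbers below (reference Sg2, cutoff, Nyquist) are hypothetical placeholders, not values from the paper:

```python
import numpy as np

REF_SG2 = 1400.0  # reference second subglottal resonance in Hz (assumed value)

def warp_factor(sg2_hz, ref=REF_SG2):
    """Per-speaker warp factor from the measured Sg2. Because Sg2 is
    roughly constant for a speaker across content and language, a single
    estimate suffices to normalize all of that speaker's speech."""
    return ref / sg2_hz

def piecewise_linear_warp(freqs, alpha, f_cut, f_nyq):
    """Classic VTLN-style warp: scale by alpha below f_cut, then
    interpolate linearly so that f_nyq maps to itself."""
    freqs = np.asarray(freqs, dtype=float)
    return np.where(
        freqs <= f_cut,
        alpha * freqs,
        alpha * f_cut + (f_nyq - alpha * f_cut) * (freqs - f_cut) / (f_nyq - f_cut),
    )

alpha = warp_factor(1550.0)               # hypothetical child with Sg2 = 1550 Hz
f = np.array([500.0, 1500.0, 3000.0, 8000.0])
print(piecewise_linear_warp(f, alpha, f_cut=4800.0, f_nyq=8000.0))
```

In a recognizer, a warp like this would be applied to the filterbank center frequencies during feature extraction, replacing the maximum-likelihood grid search over alpha that standard VTLN requires.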
Dussias, Paola E; Cramer Scaltz, Tracy R
Using a self-paced moving window reading paradigm, we examine the degree to which structural commitments made while 60 Spanish-English L2 speakers read syntactically ambiguous sentences in their second language (L2) are constrained by the verb's lexical entry about its preferred structural environment (i.e., subcategorization bias). The ambiguity under investigation arises because a noun phrase immediately following a verb can be parsed as either the direct object of the verb 'The CIA director confirmed the rumor when he testified before Congress', or as the subject of an embedded complement 'The CIA director confirmed the rumor could mean a security leak'. In an experiment with 59 monolingual English participants, we replicate the findings reported in the previous literature demonstrating that native speakers are guided by subcategorization bias information during sentence interpretation. In a bilingual experiment, we then show that L2 subcategorization biases influence L2 sentence interpretation. The results indicate that L2 speakers keep track of the relative frequencies of verb-subcategorization alternatives and use this information when building structure in the L2.
Chen, Jenn-Yeu; Su, Jui-Ju; Lee, Chao-Yang; O'Seaghdha, Padraig G.
Chinese and English speakers seem to hold different conceptions of time which may be related to the different codings of time in the two languages. Employing a sentence-picture matching task, we have investigated this linguistic relativity in Chinese-English bilinguals varying in English proficiency and found that those with high proficiency…
This dissertation measured the acoustic properties of the English fricatives and affricates produced by native and Chinese L2 speakers of English to identify the phonetic basis and sources of a foreign accent and to explore the mechanism involved in L2 speech production and L2 phonological acquisition at the segmental level. Based on a Network…
Ng, Manwa L; Chen, Yang
The present study examined English sentence stress produced by native Cantonese speakers who were speaking English as a second language (ESL). Cantonese ESL speakers' proficiency in English stress production as perceived by English-speaking listeners was also studied. Acoustical parameters associated with sentence stress, including fundamental frequency (F0), vowel duration, and intensity, were measured from the English sentences produced by 40 Cantonese ESL speakers. Data were compared with those obtained from 40 native speakers of American English. The speech samples were also judged for placement, degree, and naturalness of stress by eight listeners who were native speakers of American English. Results showed that Cantonese ESL speakers were able to use F0, vowel duration, and intensity to differentiate sentence stress patterns. Yet, both female and male Cantonese ESL speakers exhibited consistently higher F0 in stressed words than English speakers. Overall, Cantonese ESL speakers were found to be proficient in using duration and intensity to signal sentence stress, in a way comparable with English speakers. In addition, F0 and intensity were found to correlate closely with perceptual judgments, and the degree of stress with the naturalness of stress.
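Of the acoustic parameters measured, F0 can be estimated from a voiced frame with a simple autocorrelation peak-picking sketch like the one below. This is a generic method, not the study's analysis software, and the test signal is synthetic:

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=75.0, fmax=400.0):
    """Rough F0 estimate of a voiced frame via the autocorrelation peak,
    searched within a plausible pitch range [fmin, fmax] Hz."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr        # one 40 ms frame
frame = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)
print(round(f0_autocorr(frame, sr)))      # ~ 200
```

Vowel duration and RMS intensity, the other two stress correlates, fall out of the same framing step with even less machinery.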
Schafer, Patrick J; Serman, Maja; Arnold, Mirko; Corona-Strauss, Farah I; Strauss, Daniel J; Seidler-Fallbohmer, Brigit; Seidler, Harald
Speaker recognition in a multi-speaker environment is a complex listening task that requires effort to solve. People with hearing loss in particular show increased listening effort in demanding listening situations compared to normal-hearing people. However, a standardized method to quantify listening effort does not yet exist. Recently we have shown a possible way to determine listening effort objectively. The aim of this study was to validate the proposed objective measure in a challenging, true-to-life listening situation, and to gain insight into the influence of different hearing aid (HA) settings on listening effort using the proposed measure. To achieve this we investigated the influence of four different HA settings and two listening task difficulties (LTD) on the listening effort of people with hearing loss in a selective, real-speech listening task. HA settings A, B, and C all had adaptive compression with a static characteristic but differed in their gain and compression settings (more vs. less gain, more vs. less linear compression). Setting D had adaptive compression whose characteristic was situation-dependent. To quantify listening effort, the ongoing oscillatory EEG activity was recorded as the basis for calculating the objective measure (OLEosc). By way of comparison, a subjective listening effort score was determined on an individual basis (SLEscr). The results show that the OLEosc maps the SLEscr well in every tested condition. Furthermore, the results also suggest that OLEosc might be more sensitive to small variances in listening effort than the employed subjective rating scale.
Bonte, Milene; Hausfeld, Lars; Scharke, Wolfgang; Valente, Giancarlo; Formisano, Elia
Selective attention to relevant sound properties is essential for everyday listening situations. It enables the formation of different perceptual representations of the same acoustic input and is at the basis of flexible and goal-dependent behavior. Here, we investigated the role of the human auditory cortex in forming behavior-dependent representations of sounds. We used single-trial fMRI and analyzed cortical responses collected while subjects listened to the same speech sounds (vowels /a/, /i/, and /u/) spoken by different speakers (boy, girl, male) and performed a delayed-match-to-sample task on either speech sound or speaker identity. Univariate analyses showed a task-specific activation increase in the right superior temporal gyrus/sulcus (STG/STS) during speaker categorization and in the right posterior temporal cortex during vowel categorization. Beyond regional differences in activation levels, multivariate classification of single trial responses demonstrated that the success with which single speakers and vowels can be decoded from auditory cortical activation patterns depends on task demands and subjects' behavioral performance. Speaker/vowel classification relied on distinct but overlapping regions across the (right) mid-anterior STG/STS (speakers) and bilateral mid-posterior STG/STS (vowels), as well as the superior temporal plane including Heschl's gyrus/sulcus. The task dependency of speaker/vowel classification demonstrates that the informative fMRI response patterns reflect the top-down enhancement of behaviorally relevant sound representations. Furthermore, our findings suggest that successful selection, processing, and retention of task-relevant sound properties rely on the joint encoding of information across early and higher-order regions of the auditory cortex.
Mosko, J. D.; Stevens, K. N.; Griffin, G. R.
Acoustical analyses were conducted of words produced by four speakers in a motion stress-inducing situation. The aim of the analyses was to document the kinds of changes that occur in the vocal utterances of speakers who are exposed to motion stress and to comment on the implications of these results for the design and development of voice-interactive systems. The speakers differed markedly in the types and magnitudes of the changes that occurred in their speech. For some speakers, the stress-inducing experimental condition caused an increase in fundamental frequency, changes in the pattern of vocal fold vibration, shifts in vowel production, and changes in the relative amplitudes of sounds containing turbulence noise. All speakers showed greater variability in the experimental condition than in the more relaxed control situation. The variability was manifested in the acoustical characteristics of individual phonetic elements, particularly in unstressed syllables. The kinds of changes and variability observed serve to emphasize the limitations of speech recognition systems based on template matching of patterns that are stored in the system during a training phase. There is a need for a better understanding of these phonetic modifications and for developing ways of incorporating knowledge about these changes within a speech recognition system.
Ding, Nai; Simon, Jonathan Z
A visual scene is perceived in terms of visual objects. Similar ideas have been proposed for the analogous case of auditory scene analysis, although their hypothesized neural underpinnings have not yet been established. Here, we address this question by recording from subjects selectively listening to one of two competing speakers, either of different or the same sex, using magnetoencephalography. Individual neural representations are seen for the speech of the two speakers, with each being selectively phase locked to the rhythm of the corresponding speech stream and from which can be exclusively reconstructed the temporal envelope of that speech stream. The neural representation of the attended speech dominates responses (with latency near 100 ms) in posterior auditory cortex. Furthermore, when the intensity of the attended and background speakers is separately varied over an 8-dB range, the neural representation of the attended speech adapts only to the intensity of that speaker but not to the intensity of the background speaker, suggesting an object-level intensity gain control. In summary, these results indicate that concurrent auditory objects, even if spectrotemporally overlapping and not resolvable at the auditory periphery, are neurally encoded individually in auditory cortex and emerge as fundamental representational units for top-down attentional modulation and bottom-up neural adaptation.
This study investigates the nature of the acoustic variation in sequences of identical affricates produced by Polish learners of English. In both English and Polish, sequences of identical affricates occur across word boundaries, but only in Polish do such sequences also occur root-internally and across morpheme boundaries. In Polish, sequences of identical affricates are realized variably: either by rearticulation of both affricates, or by articulation of a single affricate with lengthened duration of either the stop or the fricative. To investigate their English productions, the subjects performed two tasks: repetition of 12 English sentences and oral responses to 17 multiple-choice questions. The tasks produced significant cross-speaker differences in the phonetics of the geminates, differences correlated with the speakers' proficiency levels in English. The more Polish-like, singly articulated long affricates were produced by 22% of the intermediate speakers but by 48% of the advanced speakers, the opposite of what one might expect. The intermediate speakers appear to have paid more attention to the phonetics of the English cues, thus producing more fully rearticulated affricates; the more advanced speakers appear to have paid less attention to the phonetics of the cues, thus reverting more to the norms of Polish pronunciation.
Derdemezis, Ekaterini; Kent, Ray D.; Fourakis, Marios; Reinicke, Emily L.; Bolt, Daniel M.
Purpose This study systematically assessed the effects of select linear predictive coding (LPC) analysis parameter manipulations on vowel formant measurements for diverse speaker groups using 4 trademarked Speech Acoustic Analysis Software Packages (SAASPs): CSL, Praat, TF32, and WaveSurfer. Method Productions of 4 words containing the corner vowels were recorded from 4 speaker groups with typical development (male and female adults and male and female children) and 4 speaker groups with Down syndrome (male and female adults and male and female children). Formant frequencies were determined from manual measurements using a consensus analysis procedure to establish formant reference values, and from the 4 SAASPs (using both the default analysis parameters and with adjustments or manipulations to select parameters). Smaller differences between values obtained from the SAASPs and the consensus analysis implied more optimal analysis parameter settings. Results Manipulations of default analysis parameters in CSL, Praat, and TF32 yielded more accurate formant measurements, though the benefit was not uniform across speaker groups and formants. In WaveSurfer, manipulations did not improve formant measurements. Conclusions The effects of analysis parameter manipulations on accuracy of formant-frequency measurements varied by SAASP, speaker group, and formant. The information from this study helps to guide clinical and research applications of SAASPs. PMID:26501214
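The LPC analysis that these software packages perform can be sketched from first principles: fit an all-pole model by the autocorrelation method with Levinson-Durbin recursion, then read formant estimates off the angles of the complex pole pairs. This is a generic textbook sketch, not any package's implementation; the demo signal is a synthetic two-resonance "vowel" with arbitrarily chosen 700 and 1200 Hz resonances:

```python
import numpy as np

def lpc_coeffs(x, order):
    """LPC polynomial a (a[0] = 1) via autocorrelation + Levinson-Durbin."""
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a[:i + 1].copy()
        a[1:i + 1] = prev[1:i + 1] + k * prev[i - 1::-1]
        err *= 1.0 - k * k
    return a

def lpc_formants(x, sr, order):
    """Formant estimates in Hz: angles of the complex LPC pole pairs."""
    roots = np.roots(lpc_coeffs(x, order))
    roots = roots[np.imag(roots) > 1e-3]      # one root per conjugate pair
    return np.sort(np.angle(roots) * sr / (2 * np.pi))

# Synthesize white noise through an all-pole filter with two resonances.
sr = 10000
rng = np.random.default_rng(0)
poles = []
for f in (700.0, 1200.0):
    z = 0.95 * np.exp(2j * np.pi * f / sr)
    poles += [z, np.conj(z)]
a_true = np.real(np.poly(poles))
e = rng.normal(size=16000)
x = np.zeros_like(e)
for n in range(len(e)):
    x[n] = e[n] - sum(a_true[k] * x[n - k] for k in range(1, 5) if n - k >= 0)
print(lpc_formants(x, sr, order=4))  # close to [700, 1200]
```

The parameter manipulations the study examines (model order, pre-emphasis, analysis window) all act on the `order` and windowing choices in `lpc_coeffs`, which is why their effect varies with speaker group: the optimal order tracks vocal tract length.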
Lass, Norman; Atkins, Traci; Squires, Rebekah
A master tape containing the randomized recordings of 20 Hispanic-, Asian-, and Arabic-accented English speakers reading a standard prose passage was presented to a group of 22 native English-speaking listeners who participated in two listening sessions. In the first session they were asked to use a 5-point listening preference rating scale. In the second session they heard the same speakers and were asked to determine the presence or absence of an accent and, if present, the type (Asian, Hispanic, Arabic, or other) and degree (mild, moderate, or severe) of accentedness. A scattergram plotting listeners' mean listening preference ratings and degree of accentedness ratings for each of the speakers in the study revealed a strong inverse relationship which yielded a statistically significant (p <0.01) correlation coefficient. Thus, the higher the degree of severity of the listeners' judged accentedness of speakers, the more negative their listening preference rating judgments of the speakers. Implications of these findings and suggestions for future research are discussed.
Islam, Md. Atiqul; Jassim, Wissam A.; Cheok, Ng Siew; Zilany, Muhammad Shamsul Arefeen
Speaker identification under noisy conditions is one of the challenging topics in the field of speech processing applications. Motivated by the fact that neural responses are robust against noise, this paper proposes a new speaker identification system using 2-D neurograms constructed from the responses of a physiologically-based computational model of the auditory periphery. The responses of auditory-nerve fibers over a wide range of characteristic frequencies were simulated for speech signals to construct neurograms. The neurogram coefficients were trained using the well-known Gaussian mixture model-universal background model classification technique to generate an identity model for each speaker. In this study, three text-independent and one text-dependent speaker databases were employed to test the identification performance of the proposed method. Also, the robustness of the proposed method was investigated using speech signals distorted by three types of noise (white Gaussian, pink, and street noise) at different signal-to-noise ratios. The identification results of the proposed neural-response-based method were compared to the performances of traditional speaker identification methods using features such as Mel-frequency cepstral coefficients, Gamma-tone frequency cepstral coefficients, and frequency domain linear prediction. Although the classification accuracy achieved by the proposed method was comparable to the performance of those traditional techniques in quiet, the new feature was found to provide lower classification error rates under noisy environments. PMID:27392046
Hanson, H M; Chuang, E S
Acoustic measurements believed to reflect glottal characteristics were made on recordings collected from 21 male speakers. The waveforms and spectra of three nonhigh vowels (/æ, ʌ, ɛ/) were analyzed to obtain acoustic parameters related to first-formant bandwidth, open quotient, spectral tilt, and aspiration noise. Comparisons were made with previous results obtained for 22 female speakers [H. M. Hanson, J. Acoust. Soc. Am. 101, 466-481 (1997)]. While there is considerable overlap across gender, the male data show lower average values and less interspeaker variation for all measures. In particular, the amplitude of the first harmonic relative to that of the third formant is 9.6 dB lower for the male speakers than for the female speakers, suggesting that spectral tilt is an especially significant parameter for differentiating male and female speech. These findings are consistent with fiberscopic studies which have shown that males tend to have a more complete glottal closure, leading to less energy loss at the glottis and less spectral tilt. Observations of the speech waveforms and spectra suggest the presence of a second glottal excitation within a glottal period for some of the male speakers. Possible causes and acoustic consequences of these second excitations are discussed.
This paper presents and evaluates a modular/hybrid connectionist system for speaker identification. Modularity has emerged as a powerful technique for reducing the complexity of connectionist systems, allowing a priori knowledge to be incorporated into their design. In problems where training data are scarce, such modular systems are likely to generalize significantly better than a monolithic connectionist system. In addition, modules are not restricted to be connectionist: hybrid systems, with e.g. Hidden Markov Models (HMMs), can be designed, combining the advantages of connectionist and non-connectionist approaches. Text independent speaker identification is an inherently complex task where the amount of training data is often limited. It thus provides an ideal domain to test the validity of the modular/hybrid connectionist approach. An architecture is developed in this paper which achieves this identification, based upon the cooperation of several connectionist modules, together with an HMM module. When tested on a population of 102 speakers extracted from the DARPA-TIMIT database, perfect identification was obtained. Overall, our recognition results are among the best for any text-independent speaker identification system handling this population size. In a specific comparison with a system based on multivariate auto-regressive models, the modular/hybrid connectionist approach was found to be significantly better in terms of both accuracy and speed. Our design also allows for easy incorporation of new speakers.
Kim, Hye Jin; Yang, Woo Seok; No, Kwangsoo
The vibrational characteristics of 3 types of acoustic diaphragms are investigated to enhance the output acoustic performance of the piezoelectric ceramic speaker in a low-frequency range. In order to achieve both a higher output sound pressure level and a wider frequency range for the piezoelectric speaker, we have proposed a rubber/resin bi-layer acoustic diaphragm. The theoretical square-root dependence of the fundamental resonant frequency on the thickness and Young's modulus of the acoustic diaphragm was verified by finite-element analysis simulation and laser scanning vibrometer measurement. The simulated resonant frequencies for each diaphragm correspond well to the measured results. From the simulated and measured resonant frequency results, it is found that the fundamental resonant frequency of the piezoelectric ceramic speaker can be designed by adjusting the thickness ratio of the rubber/resin bi-layer acoustic diaphragm. Compared with a commercial piezoelectric speaker, the fabricated piezoelectric ceramic speaker with the rubber/resin bi-layer diaphragm has at least 10 dB higher sound pressure in the low-frequency range below 1 kHz.
Cetingül, H Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, A Murat
There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage, spatial, and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the speech-reading application.
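The feature-selection criterion described in the abstract above — preferring features that best discriminate individual speakers — can be illustrated with a per-feature Fisher ratio (between-class spread over within-class spread). The data, feature indices, and two-speaker setup below are invented; the paper's actual analysis is a richer two-stage spatial and temporal procedure:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented feature matrices: rows = frames, columns = candidate lip-motion
# features, for two speakers. Feature 0 separates the speakers; feature 1
# does not.
spk_a = np.column_stack([rng.normal(0, 1, 300), rng.normal(0, 1, 300)])
spk_b = np.column_stack([rng.normal(4, 1, 300), rng.normal(0, 1, 300)])

def fisher_ratio(x, y):
    # Squared mean difference divided by pooled variance, per feature: a
    # common scalar proxy for class discrimination.
    return (x.mean(axis=0) - y.mean(axis=0)) ** 2 / (x.var(axis=0) + y.var(axis=0))

ratios = fisher_ratio(spk_a, spk_b)
print(ratios.argmax())  # index of the most discriminative feature
```

Ranking candidate features by such a ratio (or a multivariate generalization like LDA) is the standard first step before handing the selected features to a recognizer.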
Lev-Ari, Shiri; Peperkamp, Sharon
Speech perception is known to be influenced by listeners' expectations of the speaker. This paper tests whether the demographic makeup of individuals' communities can influence their perception of foreign sounds by influencing their expectations of the language. Using online experiments with participants from all across the U.S. and matched census data on the proportion of Spanish and other foreign language speakers in participants' communities, this paper shows that the demographic makeup of individuals' communities influences their expectations of foreign languages to have an alveolar trill versus a tap (Experiment 1), as well as their consequent perception of these sounds (Experiment 2). Thus, the paper shows that while individuals' expectations for a foreign language to have a trill occasionally lead them to misperceive a tap in a foreign language as a trill, a higher proportion of non-trill language speakers in one's community decreases this likelihood. These results show that individuals' environment can influence their perception by shaping their linguistic expectations.
It is typically assumed that when orthography is translated silently into phonology (i.e., when reading silently), the phonological representation is equivalent to the spoken form or, at least, the surface phonemic form. The research presented here demonstrates that the phonological representation is likely to be more abstract than this, and is…
Lim, Valerie P. C.; Lincoln, Michelle; Chan, Yiong Huak; Onslow, Mark
Purpose: English and Mandarin are the 2 most spoken languages in the world, yet it is not known how stuttering manifests in English-Mandarin bilinguals. In this research, the authors investigated whether the severity and type of stuttering is different in English and Mandarin in English-Mandarin bilinguals, and whether this difference was…
Lagos, Cristián; Espinoza, Marco; Rojas, Darío
In this paper, we analyse the cultural models (or folk theory of language) that the Mapuche intellectual elite have about Mapudungun, the native language of the Mapuche people still spoken today in Chile as the major minority language. Our theoretical frame is folk linguistics and studies of language ideology, but we have also taken an applied…
Bhagwat, Jui; Casasola, Marianella
Two experiments examined when monolingual, English-learning 19-month-old infants learn a second object label. Two experimenters sat together. One labeled a novel object with one novel label, whereas the other labeled the same object with a different label in either the same or a different language. Infants were tested on their comprehension of each label immediately following its presentation. Infants mapped the first label at above chance levels, but they did so with the second label only when requested by the speaker who provided it (Experiment 1) or when the second experimenter labeled the object in a different language (Experiment 2). These results show that 19-month-olds learn second object labels but do not readily generalize them across speakers of the same language. The results highlight how speaker and language spoken guide infants' acceptance of second labels, supporting sociopragmatic views of word learning.
Karadaghi, Rawande; Hertlein, Heinz; Ariyaeeinia, Aladdin
This paper concerns an important category of applications of open-set speaker identification in criminal investigation, which involves operating with speech of short and varied duration. The study presents investigations into the adverse effects of such an operating condition on the accuracy of open-set speaker identification, based on both GMM-UBM and i-vector approaches. The experiments are conducted using a protocol developed for the identification task, based on the NIST speaker recognition evaluation corpus of 2008. In order to closely cover the real-world operating conditions in the considered application area, the study includes experiments with various combinations of training and testing data duration. The paper details the characteristics of the experimental investigations conducted and provides a thorough analysis of the results obtained.
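What makes the task above "open-set" is the reject option: the test speaker may be none of the enrolled speakers. A minimal sketch of that decision rule follows; the speaker names, scores, and threshold are illustrative only, and in practice the threshold is tuned on development data:

```python
# Hypothetical match scores of one test utterance against each enrolled
# model (higher = better match); values are invented for illustration.
scores = {"spk_01": -1.8, "spk_02": 2.7, "spk_03": 0.4}

THRESHOLD = 1.0  # would be tuned on held-out data in a real system

def open_set_identify(scores, threshold):
    best = max(scores, key=scores.get)
    # Closed-set identification stops at `best`; the open-set variant adds
    # a reject option when even the best score is unconvincing.
    return best if scores[best] >= threshold else "unknown"

print(open_set_identify(scores, THRESHOLD))
print(open_set_identify({"spk_01": -1.8}, THRESHOLD))
```

Short test utterances degrade both stages: the scores themselves become noisier, and a fixed threshold then trades misses against false identifications less favorably, which is the effect the study quantifies.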
Green, David William
This paper proposes that different experimental contexts (single or dual language contexts) permit different neural loci at which words in the target language can be selected. However, in order to develop a fuller understanding of the neural circuit mediating language control we need to consider the community context in which bilingual speakers typically use their two languages (the behavioral ecology of bilingual speakers). The contrast between speakers from code-switching and non-code-switching communities offers a way to increase our understanding of the cortical, subcortical and, in particular, cerebellar structures involved in language control. It will also help us identify the non-verbal behavioral correlates associated with these control processes.
Ohms, Verena R.; Gill, Arike; Van Heijningen, Caroline A. A.; Beckers, Gabriel J. L.; ten Cate, Carel
Humans readily distinguish spoken words that closely resemble each other in acoustic structure, irrespective of audible differences between individual voices or sex of the speakers. There is an ongoing debate about whether the ability to form phonetic categories that underlie such distinctions indicates the presence of uniquely evolved, speech-linked perceptual abilities, or is based on more general ones shared with other species. We demonstrate that zebra finches (Taeniopygia guttata) can discriminate and categorize monosyllabic words that differ in their vowel and transfer this categorization to the same words spoken by novel speakers independent of the sex of the voices. Our analysis indicates that the birds, like humans, use intrinsic and extrinsic speaker normalization to make the categorization. This finding shows that there is no need to invoke special mechanisms, evolved together with language, to explain this feature of speech perception. PMID:19955157
Ryskin, Rachel A; Wang, Ranxiao Frances; Brown-Schmidt, Sarah
Little is known about how listeners represent another person's spatial perspective during language processing (e.g., two people looking at a map from different angles). Can listeners use contextual cues such as speaker identity to access a representation of the interlocutor's spatial perspective? In two eye-tracking experiments, participants received auditory instructions to move objects around a screen from two randomly alternating spatial perspectives (45° vs. 315° or 135° vs. 225° rotations from the participant's viewpoint). Instructions were spoken either by one voice, where the speaker's perspective switched at random, or by two voices, where each speaker maintained one perspective. Analysis of participant eye-gaze showed that interpretation of the instructions improved when each viewpoint was associated with a different voice. These findings demonstrate that listeners can learn mappings between individual talkers and viewpoints, and use these mappings to guide online language processing.
Sawada, Hideyuki; Ohkado, Minoru
Humans are able to exchange information smoothly by voice in a variety of situations, such as in a noisy crowd or in the presence of multiple speakers. We are able to detect the position of a sound source in 3D space, extract a particular sound from a mixture, and recognize who is talking. Realizing this mechanism with a computer opens up new applications: recording sound with high quality by reducing noise, presenting a clarified sound, and microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the location of the speaker and the individual characteristics of the voice. The study will be applied to develop an adaptive auditory system for a mobile robot that collaborates with a factory worker.
Gable, Todd J.; Ng, Lawrence C.; Holzrichter, John F.; Burnett, Greg C.
A method and system for speech characterization. One embodiment includes a method for speaker verification which includes collecting data from a speaker, wherein the data comprises acoustic data and non-acoustic data. The data is used to generate a template that includes a first set of "template" parameters. The method further includes receiving a real-time identity claim from a claimant, and using acoustic data and non-acoustic data from the identity claim to generate a second set of parameters. The method further includes comparing the first set of parameters to the second set of parameters to determine whether the claimant is the speaker. The first set of parameters and the second set of parameters include at least one purely non-acoustic parameter, including a non-acoustic glottal shape parameter derived from averaging multiple glottal cycle waveforms.
Papakyritsis, Ioannis; Müller, Nicole
The study reported in this paper investigated the abilities of Greek speakers with dysarthria to signal lexical stress at the single word level. Three speakers with dysarthria and two unimpaired control participants were recorded completing a repetition task of a list of words consisting of minimal pairs of Greek disyllabic words contrasted by lexical stress location only. Fourteen listeners were asked to determine the attempted stress location for each word pair. Acoustic analyses of duration and intensity ratios, both within and across words, were undertaken to identify possible acoustic correlates of the listeners' judgments concerning stress location. Acoustic and perceptual data indicate that while each participant with dysarthria in this study had some difficulty in signaling stress unambiguously, the pattern of difficulty was different for each speaker. Further, it was found that the relationship between the listeners' judgments of stress location and the acoustic data was not conclusive.
Searl, Jeff; Evitts, Paul; Davis, William J
The purpose of this study was to examine the effect of a thin pseudopalate on the speech of normal adults. It was hypothesized that speech would be initially altered, but speakers would adapt quickly to the device. Eleven speakers produced words without the pseudopalate and at six intervals with the appliance in place. Consonant acoustics were changed initially, but returned to baseline within approximately 30 minutes. Perceptually, consonant identification and distortion ratings were unchanged when wearing the pseudopalate. Results suggest an initial alteration to speech detectable acoustically, but rapid adaptation, for most speakers. Investigators using thin pseudopalates must recognize that speech is altered, at least initially, and account for this in their procedures and interpretation of results.
Kim, Ji-Hye; Montrul, Silvina; Yoon, James
This study investigates the potential incomplete acquisition of binding interpretations in Korean-English bilinguals by asking whether and how the majority language of these bilinguals (English) influences their family or heritage language (Korean), especially when exposure to and use of English starts very early. The experiment tested the…
Puah, Yann-Yann; Ting, Su-Hie
The study examines the influence of gender, age and socio-economic status on attitudes of Foochow and Hokkien towards their ethnic language and Mandarin. The matched guise test results of 120 Foochow and 120 Hokkien participants in Kuching, Malaysia, showed positive attitudes towards Mandarin on all the 15 traits. The Hokkien participants were…
Herbert, Ruth; Anderson, Elizabeth; Best, Wendy; Gregory, Emma
Theories of spoken word production agree that semantic and phonological representations are activated in spoken word production. There is less agreement concerning the role of syntax. In this study we investigated noun syntax activation in English bare noun naming, using mass and count nouns. Fourteen healthy controls and 13 speakers with aphasia took part. Participants named mass and count nouns, and completed a related noun syntax judgement task. We analysed speakers' noun syntax knowledge when naming accurately, and when making errors in production. Healthy speakers' noun syntax judgement was accurate for words they named correctly, but this did not correlate with naming accuracy. Speakers with aphasia varied in their noun syntax judgement, and this also did not correlate with naming accuracy. Healthy speakers' syntax for semantic errors was less accurate, as was that for speakers with aphasia. For phonological errors half the participants with aphasia could access syntax, half could not, indicating two types of phonological error. Individual differences were found in no responses. Finally, we found no effect of frequency for any of the above. The lack of a relationship between syntax and naming accuracy suggests that syntax is available, but access is not obligatory. This finding supports theories incorporating non-obligatory syntactic processing, which is independent of phonological access. The semantic error data are best explained within such a theory where there is damage to phonological access and hence to independent syntax. For the aphasia group we identify two types of phonological error, one implicating syntax and phonology, and one implicating phonology only, again supporting independent access to these systems. Overall the data support a model within which syntax is independent of phonology, and activation of syntax operates flexibly dependent on task demands and integrity of other processing routines.
Ferguson, Ian; Phillips, Andrew W.; Lin, Michelle
Introduction: Although continuing medical education (CME) presentations are common across health professions, it is unknown whether slide design is independently associated with audience evaluations of the speaker. Based on the conceptual framework of Mayer’s theory of multimedia learning, this study aimed to determine whether image use and text density in presentation slides are associated with overall speaker evaluations. Methods: This retrospective analysis of six sequential CME conferences (two annual emergency medicine conferences over a three-year period) used a mixed linear regression model to assess whether post-conference speaker evaluations were associated with image fraction (percentage of image-based slides per presentation) and text density (number of words per slide). Results: A total of 105 unique lectures were given by 49 faculty members, and 1,222 evaluations (70.1% response rate) were available for analysis. On average, 47.4% (SD=25.36) of slides had at least one educationally-relevant image (image fraction). Image fraction significantly predicted overall higher evaluation scores [F(1, 100.676)=6.158, p=0.015] in the mixed linear regression model. The mean (SD) text density was 25.61 (8.14) words/slide but was not a significant predictor [F(1, 86.293)=0.55, p=0.815]. Of note, the individual speaker [χ2(1)=2.952, p=0.003] and speaker seniority [F(3, 59.713)=4.083, p=0.011] significantly predicted higher scores. Conclusion: This is the first published study to date assessing the linkage between slide design and CME speaker evaluations by an audience of practicing clinicians. The incorporation of images was associated with higher evaluation scores, in alignment with Mayer’s theory of multimedia learning. Contrary to this theory, however, text density showed no significant association, suggesting that these scores may be multifactorial. Professional development efforts should focus on teaching best practices in both slide design and presentation.
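The regression setup in the abstract above can be approximated in a few lines. The sketch below uses ordinary least squares on simulated data as a simplified stand-in for the paper's mixed linear model (which additionally includes per-speaker random effects); every number here is invented, including the planted effect of image fraction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated lecture-level data loosely mirroring the study's variables.
n = 200
image_fraction = rng.uniform(0, 1, n)      # fraction of slides with an image
text_density = rng.uniform(10, 45, n)      # words per slide
# Planted relationship: image fraction matters, text density does not.
score = 3.5 + 0.8 * image_fraction + rng.normal(0, 0.3, n)

# OLS via the normal equations: score ~ 1 + image_fraction + text_density.
X = np.column_stack([np.ones(n), image_fraction, text_density])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(f"intercept={beta[0]:.2f}, image={beta[1]:.2f}, text={beta[2]:.4f}")
```

The study's mixed model differs in that evaluations of lectures by the same speaker are correlated, so a random intercept per speaker is added; that is what lets the authors report the individual speaker as a separate significant predictor.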
ROBUST SPEECH PROCESSING & RECOGNITION: SPEAKER ID, LANGUAGE ID, SPEECH RECOGNITION/KEYWORD SPOTTING, DIARIZATION/CO-CHANNEL/ENVIRONMENTAL
The purpose of this paper is to distinguish the different types of television speakers who, alongside health professionals, talk about eight health-related behaviours. The analysis covered three weeks of television from two broadcasters (756 hours of television). We extracted from the television programs the messages and lifestyle images transmitted to the public on the above-mentioned behaviours. We found that the different television speakers (health professionals, journalists, show hosts and stars, everyday people, fictional characters, advertising and social marketing) address the studied behaviours to very different degrees, and thus contribute differently to the creation of a health representation that is not always in line with health promotion measures.
Spanish Courses for Spanish Speakers: Partial Listing of Programs. [and] Spanish Materials Being Used in Courses for Native Spanish Speakers at the Secondary Level. CLEAR Materials Resource Series, Numbers 3 and 5.
Dreyfus, Dan; Willetts, Karen
Two numbers of the CLEAR Materials Resource Series that both deal with teaching Spanish to native Spanish speakers have been combined. Number three provides a brief description of the courses and curricula developed by various school districts for the teaching of Spanish language arts to native speakers of Spanish at the elementary and secondary…
Investigates native speakers' (NS) perceptions of coherence and comprehensibility of nonnative speakers' writing and talk that lacks or misuses grammatical cohesive devices. NS readers of NNS texts with missing cohesive devices assumed coherence and imposed coherence on the text by adding grammatical cohesive devices missing in the original,…
Dagenais, Paul A; Stallworth, Jamequa A
The purpose of this study was to determine the influence of dialect upon the perception of dysarthric speech. Speakers and listeners self-identified as either Caucasian American or African American. Three speakers were Caucasian American and three were African American. Four speakers had experienced a CVA and were dysarthric. Listeners were age matched and equally divided for gender. Readers recorded 14-word sentences from the Assessment of Intelligibility of Dysarthric Speech. Listeners provided ratings of intelligibility, comprehensibility, and acceptability. Own-race biases were found for all measures; however, significant findings were found for intelligibility and comprehensibility in that the Caucasian Americans provided significantly higher scores for Caucasian American speakers. Clinical implications are discussed.
McCaffrey Morrison, Helen
Locus equations (LEs) were derived from consonant-vowel-consonant (CVC) syllables produced by four speakers with profound hearing loss. Group data indicated that LE functions obtained for the separate CVC productions initiated by /b/, /d/, and /g/ were less well-separated in acoustic space than those obtained from speakers with normal hearing. A…
This article reports on a qualitative investigation into the ways that speakers of other languages negotiate their identities in an English-medium international school. As elective-bilinguals, it is generally assumed that such individuals are exempt from unfavourable positioning. As beginning speakers of English, however, older participants in…
This document shares a vision for a 4-year curriculum for Heritage Speakers of Spanish (HSS)/Spanish for Native Speakers (SNS), describing a course developed for SNS students within Mercy High School in San Francisco, California. The vision foresees an ever-increasing number of HSS and SNS students completing college level degree programs then…
This study analysed the extent to which literate native speakers of a language with a phonemic alphabetic orthography rely on their first language (L1) orthography during second language (L2) speech production of a language that has a morphophonemic alphabetic orthography. The production of the English flapping rule by 15 adult native speakers of…
A growing population of English Language Learners (ELLs) is U.S.-born, a phenomenon that raises new questions for Teaching English to Speakers of Other Languages (TESOL). This paper proposes the need for an improved understanding of how the research literature on heritage language speakers can inform ESL education. (Contains 1 figure and 5 notes.)
The present study examines sociolinguistic features of a particular speech act, paying compliments, by comparing and contrasting native Chinese and native American speakers' performances. By focusing on a relatively understudied speaker group such as the Chinese, typically regarded as having rules of speaking and social norms very different from…
Montrul, Silvina; Foote, Rebecca; Perpinan, Silvia
This study investigates knowledge of gender agreement in Spanish L2 learners and heritage speakers, who differ in age and context/mode of acquisition. On some current theoretical accounts, persistent difficulty with grammatical gender in adult L2 acquisition is due to age. These accounts predict that heritage speakers should be more accurate on…
Reports the findings of a study in which transfer of verb properties was investigated via syntactic data elicited from second language learners. The performance of Hindi-Urdu speakers on tests of English causatives was compared with that of Vietnamese speakers, because there are five significant differences between causativization patterns in…
Heritage language (HL) speakers have received scholarly attention in recent years as an interdisciplinary research theme, but relatively less attention has been paid to their demographics. Existing studies of HL speakers' demographics often focus on young children in areas of high immigrant concentration (i.e., California, Florida, and New York);…
Hayes-Harb, Rachel; Watzinger-Tharp, Johanna
We explore the relationship between accentedness and intelligibility, and investigate how listeners' beliefs about nonnative speech interact with their accentedness and intelligibility judgments. Native German speakers and native English learners of German produced German sentences, which were presented to 12 native German speakers in accentedness…
Usage-based theories of language learning suggest that native speakers of a language are acutely aware of formulaic language due in large part to frequency effects. Corpora and data-driven learning can offer useful insights into frequent patterns of naturally occurring language to second/foreign language learners who, unlike native speakers, are…
The study investigated the acquisition of Modern Standard Arabic (MSA) by second language (L2) learners and by heritage speakers of the colloquial varieties of Arabic. The study focused on three questions: (1) whether heritage speakers who enroll in college-level elementary MSA classes have an advantage over their L2 counterparts, (2) whether any…
Stewart, Andrew J.; Haigh, Matthew; Ferguson, Heather J.
Statements of the form if… then… can be used to communicate conditional speech acts such as tips and promises. Conditional promises require the speaker to have perceived control over the outcome event, whereas conditional tips do not. In an eye-tracking study, we examined whether readers are sensitive to information about perceived speaker control…
Singleton, Jenny L.; Morgan, Dianne; DiGello, Elizabeth; Wiles, Jill; Rivers, Rachel
The written English vocabulary of 72 deaf elementary school students of various proficiency levels in American Sign Language (ASL) was compared with the performance of 60 hearing English-as-a-second-language (ESL) speakers and 61 hearing monolingual speakers of English, all of similar age. Students were asked to retell "The Tortoise and the Hare"…
This dissertation explores the second language acquisition of Mandarin Chinese tones by speakers of non-tonal languages within the framework of Optimality Theory. The effects of three L1s are analyzed: American English, a stress-accent language; Tokyo Japanese, a lexical pitch accent language; and Seoul Korean, a non-stress and non-pitch accent…
Consistent with the notion of learning as changing participation (Lave & Wenger, 1991; Rogoff, 1998; Sfard, 1998; Young & Miller, 2004), the present qualitative study investigated how social interaction between learners of Japanese as a foreign language and native speaker classroom guests contributed to the students' use of second language…
Riebe, L.; Sibson, R.; Roepen, D.; Meakins, K.
This study provides insights into the perceptions and expectations of Australian undergraduate business students (n=150) regarding the incorporation of guest speakers into the curriculum of a leadership unit focused on employability skills development. The authors adopted a mixed methods approach. A survey was conducted, with quantitative results…
Duffield, Nigel G.; Matsuo, Ayumi
This article examines sensitivity to structural parallelism in verb phrase ellipsis constructions in English native speakers as well as in three groups of advanced second language (L2) learners. The results of a set of experiments, based on those of Tanenhaus and Carlson (1990), reveal subtle but reliable differences among the various learner…
This study investigated the acquisition of the Spanish clitic se by English native speakers in passive, middle, and impersonal constructions. Little research has been done on this topic in SLA within a UG framework (Bayona, 2005; Bruhn de Garavito, 1999). VanPatten (2004) proposed the Processing Instruction (PI) model arguing for the necessity of…
Potowski, Kim; Jegerski, Jill; Morgan-Short, Kara
The current study compared the effects of two second language (L2) instruction types--processing instruction (VanPatten, 2004) and traditional output-based instruction--on the development of the Spanish past subjunctive among U.S. Spanish heritage language speakers and traditional L2 learners. After exposure to instruction, both the heritage…
Sprinkle, Evelyn C.; Miguel, Caio F.
The current study assessed the use of standard conditional discrimination (i.e., listener) and textual/tact (i.e., speaker) training in the establishment of equivalence classes containing dictated names, tacts/textual responses, pictures and printed words. Four children (ages 5 to 7 years) diagnosed with autism were taught to select pictures and…
Kraat, Arlene W.
The report integrates findings of an international state-of-the-art study of augmentative communication, focusing on the interaction between a person using a communication aid and an able-bodied, natural speaker. Chapter titles and selected subtopics are as follows: (1) "Beyond Symbols and Switches: The Study of Communication Aid Use";…
Morrison, Geoffrey Stewart; Sahito, Farhan Hyder; Jardine, Gaëlle; Djokic, Djordje; Clavet, Sophie; Berghs, Sabine; Goemans Dorny, Caroline
A survey was conducted of the use of speaker identification by law enforcement agencies around the world. A questionnaire was circulated to law enforcement agencies in the 190 member countries of INTERPOL. 91 responses were received from 69 countries. 44 respondents reported that they had speaker identification capabilities in house or via external laboratories. Half of these came from Europe. 28 respondents reported that they had databases of audio recordings of speakers. The clearest pattern in the responses was that of diversity. A variety of different approaches to speaker identification were used: The human-supervised-automatic approach was the most popular in North America, the auditory-acoustic-phonetic approach was the most popular in Europe, and the spectrographic/auditory-spectrographic approach was the most popular in Africa, Asia, the Middle East, and South and Central America. Globally, and in Europe, the most popular framework for reporting conclusions was identification/exclusion/inconclusive. In Europe, the second most popular framework was the use of verbal likelihood ratio scales.
Cummings, Melbourne S.
Bishop Henry McNeal Turner, a journalist and speaker, headed a back-to-Africa movement in the second half of the nineteenth century that was one of the first black rhetorical movements to meet the challenges of institutionalized racism in the United States. Turner was a preacher in the African Methodist Episcopal Church, becoming first an elder…
Tesink, Cathelijne M. J. Y.; Petersson, Karl Magnus; van Berkum, Jos J. A.; van den Brink, Danielle; Buitelaar, Jan K.; Hagoort, Peter
When interpreting a message, a listener takes into account several sources of linguistic and extralinguistic information. Here we focused on one particular form of extralinguistic information, certain speaker characteristics as conveyed by the voice. Using functional magnetic resonance imaging, we examined the neural structures involved in the…
Penrose, John M.
Students who are nonnative speakers of English are both a major component of today's diverse student population and also a special constituency in business communication classrooms. They may be foreign students or resident students who have primary languages other than English. Business communication instructors face major challenges conducting…
Boutte, Gloria Swindler; Johnson, George L., Jr.
This article focuses on the development and experiences of two African American Language speakers who are on the precipice of biliteracy and bilingualism. Using a composite counterstory that integrates samples of the girls' language during daily routines as a critical race theoretical analytic tool, we examine their language virtuosity as…
Albirini, Abdulkafi; Benmamoun, Elabbas; Chakrani, Brahim
Heritage language acquisition has been characterized by various asymmetries, including the differential acquisition rates of various linguistic areas and the unbalanced acquisition of different categories within a single area. This paper examines Arabic heritage speakers' knowledge of subject-verb agreement versus noun-adjective agreement with the…
This dissertation examines the acquisition of object clitic placement in Standard Italian by heritage speakers (HSs) of non-standard Italian dialects. It compares two different groups of Standard Italian learners--Northern Italian dialect HSs and Southern Italian dialect HSs--whose heritage dialects contrast with each other in clitic word order.…
Duran, Richard P.
Recent cognitive research concerned with training of word recognition skills and vocabulary skills in English monolinguals has implications for second language learning theory and the teaching of English reading skills to native Spanish speakers. Researchers in reading development, cognitive psychology, and second language proficiency assessment…
The linguistic relativity hypothesis proposes that speakers of different languages perceive and conceptualize the world differently, but do their brains reflect these differences? In English, most nouns do not provide linguistic clues to their categories, whereas most Mandarin Chinese nouns provide explicit category information, either…
Kan, Pui Fong; Sadagopan, Neeraja; Janich, Lauren; Andrade, Marixa
Purpose: This study examines the effects of the levels of speech practice on fast mapping in monolingual and bilingual speakers. Method: Participants were 30 English-speaking monolingual and 30 Spanish-English bilingual young adults. Each participant was randomly assigned to 1 of 3 practice conditions prior to the fast-mapping task: (a) intensive…
Pittman, Ramona T.; Joshi, R. Malatesha; Carreker, Suzanne
The purpose of this eight-week study was to provide explicit instruction to improve spelling among 124 sixth-grade students who are speakers of African American English (AAE). Two classroom teachers taught 14 different language arts class sections. The research design was a pretest/posttest/posttest design using a wait-list control. The treatment group…
Diez-Bedmar, Maria Belen; Perez-Paredes, Pascual
Online collaborative writing tasks are frequently undertaken in forums and wikis. Variation between these two communication modes has yet to be examined, particularly type of feedback and its effects. We investigated the type of feedback and the impact of English native-speakers' feedback on Spanish peers' discourse restructuring in the context of…
So, Connie K.; Attina, Virginie
This study examined the effect of native language background on listeners' perception of native and non-native vowels spoken by native (Hong Kong Cantonese) and non-native (Mandarin and Australian English) speakers. Listeners completed discrimination and identification tasks with and without visual cues in clear and noisy conditions. Results…
Chrabaszcz, Anna; Winn, Matthew; Lin, Candise Y.; Idsardi, William J.
Purpose: This study investigated how listeners' native language affects their weighting of acoustic cues (such as vowel quality, pitch, duration, and intensity) in the perception of contrastive word stress. Method: Native speakers (N = 45) of typologically diverse languages (English, Russian, and Mandarin) performed a stress identification…
Nasals are cross-linguistically susceptible to change, especially in the syllable final position. Acoustic reports on Mandarin nasal production have recently shown that the syllable-final distinction is frequently dropped. Few studies, however, have addressed the issue of perceptual processing in Mandarin nasals for L1 and L2 speakers of Mandarin…
Hanna, Joy E.; Brennan, Susan E.
In two experiments, we explored the time course and flexibility with which speakers' eye gaze can be used to disambiguate referring expressions in spontaneous dialog. Naive director/matcher pairs were separated by a barrier and saw each other's faces but not their displays. Displays held identical objects, with the matcher's arranged in a row and…
Fukumura, Kumiko; van Gompel, Roger P. G.
We report two experiments that investigated the widely held assumption that speakers use the addressee's discourse model when choosing referring expressions (e.g., Ariel, 1990; Chafe, 1994; Givon, 1983; Prince, 1985), by manipulating whether the addressee could hear the immediately preceding linguistic context. Experiment 1 showed that speakers…
Given the rising prominence of nonstandard varieties of English around the world (Jenkins 2007), learners of English as a second language are increasingly called on to communicate with speakers of both native and non-native nonstandard English varieties. In many classrooms around the world, however, learners continue to be exposed only to…
Tocaimaza-Hatch, C. Cecilia
This project investigated vocabulary learning from a sociocultural perspective--in particular, the way in which lexical knowledge was mediated in Spanish second language (L2) learners' and native speakers' (NSs') interactions. Nine students who were enrolled in an advanced conversation course completed an oral portfolio assignment consisting of…
Roberts, James B.; Sawyer, Chris R.; Behnke, Ralph R.
Recent studies on the anxiety patterns of public speakers have generally supported perspectives on emotion from the field of neurobiology. Without relying on highly invasive or cumbersome technology, much of the biology of speech anxiety has been derived from heart rate studies of physiological arousal rather than examining more direct evidence…
Hazan, Valerie; Simpson, Andrew
Extends the findings of a study that found that increasing the salience of perceptually important regions of nonsense word and sentence materials aids speech perception in noise, by investigating the robustness of these enhancement techniques in improving consonant intelligibility for a range of different speakers and for groups of listeners with…
Horton, Philip B.
Describes a student laboratory project involving the design of an "acoustic suspension speaker system." The characteristics of the loudspeaker used are measured as an extension of the inertia-balance experiment. The experiment may be extended to a study of Helmholtz resonators, coupled oscillators, electromagnetic forces, thermodynamics and…
On June 15-19, 2015, the Speakers Bureau hosted EPA’s 5th Annual Science of Climate Change Workshop in Research Triangle Park, bringing in a group of high-school students eager to learn about the science behind taking action on climate change.
Zhang, Jie; Anderson, Richard C.; Wang, Qiuying; Packard, Jerome; Wu, Xinchun; Tang, Shan; Ke, Xiaoling
Knowledge of compound word structures in Chinese and English was investigated, comparing 435 Chinese and 258 Americans, including second, fourth, and sixth graders, and college undergraduates. As anticipated, the results revealed that Chinese speakers performed better on a word structure analogy task than their English-speaking counterparts. Also,…
Zhang, Jianliang; Kalinowski, Joseph; Saltuklaroglu, Tim; Hudock, Daniel
Background: Previous studies have found simultaneous increases in skin conductance response and decreases in heart rate when normally fluent speakers watched and listened to stuttered speech compared with fluent speech, suggesting that stuttering induces arousal and emotional unpleasantness in listeners. However, physiological responses of persons…
This variational pragmatics (VP) study investigates the similarities and differences of compliment responses (CR) between Thai and Punjabi speakers of English in Thailand, focusing on the strategies used in CR when the microsociolinguistic variables are integrated into the Discourse Completion Task (DCT). The participants were 20 Thai and 20…
Xu, Jing; Bull, Susan
Despite holding advanced language qualifications, many overseas students studying at English-speaking universities still have difficulties in formulating grammatically correct sentences. This article introduces an "independent open learner model" for advanced second language speakers of English, which confronts students with the state of their…
Chamorro, Gloria; Sturt, Patrick; Sorace, Antonella
Previous research has shown L1 attrition to be restricted to structures at the interfaces between syntax and pragmatics, but not to occur with syntactic properties that do not involve such interfaces ("Interface Hypothesis", Sorace and Filiaci in "Anaphora resolution in near-native speakers of Italian." "Second Lang…
Omar, Alwiya S.
A study investigated the production of conventional conversational openings by five advanced learners of Kiswahili with experience in the Kiswahili speaking environment. Native speakers of Kiswahili usually engage in lengthy openings including several phatic inquiries (PIs) and phatic responses (PRs). The number and manner in which the PIs and PRs…
Soley, Gaye; Sebastián-Gallés, Núria
Infants show attentional biases for certain individuals over others based on various cues. However, the role of these biases in shaping infants' preferences and learning is not clear. This study asked whether infants' preference for native speakers (Kinzler, Dupoux, & Spelke, 2007) would modulate their preferences for tunes. After getting…
Xuan, Pham Thi Thanh
Few studies have focused on the identity formation of non-native English speaking teachers (NNESTs) as legitimate speakers and teachers of English. Drawing on Norton's (2000) poststructuralist theory of identity as a process of struggling and changing, this study examined whether and how Asian international students studying for a Masters in…
Carminati, Maria Nella; Knoeferle, Pia
We report two visual-world eye-tracking experiments that investigated how and with which time course emotional information from a speaker's face affects younger (N = 32, Mean age = 23) and older (N = 32, Mean age = 64) listeners' visual attention and language comprehension as they processed emotional sentences in a visual context. The age manipulation tested predictions by socio-emotional selectivity theory of a positivity effect in older adults. After viewing the emotional face of a speaker (happy or sad) on a computer display, participants were presented simultaneously with two pictures depicting opposite-valence events (positive and negative; IAPS database) while they listened to a sentence referring to one of the events. Participants' eye fixations on the pictures while processing the sentence were increased when the speaker's face was (vs. wasn't) emotionally congruent with the sentence. The enhancement occurred from the early stages of referential disambiguation and was modulated by age. For the older adults it was more pronounced with positive faces, and for the younger ones with negative faces. These findings demonstrate for the first time that emotional facial expressions, similarly to previously-studied speaker cues such as eye gaze and gestures, are rapidly integrated into sentence processing. They also provide new evidence for positivity effects in older adults during situated sentence processing.
Andrews, Edna; And Others
Two surveys conducted in the Soviet Union are reported that demonstrate the complicated interrelationship between linguistic form and meaning. They support Jakobson and Gorbacevic on gender signalling, particularly when the speaker is not certain of the noun in question. (Contains 44 references.) (LB)
Churchman, Edith C.
Special consideration should be given to curriculum development in basic-speech-communication classrooms which have non-native speakers of English as students. Fluency, student grouping, background diversity, and degrees of freedom of speech all affect the ability and achievement of non-native English-speaking students in such classrooms. A hybrid…
Chinese characters are used in both Chinese and Japanese writing systems. When literate speakers of either language experience problems in finding or understanding words, they often resort to using Chinese characters or "kanji" (i.e., Chinese characters used in Japanese writing) in their talk, a practice known as "brush talk" ("bitan" in Chinese,…
Kuhnert, Barbara; Hoole, Phil
A simultaneous EPG/EMA study of tongue gestures of five speakers was conducted to investigate the kinematic events accompanying alveolar stop reductions in the context of a velar plosive /k/ and in the context of a laryngeal fricative /h/ in two languages, English and German. No systematic language differences could be detected. Alveolar…
Terrell, Anne; And Others
This guide for pronunciation practice is intended primarily for adult speakers of Cantonese who need to speak English in order to understand and be understood in a job outside the Cantonese-speaking community. It is not designed for the advanced student. This document provides a contrastive analysis between the sound systems of English and…
There is an ongoing debate in the area of teaching English to speakers of other languages (TESOL) about what should constitute the knowledge base of language teachers. This article offers an analysis of the major opposing views in the debate and suggests an alternative critical approach to language teacher knowledge. While recognising various…
Andrade, Maureen S.
The cultural, ethnic, and linguistic diversity of today's tertiary students necessitates exploring a variety of approaches to support learning. This study reports on a university's efforts to understand the teaching and learning of its diverse students--international students who are nonnative English speakers (NNESs). Faculty were surveyed to…
Nagano, Tomonori; Fernandez, Hector
This article describes the development process of a project for heritage language speakers of Mandarin Chinese, Spanish, and Japanese at a high-enrollment community college in the northeast United States. This pilot project, funded by the Henry Luce Foundation, aimed to empower minority group students through active reinforcement of students'…
Heritage speakers (HSs) of Russian in the United States form a very complex and diverse group of learners. Research in heritage linguistics has examined key parameters of the HSs' oral production. Important work has been done in heritage language (HL) pragmatics, morphology, and lexicon. However, very few studies have been conducted to…
DeFeo, Dayna Jean
This article presents a case study of the perceptions of Spanish heritage speakers enrolled in introductory-level Spanish foreign language courses. Despite their own identities that were linked to the United States and Spanish of the Borderlands, the participants felt that the curriculum acknowledged the Spanish of Spain and foreign countries but…
Casillas, Joseph V.; Simonet, Miquel
This study investigates how fluent second-language (L2) learners of English produce and perceive the /æ/-/ɑ/ vowel contrast of Southwestern American English. Two learner groups are examined: (1) early, proficient English speakers who were raised by Spanish-speaking families but who became dominant in English during childhood and, as adults, lack…
Hennebry, Mairin; Lo, Yuen Yi; Macaro, Ernesto
We report a small-scale study investigating the perceptions of postgraduate students who are non-native speakers of English and those of academic staff with regard to those students. Previous research has focused only on the former and identified a number of linguistic and cultural challenges these students face in adapting to Anglophone…
Cabo, Diego Pascual Y.; Rothman, Jason
This Forum challenges and problematizes the term "incomplete acquisition," which has been widely used to describe the state of competence of heritage speaker (HS) bilinguals for well over a decade (see, e.g., Montrul, 2008). It is suggested and defended that HS competence, while often different from monolingual peers, is in fact not incomplete…