Sample records for dependent speaker identification

  1. Advancements in robust algorithm formulation for speaker identification of whispered speech

    NASA Astrophysics Data System (ADS)

    Fan, Xing

Whispered speech is an alternative speech production mode to neutral speech, used intentionally by talkers in natural conversational scenarios to protect privacy and to avoid certain content being overheard or made public. Due to the profound differences between whispered and neutral speech in the production mechanism and the absence of whispered adaptation data, the performance of speaker identification systems trained with neutral speech degrades significantly. This dissertation therefore focuses on developing a robust closed-set speaker recognition system for whispered speech that uses no or limited whispered adaptation data from non-target speakers. This dissertation proposes the concept of "High"/"Low" performance whispered data for the purpose of speaker identification. A variety of acoustic properties are identified that contribute to the quality of whispered data. An acoustic analysis is also conducted to compare the phoneme/speaker dependency of the differences between whispered and neutral data in the feature domain. The observations from these acoustic analyses are new in this area and also serve as guidance for developing robust speaker identification systems for whispered speech. This dissertation further proposes two systems for speaker identification of whispered speech. One system focuses on front-end processing: a two-dimensional feature space is proposed to search for "Low"-quality whispered utterances, and separate feature mapping functions are applied to vowels and consonants respectively in order to retain the speaker information shared between whispered and neutral speech. The other system focuses on speech-mode-independent model training: the proposed method generates pseudo-whispered features from neutral features by using the statistical information contained in a whispered universal background model (UBM) trained on extra whispered data collected from non-target speakers. Four modeling methods are proposed for the transformation estimation used to generate the pseudo-whispered features. Both systems demonstrate a significant improvement over the baseline system on the evaluation data. This dissertation has therefore contributed a scientific understanding of the differences between whispered and neutral speech, as well as improved front-end processing and modeling methods for speaker identification of whispered speech. Such advancements will ultimately contribute to improving the robustness of speech processing systems.

  2. Recognition of speaker-dependent continuous speech with KEAL

    NASA Astrophysics Data System (ADS)

    Mercier, G.; Bigorgne, D.; Miclet, L.; Le Guennec, L.; Querre, M.

    1989-04-01

A description of the speaker-dependent continuous speech recognition system KEAL is given. An unknown utterance is recognized by means of the following procedures: acoustic analysis, phonetic segmentation and identification, word and sentence analysis. The combination of feature-based, speaker-independent coarse phonetic segmentation with speaker-dependent statistical classification techniques is one of the main design features of the acoustic-phonetic decoder. The lexical access component is essentially based on a statistical dynamic programming technique which aims at matching a phonemic lexical entry, containing various phonological forms, against a phonetic lattice. Sentence recognition is achieved by use of a context-free grammar and a parsing algorithm derived from Earley's parser. A speaker adaptation module allows some of the system parameters to be adjusted by matching known utterances with their acoustical representation. The task to be performed, described by its vocabulary and its grammar, is given as a parameter of the system. Continuously spoken sentences extracted from a 'pseudo-Logo' language are analyzed and results are presented.
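
    As an illustration of the lexical access idea described above, the following is a minimal sketch (not the KEAL implementation) of dynamic-programming matching between a decoded phoneme sequence and a phonemic lexical entry. The uniform insertion/deletion/substitution costs and the toy phoneme strings are assumptions made for the example.

    ```python
    # Minimal sketch: DP alignment of a decoded phoneme sequence against
    # one phonemic lexical entry, with uniform edit costs.
    def dp_match(decoded, entry, ins=1.0, dele=1.0, sub=1.0):
        n, m = len(decoded), len(entry)
        d = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            d[i][0] = i * dele
        for j in range(1, m + 1):
            d[0][j] = j * ins
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0.0 if decoded[i - 1] == entry[j - 1] else sub
                d[i][j] = min(d[i - 1][j] + dele,      # phoneme deleted
                              d[i][j - 1] + ins,       # phoneme inserted
                              d[i - 1][j - 1] + cost)  # match / substitution
        return d[n][m]

    # Example: pick the lexical entry with the lowest alignment cost.
    lexicon = {"stop": list("stQp"), "start": list("stA:t")}
    decoded = list("stQp")
    best = min(lexicon, key=lambda w: dp_match(decoded, lexicon[w]))
    ```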

  3. Performance enhancement for audio-visual speaker identification using dynamic facial muscle model.

    PubMed

    Asadpour, Vahid; Towhidkhah, Farzad; Homayounpour, Mohammad Mehdi

    2006-10-01

The science of human identification using physiological characteristics, or biometry, has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information have not been thoroughly investigated yet. The aim of this work is therefore to propose a model-based feature extraction method which employs the physiological characteristics of the facial muscles producing lip movements. This approach adopts intrinsic properties of the muscles, such as viscosity, elasticity, and mass, which are extracted from a dynamic lip model. These parameters are exclusively dependent on the neuro-muscular properties of the speaker; consequently, imitation of valid speakers could be reduced to a large extent. These parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features has been employed by adopting a multistream pseudo-synchronized HMM training method. Noise-robust audio features such as Mel-frequency cepstral coefficients (MFCC), spectral subtraction (SS), and relative spectra perceptual linear prediction (J-RASTA-PLP) were used to evaluate the performance of the multimodal system once efficient audio feature extraction methods had been utilized. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a sentence that is phonetically rich. To evaluate the robustness of the algorithms, some experiments were performed on genetically identical twins. Furthermore, changes in speaker voice were simulated with drug inhalation tests. At a 3 dB signal-to-noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91% to 98%. Results on identical twins revealed an apparent improvement in performance for the dynamic muscle model-based system, whose audio-visual identification rate was enhanced from 87% to 96%.

  4. Cost-sensitive learning for emotion robust speaker recognition.

    PubMed

    Li, Dongdong; Yang, Yingchun; Dai, Weihui

    2014-01-01

In the field of information security, voice is one of the most important modalities in biometrics. In particular, with the growth of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses this problem by introducing a cost-sensitive learning technique that reweights the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture of the recognition system, as well as its components, is proposed in this paper. An experiment conducted on the Mandarin Affective Speech Corpus shows that an 8% improvement in identification rate over traditional speaker recognition is achieved.
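
    The abstract describes reweighting the probability of test utterances with a cost-sensitive rule. As a rough, hedged sketch of that idea (the paper's actual cost assignment and pitch-envelope modeling are not reproduced here), one can down-weight scores obtained under emotion conditions that are costly to misclassify:

    ```python
    import numpy as np

    # Illustrative sketch only: reweighting per-emotion scores with a cost
    # vector before the identification decision. The cost values and the
    # emotion posterior are hypothetical placeholders, not the paper's.
    def cost_sensitive_decision(scores, emotion_post, cost):
        # scores: (n_speakers, n_emotions) log-likelihoods per emotion model
        # emotion_post: (n_emotions,) posterior of each emotion for the test
        # cost: (n_emotions,) penalty that down-weights unreliable emotions
        weighted = scores @ (emotion_post / cost)
        return int(np.argmax(weighted))
    ```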

  5. Cost-Sensitive Learning for Emotion Robust Speaker Recognition

    PubMed Central

    Li, Dongdong; Yang, Yingchun

    2014-01-01

In the field of information security, voice is one of the most important modalities in biometrics. In particular, with the growth of voice communication over the Internet and telephone systems, huge voice data resources have become accessible. In speaker recognition, the voiceprint can be applied as a unique password for the user to prove his/her identity. However, speech with various emotions can cause an unacceptably high error rate and degrade the performance of a speaker recognition system. This paper addresses this problem by introducing a cost-sensitive learning technique that reweights the probability of test affective utterances at the pitch envelope level, which can effectively enhance robustness in emotion-dependent speaker recognition. Based on that technique, a new architecture of the recognition system, as well as its components, is proposed in this paper. An experiment conducted on the Mandarin Affective Speech Corpus shows that an 8% improvement in identification rate over traditional speaker recognition is achieved. PMID:24999492

  6. Unsupervised real-time speaker identification for daily movies

    NASA Astrophysics Data System (ADS)

    Li, Ying; Kuo, C.-C. Jay

    2002-07-01

The problem of identifying speakers for movie content analysis is addressed in this paper. While most previous work on speaker identification was carried out in a supervised mode using pure audio data, more robust results can be obtained in real time by integrating knowledge from multiple media sources in an unsupervised mode. In this work, both audio and visual cues are employed and subsequently combined in a probabilistic framework to identify speakers. In particular, audio information is used to identify speakers with a maximum likelihood (ML)-based approach, while visual information is adopted to distinguish speakers by detecting and recognizing their talking faces based on face detection/recognition and mouth tracking techniques. Moreover, to accommodate speakers' acoustic variations over time, we update their models on the fly by adapting to their newly contributed speech data. Encouraging results have been achieved through extensive experiments, which show the promise of the proposed audiovisual-based unsupervised speaker identification system.

  7. Open-set speaker identification with diverse-duration speech data

    NASA Astrophysics Data System (ADS)

    Karadaghi, Rawande; Hertlein, Heinz; Ariyaeeinia, Aladdin

    2015-05-01

The concern in this paper is an important category of applications of open-set speaker identification in criminal investigation, which involves operating with speech of short and varied duration. The study presents investigations into the adverse effects of such an operating condition on the accuracy of open-set speaker identification, based on both GMM-UBM and i-vector approaches. The experiments are conducted using a protocol developed for the identification task, based on the NIST speaker recognition evaluation corpus of 2008. In order to closely cover the real-world operating conditions in the considered application area, the study includes experiments with various combinations of training and testing data duration. The paper details the characteristics of the experimental investigations conducted and provides a thorough analysis of the results obtained.

  8. Speaker Recognition by Combining MFCC and Phase Information in Noisy Conditions

    NASA Astrophysics Data System (ADS)

    Wang, Longbiao; Minami, Kazue; Yamamoto, Kazumasa; Nakagawa, Seiichi

In this paper, we investigate the effectiveness of phase for speaker recognition in noisy conditions and combine the phase information with mel-frequency cepstral coefficients (MFCCs). To date, almost all speaker recognition methods have been based on MFCCs, even in noisy conditions. MFCCs dominantly capture vocal tract information: only the magnitude of the Fourier transform of time-domain speech frames is used, and the phase information is ignored. The phase information is expected to complement MFCCs well, because it includes rich voice source information. Furthermore, some studies have reported that phase-based features are robust to noise. In our previous study, a phase information extraction method was proposed that normalizes the variation in the phase depending on the clipping position of the input speech, and the performance of the combination of the phase information and MFCCs was remarkably better than that of MFCCs alone. In this paper, we evaluate the robustness of the proposed phase information for speaker identification in noisy conditions. Spectral subtraction, a method that skips frames with low energy/signal-to-noise (SN) ratio, and noisy-speech training models are used to analyze the effect of the phase information and MFCCs in noisy conditions. The NTT database and the JNAS (Japanese Newspaper Article Sentences) database, with stationary/non-stationary noise added, were used to evaluate our proposed method. MFCCs outperformed the phase information for clean speech. On the other hand, the degradation of the phase information was significantly smaller than that of MFCCs for noisy speech. The individual result of the phase information was even better than that of MFCCs in many cases with clean-speech training models. By deleting unreliable frames (frames having low energy/SN), the speaker identification performance was improved significantly. By integrating the phase information with MFCCs, the speaker identification error reduction rate was about 30%-60% compared with the standard MFCC-based method.
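
    A minimal sketch of the kind of phase normalization described above, under the assumption that all phases are shifted so that a chosen base frequency bin has zero phase, with the other bins shifted in proportion to their frequency. The window choice and base bin are illustrative, not the authors' exact settings.

    ```python
    import numpy as np

    def normalized_phase(frame, base_bin=1):
        # DFT phase of a windowed frame
        spec = np.fft.rfft(frame * np.hamming(len(frame)))
        phase = np.angle(spec)
        # Shift all phases so the base frequency component has zero phase;
        # other bins are shifted in proportion to their frequency ratio,
        # removing the dependence on where the frame was clipped.
        bins = np.arange(len(phase))
        shifted = phase - (bins / base_bin) * phase[base_bin]
        return np.angle(np.exp(1j * shifted))   # wrap to (-pi, pi]
    ```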

  9. Speaker identification for the improvement of the security communication between law enforcement units

    NASA Astrophysics Data System (ADS)

    Tovarek, Jaromir; Partila, Pavol

    2017-05-01

This article discusses speaker identification for the improvement of secure communication between law enforcement units. The main task of this research was to develop a text-independent speaker identification system which can be used for real-time recognition. This system is designed for identification in the open set, meaning that the unknown speaker can be anyone. Communication itself is secured, but we have to check the authorization of the communicating parties: we have to decide whether the unknown speaker is authorized for the given action. The calls are recorded by an IP telephony server and these recordings are then evaluated using classification. If the system determines that the speaker is not authorized, it sends a warning message to the administrator. This can reveal, for example, a stolen phone or another unusual situation, and the administrator then performs the appropriate actions. Our proposed system uses a multilayer neural network for classification, consisting of three layers (input layer, hidden layer, and output layer). The number of neurons in the input layer corresponds to the length of the speech features, and the output layer represents the classified speakers. The artificial neural network classifies the speech signal frame by frame, but the final decision is made over the complete recording; this rule substantially increases the accuracy of the classification. The input data for the neural network are thirteen Mel-frequency cepstral coefficients, which describe the behavior of the vocal tract and are the parameters most used for speaker recognition. Parameters for training, testing, and validation were extracted from recordings of authorized users. Recording conditions for the training data correspond to the real traffic of the system (sampling frequency, bit rate). The main benefit of the research is the system developed for text-independent speaker identification, which is applied to secure communication between law enforcement units.
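
    The frame-by-frame classification with a final decision over the complete recording can be illustrated with a short sketch. Summing per-frame log-posteriors (a product-of-posteriors rule) is one plausible reading of that decision rule, offered here as an assumption rather than the authors' exact formula.

    ```python
    import numpy as np

    # Sketch of the record-level decision rule: classify frame by frame,
    # then decide over the whole recording by summing per-frame
    # log-posteriors produced by the network's softmax output.
    def decide_over_record(frame_posteriors):
        # frame_posteriors: (n_frames, n_speakers), each row a softmax output
        log_post = np.log(frame_posteriors + 1e-12)
        return int(np.argmax(log_post.sum(axis=0)))
    ```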

  10. A language-familiarity effect for speaker discrimination without comprehension.

    PubMed

    Fleming, David; Giordano, Bruno L; Caldara, Roberto; Belin, Pascal

    2014-09-23

    The influence of language familiarity upon speaker identification is well established, to such an extent that it has been argued that "Human voice recognition depends on language ability" [Perrachione TK, Del Tufo SN, Gabrieli JDE (2011) Science 333(6042):595]. However, 7-mo-old infants discriminate speakers of their mother tongue better than they do foreign speakers [Johnson EK, Westrek E, Nazzi T, Cutler A (2011) Dev Sci 14(5):1002-1011] despite their limited speech comprehension abilities, suggesting that speaker discrimination may rely on familiarity with the sound structure of one's native language rather than the ability to comprehend speech. To test this hypothesis, we asked Chinese and English adult participants to rate speaker dissimilarity in pairs of sentences in English or Mandarin that were first time-reversed to render them unintelligible. Even in these conditions a language-familiarity effect was observed: Both Chinese and English listeners rated pairs of native-language speakers as more dissimilar than foreign-language speakers, despite their inability to understand the material. Our data indicate that the language familiarity effect is not based on comprehension but rather on familiarity with the phonology of one's native language. This effect may stem from a mechanism analogous to the "other-race" effect in face recognition.

  11. Optimization of multilayer neural network parameters for speaker recognition

    NASA Astrophysics Data System (ADS)

    Tovarek, Jaromir; Partila, Pavol; Rozhon, Jan; Voznak, Miroslav; Skapa, Jan; Uhrin, Dominik; Chmelikova, Zdenka

    2016-05-01

This article discusses the impact of multilayer neural network parameters on speaker identification. The main task of speaker identification is to find a specific person in a known set of speakers, i.e., to determine whether the voice of an unknown speaker (wanted person) belongs to a group of reference speakers from the voice database. One of the requirements was to develop a text-independent system, which means classifying the wanted person regardless of content and language. A multilayer neural network was used for speaker identification in this research. An artificial neural network (ANN) requires setting parameters such as the activation function of the neurons, the steepness of the activation functions, the learning rate, the maximum number of iterations, and the number of neurons in the hidden and output layers. ANN accuracy and validation time are directly influenced by these parameter settings, and different roles require different settings. Identification accuracy and ANN validation time were evaluated with the same input data but different parameter settings. The goal was to find the parameters for the neural network with the highest precision and shortest validation time. The input data of the neural networks are Mel-frequency cepstral coefficients (MFCCs), which describe the properties of the vocal tract. Audio samples were recorded for all speakers in a laboratory environment. The data were split into training, testing, and validation sets in the proportions 70%, 15%, and 15%. The result of the research described in this article is a different parameter setting of the multilayer neural network for four speakers.
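
    A minimal sketch of the 70/15/15 data split mentioned above; the shuffling and index-based bookkeeping are illustrative assumptions, not the authors' exact procedure.

    ```python
    import numpy as np

    # Shuffle sample indices once, then carve out 70% train, 15% test,
    # 15% validation; indices would address utterance-level MFCC vectors.
    def split_70_15_15(n_samples, seed=0):
        idx = np.random.default_rng(seed).permutation(n_samples)
        n_tr = int(0.70 * n_samples)
        n_te = int(0.15 * n_samples)
        return idx[:n_tr], idx[n_tr:n_tr + n_te], idx[n_tr + n_te:]
    ```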

  12. Evaluation of speaker de-identification based on voice gender and age conversion

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

    2018-03-01

Two basic tasks are covered in this paper. The first consists in the design and practical testing of a new method for voice de-identification that changes the apparent age and/or gender of a speaker by a multi-segmental frequency scale transformation combined with prosody modification. The second task is aimed at verifying the applicability of a classifier based on Gaussian mixture models (GMM) to detect the original Czech and Slovak speakers after the applied voice de-identification. The performed experiments confirm the functionality of the developed gender and age conversion for all selected types of de-identification, which can be objectively evaluated by the GMM-based open-set classifier. The original speaker detection accuracy was also compared for sentences uttered by German and English speakers, showing the language independence of the proposed method.

  13. INTERPOL survey of the use of speaker identification by law enforcement agencies.

    PubMed

    Morrison, Geoffrey Stewart; Sahito, Farhan Hyder; Jardine, Gaëlle; Djokic, Djordje; Clavet, Sophie; Berghs, Sabine; Goemans Dorny, Caroline

    2016-06-01

A survey was conducted of the use of speaker identification by law enforcement agencies around the world. A questionnaire was circulated to law enforcement agencies in the 190 member countries of INTERPOL. 91 responses were received from 69 countries. 44 respondents reported that they had speaker identification capabilities in house or via external laboratories. Half of these came from Europe. 28 respondents reported that they had databases of audio recordings of speakers. The clearest pattern in the responses was that of diversity. A variety of different approaches to speaker identification were used: The human-supervised-automatic approach was the most popular in North America, the auditory-acoustic-phonetic approach was the most popular in Europe, and the spectrographic/auditory-spectrographic approach was the most popular in Africa, Asia, the Middle East, and South and Central America. Globally, and in Europe, the most popular framework for reporting conclusions was identification/exclusion/inconclusive. In Europe, the second most popular framework was the use of verbal likelihood ratio scales.

  14. Discriminative analysis of lip motion features for speaker identification and speech-reading.

    PubMed

    Cetingül, H Ertan; Yemez, Yücel; Erzin, Engin; Tekalp, A Murat

    2006-10-01

There have been several studies that jointly use audio, lip intensity, and lip geometry information for speaker identification and speech-reading applications. This paper proposes using explicit lip motion information, instead of or in addition to lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two important issues: 1) Is using explicit lip motion information useful, and 2) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are considered to be those that result in the highest discrimination of individual speakers in a population, whereas for speech-reading, the best features are those providing the highest phoneme/word/phrase recognition rate. Several lip motion feature candidates have been considered, including dense motion features within a bounding box about the lip, lip contour motion features, and combinations of these with lip shape features. Furthermore, a novel two-stage spatial and temporal discrimination analysis is introduced to select the best lip motion features for speaker identification and speech-reading applications. Experimental results using a hidden-Markov-model-based recognition system indicate that using explicit lip motion information provides additional performance gains in both applications, and lip motion features prove more valuable in the case of the speech-reading application.

  15. Improving Speaker Recognition by Biometric Voice Deconstruction

    PubMed Central

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have necessarily been replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with the use of a set of features derived from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions. PMID:26442245

  16. Improving Speaker Recognition by Biometric Voice Deconstruction.

    PubMed

    Mazaira-Fernandez, Luis Miguel; Álvarez-Marquina, Agustín; Gómez-Vilda, Pedro

    2015-01-01

Person identification, especially in critical environments, has always been a subject of great interest. However, it has gained a new dimension in a world threatened by a new kind of terrorism that uses social networks (e.g., YouTube) to broadcast its message. In this new scenario, classical identification methods (such as fingerprints or face recognition) have necessarily been replaced by alternative biometric characteristics such as voice, as sometimes this is the only feature available. The present study benefits from the advances achieved in recent years in understanding and modeling voice production. The paper hypothesizes that a gender-dependent characterization of speakers, combined with the use of a set of features derived from the components resulting from the deconstruction of the voice into its glottal source and vocal tract estimates, will enhance recognition rates when compared to classical approaches. A general description of the main hypothesis and the methodology followed to extract the gender-dependent extended biometric parameters is given. Experimental validation is carried out both on a database recorded under highly controlled acoustic conditions and on one recorded over a mobile phone network under non-controlled acoustic conditions.

  17. Analysis of human scream and its impact on text-independent speaker verification.

    PubMed

    Hansen, John H L; Nandwana, Mahesh Kumar; Shokouhi, Navid

    2017-04-01

A scream is defined as a sustained, high-energy vocalization that lacks phonological structure; this lack of phonological structure is what distinguishes a scream from other forms of loud vocalization, such as a "yell." This study investigates the acoustic aspects of screams and addresses those that are known to prevent standard speaker identification systems from recognizing the identity of screaming speakers. It is well established that speaker variability due to changes in vocal effort and the Lombard effect contributes to degraded performance in automatic speech systems (i.e., speech recognition, speaker identification, diarization, etc.). However, previous research in the general area of speaker variability has concentrated on human speech production, whereas less is known about non-speech vocalizations. The UT-NonSpeech corpus is developed here to investigate speaker verification from scream samples. This study considers a detailed analysis in terms of fundamental frequency, spectral peak shift, frame energy distribution, and spectral tilt. It is shown that traditional speaker recognition based on the Gaussian mixture model-universal background model framework is unreliable when evaluated with screams.
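
    Of the analysis measures listed above, spectral tilt is easy to illustrate. The sketch below estimates it as the slope of a least-squares line fitted to the log-magnitude spectrum of a windowed frame (in dB per Hz); the window and framing are assumptions, not the study's exact analysis settings.

    ```python
    import numpy as np

    def spectral_tilt(frame, fs):
        # Log-magnitude spectrum of a Hamming-windowed frame
        spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
        freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
        log_mag = 20.0 * np.log10(spec + 1e-10)
        # Slope of the straight-line fit = spectral tilt in dB per Hz
        slope, _intercept = np.polyfit(freqs, log_mag, 1)
        return slope
    ```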

  18. Noise Reduction with Microphone Arrays for Speaker Identification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cohen, Z

Reducing acoustic noise in audio recordings is an ongoing problem that plagues many applications. This noise is hard to reduce because of interfering sources and the non-stationary behavior of the overall background noise. Many single-channel noise reduction algorithms exist but are limited: the more the noise is reduced, the more the signal of interest is distorted, because the signal and noise overlap in frequency. Acoustic background noise causes particular problems in the area of speaker identification, since recording a speaker in the presence of acoustic noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the environment where the speech sample is taken, noise reduction filtering algorithms need to be developed to clean the recorded speech of background noise. Because single-channel noise reduction algorithms would distort the speech signal, the overall challenge of this project was to see whether the spatial information provided by microphone arrays could be exploited to aid in speaker identification. The goals are: (1) test the feasibility of using microphone arrays to reduce background noise in speech recordings; (2) characterize and compare different multichannel noise reduction algorithms; (3) provide recommendations for using these multichannel algorithms; and (4) ultimately answer the question: can the use of microphone arrays aid in speaker identification?
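
    A delay-and-sum beamformer is the simplest example of the multichannel noise-reduction algorithms this project set out to compare. The sketch below assumes the per-channel delays are already known (e.g., from the array geometry and the speaker's position); it is an illustration, not the project's implementation.

    ```python
    import numpy as np

    def delay_and_sum(channels, delays_s, fs):
        # channels: (n_ch, n_samples); delays_s: per-channel delay in seconds
        n_ch, n_samples = channels.shape
        out = np.zeros(n_samples)
        for ch in range(n_ch):
            shift = int(round(delays_s[ch] * fs))
            out += np.roll(channels[ch], -shift)   # time-align each channel
        return out / n_ch                          # average coherent speech,
                                                   # attenuating diffuse noise
    ```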

  19. Using Avatars for Improving Speaker Identification in Captioning

    NASA Astrophysics Data System (ADS)

    Vy, Quoc V.; Fels, Deborah I.

Captioning is the main method for accessing television and film content by people who are deaf or hard-of-hearing. One major difficulty consistently identified by the community is that of knowing who is speaking, particularly for an off-screen narrator. A captioning system was created using a participatory design method to improve speaker identification. The final prototype contained avatars and a coloured border for identifying specific speakers. Evaluation results were very positive; however, participants also wanted to customize various components such as caption and avatar location.

  20. The impact of compression of speech signal, background noise and acoustic disturbances on the effectiveness of speaker identification

    NASA Astrophysics Data System (ADS)

    Kamiński, K.; Dobrowolski, A. P.

    2017-04-01

The paper presents the architecture and the results of optimization of selected elements of an Automatic Speaker Recognition (ASR) system that uses Gaussian Mixture Models (GMM) in the classification process. Optimization was performed on the selection of individual features, using a genetic algorithm, and on the parameters of the Gaussian distributions used to describe individual voices. The developed system was tested in order to evaluate the impact of different compression methods, used among others in landline, mobile, and VoIP telephony systems, on the effectiveness of speaker identification. Results were also presented on the effectiveness of speaker identification at specific levels of noise in the speech signal and in the presence of other disturbances that can appear during phone calls, which made it possible to specify the spectrum of applications of the presented ASR system.

  1. Intonation contrast in Cantonese speakers with hypokinetic dysarthria associated with Parkinson's disease.

    PubMed

    Ma, Joan K-Y; Whitehill, Tara L; So, Susanne Y-S

    2010-08-01

    Speech produced by individuals with hypokinetic dysarthria associated with Parkinson's disease (PD) is characterized by a number of features including impaired speech prosody. The purpose of this study was to investigate intonation contrasts produced by this group of speakers. Speech materials with a question-statement contrast were collected from 14 Cantonese speakers with PD. Twenty listeners then classified the productions as either questions or statements. Acoustic analyses of F0, duration, and intensity were conducted to determine which acoustic cues distinguished the production of questions from statements, and which cues appeared to be exploited by listeners in identifying intonational contrasts. The results show that listeners identified statements with a high degree of accuracy, but the accuracy of question identification ranged from 0.56% to 96% across the 14 speakers. The speakers with PD used similar acoustic cues as nondysarthric Cantonese speakers to mark the question-statement contrast, although the contrasts were not observed in all speakers. Listeners mainly used F0 cues at the final syllable for intonation identification. These data contribute to the researchers' understanding of intonation marking in speakers with PD, with specific application to the production and perception of intonation in a lexical tone language.

  2. The Role of Speaker Identification in Korean University Students' Attitudes towards Five Varieties of English

    ERIC Educational Resources Information Center

    Yook, Cheongmin; Lindemann, Stephanie

    2013-01-01

    This study investigates how the attitudes of 60 Korean university students towards five varieties of English are affected by the identification of the speaker's nationality and ethnicity. The study employed both a verbal guise technique and questions eliciting overt beliefs and preferences related to learning English. While the majority of the…

  3. Effects of various electrode configurations on music perception, intonation and speaker gender identification.

    PubMed

    Landwehr, Markus; Fürstenberg, Dirk; Walger, Martin; von Wedel, Hasso; Meister, Hartmut

    2014-01-01

Advances in speech coding strategies and electrode array designs for cochlear implants (CIs) predominantly aim at improving speech perception. Current efforts are also directed at transmitting appropriate cues of the fundamental frequency (F0) to the auditory nerve with respect to speech quality, prosody, and music perception. The aim of this study was to examine the effects of various electrode configurations and coding strategies on speech intonation identification, speaker gender identification, and music quality rating. In six MED-EL CI users, electrodes were selectively deactivated in order to simulate different insertion depths and inter-electrode distances when using the high definition continuous interleaved sampling (HDCIS) and fine structure processing (FSP) speech coding strategies. Identification of intonation and speaker gender was determined and music quality rating was assessed. For intonation identification, HDCIS was robust against the different electrode configurations, whereas fine structure processing showed significantly worse results when a short electrode depth was simulated. In contrast, speaker gender recognition was not affected by electrode configuration or speech coding strategy. Music quality rating was sensitive to electrode configuration. In conclusion, the three experiments revealed different outcomes, even though they all addressed the reception of F0 cues. Rapid changes in F0, as seen with intonation, were the most sensitive to electrode configurations and coding strategies. In contrast, electrode configurations and coding strategies did not show large effects when F0 information was available over a longer time period, as seen with speaker gender. Music quality relies on additional spectral cues other than F0, and was poorest when a shallow insertion was simulated.

  4. Robust speaker's location detection in a vehicle environment using GMM models.

    PubMed

    Hu, Jwu-Sheng; Cheng, Chieh-Cheng; Liu, Wei-Han

    2006-04-01

Human-computer interaction (HCI) using speech communication is becoming increasingly important, especially in driving, where safety is the primary concern. Knowing the speaker's location (i.e., speaker localization) not only improves the enhancement results for a corrupted signal, but also assists speaker identification. Since conventional speech localization algorithms suffer from the uncertainties of environmental complexity and noise, as well as from the microphone mismatch problem, they are frequently not robust in practice. Without high reliability, the acceptance of speech-based HCI would never be realized. This work presents a novel speaker's location detection method and demonstrates high accuracy within a vehicle cabin using a single linear microphone array. The proposed approach utilizes Gaussian mixture models (GMM) to model the distributions of the phase differences among the microphones caused by the complex characteristics of room acoustics and microphone mismatch. The model can be applied in both near-field and far-field situations in a noisy environment. The individual Gaussian components of a GMM represent some general location-dependent but content- and speaker-independent phase difference distributions. Moreover, the scheme performs well not only in non-line-of-sight cases, but also when the speakers are aligned toward the microphone array but at different distances from it. This strong performance can be achieved by exploiting the fact that the phase difference distributions at different locations are distinguishable in the environment of a car. The experimental results also show that the proposed method outperforms the conventional multiple signal classification (MUSIC) technique at various SNRs.
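
    The feature side of this approach can be sketched as follows: compute inter-microphone phase differences per frequency bin and fit one GMM per candidate location. The array shapes, reference-channel choice, and GMM size below are illustrative assumptions, not the paper's configuration.

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def phase_difference_features(frames, ref_ch=0):
        # frames: (n_ch, frame_len), time-aligned snapshots from the array
        phase = np.angle(np.fft.rfft(frames, axis=1))
        diffs = np.delete(phase - phase[ref_ch], ref_ch, axis=0)
        return np.angle(np.exp(1j * diffs)).ravel()   # wrap to (-pi, pi]

    # One model per candidate location; at test time, pick the location
    # whose GMM gives the highest likelihood for the observed features.
    location_gmm = GaussianMixture(n_components=8, covariance_type="diag")
    ```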

  5. Shibboleth: An Automated Foreign Accent Identification Program

    ERIC Educational Resources Information Center

    Frost, Wende

    2013-01-01

    The speech of non-native (L2) speakers of a language contains phonological rules that differentiate them from native speakers. These phonological rules characterize or distinguish accents in an L2. The Shibboleth program creates combinatorial rule-sets to describe the phonological pattern of these accents and classifies L2 speakers into their…

  6. Single-Word Intelligibility in Speakers with Repaired Cleft Palate

    ERIC Educational Resources Information Center

    Whitehill, Tara; Chau, Cynthia

    2004-01-01

    Many speakers with repaired cleft palate have reduced intelligibility, but there are limitations with current procedures for assessing intelligibility. The aim of this study was to construct a single-word intelligibility test for speakers with cleft palate. The test used a multiple-choice identification format, and was based on phonetic contrasts…

  7. An Acoustic and Social Dialect Analysis of Perceptual Variables in Listener Identification and Rating of Negro Speakers. Final Report.

    ERIC Educational Resources Information Center

    Bryden, James D.

    The purpose of this study was to specify variables which function significantly in the racial identification and speech quality rating of Negro and white speakers by Negro and white listeners. Ninety-one adults served as subjects for the speech task; 86 of these subjects, 43 Negro and 43 white, provided the listener responses. Subjects were chosen…

  8. Identification and tracking of particular speaker in noisy environment

    NASA Astrophysics Data System (ADS)

    Sawada, Hideyuki; Ohkado, Minoru

    2004-10-01

Humans are able to exchange information smoothly by voice in difficult situations, such as a noisy environment in a crowd or in the presence of several speakers. We are able to detect the position of a sound source in 3D space, extract a particular sound from mixed sounds, and recognize who is talking. Realizing this mechanism with a computer enables new applications: recording sound with high quality by reducing noise, presenting a clarified sound, and realizing microphone-free speech recognition by extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment using a microphone array, based on the location of the speaker and individual voice characteristics. The study is applied to developing an adaptive auditory system for a mobile robot that collaborates with a factory worker.

  9. Automatic speech recognition and training for severely dysarthric users of assistive technology: the STARDUST project.

    PubMed

    Parker, Mark; Cunningham, Stuart; Enderby, Pam; Hawley, Mark; Green, Phil

    2006-01-01

The STARDUST project developed robust computer speech recognizers for use by eight people with severe dysarthria and concomitant physical disability to access assistive technologies. Independent computer speech recognizers trained with normal speech are of limited functional use to those with severe dysarthria, due to limited and inconsistent proximity to "normal" articulatory patterns. Severe dysarthric output may also be characterized by a small set of distinguishable phonetic tokens, making the acoustic differentiation of target words difficult. Speaker-dependent computer speech recognition using hidden Markov models was achieved by the identification of robust phonetic elements within the individual speaker's output patterns. A new system of speech training using computer-generated visual and auditory feedback reduced the inconsistent production of key phonetic tokens over time.

  10. Evaluation of a speaker identification system with and without fusion using three databases in the presence of noise and handset effects

    NASA Astrophysics Data System (ADS)

Al-Kaltakchi, Musab T. S.; Woo, Wai L.; Dlay, Satnam; Chambers, Jonathon A.

    2017-12-01

In this study, a speaker identification system is considered consisting of a feature extraction stage which utilizes both power normalized cepstral coefficients (PNCCs) and Mel frequency cepstral coefficients (MFCCs). Normalization is applied by employing cepstral mean and variance normalization (CMVN) and feature warping (FW), together with acoustic modeling using a Gaussian mixture model-universal background model (GMM-UBM). The main contributions are comprehensive evaluations of the effect of both additive white Gaussian noise (AWGN) and non-stationary noise (NSN) (with and without a G.712-type handset) upon identification performance. In particular, three NSN types with varying signal-to-noise ratios (SNRs) were tested, corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3600 speech utterances. As recommendations from the study, mean fusion was found to yield the best overall performance in terms of speaker identification accuracy (SIA) with noisy speech, whereas linear weighted sum fusion was best overall for original database recordings.
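
    The three late-fusion rules evaluated in the study are simple to state; below is a hedged sketch applying them to per-speaker score vectors from the two feature streams (the weight w is a placeholder, not the paper's tuned value). The identified speaker is the argmax of the fused score vector.

    ```python
    import numpy as np

    # Late fusion of per-speaker scores from the MFCC and PNCC subsystems.
    def fuse(scores_mfcc, scores_pncc, method="mean", w=0.5):
        if method == "mean":
            return 0.5 * (scores_mfcc + scores_pncc)
        if method == "max":
            return np.maximum(scores_mfcc, scores_pncc)
        if method == "weighted":            # linear weighted sum
            return w * scores_mfcc + (1.0 - w) * scores_pncc
        raise ValueError(f"unknown fusion method: {method}")
    ```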

  11. Acoustic and perceptual effects of overall F0 range in a lexical pitch accent distinction

    NASA Astrophysics Data System (ADS)

    Wade, Travis

    2002-05-01

A speaker's overall fundamental frequency range is generally considered a variable, nonlinguistic element of intonation. This study examined the precision with which overall F0 is predictable based on previous intonational context and the extent to which it may be perceptually significant. Speakers of Tokyo Japanese produced pairs of sentences differing lexically only in the presence or absence of a single pitch accent as responses to visual and prerecorded speech cues presented in an interactive manner. F0 placement of high tones (previously observed to be relatively variable in pitch contours) was found to be consistent across speakers and uniformly dependent on the intonation of the different sentences used as cues. In a subsequent perception experiment, continuous manipulations of these same sentences, ranging between typical accented and typical non-accent-containing versions, were presented to Japanese listeners for lexical identification. Results showed that listeners' perception was not significantly altered in compensation for artificial manipulation of the preceding intonation. Implications are discussed within an autosegmental analysis of tone. The current results are consistent with the notion that pitch range (i.e., specific vertical locations of tonal peaks) does not simply vary gradiently across speakers and situations but constitutes a predictable part of the phonetic specification of tones.

  12. Left hemisphere lateralization for lexical and acoustic pitch processing in Cantonese speakers as revealed by mismatch negativity.

    PubMed

    Gu, Feng; Zhang, Caicai; Hu, Axu; Zhao, Guoping

    2013-12-01

For nontonal language speakers, speech processing is lateralized to the left hemisphere and musical processing is lateralized to the right hemisphere (i.e., function-dependent brain asymmetry). On the other hand, acoustic temporal processing is lateralized to the left hemisphere and spectral/pitch processing is lateralized to the right hemisphere (i.e., acoustic-dependent brain asymmetry). In this study, we examine whether the hemispheric lateralization of lexical pitch and acoustic pitch processing in tonal language speakers is consistent with the patterns of function- and acoustic-dependent brain asymmetry in nontonal language speakers. Pitch contrast in both speech stimuli (syllable /ji/ in Experiment 1) and nonspeech stimuli (harmonic tone in Experiment 1; pure tone in Experiment 2) was presented to native Cantonese speakers in passive oddball paradigms. We found that the mismatch negativity (MMN) elicited by lexical pitch contrast was lateralized to the left hemisphere, which is consistent with the pattern of function-dependent brain asymmetry (i.e., left hemisphere lateralization for speech processing) in nontonal language speakers. However, the MMN elicited by acoustic pitch contrast was also left hemisphere lateralized (harmonic tone in Experiment 1) or showed a tendency for left hemisphere lateralization (pure tone in Experiment 2), which is inconsistent with the pattern of acoustic-dependent brain asymmetry (i.e., right hemisphere lateralization for acoustic pitch processing) in nontonal language speakers. The consistent pattern of function-dependent brain asymmetry and the inconsistent pattern of acoustic-dependent brain asymmetry between tonal and nontonal language speakers can be explained by the hypothesis that the acoustic-dependent brain asymmetry is the consequence of a carryover effect from function-dependent brain asymmetry. Potential evolutionary implications of this hypothesis are discussed.

  13. Voice recognition through phonetic features with Punjabi utterances

    NASA Astrophysics Data System (ADS)

    Kaur, Jasdeep; Juglan, K. C.; Sharma, Vishal; Upadhyay, R. K.

    2017-07-01

This paper deals with perception and disorders of speech with regard to the Punjabi language. In view of the importance of voice identification, various parameters of speaker identification have been studied. The speech material was recorded with a tape recorder in the subjects' normal and disguised modes of utterance. From the recorded speech materials, utterances free from noise were selected for auditory and acoustic spectrographic analysis. The comparison of the normal and disguised speech of seven subjects is reported. The fundamental frequency (F0) at similar places, plosive duration at certain phonemes, amplitude ratio (A1:A2), etc., were compared in normal and disguised speech. It was found that the formant frequencies of normal and disguised speech remain almost similar only if they are compared at positions of the same vowel quality and quantity. If the vowel is more closed or more open in the disguised utterance, the formant frequency will change in comparison to the normal utterance. The amplitude ratio (A1:A2) is found to be speaker dependent: it remains unchanged in the disguised utterance, although this value may shift if cross sectioning is not done at the same location.

  14. High stimulus variability in nonnative speech learning supports formation of abstract categories: evidence from Japanese geminates.

    PubMed

    Sadakata, Makiko; McQueen, James M

    2013-08-01

    This study reports effects of a high-variability training procedure on nonnative learning of a Japanese geminate-singleton fricative contrast. Thirty native speakers of Dutch took part in a 5-day training procedure in which they identified geminate and singleton variants of the Japanese fricative /s/. Participants were trained with either many repetitions of a limited set of words recorded by a single speaker (low-variability training) or with fewer repetitions of a more variable set of words recorded by multiple speakers (high-variability training). Both types of training enhanced identification of speech but not of nonspeech materials, indicating that learning was domain specific. High-variability training led to superior performance in identification but not in discrimination tests, and supported better generalization of learning as shown by transfer from the trained fricatives to the identification of untrained stops and affricates. Variability thus helps nonnative listeners to form abstract categories rather than to enhance early acoustic analysis.

  15. Gender identification from high-pass filtered vowel segments: the use of high-frequency energy.

    PubMed

    Donai, Jeremy J; Lass, Norman J

    2015-10-01

    The purpose of this study was to examine the use of high-frequency information for making gender identity judgments from high-pass filtered vowel segments produced by adult speakers. Specifically, the effect of removing lower-frequency spectral detail (i.e., F3 and below) from vowel segments via high-pass filtering was evaluated. Thirty listeners (ages 18-35) with normal hearing participated in the experiment. A within-subjects design was used to measure gender identification for six 250-ms vowel segments (/æ/, /ɪ /, /ɝ/, /ʌ/, /ɔ/, and /u/), produced by ten male and ten female speakers. The results of this experiment demonstrated that despite the removal of low-frequency spectral detail, the listeners were accurate in identifying speaker gender from the vowel segments, and did so with performance significantly above chance. The removal of low-frequency spectral detail reduced gender identification by approximately 16 % relative to unfiltered vowel segments. Classification results using linear discriminant function analyses followed the perceptual data, using spectral and temporal representations derived from the high-pass filtered segments. Cumulatively, these findings indicate that normal-hearing listeners are able to make accurate perceptual judgments regarding speaker gender from vowel segments with low-frequency spectral detail removed via high-pass filtering. Therefore, it is reasonable to suggest the presence of perceptual cues related to gender identity in the high-frequency region of naturally produced vowel signals. Implications of these findings and possible mechanisms for performing the gender identification task from high-pass filtered stimuli are discussed.
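
    A minimal sketch of the stimulus preparation described above: high-pass filtering a vowel segment to remove spectral detail in the region of F3 and below. The cutoff frequency and filter order are assumptions for illustration, not the study's exact values.

    ```python
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    # Zero-phase high-pass filter applied to a vowel segment, removing
    # lower-frequency spectral detail before the gender judgment task.
    def highpass_vowel(segment, fs, cutoff_hz=3500.0, order=8):
        sos = butter(order, cutoff_hz, btype="highpass", fs=fs, output="sos")
        return sosfiltfilt(sos, segment)
    ```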

  16. Implementation of support vector machine for classification of speech marked hijaiyah letters based on Mel frequency cepstrum coefficient feature extraction

    NASA Astrophysics Data System (ADS)

    Adhi Pradana, Wisnu; Adiwijaya; Novia Wisesty, Untari

    2018-03-01

Support Vector Machine, commonly called SVM, is one method that can be used for the classification of data. SVM separates data from two different classes with a hyperplane. In this study, a system was built using SVM to develop Arabic speech recognition. In the development of the system, two kinds of speakers were tested: dependent speakers and independent speakers. The system achieved an accuracy of 85.32% for dependent speakers and 61.16% for independent speakers.
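
    A hedged sketch of the classification stage described above: an RBF-kernel SVM over MFCC feature vectors. The random arrays stand in for real MFCC features, and the kernel and C value are illustrative choices, not the paper's.

    ```python
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 13))      # placeholder: 13 MFCCs per sample
    y_train = rng.integers(0, 28, size=200)   # labels for 28 hijaiyah letters

    # Standardize features, then fit a multi-class (one-vs-one) RBF SVM.
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
    clf.fit(X_train, y_train)
    pred = clf.predict(rng.normal(size=(5, 13)))
    ```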

  17. Robust Recognition of Loud and Lombard speech in the Fighter Cockpit Environment

    DTIC Science & Technology

    1988-08-01

the latter as inter-speaker variability. According to Zue [Z85], inter-speaker variabilities can be attributed to sociolinguistic background, dialect..." Journal of the Acoustical Society of America, Vol 50, 1971. [At74] B. S. Atal, "Linear prediction for speaker identification," Journal of the Acoustical Society of America, Vol 55, 1974. [B77] B. Beek, E. P. Neuberg, and D. C. Hodge, "An Assessment of the Technology of Automatic Speech Recognition for

  18. "Feminism Lite?" Feminist Identification, Speaker Appearance, and Perceptions of Feminist and Antifeminist Messengers

    ERIC Educational Resources Information Center

    Bullock, Heather E.; Fernald, Julian L.

    2003-01-01

    Drawing on a communications model of persuasion (Hovland, Janis, & Kelley, 1953), this study examined the effect of target appearance on feminists' and nonfeminists' perceptions of a speaker delivering a feminist or an antifeminist message. One hundred three college women watched one of four videotaped speeches that varied by content (profeminist…

  19. Accent Identification by Adults with Aphasia

    ERIC Educational Resources Information Center

    Newton, Caroline; Burns, Rebecca; Bruce, Carolyn

    2013-01-01

    The UK is a diverse society where individuals regularly interact with speakers with different accents. Whilst there is a growing body of research on the impact of speaker accent on comprehension in people with aphasia, there is none which explores their ability to identify accents. This study investigated the ability of this group to identify the…

  20. Statistical Evaluation of Biometric Evidence in Forensic Automatic Speaker Recognition

    NASA Astrophysics Data System (ADS)

    Drygajlo, Andrzej

    Forensic speaker recognition is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). This paper aims at presenting forensic automatic speaker recognition (FASR) methods that provide a coherent way of quantifying and presenting recorded voice as biometric evidence. In such methods, the biometric evidence consists of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect. The interpretation of recorded voice as evidence in the forensic context presents particular challenges, including within-speaker (within-source) variability and between-speakers (between-sources) variability. Consequently, FASR methods must provide a statistical evaluation which gives the court an indication of the strength of the evidence given the estimated within-source and between-sources variabilities. This paper reports on the first ENFSI evaluation campaign through a fake case, organized by the Netherlands Forensic Institute (NFI), as an example, where an automatic method using the Gaussian mixture models (GMMs) and the Bayesian interpretation (BI) framework were implemented for the forensic speaker recognition task.
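
    The core of the Bayesian interpretation framework can be sketched as a log-likelihood ratio between a suspect model and a background (potential population) model, both assumed here to be already-fitted GMMs; this illustrates the idea and is not the NFI/ENFSI reference implementation.

    ```python
    from sklearn.mixture import GaussianMixture

    # Strength of evidence for the trace under the prosecution hypothesis
    # (suspect is the source) versus the defense hypothesis (someone from
    # the potential population is the source).
    def log_lr(trace_features, suspect_gmm: GaussianMixture,
               background_gmm: GaussianMixture) -> float:
        # GaussianMixture.score() returns the mean per-frame log-likelihood
        return (suspect_gmm.score(trace_features)
                - background_gmm.score(trace_features))
    ```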

  1. Identifying the nonlinear mechanical behaviour of micro-speakers from their quasi-linear electrical response

    NASA Astrophysics Data System (ADS)

    Zilletti, Michele; Marker, Arthur; Elliott, Stephen John; Holland, Keith

    2017-05-01

In this study, model identification of the nonlinear dynamics of a micro-speaker is carried out by purely electrical measurements, avoiding any explicit vibration measurements. It is shown that a dynamic model of the micro-speaker, which takes into account the nonlinear damping characteristic of the device, can be identified by measuring the response between the voltage input and the current flowing into the coil. An analytical formulation of the quasi-linear model of the micro-speaker is first derived, and an optimisation method is then used to identify a polynomial function which describes the mechanical damping behaviour of the micro-speaker. The analytical results of the quasi-linear model are compared with numerical results. This study potentially opens up the possibility of efficiently implementing nonlinear echo cancellers.

  2. Perception of Melodic Contour and Intonation in Autism Spectrum Disorder: Evidence From Mandarin Speakers.

    PubMed

    Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei

    2015-07-01

    Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking individuals with high-functioning autism. Individuals with autism showed superior melodic contour identification but comparable contour discrimination relative to controls. In contrast, these individuals performed worse than controls on both discrimination and identification of speech intonation. These findings provide the first evidence for differential pitch processing in music and speech in tone language speakers with autism, suggesting that tone language experience may not compensate for speech intonation perception deficits in individuals with autism.

  3. Speaker recognition with temporal cues in acoustic and electric hearing

    NASA Astrophysics Data System (ADS)

    Vongphoe, Michael; Zeng, Fan-Gang

    2005-08-01

    Natural spoken language processing includes not only speech recognition but also identification of the speaker's gender, age, emotional state, and social status. Our purpose in this study was to evaluate whether temporal cues are sufficient to support both speech and speaker recognition. Ten cochlear-implant and six normal-hearing subjects were presented with vowel tokens spoken by three men, three women, two boys, and two girls. In one condition, the subject was asked to recognize the vowel. In the other condition, the subject was asked to identify the speaker. Extensive training was provided for the speaker recognition task. Normal-hearing subjects achieved nearly perfect performance in both tasks. Cochlear-implant subjects achieved good performance in vowel recognition but poor performance in speaker recognition. The level of cochlear implant performance was functionally equivalent to normal performance with eight spectral bands for vowel recognition but with only one band for speaker recognition. These results show a dissociation between speech and speaker recognition with primarily temporal cues, highlighting the limitation of current speech processing strategies in cochlear implants. Several methods, including explicit encoding of fundamental frequency and frequency modulation, are proposed to improve speaker recognition for current cochlear implant users.
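
    Spectral-band comparisons of this kind are typically made with noise-vocoded speech, which preserves each band's temporal envelope while discarding spectral fine structure. The sketch below shows one common form of such a vocoder; the log-spaced Butterworth bands, Hilbert envelopes, and all parameters are assumptions, not the processing used in the study.

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def noise_vocode(x, fs, n_bands=8, f_lo=100.0, f_hi=6000.0):
            """Crude N-band noise vocoder: keeps band envelopes (temporal cues),
            replaces fine structure with band-limited noise carriers."""
            edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
            rng = np.random.default_rng(0)
            out = np.zeros_like(x)
            for lo, hi in zip(edges[:-1], edges[1:]):
                sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
                env = np.abs(hilbert(sosfiltfilt(sos, x)))  # band envelope
                carrier = sosfiltfilt(sos, rng.standard_normal(len(x)))
                out += env * carrier
            return out / (np.max(np.abs(out)) + 1e-12)

        # Example: vocode a synthetic vowel-like tone complex.
        fs = 16000
        t = np.arange(int(0.3 * fs)) / fs
        vowel = sum(np.sin(2 * np.pi * f * t) for f in (220, 440, 660, 2200))
        simulated = noise_vocode(np.asarray(vowel), fs, n_bands=8)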

  4. English Language Schooling, Linguistic Realities, and the Native Speaker of English in Hong Kong

    ERIC Educational Resources Information Center

    Hansen Edwards, Jette G.

    2018-01-01

    The study employs a case study approach to examine the impact of educational backgrounds on nine Hong Kong tertiary students' English and Cantonese language practices and identifications as native speakers of English and Cantonese. The study employed both survey and interview data to probe the participants' English and Cantonese language use at…

  5. Priming of Non-Speech Vocalizations in Male Adults: The Influence of the Speaker's Gender

    ERIC Educational Resources Information Center

    Fecteau, Shirley; Armony, Jorge L.; Joanette, Yves; Belin, Pascal

    2004-01-01

    Previous research reported a priming effect for voices. However, the type of information primed is still largely unknown. In this study, we examined the influence of speaker's gender and emotional category of the stimulus on priming of non-speech vocalizations in 10 male participants, who performed a gender identification task. We found a…

  6. Effects of Phonetic Similarity in the Identification of Mandarin Tones

    ERIC Educational Resources Information Center

    Li, Bin; Shao, Jing; Bao, Mingzhen

    2017-01-01

    Tonal languages differ in how they use phonetic correlates, e.g. average pitch height and pitch direction, for tonal contrasts. Thus, native speakers of a tonal language may need to adjust their attention to familiar or unfamiliar phonetic cues when perceiving non-native tones. On the other hand, speakers of a non-tonal language may need to…

  7. Speaker's voice as a memory cue.

    PubMed

    Campeanu, Sandra; Craik, Fergus I M; Alain, Claude

    2015-02-01

    Speaker's voice occupies a central role as the cornerstone of auditory social interaction. Here, we review the evidence suggesting that speaker's voice constitutes an integral context cue in auditory memory. Investigation into the nature of voice representation as a memory cue is essential to understanding auditory memory and the neural correlates that underlie it. Evidence from behavioral and electrophysiological studies suggests that while specific voice reinstatement (i.e., same speaker) often appears to facilitate word memory even without attention to voice at study, the presence of a partial benefit of similar voices between study and test is less clear. In explicit memory experiments utilizing unfamiliar voices, encoding methods appear to play a pivotal role. Voice congruency effects have been found when voice is specifically attended at study (i.e., when relatively shallow, perceptual encoding takes place). These behavioral findings coincide with neural indices of memory performance such as the parietal old/new recollection effect and the late right frontal effect. The former distinguishes between correctly identified old words and correctly identified new words, and reflects voice congruency only when voice is attended at study. Characterization of the latter likely depends upon voice memory, rather than word memory. There is also evidence to suggest that voice effects can be found in implicit memory paradigms. However, the presence of voice effects appears to depend greatly on the task employed. In a word identification task, perceptual similarity between study and test conditions is, as for explicit memory tests, crucial. In addition, the type of noise employed appears to have a differential effect. While voice effects have been observed when white noise is used at both study and test, using multi-talker babble does not yield the same results. In neuroimaging research, characterization of an implicit memory effect reflective of voice congruency is currently lacking. Copyright © 2014 Elsevier B.V. All rights reserved.

  8. Native Speakers of Arabic and ESL Texts: Evidence for the Transfer of Written Word Identification Processes

    ERIC Educational Resources Information Center

    Hayes-Harb, Rachel

    2006-01-01

    English as a second language (ESL) teachers have long noted that native speakers of Arabic exhibit exceptional difficulty with English reading comprehension (e.g., Thompson-Panos & Thomas-Ruzic, 1983). Most existing work in this area has looked to higher level aspects of reading such as familiarity with discourse structure and cultural knowledge…

  9. The Effect of Scene Variation on the Redundant Use of Color in Definite Reference

    ERIC Educational Resources Information Center

    Koolen, Ruud; Goudbeek, Martijn; Krahmer, Emiel

    2013-01-01

    This study investigates to what extent the amount of variation in a visual scene causes speakers to mention the attribute color in their definite target descriptions, focusing on scenes in which this attribute is not needed for identification of the target. The results of our three experiments show that speakers are more likely to redundantly…

  10. What's Learned Together Stays Together: Speakers' Choice of Referring Expression Reflects Shared Experience

    ERIC Educational Resources Information Center

    Gorman, Kristen S.; Gegg-Harrison, Whitney; Marsh, Chelsea R.; Tanenhaus, Michael K.

    2013-01-01

    When referring to named objects, speakers can choose either a name ("mbira") or a description ("that gourd-like instrument with metal strips"); whether the name provides useful information depends on whether the speaker's knowledge of the name is shared with the addressee. But, how do speakers determine what is shared? In 2…

  11. The perception of FM sweeps by Chinese and English listeners.

    PubMed

    Luo, Huan; Boemio, Anthony; Gordon, Michael; Poeppel, David

    2007-02-01

    Frequency-modulated (FM) signals are an integral acoustic component of ecologically natural sounds and are analyzed effectively in the auditory systems of humans and animals. Linearly frequency-modulated tone sweeps were used here to evaluate two questions. First, how rapid a sweep can listeners accurately perceive? Second, is there an effect of native language insofar as the language (phonology) is differentially associated with processing of FM signals? Speakers of English and Mandarin Chinese were tested to evaluate whether being a speaker of a tone language altered the perceptual identification of non-speech tone sweeps. In two psychophysical studies, we demonstrate that Chinese subjects perform better than English subjects in FM direction identification, but not in an FM discrimination task, in which English and Chinese speakers show similar detection thresholds of approximately 20 ms duration. We suggest that the better FM direction identification in Chinese subjects is related to their experience with FM direction analysis in the tone-language environment, even though supra-segmental tonal variation occurs over a longer time scale. Furthermore, the observed common discrimination temporal threshold across two language groups supports the conjecture that processing auditory signals at durations of approximately 20 ms constitutes a fundamental auditory perceptual threshold.

  12. Perceptual Detection of Subtle Dysphonic Traits in Individuals with Cervical Spinal Cord Injury Using an Audience Response Systems Approach.

    PubMed

    Johansson, Kerstin; Strömbergsson, Sofia; Robieux, Camille; McAllister, Anita

    2017-01-01

    Reduced respiratory function following lower cervical spinal cord injuries (CSCIs) may indirectly result in vocal dysfunction. Although self-reports indicate voice change and limitations following CSCI, earlier efforts using global perceptual ratings to distinguish speakers with CSCI from noninjured speakers have not been very successful. We investigate the use of an audience-response-system-based approach to distinguish speakers with CSCI from noninjured speakers, and explore whether specific vocal traits can be identified as characteristic of speakers with CSCI. Fourteen speech-language pathologists participated in a web-based perceptual task, in which their overt reactions to vocal dysfunction were registered during the continuous playback of recordings of 36 speakers (18 with CSCI and 18 matched controls). Dysphonic events were identified through manual perceptual analysis, to allow the exploration of connections between dysphonic events and listener reactions. More dysphonic events, and more listener reactions, were registered for speakers with CSCI than for noninjured speakers. Strain (particularly in phrase-final position) and creak (particularly in nonphrase-final position) distinguish speakers with CSCI from noninjured speakers. For the identification of intermittent and subtle signs of vocal dysfunction, an approach in which the temporal distribution of symptoms is registered offers a viable means of distinguishing speakers affected by voice dysfunction from non-affected speakers. In speakers with CSCI, clinicians should listen for the presence of final strain and nonfinal creak, and pay attention to self-reported voice function and voice problems, to identify individuals in need of clinical assessment and intervention. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  13. Speaker normalization for chinese vowel recognition in cochlear implants.

    PubMed

    Luo, Xin; Fu, Qian-Jie

    2005-07-01

    Because of the limited spectro-temporal resolution associated with cochlear implants, implant patients often have greater difficulty with multitalker speech recognition. The present study investigated whether multitalker speech recognition can be improved by applying speaker normalization techniques to cochlear implant speech processing. Multitalker Chinese vowel recognition was tested with normal-hearing Chinese-speaking subjects listening to a 4-channel cochlear implant simulation, with and without speaker normalization. For each subject, speaker normalization was referenced to the speaker that produced the best recognition performance under conditions without speaker normalization. To match the remaining speakers to this "optimal" output pattern, the overall frequency range of the analysis filter bank was adjusted for each speaker according to the ratio of the mean third formant frequency values between the specific speaker and the reference speaker. Results showed that speaker normalization provided a small but significant improvement in subjects' overall recognition performance. After speaker normalization, subjects' patterns of recognition performance across speakers changed, demonstrating the potential for speaker-dependent effects with the proposed normalization technique.
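
    The normalization rule itself is compact: scale the analysis filter-bank range by the ratio of mean third formant (F3) values. A minimal sketch follows; the band edges and the direction of the scaling are assumptions, since the abstract does not specify them.

        import numpy as np

        def normalized_band_edges(f3_speaker, f3_reference,
                                  base_edges=(200, 700, 1500, 3000, 6000)):
            """Scale the analysis filter-bank edges by the mean-F3 ratio, as a
            stand-in for the speaker normalization described above. Whether the
            ratio multiplies or divides the edges is an implementation detail
            not given in the abstract."""
            ratio = f3_speaker / f3_reference
            return tuple(np.asarray(base_edges, dtype=float) * ratio)

        # Example: a speaker with mean F3 of 2900 Hz, reference speaker at 2500 Hz.
        print(normalized_band_edges(2900.0, 2500.0))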

  14. The Sound of Voice: Voice-Based Categorization of Speakers' Sexual Orientation within and across Languages.

    PubMed

    Sulpizio, Simone; Fasoli, Fabio; Maass, Anne; Paladino, Maria Paola; Vespignani, Francesco; Eyssel, Friederike; Bentler, Dominik

    2015-01-01

    Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity.

  15. Neural Systems Involved When Attending to a Speaker

    PubMed Central

    Kamourieh, Salwa; Braga, Rodrigo M.; Leech, Robert; Newbould, Rexford D.; Malhotra, Paresh; Wise, Richard J. S.

    2015-01-01

    Remembering what a speaker said depends on attention. During conversational speech, the emphasis is on working memory, but listening to a lecture encourages episodic memory encoding. With simultaneous interference from background speech, the need for auditory vigilance increases. We recreated these context-dependent demands on auditory attention in 2 ways. The first was to require participants to attend to one speaker in either the absence or presence of a distracting background speaker. The second was to alter the task demand, requiring either an immediate or delayed recall of the content of the attended speech. Across 2 fMRI studies, common activated regions associated with segregating attended from unattended speech were the right anterior insula and adjacent frontal operculum (aI/FOp), the left planum temporale, and the precuneus. In contrast, activity in a ventral right frontoparietal system was dependent on both the task demand and the presence of a competing speaker. Additional multivariate analyses identified other domain-general frontoparietal systems, where activity increased during attentive listening but was modulated little by the need for speech stream segregation in the presence of 2 speakers. These results make predictions about impairments in attentive listening in different communicative contexts following focal or diffuse brain pathology. PMID:25596592

  16. Shhh… I Need Quiet! Children's Understanding of American, British, and Japanese-accented English Speakers.

    PubMed

    Bent, Tessa; Holt, Rachael Frush

    2018-02-01

    Children's ability to understand speakers with a wide range of dialects and accents is essential for efficient language development and communication in a global society. Here, the impact of regional dialect and foreign-accent variability on children's speech understanding was evaluated in both quiet and noisy conditions. Five- to seven-year-old children (n = 90) and adults (n = 96) repeated sentences produced by three speakers with different accents (American English, British English, and Japanese-accented English) in quiet or noisy conditions. Adults had no difficulty understanding any speaker in quiet conditions. Their performance declined for the nonnative speaker with a moderate amount of noise; their performance only substantially declined for the British English speaker (i.e., below 93% correct) when their understanding of the American English speaker was also impeded. In contrast, although children showed accurate word recognition for the American and British English speakers in quiet conditions, they had difficulty understanding the nonnative speaker even under ideal listening conditions. With a moderate amount of noise, their perception of British English speech declined substantially and their ability to understand the nonnative speaker was particularly poor. These results suggest that although school-aged children can understand unfamiliar native dialects under ideal listening conditions, their ability to recognize words in these dialects may be highly susceptible to the influence of environmental degradation. Fully adult-like word identification for speakers with unfamiliar accents and dialects may exhibit a protracted developmental trajectory.

  17. How Captain Amerika uses neural networks to fight crime

    NASA Technical Reports Server (NTRS)

    Rogers, Steven K.; Kabrisky, Matthew; Ruck, Dennis W.; Oxley, Mark E.

    1994-01-01

    Artificial neural network models can make amazing computations. These models are explained along with their application in problems associated with fighting crime. Specific problems addressed are identification of people using face recognition, speaker identification, and fingerprint and handwriting analysis (biometric authentication).

  18. On the optimization of a mixed speaker array in an enclosed space using the virtual-speaker weighting method

    NASA Astrophysics Data System (ADS)

    Peng, Bo; Zheng, Sifa; Liao, Xiangning; Lian, Xiaomin

    2018-03-01

    In order to achieve sound field reproduction over a wide frequency band, multiple types of speakers are used. The reproduction accuracy is affected not only by the signals sent to the speakers but also by the position and number of each type of speaker. The method of optimizing a mixed speaker array is investigated in this paper. A virtual-speaker weighting method is proposed to optimize both the position and the number of each type of speaker. In this method, a virtual-speaker model is proposed to quantify the increase in controllability of the speaker array as the number of speakers increases. While optimizing a mixed speaker array, the gain of the virtual-speaker transfer function is used to determine the priority order of the candidate speaker positions, which optimizes the position of each type of speaker. Then the relative gain of the virtual-speaker transfer function is used to determine whether speakers are redundant, which optimizes the number of each type of speaker. Finally, the virtual-speaker weighting method is verified by reproduction experiments on the interior sound field of a passenger car. The results validate that the optimum mixed speaker array can be obtained using the proposed method.
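
    The abstract does not define the gain quantities, so the sketch below only illustrates the rank-then-prune control flow the method describes, with a placeholder gain function standing in for the virtual-speaker transfer-function gain.

        import numpy as np

        def select_speakers(candidates, virtual_speaker_gain, rel_gain_threshold=0.1):
            """Hypothetical flow: rank candidate positions by a gain metric,
            then drop candidates whose relative gain marks them as redundant."""
            gains = np.array([virtual_speaker_gain(c) for c in candidates])
            order = np.argsort(gains)[::-1]          # priority: highest gain first
            selected = []
            for idx in order:
                rel_gain = gains[idx] / gains[order[0]]
                if rel_gain >= rel_gain_threshold:   # below threshold => redundant
                    selected.append(candidates[idx])
            return selected

        positions = [(x, 0.0) for x in np.linspace(-1.0, 1.0, 9)]
        chosen = select_speakers(positions, lambda p: 1.0 / (1.0 + abs(p[0])))
        print(chosen)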

  19. The Effects of the Literal Meaning of Emotional Phrases on the Identification of Vocal Emotions.

    PubMed

    Shigeno, Sumi

    2018-02-01

    This study investigates the discrepancy between the literal emotional content of speech and emotional tone in the identification of speakers' vocal emotions in both the listeners' native language (Japanese), and in an unfamiliar language (random-spliced Japanese). Both experiments involve a "congruent condition," in which the emotion contained in the literal meaning of speech (words and phrases) was compatible with vocal emotion, and an "incongruent condition," in which these forms of emotional information were discordant. Results for Japanese indicated that performance in identifying emotions did not differ significantly between the congruent and incongruent conditions. However, the results for random-spliced Japanese indicated that vocal emotion was correctly identified more often in the congruent than in the incongruent condition. The different results for Japanese and random-spliced Japanese suggested that the literal meaning of emotional phrases influences the listener's perception of the speaker's emotion, and that Japanese participants could infer speakers' intended emotions in the incongruent condition.

  20. Tier-Adjacency Is Not a Necessary Condition for Learning Phonotactic Dependencies

    ERIC Educational Resources Information Center

    Koo, Hahn; Callahan, Lydia

    2012-01-01

    One hypothesis raised by Newport and Aslin to explain how speakers learn dependencies between nonadjacent phonemes is that speakers track bigram probabilities between two segments that are adjacent to each other within a tier of their own. The hypothesis predicts that a dependency between segments separated from each other at the tier level cannot…
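
    Although the abstract is truncated, the tier-adjacency idea it tests can be illustrated with a toy computation: project each word onto its consonant tier, then estimate bigram probabilities between consonants that are adjacent on that tier even when vowels intervene in the surface string. The words and segment inventory below are invented for illustration.

        from collections import Counter

        words = ["patoki", "patuki", "badogi", "badugi"]
        consonants = set("pbtdkg")

        def consonant_tier(word):
            """Strip vowels, leaving only the consonant tier."""
            return [seg for seg in word if seg in consonants]

        bigrams = Counter()
        unigrams = Counter()
        for w in words:
            tier = consonant_tier(w)
            unigrams.update(tier[:-1])               # count bigram left contexts
            bigrams.update(zip(tier, tier[1:]))      # tier-adjacent pairs

        prob = {bg: n / unigrams[bg[0]] for bg, n in bigrams.items()}
        print(prob)  # e.g. P(t | p) over the consonant tier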

  1. Mother and Father Speech: Distribution of Parental Speech Features in English and Spanish. Papers and Reports on Child Language Development, No. 12.

    ERIC Educational Resources Information Center

    Blount, Ben G.; Padgug, Elise J.

    Features of parental speech to young children were studied in four English-speaking and four Spanish-speaking families. Children ranged in age from 9 to 12 months for the English speakers and from 8 to 22 months for the Spanish speakers. Examination of the utterances led to the identification of 34 prosodic, paralinguistic, and interactional…

  2. The Sound of Voice: Voice-Based Categorization of Speakers’ Sexual Orientation within and across Languages

    PubMed Central

    Maass, Anne; Paladino, Maria Paola; Vespignani, Francesco; Eyssel, Friederike; Bentler, Dominik

    2015-01-01

    Empirical research had initially shown that English listeners are able to identify the speakers' sexual orientation based on voice cues alone. However, the accuracy of this voice-based categorization, as well as its generalizability to other languages (language-dependency) and to non-native speakers (language-specificity), has been questioned recently. Consequently, we address these open issues in 5 experiments: First, we tested whether Italian and German listeners are able to correctly identify sexual orientation of same-language male speakers. Then, participants of both nationalities listened to voice samples and rated the sexual orientation of both Italian and German male speakers. We found that listeners were unable to identify the speakers' sexual orientation correctly. However, speakers were consistently categorized as either heterosexual or gay on the basis of how they sounded. Moreover, a similar pattern of results emerged when listeners judged the sexual orientation of speakers of their own and of the foreign language. Overall, this research suggests that voice-based categorization of sexual orientation reflects the listeners' expectations of how gay voices sound rather than being an accurate detector of the speakers' actual sexual identity. Results are discussed with regard to accuracy, acoustic features of voices, language dependency and language specificity. PMID:26132820

  3. Age as a Factor in Ethnic Accent Identification in Singapore

    ERIC Educational Resources Information Center

    Tan, Ying Ying

    2012-01-01

    This study seeks to answer two research questions. First, can listeners distinguish the ethnicity of the speakers on the basis of voice quality alone? Second, do demographic differences among the listeners affect discriminability? A simple but carefully designed and controlled ethnic identification test was carried out on 325 Singaporean…

  4. Native and Non-Native Speakers' Brain Responses to Filled Indirect Object Gaps

    ERIC Educational Resources Information Center

    Jessen, Anna; Festman, Julia; Boxell, Oliver; Felser, Claudia

    2017-01-01

    We examined native and non-native English speakers' processing of indirect object "wh"-dependencies using a filled-gap paradigm while recording event-related potentials (ERPs). The non-native group was comprised of native German-speaking, proficient non-native speakers of English. Both participant groups showed evidence of linking…

  5. Speech variability effects on recognition accuracy associated with concurrent task performance by pilots

    NASA Technical Reports Server (NTRS)

    Simpson, C. A.

    1985-01-01

    In the present study of the responses of pairs of pilots to aircraft warning classification tasks using an isolated-word, speaker-dependent speech recognition system, induced stress was manipulated by means of different scoring procedures for the classification task and by the inclusion of a competitive manual control task. Both speech patterns and recognition accuracy were analyzed; recognition errors were recorded by type for an isolated-word speaker-dependent system and by an offline technique for a connected-word speaker-dependent system. While errors increased with task loading for the isolated-word system, there was no such effect of task loading for the connected-word system.

  6. Congenital amusia in speakers of a tone language: association with lexical tone agnosia.

    PubMed

    Nan, Yun; Sun, Yanan; Peretz, Isabelle

    2010-09-01

    Congenital amusia is a neurogenetic disorder that affects the processing of musical pitch in speakers of non-tonal languages like English and French. We assessed whether this musical disorder exists among speakers of Mandarin Chinese who use pitch to alter the meaning of words. Using the Montreal Battery of Evaluation of Amusia, we tested 117 healthy young Mandarin speakers with no self-declared musical problems and 22 individuals who reported musical difficulties and scored two standard deviations below the mean obtained by the Mandarin speakers without amusia. These 22 amusic individuals showed a similar pattern of musical impairment as did amusic speakers of non-tonal languages, by exhibiting a more pronounced deficit in melody than in rhythm processing. Furthermore, nearly half the tested amusics had impairments in the discrimination and identification of Mandarin lexical tones. Six showed marked impairments, displaying what could be called lexical tone agnosia, but had normal tone production. Our results show that speakers of tone languages such as Mandarin may experience musical pitch disorder despite early exposure to speech-relevant pitch contrasts. The observed association between the musical disorder and lexical tone difficulty indicates that the pitch disorder as defining congenital amusia is not specific to music or culture but is rather general in nature.

  7. The role of linguistic experience in the processing of probabilistic information in production.

    PubMed

    Gustafson, Erin; Goldrick, Matthew

    2018-01-01

    Speakers track the probability that a word will occur in a particular context and utilize this information during phonetic processing. For example, content words that have high probability within a discourse tend to be realized with reduced acoustic/articulatory properties. Such probabilistic information may influence L1 and L2 speech processing in distinct ways (reflecting differences in linguistic experience across groups and the overall difficulty of L2 speech processing). To examine this issue, L1 and L2 speakers performed a referential communication task, describing sequences of simple actions. The two groups of speakers showed similar effects of discourse-dependent probabilistic information on production, suggesting that L2 speakers can successfully track discourse-dependent probabilities and use such information to modulate phonetic processing.

  8. Sensing of Particular Speakers for the Construction of Voice Interface Utilized in Noisy Environment

    NASA Astrophysics Data System (ADS)

    Sawada, Hideyuki; Ohkado, Minoru

    Humans are able to exchange information smoothly by voice under difficult conditions, such as in a noisy crowd or with several speakers present. We can detect the position of a sound source in 3D space, extract a particular sound from a mixture, and recognize who is talking. Realizing this mechanism with a computer would enable new applications, such as recording sound with high quality by reducing noise, presenting a clarified sound, and microphone-free speech recognition based on extracting a particular sound. This paper introduces real-time detection and identification of a particular speaker in a noisy environment, using a microphone array, based on the location of the speaker and individual voice characteristics. The study will be applied to develop an adaptive auditory system for a mobile robot that collaborates with a factory worker.
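
    Location-based speaker detection with a microphone array rests on estimating time differences of arrival (TDOA) between microphones. A minimal cross-correlation sketch with two microphones and simulated signals follows; it illustrates the basic ingredient, not the paper's system.

        import numpy as np

        def tdoa_seconds(mic_a, mic_b, fs):
            """Estimate TDOA by cross-correlation. Positive result: the source
            is closer to mic_a (mic_b receives the signal later)."""
            corr = np.correlate(mic_a, mic_b, mode="full")
            lag = np.argmax(corr) - (len(mic_b) - 1)   # lag in samples
            return -lag / fs

        fs = 16000
        rng = np.random.default_rng(2)
        src = rng.standard_normal(fs)                  # 1 s of "speech"
        delay = 8                                      # samples (~0.5 ms)
        mic1 = src
        mic2 = np.concatenate([np.zeros(delay), src[:-delay]])
        print(f"estimated TDOA: {tdoa_seconds(mic1, mic2, fs) * 1e3:.3f} ms")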

  9. Understanding of emotions and false beliefs among hearing children versus deaf children.

    PubMed

    Ziv, Margalit; Most, Tova; Cohen, Shirit

    2013-04-01

    Emotion understanding and theory of mind (ToM) are two major aspects of social cognition in which deaf children demonstrate developmental delays. The current study investigated these aspects of social cognition in two subgroups of deaf children, those with cochlear implants who communicate orally (speakers) and those who communicate primarily using sign language (signers), in comparison to hearing children. Participants were 53 Israeli kindergartners: 20 speakers, 10 signers, and 23 hearing children. Tests included four emotion identification and understanding tasks and one false-belief task (ToM). Results revealed similarities among all children's emotion labeling and affective perspective-taking abilities, similarities between speakers and hearing children in false beliefs and in understanding emotions in typical contexts, and lower performance of signers on the latter three tasks. Adapting educational experiences to the unique characteristics and needs of speakers and signers is recommended.

  10. And then I saw her race: Race-based expectations affect infants' word processing.

    PubMed

    Weatherhead, Drew; White, Katherine S

    2018-08-01

    How do our expectations about speakers shape speech perception? Adults' speech perception is influenced by social properties of the speaker (e.g., race). When in development do these influences begin? In the current study, 16-month-olds heard familiar words produced in their native accent (e.g., "dog") and in an unfamiliar accent involving a vowel shift (e.g., "dag"), in the context of an image of either a same-race speaker or an other-race speaker. Infants' interpretation of the words depended on the speaker's race. For the same-race speaker, infants only recognized words produced in the familiar accent; for the other-race speaker, infants recognized both versions of the words. Two additional experiments showed that infants only recognized an other-race speaker's atypical pronunciations when they differed systematically from the native accent. These results provide the first evidence that expectations driven by unspoken properties of speakers, such as race, influence infants' speech processing. Copyright © 2018 Elsevier B.V. All rights reserved.

  11. Do Listeners Store in Memory a Speaker's Habitual Utterance-Final Phonation Type?

    PubMed Central

    Bőhm, Tamás; Shattuck-Hufnagel, Stefanie

    2009-01-01

    Earlier studies report systematic differences across speakers in the occurrence of utterance-final irregular phonation; the work reported here investigated whether human listeners remember this speaker-specific information and can access it when necessary (a prerequisite for using this cue in speaker recognition). Listeners personally familiar with the voices of the speakers were presented with pairs of speech samples: one with the original and the other with transformed final phonation type. Asked to select the member of the pair that was closer to the talker's voice, most listeners tended to choose the unmanipulated token (even though they judged them to sound essentially equally natural). This suggests that utterance-final pitch period irregularity is part of the mental representation of individual speaker voices, although this may depend on the individual speaker and listener to some extent. PMID:19776665

  12. Standardization and future directions in pattern identification research: International brainstorming session.

    PubMed

    Jung, Jeeyoun; Park, Bongki; Lee, Ju Ah; You, Sooseong; Alraek, Terje; Bian, Zhao-Xiang; Birch, Stephen; Kim, Tae-Hun; Xu, Hao; Zaslawski, Chris; Kang, Byoung-Kab; Lee, Myeong Soo

    2016-09-01

    An international brainstorming session on standardizing pattern identification (PI) was held at the Korea Institute of Oriental Medicine on October 1, 2013 in Daejeon, South Korea. This brainstorming session was convened to gather insights from international traditional East Asian medicine specialists regarding PI standardization. With eight presentations and discussion sessions, the meeting allowed participants to discuss research methods and diagnostic systems used in traditional medicine for PI. One speaker presented a talk titled "The diagnostic criteria for blood stasis syndrome: implications for standardization of PI". Four speakers presented on future strategies and objective measurement tools that could be used in PI research. Later, participants shared information and methodology for accurate diagnosis and PI. They also discussed the necessity for standardizing PI and methods for international collaborations in pattern research.

  13. Perception of musical and lexical tones by Taiwanese-speaking musicians.

    PubMed

    Lee, Chao-Yang; Lee, Yuh-Fang; Shr, Chia-Lin

    2011-07-01

    This study explored the relationship between music and speech by examining absolute pitch and lexical tone perception. Taiwanese-speaking musicians were asked to identify musical tones without a reference pitch and multispeaker Taiwanese level tones without acoustic cues typically present for speaker normalization. The results showed that a high percentage of the participants (65% with an exact match required and 81% with one-semitone errors allowed) possessed absolute pitch, as measured by the musical tone identification task. A negative correlation was found between occurrence of absolute pitch and age of onset of musical training, suggesting that the acquisition of absolute pitch resembles the acquisition of speech. The participants were able to identify multispeaker Taiwanese level tones with above-chance accuracy, even though the acoustic cues typically present for speaker normalization were not available in the stimuli. No correlations were found between the performance in musical tone identification and the performance in Taiwanese tone identification. Potential reasons for the lack of association between the two tasks are discussed. © 2011 Acoustical Society of America

  14. "Who" is saying "what"? Brain-based decoding of human voice and speech.

    PubMed

    Formisano, Elia; De Martino, Federico; Bonte, Milene; Goebel, Rainer

    2008-11-07

    Can we decipher speech content ("what" is being said) and speaker identity ("who" is saying it) from observations of brain activity of a listener? Here, we combine functional magnetic resonance imaging with a data-mining algorithm and retrieve what and whom a person is listening to from the neural fingerprints that speech and voice signals elicit in the listener's auditory cortex. These cortical fingerprints are spatially distributed and insensitive to acoustic variations of the input so as to permit the brain-based recognition of learned speech from unknown speakers and of learned voices from previously unheard utterances. Our findings unravel the detailed cortical layout and computational properties of the neural populations at the basis of human speech recognition and speaker identification.
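
    The decoding approach can be caricatured as pattern classification on voxel responses. The sketch below trains a linear classifier on simulated "cortical fingerprints" to decode which of three speakers was heard; the data, classifier, and dimensions are assumptions for illustration, not the authors' data-mining algorithm.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(3)
        n_trials, n_voxels = 90, 200
        labels = np.repeat([0, 1, 2], n_trials // 3)        # speaker identity
        patterns = rng.standard_normal((n_trials, n_voxels))
        patterns[labels == 1, :10] += 0.8                   # weak speaker-specific signal
        patterns[labels == 2, 10:20] += 0.8

        clf = LinearSVC(dual=False)
        scores = cross_val_score(clf, patterns, labels, cv=5)
        print(f"decoding accuracy: {scores.mean():.2f} (chance = 0.33)")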

  15. Syntactic Constraints and Individual Differences in Native and Non-Native Processing of Wh-Movement

    PubMed Central

    Johnson, Adrienne; Fiorentino, Robert; Gabriele, Alison

    2016-01-01

    There is a debate as to whether second language (L2) learners show qualitatively similar processing profiles as native speakers or whether L2 learners are restricted in their ability to use syntactic information during online processing. In the realm of wh-dependency resolution, research has examined whether learners, similar to native speakers, attempt to resolve wh-dependencies in grammatically licensed contexts but avoid positing gaps in illicit contexts such as islands. Also at issue is whether the avoidance of gap filling in islands is due to adherence to syntactic constraints or whether islands simply present processing bottlenecks. One approach has been to examine the relationship between processing abilities and the establishment of wh-dependencies in islands. Grammatical accounts of islands do not predict such a relationship as the parser should simply not predict gaps in illicit contexts. In contrast, a pattern of results showing that individuals with more processing resources are better able to establish wh-dependencies in islands could conceivably be compatible with certain processing accounts. In a self-paced reading experiment which examines the processing of wh-dependencies, we address both questions, examining whether native English speakers and Korean learners of English show qualitatively similar patterns and whether there is a relationship between working memory, as measured by counting span and reading span, and processing in both island and non-island contexts. The results of the self-paced reading experiment suggest that learners can use syntactic information on the same timecourse as native speakers, showing qualitative similarity between the two groups. Results of regression analyses did not reveal a significant relationship between working memory and the establishment of wh-dependencies in islands but we did observe significant relationships between working memory and the processing of licit wh-dependencies. As the contexts in which these relationships emerged differed for learners and native speakers, our results call for further research examining individual differences in dependency resolution in both populations. PMID:27148152

  16. Syntactic Constraints and Individual Differences in Native and Non-Native Processing of Wh-Movement.

    PubMed

    Johnson, Adrienne; Fiorentino, Robert; Gabriele, Alison

    2016-01-01

    There is a debate as to whether second language (L2) learners show qualitatively similar processing profiles as native speakers or whether L2 learners are restricted in their ability to use syntactic information during online processing. In the realm of wh-dependency resolution, research has examined whether learners, similar to native speakers, attempt to resolve wh-dependencies in grammatically licensed contexts but avoid positing gaps in illicit contexts such as islands. Also at issue is whether the avoidance of gap filling in islands is due to adherence to syntactic constraints or whether islands simply present processing bottlenecks. One approach has been to examine the relationship between processing abilities and the establishment of wh-dependencies in islands. Grammatical accounts of islands do not predict such a relationship as the parser should simply not predict gaps in illicit contexts. In contrast, a pattern of results showing that individuals with more processing resources are better able to establish wh-dependencies in islands could conceivably be compatible with certain processing accounts. In a self-paced reading experiment which examines the processing of wh-dependencies, we address both questions, examining whether native English speakers and Korean learners of English show qualitatively similar patterns and whether there is a relationship between working memory, as measured by counting span and reading span, and processing in both island and non-island contexts. The results of the self-paced reading experiment suggest that learners can use syntactic information on the same timecourse as native speakers, showing qualitative similarity between the two groups. Results of regression analyses did not reveal a significant relationship between working memory and the establishment of wh-dependencies in islands but we did observe significant relationships between working memory and the processing of licit wh-dependencies. As the contexts in which these relationships emerged differed for learners and native speakers, our results call for further research examining individual differences in dependency resolution in both populations.

  17. Sound Localization and Speech Perception in Noise of Pediatric Cochlear Implant Recipients: Bimodal Fitting Versus Bilateral Cochlear Implants.

    PubMed

    Choi, Ji Eun; Moon, Il Joon; Kim, Eun Yeon; Park, Hee-Sung; Kim, Byung Kil; Chung, Won-Ho; Cho, Yang-Sun; Brown, Carolyn J; Hong, Sung Hwa

    The aim of this study was to compare binaural performance on an auditory localization task and a speech-perception-in-babble measure between children who use a cochlear implant (CI) in one ear and a hearing aid (HA) in the other (bimodal fitting) and those who use bilateral CIs. Thirteen children (mean age ± SD = 10 ± 2.9 years) with bilateral CIs and 19 children with bimodal fitting were recruited to participate. Sound localization was assessed using a 13-loudspeaker array in a quiet sound-treated booth. Speakers were placed in an arc from -90° azimuth to +90° azimuth (15° intervals) in the horizontal plane. To assess the accuracy of sound location identification, we calculated the absolute error in degrees between the target speaker and the response speaker during each trial. The mean absolute error was computed by dividing the sum of absolute errors by the total number of trials. We also calculated a hemifield identification score to reflect the accuracy of right/left discrimination. Speech-in-babble perception was also measured in the sound field using target speech presented from the front speaker. Eight-talker babble was presented in four different listening conditions: from the front speaker (0°), from one of the two side speakers (+90° or -90°), or from both side speakers (±90°). A speech, spatial, and quality questionnaire was administered. When the two groups of children were directly compared, there was no significant difference in localization accuracy or hemifield identification score under the binaural condition. Performance on the speech perception test was also similar under most babble conditions. However, when the babble was from the first device side (the CI side for children with bimodal stimulation or the first CI side for children with bilateral CIs), speech understanding in babble by bilateral CI users was significantly better than that by bimodal listeners. Speech, spatial, and quality scores were comparable between the two groups. Overall, binaural performance was similar between children fitted with two CIs (CI + CI) and those who use bimodal stimulation (HA + CI) in most conditions. However, the bilateral CI group showed better speech perception than the bimodal group when babble was from the first device side. Therefore, if bimodal performance is significantly below the mean bilateral CI performance on speech perception in babble, these results suggest that a child should be considered for transition from bimodal stimulation to bilateral CIs.
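
    The two localization metrics are simple to state in code. The functions below follow the definitions given in the abstract; the handling of midline (0°) trials in the hemifield score is my assumption, since the abstract does not specify it.

        import numpy as np

        def mean_absolute_error(target_az, response_az):
            """Mean absolute localization error in degrees across trials."""
            target_az, response_az = np.asarray(target_az), np.asarray(response_az)
            return np.mean(np.abs(target_az - response_az))

        def hemifield_score(target_az, response_az):
            """Proportion of responses in the correct left/right hemifield
            (midline trials excluded here for simplicity)."""
            target_az, response_az = np.asarray(target_az), np.asarray(response_az)
            return np.mean(np.sign(target_az) == np.sign(response_az))

        targets = np.array([-90, -45, -15, 45, 90])
        responses = np.array([-75, -60, -30, 30, 90])
        print(mean_absolute_error(targets, responses))  # 12.0 degrees
        print(hemifield_score(targets, responses))      # 1.0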

  18. Interface of Linguistic and Visual Information During Audience Design.

    PubMed

    Fukumura, Kumiko

    2015-08-01

    Evidence suggests that speakers can take account of the addressee's needs when referring. However, what representations drive the speaker's audience design has been less clear. This study aims to go beyond previous studies by investigating the interplay between the visual and linguistic context during audience design. Speakers repeated subordinate descriptions (e.g., firefighter) given in the prior linguistic context less and used basic-level descriptions (e.g., man) more when the addressee did not hear the linguistic context than when s/he did. But crucially, this effect happened only when the referent lacked the visual attributes associated with the expressions (e.g., the referent was in plain clothes rather than in a firefighter uniform), so there was no other contextual cue available for the identification of the referent. This suggests that speakers flexibly use different contextual cues to help their addressee map the referring expression onto the intended referent. In addition, speakers used fewer pronouns when the addressee did not hear the linguistic antecedent than when s/he did. This suggests that although speakers may be egocentric during anaphoric reference (Fukumura & Van Gompel, 2012), they can cooperatively avoid pronouns when the linguistic antecedents were not shared with their addressee during initial reference. © 2014 Cognitive Science Society, Inc.

  19. Voice input/output capabilities at Perception Technology Corporation

    NASA Technical Reports Server (NTRS)

    Ferber, Leon A.

    1977-01-01

    Condensed resumes of key company personnel at Perception Technology Corporation are presented. The staff possesses expertise in speech recognition, speech synthesis, speaker authentication, and language identification. The capabilities of the hardware and software engineers are included.

  20. Cross-language identification of long-term average speech spectra in Korean and English: toward a better understanding of the quantitative difference between two languages.

    PubMed

    Noh, Heil; Lee, Dong-Hee

    2012-01-01

    To identify the quantitative differences between Korean and English in long-term average speech spectra (LTASS), twenty Korean speakers, who lived in the capital of Korea and spoke standard Korean as their first language, were compared with 20 native English speakers. For the Korean speakers, a passage from a novel and a passage from a leading newspaper article were chosen. For the English speakers, the Rainbow Passage was used. The speech was digitally recorded using a GenRad 1982 Precision Sound Level Meter and GoldWave® software, and analyzed in MATLAB. There was no significant difference in the LTASS between the Korean subjects reading a news article and those reading a novel. For male subjects, the LTASS of Korean speakers was significantly lower than that of English speakers above 1.6 kHz except at 4 kHz, and the difference was more than 5 dB, especially at higher frequencies. For female subjects, the LTASS of Korean speakers showed significantly lower levels at 0.2, 0.5, 1, 1.25, 2, 2.5, 6.3, 8, and 10 kHz, but the differences were less than 5 dB. Compared with English speakers, the LTASS of Korean speakers showed significantly lower levels at frequencies above 2 kHz except at 4 kHz. The difference was less than 5 dB between 2 and 5 kHz but more than 5 dB above 6 kHz. Our results based on the LTASS analysis suggest that, to adjust hearing aid fitting formulas for Koreans, one needs to raise the gain in high-frequency regions.
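
    An LTASS is essentially a long-term average power spectral density expressed in dB. The study used MATLAB; the following Python outline with a Welch average over synthetic data is an equivalent sketch, with the analysis parameters chosen arbitrarily.

        import numpy as np
        from scipy.signal import welch

        def ltass_db(speech, fs, nperseg=4096):
            """Long-term average speech spectrum: average the PSD over a long
            recording, then express the levels in dB."""
            freqs, psd = welch(speech, fs=fs, nperseg=nperseg)
            return freqs, 10.0 * np.log10(psd + 1e-20)

        # Example with a synthetic signal standing in for a recorded passage.
        fs = 22050
        rng = np.random.default_rng(4)
        speech = rng.standard_normal(fs * 60)       # 60 s of noise as a stand-in
        freqs, levels = ltass_db(speech, fs)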

  1. Perception of English palatal codas by Korean speakers of English

    NASA Astrophysics Data System (ADS)

    Yeon, Sang-Hee

    2003-04-01

    This study examined the perception of English palatal codas by Korean speakers of English to determine whether perception problems are the source of production problems. In particular, the study first looked at the possible first-language effect on the perception of English palatal codas. Second, a possible perceptual source of vowel epenthesis after English palatal codas was investigated. In addition, individual factors, such as length of residence, TOEFL score, gender, and academic status, were compared to determine whether they affected the varying degrees of perception accuracy. Eleven adult Korean speakers of English as well as three native speakers of English participated in the study. Three perception tests involving identification of minimally different English pseudo-words or real words were carried out. The results showed that, first, the Korean speakers perceived the English codas significantly worse than the Americans. Second, the study supported the idea that Koreans perceived an extra /i/ after the final affricates due to final release. Finally, none of the individual factors explained the varying degrees of perceptual accuracy. In particular, TOEFL scores and perception test scores did not have any statistically significant association.

  2. Training the perception of Hindi dental and retroflex stops by native speakers of American English and Japanese.

    PubMed

    Pruitt, John S; Jenkins, James J; Strange, Winifred

    2006-03-01

    Perception of second-language speech sounds is influenced by one's first language. For example, speakers of American English have difficulty perceiving dental versus retroflex stop consonants in Hindi, although English has both dental and retroflex allophones of alveolar stops. Japanese, unlike English, has a contrast similar to Hindi's, specifically the Japanese /d/ versus the flapped /r/, which is sometimes produced as a retroflex. This study compared American and Japanese speakers' identification of the Hindi contrast in CV syllable contexts in which C varied in voicing and aspiration. The study then evaluated the participants' improvement in identifying the distinction after training with a computer-interactive program. Training sessions progressively increased in difficulty by decreasing the extent of vowel truncation in the stimuli and by adding new speakers. Although all participants improved significantly, Japanese participants were more accurate than Americans in distinguishing the contrast on the pretest, during training, and on the posttest. Transfer was observed to three new consonantal contexts, a new vowel context, and a new speaker's productions. Some abstract aspect of the contrast was apparently learned during training. It is suggested that allophonic experience with dental and retroflex stops may be detrimental to perception of the new contrast.

  3. 29 CFR 102.142 - Transcripts, recordings or minutes of closed meetings; public availability; retention.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ..., or transcriptions of electronic recordings including the identification of speakers, shall to the... cost of transcription. (c) The agency shall maintain a complete verbatim copy of the transcript, a...

  4. 29 CFR 102.142 - Transcripts, recordings or minutes of closed meetings; public availability; retention.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ..., or transcriptions of electronic recordings including the identification of speakers, shall to the... cost of transcription. (c) The agency shall maintain a complete verbatim copy of the transcript, a...

  5. Speaker gender identification based on majority vote classifiers

    NASA Astrophysics Data System (ADS)

    Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri

    2017-03-01

    Speaker gender identification is considered among the most important tools in several multimedia applications, namely automatic speech recognition, interactive voice response systems, and audio browsing systems. The performance of gender identification systems is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best-performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch, MFCCs, as well as other temporal and frequency-domain descriptors. Five classification models, including decision tree, discriminant analysis, naïve Bayes, support vector machine, and k-nearest neighbor, were evaluated. The three best-performing classifiers among the five contribute by majority voting between their scores. Experiments were performed on three datasets spoken in three languages, English, German, and Arabic, in order to validate the language independency of the proposed scheme. Results confirm that the presented system reaches a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
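
    The combination scheme can be sketched with scikit-learn's hard-voting ensemble. The three estimators, toy features, and labels below are placeholders for illustration rather than the paper's tuned models and pitch/MFCC feature set.

        import numpy as np
        from sklearn.ensemble import VotingClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC

        # Toy feature matrix standing in for pitch/MFCC descriptors, with
        # synthetic binary "gender" labels.
        rng = np.random.default_rng(5)
        features = rng.standard_normal((400, 20))
        labels = (features[:, 0] + 0.5 * features[:, 1] > 0).astype(int)

        voter = VotingClassifier(
            estimators=[("svm", SVC()), ("nb", GaussianNB()),
                        ("knn", KNeighborsClassifier())],
            voting="hard",   # majority vote among the three classifiers
        )
        print(cross_val_score(voter, features, labels, cv=5).mean())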

  6. Report of an international symposium on drugs and driving

    DOT National Transportation Integrated Search

    1975-06-30

    This report presents the proceedings of a Symposium on Drugs (other than alcohol) and Driving. Speakers' papers and work session summaries are included. Major topics include: Overview of Problem, Risk Identification, Drug Measurement in Biological Ma...

  7. Individual differences in selective attention predict speech identification at a cocktail party.

    PubMed

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-08-31

    Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a proportion of variance similar to that explained by binaural sensitivity to the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise.

  8. Perception of emotionally loaded vocal expressions and its connection to responses to music. A cross-cultural investigation: Estonia, Finland, Sweden, Russia, and the USA

    PubMed Central

    Waaramaa, Teija; Leisiö, Timo

    2013-01-01

    The present study focused on voice quality and the perception of the basic emotions from speech samples in cross-cultural conditions. It was examined whether voice quality, cultural or language background, age, or gender were related to the identification of the emotions. Professional actors (n = 2) and actresses (n = 2) produced nonsense sentences (n = 32) and protracted vowels (n = 8) expressing the six basic emotions, interest, and a neutral emotional state. The impact of musical interests on the ability to distinguish between emotions or valence (on an axis positivity – neutrality – negativity) from voice samples was studied. Listening tests were conducted on location in five countries: Estonia, Finland, Russia, Sweden, and the USA, with 50 randomly chosen participants (25 males and 25 females) in each country. The participants (total N = 250) completed a questionnaire eliciting their background information and musical interests. The responses in the listening test and the questionnaires were statistically analyzed. Voice quality parameters and the share of emotions and valence identified correlated significantly with each other for both genders. The percentage of emotions and valence identified was clearly above the chance level in each of the five countries studied; however, the countries differed significantly from each other in the emotions identified and by the gender of the speaker. The samples produced by females were identified significantly better than those produced by males. The listener's age was a significant variable. Only minor gender differences were found for the identification. Perceptual confusion between emotions in the listening test seemed to depend on their similar voice production types. Musical interests tended to have a positive effect on the identification of the emotions. The results also suggest that identifying emotions from speech samples may be easier for listeners who share a similar language or cultural background with the speaker. PMID:23801972

  9. Greek perception and production of an English vowel contrast: A preliminary study

    NASA Astrophysics Data System (ADS)

    Podlipský, Václav J.

    2005-04-01

    This study focused on language-independent principles functioning in acquisition of second language (L2) contrasts. Specifically, it tested Bohn's Desensitization Hypothesis [in Speech perception and linguistic experience: Issues in Cross Language Research, edited by W. Strange (York Press, Baltimore, 1995)], which predicted that Greek speakers of English as an L2 would base their perceptual identification of English /i/ and /I/ on durational differences. Synthetic vowels differing orthogonally in duration and spectrum between the /i/ and /I/ endpoints served as stimuli for a forced-choice identification test. To assess L2 proficiency and to evaluate the possibility of cross-language category assimilation, productions of English /i/, /I/, and /ɛ/ and of Greek /i/ and /e/ were elicited and analyzed acoustically. The L2 utterances were also rated for degree of foreign accent. Two native speakers of Modern Greek with low experience in English and two with intermediate experience participated. Six native English (NE) listeners and six NE speakers tested in an earlier study constituted the control groups. Heterogeneous perceptual behavior was observed for the L2 subjects. It is concluded that until acquisition in completely naturalistic settings is tested, possible interference of formally induced meta-linguistic differentiation between a ``short'' and a ``long'' vowel cannot be eliminated.

  10. What a speaker's choice of frame reveals: reference points, frame selection, and framing effects.

    PubMed

    McKenzie, Craig R M; Nelson, Jonathan D

    2003-09-01

    Framing effects are well established: Listeners' preferences depend on how outcomes are described to them, or framed. Less well understood is what determines how speakers choose frames. Two experiments revealed that reference points systematically influenced speakers' choices between logically equivalent frames. For example, speakers tended to describe a 4-ounce cup filled to the 2-ounce line as half full if it was previously empty but described it as half empty if it was previously full. Similar results were found when speakers could describe the outcome of a medical treatment in terms of either mortality or survival (e.g., 25% die vs. 75% survive). Two additional experiments showed that listeners made accurate inferences about speakers' reference points on the basis of the selected frame (e.g., if a speaker described a cup as half empty, listeners inferred that the cup used to be full). Taken together, the data suggest that frames reliably convey implicit information in addition to their explicit content, which helps explain why framing effects are so robust.

  11. An automatic speech recognition system with speaker-independent identification support

    NASA Astrophysics Data System (ADS)

    Caranica, Alexandru; Burileanu, Corneliu

    2015-02-01

    The novelty of this work lies in the application of an open-source research software toolkit (CMU Sphinx) to train, build, and evaluate a speech recognition system, with speaker-independent support, for voice-controlled hardware applications. Moreover, we propose to use the trained acoustic model to decode voice commands offline on embedded hardware, such as an ARMv6 low-cost SoC, the Raspberry Pi. This type of single-board computer, mainly used for educational and research activities, can serve as a proof-of-concept software and hardware stack for low-cost voice automation systems.
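
    A minimal offline decoding sketch using the pocketsphinx Python bindings follows. The constructor defaults (bundled en-us model), the raw WAV handling, and the file name command.wav are assumptions, and the exact API varies across pocketsphinx versions.

        from pocketsphinx import Decoder

        decoder = Decoder(samprate=16000)        # default bundled en-us acoustic model
        decoder.start_utt()
        with open('command.wav', 'rb') as f:     # 16 kHz, 16-bit mono PCM assumed
            f.read(44)                           # skip the RIFF/WAV header
            while True:
                buf = f.read(2048)
                if not buf:
                    break
                decoder.process_raw(buf, False, False)
        decoder.end_utt()
        if decoder.hyp() is not None:
            print(decoder.hyp().hypstr)          # recognized command text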

  12. Inclusion, Affection, Control: The Pragmatics of Intergenerational Communication.

    ERIC Educational Resources Information Center

    Hess, Lucille J.; Hess, Richard C.

    Personal intent and discourse considerations play an important role in understanding the nature of a conversation between a youth and an elderly person. Each participant makes assumptions about the listener's knowledge and ability to communicate effectively. The way a speaker reacts to the other participant depends upon the speaker's own…

  13. Processing subject-verb agreement in a second language depends on proficiency

    PubMed Central

    Hoshino, Noriko; Dussias, Paola E.; Kroll, Judith F.

    2010-01-01

    Subject-verb agreement is a computation that is often difficult to execute perfectly in the first language (L1) and even more difficult to produce skillfully in a second language (L2). In this study, we examined the way in which bilingual speakers complete sentence fragments in a manner that reflects access to both grammatical and conceptual number. In two experiments, we show that bilingual speakers are sensitive to both grammatical and conceptual number in the L1 and grammatical number agreement in the L2. However, only highly proficient bilinguals are also sensitive to conceptual number in the L2. The results suggest that the extent to which speakers are able to exploit conceptual information during speech planning depends on the level of language proficiency. PMID:20640178

  14. Multilevel Analysis in Analyzing Speech Data

    ERIC Educational Resources Information Center

    Guddattu, Vasudeva; Krishna, Y.

    2011-01-01

    The speech produced by human vocal tract is a complex acoustic signal, with diverse applications in phonetics, speech synthesis, automatic speech recognition, speaker identification, communication aids, speech pathology, speech perception, machine translation, hearing research, rehabilitation and assessment of communication disorders and many…
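
    Because repeated speech measurements are nested within speakers, a random-intercept mixed model is the prototypical multilevel analysis the abstract refers to. The sketch below uses statsmodels; the variable names (f0, vowel_dur, speaker) and the synthetic data are illustrative only.

        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(0)
        speakers = np.repeat([f's{i}' for i in range(8)], 10)   # 8 speakers x 10 tokens
        vowel_dur = rng.normal(80, 10, size=80)                 # token-level predictor (ms)
        f0 = (180 + 0.3 * vowel_dur                             # fixed effect
              + np.repeat(rng.normal(0, 15, 8), 10)             # speaker random intercepts
              + rng.normal(0, 5, 80))                           # residual noise
        data = pd.DataFrame({'f0': f0, 'vowel_dur': vowel_dur, 'speaker': speakers})

        # Level 1: tokens; level 2: speakers (random intercept per speaker)
        model = smf.mixedlm('f0 ~ vowel_dur', data, groups=data['speaker'])
        print(model.fit().summary())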

  15. Syntactic learning by mere exposure - An ERP study in adult learners

    PubMed Central

    Mueller, Jutta L; Oberecker, Regine; Friederici, Angela D

    2009-01-01

    Background: Artificial language studies have revealed the remarkable ability of humans to extract syntactic structures from a continuous sound stream by mere exposure. However, it remains unclear whether the processes acquired in such tasks are comparable to those applied during normal language processing. The present study compares the ERPs to auditory processing of simple Italian sentences in native and non-native speakers after brief exposure to Italian sentences of a similar structure. The sentences contained a non-adjacent dependency between an auxiliary and the morphologically marked suffix of the verb. Participants were presented with four alternating learning and testing phases. During learning phases only correct sentences were presented, while during testing phases 50 percent of the sentences contained a grammatical violation. Results: The non-native speakers successfully learned the dependency and displayed an N400-like negativity and a subsequent anteriorly distributed positivity in response to rule violations. The native Italian group showed an N400 followed by a P600 effect. Conclusion: The presence of the P600 suggests that native speakers applied a grammatical rule. In contrast, non-native speakers appeared to use a lexical form-based processing strategy. Thus, the processing mechanisms acquired in the language learning task were only partly comparable to those applied by competent native speakers. PMID:19640301

  16. Syntactic learning by mere exposure--an ERP study in adult learners.

    PubMed

    Mueller, Jutta L; Oberecker, Regine; Friederici, Angela D

    2009-07-29

    Artificial language studies have revealed the remarkable ability of humans to extract syntactic structures from a continuous sound stream by mere exposure. However, it remains unclear whether the processes acquired in such tasks are comparable to those applied during normal language processing. The present study compares the ERPs to auditory processing of simple Italian sentences in native and non-native speakers after brief exposure to Italian sentences of a similar structure. The sentences contained a non-adjacent dependency between an auxiliary and the morphologically marked suffix of the verb. Participants were presented with four alternating learning and testing phases. During learning phases only correct sentences were presented, while during testing phases 50 percent of the sentences contained a grammatical violation. The non-native speakers successfully learned the dependency and displayed an N400-like negativity and a subsequent anteriorly distributed positivity in response to rule violations. The native Italian group showed an N400 followed by a P600 effect. The presence of the P600 suggests that native speakers applied a grammatical rule. In contrast, non-native speakers appeared to use a lexical form-based processing strategy. Thus, the processing mechanisms acquired in the language learning task were only partly comparable to those applied by competent native speakers.

  17. A cross-language study of perception of lexical stress in English.

    PubMed

    Yu, Vickie Y; Andruski, Jean E

    2010-08-01

    This study investigates the question of whether language background affects the perception of lexical stress in English. Thirty native English speakers and 30 native Chinese learners of English participated in a stressed-syllable identification task and a discrimination task involving three types of stimuli (real words/pseudowords/hums). The results show that both language groups were able to identify and discriminate stress patterns. Lexical and segmental information affected the English and Chinese speakers to varying degrees. English and Chinese speakers showed different response patterns to trochaic vs. iambic stress across the three types of stimuli. An acoustic analysis revealed that the two language groups used different acoustic cues to process lexical stress. The findings suggest that the different degrees of lexical and segmental effects can be explained by language background, which in turn supports the hypothesis that language background affects the perception of lexical stress in English.

  18. Arctic Visiting Speakers Series (AVS)

    NASA Astrophysics Data System (ADS)

    Fox, S. E.; Griswold, J.

    2011-12-01

    The Arctic Visiting Speakers (AVS) Series funds researchers and other arctic experts to travel and share their knowledge in communities where they might not otherwise connect. Speakers cover a wide range of arctic research topics and can address a variety of audiences, including K-12 students, graduate and undergraduate students, and the general public. Host applications are accepted on an ongoing basis, depending on funding availability, and need to be submitted at least one month prior to the expected tour dates. Interested hosts can choose speakers from an online Speakers Bureau or invite a speaker of their choice. Preference is given to individuals and organizations hosting speakers who will reach a broad audience and the general public. AVS tours are encouraged to span several days, allowing ample time for interactions with faculty, students, local media, and community members. Applications for both domestic and international visits will be considered. Applications for international visits should involve participation of more than one host organization and must include either a US-based speaker or a US-based organization. This is a small but important program that educates the public about Arctic issues. There have been 27 tours since 2007 that have impacted communities across the globe, including Gatineau, Quebec, Canada; St. Petersburg, Russia; Piscataway, New Jersey; Cordova, Alaska; Nuuk, Greenland; Elizabethtown, Pennsylvania; Oslo, Norway; Inari, Finland; Borgarnes, Iceland; San Francisco, California; and Wolcott, Vermont, to name a few. Tours have included lectures to K-12 schools, college and university students, tribal organizations, Boy Scout troops, science center and museum patrons, and the general public. With approximately 300 attendees at each AVS tour, roughly 4,100 people have been reached since 2007. The expectations for each tour are manageable. Hosts must submit a schedule of events and a tour summary to be posted online, and must acknowledge the National Science Foundation Office of Polar Programs and ARCUS in all promotional materials. Hosts agree to send ARCUS photographs, fliers, and, if possible, a video of the main lecture. Host and speaker agree to collect data on the number of attendees in each audience to submit as part of a post-tour evaluation. The grants can generally cover all the expenses of a tour, depending on the location. A maximum of $2,000 will be provided for the travel-related expenses of a speaker on a domestic visit, and a maximum of $2,500 for a speaker on an international visit. Each speaker will receive an honorarium of $300.

  19. Brain Plasticity in Speech Training in Native English Speakers Learning Mandarin Tones

    NASA Astrophysics Data System (ADS)

    Heinzen, Christina Carolyn

    The current study employed behavioral and event-related potential (ERP) measures to investigate brain plasticity associated with second-language (L2) phonetic learning based on an adaptive computer training program. The program utilized the acoustic characteristics of Infant-Directed Speech (IDS) to train monolingual American English-speaking listeners to perceive Mandarin lexical tones. Behavioral identification and discrimination tasks were conducted using naturally recorded speech, carefully controlled synthetic speech, and non-speech control stimuli. The ERP experiments were conducted with selected synthetic speech stimuli in a passive listening oddball paradigm. Identical pre- and post-tests were administered to nine adult listeners, who completed two to three hours of perceptual training. The perceptual training sessions used pairwise lexical tone identification and progressed through seven levels of difficulty for each tone pair. The levels of difficulty included progression in speaker variability from one to four speakers and progression through four levels of acoustic exaggeration of duration, pitch range, and pitch contour. Behavioral results for the natural speech stimuli revealed significant training-induced improvement in identification of Tones 1, 3, and 4. Improvements in identification of Tone 4 generalized to novel stimuli as well. Additionally, comparison between discrimination of across-category and within-category stimulus pairs taken from a synthetic continuum revealed a training-induced shift toward more native-like categorical perception of the Mandarin lexical tones. Analysis of the Mismatch Negativity (MMN) responses in the ERP data revealed increased amplitude and decreased latency for pre-attentive processing of across-category discrimination as a result of training. There were also laterality changes in the MMN responses to the non-speech control stimuli, which could reflect reallocation of brain resources in processing pitch patterns for the across-category lexical tone contrast. Overall, the results support the use of IDS characteristics in training non-native speech contrasts and provide impetus for further research.

  20. Can you hear my age? Influences of speech rate and speech spontaneity on estimation of speaker age

    PubMed Central

    Skoog Waller, Sara; Eriksson, Mårten; Sörqvist, Patrik

    2015-01-01

    Cognitive hearing science is mainly about the study of how cognitive factors contribute to speech comprehension, but cognitive factors also partake in speech processing to infer non-linguistic information from speech signals, such as the intentions of the talker and the speaker’s age. Here, we report two experiments on age estimation by “naïve” listeners. The aim was to study how speech rate influences estimation of speaker age by comparing the speakers’ natural speech rate with increased or decreased speech rate. In Experiment 1, listeners were presented with audio samples of read speech from three different speaker age groups (young, middle aged, and old adults). They estimated the speakers as younger when speech rate was faster than normal and as older when speech rate was slower than normal. This speech rate effect was slightly greater in magnitude for older (60–65 years) speakers in comparison with younger (20–25 years) speakers, suggesting that speech rate may gain greater importance as a perceptual age cue with increased speaker age. This pattern was more pronounced in Experiment 2, in which listeners estimated age from spontaneous speech. Faster speech rate was associated with lower age estimates, but only for older and middle aged (40–45 years) speakers. Taken together, speakers of all age groups were estimated as older when speech rate decreased, except for the youngest speakers in Experiment 2. The absence of a linear speech rate effect in estimates of younger speakers, for spontaneous speech, implies that listeners use different age estimation strategies or cues (possibly vocabulary) depending on the age of the speaker and the spontaneity of the speech. Potential implications for forensic investigations and other applied domains are discussed. PMID:26236259

  1. A Functional Imaging Study of Self-Regulatory Capacities in Persons Who Stutter

    PubMed Central

    Liu, Jie; Wang, Zhishun; Huo, Yuankai; Davidson, Stephanie M.; Klahr, Kristin; Herder, Carl L.; Sikora, Chamonix O.; Peterson, Bradley S.

    2014-01-01

    Developmental stuttering is a disorder of speech fluency with an unknown pathogenesis. The similarity of its phenotype and natural history with other childhood neuropsychiatric disorders of frontostriatal pathology suggests that stuttering may have a closely related pathogenesis. We investigated in this study the potential involvement of frontostriatal circuits in developmental stuttering. We collected functional magnetic resonance imaging data from 46 persons with stuttering and 52 fluent controls during performance of the Simon Spatial Incompatibility Task. We examined differences between the two groups of blood-oxygen-level-dependent activation associated with two neural processes, the resolution of cognitive conflict and the context-dependent adaptation to changes in conflict. Stuttering speakers and controls did not differ on behavioral performance on the task. In the presence of conflict-laden stimuli, however, stuttering speakers activated more strongly the cingulate cortex, left anterior prefrontal cortex, right medial frontal cortex, left supplementary motor area, right caudate nucleus, and left parietal cortex. The magnitude of activation in the anterior cingulate cortex correlated inversely in stuttering speakers with symptom severity. Stuttering speakers also showed blunted activation during context-dependent adaptation in the left dorsolateral prefrontal cortex, a brain region that mediates cross-temporal contingencies. Frontostriatal hyper-responsivity to conflict resembles prior findings in other disorders of frontostriatal pathology, and therefore likely represents a general mechanism supporting functional compensation for an underlying inefficiency of neural processing in these circuits. The reduced activation of dorsolateral prefrontal cortex likely represents the inadequate readiness of stuttering speakers to execute a sequence of motor responses. PMID:24587104

  2. ``The perceptual bases of speaker identity'' revisited

    NASA Astrophysics Data System (ADS)

    Voiers, William D.

    2003-10-01

    A series of experiments begun 40 years ago [W. D. Voiers, J. Acoust. Soc. Am. 36, 1065-1073 (1964)] was concerned with identifying the perceived voice traits (PVTs) on which human recognition of voices depends. It culminated with the development of a voice taxonomy based on 20 PVTs and a set of highly reliable rating scales for classifying voices with respect to those PVTs. The development of a perceptual voice taxonomy was motivated by the need for a practical method of evaluating speaker recognizability in voice communication systems. The Diagnostic Speaker Recognition Test (DSRT) evaluates the effects of systems on speaker recognizability as reflected in changes in the inter-listener reliability of voice ratings on the 20 PVTs. The DSRT thus provides a qualitative, as well as quantitative, evaluation of the effects of a system on speaker recognizability. A fringe benefit of this project is PVT rating data for a sample of 680 voices. [Work partially supported by USAFRL.]

  3. The prevalence of synaesthesia depends on early language learning.

    PubMed

    Watson, Marcus R; Chromý, Jan; Crawford, Lyle; Eagleman, David M; Enns, James T; Akins, Kathleen A

    2017-02-01

    According to one theory, synaesthesia develops, or is preserved, because it helps children learn. If so, it should be more common among adults who faced greater childhood learning challenges. In the largest survey of synaesthesia to date, the incidence of synaesthesia was compared among native speakers of languages with transparent (easier) and opaque (more difficult) orthographies. Contrary to our prediction, native speakers of Czech (transparent) were more likely to be synaesthetes than native speakers of English (opaque). However, exploratory analyses suggested that this was because more Czechs learned non-native second languages, which was strongly associated with synaesthesia, consistent with the learning hypothesis. Furthermore, the incidence of synaesthesia among speakers of opaque languages was double that among speakers of transparent languages other than Czech, also consistent with the learning hypothesis. These findings contribute to an emerging understanding of synaesthetic development as a complex and lengthy process with multiple causal influences. Copyright © 2016. Published by Elsevier Inc.

  4. English as a Second Language for Adults. Discussion Paper 04/79.

    ERIC Educational Resources Information Center

    Selman, Mary

    Because of a growing community of non-English speakers in British Columbia, there is an urgent need for effective teaching programs in English as a Second Language (ESL). Non-English speakers frequently face educational deprivation, difficulty in using their skills and in finding employment, dependency on government assistance, and, if children,…

  5. Towards the identification of Idiopathic Parkinson’s Disease from the speech. New articulatory kinetic biomarkers

    PubMed Central

    Shattuck-Hufnagel, S.; Choi, J. Y.; Moro-Velázquez, L.; Gómez-García, J. A.

    2017-01-01

    Although a large number of acoustic indicators have already been proposed in the literature to evaluate the hypokinetic dysarthria of people with Parkinson's Disease, the goal of this work is to identify and interpret new reliable and complementary articulatory biomarkers that could be applied to predict/evaluate Parkinson's Disease from a diadochokinetic test, contributing to the possibility of a further multidimensional analysis of the speech of parkinsonian patients. The new biomarkers proposed are based on the kinetic behaviour of the envelope trace, which is directly linked with the articulatory dysfunctions introduced by the disease since the early stages. The interest of these new articulatory indicators rests on their ease of identification and interpretation, and their potential to be translated into computer-based automatic methods to screen for the disease from speech. Throughout this paper, the accuracy provided by these acoustic kinetic biomarkers is compared with the one obtained with a baseline system based on speaker identification techniques. Results show accuracies around 85% that are in line with those obtained with complex state-of-the-art speaker recognition techniques, but with an easier physical interpretation, which opens the possibility of transfer to a clinical setting. PMID:29240814
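
    The abstract does not spell out the biomarkers themselves; the sketch below shows the general idea of envelope extraction and simple kinetic descriptors for a diadochokinetic (e.g., /pa-ta-ka/) recording, using the Hilbert transform. The specific statistics returned are illustrative assumptions, not the paper's measures.

        import numpy as np
        from scipy.signal import hilbert, butter, filtfilt

        def envelope_kinetics(x, fs):
            """Envelope trace of a DDK recording plus simple kinetic descriptors."""
            env = np.abs(hilbert(x))               # analytic-signal amplitude envelope
            b, a = butter(4, 10 / (fs / 2))        # smooth to <10 Hz (syllabic rate)
            env = filtfilt(b, a, env)
            vel = np.gradient(env, 1.0 / fs)       # envelope velocity (articulator speed proxy)
            acc = np.gradient(vel, 1.0 / fs)       # envelope acceleration
            return env, np.abs(vel).max(), np.abs(acc).max()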

  6. The object of my desire: Five-year-olds rapidly reason about a speaker's desire during referential communication.

    PubMed

    San Juan, Valerie; Chambers, Craig G; Berman, Jared; Humphry, Chelsea; Graham, Susan A

    2017-10-01

    Two experiments examined whether 5-year-olds draw inferences about desire outcomes that constrain their online interpretation of an utterance. Children were informed of a speaker's positive (Experiment 1) or negative (Experiment 2) desire to receive a specific toy as a gift before hearing a referentially ambiguous statement ("That's my present") spoken with either a happy or sad voice. After hearing the speaker express a positive desire, children (N=24) showed an implicit (i.e., eye gaze) and explicit ability to predict reference to the desired object when the speaker sounded happy, but they showed only implicit consideration of the alternate object when the speaker sounded sad. After hearing the speaker express a negative desire, children (N=24) used only happy prosodic cues to predict the intended referent of the statement. Taken together, the findings indicate that the efficiency with which 5-year-olds integrate desire reasoning with language processing depends on the emotional valence of the speaker's voice but not on the type of desire representations (i.e., positive vs. negative) that children must reason about online. Copyright © 2017 Elsevier Inc. All rights reserved.

  7. The non-trusty clown attack on model-based speaker recognition systems

    NASA Astrophysics Data System (ADS)

    Farrokh Baroughi, Alireza; Craver, Scott

    2015-03-01

    Biometric detectors for speaker identification commonly employ a statistical model of a subject's voice, such as a Gaussian Mixture Model, that combines multiple component means to improve detector performance. This allows a malicious insider to amend or append a component of a subject's statistical model so that a detector behaves normally except under a carefully engineered circumstance, letting an attacker force a misclassification of his or her voice only when desired by smuggling data into a database far in advance of an attack. Note that the attack is possible whenever the attacker has access to the database, even for a limited time, to modify the victim's model. We exhibit such an attack on a speaker identification system, in which the attacker forces a misclassification by speaking in an unusual voice after replacing the least-weighted component of the victim's model with the most-weighted component of the unusual-voice model. The attacker makes his or her voice unusual during the attack because his or her normal voice model may already be in the database; by attacking with an unusual voice, the attacker retains the option to be recognized as himself or herself when talking normally, or as the victim when talking in the unusual manner. By attaching an appropriately weighted vector to a victim's model, we can impersonate all users in our simulations while avoiding unwanted false rejections.
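
    A minimal numpy sketch of the component-implant step, with each GMM represented as plain weight/mean/covariance arrays (a simplification of the detectors named above, not the authors' code):

        import numpy as np

        def implant_component(victim, attacker):
            """Overwrite the victim model's least-weighted Gaussian with the
            most-weighted component of the attacker's 'unusual voice' model.
            Each model is a dict of arrays: 'w' (K,), 'mu' (K, D), 'cov' (K, D, D)."""
            i = np.argmin(victim['w'])        # weakest component of the victim
            j = np.argmax(attacker['w'])      # strongest component of the attacker
            victim['mu'][i] = attacker['mu'][j]
            victim['cov'][i] = attacker['cov'][j]
            victim['w'][i] = attacker['w'][j]
            victim['w'] = victim['w'] / victim['w'].sum()   # renormalize weights
            return victim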

  8. Application of the wavelet transform for speech processing

    NASA Technical Reports Server (NTRS)

    Maes, Stephane

    1994-01-01

    Speaker identification and word spotting will shortly play a key role in space applications. An approach based on the wavelet transform is presented that, in the context of the 'modulation model,' enables extraction of speech features which are used as input for the classification process.
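
    The abstract does not detail the feature extraction; a generic multiresolution sketch using the PyWavelets package (an assumed stand-in for the paper's 'modulation model' analysis) illustrates wavelet-based speech features:

        import numpy as np
        import pywt

        def wavelet_features(frame, wavelet='db4', level=5):
            """Subband energies of one speech frame (1-D array, e.g. 512 samples)
            as a feature vector for classification (assumed feature choice)."""
            coeffs = pywt.wavedec(frame, wavelet, level=level)
            return np.array([np.sum(c ** 2) for c in coeffs])  # one energy per subband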

  9. Individual differences in selective attention predict speech identification at a cocktail party

    PubMed Central

    Oberfeld, Daniel; Klöckner-Nowotny, Felicitas

    2016-01-01

    Listeners with normal hearing show considerable individual differences in speech understanding when competing speakers are present, as in a crowded restaurant. Here, we show that one source of this variance is individual differences in the ability to focus selective attention on a target stimulus in the presence of distractors. In 50 young normal-hearing listeners, performance in tasks measuring auditory and visual selective attention was associated with sentence identification in the presence of spatially separated competing speakers. Together, the measures of selective attention explained a similar proportion of variance as binaural sensitivity to the acoustic temporal fine structure. Working memory span, age, and audiometric thresholds showed no significant association with speech understanding. These results suggest that a reduced ability to focus attention on a target is one reason why some listeners with normal hearing sensitivity have difficulty communicating in situations with background noise. DOI: http://dx.doi.org/10.7554/eLife.16747.001 PMID:27580272

  10. Euclidean Distances as measures of speaker similarity including identical twin pairs: A forensic investigation using source and filter voice characteristics.

    PubMed

    San Segundo, Eugenia; Tsanas, Athanasios; Gómez-Vilda, Pedro

    2017-01-01

    There is a growing consensus that hybrid approaches are necessary for successful speaker characterization in Forensic Speaker Comparison (FSC); hence this study explores the forensic potential of voice features combining source and filter characteristics. The former relate to the action of the vocal folds, while the latter reflect the geometry of the speaker's vocal tract. This set of features has been extracted from pause fillers, which are long enough for robust feature estimation while spontaneous enough to be extracted from voice samples in real forensic casework. Speaker similarity was measured using standardized Euclidean Distances (ED) between pairs of speakers: 54 different-speaker (DS) comparisons, 54 same-speaker (SS) comparisons, and 12 comparisons between monozygotic twins (MZ). Results revealed that the differences between DS and SS comparisons were significant in both high-quality and telephone-filtered recordings, with no false rejections and limited false acceptances; this finding suggests that this set of voice features is highly speaker-dependent and therefore forensically useful. Mean ED for MZ pairs lies between the average ED for SS comparisons and DS comparisons, as expected according to the literature on twin voices. Specific cases of MZ speakers with very high ED (i.e. strong dissimilarity) are discussed in the context of sociophonetic and twin studies. A preliminary simplification of the Vocal Profile Analysis (VPA) Scheme is proposed, which enables the quantification of voice quality features in the perceptual assessment of speaker similarity and allows for the calculation of perceptual-acoustic correlations. The adequacy of z-score normalization for this study is also discussed, as well as the relevance of heat maps for detecting the so-called phantoms in recent approaches to the biometric menagerie. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
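
    A minimal sketch of the distance computation, using scipy's standardized Euclidean distance; the feature matrix and its dimensions are hypothetical stand-ins for the source and filter measurements named above:

        import numpy as np
        from scipy.spatial.distance import seuclidean

        # rows = speakers, cols = source+filter features measured on pause fillers
        # (e.g., f0, jitter, shimmer, formant values); values here are random placeholders
        features = np.random.rand(10, 6)
        V = features.var(axis=0, ddof=1)     # per-feature variances used for standardization

        def speaker_distance(a, b):
            """Standardized Euclidean distance between two speakers' feature vectors,
            as in the SS/DS/MZ comparisons described above (a sketch)."""
            return seuclidean(a, b, V)

        d_same_or_diff = speaker_distance(features[0], features[1])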

  11. Differential use of temporal cues to the /s/-/z/ contrast by native and non-native speakers of English.

    PubMed

    Flege, J E; Hillenbrand, J

    1986-02-01

    This study examined the effect of linguistic experience on perception of the English /s/-/z/ contrast in word-final position. The durations of the periodic ("vowel") and aperiodic ("fricative") portions of stimuli, ranging from peas to peace, were varied in a 5 X 5 factorial design. Forced-choice identification judgments were elicited from two groups of native speakers of American English differing in dialect, and from two groups each of native speakers of French, Swedish, and Finnish differing in English-language experience. The results suggested that the non-native subjects used cues established for the perception of phonetic contrasts in their native language to identify fricatives as /s/ or /z/. Lengthening vowel duration increased /z/ judgments in all eight subject groups, although the effect was smaller for native speakers of French than for native speakers of the other languages. Shortening fricative duration, on the other hand, significantly decreased /z/ judgments only by the English and French subjects. It did not influence voicing judgments by the Swedish and Finnish subjects, even those who had lived for a year or more in an English-speaking environment. These findings raise the question of whether adults who learn a foreign language can acquire the ability to integrate multiple acoustic cues to a phonetic contrast which does not exist in their native language.

  12. Proficiency and Working Memory Based Explanations for Nonnative Speakers' Sensitivity to Agreement in Sentence Processing

    ERIC Educational Resources Information Center

    Coughlin, Caitlin E.; Tremblay, Annie

    2013-01-01

    This study examines the roles of proficiency and working memory (WM) capacity in second-/foreign-language (L2) learners' processing of agreement morphology. It investigates the processing of grammatical and ungrammatical short- and long-distance number agreement dependencies by native English speakers at two proficiencies in French, and the…

  13. A Cross-Language Study of Acoustic Predictors of Speech Intelligibility in Individuals With Parkinson's Disease

    PubMed Central

    Choi, Yaelin

    2017-01-01

    Purpose: The present study aimed to compare acoustic models of speech intelligibility in individuals with the same disease (Parkinson's disease [PD]) and presumably similar underlying neuropathologies but with different native languages (American English [AE] and Korean). Method: A total of 48 speakers from the 4 speaker groups (AE speakers with PD, Korean speakers with PD, healthy English speakers, and healthy Korean speakers) were asked to read a paragraph in their native languages. Four acoustic variables were analyzed: acoustic vowel space, voice onset time contrast scores, normalized pairwise variability index, and articulation rate. Speech intelligibility scores were obtained from scaled estimates of sentences extracted from the paragraph. Results: The findings indicated that the multiple regression models of speech intelligibility were different in Korean and AE, even with the same set of predictor variables and with speakers matched on speech intelligibility across languages. Analysis of the descriptive data for the acoustic variables showed the expected compression of the vowel space in speakers with PD in both languages, lower normalized pairwise variability index scores in Korean compared with AE, and no differences within or across language in articulation rate. Conclusions: The results indicate that the basis of an intelligibility deficit in dysarthria is likely to depend on the native language of the speaker and listener. Additional research is required to explore other potential predictor variables, as well as additional language comparisons to pursue cross-linguistic considerations in classification and diagnosis of dysarthria types. PMID:28821018
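
    One of the predictors, the normalized pairwise variability index (nPVI), is conventionally defined as follows (the standard Grabe-Low formulation, stated here for reference rather than quoted from the paper):

        nPVI = \frac{100}{m-1} \sum_{k=1}^{m-1}
               \left| \frac{d_k - d_{k+1}}{(d_k + d_{k+1})/2} \right|

    where d_k is the duration of the k-th vocalic interval and m is the number of intervals in the utterance.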

  14. A study of voice production characteristics of astronaut speech during Apollo 11 for speaker modeling in space.

    PubMed

    Yu, Chengzhu; Hansen, John H L

    2017-03-01

    Human physiology has evolved to accommodate environmental conditions, including temperature, pressure, and air chemistry unique to Earth. However, the environment in space varies significantly compared to that on Earth and, therefore, variability is expected in astronauts' speech production mechanism. In this study, the variations of astronaut voice characteristics during the NASA Apollo 11 mission are analyzed. Specifically, acoustical features such as fundamental frequency and phoneme formant structure that are closely related to the speech production system are studied. For a further understanding of astronauts' vocal tract spectrum variation in space, a maximum likelihood frequency warping based analysis is proposed to detect the vocal tract spectrum displacement during space conditions. The results from fundamental frequency, formant structure, as well as vocal spectrum displacement indicate that astronauts change their speech production mechanism when in space. Moreover, the experimental results for astronaut voice identification tasks indicate that current speaker recognition solutions are highly vulnerable to astronaut voice production variations in space conditions. Future recommendations from this study suggest that successful applications of speaker recognition during extended space missions require robust speaker modeling techniques that could effectively adapt to voice production variation caused by diverse space conditions.
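
    The maximum-likelihood frequency-warping analysis mentioned above presumably follows the usual VTLN-style criterion (a standard formulation, assumed here rather than taken from the paper): the warp factor \alpha is chosen to maximize the likelihood of the warped features X^{(\alpha)} under a reference model \lambda,

        \hat{\alpha} = \arg\max_{\alpha} \; \log p\left( X^{(\alpha)} \mid \lambda \right)

    so that a systematic shift of \hat{\alpha} between Earth and in-space recordings would indicate a displacement of the vocal tract spectrum.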

  15. Noise-robust speech triage.

    PubMed

    Bartos, Anthony L; Cipr, Tomas; Nelson, Douglas J; Schwarz, Petr; Banowetz, John; Jerabek, Ladislav

    2018-04-01

    A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise ratio (SNR) levels and then performing SID using the appropriate SNR-dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNRs of the testing and training data were close or identical. In the current effort, multiple i-vector algorithms were used, greatly improving both processing throughput and equal-error-rate classification accuracy. Using identical approaches in the same noisy environment, the performance of SID, language identification, gender identification, and diarization was significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (-10 dB ≤ SNR < +10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ +10 dB).
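
    A sketch of the matched-condition selection idea, under the assumption that one classifier has been pre-trained per lattice SNR; the lattice values, function names, and the models mapping are illustrative:

        import numpy as np

        SNR_LATTICE = np.array([-10, -5, 0, 5, 10, 20])   # dB levels of pre-trained SID models

        def estimate_snr(speech_power, noise_power):
            """Rough SNR estimate in dB from measured speech and noise power."""
            return 10 * np.log10(speech_power / noise_power)

        def pick_model(models, snr_db):
            """Select the SID model trained nearest to the estimated SNR.
            'models' maps each lattice SNR to a trained classifier (assumed)."""
            nearest = SNR_LATTICE[np.argmin(np.abs(SNR_LATTICE - snr_db))]
            return models[nearest]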

  16. Visual abilities are important for auditory-only speech recognition: evidence from autism spectrum disorder.

    PubMed

    Schelinski, Stefanie; Riedel, Philipp; von Kriegstein, Katharina

    2014-12-01

    In auditory-only conditions, for example when we listen to someone on the phone, it is essential to recognize quickly and accurately what is said (speech recognition). Previous studies have shown that speech recognition performance in auditory-only conditions is better if the speaker is known not only by voice, but also by face. Here, we tested the hypothesis that such an improvement in auditory-only speech recognition depends on the ability to lip-read. To test this, we recruited a group of adults with autism spectrum disorder (ASD), a condition associated with difficulties in lip-reading, and typically developed controls. All participants were trained to identify six speakers by name and voice. Three speakers were learned by a video showing their face, and three others were learned in a matched control condition without face. After training, participants performed an auditory-only speech recognition test that consisted of sentences spoken by the trained speakers. As a control condition, the test also included speaker identity recognition on the same auditory material. The results showed that, in the control group, performance in speech recognition was improved for speakers known by face in comparison to speakers learned in the matched control condition without face. The ASD group lacked such a performance benefit. For the ASD group, auditory-only speech recognition was even worse for speakers known by face compared to speakers not known by face. In speaker identity recognition, the ASD group performed worse than the control group, independent of whether the speakers were learned with or without face. Two additional visual experiments showed that the ASD group performed worse in lip-reading, whereas face identity recognition was within the normal range. The findings support the view that auditory-only communication involves specific visual mechanisms. Further, they indicate that in ASD, speaker-specific dynamic visual information is not available to optimize auditory-only speech recognition. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. An Event-Related Potential (ERP) Investigation of Filler-Gap Processing in Native and Second Language Speakers

    ERIC Educational Resources Information Center

    Dallas, Andrea; DeDe, Gayle; Nicol, Janet

    2013-01-01

    The current study employed a neuro-imaging technique, Event-Related Potentials (ERP), to investigate real-time processing of sentences containing filler-gap dependencies by late-learning speakers of English as a second language (L2) with a Chinese native language background. An individual differences approach was also taken to examine the role of…

  18. Use of Speaker Intent and Grammatical Cues in Fast-Mapping by Adolescents with Down Syndrome

    ERIC Educational Resources Information Center

    McDuffie, Andrea S.; Sindberg, Heidi A.; Hesketh, Linda J.; Chapman, Robin S.

    2007-01-01

    Purpose: The authors asked whether adolescents with Down syndrome (DS) could fast-map novel nouns and verbs when word learning depended on using the speaker's pragmatic or syntactic cues. Compared with typically developing (TD) comparison children, the authors predicted that syntactic cues would prove harder for the group with DS to use and that…

  19. Utterance selection model of language change

    NASA Astrophysics Data System (ADS)

    Baxter, G. J.; Blythe, R. A.; Croft, W.; McKane, A. J.

    2006-04-01

    We present a mathematical formulation of a theory of language change. The theory is evolutionary in nature and has close analogies with theories of population genetics. The mathematical structure we construct similarly has correspondences with the Fisher-Wright model of population genetics, but there are significant differences. The continuous time formulation of the model is expressed in terms of a Fokker-Planck equation. This equation is exactly soluble in the case of a single speaker and can be investigated analytically in the case of multiple speakers who communicate equally with all other speakers and give their utterances equal weight. Whilst the stationary properties of this system have much in common with the single-speaker case, time-dependent properties are richer. In the particular case where linguistic forms can become extinct, we find that the presence of many speakers causes a two-stage relaxation, the first being a common marginal distribution that persists for a long time as a consequence of ultimate extinction being due to rare fluctuations.
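
    For the single-speaker case, a Fokker-Planck equation of Wright-Fisher type is the canonical form (schematic only; the paper's full equation contains model-specific utterance-selection drift and interaction terms not reproduced here):

        \frac{\partial P(x,t)}{\partial t}
            = \frac{1}{2} \frac{\partial^2}{\partial x^2}
              \left[ x(1-x) \, P(x,t) \right]

    where x is the frequency of one linguistic variant and P(x,t) is its probability density, with the boundaries x = 0 and x = 1 corresponding to extinction and fixation of the variant.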

  20. Korean Word Frequency and Commonality Study for Augmentative and Alternative Communication

    ERIC Educational Resources Information Center

    Shin, Sangeun; Hill, Katya

    2016-01-01

    Background: Vocabulary frequency results have been reported to design and support augmentative and alternative communication (AAC) interventions. A few studies exist for adult speakers and for other natural languages. With the increasing demand on AAC treatment for Korean adults, identification of high-frequency or core vocabulary (CV) becomes…

  1. A Report by the Air Pollution Committee

    ERIC Educational Resources Information Center

    Kirkpatrick, Lane

    1972-01-01

    Description of a "Symposium on Air Resource Protection and the Environment," held at the 1972 Environmental Health Conference and Exposition. Reports included a mathematical model for predicting future levels of air pollution, evaluation and identification of transportation controls, and a panel discussion of points raised by the speakers. (LK)

  2. Myths and Political Rhetoric: Jimmy Carter Accepts the Nomination.

    ERIC Educational Resources Information Center

    Corso, Dianne M.

    Like other political speakers who have drawn on the personification, identification, and dramatic encounter images of mythology to pressure and persuade audiences, Jimmy Carter evoked the myths of the hero, the American Dream, and the ideal political process in his presidential nomination acceptance speech. By stressing his unknown status, his…

  3. Speech enhancement based on neural networks improves speech intelligibility in noise for cochlear implant users.

    PubMed

    Goehring, Tobias; Bolner, Federico; Monaghan, Jessica J M; van Dijk, Bas; Zarowski, Andrzej; Bleeck, Stefan

    2017-02-01

    Speech understanding in noisy environments is still one of the major challenges for cochlear implant (CI) users in everyday life. We evaluated a speech enhancement algorithm based on neural networks (NNSE) for improving speech intelligibility in noise for CI users. The algorithm decomposes the noisy speech signal into time-frequency units, extracts a set of auditory-inspired features, and feeds them to the neural network to produce an estimate of which frequency channels contain more perceptually important information (higher signal-to-noise ratio, SNR). This estimate is used to attenuate noise-dominated and retain speech-dominated CI channels for electrical stimulation, as in traditional n-of-m CI coding strategies. The proposed algorithm was evaluated by measuring the speech-in-noise performance of 14 CI users using three types of background noise. Two NNSE algorithms were compared: a speaker-dependent algorithm, which was trained on the target speaker used for testing, and a speaker-independent algorithm, which was trained on different speakers. Significant improvements in the intelligibility of speech in stationary and fluctuating noises were found relative to the unprocessed condition for the speaker-dependent algorithm in all noise types and for the speaker-independent algorithm in 2 out of 3 noise types. The NNSE algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of SNRs. The proposed algorithm has the potential to improve the intelligibility of speech in noise for CI users while meeting the requirements of low computational complexity and processing delay for application in CI devices. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
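
    A sketch of the n-of-m channel-selection step the abstract describes, assuming the per-channel SNR estimates already come from the trained network; the array shapes, n value, and function name are illustrative:

        import numpy as np

        def n_of_m_selection(envelopes, snr_estimates, n=8):
            """Retain the n channels the network judges most speech-dominated
            and zero the rest before electrical stimulation.
            envelopes: (m, frames) channel envelopes; snr_estimates: (m,) network output."""
            keep = np.argsort(snr_estimates)[-n:]    # indices of the n best channels
            mask = np.zeros_like(snr_estimates)
            mask[keep] = 1.0
            return envelopes * mask[:, None]         # gate the channel envelopes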

  4. Revisiting the role of language in spatial cognition: Categorical perception of spatial relations in English and Korean speakers.

    PubMed

    Holmes, Kevin J; Moty, Kelsey; Regier, Terry

    2017-12-01

    The spatial relation of support has been regarded as universally privileged in nonlinguistic cognition and immune to the influence of language. English, but not Korean, obligatorily distinguishes support from nonsupport via basic spatial terms. Despite this linguistic difference, previous research suggests that English and Korean speakers show comparable nonlinguistic sensitivity to the support/nonsupport distinction. Here, using a paradigm previously found to elicit cross-language differences in color discrimination, we provide evidence for a difference in sensitivity to support/nonsupport between native English speakers and native Korean speakers who were late English learners and tested in a context that privileged Korean. Whereas the former group showed categorical perception (CP) when discriminating spatial scenes capturing the support/nonsupport distinction, the latter did not. An additional group of native Korean speakers-relatively early English learners tested in an English-salient context-patterned with the native English speakers in showing CP for support/nonsupport. These findings suggest that obligatory marking of support/nonsupport in one's native language can affect nonlinguistic sensitivity to this distinction, contra earlier findings, but that such sensitivity may also depend on aspects of language background and the immediate linguistic context.

  5. Teaching the Order of Adjectives in the English Noun Phrase.

    ERIC Educational Resources Information Center

    Ney, James W.

    A number of studies on the order of adjectives in the English noun phrase are reviewed. Analysis of the studies and examples used in them indicates that almost any order of adjective seems to be possible depending on the intended meaning of the speaker or the situation in which the speaker frames an utterance. To see if in fact the ordering of…

  6. Gender Differences in the Recognition of Vocal Emotions

    PubMed Central

    Lausen, Adi; Schacht, Annekathrin

    2018-01-01

    The conflicting findings from the few studies conducted with regard to gender differences in the recognition of vocal expressions of emotion have left the exact nature of these differences unclear. Several investigators have argued that a comprehensive understanding of gender differences in vocal emotion recognition can only be achieved by replicating these studies while accounting for influential factors such as stimulus type, gender-balanced samples, number of encoders, decoders, and emotional categories. This study aimed to account for these factors by investigating whether emotion recognition from vocal expressions differs as a function of both listeners' and speakers' gender. A total of N = 290 participants were randomly and equally allocated to two groups. One group listened to words and pseudo-words, while the other group listened to sentences and affect bursts. Participants were asked to categorize the stimuli with respect to the expressed emotions in a fixed-choice response format. Overall, females were more accurate than males when decoding vocal emotions; however, when testing for specific emotions, these differences were small in magnitude. Speakers' gender had a significant impact on how listeners judged emotions from the voice. The group listening to words and pseudo-words had higher identification rates for emotions spoken by male than by female actors, whereas in the group listening to sentences and affect bursts the identification rates were higher when emotions were uttered by female than male actors. The mixed pattern for emotion-specific effects, however, indicates that, in the vocal channel, the reliability of emotion judgments is not systematically influenced by speakers' gender and the related stereotypes of emotional expressivity. Together, these results extend previous findings by showing effects of listeners' and speakers' gender on the recognition of vocal emotions. They stress the importance of distinguishing these factors to explain recognition ability in the processing of emotional prosody. PMID:29922202

  7. Hearing history influences voice gender perceptual performance in cochlear implant users.

    PubMed

    Kovačić, Damir; Balaban, Evan

    2010-12-01

    The study was carried out to assess the role that five hearing history variables (chronological age, age at onset of deafness, age of first cochlear implant [CI] activation, duration of CI use, and duration of known deafness) play in the ability of CI users to identify speaker gender. Forty-one juvenile CI users participated in two voice gender identification tasks. In a fixed, single-interval task, subjects listened to a single speech item from one of 20 adult male or 20 adult female speakers and had to identify speaker gender. In an adaptive speech-based voice gender discrimination task with the fundamental frequency difference between the voices as the adaptive parameter, subjects listened to a pair of speech items presented in sequential order, one of which was always spoken by an adult female and the other by an adult male. Subjects had to identify the speech item spoken by the female voice. Correlation and regression analyses between perceptual scores in the two tasks and the hearing history variables were performed. Subjects fell into three performance groups: (1) those who could distinguish voice gender in both tasks, (2) those who could distinguish voice gender in the adaptive but not the fixed task, and (3) those who could not distinguish voice gender in either task. Gender identification performance for single voices in the fixed task was significantly and negatively related to the duration of deafness before cochlear implantation (shorter deafness yielded better performance), whereas performance in the adaptive task was weakly but significantly related to age at first activation of the CI device, with earlier activations yielding better scores. The existence of a group of subjects able to perform adaptive discrimination but unable to identify the gender of singly presented voices demonstrates the potential dissociability of the skills required for these two tasks, suggesting that duration of deafness and age of cochlear implantation could have dissociable effects on the development of different skills required by CI users to identify speaker gender.

  8. A fundamental residue pitch perception bias for tone language speakers

    NASA Astrophysics Data System (ADS)

    Petitti, Elizabeth

    A complex tone composed of only higher-order harmonics typically elicits a pitch percept equivalent to the tone's missing fundamental frequency (f0). When judging the direction of residue pitch change between two such tones, however, listeners may have completely opposite perceptual experiences depending on whether they are biased to perceive changes based on the overall spectrum or the missing f0 (harmonic spacing). Individual differences in residue pitch change judgments are reliable and have been associated with musical experience and functional neuroanatomy. Tone languages put greater pitch processing demands on their speakers than non-tone languages, and we investigated whether these lifelong differences in linguistic pitch processing affect listeners' bias for residue pitch. We asked native tone language speakers and native English speakers to perform a pitch judgment task for two tones with missing fundamental frequencies. Given tone pairs with ambiguous pitch changes, listeners were asked to judge the direction of pitch change, where the direction of their response indicated whether they attended to the overall spectrum (exhibiting a spectral bias) or the missing f0 (exhibiting a fundamental bias). We found that tone language speakers are significantly more likely to perceive pitch changes based on the missing f0 than English speakers. These results suggest that tone-language speakers' privileged experience with linguistic pitch fundamentally tunes their basic auditory processing.
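
    A short numpy sketch shows how such a stimulus is typically constructed: only higher-order harmonics of a fundamental are summed, yet listeners generally report a residue pitch near the missing f0. The f0 value, harmonic range, and durations are hypothetical, not the study's stimulus parameters.

        import numpy as np

        fs = 44100                                  # sample rate (Hz)
        t = np.arange(int(0.5 * fs)) / fs           # 0.5 s time axis
        f0 = 200.0                                  # missing fundamental (hypothetical)
        harmonics = range(6, 11)                    # higher-order harmonics only (6th-10th)
        tone = sum(np.sin(2 * np.pi * k * f0 * t) for k in harmonics)
        tone /= np.abs(tone).max()                  # normalize; percept is ~200 Hz residue pitch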

  9. The Language of Persuasion, English, Vocabulary: 5114.68.

    ERIC Educational Resources Information Center

    Groff, Irvin

    Developed for a high school quinmester unit on the language of persuasion, this guide provides the teacher with teaching strategies for a study of the speaker or writer as a persuader, the identification of the logical and psychological tools of persuasion, an examination of the levels of abstraction, the techniques of propaganda, and the…

  10. Perception of Melodic Contour and Intonation in Autism Spectrum Disorder: Evidence from Mandarin Speakers

    ERIC Educational Resources Information Center

    Jiang, Jun; Liu, Fang; Wan, Xuan; Jiang, Cunmei

    2015-01-01

    Tone language experience benefits pitch processing in music and speech for typically developing individuals. No known studies have examined pitch processing in individuals with autism who speak a tone language. This study investigated discrimination and identification of melodic contour and speech intonation in a group of Mandarin-speaking…

  11. A Cross-Language Study of Perception of Lexical Stress in English

    ERIC Educational Resources Information Center

    Yu, Vickie Y.; Andruski, Jean E.

    2010-01-01

    This study investigates the question of whether language background affects the perception of lexical stress in English. Thirty native English speakers and 30 native Chinese learners of English participated in a stressed-syllable identification task and a discrimination task involving three types of stimuli (real words/pseudowords/hums). The…

  12. Processing of Acoustic Cues in Lexical-Tone Identification by Pediatric Cochlear-Implant Recipients

    ERIC Educational Resources Information Center

    Peng, Shu-Chen; Lu, Hui-Ping; Lu, Nelson; Lin, Yung-Song; Deroche, Mickael L. D.; Chatterjee, Monita

    2017-01-01

    Purpose: The objective was to investigate acoustic cue processing in lexical-tone recognition by pediatric cochlear-implant (CI) recipients who are native Mandarin speakers. Method: Lexical-tone recognition was assessed in pediatric CI recipients and listeners with normal hearing (NH) in 2 tasks. In Task 1, participants identified naturally…

  13. Acquisition of L2 Vowel Duration in Japanese by Native English Speakers

    ERIC Educational Resources Information Center

    Okuno, Tomoko

    2013-01-01

    Research has demonstrated that focused perceptual training facilitates L2 learners' segmental perception and spoken word identification. Hardison (2003) and Motohashi-Saigo and Hardison (2009) found benefits of visual cues in the training for acquisition of L2 contrasts. The present study examined factors affecting perception and production…

  14. Electrophysiology of subject-verb agreement mediated by speakers' gender.

    PubMed

    Hanulíková, Adriana; Carreiras, Manuel

    2015-01-01

    An important property of speech is that it explicitly conveys features of a speaker's identity such as age or gender. This event-related potential (ERP) study examined the effects of social information provided by a speaker's gender, i.e., the conceptual representation of gender, on subject-verb agreement. Despite numerous studies on agreement, little is known about syntactic computations generated by speaker characteristics extracted from the acoustic signal. Slovak is well suited to investigate this issue because it is a morphologically rich language in which agreement involves features for number, case, and gender. Grammaticality of a sentence can be evaluated by checking a speaker's gender as conveyed by his/her voice. We examined how conceptual information about speaker gender, which is not syntactic but rather social and pragmatic in nature, is interpreted for the computation of agreement patterns. ERP responses to verbs disagreeing with the speaker's gender (e.g., a sentence including a masculine verbal inflection spoken by a female person 'the neighbors were upset because I (∗)stoleMASC plums') elicited a larger early posterior negativity compared to correct sentences. When the agreement was purely syntactic and did not depend on the speaker's gender, a disagreement between a formally marked subject and the verb inflection (e.g., the womanFEM (∗)stoleMASC plums) resulted in a larger P600 preceded by a larger anterior negativity compared to the control sentences. This result is in line with proposals according to which the recruitment of non-syntactic information such as the gender of the speaker results in N400-like effects, while formally marked syntactic features lead to structural integration as reflected in a LAN/P600 complex.

  15. Assimilation and accommodation patterns in ventral occipitotemporal cortex in learning a second writing system

    PubMed Central

    Nelson, Jessica R.; Liu, Ying; Fiez, Julie; Perfetti, Charles A.

    2017-01-01

    Using fMRI, we compared the patterns of fusiform activity produced by viewing English and Chinese for readers who were either English speakers learning Chinese, or Chinese-English bilinguals. The pattern of fusiform activity depended on both the writing system and the reader’s native language. Native Chinese speakers fluent in English recruited bilateral fusiform areas when viewing both Chinese and English. English speakers learning Chinese, however, used heavily left-lateralized fusiform regions when viewing English, but recruited an additional right fusiform region for viewing Chinese. Thus, English learners of Chinese show an accommodation pattern, in which the reading network accommodates the new writing system by adding neural resources that support its specific graphic requirements. Chinese speakers show an assimilation pattern, in which the reading network established for L1 includes procedures sufficient for the graphic demands of L2 without major change. PMID:18381767

  16. Coupled Electro-Magneto-Mechanical-Acoustic Analysis Method Developed by Using 2D Finite Element Method for Flat Panel Speaker Driven by Magnetostrictive-Material-Based Actuator

    NASA Astrophysics Data System (ADS)

    Yoo, Byungjin; Hirata, Katsuhiro; Oonishi, Atsurou

    In this study, a coupled analysis method for flat panel speakers driven by a giant magnetostrictive material (GMM) based actuator was developed. The sound field produced by such a speaker depends on the vibration of the flat panel, which results from the magnetostrictive property of the GMM. To predict the sound pressure level (SPL) in the audio-frequency range, it is therefore necessary to take into account not only the magnetostriction of the GMM but also the effects of eddy currents and the vibration characteristics of the actuator and the flat panel. In this paper, a coupled electromagnetic-structural-acoustic analysis method developed using the finite element method (FEM) is presented and used to predict the performance of a flat panel speaker in the audio-frequency range. The validity of the method is verified by comparison with measurements of a prototype speaker.

  17. Robust Speaker Authentication Based on Combined Speech and Voiceprint Recognition

    NASA Astrophysics Data System (ADS)

    Malcangi, Mario

    2009-08-01

    Personal authentication is becoming increasingly important in applications that must protect proprietary data. Passwords and personal identification numbers (PINs) have proved insufficient to prevent use by unauthorized people. Biometric authentication technology may offer a secure, convenient and accurate solution, but it sometimes fails because of its intrinsically fuzzy nature. This research aims to demonstrate that combining two basic speech processing methods, voiceprint identification and speech recognition, can provide a very high degree of robustness, especially if fuzzy decision logic is used.
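
    As a toy illustration of the fusion idea, the following sketch combines a voiceprint score and a spoken-phrase recognition score with a fuzzy AND rather than a hard conjunction. This is an assumption-laden sketch, not the paper's system; the membership shape and acceptance threshold are invented for illustration.

      def fuzzy_accept(voice_score, phrase_score):
          # Scores assumed scaled to [0, 1]; returns a graded decision instead
          # of a hard AND of two crisp thresholds
          high = lambda s: max(0.0, min(1.0, (s - 0.5) / 0.4))  # ramp membership
          degree = min(high(voice_score), high(phrase_score))   # fuzzy AND
          return degree >= 0.5, degree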

  18. Reaching Spanish-speaking smokers online: a 10-year worldwide research program

    PubMed Central

    Muñoz, Ricardo Felipe; Chen, Ken; Bunge, Eduardo Liniers; Bravin, Julia Isabela; Shaughnessy, Elizabeth Annelly; Pérez-Stable, Eliseo Joaquín

    2014-01-01

    Objective To describe a 10-year proof-of-concept smoking cessation research program evaluating the reach of online health interventions throughout the Americas. Methods Recruitment occurred from 2002–2011, primarily using Google.com AdWords. Over 6 million smokers from the Americas entered keywords related to smoking cessation; 57 882 smokers (15 912 English speakers and 41 970 Spanish speakers) were recruited into online self-help automated intervention studies. To examine disparities in utilization of methods to quit smoking, cessation aids used by English speakers and Spanish speakers were compared. To determine whether online interventions reduce disparities, abstinence rates were also compared. Finally, the reach of the intervention was illustrated for three large Spanish-speaking countries of the Americas—Argentina, Mexico, and Peru—and the United States of America. Results Few participants had utilized other methods to stop smoking before coming to the Internet site; most reported using no previous smoking cessation aids: 69.2% of Spanish speakers versus 51.8% of English speakers (P < 0.01). The most used method was nicotine gum, 13.9%. Nicotine dependence levels were similar to those reported for in-person smoking cessation trials. Overall observed quit rate for English speakers was 38.1% and for Spanish speakers, 37.0%; quit rates in which participants with missing data were considered to be smoking were 11.1% and 10.6%, respectively. Neither comparison was significantly different. Conclusions The systematic use of evidence-based Internet interventions for health problems could have a broad impact throughout the Americas, at little or no cost to individuals or to ministries of health. PMID:25211569

  19. Attentional influences on functional mapping of speech sounds in human auditory cortex.

    PubMed

    Obleser, Jonas; Elbert, Thomas; Eulitz, Carsten

    2004-07-21

    The speech signal contains both information about phonological features such as place of articulation and non-phonological features such as speaker identity. These are different aspects of the 'what'-processing stream (speaker vs. speech content), and here we show that they can be further segregated as they may occur in parallel but within different neural substrates. Subjects listened to two different vowels, each spoken by two different speakers. During one block, they were asked to identify a given vowel irrespective of the speaker (phonological categorization), while during the other block the speaker had to be identified irrespective of the vowel (speaker categorization). Auditory evoked fields were recorded using 148-channel magnetoencephalography (MEG), and magnetic source imaging was obtained for 17 subjects. During phonological categorization, a vowel-dependent difference of N100m source location perpendicular to the main tonotopic gradient replicated previous findings. In speaker categorization, the relative mapping of vowels remained unchanged but sources were shifted towards more posterior and more superior locations. These results imply that the N100m reflects the extraction of abstract invariants from the speech signal. This part of the processing is accomplished in auditory areas anterior to AI, which are part of the auditory 'what' system. This network seems to include spatially separable modules for identifying the phonological information and for associating it with a particular speaker that are activated in synchrony but within different regions, suggesting that the 'what' processing can be more adequately modeled by a stream of parallel stages. The relative activation of the parallel processing stages can be modulated by attentional or task demands.

  20. Event identification by acoustic signature recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dress, W.B.; Kercel, S.W.

    1995-07-01

    Many events of interest to the security community produce acoustic emissions that are, in principle, identifiable as to cause. Some obvious examples are gunshots, breaking glass, takeoffs and landings of small aircraft, vehicular engine noises, footsteps (high frequencies when on gravel, very low frequencies when on soil), and voices (whispers to shouts). We are investigating wavelet-based methods to extract unique features of such events for classification and identification. We also discuss methods of classification and pattern recognition specifically tailored for acoustic signatures obtained by wavelet analysis. The paper is divided into three parts: completed work, work in progress, and future applications. The completed phase has led to the successful recognition of aircraft types on landing and takeoff. Both small aircraft (twin-engine turboprop) and large (commercial airliners) were included in the study. The project considered the design of a small, field-deployable, inexpensive device. The techniques developed during the aircraft identification phase were then adapted to a multispectral electromagnetic interference monitoring device now deployed in a nuclear power plant. This is a general-purpose wavelet analysis engine, spanning 14 octaves, and can be adapted for other specific tasks. Work in progress is focused on applying the methods previously developed to speaker identification. Some of the problems to be overcome include recognition of sounds as voice patterns and as distinct from possible background noises (e.g., music), as well as identification of the speaker from a short-duration voice sample. A generalization of the completed work and the work in progress is a device capable of classifying any number of acoustic events, particularly quasi-stationary events such as engine noises and voices and singular events such as gunshots and breaking glass. We will show examples of both kinds of events and discuss their recognition likelihood.
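
    A minimal sketch of wavelet sub-band energy features of the kind described, assuming the PyWavelets package; the wavelet family, decomposition depth and normalization are illustrative choices, not the report's design.

      import numpy as np
      import pywt

      def wavelet_signature(signal, wavelet="db4", levels=14):
          # Log-energy per sub-band of a multi-level wavelet decomposition;
          # 14 levels mirrors the 14-octave engine mentioned above and needs a
          # correspondingly long input frame
          coeffs = pywt.wavedec(signal, wavelet, level=levels)
          energies = np.array([np.sum(c ** 2) for c in coeffs])
          return np.log(energies / energies.sum() + 1e-12)

      # Signatures of known events (gunshot, breaking glass, engine, ...) can then
      # be compared with an incoming frame by nearest-neighbour distance.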

  1. Comparing headphone and speaker effects on simulated driving.

    PubMed

    Nelson, T M; Nilsson, T H

    1990-12-01

    Twelve persons drove for three hours in an automobile simulator while listening to music at a sound level of 63 dB over stereo headphones during one session and from a dashboard speaker during another session. They were required to steer a mountain highway, maintain a certain indicated speed, shift gears, and respond to occasional hazards. Steering and speed control were dependent on visual cues. The need to shift and the hazards were indicated by sound and vibration effects. With the headphones, the driver's average reaction time for the most complex task presented--shifting gears--was about one-third of a second longer than with the speaker. The use of headphones did not delay the development of subjective fatigue.

  2. Contributing to the early detection of Rett syndrome: the potential role of auditory Gestalt perception.

    PubMed

    Marschik, Peter B; Einspieler, Christa; Sigafoos, Jeff

    2012-01-01

    To assess whether there are qualitatively deviant characteristics in the early vocalizations of children with Rett syndrome, we had 400 native Austrian-German speakers listen to audio recordings of vocalizations from typically developing girls and girls with Rett syndrome. The audio recordings were rated as (a) inconspicuous, (b) conspicuous or (c) not able to decide between (a) and (b). The results showed that participants were accurate in differentiating the vocalizations of typically developing children compared to children with Rett syndrome. However, the accuracy for rating verbal behaviors was dependent on the type of vocalization with greater accuracy for canonical babbling compared to cooing vocalizations. The results suggest a potential role for the use of rating child vocalizations for early detection of Rett syndrome. This is important because clinical criteria related to speech and language development remain important for early identification of Rett syndrome.

  3. Individual aptitude in Mandarin lexical tone perception predicts effectiveness of high-variability training

    PubMed Central

    Sadakata, Makiko; McQueen, James M.

    2014-01-01

    Although the high-variability training method can enhance learning of non-native speech categories, this can depend on individuals’ aptitude. The current study asked how general the effects of perceptual aptitude are by testing whether they occur with training materials spoken by native speakers and whether they depend on the nature of the to-be-learned material. Forty-five native Dutch listeners took part in a 5-day training procedure in which they identified bisyllabic Mandarin pseudowords (e.g., asa) pronounced with different lexical tone combinations. The training materials were presented to different groups of listeners at three levels of variability: low (many repetitions of a limited set of words recorded by a single speaker), medium (fewer repetitions of a more variable set of words recorded by three speakers), and high (similar to medium but with five speakers). Overall, variability did not influence learning performance, but this was due to an interaction with individuals’ perceptual aptitude: increasing variability hindered improvements in performance for low-aptitude perceivers while it helped improvements in performance for high-aptitude perceivers. These results show that the previously observed interaction between individuals’ aptitude and effects of degree of variability extends to natural tokens of Mandarin speech. This interaction was not found, however, in a closely matched study in which native Dutch listeners were trained on the Japanese geminate/singleton consonant contrast. This may indicate that the effectiveness of high-variability training depends not only on individuals’ aptitude in speech perception but also on the nature of the categories being acquired. PMID:25505434

  4. Performance of wavelet analysis and neural networks for pathological voices identification

    NASA Astrophysics Data System (ADS)

    Salhi, Lotfi; Talbi, Mourad; Abid, Sabeur; Cherif, Adnane

    2011-09-01

    Within the medical environment, diverse techniques exist to assess the state of a patient's voice. The inspection technique is inconvenient for a number of reasons, such as its high cost, the duration of the inspection, and above all, the fact that it is an invasive technique. This study focuses on a robust, rapid and accurate system for automatic identification of pathological voices. The system employs a non-invasive, inexpensive and fully automated method based on a hybrid approach: wavelet transform analysis and a neural network classifier. First, we present the results obtained in our previous study using classic feature parameters; these results allow visual identification of pathological voices. Second, quantified parameters derived from the wavelet analysis are proposed to characterise the speech sample. In addition, a system of multilayer neural networks (MNNs) has been developed which carries out the automatic detection of pathological voices. The developed method was evaluated using a voice database composed of recorded voice samples (continuous speech) from normophonic or dysphonic speakers. The dysphonic speakers were patients of the National Hospital 'RABTA' of Tunis, Tunisia, and a University Hospital in Brussels, Belgium. Experimental results indicate a success rate ranging between 75% and 98.61% for discrimination of normal and pathological voices using the proposed parameters and neural network classifier. We also compared the average classification rate based on the MNN, a Gaussian mixture model and support vector machines.
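
    The classifier stage might look like the following sketch, with sklearn's MLPClassifier standing in for the paper's MNN; the feature dimension, layer sizes and placeholder data are assumptions, not the study's values.

      import numpy as np
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPClassifier

      # X: one row of wavelet-derived parameters per voice sample;
      # y: 0 = normophonic, 1 = dysphonic (placeholder random data here)
      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 15))
      y = rng.integers(0, 2, size=200)

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      mnn = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=2000, random_state=0)
      mnn.fit(X_tr, y_tr)
      print("normal/pathological accuracy:", mnn.score(X_te, y_te))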

  5. Perception of Non-Native Consonant Length Contrast: The Role of Attention in Phonetic Processing

    ERIC Educational Resources Information Center

    Porretta, Vincent J.; Tucker, Benjamin V.

    2015-01-01

    The present investigation examines English speakers' ability to identify and discriminate non-native consonant length contrast. Three groups (L1 English No-Instruction, L1 English Instruction, and L1 Finnish control) performed a speeded forced-choice identification task and a speeded AX discrimination task on Finnish non-words (e.g.…

  6. The Effect of Pitch Peak Alignment on Sentence Type Identification in Russian

    ERIC Educational Resources Information Center

    Makarova, Veronika

    2007-01-01

    This paper reports the results of an experimental phonetic study examining pitch peak alignment in production and perception of three-syllable one-word sentences with phonetic rising-falling pitch movement by speakers of Russian. The first part of the study (Experiment 1) utilizes 22 one-word three-syllable utterances read by five female speakers…

  7. Differential Recognition of Pitch Patterns in Discrete and Gliding Stimuli in Congenital Amusia: Evidence from Mandarin Speakers

    ERIC Educational Resources Information Center

    Liu, Fang; Xu, Yi; Patel, Aniruddh D.; Francart, Tom; Jiang, Cunmei

    2012-01-01

    This study examined whether "melodic contour deafness" (insensitivity to the direction of pitch movement) in congenital amusia is associated with specific types of pitch patterns (discrete versus gliding pitches) or stimulus types (speech syllables versus complex tones). Thresholds for identification of pitch direction were obtained using discrete…

  8. Automatic Method of Pause Measurement for Normal and Dysarthric Speech

    ERIC Educational Resources Information Center

    Rosen, Kristin; Murdoch, Bruce; Folker, Joanne; Vogel, Adam; Cahill, Louise; Delatycki, Martin; Corben, Louise

    2010-01-01

    This study proposes an automatic method for the detection of pauses and identification of pause types in conversational speech for the purpose of measuring the effects of Friedreich's Ataxia (FRDA) on speech. Speech samples of [approximately] 3 minutes were recorded from 13 speakers with FRDA and 18 healthy controls. Pauses were measured from the…

  9. Effects of Audio-Visual Integration on the Detection of Masked Speech and Non-Speech Sounds

    ERIC Educational Resources Information Center

    Eramudugolla, Ranmalee; Henderson, Rachel; Mattingley, Jason B.

    2011-01-01

    Integration of simultaneous auditory and visual information about an event can enhance our ability to detect that event. This is particularly evident in the perception of speech, where the articulatory gestures of the speaker's lips and face can significantly improve the listener's detection and identification of the message, especially when that…

  10. Response Identification in the Extremely Low Frequency Region of an Electret Condenser Microphone

    PubMed Central

    Jeng, Yih-Nen; Yang, Tzung-Ming; Lee, Shang-Yin

    2011-01-01

    This study shows that a small electret condenser microphone connected to a notebook or a personal computer (PC) has a prominent response in the extremely low frequency region in a specific environment. It confines most acoustic waves within a tiny air cell as follows. The air cell is constructed by drilling a small hole in a digital versatile disk (DVD) plate. A small speaker and an electret condenser microphone are attached to the two sides of the hole. Thus, the acoustic energy emitted by the speaker and reaching the microphone is strong enough to actuate the diaphragm of the latter. The experiments showed that, once small air leakages are allowed on the margin of the speaker, the microphone captured the signal in the range of 0.5 to 20 Hz. Moreover, by removing the plastic cover of the microphone and attaching the microphone head to the vibration surface, the low frequency signal can be effectively captured too. Two examples are included to show the convenience of applying the microphone to pick up the low frequency vibration information of practical systems. PMID:22346594

  11. Response identification in the extremely low frequency region of an electret condenser microphone.

    PubMed

    Jeng, Yih-Nen; Yang, Tzung-Ming; Lee, Shang-Yin

    2011-01-01

    This study shows that a small electret condenser microphone connected to a notebook or a personal computer (PC) has a prominent response in the extremely low frequency region in a specific environment. It confines most acoustic waves within a tiny air cell as follows. The air cell is constructed by drilling a small hole in a digital versatile disk (DVD) plate. A small speaker and an electret condenser microphone are attached to the two sides of the hole. Thus, the acoustic energy emitted by the speaker and reaching the microphone is strong enough to actuate the diaphragm of the latter. The experiments showed that, once small air leakages are allowed on the margin of the speaker, the microphone captured the signal in the range of 0.5 to 20 Hz. Moreover, by removing the plastic cover of the microphone and attaching the microphone head to the vibration surface, the low frequency signal can be effectively captured too. Two examples are included to show the convenience of applying the microphone to pick up the low frequency vibration information of practical systems.

  12. Attentional influences on functional mapping of speech sounds in human auditory cortex

    PubMed Central

    Obleser, Jonas; Elbert, Thomas; Eulitz, Carsten

    2004-01-01

    Background The speech signal contains both information about phonological features such as place of articulation and non-phonological features such as speaker identity. These are different aspects of the 'what'-processing stream (speaker vs. speech content), and here we show that they can be further segregated as they may occur in parallel but within different neural substrates. Subjects listened to two different vowels, each spoken by two different speakers. During one block, they were asked to identify a given vowel irrespective of the speaker (phonological categorization), while during the other block the speaker had to be identified irrespective of the vowel (speaker categorization). Auditory evoked fields were recorded using 148-channel magnetoencephalography (MEG), and magnetic source imaging was obtained for 17 subjects. Results During phonological categorization, a vowel-dependent difference of N100m source location perpendicular to the main tonotopic gradient replicated previous findings. In speaker categorization, the relative mapping of vowels remained unchanged but sources were shifted towards more posterior and more superior locations. Conclusions These results imply that the N100m reflects the extraction of abstract invariants from the speech signal. This part of the processing is accomplished in auditory areas anterior to AI, which are part of the auditory 'what' system. This network seems to include spatially separable modules for identifying the phonological information and for associating it with a particular speaker that are activated in synchrony but within different regions, suggesting that the 'what' processing can be more adequately modeled by a stream of parallel stages. The relative activation of the parallel processing stages can be modulated by attentional or task demands. PMID:15268765

  13. The Wildcat Corpus of Native- and Foreign-Accented English: Communicative Efficiency across Conversational Dyads with Varying Language Alignment Profiles

    PubMed Central

    Van Engen, Kristin J.; Baese-Berk, Melissa; Baker, Rachel E.; Choi, Arim; Kim, Midam; Bradlow, Ann R.

    2012-01-01

    This paper describes the development of the Wildcat Corpus of native- and foreign-accented English, a corpus containing scripted and spontaneous speech recordings from 24 native speakers of American English and 52 non-native speakers of English. The core element of this corpus is a set of spontaneous speech recordings, for which a new method of eliciting dialogue-based, laboratory-quality speech recordings was developed (the Diapix task). Dialogues between two native speakers of English, between two non-native speakers of English (with either shared or different L1s), and between one native and one non-native speaker of English are included and analyzed in terms of general measures of communicative efficiency. The overall finding was that pairs of native talkers were most efficient, followed by mixed native/non-native pairs and non-native pairs with shared L1. Non-native pairs with different L1s were least efficient. These results support the hypothesis that successful speech communication depends both on the alignment of talkers to the target language and on the alignment of talkers to one another in terms of native language background. PMID:21313992

  14. Articulatory settings of French-English bilingual speakers

    NASA Astrophysics Data System (ADS)

    Wilson, Ian

    2005-04-01

    The idea of a language-specific articulatory setting (AS), an underlying posture of the articulators during speech, has existed for centuries [Laver, Historiogr. Ling. 5 (1978)], but until recently it had eluded direct measurement. In an analysis of x-ray movies of French and English monolingual speakers, Gick et al. [Phonetica (in press)] link AS to inter-speech posture, allowing measurement of AS without interference from segmental targets during speech, and they give quantitative evidence showing AS to be language-specific. In the present study, ultrasound and Optotrak are used to investigate whether bilingual English-French speakers have two ASs, and whether this varies depending on the mode (monolingual or bilingual) these speakers are in. Specifically, for inter-speech posture of the lips, lip aperture and protrusion are measured using Optotrak. For inter-speech posture of the tongue, tongue root retraction, tongue body and tongue tip height are measured using optically-corrected ultrasound. Segmental context is balanced across the two languages ensuring that the sets of sounds before and after an inter-speech posture are consistent across languages. By testing bilingual speakers, vocal tract morphology across languages is controlled for. Results have implications for L2 acquisition, specifically the teaching and acquisition of pronunciation.

  15. On compensation of mismatched recording conditions in the Bayesian approach for forensic automatic speaker recognition.

    PubMed

    Botti, F; Alexander, A; Drygajlo, A

    2004-12-02

    This paper deals with a procedure to compensate for mismatched recording conditions in forensic speaker recognition, using a statistical score normalization. Bayesian interpretation of the evidence in forensic automatic speaker recognition depends on three sets of recordings in order to perform forensic casework: reference (R) and control (C) recordings of the suspect, and a potential population database (P), as well as a questioned recording (QR). The requirement of similar recording conditions between the suspect control database (C) and the questioned recording (QR) is often not satisfied in real forensic cases. The aim of this paper is to investigate a score normalization procedure, based on an adaptation of the Test-normalization (T-norm) [2] technique used in the speaker verification domain, to compensate for the mismatch. The Polyphone IPSC-02 database and ASPIC (an automatic speaker recognition system developed by EPFL and IPS-UNIL in Lausanne, Switzerland) were used to test the normalization procedure. Experimental results for three different recording condition scenarios are presented using Tippett plots, and the effect of the compensation on the evaluation of the strength of the evidence is discussed.
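
    A minimal sketch of the underlying T-norm computation, following the usual definition from the speaker verification literature; variable names are illustrative, and the paper's specific adaptation of the technique is not reproduced here.

      import numpy as np

      def t_norm(raw_score, cohort_scores):
          # Normalize the questioned-recording score against the suspect model by
          # the score distribution of the same recording against a cohort of
          # other-speaker models: (s - mu_cohort) / sigma_cohort
          mu = np.mean(cohort_scores)
          sigma = np.std(cohort_scores)
          return (raw_score - mu) / sigma

      # Under mismatch, mu and sigma absorb much of the recording-condition
      # effect, making scores comparable across conditions.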

  16. Discourse Factors Influencing Spatial Descriptions in English and German

    NASA Astrophysics Data System (ADS)

    Vorwerg, Constanze; Tenbrink, Thora

    The ways in which objects are referred to by using spatial language depend on many factors, including the spatial configuration and the discourse context. We present the results of a web experiment in which speakers were asked either to describe where a specified item was located in a picture containing several items, or which item was specified. Furthermore, conditions differed as to whether the first six configurations were specifically simple or specifically complex. Results show that speakers' spatial descriptions are more detailed if the question is where rather than which, mirroring the fact that contrasting the target item with the others in which tasks does not always require as detailed a spatial description as where tasks do. Furthermore, speakers are influenced by the complexity of initial configurations in intricate ways: on the one hand, individual speakers tend to self-align with their earlier linguistic strategies; on the other hand, a contrast effect was also identified with respect to the usage of combined projective terms.

  17. The Mechanism of Speech Processing in Congenital Amusia: Evidence from Mandarin Speakers

    PubMed Central

    Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

    2012-01-01

    Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results. PMID:22347374

  18. The mechanism of speech processing in congenital amusia: evidence from Mandarin speakers.

    PubMed

    Liu, Fang; Jiang, Cunmei; Thompson, William Forde; Xu, Yi; Yang, Yufang; Stewart, Lauren

    2012-01-01

    Congenital amusia is a neuro-developmental disorder of pitch perception that causes severe problems with music processing but only subtle difficulties in speech processing. This study investigated speech processing in a group of Mandarin speakers with congenital amusia. Thirteen Mandarin amusics and thirteen matched controls participated in a set of tone and intonation perception tasks and two pitch threshold tasks. Compared with controls, amusics showed impaired performance on word discrimination in natural speech and their gliding tone analogs. They also performed worse than controls on discriminating gliding tone sequences derived from statements and questions, and showed elevated thresholds for pitch change detection and pitch direction discrimination. However, they performed as well as controls on word identification, and on statement-question identification and discrimination in natural speech. Overall, tasks that involved multiple acoustic cues to communicative meaning were not impacted by amusia. Only when the tasks relied mainly on pitch sensitivity did amusics show impaired performance compared to controls. These findings help explain why amusia only affects speech processing in subtle ways. Further studies on a larger sample of Mandarin amusics and on amusics of other language backgrounds are needed to consolidate these results.

  19. Perceptions of Patient-Provider Communication in Breast and Cervical Cancer-Related Care: A Qualitative Study of Low-Income English- and Spanish-Speaking Women

    PubMed Central

    Simon, Melissa A.; Ragas, Daiva M.; Nonzee, Narissa J.; Phisuthikul, Ava M.; Luu, Thanh Ha; Dong, XinQi

    2013-01-01

    To explore patient perceptions of patient-provider communication in breast and cervical cancer-related care among low-income English- and Spanish- speaking women, we examined communication barriers and facilitators reported by patients receiving care at safety net clinics. Participants were interviewed in English or Spanish after receiving an abnormal breast or cervical cancer screening test or cancer diagnosis. Following an inductive approach, interviews were coded and analyzed by the language spoken with providers and patient-provider language concordance status. Of 78 participants, 53% (n = 41) were English-speakers and 47% (n = 37) were Spanish-speakers. All English-speakers were language-concordant with providers. Of Spanish-speakers, 27% (n = 10) were Spanish-concordant; 38% (n = 14) were Spanish-discordant, requiring an interpreter; and 35% (n = 13) were Spanish mixed-concordant, experiencing both types of communication throughout the care continuum. English-speakers focused on communication barriers, and difficulty understanding jargon arose as a theme. Spanish-speakers emphasized communication facilitators related to Spanish language use. Themes among all Spanish-speaking sub-groups included appreciation for language support resources and preference for Spanish-speaking providers. Mixed-concordant participants accounted for the majority of Spanish-speakers who reported communication barriers. Our data suggest that, although perception of patient-provider communication may depend on the language spoken throughout the care continuum, jargon is lost when health information is communicated in Spanish. Further, the respective consistency of language concordance or interpretation may play a role in patient perception of patient-provider communication. PMID:23553683

  20. GMM-based speaker age and gender classification in Czech and Slovak

    NASA Astrophysics Data System (ADS)

    Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

    2017-01-01

    The paper describes an experiment with using Gaussian mixture models (GMM) for automatic classification of speaker age and gender. It analyses and compares the influence of different numbers of mixtures and different types of speech features used for GMM gender/age classification. The dependence of the computational complexity on the number of mixtures used is also analysed. Finally, the GMM classification accuracy is compared with the output of conventional listening tests. The results of these objective and subjective evaluations are in correspondence.
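
    A compact sketch of the GMM classification scheme described, assuming per-frame speech features are already extracted; sklearn's GaussianMixture is used for illustration, and the number of mixtures and covariance type are assumptions of the kind the paper varies.

      from sklearn.mixture import GaussianMixture

      def train_class_gmms(frames_by_class, n_mix=16):
          # One GMM per age/gender class; frames_by_class maps a class label to
          # an (n_frames, n_features) array of per-frame speech features
          gmms = {}
          for label, frames in frames_by_class.items():
              gmms[label] = GaussianMixture(n_components=n_mix,
                                            covariance_type="diag").fit(frames)
          return gmms

      def classify(gmms, frames):
          # Pick the class whose GMM gives the highest average frame log-likelihood
          return max(gmms, key=lambda label: gmms[label].score(frames))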

  1. The Directed Megaphone: A Theater Commander’s Means to Communicate His Vision and Intent

    DTIC Science & Technology

    1993-05-01

    commander must have oratory skills--both logos and pathos, as described by Aristotle. He must have a flair for the dramatic to embellish his message...went on to say that persuasion depends on three elements: logos -- the truth and logical validity of what is being argued; ethos -- the speaker's...presents the message (pathos), and who the speaker is (ethos). Many experts in the field of communications identify ethos as the most important persuasive

  2. Degraded Vowel Acoustics and the Perceptual Consequences in Dysarthria

    NASA Astrophysics Data System (ADS)

    Lansford, Kaitlin L.

    Distorted vowel production is a hallmark characteristic of dysarthric speech, irrespective of the underlying neurological condition or dysarthria diagnosis. A variety of acoustic metrics have been used to study the nature of vowel production deficits in dysarthria; however, not all demonstrate sensitivity to the exhibited deficits. Less attention has been paid to quantifying the vowel production deficits associated with the specific dysarthrias. Attempts to characterize the relationship between naturally degraded vowel production in dysarthria with overall intelligibility have met with mixed results, leading some to question the nature of this relationship. It has been suggested that aberrant vowel acoustics may be an index of overall severity of the impairment and not an "integral component" of the intelligibility deficit. A limitation of previous work detailing perceptual consequences of disordered vowel acoustics is that overall intelligibility, not vowel identification accuracy, has been the perceptual measure of interest. A series of three experiments were conducted to address the problems outlined herein. The goals of the first experiment were to identify subsets of vowel metrics that reliably distinguish speakers with dysarthria from non-disordered speakers and differentiate the dysarthria subtypes. Vowel metrics that capture vowel centralization and reduced spectral distinctiveness among vowels differentiated dysarthric from non-disordered speakers. Vowel metrics generally failed to differentiate speakers according to their dysarthria diagnosis. The second and third experiments were conducted to evaluate the relationship between degraded vowel acoustics and the resulting percept. In the second experiment, correlation and regression analyses revealed vowel metrics that capture vowel centralization and distinctiveness and movement of the second formant frequency were most predictive of vowel identification accuracy and overall intelligibility. The third experiment was conducted to evaluate the extent to which the nature of the acoustic degradation predicts the resulting percept. Results suggest distinctive vowel tokens are better identified and, likewise, better-identified tokens are more distinctive. Further, an above-chance level agreement between nature of vowel misclassification and misidentification errors was demonstrated for all vowels, suggesting degraded vowel acoustics are not merely an index of severity in dysarthria, but rather are an integral component of the resultant intelligibility disorder.
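
    For illustration, one widely used centralization-sensitive metric of the kind evaluated here is the quadrilateral vowel space area; the sketch below computes it from mean F1/F2 of the corner vowels with the shoelace formula. The input format is an assumption, not the dissertation's implementation.

      import numpy as np

      def quad_vowel_space_area(f1f2):
          # f1f2 maps each corner vowel to its mean (F1, F2) in Hz; a reduced
          # area indicates the vowel centralization discussed above
          pts = np.array([f1f2[v] for v in ("i", "ae", "a", "u")])  # ordered corners
          x, y = pts[:, 1], pts[:, 0]  # F2 on x, F1 on y
          # Shoelace formula for the area of the ordered quadrilateral
          return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))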

  3. Hyper-active gap filling

    PubMed Central

    Omaki, Akira; Lau, Ellen F.; Davidson White, Imogen; Dakan, Myles L.; Apple, Aaron; Phillips, Colin

    2015-01-01

    Much work has demonstrated that speakers of verb-final languages are able to construct rich syntactic representations in advance of verb information. This may reflect general architectural properties of the language processor, or it may only reflect a language-specific adaptation to the demands of verb-finality. The present study addresses this issue by examining whether speakers of a verb-medial language (English) wait to consult verb transitivity information before constructing filler-gap dependencies, where internal arguments are fronted and hence precede the verb. This configuration makes it possible to investigate whether the parser actively makes representational commitments on the gap position before verb transitivity information becomes available. A key prediction of the view that rich pre-verbal structure building is a general architectural property is that speakers of verb-medial languages should predictively construct dependencies in advance of verb transitivity information, and therefore that disruption should be observed when the verb has intransitive subcategorization frames that are incompatible with the predicted structure. In three reading experiments (self-paced and eye-tracking) that manipulated verb transitivity, we found evidence for reading disruption when the verb was intransitive, although no such reading difficulty was observed when the critical verb was embedded inside a syntactic island structure, which blocks filler-gap dependency completion. These results are consistent with the hypothesis that in English, as in verb-final languages, information from preverbal noun phrases is sufficient to trigger active dependency completion without having access to verb transitivity information. PMID:25914658

  4. Hyper-active gap filling.

    PubMed

    Omaki, Akira; Lau, Ellen F; Davidson White, Imogen; Dakan, Myles L; Apple, Aaron; Phillips, Colin

    2015-01-01

    Much work has demonstrated that speakers of verb-final languages are able to construct rich syntactic representations in advance of verb information. This may reflect general architectural properties of the language processor, or it may only reflect a language-specific adaptation to the demands of verb-finality. The present study addresses this issue by examining whether speakers of a verb-medial language (English) wait to consult verb transitivity information before constructing filler-gap dependencies, where internal arguments are fronted and hence precede the verb. This configuration makes it possible to investigate whether the parser actively makes representational commitments on the gap position before verb transitivity information becomes available. A key prediction of the view that rich pre-verbal structure building is a general architectural property is that speakers of verb-medial languages should predictively construct dependencies in advance of verb transitivity information, and therefore that disruption should be observed when the verb has intransitive subcategorization frames that are incompatible with the predicted structure. In three reading experiments (self-paced and eye-tracking) that manipulated verb transitivity, we found evidence for reading disruption when the verb was intransitive, although no such reading difficulty was observed when the critical verb was embedded inside a syntactic island structure, which blocks filler-gap dependency completion. These results are consistent with the hypothesis that in English, as in verb-final languages, information from preverbal noun phrases is sufficient to trigger active dependency completion without having access to verb transitivity information.

  5. Sound Processing Features for Speaker-Dependent and Phrase-Independent Emotion Recognition in Berlin Database

    NASA Astrophysics Data System (ADS)

    Anagnostopoulos, Christos Nikolaos; Vovoli, Eftichia

    An emotion recognition framework based on sound processing could improve services in human-computer interaction. Various quantitative speech features obtained from sound processing of acted speech were tested as to whether they are sufficient to discriminate between seven emotions. Multilayered perceptrons were trained to classify gender and emotions on the basis of a 24-input vector, which provides information about the prosody of the speaker over the entire sentence using statistics of sound features. Several experiments were performed and the results are presented analytically. Emotion recognition was successful when speakers and utterances were “known” to the classifier. However, severe misclassifications occurred in the utterance-independent framework. Still, the proposed feature vector achieved promising results for utterance-independent recognition of high- and low-arousal emotions.
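
    The sentence-level input vector might be assembled as in this sketch; the paper's exact 24 features are not listed in the abstract, so the statistics below are stand-ins.

      import numpy as np

      def prosody_stats(f0_contour, energy_contour):
          # Sentence-level statistics of the pitch and energy tracks, concatenated
          # into a single classifier input vector
          feats = []
          for track in (f0_contour, energy_contour):
              track = np.asarray(track, dtype=float)
              feats += [track.mean(), track.std(), track.min(), track.max(),
                        np.ptp(track), np.median(track)]
          return np.array(feats)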

  6. Parallel Processing of Large Scale Microphone Arrays for Sound Capture

    NASA Astrophysics Data System (ADS)

    Jan, Ea-Ee.

    1995-01-01

    Performance of microphone sound pickup is degraded by deleterious properties of the acoustic environment, such as multipath distortion (reverberation) and ambient noise. The degradation becomes more prominent in a teleconferencing environment in which the microphone is positioned far away from the speaker. Moreover, the ideal teleconference should feel as easy and natural as face-to-face communication with another person. This suggests hands-free sound capture with no tether or encumbrance by hand-held or body-worn sound equipment. Microphone arrays for this application represent an appropriate approach. This research develops new microphone array and signal processing techniques for high quality hands-free sound capture in noisy, reverberant enclosures. The new techniques combine matched-filtering of individual sensors and parallel processing to provide acute spatial volume selectivity which is capable of mitigating the deleterious effects of noise interference and multipath distortion. The new method outperforms traditional delay-and-sum beamformers which provide only directional spatial selectivity. The research additionally explores truncated matched-filtering and random distribution of transducers to reduce complexity and improve sound capture quality. All designs are first established by computer simulation of array performance in reverberant enclosures. The simulation is achieved by a room model which can efficiently calculate the acoustic multipath in a rectangular enclosure up to a prescribed order of images. It also calculates the incident angle of the arriving signal. Experimental arrays were constructed and their performance was measured in real rooms. Real room data were collected in a hard-walled laboratory and a controllable variable acoustics enclosure of similar size, approximately 6 x 6 x 3 m. An extensive speech database was also collected in these two enclosures for future research on microphone arrays. The simulation results are shown to be consistent with the real room data. Localization of sound sources has been explored using cross-power spectrum time delay estimation and has been evaluated using real room data under slightly, moderately and highly reverberant conditions. To improve the accuracy and reliability of the source localization, an outlier detector that removes incorrect time delay estimates has been invented. To provide speaker selectivity for microphone array systems, a hands-free speaker identification system has been studied. A recently invented feature using selected spectrum information outperforms traditional recognition methods. Measured results demonstrate the capabilities of speaker selectivity from a matched-filtered array. In addition, simulation utilities, including matched-filtering processing of the array and hands-free speaker identification, have been implemented on the massively-parallel nCube super-computer. This parallel computation highlights the requirements for real-time processing of array signals.
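
    For orientation, a simplified delay-and-sum beamformer, the baseline this work improves upon, is sketched below; the geometry handling and integer-sample delays are illustrative simplifications. Matched-filter processing generalizes this by replacing each pure delay with convolution against the time-reversed measured impulse response of that sensor.

      import numpy as np

      C = 343.0  # speed of sound (m/s)

      def delay_and_sum(signals, mic_xyz, src_xyz, fs):
          # signals: (n_mics, n_samples); steer toward src_xyz with integer-sample
          # delays so the direct-path components add coherently
          dists = np.linalg.norm(mic_xyz - src_xyz, axis=1)
          delays = np.round((dists - dists.min()) / C * fs).astype(int)
          out = np.zeros(signals.shape[1])
          for sig, d in zip(signals, delays):
              out[: signals.shape[1] - d] += sig[d:]  # advance each channel
          return out / len(signals)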

  7. Functional activity and white matter microstructure reveal the independent effects of age of acquisition and proficiency on second-language learning.

    PubMed

    Nichols, Emily S; Joanisse, Marc F

    2016-12-01

    Two key factors govern how bilingual speakers neurally maintain two languages: the speakers' second language age of acquisition (AoA) and their subsequent proficiency. However, the relative roles of these two factors have been difficult to disentangle given that the two can be closely correlated, and most prior studies have examined the two factors in isolation. Here, we combine functional magnetic resonance imaging with diffusion tensor imaging to identify specific brain areas that are independently modulated by AoA and proficiency in second language speakers. First-language Mandarin Chinese speakers who are second language speakers of English were scanned as they performed a picture-word matching task in either language. In the same session we also acquired diffusion-weighted scans to assess white matter microstructure, along with behavioural measures of language proficiency prior to entering the scanner. Results reveal gray- and white-matter networks involving both the left and right hemisphere that independently vary as a function of a second-language speaker's AoA and proficiency, focused on the superior temporal gyrus, middle and inferior frontal gyrus, parahippocampal gyrus, and the basal ganglia. These results indicate that proficiency and AoA explain separate functional and structural networks in the bilingual brain, which we interpret as suggesting distinct types of plasticity for age-dependent effects (i.e., AoA) versus experience and/or predisposition (i.e., proficiency).

  8. You had me at "Hello": Rapid extraction of dialect information from spoken words.

    PubMed

    Scharinger, Mathias; Monahan, Philip J; Idsardi, William J

    2011-06-15

    Research on the neuronal underpinnings of speaker identity recognition has identified voice-selective areas in the human brain with evolutionary homologues in non-human primates who have comparable areas for processing species-specific calls. Most studies have focused on estimating the extent and location of these areas. In contrast, relatively few experiments have investigated the time-course of speaker identity, and in particular, dialect processing and identification by electro- or neuromagnetic means. We show here that dialect extraction occurs speaker-independently, pre-attentively and categorically. We used Standard American English and African-American English exemplars of 'Hello' in a magnetoencephalographic (MEG) Mismatch Negativity (MMN) experiment. The MMN as an automatic change detection response of the brain reflected dialect differences that were not entirely reducible to acoustic differences between the pronunciations of 'Hello'. Source analyses of the M100, an auditory evoked response to the vowels suggested additional processing in voice-selective areas whenever a dialect change was detected. These findings are not only relevant for the cognitive neuroscience of language, but also for the social sciences concerned with dialect and race perception.

  9. Development of equally intelligible Telugu sentence-lists to test speech recognition in noise.

    PubMed

    Tanniru, Kishore; Narne, Vijaya Kumar; Jain, Chandni; Konadath, Sreeraj; Singh, Niraj Kumar; Sreenivas, K J Ramadevi; K, Anusha

    2017-09-01

    To develop sentence lists in the Telugu language for the assessment of the speech recognition threshold (SRT) in the presence of background noise, through identification of the mean signal-to-noise ratio required to attain a 50% sentence recognition score (SRTn). This study was conducted in three phases. The first phase involved the selection and recording of Telugu sentences. In the second phase, 20 lists, each consisting of 10 sentences with equal intelligibility, were formulated using a numerical optimisation procedure. In the third phase, the SRTn of the developed lists was estimated using adaptive procedures on individuals with normal hearing. A total of 68 native Telugu speakers with normal hearing participated in the study. Of these, 18 (including the speakers) performed various subjective measures in the first phase, 20 performed sentence/word recognition in noise in the second phase, and 30 participated in the list equivalency procedures in the third phase. In all, 15 lists of comparable difficulty were formulated as test material. The mean SRTn across these lists was -2.74 dB (SD = 0.21). The developed sentence lists provide a valid and reliable tool to measure SRTn in native Telugu speakers.
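
    The third-phase estimate can be pictured with a simple 1-up/1-down adaptive track, as in this sketch; the step size, trial count and averaging rule are assumptions, not the study's exact protocol.

      def estimate_srtn(present_sentence, snr_start=0.0, step=2.0, n_trials=20):
          # present_sentence(snr) -> True if the listener repeats the sentence
          # correctly; a 1-up/1-down rule converges on the 50% point (SRTn, in dB SNR)
          snr, track = snr_start, []
          for _ in range(n_trials):
              correct = present_sentence(snr)
              track.append(snr)
              snr += -step if correct else step  # harder after a hit, easier after a miss
          return sum(track[-10:]) / 10  # mean of the late trials approximates SRTn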

  10. Fifty years of progress in speech and speaker recognition

    NASA Astrophysics Data System (ADS)

    Furui, Sadaoki

    2004-10-01

    Speech and speaker recognition technology has made very significant progress in the past 50 years. The progress can be summarized by the following changes: (1) from template matching to corpus-based statistical modeling, e.g., HMM and n-grams, (2) from filter bank/spectral resonance to cepstral features (cepstrum + Δcepstrum + ΔΔcepstrum), (3) from heuristic time-normalization to DTW/DP matching, (4) from “distance”-based to likelihood-based methods, (5) from maximum likelihood to discriminative approaches, e.g., MCE/GPD and MMI, (6) from isolated word to continuous speech recognition, (7) from small vocabulary to large vocabulary recognition, (8) from context-independent units to context-dependent units for recognition, (9) from clean speech to noisy/telephone speech recognition, (10) from single speaker to speaker-independent/adaptive recognition, (11) from monologue to dialogue/conversation recognition, (12) from read speech to spontaneous speech recognition, (13) from recognition to understanding, (14) from single-modality (audio signal only) to multi-modal (audio/visual) speech recognition, (15) from hardware recognizer to software recognizer, and (16) from no commercial application to many practical commercial applications. Most of these advances have taken place in both the fields of speech recognition and speaker recognition. The majority of technological changes have been directed toward increasing the robustness of recognition, together with many other important techniques not noted above.
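
    Since cepstral features with delta and delta-delta terms appear in item (2), a compact sketch of how such a feature vector is commonly assembled is given below; framing, windowing and mel filterbank details are omitted, and this reflects generic practice rather than any particular system's front end.

      import numpy as np

      def cepstrum(frame, n_ceps=13):
          # Real cepstrum of one windowed frame: inverse FFT of the log magnitude
          # spectrum, truncated to the first n_ceps coefficients
          spectrum = np.abs(np.fft.rfft(frame)) + 1e-10
          return np.fft.irfft(np.log(spectrum))[:n_ceps]

      def with_deltas(ceps_frames):
          # ceps_frames: (n_frames, n_ceps); append delta and delta-delta terms
          d = np.gradient(ceps_frames, axis=0)
          dd = np.gradient(d, axis=0)
          return np.hstack([ceps_frames, d, dd])  # cepstrum + Δ + ΔΔ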

  11. Perception of a non-native speech contrast: Voiced and voiceless stops as perceived by Tamil speakers

    NASA Astrophysics Data System (ADS)

    Tur, Sylwia

    2004-05-01

    The effect of linguistic experience plays a significant role in how speech sounds are perceived. The findings of many studies imply that the perception of non-native contrasts depends on their status in the native language of the listener. Tamil is a language with a single voicing category. All stop consonants in Tamil are phonemically voiceless, though allophonic voicing has been observed in spoken Tamil. The present study examined how native Tamil speakers and English controls perceived voiced and voiceless bilabial, alveolar, and velar stops in English. Voice onset time (VOT) was manipulated by editing naturally produced stimuli to create a continuum of increasingly longer VOTs. Perceptual data were collected from 16 Tamil and 16 English speakers. Experiment 1 was an AX task in which subjects responded same or different to 162 pairs of stimuli. Experiment 2 was a forced-choice identification task in which subjects identified 99 individually presented stimuli as pa, ta, ka or ba, da, ga. The experiments show statistically significant differences between Tamil and English speakers in their perception of English stop consonants. The results of the study imply that the allophonic status of voiced stops in Tamil does not aid Tamil speakers in perceiving phonemically voiced stops in English.

  12. A Method for Determining the Timing of Displaying the Speaker's Face and Captions for a Real-Time Speech-to-Caption System

    NASA Astrophysics Data System (ADS)

    Kuroki, Hayato; Ino, Shuichi; Nakano, Satoko; Hori, Kotaro; Ifukube, Tohru

    The authors of this paper have been studying a real-time speech-to-caption system using speech recognition technology with a “repeat-speaking” method. In this system, they used a “repeat-speaker” who listens to a lecturer's voice and then speaks the lecturer's utterances back into a speech recognition computer. The resulting system showed that the accuracy of the captions is about 97% in Japanese-Japanese conversion, and the conversion time from voice to captions is about 4 seconds in English-English conversion at some international conferences. Of course, considerable cost was required to achieve this high performance. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. This led the authors to the idea of displaying captions and images of the speaker's face movements with suitable timing, after briefly storing the information in a computer, to achieve higher comprehension. In this paper, we investigate the relationship of the display sequence and display timing between captions that contain speech recognition errors and the speaker's face movement images. The results show that the sequence “to display the caption before the speaker's face image” improves the comprehension of the captions. The sequence “to display both simultaneously” shows an improvement only a few percent higher than the question sentence, and the sequence “to display the speaker's face image before the caption” shows almost no change. In addition, the sequence “to display the caption 1 second before the speaker's face image” shows the most significant improvement of all the conditions.

  13. Effects of audio-visual presentation of target words in word translation training

    NASA Astrophysics Data System (ADS)

    Akahane-Yamada, Reiko; Komaki, Ryo; Kubo, Rieko

    2004-05-01

    Komaki and Akahane-Yamada (Proc. ICA2004) used a 2AFC translation task in vocabulary training, in which the target word is presented visually in the orthographic form of one language and the appropriate meaning in another language has to be chosen from two alternatives. The present paper examined the effect of audio-visual presentation of the target word when native speakers of Japanese learn to translate English words into Japanese. Pairs of English words contrasting in several phonemic distinctions (e.g., /r/-/l/, /b/-/v/, etc.) were used as word materials and were presented in three conditions: visual-only (V), audio-only (A), and audio-visual (AV). Identification accuracy for these words produced by two talkers was also assessed. During the pretest, accuracy for A stimuli was lowest, implying that insufficient translation ability and listening ability interact with each other when an aurally presented word has to be translated. However, there was no difference in accuracy between V and AV stimuli, suggesting that participants translated the words using visual information only. The effect of translation training using AV stimuli did not transfer to identification ability, showing that additional audio information during translation does not help improve speech perception. Further examination is necessary to determine an effective L2 training method. [Work supported by TAO, Japan.]

  14. Effects of age of acquisition on brain activation during Chinese character recognition.

    PubMed

    Weekes, Brendan Stuart; Chan, Alice H D; Tan, Li Hai

    2008-01-01

    The age of acquisition of a word (AoA) has a specific effect on brain activation during word identification in English and German. However, the neural locus of AoA effects differs across studies. According to Hernandez and Fiebach [Hernandez, A., & Fiebach, C. (2006). The brain bases of reading late-learned words: Evidence from functional MRI. Visual Cognition, 13(8), 1027-1043], the effects of AoA on brain activation depend on the predictability of the connections between input (orthography) and output (phonology) in a lexical network. We tested this hypothesis by examining AoA effects in a non-alphabetic script with relatively arbitrary mappings between orthography and phonology: Chinese. Our results showed that the effects of AoA in Chinese speakers are located in brain regions that are spatially distinctive, including the bilateral middle temporal gyrus and the left inferior parietal cortex. An additional finding was that word frequency had an independent effect on brain activation in the right middle occipital gyrus only. We conclude that spatially distinctive effects of AoA on neural activity depend on the predictability of the mappings between orthography and phonology and reflect a division of labour towards greater lexical-semantic retrieval in non-alphabetic scripts.

  15. Computer-Mediated Assessment of Intelligibility in Aphasia and Apraxia of Speech

    PubMed Central

    Haley, Katarina L.; Roth, Heidi; Grindstaff, Enetta; Jacks, Adam

    2011-01-01

    Background Previous work indicates that single-word intelligibility tests developed for dysarthria are sensitive to segmental production errors in aphasic individuals with and without apraxia of speech (AOS). However, potential listener learning effects and difficulties adapting elicitation procedures to coexisting language impairments limit their applicability to left hemisphere stroke survivors. Aims The main purpose of this study was to examine basic psychometric properties for a new monosyllabic intelligibility test developed for individuals with aphasia and/or AOS. A related purpose was to examine clinical feasibility and potential to standardize a computer-mediated administration approach. Methods & Procedures A 600-item monosyllabic single-word intelligibility test was constructed by assembling sets of phonetically similar words. Custom software was used to select 50 target words from this test in a pseudo-random fashion and to elicit and record production of these words by 23 speakers with aphasia and 20 neurologically healthy participants. To evaluate test-retest reliability, two identical sets of 50-word lists were elicited by requesting repetition after a live speaker model. To examine the effect of a different word set and auditory model, an additional set of 50 different words was elicited with a pre-recorded model. The recorded words were presented to normal-hearing listeners for identification via orthographic and multiple-choice response formats. To examine construct validity, production accuracy for each speaker was estimated via phonetic transcription and rating of overall articulation. Outcomes & Results Recording and listening tasks were completed in less than six minutes for all speakers and listeners. Aphasic speakers were significantly less intelligible than neurologically healthy speakers and displayed a wide range of intelligibility scores. Test-retest and inter-listener reliability estimates were strong. No significant difference was found in scores based on recordings from a live model versus a pre-recorded model, but some individual speakers favored the live model. Intelligibility test scores correlated highly with segmental accuracy derived from broad phonetic transcription of the same speech sample and a motor speech evaluation. Scores correlated moderately with rated articulation difficulty. Conclusions We describe a computerized, single-word intelligibility test that yields clinically feasible, reliable, and valid measures of segmental speech production in adults with aphasia. This tool can be used in clinical research to facilitate appropriate participant selection and to establish matching across comparison groups. For a majority of speakers, elicitation procedures can be standardized by using a pre-recorded auditory model for repetition. This assessment tool has potential utility for both clinical assessment and outcomes research. PMID:22215933
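
    As a small illustration of the elicitation design, the sketch below (hypothetical, not the authors' software) draws a 50-word list from a test organized as sets of phonetically similar words, taking at most one word per set so each list stays phonetically spread out.

```python
# Hypothetical word-list sampler for a test built from phonetic-similarity sets.
import random

def draw_word_list(similarity_sets, n_words=50, seed=None):
    """Pick n_words sets at random, then one word from each chosen set."""
    rng = random.Random(seed)
    chosen_sets = rng.sample(similarity_sets, n_words)  # requires >= n_words sets
    return [rng.choice(word_set) for word_set in chosen_sets]

print(draw_word_list([["bat", "pat"], ["dome", "dough"], ["keys", "tease"]],
                     n_words=2, seed=1))
```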

  16. Gender parity trends for invited speakers at four prominent virology conference series.

    PubMed

    Kalejta, Robert F; Palmenberg, Ann C

    2017-06-07

    Scientific conferences are most beneficial to participants when they showcase significant new experimental developments, accurately summarize the current state of the field, and provide strong opportunities for collaborative networking. A top-notch slate of invited speakers, assembled by conference organizers or committees, is key to achieving these goals. The perceived underrepresentation of female speakers at prominent scientific meetings is currently a popular topic for discussion, but one that often lacks supportive data. We compiled the full rosters of invited speakers over the last 35 years for four prominent international virology conferences, the American Society for Virology Annual Meeting (ASV), the International Herpesvirus Workshop (IHW), the Positive-Strand RNA Virus Symposium (PSR), and the Gordon Research Conference on Viruses & Cells (GRC). The rosters were cross-indexed by unique names, gender, year, and repeat invitations. When plotted as gender-dependent trends over time, all four conferences showed a clear proclivity for male-dominated invited speaker lists. Encouragingly, shifts toward parity are emerging within all units, but at different rates. Not surprisingly, both selection of a larger percentage of first-time participants and the presence of a woman on the speaker selection committee correlated with improved parity. Session chair information was also collected for the IHW and GRC. These visible positions also displayed a strong male dominance over time that is eroding slowly. We offer our personal interpretation of these data to help future organizers achieve improved equity among the limited number of available positions for session moderators and invited speakers. IMPORTANCE Politicians and media members have a tendency to cite anecdotes as conclusions without any supporting data. This happens so frequently now that a name for it has emerged: fake news. Good science proceeds otherwise. The underrepresentation of women as invited speakers at international scientific conferences exemplifies a present-day discussion topic usually occurring without facts to support or refute the arguments. We now provide records profiling four prominent virology conferences over the years 1982 to 2017 with the intention that the trends and accompanying analyses of the gender parity of invited speakers may allow the ongoing discussions to be informed. Copyright © 2017 American Society for Microbiology.

  17. Gender Parity Trends for Invited Speakers at Four Prominent Virology Conference Series

    PubMed Central

    Palmenberg, Ann C.

    2017-01-01

    ABSTRACT Scientific conferences are most beneficial to participants when they showcase significant new experimental developments, accurately summarize the current state of the field, and provide strong opportunities for collaborative networking. A top-notch slate of invited speakers, assembled by conference organizers or committees, is key to achieving these goals. The perceived underrepresentation of female speakers at prominent scientific meetings is currently a popular topic for discussion, but one that often lacks supportive data. We compiled the full rosters of invited speakers over the last 35 years for four prominent international virology conferences, the American Society for Virology Annual Meeting (ASV), the International Herpesvirus Workshop (IHW), the Positive-Strand RNA Virus Symposium (PSR), and the Gordon Research Conference on Viruses & Cells (GRC). The rosters were cross-indexed by unique names, gender, year, and repeat invitations. When plotted as gender-dependent trends over time, all four conferences showed a clear proclivity for male-dominated invited speaker lists. Encouragingly, shifts toward parity are emerging within all units, but at different rates. Not surprisingly, both selection of a larger percentage of first-time participants and the presence of a woman on the speaker selection committee correlated with improved parity. Session chair information was also collected for the IHW and GRC. These visible positions also displayed a strong male dominance over time that is eroding slowly. We offer our personal interpretation of these data to help future organizers achieve improved equity among the limited number of available positions for session moderators and invited speakers. IMPORTANCE Politicians and media members have a tendency to cite anecdotes as conclusions without any supporting data. This happens so frequently now that a name for it has emerged: fake news. Good science proceeds otherwise. The underrepresentation of women as invited speakers at international scientific conferences exemplifies a present-day discussion topic usually occurring without facts to support or refute the arguments. We now provide records profiling four prominent virology conferences over the years 1982 to 2017 with the intention that the trends and accompanying analyses of the gender parity of invited speakers may allow the ongoing discussions to be informed. PMID:28592542

  18. Variability in the relationships among voice quality, harmonic amplitudes, open quotient, and glottal area waveform shape in sustained phonation

    PubMed Central

    Kreiman, Jody; Shue, Yen-Liang; Chen, Gang; Iseli, Markus; Gerratt, Bruce R.; Neubauer, Juergen; Alwan, Abeer

    2012-01-01

    Increases in open quotient are widely assumed to cause changes in the amplitude of the first harmonic relative to the second (H1*–H2*), which in turn correspond to increases in perceived vocal breathiness. Empirical support for these assumptions is rather limited, and reported relationships among these three descriptive levels have been variable. This study examined the empirical relationship among H1*–H2*, the glottal open quotient (OQ), and glottal area waveform skewness, measured synchronously from audio recordings and high-speed video images of the larynges of six phonetically knowledgeable, vocally healthy speakers who varied fundamental frequency and voice qualities quasi-orthogonally. Across speakers and voice qualities, OQ, the asymmetry coefficient, and fundamental frequency accounted for an average of 74% of the variance in H1*–H2*. However, analyses of individual speakers showed large differences in the strategies used to produce the same intended voice qualities. Thus, H1*–H2* can be predicted with good overall accuracy, but its relationship to phonatory characteristics appears to be speaker dependent. PMID:23039455
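
    The modeling result lends itself to a compact illustration. The sketch below (synthetic stand-in data, hypothetical coefficients) shows the form of the analysis: an ordinary least-squares fit of H1*-H2* on OQ, the asymmetry coefficient, and F0.

```python
# Synthetic illustration of regressing H1*-H2* on OQ, asymmetry, and F0.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.uniform(0.3, 0.9, n),   # open quotient (OQ)
    rng.uniform(0.1, 0.5, n),   # asymmetry coefficient
    rng.uniform(80, 400, n),    # fundamental frequency (Hz)
])
h1_h2 = 2.0 + 10.0 * X[:, 0] - 5.0 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0, 1, n)

model = LinearRegression().fit(X, h1_h2)
print("variance explained:", model.score(X, h1_h2))  # study reports ~74% on average
```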

  19. Connected word recognition using a cascaded neuro-computational model

    NASA Astrophysics Data System (ADS)

    Hoya, Tetsuya; van Leeuwen, Cees

    2016-10-01

    We propose a novel framework for processing a continuous speech stream that contains a varying number of words, as well as non-speech periods. Speech samples are segmented into word-tokens and non-speech periods. An augmented version of an earlier-proposed, cascaded neuro-computational model is used for recognising individual words within the stream. Simulation studies using both a multi-speaker-dependent and speaker-independent digit string database show that the proposed method yields a recognition performance comparable to that obtained by a benchmark approach using hidden Markov models with embedded training.

  20. Developing Appreciation for Sarcasm and Sarcastic Gossip: It Depends on Perspective.

    PubMed

    Glenwright, Melanie; Tapley, Brent; Rano, Jacqueline K S; Pexman, Penny M

    2017-11-09

    Speakers use sarcasm to criticize others and to be funny; the indirectness of sarcasm protects the addressee's face (Brown & Levinson, 1987). Thus, appreciation of sarcasm depends on the ability to consider perspectives. We investigated development of this ability from late childhood into adulthood and examined effects of interpretive perspective and parties present. We presented 9- to 10-year-olds, 13- to 14-year-olds, and adults with sarcastic and literal remarks in three parties-present conditions: private evaluation, public evaluation, and gossip. Participants interpreted the speaker's attitude and humor from the addressee's perspective and, when appropriate, from the bystander's perspective. Children showed no influence of interpretive perspective or parties present on appreciation of the speaker's attitude or humor. Adolescents and adults, however, shifted their interpretations, judging that addressees have less favorable views of criticisms than bystanders. Further, adolescents and adults differed in their perceptions of the social functions of gossip, with adolescents showing more positive attitudes than adults toward sarcastic gossip. We suggest that adults' disapproval of sarcastic gossip shows a deeper understanding of the utility of sarcasm's face-saving function. Thus, the ability to modulate appreciation of sarcasm according to interpretive perspective and parties present continues to develop in adolescence and into adulthood.

  1. Sociolinguistic variables and cognition.

    PubMed

    Thomas, Erik R

    2011-11-01

    Sociolinguistics has examined the mental organization of language only sporadically. Meanwhile, areas of linguistics that deal with cognitive organization seldom delve deeply into language variation. Variation is essential for understanding how language is structured cognitively, however. Three kinds of evidence are discussed to illustrate this point. First, style shifting demonstrates that language users develop detailed associations of when to produce specific linguistic forms, depending on the pragmatic context. Second, variation in fine-grained phonetic cues shows that cognitive organization applies to linguistic forms not otherwise known to be under speakers' control. Finally, experiments on dialect comprehension and identification demonstrate that listeners have detailed cognitive associations of language variants with groups of people, whether or not they can produce the same variants themselves. A model is presented for how sociolinguistic knowledge can be viewed in relation to other parts of language with regard to cognitive and neural representations. WIREs Cogn Sci 2011, 2, 701-716. DOI: 10.1002/wcs.152. For further resources related to this article, please visit the WIREs website. Copyright © 2011 John Wiley & Sons, Ltd.

  2. Sleep duration predicts behavioral and neural differences in adult speech sound learning.

    PubMed

    Earle, F Sayako; Landi, Nicole; Myers, Emily B

    2017-01-01

    Sleep is important for memory consolidation and contributes to the formation of new perceptual categories. This study examined sleep as a source of variability in typical learners' ability to form new speech sound categories. We trained monolingual English speakers to identify a set of non-native speech sounds at 8PM, and assessed their ability to identify and discriminate between these sounds immediately after training, and at 8AM on the following day. We tracked sleep duration overnight, and found that light sleep duration predicted gains in identification performance, while total sleep duration predicted gains in discrimination ability. Participants obtained an average of less than 6h of sleep, pointing to the degree of sleep deprivation as a potential factor. Behavioral measures were associated with ERP indexes of neural sensitivity to the learned contrast. These results demonstrate that the relative success in forming new perceptual categories depends on the duration of post-training sleep. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  3. Sleep duration predicts behavioral and neural differences in adult speech sound learning

    PubMed Central

    Earle, F. Sayako; Landi, Nicole; Myers, Emily B.

    2016-01-01

    Sleep is important for memory consolidation and contributes to the formation of new perceptual categories. This study examined sleep as a source of variability in typical learners’ ability to form new speech sound categories. We trained monolingual English speakers to identify a set of non-native speech sounds at 8PM, and assessed their ability to identify and discriminate between these sounds immediately after training, and at 8AM on the following day. We tracked sleep duration overnight, and found that light sleep duration predicted gains in identification performance, while total sleep duration predicted gains in discrimination ability. Participants obtained an average of less than 6 hours of sleep, pointing to the degree of sleep deprivation as a potential factor. Behavioral measures were associated with ERP indexes of neural sensitivity to the learned contrast. These results demonstrate that the relative success in forming new perceptual categories depends on the duration of post-training sleep. PMID:27793703

  4. The impact of musical training and tone language experience on talker identification

    PubMed Central

    Xie, Xin; Myers, Emily

    2015-01-01

    Listeners can use pitch changes in speech to identify talkers. Individuals exhibit large variability in sensitivity to pitch and in accuracy perceiving talker identity. In particular, people who have musical training or long-term tone language use are found to have enhanced pitch perception. In the present study, the influence of pitch experience on talker identification was investigated as listeners identified talkers in native language as well as non-native languages. Experiment 1 was designed to explore the influence of pitch experience on talker identification in two groups of individuals with potential advantages for pitch processing: musicians and tone language speakers. Experiment 2 further investigated individual differences in pitch processing and the contribution to talker identification by testing a mediation model. Cumulatively, the results suggested that (a) musical training confers an advantage for talker identification, supporting a shared resources hypothesis regarding music and language and (b) linguistic use of lexical tones also increases accuracy in hearing talker identity. Importantly, these two types of hearing experience enhance talker identification by sharpening pitch perception skills in a domain-general manner. PMID:25618071

  5. The impact of musical training and tone language experience on talker identification.

    PubMed

    Xie, Xin; Myers, Emily

    2015-01-01

    Listeners can use pitch changes in speech to identify talkers. Individuals exhibit large variability in sensitivity to pitch and in accuracy perceiving talker identity. In particular, people who have musical training or long-term tone language use are found to have enhanced pitch perception. In the present study, the influence of pitch experience on talker identification was investigated as listeners identified talkers in native language as well as non-native languages. Experiment 1 was designed to explore the influence of pitch experience on talker identification in two groups of individuals with potential advantages for pitch processing: musicians and tone language speakers. Experiment 2 further investigated individual differences in pitch processing and the contribution to talker identification by testing a mediation model. Cumulatively, the results suggested that (a) musical training confers an advantage for talker identification, supporting a shared resources hypothesis regarding music and language and (b) linguistic use of lexical tones also increases accuracy in hearing talker identity. Importantly, these two types of hearing experience enhance talker identification by sharpening pitch perception skills in a domain-general manner.

  6. Masking Release for Igbo and English.

    PubMed

    Ebem, Deborah U; Desloge, Joseph G; Reed, Charlotte M; Braida, Louis D; Uguru, Joy O

    2013-09-01

    In this research, we explored the effect of noise interruption rate on speech intelligibility. Specifically, we used the Hearing In Noise Test (HINT) procedure with the original HINT stimuli (English) and Igbo stimuli to assess speech reception ability in interrupted noise. For a given noise level, the HINT test provides an estimate of the signal-to-noise ratio (SNR) required for 50%-correct speech intelligibility. The SNR for 50%-correct intelligibility changes depending upon the interruption rate of the noise. This phenomenon (called Masking Release) has been studied extensively in English but not for Igbo, an African tonal language spoken predominantly in southeastern Nigeria. This experiment explored and compared the phenomenon of Masking Release for (i) native English speakers listening to English, (ii) native Igbo speakers listening to English, and (iii) native Igbo speakers listening to Igbo. Since Igbo is a tonal language and English is a non-tonal language, this allowed us to compare Masking Release patterns for native speakers of tonal and non-tonal languages. Our results for native English speakers listening to English HINT show that the SNR and the masking release are orderly and consistent with other English HINT data for English speakers. Our results for Igbo speakers listening to English HINT sentences show that there is greater variability across the different Igbo listeners than across the English listeners. This result likely reflects different levels of ability in the English language across the Igbo listeners. The masking release values in dB are smaller than those for English listeners. Our results for Igbo speakers listening to Igbo show that, in general, the SNRs for Igbo sentences are lower than for English/English and Igbo/English. This means that the Igbo listeners could understand 50% of the Igbo sentences at SNRs lower than those required for English sentences by either native or non-native listeners. This result can be explained by the fact that the perception of Igbo utterances by Igbo subjects may have been aided by the prediction of tonal and vowel harmony features present in the Igbo language. In agreement with other studies, our results also show that in a noisy environment listeners are able to perceive their native language better than a second language. The ability of native language speakers to perceive their language better than a second language in a noisy environment may be attributed to two factors: (1) native speakers are more familiar with the sounds of their language than second-language speakers; and (2) language is predictable, so even in noise a native speaker may be able to predict a succeeding word that is scarcely audible. These contextual effects are facilitated by familiarity.
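
    The HINT threshold itself comes from an adaptive track. The sketch below is a generic 1-up/1-down staircase (illustrative only; the actual HINT procedure and step sizes differ) that converges on the SNR for 50%-correct sentence repetition.

```python
# Generic 1-up/1-down SNR staircase converging on the 50%-correct point.
import random

def run_track(respond, n_sentences=20, start_snr_db=0.0, step_db=2.0):
    """respond(snr_db) -> True if the sentence was repeated correctly."""
    snr, history = start_snr_db, []
    for _ in range(n_sentences):
        history.append(snr)
        snr += -step_db if respond(snr) else step_db  # harder after a success
    tail = history[len(history) // 2:]                # average the converged half
    return sum(tail) / len(tail)

# Simulated listener whose accuracy rises with SNR; true 50% point is -4 dB.
listener = lambda snr: random.random() < 1 / (1 + 10 ** (-(snr + 4) / 4))
print(f"estimated 50%-correct SNR: {run_track(listener):.1f} dB")
```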

  7. What does it take to stress a word? Digital manipulation of stress markers in ataxic dysarthria.

    PubMed

    Lowit, Anja; Ijitona, Tolulope; Kuschmann, Anja; Corson, Stephen; Soraghan, John

    2018-05-18

    Stress production is important for effective communication, but this skill is frequently impaired in people with motor speech disorders. The literature reports successful treatment of these deficits in this population, thus highlighting the therapeutic potential of this area. However, no specific guidance is currently available to clinicians about whether any of the stress markers are more effective than others, to what degree they have to be manipulated, and whether strategies need to differ according to the underlying symptoms. In order to provide detailed information on how stress production problems can be addressed, the study investigated (1) the minimum amount of change in a single stress marker necessary to achieve significant improvement in stress target identification; and (2) whether stress can be signalled more effectively with a combination of stress markers. Data were sourced from a sentence stress task performed by 10 speakers with ataxic dysarthria and 10 healthy matched control participants. Fifteen utterances perceived as having incorrect stress patterns (no stress, all words stressed or inappropriate word stressed) were selected and digitally manipulated in a stepwise fashion based on typical speaker performance. Manipulations were performed on F0, intensity and duration, either in isolation or in combination with each other. In addition, pitch contours were modified for some utterances. A total of 50 naïve listeners scored which word they perceived as being stressed. Results showed that increases in duration and intensity at levels smaller than produced by the control participants resulted in significant improvements in listener accuracy. The effectiveness of F0 increases depended on the underlying error pattern. Overall intensity showed the most stable effects. Modifications of the pitch contour also resulted in significant improvements, but not to the same degree as amplification. Integration of two or more stress markers did not result in better results than manipulation of individual stress markers, unless they were combined with pitch contour modifications. The results highlight the potential for improvement of stress production in speakers with motor speech disorders. The fact that individual parameter manipulation is as effective as combining them will facilitate the therapeutic process considerably, as will the result that amplification at lower levels than seen in typical speakers is sufficient. The difference in results across utterance sets highlights the need to investigate the underlying error pattern in order to select the most effective compensatory strategy for clients. © 2018 Royal College of Speech and Language Therapists.
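
    Of the stress markers manipulated here, intensity is the simplest to illustrate. The sketch below (not the authors' procedure; the sample indices are hypothetical) boosts a target word within an utterance waveform by a fixed number of dB.

```python
# Boost the intensity of one word in a waveform by gain_db decibels.
import numpy as np

def boost_word_intensity(signal, word_start, word_end, gain_db):
    """Return a float copy of `signal` with samples [word_start:word_end) scaled."""
    out = signal.astype(np.float64)                 # copy; avoids integer clipping
    out[word_start:word_end] *= 10.0 ** (gain_db / 20.0)
    return out

utterance = np.random.default_rng(0).normal(size=16000)  # stand-in audio samples
stressed = boost_word_intensity(utterance, 4000, 8000, gain_db=3.0)
```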

  8. Effects of low speed wind on the recognition/identification and pass-through communication tasks of auditory situation awareness afforded by military hearing protection/enhancement devices and tactical communication and protective systems.

    PubMed

    Lee, Kichol; Casali, John G

    2016-01-01

    To investigate the effect of controlled low-speed wind noise on the auditory situation awareness performance afforded by military hearing protection/enhancement devices (HPEDs) and tactical communication and protective systems (TCAPS). Recognition/identification and pass-through communication tasks were conducted separately under three wind conditions (0, 5, and 10 mph). Subjects wore two in-ear-type TCAPS, one earmuff-type TCAPS, a Combat Arms Earplug in its 'open' or pass-through setting, and an EB-15LE electronic earplug. Devices with electronic gain systems were tested under two gain settings: 'unity' and 'max'. Testing without any device (open ear) was conducted as a control. Ten subjects were recruited from the student population at Virginia Tech. Audiometric requirements were 25 dBHL or better at 500, 1000, 2000, 4000, and 8000 Hz in both ears. Performance on the communication task-by-device interaction differed significantly only at the 0 mph wind speed. Between-device performance differences varied with azimuthal speaker location. It is evident from this study that stable (non-gusting) wind speeds up to 10 mph did not significantly degrade recognition/identification task performance or pass-through communication performance for the group of HPEDs and TCAPS tested. However, the various devices performed differently as the test signal speaker location was varied, and it appears that physical as well as electronic features may have contributed to this directional result.

  9. Intentional switching in auditory selective attention: Exploring age-related effects in a spatial setup requiring speech perception.

    PubMed

    Oberem, Josefa; Koch, Iring; Fels, Janina

    2017-06-01

    Using a binaural-listening paradigm, age-related differences in the ability to intentionally switch auditory selective attention between two speakers, defined by their spatial location, were examined. For this purpose, 40 normal-hearing participants (20 young, mean age 24.8 years; 20 older, mean age 67.8 years) were tested. The spatial reproduction of stimuli was provided by headphones using head-related transfer functions of an artificial head. Spoken number words of two speakers were presented simultaneously to participants from two out of eight locations on the horizontal plane. Guided by a visual cue indicating the spatial location of the target speaker, the participants were asked to categorize the target's number word as smaller vs. greater than five while ignoring the distractor's speech. Results showed significantly higher reaction times and error rates for older participants. The relative influence of a spatial switch of the target speaker (switch or repetition of the speaker's direction in space) was identical across age groups. Congruency effects (stimuli spoken by target and distractor may evoke the same answer or different answers) were increased for older participants and depended on the target's position. Results suggest that the ability to intentionally switch auditory attention to a new cued location was unimpaired, whereas it was generally harder for older participants to suppress processing of the distractor's speech. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Developing Appreciation for Sarcasm and Sarcastic Gossip: It Depends on Perspective

    ERIC Educational Resources Information Center

    Glenwright, Melanie; Tapley, Brent; Rano, Jacqueline K. S.; Pexman, Penny M.

    2017-01-01

    Background: Speakers use sarcasm to criticize others and to be funny; the indirectness of sarcasm protects the addressee's face (Brown & Levinson, 1987). Thus, appreciation of sarcasm depends on the ability to consider perspectives. Purpose: We investigated development of this ability from late childhood into adulthood and examined effects of…

  11. Absolute Interrogative Intonation Patterns in Buenos Aires Spanish

    ERIC Educational Resources Information Center

    Lee, Su Ar

    2010-01-01

    In Spanish, each uttered phrase, depending on its use, has one of a variety of intonation patterns. For example, a phrase such as "María viene mañana" ("Mary is coming tomorrow") can be used as a declarative or as an absolute interrogative (a yes/no question) depending on the intonation pattern that a speaker produces. …

  12. Intonation as an encoder of speaker certainty: information and confirmation yes-no questions in Catalan.

    PubMed

    Vanrell, Maria del Mar; Mascaró, Ignasi; Torres-Tamarit, Francesc; Prieto, Pilar

    2013-06-01

    Recent studies in the field of intonational phonology have shown that information-seeking questions can be distinguished from confirmation-seeking questions by prosodic means in a variety of languages (Armstrong, 2010, for Puerto Rican Spanish; Grice & Savino, 1997, for Bari Italian; Kügler, 2003, for Leipzig German; Mata & Santos, 2010, for European Portuguese; Vanrell, Mascaró, Prieto, & Torres-Tamarit, 2010, for Catalan). However, all these studies have relied on production experiments and little is known about the perceptual relevance of these intonational cues. This paper explores whether Majorcan Catalan listeners distinguish information- and confirmation-seeking questions by means of two distinct nuclear falling pitch accents. Three behavioral tasks were conducted with 20 Majorcan Catalan subjects, namely a semantic congruity test, a rating test, and a classical categorical perception identification/discrimination test. The results show that a difference in pitch scaling on the leading H tone of the H+L* nuclear pitch accent is the main cue used by Majorcan Catalan listeners to distinguish confirmation questions from information-seeking questions. Thus, while a ¡H+L* pitch accent signals an information-seeking question (i.e., the speaker has no expectation about the nature of the answer), the H+L* pitch accent indicates that the speaker is asking about mutually shared information. We argue that these results have implications for representing the distinctions of tonal height in Catalan. The results also support the claim that phonological contrasts in intonation, together with other linguistic strategies, can signal the speakers' beliefs about the certainty of the proposition expressed.

  13. Investigating Executive Working Memory and Phonological Short-Term Memory in Relation to Fluency and Self-Repair Behavior in L2 Speech.

    PubMed

    Georgiadou, Effrosyni; Roehr-Brackin, Karen

    2017-08-01

    This paper reports the findings of a study investigating the relationship of executive working memory (WM) and phonological short-term memory (PSTM) to fluency and self-repair behavior during an unrehearsed oral task performed by second language (L2) speakers of English at two levels of proficiency, elementary and lower intermediate. Correlational analyses revealed a negative relationship between executive WM and number of pauses in the lower intermediate L2 speakers. However, no reliable association was found in our sample between executive WM or PSTM and self-repair behavior in terms of either frequency or type of self-repair. Taken together, our findings suggest that while executive WM may enhance performance at the conceptualization and formulation stages of the speech production process, self-repair behavior in L2 speakers may depend on factors other than working memory.

  14. Perceptual Learning of Time-Compressed Speech: More than Rapid Adaptation

    PubMed Central

    Banai, Karen; Lavner, Yizhar

    2012-01-01

    Background Time-compressed speech, a form of rapidly presented speech, is harder to comprehend than natural speech, especially for non-native speakers. Although it is possible to adapt to time-compressed speech after a brief exposure, it is not known whether additional perceptual learning occurs with further practice. Here, we ask whether multiday training on time-compressed speech yields more learning than that observed during the initial adaptation phase and whether the pattern of generalization following successful learning is different than that observed with initial adaptation only. Methodology/Principal Findings Two groups of non-native Hebrew speakers were tested on five different conditions of time-compressed speech identification in two assessments conducted 10–14 days apart. Between those assessments, one group of listeners received five practice sessions on one of the time-compressed conditions. Between the two assessments, trained listeners improved significantly more than untrained listeners on the trained condition. Furthermore, the trained group generalized its learning to two untrained conditions in which different talkers presented the trained speech materials. In addition, when the performance of the non-native speakers was compared to that of a group of naïve native Hebrew speakers, performance of the trained group was equivalent to that of the native speakers on all conditions on which learning occurred, whereas performance of the untrained non-native listeners was substantially poorer. Conclusions/Significance Multiday training on time-compressed speech results in significantly more perceptual learning than brief adaptation. Compared to previous studies of adaptation, the training induced learning is more stimulus specific. Taken together, the perceptual learning of time-compressed speech appears to progress from an initial, rapid adaptation phase to a subsequent prolonged and more stimulus specific phase. These findings are consistent with the predictions of the Reverse Hierarchy Theory of perceptual learning and suggest constraints on the use of perceptual-learning regimens during second language acquisition. PMID:23056592

  15. Analysis and Classification of Voice Pathologies Using Glottal Signal Parameters.

    PubMed

    Forero M, Leonardo A; Kohler, Manoela; Vellasco, Marley M B R; Cataldo, Edson

    2016-09-01

    The classification of voice diseases has many applications in health care, in the treatment of disease, and in the design of new medical equipment to help doctors diagnose pathologies related to the voice. This work uses the parameters of the glottal signal to help identify two types of voice disorders related to pathologies of the vocal folds: nodule and unilateral paralysis. The parameters of the glottal signal are obtained through a known inverse filtering method, and they are used as inputs to an Artificial Neural Network, a Support Vector Machine, and a Hidden Markov Model in order to classify the voice signals into three different groups, and to compare the results: speakers with a nodule on the vocal folds; speakers with unilateral paralysis of the vocal folds; and speakers with normal voices, that is, without nodule or unilateral paralysis of the vocal folds. The database is composed of 248 voice recordings (signals of vowel production) containing samples corresponding to the three groups mentioned. This study used a larger database than comparable studies, and its classification rate is superior, reaching 97.2%. Copyright © 2016 The Voice Foundation. Published by Elsevier Inc. All rights reserved.
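
    The SVM variant of this three-way classification is straightforward to sketch. The example below uses hypothetical stand-in features (the paper's glottal parameters would replace the random matrix) and scikit-learn.

```python
# Three-way voice-pathology classification from glottal parameters (stand-in data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(248, 6))      # 248 recordings x 6 glottal parameters (toy)
y = rng.integers(0, 3, size=248)   # 0 = normal, 1 = nodule, 2 = unilateral paralysis

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```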

  16. Integrated Robust Open-Set Speaker Identification System (IROSIS)

    DTIC Science & Technology

    2012-05-01

    [Only extraction fragments of this report are available: a list-of-tables entry ("Table 1. Detail of NIST Data Used for Training and Testing"), a note that the four train/test scenarios are referred to as VB-YB, VL-YL, VB-YL, and VL-YB, and a passage stating that M is the UBM supervector and that the difference between L(m) and Q(M, m) is the Kullback-Leibler divergence between the "alignment" terms.]

  17. Speed-difficulty trade-off in speech: Chinese versus English

    PubMed Central

    Sun, Yao; Latash, Elizaveta M.; Mikaelian, Irina L.

    2011-01-01

    This study continues the investigation of the previously described speed-difficulty trade-off in picture description tasks. In particular, we tested the hypothesis that Mandarin Chinese and American English are similar in showing logarithmic dependences between speech time and index of difficulty (ID), while differing significantly in the time needed to describe simple pictures; that this difference increases for more complex pictures; and that it is associated with a proportional difference in the number of syllables used. Subjects (eight Chinese speakers and eight English speakers) were tested in pairs. One subject (the Speaker) described simple pictures, while the other subject (the Performer) tried to reproduce the pictures based on the verbal description as quickly as possible with a set of objects. The Chinese speakers initiated speech production significantly faster than the English speakers. Speech time scaled linearly with ln(ID) in all subjects, but the regression coefficient was significantly higher in the English speakers as compared with the Chinese speakers. The number of errors was somewhat lower in the Chinese participants (not significantly). The Chinese pairs also showed a shorter delay between the initiation of speech and the initiation of action by the Performer, shorter movement time by the Performer, and shorter overall performance time. The number of syllables scaled with ID, and the Chinese speakers used significantly fewer syllables. Speech rate was comparable between the two groups, about 3 syllables/s; it dropped for more complex pictures (higher ID). When asked to reproduce the same pictures without speaking, movement time scaled linearly with ln(ID); the Chinese performers were slower than the English performers. We conclude that natural languages show a speed-difficulty trade-off similar to Fitts' law; the trade-offs in movement and speech production are likely to originate at a cognitive level. The time advantage of the Chinese participants originates not from similarity between the simple pictures and written Chinese characters, nor from sloppier performance; it is linked to using fewer syllables to transmit the same information. We suggest that natural languages may differ in informational density, defined as the amount of information transmitted by a given number of syllables. PMID:21479658
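
    The reported scaling law is a two-parameter fit. The sketch below (made-up numbers) shows the regression form used: speech time as a linear function of ln(ID).

```python
# Fit T = a + b * ln(ID) to hypothetical speech-time data.
import numpy as np

ids = np.array([2.0, 4.0, 8.0, 16.0])         # hypothetical indices of difficulty
speech_time = np.array([1.1, 1.8, 2.4, 3.2])  # hypothetical mean times (s)

b, a = np.polyfit(np.log(ids), speech_time, 1)
print(f"T = {a:.2f} + {b:.2f} * ln(ID)")
```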

  18. Words and pictures: An electrophysiological investigation of domain specific processing in native Chinese and English speakers

    PubMed Central

    Yum, Yen Na; Holcomb, Phillip J.; Grainger, Jonathan

    2011-01-01

    Comparisons of word and picture processing using Event-Related Potentials (ERPs) are contaminated by gross physical differences between the two types of stimuli. In the present study, we tackle this problem by comparing picture processing with word processing in an alphabetic and a logographic script, which are also characterized by gross physical differences. Native Mandarin Chinese speakers viewed pictures (line drawings) and Chinese characters (Experiment 1), native English speakers viewed pictures and English words (Experiment 2), and naïve Chinese readers (native English speakers) viewed pictures and Chinese characters (Experiment 3) in a semantic categorization task. The varying pattern of differences in the ERPs elicited by pictures and words across the three experiments provided evidence for (i) script-specific processing arising between 150–200 ms post-stimulus onset, (ii) domain-specific but script-independent processing arising between 200–300 ms post-stimulus onset, and (iii) processing that depended on stimulus meaningfulness in the N400 time window. The results are interpreted in terms of differences in the way visual features are mapped onto higher-level representations for pictures and words in alphabetic and logographic writing systems. PMID:21439991

  19. Sentence Recall by Children With SLI Across Two Nonmainstream Dialects of English

    PubMed Central

    McDonald, Janet L.; Seidel, Christy M.; Hegarty, Michael

    2016-01-01

    Purpose The inability to accurately recall sentences has proven to be a clinical marker of specific language impairment (SLI); this task yields moderate-to-high levels of sensitivity and specificity. However, it is not yet known if these results hold for speakers of dialects whose nonmainstream grammatical productions overlap with those that are produced at high rates by children with SLI. Method Using matched groups of 70 African American English speakers and 36 Southern White English speakers and dialect-strategic scoring, we examined children's sentence recall abilities as a function of their dialect and clinical status (SLI vs. typically developing [TD]). Results For both dialects, the SLI group earned lower sentence recall scores than the TD group with sensitivity and specificity values ranging from .80 to .94, depending on the analysis. Children with SLI, as compared with TD controls, manifested lower levels of verbatim recall, more ungrammatical recalls when the recall was not exact, and higher levels of error on targeted functional categories, especially those marking tense. Conclusion When matched groups are examined and dialect-strategic scoring is used, sentence recall yields moderate-to-high levels of diagnostic accuracy to identify SLI within speakers of nonmainstream dialects of English. PMID:26501934

  20. Birth order and mortality in two ethno-linguistic groups: Register-based evidence from Finland.

    PubMed

    Saarela, Jan; Cederström, Agneta; Rostila, Mikael

    2016-06-01

    Previous research has documented an association between birth order and suicide, although no study has examined whether it depends on the cultural context. Our aim was to study the association between birth order and cause-specific mortality in Finland, and whether it varies by ethno-linguistic affiliation. We used data from the Finnish population register, representing a 5% random sample of all Finnish speakers and a 20% random sample of Swedish speakers who lived in Finland in any year 1987-2011. For each person, there was a link to all children who were alive in 1987. In total, there were 254,059 siblings in 96,387 sibling groups, and 9797 deaths. We used Cox regressions stratified by sibling group and estimated all-cause and cause-specific mortality risks during the period 1987-2011. In line with previous research from Sweden, deaths from suicide were significantly associated with birth order. As compared with first-borns, second-borns had a suicide risk of 1.27, third-borns of 1.35, and fourth- or higher-borns of 1.72, while other causes of death did not display an evident and consistent birth-order pattern. Results for the Finnish-speaking sibling groups were almost identical to those based on both ethno-linguistic groups. In the Swedish-speaking sibling groups, there was no increase in suicide risk by birth order, but a statistically non-significant tendency towards an association with other external causes of death and deaths from cardiovascular diseases. Our findings provide evidence for an association between birth order and suicide among Finnish speakers in Finland, while no such association was found for Swedish speakers, suggesting that the birth-order effect might depend on the cultural context. Copyright © 2016 Elsevier Ltd. All rights reserved.
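
    A Cox model stratified by sibling group compares siblings only within their own family. The sketch below (hypothetical column names, simulated data) shows this form of analysis with the `lifelines` package.

```python
# Within-family (stratified) Cox regression of mortality on birth order.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
rows = []
for group in range(300):
    n_siblings = int(rng.integers(2, 5))
    for order in range(1, n_siblings + 1):
        hazard = 0.02 * 1.15 ** (order - 1)   # toy birth-order effect on mortality
        t = rng.exponential(1.0 / hazard)
        rows.append({"time": min(t, 25.0),    # administrative censoring at 25 years
                     "died": int(t < 25.0),
                     "birth_order": order,
                     "sibling_group": group})
df = pd.DataFrame(rows)

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="died", strata=["sibling_group"])
cph.print_summary()  # hazard ratio for birth_order, siblings compared within family
```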

  1. 'All the better for not seeing you': effects of communicative context on the speech of an individual with acquired communication difficulties.

    PubMed

    Bruce, Carolyn; Braidwood, Ursula; Newton, Caroline

    2013-01-01

    Evidence shows that speakers adjust their speech depending on the demands of the listener. However, it is unclear whether people with acquired communication disorders can and do make similar adaptations. This study investigated the impact of different conversational settings on the intelligibility of a speaker with acquired communication difficulties. Twenty-eight assessors listened to recordings of the speaker reading aloud 40 words and 32 sentences to a listener who was either face-to-face or unseen. The speaker's ability to convey information was measured by the accuracy of assessors' orthographic transcriptions of the words and sentences. Assessors' scores were significantly higher in the unseen condition for the single word task particularly if they had heard the face-to-face condition first. Scores for the sentence task were significantly higher in the second presentation regardless of the condition. The results from this study suggest that therapy conducted in situations where the client is not able to see their conversation partner may encourage them to perform at a higher level and increase the clarity of their speech. Readers will be able to describe: (1) the range of conversational adjustments made by speakers without communication difficulties; (2) differences between these tasks in offering contextual information to the listener; and (3) the potential for using challenging communicative situations to improve the performance of adults with communication disorders. Copyright © 2013 Elsevier Inc. All rights reserved.

  2. Ways of looking ahead: hierarchical planning in language production.

    PubMed

    Lee, Eun-Kyung; Brown-Schmidt, Sarah; Watson, Duane G

    2013-12-01

    It is generally assumed that language production proceeds incrementally, with chunks of linguistic structure planned ahead of speech. Extensive research has examined the scope of language production and suggests that the size of planned chunks varies across contexts (Ferreira & Swets, 2002; Wagner & Jescheniak, 2010). By contrast, relatively little is known about the structure of advance planning, specifically whether planning proceeds incrementally according to the surface structure of the utterance, or whether speakers plan according to the hierarchical relationships between utterance elements. In two experiments, we examine the structure and scope of lexical planning in language production using a picture description task. Analyses of speech onset times and word durations show that speakers engage in hierarchical planning such that structurally dependent lexical items are planned together and that hierarchical planning occurs for both direct and indirect dependencies. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. Opposing and following responses in sensorimotor speech control: Why responses go both ways.

    PubMed

    Franken, Matthias K; Acheson, Daniel J; McQueen, James M; Hagoort, Peter; Eisner, Frank

    2018-06-04

    When talking, speakers continuously monitor and use the auditory feedback of their own voice to control and inform speech production processes. When speakers are provided with auditory feedback that is perturbed in real time, most of them compensate for this by opposing the feedback perturbation. But some responses follow the perturbation. In the present study, we investigated whether the state of the speech production system at perturbation onset may determine what type of response (opposing or following) is made. The results suggest that whether a perturbation-related response is opposing or following depends on ongoing fluctuations of the production system: The system initially responds by doing the opposite of what it was doing. This effect and the nontrivial proportion of following responses suggest that current production models are inadequate: They need to account for why responses to unexpected sensory feedback depend on the production system's state at the time of perturbation.

  4. General contrast effects in speech perception: effect of preceding liquid on stop consonant identification.

    PubMed

    Lotto, A J; Kluender, K R

    1998-05-01

    When members of a series of synthesized stop consonants varying acoustically in F3 characteristics and varying perceptually from /da/ to /ga/ are preceded by /al/, subjects report hearing more /ga/ syllables relative to when each member is preceded by /ar/ (Mann, 1980). It has been suggested that this result demonstrates the existence of a mechanism that compensates for coarticulation via tacit knowledge of articulatory dynamics and constraints, or through perceptual recovery of vocal-tract dynamics. The present study was designed to assess the degree to which these perceptual effects are specific to qualities of human articulatory sources. In three experiments, series of consonant-vowel (CV) stimuli varying in F3-onset frequency (/da/-/ga/) were preceded by speech versions or nonspeech analogues of /al/ and /ar/. The effect of liquid identity on stop consonant labeling remained when the preceding VC was produced by a female speaker and the CV syllable was modeled after a male speaker's productions. Labeling boundaries also shifted when the CV was preceded by a sine wave glide modeled after F3 characteristics of /al/ and /ar/. Identifications shifted even when the preceding sine wave was of constant frequency equal to the offset frequency of F3 from a natural production. These results suggest an explanation in terms of general auditory processes as opposed to recovery of or knowledge of specific articulatory dynamics.

  5. Using Discursis to enhance the qualitative analysis of hospital pharmacist-patient interactions.

    PubMed

    Chevalier, Bernadette A M; Watson, Bernadette M; Barras, Michael A; Cottrell, William N; Angus, Daniel J

    2018-01-01

    Pharmacist-patient communication during medication counselling has been successfully investigated using Communication Accommodation Theory (CAT). Communication researchers in other healthcare professions have utilised Discursis software as an adjunct to their manual qualitative analysis processes. Discursis provides a visual, chronological representation of communication exchanges and identifies patterns of interactant engagement. The aim of this study was to describe how Discursis software was used to enhance previously conducted qualitative analysis of pharmacist-patient interactions (by visualising pharmacist-patient speech patterns, episodes of engagement, and identifying CAT strategies employed by pharmacists within these episodes). Visual plots from 48 transcribed audio recordings of pharmacist-patient exchanges were generated by Discursis. Representative plots were selected to show moderate-to-high and low-level speaker engagement. Episodes of engagement were examined for pharmacist application of CAT strategies (approximation, interpretability, discourse management, emotional expression, and interpersonal control). Discursis plots allowed for identification of distinct patterns occurring within pharmacist-patient exchanges. Moderate-to-high pharmacist-patient engagement was characterised by multiple off-diagonal squares, while alternating single-coloured squares depicted low engagement. Engagement episodes were associated with multiple CAT strategies such as discourse management (open-ended questions). Patterns reflecting pharmacist or patient speaker dominance were dependent on the clinical setting. Discursis analysis of pharmacist-patient interactions, a novel application of the technology in health communication, was found to be an effective visualisation tool to pinpoint episodes for CAT analysis. Discursis has numerous practical and theoretical applications for future health communication research and training. Researchers can use the software to support qualitative analysis, where large data sets can be quickly reviewed to identify key areas for concentrated analysis. Because Discursis plots are easily generated from audio-recorded transcripts, they are well suited as teaching tools for both students and practitioners to assess and develop their communication skills.
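
    A Discursis-style plot can be approximated with a simple recurrence matrix over turns. The sketch below (a toy stand-in, not the Discursis product) shades each cell by word overlap between two turns, so engagement shows up as off-diagonal structure.

```python
# Toy conceptual-recurrence plot: turn-by-turn word overlap, shaded as a matrix.
import matplotlib.pyplot as plt
import numpy as np

turns = [
    ("pharmacist", "any side effects from the new tablets"),
    ("patient", "some dizziness since starting the tablets"),
    ("pharmacist", "dizziness can settle if you take the tablets with food"),
    ("patient", "I will take the tablets with food then"),
]

def overlap(a, b):
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(1, len(wa | wb))   # Jaccard overlap of word sets

n = len(turns)
m = np.array([[overlap(turns[i][1], turns[j][1]) for j in range(n)]
              for i in range(n)])

plt.imshow(m, cmap="Blues")
plt.xticks(range(n), [speaker for speaker, _ in turns], rotation=45)
plt.yticks(range(n), [speaker for speaker, _ in turns])
plt.title("Turn-by-turn recurrence (toy example)")
plt.tight_layout()
plt.show()
```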

  6. The Voice of Emotion: Acoustic Properties of Six Emotional Expressions.

    NASA Astrophysics Data System (ADS)

    Baldwin, Carol May

    Studies in the perceptual identification of emotional states suggested that listeners seemed to depend on a limited set of vocal cues to distinguish among emotions. Linguistics and speech science literatures have indicated that this small set of cues included intensity, fundamental frequency, and temporal properties such as speech rate and duration. Little research has been done, however, to validate these cues in the production of emotional speech, or to determine if specific dimensions of each cue are associated with the production of a particular emotion for a variety of speakers. This study addressed deficiencies in understanding of the acoustical properties of duration and intensity as components of emotional speech by means of speech science instrumentation. Acoustic data were conveyed in a brief sentence spoken by twelve English speaking adult male and female subjects, half with dramatic training, and half without such training. Simulated expressions included: happiness, surprise, sadness, fear, anger, and disgust. The study demonstrated that the acoustic property of mean intensity served as an important cue for a vocal taxonomy. Overall duration was rejected as an element for a general taxonomy due to interactions involving gender and role. Findings suggested a gender-related taxonomy, however, based on differences in the ways in which men and women use the duration cue in their emotional expressions. Results also indicated that speaker training may influence greater use of the duration cue in expressions of emotion, particularly for male actors. Discussion of these results provided linkages to (1) practical management of emotional interactions in clinical and interpersonal environments, (2) implications for differences in the ways in which males and females may be socialized to express emotions, and (3) guidelines for future perceptual studies of emotional sensitivity.
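
    The cue families examined here are all measurable with standard tools. The sketch below (not the study's instrumentation; "utterance.wav" is a hypothetical file) extracts duration, mean intensity, and mean F0 from a recorded sentence using `librosa`.

```python
# Extract duration, mean intensity (dB re full scale), and mean F0 from audio.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)

duration_s = len(y) / sr
rms = librosa.feature.rms(y=y)[0]
mean_intensity_db = 20 * np.log10(np.mean(rms) + 1e-10)

f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr)
mean_f0_hz = np.nanmean(f0)  # nanmean skips unvoiced frames

print(f"{duration_s:.2f} s, {mean_intensity_db:.1f} dB, {mean_f0_hz:.0f} Hz")
```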

  7. 76 FR 53525 - Final Environmental Impact Statement for the Proposed Keystone XL Project; Public Meetings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-08-26

    ..., depending on the number of people who sign up to speak. Speakers will be asked to state their name and any organization with which they are affiliated. Depending on attendance, it may not be possible for all those who... take into account a wide range of factors, including environmental, economic, energy security, foreign...

  8. Structural impact and crashworthiness. Volume 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davies, G.A.O.

    1984-01-01

    This volume contains the keynote addresses of those speakers invited to the International Conference on Structural Impact and Crashworthiness held at Imperial College, London, in 1984. The speakers represent authoritative views on topics covering the spectrum of impact and crashworthiness involving several materials. The theme of this book may be summarized as 'understanding/modelling/prediction.' Ultimately a crashworthy design depends on many conceptual decisions being correct in the initial design phase. The overall configuration of a structure may be paramount; the detail design of joints and so on has to enable the structure to exploit energy absorption; and the fail-safe features must not be prohibitively expensive.

  9. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, J.F.; Ng, L.C.

    1998-03-17

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.
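
    The per-frame deconvolution step described here can be sketched as a regularised spectral division: with excitation x[n] and acoustic output y[n], the frame's transfer function is H(f) = Y(f)/X(f). Everything below — frame length, the synthetic impulse excitation, the toy vocal-tract filter, and the regularisation constant — is an illustrative assumption, not the patented procedure.

```python
# Sketch of per-frame deconvolution: estimate the vocal-tract transfer
# function H(f) = Y(f)/X(f) from an excitation estimate and the acoustic
# output, with a small epsilon to stabilise near-zero spectral values.
import numpy as np
from scipy.signal import lfilter

def frame_transfer_function(excitation, acoustic, eps=1e-8):
    w = np.hanning(len(excitation))                  # standard short-time window
    X = np.fft.rfft(excitation * w)
    Y = np.fft.rfft(acoustic * w)
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)   # regularised division

# One 10-ms frame at 8 kHz with an idealised glottal impulse mid-frame.
fs = 8000
excitation = np.zeros(80)
excitation[40] = 1.0
acoustic = lfilter([1.0], [1.0, -0.9], excitation)   # toy vocal-tract filter
H = frame_transfer_function(excitation, acoustic)    # |H| largest at low freqs
```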

  10. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOEpatents

    Holzrichter, John F.; Ng, Lawrence C.

    1998-01-01

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching.

  11. Influence of encoding focus and stereotypes on source monitoring event-related-potentials.

    PubMed

    Leynes, P Andrew; Nagovsky, Irina

    2016-01-01

    Source memory, memory for the origin of a memory, can be influenced by stereotypes and by the information focused on during encoding. Participants studied words from two different speakers (male or female) using self-focus or other-focus encoding. Source judgments for the speaker's voice and Event-Related Potentials (ERPs) were recorded during test. Self-focus encoding increased dependence on stereotype information and the Late Posterior Negativity (LPN). The results link the LPN with an increase in systematic decision processes, such as consulting prior knowledge to support an episodic memory judgment. In addition, other-focus encoding increased conditional source judgments and resulted in weaker old/new recognition relative to self-focus encoding. The putative correlate of recollection (LPC) was absent in this condition, which was taken as evidence that recollection of partial information supported source judgments. Collectively, the results suggest that other-focus encoding changes source monitoring processing by altering the weight of specific memory features. Copyright © 2015 Elsevier B.V. All rights reserved.

  12. The Impact of Dysphonic Voices on Healthy Listeners: Listener Reaction Times, Speech Intelligibility, and Listener Comprehension.

    PubMed

    Evitts, Paul M; Starmer, Heather; Teets, Kristine; Montgomery, Christen; Calhoun, Lauren; Schulze, Allison; MacKenzie, Jenna; Adams, Lauren

    2016-11-01

    There is currently minimal information on the impact of dysphonia secondary to phonotrauma on listeners. Considering the high incidence of voice disorders among professional voice users, it is important to understand the impact of a dysphonic voice on their audiences. Ninety-one healthy listeners (39 men, 52 women; mean age = 23.62 years) were presented with speech stimuli from 5 healthy speakers and 5 speakers diagnosed with dysphonia secondary to phonotrauma. Dependent variables included processing speed (reaction time [RT] ratio), speech intelligibility, and listener comprehension. Voice quality ratings were also obtained for all speakers by 3 expert listeners. Statistical results showed significant differences in RT ratio and in the number of speech intelligibility errors between healthy and dysphonic voices. There was no significant difference in listener comprehension errors. Multiple regression analyses showed that voice quality ratings from the Consensus Auditory-Perceptual Evaluation of Voice (Kempster, Gerratt, Verdolini Abbott, Barkmeier-Kraemer, & Hillman, 2009) were able to predict RT ratio and speech intelligibility but not listener comprehension. Results of the study suggest that although listeners require more time to process and have more intelligibility errors when presented with speech stimuli from speakers with dysphonia secondary to phonotrauma, listener comprehension may not be affected.

  13. Prosody and informativity: A cross-linguistic investigation

    NASA Astrophysics Data System (ADS)

    Ouyang, Iris Chuoying

    This dissertation aims to extend our knowledge of prosody -- in particular, what kinds of information may be conveyed through prosody, which prosodic dimensions may be used to convey them, and how individual speakers differ from one another in how they use prosody. Four production studies were conducted to examine how various factors interact with one another in shaping the prosody of an utterance and how prosody fulfills its multi-functional role. Experiment 1 explores the interaction between two types of informativity, namely information structure and information-theoretic properties. The results show that the prosodic consequences of new-information focus are modulated by the focused word's frequency, whereas the prosodic consequences of corrective focus are modulated by the focused word's probability in the context. Furthermore, f0 ranges appear to be more informative than f0 shapes in reflecting informativity across speakers. Specifically, speakers seem to have individual 'preferences' regarding f0 shapes, the f0 ranges they use for an utterance, and the magnitude of differences in f0 ranges by which they mark information-structural distinctions. In contrast, there is more cross-speaker validity in the actual directions of differences in f0 ranges between information-structural types. Experiments 2 and 3 further show that the interaction found between corrective focus and contextual probability depends on the interlocutor's knowledge state. When the interlocutor has no access to the crucial information concerning utterances' contextual probability, speakers prosodically emphasize contextually improbable corrections, but not contextually probable corrections. Furthermore, speakers prosodically emphasize the corrections in response to contextually probable misstatements, but not the corrections in response to contextually improbable misstatements. In contrast, completely opposite patterns are found when words' contextual probability is shared knowledge between the speaker and the interlocutor: speakers prosodically emphasize contextually probable corrections and the corrections in response to contextually improbable misstatements. Experiment 4 demonstrates the multi-functionality of prosody by investigating its discourse-level functions in Mandarin Chinese, a tone language where a word's prosodic pattern is crucial to its meaning. The results show that, although prosody serves fundamental, lexical-level functions in Mandarin Chinese, it nevertheless provides cues to information structure as well. Similar to what has been found with English, corrective information is prosodically more prominent than non-corrective information, and new information is prosodically more prominent than given information. Taken together, these experiments demonstrate the complex relationship between prosody and the different types of information it encodes in a given language. To better understand prosody, it is important to integrate insights from different traditions of research and to investigate across languages. In addition, the findings of this research suggest that speakers' assumptions about what their interlocutors know -- as well as speakers' ability to update these expectations -- play a key role in shaping the prosody of utterances. I hypothesize that prosodic prominence may reflect the gap between what speakers had expected their interlocutors to say and what their interlocutors have actually said.
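
    Since f0 range is the dimension this dissertation found most informative across speakers, a minimal sketch of how such ranges might be extracted per utterance is given below; the pyin pitch tracker, the percentile bounds, and the file name are assumptions for illustration, not the dissertation's pipeline.

```python
# Sketch: extracting an utterance's f0 range with librosa's pyin tracker.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)   # hypothetical recording
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C6"), sr=sr)
f0 = f0[voiced & ~np.isnan(f0)]                  # keep voiced frames only

# A robust range (10th-90th percentile) limits octave-error outliers.
lo, hi = np.percentile(f0, [10, 90])
print(f"f0 range: {lo:.0f}-{hi:.0f} Hz ({hi - lo:.0f} Hz span)")
```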

  14. Training Japanese listeners to identify English /r/ and /l/: A first report

    PubMed Central

    Logan, John S.; Lively, Scott E.; Pisoni, David B.

    2012-01-01

    Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest–posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language. PMID:2016438

  15. A Computational Wireless Network Backplane: Performance in a Distributed Speaker Identification Application Postprint

    DTIC Science & Technology

    2008-12-01

    Only report-documentation-page fragments were extracted for this record; no abstract text survives. Recoverable details: authors H. T. Kung, Chit-Kwan Lin, Chia-Yung Su, Dario Vlah (Harvard University), and John Grieco, Mark Huggins, Bruce Suter (Air Force Research Laboratory); project identifier WCNA; the report concerns a distributed speaker identification application evaluated on a wireless testbed using C7 processors.

  16. Knowledge of Connectors as Cohesion in Text: A Comparative Study of Native English and ESL (English as a Second Language) Speakers

    DTIC Science & Technology

    1989-08-18

    Only report-documentation-page fragments were extracted for this record; no abstract text survives. Recoverable details: approved for public release, distribution unlimited; performing organization University of California; procurement instrument identification number N00014-85-K0562.

  17. Using Nonword Repetition Tasks for the Identification of Language Impairment in Spanish-English Speaking Children: Does the Language of Assessment Matter?

    PubMed Central

    Gutiérrez-Clellen, Vera F.; Simon-Cereijido, Gabriela

    2012-01-01

    The purpose of this study was twofold: (a) to evaluate the clinical utility of a verbal working memory measure, specifically, a nonword repetition task, with a sample of Spanish-English bilingual children and (b) to determine the extent to which individual differences in relative language skills and language use had an effect on the clinical differentiation of these children by the measures. A total of 144 Latino children (95 children with typical language development and 49 children with language impairment) were tested using nonword lists developed for each language. The results show that the clinical accuracy of nonword repetition tasks varies depending on the language(s) tested. Test performance appeared related to individual differences in language use and exposure. The findings do not support a monolingual approach to the assessment of bilingual children with nonword repetition tasks, even if children appear to be fluent speakers of the language of testing. Nonword repetition may assist in the screening of Latino children if used bilingually and in combination with other clinical measures. PMID:22707854

  18. A positron emission tomography study of the neural basis of informational and energetic masking effects in speech perception

    NASA Astrophysics Data System (ADS)

    Scott, Sophie K.; Rosen, Stuart; Wickham, Lindsay; Wise, Richard J. S.

    2004-02-01

    Positron emission tomography (PET) was used to investigate the neural basis of the comprehension of speech in unmodulated noise ("energetic" masking, dominated by effects at the auditory periphery), and when presented with another speaker ("informational" masking, dominated by more central effects). Each type of signal was presented at four different signal-to-noise ratios (SNRs) (+3, 0, -3, -6 dB for the speech-in-speech, +6, +3, 0, -3 dB for the speech-in-noise), with listeners instructed to listen for meaning to the target speaker. Consistent with behavioral studies, there was SNR-dependent activation associated with the comprehension of speech in noise, with no SNR-dependent activity for the comprehension of speech-in-speech (at low or negative SNRs). There was, in addition, activation in bilateral superior temporal gyri which was associated with the informational masking condition. The extent to which this activation of classical "speech" areas of the temporal lobes might delineate the neural basis of the informational masking is considered, as is the relationship of these findings to the interfering effects of unattended speech and sound on more explicit working memory tasks. This study is a novel demonstration of candidate neural systems involved in the perception of speech in noisy environments, and of the processing of multiple speakers in the dorso-lateral temporal lobes.

  19. Linguistic Stereotyping in Older Adults' Perceptions of Health Care Aides.

    PubMed

    Rubin, Donald; Coles, Valerie Berenice; Barnett, Joshua Trey

    2016-07-01

    The cultural and linguistic diversity of the U.S. health care provider workforce is expanding. Diversity among health care personnel such as paraprofessional health care assistants (HCAs)-many of whom are immigrants-means that intimate, high-stakes cross-cultural and cross-linguistic contact characterizes many health interactions. In particular, nonmainstream HCAs may face negative patient expectations because of patients' language stereotypes. In other contexts, reverse linguistic stereotyping has been shown to result in negative speaker evaluations and even reduced listening comprehension quite independently of the actual language performance of the speaker. The present study extends the language and attitude paradigm to older adults' perceptions of HCAs. Listeners heard the identical speaker of Standard American English as they watched interactions between an HCA and an older patient. Ethnolinguistic identities-either an Anglo native speaker of English or a Mexican nonnative speaker-were ascribed to HCAs by means of fabricated personnel files. Dependent variables included measures of perceived HCA language proficiency, personal characteristics, and professional competence, as well as listeners' comprehension of a health message delivered by the putative HCA. For most of these outcomes, moderate effect sizes were found such that the HCA with an ascribed Anglo identity-relative to the Mexican guise-was judged more proficient in English, socially superior, interpersonally more attractive, more dynamic, and a more satisfactory home health aide. No difference in listening comprehension emerged, but the Anglo guise tended to engender a more compliant listening mind set. Results of this study can inform both provider-directed and patient-directed efforts to improve health care services for members of all linguistic and cultural groups.

  20. Oral-diadochokinesis rates across languages: English and Hebrew norms.

    PubMed

    Icht, Michal; Ben-David, Boaz M

    2014-01-01

    Oro-facial and speech motor control disorders represent a variety of speech and language pathologies. Early identification of such problems is important and carries clinical implications. A common and simple tool for gauging the presence and severity of speech motor control impairments is oral-diadochokinesis (oral-DDK). Surprisingly, norms for adult performance are missing from the literature. The goals of this study were: (1) to establish a norm for oral-DDK rate for (young to middle-age) adult English speakers, by collecting data from the literature (five studies, N = 141); (2) to investigate the possible effect of language (and culture) on oral-DDK performance, by analyzing studies conducted in other languages (five studies, N = 140), alongside the English norm; and (3) to find a new norm for adult Hebrew speakers, by testing 115 speakers. We first offer an English norm with a mean of 6.2 syllables/s (SD = 0.8), and a lower boundary of 5.4 syllables/s that can be used to indicate possible abnormality. Next, we found significant differences between the four tested languages (English, Portuguese, Farsi, and Greek) in oral-DDK rates. Results suggest the need to set language- and culture-sensitive norms for the application of the oral-DDK task worldwide. Finally, we found the oral-DDK performance of adult Hebrew speakers to be 6.4 syllables/s (SD = 0.8), not significantly different from the English norm. This implies possible phonological similarities between English and Hebrew. We further note that no gender effects were found in our study. We recommend using oral-DDK as an important tool in the speech-language pathologist's arsenal. Yet, application of this task should be done carefully, comparing individual performance to a norm set within the specific language. Readers will be able to: (1) describe the speech-language pathology assessment process using the oral-DDK task, comparing an individual's performance to the present English norm; (2) describe the impact of language on oral-DDK performance; and (3) appropriately assess Hebrew-speaking patients using this tool. Copyright © 2014 Elsevier Inc. All rights reserved.
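
    The reported norms support a simple screening rule: flag rates more than one standard deviation below the language mean (i.e., below 5.4 syllables/s for English). A minimal sketch follows; the function name and the hard-coded norm table are illustrative conveniences, not a clinical protocol.

```python
# Sketch: screening an oral-DDK rate against the norms reported here
# (English: mean 6.2 syll/s, SD 0.8; Hebrew: mean 6.4 syll/s, SD 0.8).
NORMS = {"English": (6.2, 0.8), "Hebrew": (6.4, 0.8)}   # mean, SD (syll/s)

def screen_ddk(rate_syll_per_s, language):
    mean, sd = NORMS[language]
    z = (rate_syll_per_s - mean) / sd
    flagged = rate_syll_per_s < mean - sd   # e.g. < 5.4 syll/s for English
    return z, flagged

z, flagged = screen_ddk(5.1, "English")
print(f"z = {z:.2f}, possible abnormality: {flagged}")
```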

  1. Using perturbed handwriting to support writer identification in the presence of severe data constraints

    NASA Astrophysics Data System (ADS)

    Chen, Jin; Cheng, Wen; Lopresti, Daniel

    2011-01-01

    Since real data is time-consuming and expensive to collect and label, researchers have proposed approaches using synthetic variations for the tasks of signature verification, speaker authentication, handwriting recognition, keyword spotting, etc. The limitation of real data is particularly critical in writer identification because, in forensics, adversaries cannot be expected to provide sufficient data to train a classifier. Therefore, it is unrealistic to always assume sufficient real data to train classifiers extensively for writer identification. In addition, this field differs from many others in that we strive to preserve as much inter-writer variation as possible, yet model-perturbed handwriting might destroy such discriminability among writers. Building on work described in another paper, in which human subjects were involved in calibrating realistic-looking transformations, we measured the effects of incorporating perturbed handwriting into the training dataset. Experimental results supported our hypothesis that, with limited real data, model-perturbed handwriting improves the performance of writer identification. In particular, when only a single sample per writer was available, incorporating perturbed data achieved a 36x performance gain.
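
    A minimal sketch of the perturbation idea — generating slightly transformed copies of a writer's sample to enlarge a constrained training set — is given below. The rotation and shear ranges are illustrative guesses, not the human-calibrated transformations of the cited work, and the input image is a blank stand-in.

```python
# Sketch: enlarging a writer-identification training set with small
# affine perturbations that plausibly preserve writer style.
import numpy as np
from scipy.ndimage import affine_transform

def perturb(image, rng):
    angle = rng.uniform(-3, 3) * np.pi / 180      # small rotation (degrees)
    shear = rng.uniform(-0.05, 0.05)              # slight shear
    A = np.array([[np.cos(angle), -np.sin(angle) + shear],
                  [np.sin(angle),  np.cos(angle)]])
    centre = np.array(image.shape) / 2
    offset = centre - A @ centre                  # rotate about the centre
    return affine_transform(image, A, offset=offset, order=1)

rng = np.random.default_rng(0)
sample = np.zeros((64, 256))                      # stand-in handwriting image
augmented = [perturb(sample, rng) for _ in range(10)]
```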

  2. Speech coding, reconstruction and recognition using acoustics and electromagnetic waves

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.; Ng, L.C.

    The use of EM radiation in conjunction with simultaneously recorded acoustic speech information enables a complete mathematical coding of acoustic speech. The methods include the forming of a feature vector for each pitch period of voiced speech and the forming of feature vectors for each time frame of unvoiced, as well as for combined voiced and unvoiced speech. The methods include how to deconvolve the speech excitation function from the acoustic speech output to describe the transfer function for each time frame. The formation of feature vectors defining all acoustic speech units over well defined time frames can be used for purposes of speech coding, speech compression, speaker identification, language-of-speech identification, speech recognition, speech synthesis, speech translation, speech telephony, and speech teaching. 35 figs.

  3. Blue-green color categorization in Mandarin-English speakers.

    PubMed

    Wuerger, Sophie; Xiao, Kaida; Mylonas, Dimitris; Huang, Qingmei; Karatzas, Dimosthenis; Hird, Emily; Paramei, Galina

    2012-02-01

    Observers are faster to detect a target among a set of distracters if the targets and distracters come from different color categories. This cross-boundary advantage seems to be limited to the right visual field, which is consistent with the dominance of the left hemisphere for language processing [Gilbert et al., Proc. Natl. Acad. Sci. USA 103, 489 (2006)]. Here we study whether a similar visual field advantage is found in the color identification task in speakers of Mandarin, a language that uses a logographic system. Forty late Mandarin-English bilinguals performed a blue-green color categorization task, in a blocked design, in their first language (L1: Mandarin) or second language (L2: English). Eleven color singletons ranging from blue to green were presented for 160 ms, randomly in the left visual field (LVF) or right visual field (RVF). Color boundary and reaction times (RTs) at the color boundary were estimated in L1 and L2, for both visual fields. We found that the color boundary did not differ between the languages; RTs at the color boundary, however, were on average more than 100 ms shorter in the English compared to the Mandarin sessions, but only when the stimuli were presented in the RVF. The finding may be explained by the script nature of the two languages: Mandarin logographic characters are analyzed visuospatially in the right hemisphere, which conceivably facilitates identification of color presented to the LVF. © 2012 Optical Society of America

  4. Speech endpoint detection with non-language speech sounds for generic speech processing applications

    NASA Astrophysics Data System (ADS)

    McClain, Matthew; Romanowski, Brian

    2009-05-01

    Non-language speech sounds (NLSS) are sounds produced by humans that do not carry linguistic information. Examples of these sounds are coughs, clicks, breaths, and filled pauses such as "uh" and "um" in English. NLSS are prominent in conversational speech, but can be a significant source of errors in speech processing applications. Traditionally, these sounds are ignored by speech endpoint detection algorithms, where speech regions are identified in the audio signal prior to processing. The ability to filter NLSS as a pre-processing step can significantly enhance the performance of many speech processing applications, such as speaker identification, language identification, and automatic speech recognition. In order to be used in all such applications, NLSS detection must be performed without the use of language models that provide knowledge of the phonology and lexical structure of speech. This is especially relevant to situations where the languages used in the audio are not known a priori. We present the results of preliminary experiments using data from American and British English speakers, in which segments of audio are classified as language speech sounds (LSS) or NLSS using a set of acoustic features designed for language-agnostic NLSS detection and a hidden Markov model (HMM) to model speech generation. The results of these experiments indicate that the features and model used are capable of detecting certain types of NLSS, such as breaths and clicks, while detection of other types of NLSS, such as filled pauses, will require future research.
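
    A minimal sketch of the likelihood-based classification scheme described here — one generative model per class, with the label decided by which model scores a segment higher — using hmmlearn. The random feature frames are stand-ins for the paper's acoustic features, and the model sizes are illustrative.

```python
# Sketch: LSS vs. NLSS segment classification with one Gaussian HMM per
# class; an unknown segment is labelled by the higher log-likelihood.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
lss_frames = rng.normal(0.0, 1.0, (500, 13))    # stand-in MFCC-like frames
nlss_frames = rng.normal(0.8, 1.2, (500, 13))

lss_hmm = GaussianHMM(n_components=3, covariance_type="diag").fit(lss_frames)
nlss_hmm = GaussianHMM(n_components=3, covariance_type="diag").fit(nlss_frames)

segment = rng.normal(0.8, 1.2, (40, 13))        # unknown audio segment
label = "NLSS" if nlss_hmm.score(segment) > lss_hmm.score(segment) else "LSS"
print(label)
```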

  5. [The contribution of different cochlear insertion region to Mandarin speech perception in users of cochlear implant].

    PubMed

    Qi, Beier; Liu, Bo; Liu, Sha; Liu, Haihong; Dong, Ruijuan; Zhang, Ning; Gong, Shusheng

    2011-05-01

    To study the effect of cochlear electrode coverage and of different insertion regions on speech recognition, especially tone perception, in cochlear implant (CI) users whose native language is Mandarin Chinese. Seven test conditions were set using the fitting software; each condition was created by switching respective channels on or off to simulate different insertion positions. Mandarin-speaking CI users then completed four speech tests: a Vowel Identification test, a Consonant Identification test, a Tone Identification test (male speaker), and the Mandarin HINT test (SRS) in quiet and in noise. Across test conditions, the average vowel identification score differed significantly, ranging from 56% to 91% (rank sum test, P < 0.05). The average consonant identification score also differed significantly, from 72% to 85% (ANOVA, P < 0.05). The average tone identification score did not differ significantly (ANOVA, P > 0.05); however, the more channels were activated, the higher the scores obtained, from 68% to 81%. This study shows a correlation between insertion depth and speech recognition. Because all parts of the basilar membrane can help CI users improve their speech recognition ability, it is important to enhance the verbal communication and social interaction abilities of CI users by increasing insertion depth and actively stimulating the apical region of the cochlea.

  6. Deep bottleneck features for spoken language identification.

    PubMed

    Jiang, Bing; Song, Yan; Wei, Si; Liu, Jun-Hua; McLoughlin, Ian Vince; Dai, Li-Rong

    2014-01-01

    A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for short-duration speech utterances. Under the hypothesis that language information is weak, represented only latently in speech, and largely dependent on the statistical properties of the speech content, existing representations may be insufficient. Furthermore, they may be susceptible to the variations caused by different speakers, the specific content of the speech segments, and background noise. To address this, we propose using Deep Bottleneck Features (DBF) for spoken LID, motivated by the success of Deep Neural Networks (DNN) in speech recognition. We show that DBFs can form a low-dimensional compact representation of the original inputs with a powerful descriptive and discriminative capability. To evaluate the effectiveness of this, we design two acoustic models, termed DBF-TV and parallel DBF-TV (PDBF-TV), using a DBF-based i-vector representation for each speech utterance. Results on NIST language recognition evaluation 2009 (LRE09) show significant improvements over state-of-the-art systems. By fusing the output of phonotactic and acoustic approaches, we achieve an EER of 1.08%, 1.89% and 7.01% for 30 s, 10 s and 3 s test utterances, respectively. Furthermore, various DBF configurations have been extensively evaluated, and an optimal system proposed.
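
    A minimal sketch of what bottleneck-feature extraction looks like, assuming a PyTorch MLP with a narrow middle layer: after the network is trained on a frame-classification task, the bottleneck activations serve as compact frame representations. The layer sizes and the mean-pooling stand-in for the paper's i-vector stage are illustrative, not the authors' configuration.

```python
# Sketch: a DNN with a narrow bottleneck layer; extract_dbf() returns the
# bottleneck activations used as deep bottleneck features after training.
import torch
import torch.nn as nn

class BottleneckDNN(nn.Module):
    def __init__(self, n_in=39, n_hidden=1024, n_bottleneck=43, n_out=3000):
        super().__init__()
        self.front = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid(),
                                   nn.Linear(n_hidden, n_hidden), nn.Sigmoid())
        self.bottleneck = nn.Linear(n_hidden, n_bottleneck)
        self.back = nn.Sequential(nn.Sigmoid(), nn.Linear(n_bottleneck, n_out))

    def forward(self, x):                        # training path
        return self.back(self.bottleneck(self.front(x)))

    def extract_dbf(self, x):                    # feature extraction path
        with torch.no_grad():
            return self.bottleneck(self.front(x))

model = BottleneckDNN()
frames = torch.randn(200, 39)                    # stand-in acoustic frames
dbf = model.extract_dbf(frames)                  # (200, 43) bottleneck features
utterance_vec = dbf.mean(dim=0)                  # crude utterance-level pooling
```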

  7. Domain-specific impairment of source memory following a right posterior medial temporal lobe lesion.

    PubMed

    Peters, Jan; Koch, Benno; Schwarz, Michael; Daum, Irene

    2007-01-01

    This single case analysis of memory performance in a patient with an ischemic lesion affecting posterior but not anterior right medial temporal lobe (MTL) indicates that source memory can be disrupted in a domain-specific manner. The patient showed normal recognition memory for gray-scale photos of objects (visual condition) and spoken words (auditory condition). While memory for visual source (texture/color of the background against which pictures appeared) was within the normal range, auditory source memory (male/female speaker voice) was at chance level, a performance pattern significantly different from the control group. This dissociation is consistent with recent fMRI evidence of anterior/posterior MTL dissociations depending upon the nature of source information (visual texture/color vs. auditory speaker voice). The findings are in good agreement with the view of dissociable memory processing by the perirhinal cortex (anterior MTL) and parahippocampal cortex (posterior MTL), depending upon the neocortical input that these regions receive. (c) 2007 Wiley-Liss, Inc.

  8. Beyond the language given: the neural correlates of inferring speaker meaning.

    PubMed

    Bašnáková, Jana; Weber, Kirsten; Petersson, Karl Magnus; van Berkum, Jos; Hagoort, Peter

    2014-10-01

    Even though language allows us to say exactly what we mean, we often use language to say things indirectly, in a way that depends on the specific communicative context. For example, we can use an apparently straightforward sentence like "It is hard to give a good presentation" to convey deeper meanings, like "Your talk was a mess!" One of the big puzzles in language science is how listeners work out what speakers really mean, which is a skill absolutely central to communication. However, most neuroimaging studies of language comprehension have focused on the arguably much simpler, context-independent process of understanding direct utterances. To examine the neural systems involved in getting at contextually constrained indirect meaning, we used functional magnetic resonance imaging as people listened to indirect replies in spoken dialog. Relative to direct control utterances, indirect replies engaged dorsomedial prefrontal cortex, right temporo-parietal junction and insula, as well as bilateral inferior frontal gyrus and right medial temporal gyrus. This suggests that listeners take the speaker's perspective on both cognitive (theory of mind) and affective (empathy-like) levels. In line with classic pragmatic theories, our results also indicate that currently popular "simulationist" accounts of language comprehension fail to explain how listeners understand the speaker's intended message. © The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Native Language Experience Shapes Neural Basis of Addressed and Assembled Phonologies

    PubMed Central

    Mei, Leilei; Xue, Gui; Lu, Zhong-Lin; He, Qinghua; Wei, Miao; Zhang, Mingxia; Dong, Qi; Chen, Chuansheng

    2015-01-01

    Previous studies have suggested differential engagement of addressed and assembled phonologies in reading Chinese and alphabetic languages (e.g., English) and the modulatory role of native language in learning to read a second language. However, it is not clear whether native language experience shapes the neural mechanisms of addressed and assembled phonologies. To address this question, we trained native Chinese and native English speakers to read the same artificial language (based on Korean Hangul) either through addressed (i.e., whole-word mapping) or assembled (i.e., grapheme-to-phoneme mapping) phonology. We found that, for both native Chinese and native English speakers, addressed phonology relied on the regions in the ventral pathway, whereas assembled phonology depended on the regions in the dorsal pathway. More importantly, we found that the neural mechanisms of addressed and assembled phonologies were shaped by native language experience. Specifically, two key regions for addressed phonology (i.e., the left middle temporal gyrus and right inferior temporal gyrus) showed greater activation for addressed phonology in native Chinese speakers, while one key region for assembled phonology (i.e., the left supramarginal gyrus) showed more activation for assembled phonology in native English speakers. These results provide direct neuroimaging evidence for the effect of native language experience on the neural mechanisms of phonological access in a new language and support the assimilation-accommodation hypothesis. PMID:25858447

  10. Did you or I say pretty, rude or brief? An ERP study of the effects of speaker's identity on emotional word processing.

    PubMed

    Pinheiro, Ana P; Rezaii, Neguine; Nestor, Paul G; Rauber, Andréia; Spencer, Kevin M; Niznikiewicz, Margaret

    2016-02-01

    During speech comprehension, multiple cues need to be integrated at a millisecond speed, including semantic information, as well as voice identity and affect cues. A processing advantage has been demonstrated for self-related stimuli when compared with non-self stimuli, and for emotional relative to neutral stimuli. However, very few studies investigated self-other speech discrimination and, in particular, how emotional valence and voice identity interactively modulate speech processing. In the present study we probed how the processing of words' semantic valence is modulated by speaker's identity (self vs. non-self voice). Sixteen healthy subjects listened to 420 prerecorded adjectives differing in voice identity (self vs. non-self) and semantic valence (neutral, positive and negative), while electroencephalographic data were recorded. Participants were instructed to decide whether the speech they heard was their own (self-speech condition), someone else's (non-self speech), or if they were unsure. The ERP results demonstrated interactive effects of speaker's identity and emotional valence on both early (N1, P2) and late (Late Positive Potential - LPP) processing stages: compared with non-self speech, self-speech with neutral valence elicited more negative N1 amplitude, self-speech with positive valence elicited more positive P2 amplitude, and self-speech with both positive and negative valence elicited more positive LPP. ERP differences between self and non-self speech occurred in spite of similar accuracy in the recognition of both types of stimuli. Together, these findings suggest that emotion and speaker's identity interact during speech processing, in line with observations of partially dependent processing of speech and speaker information. Copyright © 2016. Published by Elsevier Inc.

  11. Flexible spatial perspective-taking: conversational partners weigh multiple cues in collaborative tasks.

    PubMed

    Galati, Alexia; Avraamides, Marios N

    2013-01-01

    Research on spatial perspective-taking often focuses on the cognitive processes of isolated individuals as they adopt or maintain imagined perspectives. Collaborative studies of spatial perspective-taking typically examine speakers' linguistic choices, while overlooking their underlying processes and representations. We review evidence from two collaborative experiments that examine the contribution of social and representational cues to spatial perspective choices in both language and the organization of spatial memory. Across experiments, speakers organized their memory representations according to the convergence of various cues. When layouts were randomly configured and did not afford intrinsic cues, speakers encoded their partner's viewpoint in memory, if available, but did not use it as an organizing direction. On the other hand, when the layout afforded an intrinsic structure, speakers organized their spatial memories according to the person-centered perspective reinforced by the layout's structure. Similarly, in descriptions, speakers considered multiple cues whether available a priori or during the interaction. They used partner-centered expressions more frequently (e.g., "to your right") when the partner's viewpoint was misaligned by a small offset or coincided with the layout's structure. Conversely, they used egocentric expressions more frequently when their own viewpoint coincided with the intrinsic structure or when the partner was misaligned by a computationally difficult, oblique offset. Based on these findings we advocate for a framework for flexible perspective-taking: people weigh multiple cues (including social ones) to make attributions about the relative difficulty of perspective-taking for each partner, and adapt behavior to minimize their collective effort. This framework is not specialized for spatial reasoning but instead emerges from the same principles and memory-dependent processes that govern perspective-taking in non-spatial tasks.

  12. Flexible spatial perspective-taking: conversational partners weigh multiple cues in collaborative tasks

    PubMed Central

    Galati, Alexia; Avraamides, Marios N.

    2013-01-01

    Research on spatial perspective-taking often focuses on the cognitive processes of isolated individuals as they adopt or maintain imagined perspectives. Collaborative studies of spatial perspective-taking typically examine speakers' linguistic choices, while overlooking their underlying processes and representations. We review evidence from two collaborative experiments that examine the contribution of social and representational cues to spatial perspective choices in both language and the organization of spatial memory. Across experiments, speakers organized their memory representations according to the convergence of various cues. When layouts were randomly configured and did not afford intrinsic cues, speakers encoded their partner's viewpoint in memory, if available, but did not use it as an organizing direction. On the other hand, when the layout afforded an intrinsic structure, speakers organized their spatial memories according to the person-centered perspective reinforced by the layout's structure. Similarly, in descriptions, speakers considered multiple cues whether available a priori or during the interaction. They used partner-centered expressions more frequently (e.g., “to your right”) when the partner's viewpoint was misaligned by a small offset or coincided with the layout's structure. Conversely, they used egocentric expressions more frequently when their own viewpoint coincided with the intrinsic structure or when the partner was misaligned by a computationally difficult, oblique offset. Based on these findings we advocate for a framework for flexible perspective-taking: people weigh multiple cues (including social ones) to make attributions about the relative difficulty of perspective-taking for each partner, and adapt behavior to minimize their collective effort. This framework is not specialized for spatial reasoning but instead emerges from the same principles and memory-dependent processes that govern perspective-taking in non-spatial tasks. PMID:24133432

  13. TU-D-213AB-01: How You Can Be the Speaker and Communicator Everyone Wants You to Be.

    PubMed

    Collins, J; Aydogan, B

    2012-06-01

    Effectiveness of an oral presentation depends on the ability of the speaker to communicate with the audience. An important part of this communication is focusing on two to five key points and emphasizing those points during the presentation. Every aspect of the presentation should be purposeful and directed at facilitating learners' achievement of the objectives. This necessitates that the speaker has carefully developed the objectives and built the presentation around attainment of the objectives. A presentation should be designed to include as much audience participation as possible, no matter the size of the audience. Techniques to encourage audience participation include questioning, brainstorming, small-group activities, role-playing, case-based examples, directed listening, and use of an audience response system. It is first necessary to motivate and gain the attention of the learner for learning to take place. This can be accomplished through appropriate use of humor, anecdotes, and quotations. This course will review adult learning principles and effective presentation skills. Learning Objectives: 1. Apply adult learning principles. 2. Demonstrate effective presentation skills. © 2012 American Association of Physicists in Medicine.

  14. General perceptual contributions to lexical tone normalization.

    PubMed

    Huang, Jingyuan; Holt, Lori L

    2009-06-01

    Within tone languages that use pitch variations to contrast meaning, large variability exists in the pitches produced by different speakers. Context-dependent perception may help to resolve this perceptual challenge. However, whether speakers rely on context in contour tone perception is unclear; previous studies have produced inconsistent results. The present study aimed to provide an unambiguous test of the effect of context on contour lexical tone perception and to explore its underlying mechanisms. In three experiments, Mandarin listeners' perception of Mandarin first and second (high-level and mid-rising) tones was investigated with preceding speech and non-speech contexts. Results indicate that the mean fundamental frequency (f0) of a preceding sentence affects perception of contour lexical tones and the effect is contrastive. Following a sentence with a higher-frequency mean f0, the following syllable is more likely to be perceived as a lower frequency lexical tone and vice versa. Moreover, non-speech precursors modeling the mean spectrum of f0 also elicit this effect, suggesting general perceptual processing rather than articulatory-based or speaker-identity-driven mechanisms.

  15. Speaking fundamental frequency and vowel formant frequencies: effects on perception of gender.

    PubMed

    Gelfer, Marylou Pausewang; Bennett, Quinn E

    2013-09-01

    The purpose of the present study was to investigate the contribution of vowel formant frequencies to gender identification in connected speech, the distinctiveness of vowel formants in males versus females, and how ambiguous speaking fundamental frequencies (SFFs) and vowel formants might affect perception of gender. Multivalent experimental design. Speaker subjects (eight tall males, eight short females, and seven males and seven females of "middle" height) were recorded saying two carrier phrases to elicit the vowels /i/ and /α/, and a sentence. The gender/height groups were selected to (presumably) maximize formant differences between some groups (tall vs. short) and minimize differences between others (middle height). Each subject's samples were digitally altered to distinct SFFs (116, 145, 155, 165, and 207 Hz) to represent SFFs typical of average males, average females, and an ambiguous range. Listeners judged the gender of each randomized altered speech sample. Results indicated that female speakers were perceived as female even with an SFF in the typical male range. For male speakers, gender perception was less accurate at SFFs of 165 Hz and higher. Although the ranges of vowel formants had considerable overlap between genders, significant differences in formant frequencies of males and females were seen. Vowel formants appeared to be important to perception of gender, especially for SFFs in the range of 145-165 Hz; however, formants may be a more salient cue in connected speech when compared with isolated vowels or syllables. Copyright © 2013 The Voice Foundation. Published by Mosby, Inc. All rights reserved.
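
    Formant cues of the kind examined here are conventionally estimated by LPC root-solving; a minimal sketch follows. The window length, LPC order rule of thumb, and file name are assumptions for illustration, not the study's measurement protocol.

```python
# Sketch: estimating the first two vowel formants (F1, F2) from one
# 30-ms frame by solving for the roots of an LPC polynomial.
import numpy as np
import librosa

y, fs = librosa.load("vowel_a.wav", sr=None)     # hypothetical vowel token
n = int(0.03 * fs)                               # 30-ms analysis window
frame = y[:n] * np.hamming(n)

a = librosa.lpc(frame, order=int(fs / 1000) + 2) # rule-of-thumb LPC order
roots = [r for r in np.roots(a) if np.imag(r) > 0]
freqs = sorted(np.angle(roots) * fs / (2 * np.pi))
formants = [f for f in freqs if f > 90][:2]      # F1, F2; discard near-DC roots
print(formants)
```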

  16. Effect of gender on communication of health information to older adults.

    PubMed

    Dearborn, Jennifer L; Panzer, Victoria P; Burleson, Joseph A; Hornung, Frederick E; Waite, Harrison; Into, Frances H

    2006-04-01

    To examine the effect of gender on three key elements of communication with elderly individuals: effectiveness of the communication, perceived relevance to the individual, and effect of gender-stereotyped content. Survey. University of Connecticut Health Center. Thirty-three subjects (17 female), aged 69 to 91 (mean ± standard deviation 82 ± 5.4). Older adults listened to 16 brief narratives randomized in order and by the sex of the speaker (Narrator Voice). Effectiveness was measured according to ability to identify key features (Risks), and subjects were asked to rate the relevance (Plausibility). Number of Risks detected and determinations of plausibility were analyzed according to Subject Gender and Narrator Voice. Narratives were written for either sex or included male or female bias (Neutral or Stereotyped). Female subjects identified a significantly higher number of Risks across all narratives (P=.01). Subjects perceived a significantly higher number of Risks with a female Narrator Voice (P=.03). A significant Voice-by-Stereotype interaction was present for female-stereotyped narratives (P=.009). In narratives rated as Plausible, subjects detected more Risks (P=.02). Subject Gender influenced communication effectiveness. A female speaker resulted in identification of more Risks for subjects of both sexes, particularly for Stereotyped narratives. There was no significant effect of matching Subject Gender and Narrator Voice. This study suggests that the sex of the speaker influences the effectiveness of communication with older adults. These findings should motivate future research into the means by which medical providers can improve communication with their patients.

  17. Verification of endocrinological functions at a short distance between parametric speakers and the human body.

    PubMed

    Lee, Soomin; Katsuura, Tetsuo; Shimomura, Yoshihiro

    2011-01-01

    In recent years, a new type of speaker called the parametric speaker has been used to generate highly directional sound, and these speakers are now commercially available. In our previous study, we verified that the parametric speaker placed a lower burden on endocrine function than a general speaker. However, nothing had been demonstrated about effects at distances shorter than 2.6 m between parametric speakers and the human body. We therefore investigated the effect of distance on endocrinological function and subjective evaluation. Nine male subjects participated in this study. They completed three consecutive sessions: a 20-min quiet period as a baseline, a 30-min mental task period with general speakers or parametric speakers, and a 20-min recovery period. We measured salivary cortisol and chromogranin A (CgA) concentrations. Subjects also took the Kwansei-gakuin Sleepiness Scale (KSS) test before and after the task, and a sound quality evaluation test after it. Four experiments, crossing a speaker condition (general speaker vs. parametric speaker) with a distance condition (0.3 m vs. 1.0 m), were conducted at the same time of day on separate days. We used three-way repeated-measures ANOVA (speaker factor × distance factor × time factor) to examine the effects of the parametric speaker. We found that endocrinological functions did not differ significantly between the speaker conditions or the distance conditions. The results also showed that physiological burden increased over time, independent of speaker condition and distance condition.

  18. The Communication of Public Speaking Anxiety: Perceptions of Asian and American Speakers.

    ERIC Educational Resources Information Center

    Martini, Marianne; And Others

    1992-01-01

    Finds that U.S. audiences perceive Asian speakers to have more speech anxiety than U.S. speakers, even though Asian speakers do not self-report higher anxiety levels. Confirms that speech state anxiety is not communicated effectively between speakers and audiences for Asian or U.S. speakers. (SR)

  19. An Investigation of Syntactic Priming among German Speakers at Varying Proficiency Levels

    ERIC Educational Resources Information Center

    Ruf, Helena T.

    2011-01-01

    This dissertation investigates syntactic priming in second language (L2) development among three speaker populations: (1) less proficient L2 speakers; (2) advanced L2 speakers; and (3) LI speakers. Using confederate scripting this study examines how German speakers choose certain word orders in locative constructions (e.g., "Auf dem Tisch…

  20. Modeling Speaker Proficiency, Comprehensibility, and Perceived Competence in a Language Use Domain

    ERIC Educational Resources Information Center

    Schmidgall, Jonathan Edgar

    2013-01-01

    Research suggests that listener perceptions of a speaker's oral language use, or a speaker's "comprehensibility," may be influenced by a variety of speaker-, listener-, and context-related factors. Primary speaker factors include aspects of the speaker's proficiency in the target language such as pronunciation and…

  1. Methods and apparatus for non-acoustic speech characterization and recognition

    DOEpatents

    Holzrichter, John F.

    1999-01-01

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  2. Methods and apparatus for non-acoustic speech characterization and recognition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holzrichter, J.F.

    By simultaneously recording EM wave reflections and acoustic speech information, the positions and velocities of the speech organs as speech is articulated can be defined for each acoustic speech unit. Well defined time frames and feature vectors describing the speech, to the degree required, can be formed. Such feature vectors can uniquely characterize the speech unit being articulated each time frame. The onset of speech, rejection of external noise, vocalized pitch periods, articulator conditions, accurate timing, the identification of the speaker, acoustic speech unit recognition, and organ mechanical parameters can be determined.

  3. Do children go for the nice guys? The influence of speaker benevolence and certainty on selective word learning.

    PubMed

    Bergstra, Myrthe; DE Mulder, Hannah N M; Coopmans, Peter

    2018-04-06

    This study investigated how speaker certainty (a rational cue) and speaker benevolence (an emotional cue) influence children's willingness to learn words in a selective learning paradigm. In two experiments four- to six-year-olds learnt novel labels from two speakers and, after a week, their memory for these labels was reassessed. Results demonstrated that children retained the label-object pairings for at least a week. Furthermore, children preferred to learn from certain over uncertain speakers, but they had no significant preference for nice over nasty speakers. When the cues were combined, children followed certain speakers, even if they were nasty. However, children did prefer to learn from nice and certain speakers over nasty and certain speakers. These results suggest that rational cues regarding a speaker's linguistic competence trump emotional cues regarding a speaker's affective status in word learning. However, emotional cues were found to have a subtle influence on this process.

  4. Depression recognition and capacity for self-report among ethnically diverse nursing homes residents: Evidence of disparities in screening.

    PubMed

    Chun, Audrey; Reinhardt, Joann P; Ramirez, Mildred; Ellis, Julie M; Silver, Stephanie; Burack, Orah; Eimicke, Joseph P; Cimarolli, Verena; Teresi, Jeanne A

    2017-12-01

    To examine agreement between Minimum Data Set clinician ratings and researcher assessments of depression among ethnically diverse nursing home residents using the 9-item Patient Health Questionnaire. Although depression is common among nursing homes residents, its recognition remains a challenge. Observational baseline data from a longitudinal intervention study. Sample of 155 residents from 12 long-term care units in one US facility; 50 were interviewed in Spanish. Convergence between clinician and researcher ratings was examined for (i) self-report capacity, (ii) suicidal ideation, (iii) at least moderate depression, (iv) Patient Health Questionnaire severity scores. Experiences by clinical raters using the depression assessment were analysed. The intraclass correlation coefficient was used to examine concordance and Cohen's kappa to examine agreement between clinicians and researchers. Moderate agreement (κ = 0.52) was observed in determination of capacity and poor to fair agreement in reporting suicidal ideation (κ = 0.10-0.37) across time intervals. Poor agreement was observed in classification of at least moderate depression (κ = -0.02 to 0.24), lower than the maximum kappa obtainable (0.58-0.85). Eight assessors indicated problems assessing Spanish-speaking residents. Among Spanish speakers, researchers identified 16% with Patient Health Questionnaire scores of 10 or greater, and 14% with thoughts of self-harm whilst clinicians identified 6% and 0%, respectively. This study advances the field of depression recognition in long-term care by identification of possible challenges in assessing Spanish speakers. Use of the Patient Health Questionnaire requires further investigation, particularly among non-English speakers. Depression screening for ethnically diverse nursing home residents is required, as underreporting of depression and suicidal ideation among Spanish speakers may result in lack of depression recognition and referral for evaluation and treatment. Training in depression recognition is imperative to improve the recognition, evaluation and treatment of depression in older people living in nursing homes. © 2017 John Wiley & Sons Ltd.

  5. Evaluating the lexico-grammatical differences in the writing of native and non-native speakers of English in peer-reviewed medical journals in the field of pediatric oncology: Creation of the genuine index scoring system.

    PubMed

    Gayle, Alberto Alexander; Shimaoka, Motomu

    2017-01-01

    The predominance of English in scientific research has created hurdles for "non-native speakers" of English. Here we present a novel application of native language identification (NLI) for the assessment of medical-scientific writing. For this purpose, we created a novel classification system whereby scoring would be based solely on text features found to be distinctive among native English speakers (NS) within a given context. We dubbed this the "Genuine Index" (GI). This methodology was validated using a small set of journals in the field of pediatric oncology. Our dataset consisted of 5,907 abstracts, representing work from 77 countries. A support vector machine (SVM) was used to generate our model and for scoring. Accuracy, precision, and recall of the classification model were 93.3%, 93.7%, and 99.4%, respectively. Class specific F-scores were 96.5% for NS and 39.8% for our benchmark class, Japan. Overall kappa was calculated to be 37.2%. We found significant differences between countries with respect to the GI score. Significant correlation was found between GI scores and two validated objective measures of writing proficiency and readability. Two sets of key terms and phrases differentiating NS and non-native writing were identified. Our GI model was able to detect, with a high degree of reliability, subtle differences between the terms and phrasing used by native and non-native speakers in peer reviewed journals, in the field of pediatric oncology. In addition, L1 language transfer was found to be very likely to survive revision, especially in non-Western countries such as Japan. These findings show that even when the language used is technically correct, there may still be some phrasing or usage that impact quality.
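
    A minimal sketch of the classification idea behind the Genuine Index: a linear SVM over n-gram features separating native-speaker (NS) from non-native text, with the signed decision value serving as a graded score. The training snippets, pipeline choices, and hyperparameters below are placeholders, not the authors' model.

```python
# Sketch: native-language-identification scoring with TF-IDF n-grams
# and a linear SVM; the signed distance from the separating hyperplane
# acts as a graded "Genuine Index"-style score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

abstracts = ["We report outcomes of ...", "In this study it was investigated ..."]
labels = ["NS", "non-NS"]               # stand-in training data

gi_model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), sublinear_tf=True),
    LinearSVC(C=1.0),
)
gi_model.fit(abstracts, labels)

score = gi_model.decision_function(["Herein we describe the protocol ..."])[0]
print(f"GI-style score: {score:+.2f}")
```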

  6. Multisensory and modality specific processing of visual speech in different regions of the premotor cortex

    PubMed Central

    Callan, Daniel E.; Jones, Jeffery A.; Callan, Akiko

    2014-01-01

    Behavioral and neuroimaging studies have demonstrated that brain regions involved with speech production also support speech perception, especially under degraded conditions. The premotor cortex (PMC) has been shown to be active during both observation and execution of action (“Mirror System” properties), and may facilitate speech perception by mapping unimodal and multimodal sensory features onto articulatory speech gestures. For this functional magnetic resonance imaging (fMRI) study, participants identified vowels produced by a speaker in audio-visual (saw the speaker's articulating face and heard her voice), visual only (only saw the speaker's articulating face), and audio only (only heard the speaker's voice) conditions with varying audio signal-to-noise ratios in order to determine the regions of the PMC involved with multisensory and modality specific processing of visual speech gestures. The task was designed so that identification could be made with a high level of accuracy from visual only stimuli to control for task difficulty and differences in intelligibility. The results of the fMRI analysis for visual only and audio-visual conditions showed overlapping activity in inferior frontal gyrus and PMC. The left ventral inferior premotor cortex (PMvi) showed properties of multimodal (audio-visual) enhancement with a degraded auditory signal. The left inferior parietal lobule and right cerebellum also showed these properties. The left ventral superior and dorsal premotor cortex (PMvs/PMd) did not show this multisensory enhancement effect, but there was greater activity for the visual only over audio-visual conditions in these areas. The results suggest that the inferior regions of the ventral premotor cortex are involved with integrating multisensory information, whereas more superior and dorsal regions of the PMC are involved with mapping unimodal (in this case visual) sensory features of the speech signal with articulatory speech gestures. PMID:24860526

  7. Improvements of ModalMax High-Fidelity Piezoelectric Audio Device

    NASA Technical Reports Server (NTRS)

    Woodard, Stanley E.

    2005-01-01

    ModalMax audio speakers have been enhanced by innovative means of tailoring the vibration response of thin piezoelectric plates to produce a high-fidelity audio response. The ModalMax audio speakers are 1 mm in thickness. The device obviates the need for a separate driver and speaker cone. ModalMax speakers can serve the same applications as cone speakers, but unlike cone speakers, ModalMax speakers can function in harsh environments such as high humidity or extreme wetness. New design features allow the speakers to be completely submersed in salt water, making them well suited for maritime applications. The sound produced by ModalMax audio speakers has spatial resolution that is readily discernible to headset users.

  8. Frequency-Limiting Effects on Speech and Environmental Sound Identification for Cochlear Implant and Normal Hearing Listeners

    PubMed Central

    Chang, Son-A; Won, Jong Ho; Kim, HyangHee; Oh, Seung-Ha; Tyler, Richard S.; Cho, Chang Hyun

    2018-01-01

    Background and Objectives It is important to understand the frequency region of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CI and normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass or high-pass filtering cutoff frequencies. Subjects and Methods Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants produced by a male and a female speaker, as well as environmental sounds, was measured. Crossover frequencies were determined for each identification test, where the LPF and HPF conditions show identical identification scores. Results CI and NH subjects showed changes in identification performance in a similar manner as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may similarly constrain the identification performance for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification, due to the inefficient coding of acoustic cues through the CI sound processors. Conclusions This finding provides vital information, for Korean, on how the frequency information received for speech and environmental sounds through a CI processor differs from that received with normal hearing. PMID:29325391

  9. Frequency-Limiting Effects on Speech and Environmental Sound Identification for Cochlear Implant and Normal Hearing Listeners.

    PubMed

    Chang, Son-A; Won, Jong Ho; Kim, HyangHee; Oh, Seung-Ha; Tyler, Richard S; Cho, Chang Hyun

    2017-12-01

    It is important to understand the frequency region of cues used, and not used, by cochlear implant (CI) recipients. Speech and environmental sound recognition by individuals with CI and normal hearing (NH) was measured. Gradients were also computed to evaluate the pattern of change in identification performance with respect to the low-pass or high-pass filtering cutoff frequencies. Frequency-limiting effects were implemented in the acoustic waveforms by passing the signals through low-pass filters (LPFs) or high-pass filters (HPFs) with seven different cutoff frequencies. Identification of Korean vowels and consonants produced by a male and a female speaker, as well as environmental sounds, was measured. Crossover frequencies were determined for each identification test, where the LPF and HPF conditions show identical identification scores. CI and NH subjects showed changes in identification performance in a similar manner as a function of cutoff frequency for the LPF and HPF conditions, suggesting that the degraded spectral information in the acoustic signals may similarly constrain the identification performance for both subject groups. However, CI subjects were generally less efficient than NH subjects in using the limited spectral information for speech and environmental sound identification, due to the inefficient coding of acoustic cues through the CI sound processors. This finding provides vital information, for Korean, on how the frequency information received for speech and environmental sounds through a CI processor differs from that received with normal hearing.
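
    As a rough illustration of the frequency-limiting manipulation described in this record (and its PubMed Central duplicate above), the sketch below passes a signal through low-pass and high-pass Butterworth filters at several cutoffs. The filter order, cutoff list, and test signal are assumptions, not the study's values.

    ```python
    # Sketch of the frequency-limiting manipulation: pass a signal through
    # low-pass or high-pass filters at several cutoff frequencies. The order
    # and cutoffs are illustrative; the study's exact values are not given here.
    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def frequency_limit(signal, fs, cutoff_hz, kind="lowpass", order=6):
        """Return a low- or high-pass filtered copy of `signal` (zero-phase)."""
        sos = butter(order, cutoff_hz, btype=kind, fs=fs, output="sos")
        return sosfiltfilt(sos, signal)

    fs = 16_000
    t = np.arange(fs) / fs
    speech_like = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 2500 * t)

    cutoffs = [250, 500, 1000, 1500, 2000, 3000, 4000]  # seven assumed cutoffs
    lpf_versions = [frequency_limit(speech_like, fs, fc, "lowpass") for fc in cutoffs]
    hpf_versions = [frequency_limit(speech_like, fs, fc, "highpass") for fc in cutoffs]
    print(len(lpf_versions), len(hpf_versions))  # 7 filtered stimuli each
    ```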

  10. Partially supervised speaker clustering.

    PubMed

    Tang, Hao; Chu, Stephen Mingyu; Hasegawa-Johnson, Mark; Huang, Thomas S

    2012-05-01

    Content-based multimedia indexing, retrieval, and processing as well as multimedia databases demand the structuring of the media content (image, audio, video, text, etc.), one significant goal being to associate the identity of the content to the individual segments of the signals. In this paper, we specifically address the problem of speaker clustering, the task of assigning every speech utterance in an audio stream to its speaker. We offer a complete treatment of the idea of partially supervised speaker clustering, which refers to the use of our prior knowledge of speakers in general to assist the unsupervised speaker clustering process. By means of an independent training data set, we encode the prior knowledge at the various stages of the speaker clustering pipeline via 1) learning a speaker-discriminative acoustic feature transformation, 2) learning a universal speaker prior model, and 3) learning a discriminative speaker subspace, or equivalently, a speaker-discriminative distance metric. We study the directional scattering property of the Gaussian mixture model (GMM) mean supervector representation of utterances in the high-dimensional space, and advocate exploiting this property by using the cosine distance metric instead of the Euclidean distance metric for speaker clustering in the GMM mean supervector space. We propose to perform discriminant analysis based on the cosine distance metric, which leads to a novel distance metric learning algorithm: linear spherical discriminant analysis (LSDA). We show that the proposed LSDA formulation can be systematically solved within the elegant graph embedding general dimensionality reduction framework. Our speaker clustering experiments on the GALE database clearly indicate that 1) our speaker clustering methods based on the GMM mean supervector representation and vector-based distance metrics outperform traditional speaker clustering methods based on the “bag of acoustic features” representation and statistical model-based distance metrics, 2) our advocated use of the cosine distance metric yields consistent increases in speaker clustering performance as compared to the commonly used Euclidean distance metric, 3) our partially supervised speaker clustering concept and strategies significantly improve speaker clustering performance over the baselines, and 4) our proposed LSDA algorithm further leads to state-of-the-art speaker clustering performance.
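
    A minimal sketch of the clustering step this abstract advocates, agglomerative clustering of GMM mean supervectors under the cosine distance, follows. The LSDA metric-learning stage is omitted, and the supervectors are random stand-ins for MAP-adapted GMM means.

    ```python
    # Sketch of speaker clustering in a GMM mean-supervector space using the
    # cosine distance advocated above (the LSDA metric-learning step is omitted;
    # supervectors here are random stand-ins for MAP-adapted GMM means).
    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(0)
    n_utterances, dim = 40, 512
    supervectors = rng.normal(size=(n_utterances, dim))  # stand-in supervectors

    # Cosine distance ignores vector magnitude, exploiting the directional
    # scattering of supervectors described in the abstract.
    dists = pdist(supervectors, metric="cosine")
    tree = linkage(dists, method="average")
    speaker_labels = fcluster(tree, t=4, criterion="maxclust")  # assume 4 speakers
    print(speaker_labels)
    ```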

  11. Context Effects on Lexical Choice and Lexical Activation

    ERIC Educational Resources Information Center

    Jescheniak, Jorg D.; Hantsch, Ansgar; Schriefers, Herbert

    2005-01-01

    Speakers are regularly confronted with the choice among lexical alternatives when referring to objects, including basic-level names (e.g., car) and subordinate-level names (e.g., Beetle). Which of these names is eventually selected often depends on contextual factors. The present article reports a series of picture-word interference experiments…

  12. Reconsidering Language Orientation for Undergraduate Singers

    ERIC Educational Resources Information Center

    Paver, Barbara E.

    2009-01-01

    Foreign language lyric diction is a compulsory subject in all undergraduate vocal performance degrees in universities. However, the effectiveness of its teaching depends on the capacity of students to absorb the material, for which many are largely unprepared, due to their lack of previous language study. Further, native speakers of North American…

  13. Effects of Acoustic Variability on Second Language Vocabulary Learning

    ERIC Educational Resources Information Center

    Barcroft, Joe; Sommers, Mitchell S.

    2005-01-01

    This study examined the effects of acoustic variability on second language vocabulary learning. English native speakers learned new words in Spanish. Exposure frequency to the words was constant. Dependent measures were accuracy and latency of picture-to-Spanish and Spanish-to-English recall. Experiment 1 compared presentation formats of neutral…

  14. Facets of Speaking Proficiency

    ERIC Educational Resources Information Center

    de Jong, Nivja H.; Steinel, Margarita P.; Florijn, Arjen F.; Schoonen, Rob; Hulstijn, Jan H.

    2012-01-01

    This study examined the componential structure of second-language (L2) speaking proficiency. Participants--181 L2 and 54 native speakers of Dutch--performed eight speaking tasks and six tasks tapping nine linguistic skills. Performance in the speaking tasks was rated on functional adequacy by a panel of judges and formed the dependent variable in…

  15. Teaching First Language Speakers to Communicate across Linguistic Difference: Addressing Attitudes, Comprehension, and Strategies

    ERIC Educational Resources Information Center

    Subtirelu, Nicholas Close; Lindemann, Stephanie

    2016-01-01

    While most research in applied linguistics has focused on second language (L2) speakers and their language capabilities, the success of interaction between such speakers and first language (L1) speakers also relies on the positive attitudes and communication skills of the L1 speakers. However, some research has suggested that many L1 speakers lack…

  16. Temporal and acoustic characteristics of Greek vowels produced by adults with cerebral palsy

    NASA Astrophysics Data System (ADS)

    Botinis, Antonis; Orfanidou, Ioanna; Fourakis, Marios

    2005-09-01

    The present investigation examined the temporal and spectral characteristics of Greek vowels as produced by speakers with intact (NO) versus cerebral palsy affected (CP) neuromuscular systems. Six NO and six CP native speakers of Greek produced the Greek vowels [i, e, a, o, u] in the first syllable of CVCV nonsense words in a short carrier phrase. Stress could be on either the first or second syllable. There were three female and three male speakers in each group. In terms of temporal characteristics, the results showed that: vowels produced by CP speakers were longer than vowels produced by NO speakers; stressed vowels were longer than unstressed vowels; vowels produced by female speakers were longer than vowels produced by male speakers. In terms of spectral characteristics the results showed that the vowel space of the CP speakers was smaller than that of the NO speakers. This is similar to the results recently reported by Liu et al. [J. Acoust. Soc. Am. 117, 3879-3889 (2005)] for CP speakers of Mandarin. There was also a reduction of the acoustic vowel space defined by unstressed vowels, but this reduction was much more pronounced in the vowel productions of CP speakers than NO speakers.

  17. Consistency between verbal and non-verbal affective cues: a clue to speaker credibility.

    PubMed

    Gillis, Randall L; Nilsen, Elizabeth S

    2017-06-01

    Listeners are exposed to inconsistencies in communication; for example, when speakers' words (i.e. verbal) are discrepant with their demonstrated emotions (i.e. non-verbal). Such inconsistencies introduce ambiguity, which may render a speaker a less credible source of information. Two experiments examined whether children make credibility discriminations based on the consistency of speakers' affect cues. In Experiment 1, school-age children (7- to 8-year-olds) preferred to solicit information from consistent speakers (e.g. those who provided a negative statement with negative affect), over novel speakers, to a greater extent than they preferred to solicit information from inconsistent speakers (e.g. those who provided a negative statement with positive affect) over novel speakers. Preschoolers (4- to 5-year-olds) did not demonstrate this preference. Experiment 2 showed that school-age children's ratings of speakers were influenced by speakers' affect consistency when the attribute being judged was related to information acquisition (speakers' believability, "weird" speech), but not general characteristics (speakers' friendliness, likeability). Together, findings suggest that school-age children are sensitive to, and use, the congruency of affect cues to determine whether individuals are credible sources of information.

  18. Inferring speaker attributes in adductor spasmodic dysphonia: ratings from unfamiliar listeners.

    PubMed

    Isetti, Derek; Xuereb, Linnea; Eadie, Tanya L

    2014-05-01

    To determine whether unfamiliar listeners' perceptions of speakers with adductor spasmodic dysphonia (ADSD) differ from their perceptions of control speakers on the parameters of relative age, confidence, tearfulness, and vocal effort, and whether these perceptions are related to speaker-rated vocal effort or voice-specific quality of life. Twenty speakers with ADSD (including 6 speakers with ADSD plus tremor) and 20 age- and sex-matched controls provided speech recordings, completed a voice-specific quality-of-life instrument (Voice Handicap Index; Jacobson et al., 1997), and rated their own vocal effort. Twenty listeners evaluated speech samples for relative age, confidence, tearfulness, and vocal effort using rating scales. Listeners judged speakers with ADSD as sounding significantly older, less confident, more tearful, and more effortful than control speakers (p < .01). Increased vocal effort was strongly associated with decreased speaker confidence (rs = .88-.89) and sounding more tearful (rs = .83-.85). Self-rated speaker effort was moderately related (rs = .45-.52) to listener impressions. Listeners' perceptions of confidence and tearfulness were also moderately associated with higher Voice Handicap Index scores (rs = .65-.70). Unfamiliar listeners judge speakers with ADSD more negatively than control speakers, with judgments extending beyond typical clinical measures. The results have implications for counseling and understanding the psychosocial effects of ADSD.

  19. The Memory Jog Service

    NASA Astrophysics Data System (ADS)

    Dimakis, Nikolaos; Soldatos, John; Polymenakos, Lazaros; Sturm, Janienke; Neumann, Joachim; Casas, Josep R.

    The CHIL Memory Jog service focuses on facilitating the collaboration of participants in meetings, lectures, presentations, and other human interactive events occurring in indoor CHIL spaces. It exploits the whole set of perceptual components that have been developed by the CHIL Consortium partners (e.g., person tracking, face identification, audio source localization) along with a wide range of actuating devices such as projectors, displays, targeted audio devices, and speakers. The underlying set of perceptual components provides a constant flow of elementary contextual information, such as “person at location x0,y0” or “speech at location x0,y0”, information that alone is not of significant use. However, the CHIL Memory Jog service is accompanied by powerful situation identification techniques that fuse all the incoming information and create complex states that drive the actuating logic.

  20. Sensory Intelligence for Extraction of an Abstract Auditory Rule: A Cross-Linguistic Study.

    PubMed

    Guo, Xiao-Tao; Wang, Xiao-Dong; Liang, Xiu-Yuan; Wang, Ming; Chen, Lin

    2018-02-21

    In a complex linguistic environment, while speech sounds can greatly vary, some shared features are often invariant. These invariant features constitute so-called abstract auditory rules. Our previous study has shown that with auditory sensory intelligence, the human brain can automatically extract the abstract auditory rules in the speech sound stream, presumably serving as the neural basis for speech comprehension. However, whether the sensory intelligence for extraction of abstract auditory rules in speech is inherent or experience-dependent remains unclear. To address this issue, we constructed a complex speech sound stream using auditory materials in Mandarin Chinese, in which syllables had a flat lexical tone but differed in other acoustic features to form an abstract auditory rule. This rule was occasionally and randomly violated by the syllables with the rising, dipping or falling tone. We found that both Chinese and foreign speakers detected the violations of the abstract auditory rule in the speech sound stream at a pre-attentive stage, as revealed by the whole-head recordings of mismatch negativity (MMN) in a passive paradigm. However, MMNs peaked earlier in Chinese speakers than in foreign speakers. Furthermore, Chinese speakers showed different MMN peak latencies for the three deviant types, which paralleled recognition points. These findings indicate that the sensory intelligence for extraction of abstract auditory rules in speech sounds is innate but shaped by language experience. Copyright © 2018 IBRO. Published by Elsevier Ltd. All rights reserved.

  1. Speaker Linking and Applications using Non-Parametric Hashing Methods

    DTIC Science & Technology

    2016-09-08

    clustering method based on hashing (canopy clustering). We apply this method to a large corpus of speaker recordings, demonstrate performance tradeoffs...and compare to other hashing methods. Index Terms: speaker recognition, clustering, hashing, locality sensitive hashing. 1. Introduction We assume...speaker in our corpus. Second, given a QBE method, how can we perform speaker clustering: each cluster should be a single speaker, and a cluster should
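
    Since this record survives only as a snippet, the sketch below illustrates the core idea it names: locality-sensitive hashing of speaker embeddings via random hyperplanes, so that candidate speaker links are found without all-pairs comparison. The canopy-clustering variant itself is not reproduced, and the embedding dimension and bit count are assumptions.

    ```python
    # Sketch of the locality-sensitive hashing idea behind speaker linking:
    # random-hyperplane signatures bucket similar speaker embeddings together,
    # so candidate links are found without all-pairs comparison.
    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(1)
    dim, n_bits = 200, 16
    hyperplanes = rng.normal(size=(n_bits, dim))  # one random hyperplane per bit

    def lsh_signature(embedding):
        """n_bits-bit signature: sign pattern of projections onto hyperplanes."""
        bits = hyperplanes @ embedding > 0
        return bits.tobytes()  # hashable bucket key

    embeddings = rng.normal(size=(1000, dim))  # stand-in speaker embeddings
    buckets = defaultdict(list)
    for idx, emb in enumerate(embeddings):
        buckets[lsh_signature(emb)].append(idx)

    # Only recordings sharing a bucket are compared exactly (the QBE step).
    print(max(len(b) for b in buckets.values()), "recordings in largest bucket")
    ```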

  2. On the Time Course of Vocal Emotion Recognition

    PubMed Central

    Pell, Marc D.; Kotz, Sonja A.

    2011-01-01

    How quickly do listeners recognize emotions from a speaker's voice, and does the time course for recognition vary by emotion type? To address these questions, we adapted the auditory gating paradigm to estimate how much vocal information is needed for listeners to categorize five basic emotions (anger, disgust, fear, sadness, happiness) and neutral utterances produced by male and female speakers of English. Semantically-anomalous pseudo-utterances (e.g., The rivix jolled the silling) conveying each emotion were divided into seven gate intervals according to the number of syllables that listeners heard from sentence onset. Participants (n = 48) judged the emotional meaning of stimuli presented at each gate duration interval, in a successive, blocked presentation format. Analyses looked at how recognition of each emotion evolves as an utterance unfolds and estimated the “identification point” for each emotion. Results showed that anger, sadness, fear, and neutral expressions are recognized more accurately at short gate intervals than happiness, and particularly disgust; however, as speech unfolds, recognition of happiness improves significantly towards the end of the utterance (and fear is recognized more accurately than other emotions). When the gate associated with the emotion identification point of each stimulus was calculated, data indicated that fear (M = 517 ms), sadness (M = 576 ms), and neutral (M = 510 ms) expressions were identified from shorter acoustic events than the other emotions. These data reveal differences in the underlying time course for conscious recognition of basic emotions from vocal expressions, which should be accounted for in studies of emotional speech processing. PMID:22087275

  3. The effect of tonal changes on voice onset time in Mandarin esophageal speech.

    PubMed

    Liu, Hanjun; Ng, Manwa L; Wan, Mingxi; Wang, Supin; Zhang, Yi

    2008-03-01

    The present study investigated the effect of tonal changes on voice onset time (VOT) between normal laryngeal (NL) and superior esophageal (SE) speakers of Mandarin Chinese. VOT values were measured from the syllables /pha/, /tha/, and /kha/ produced at four tone levels by eight NL and seven SE speakers who were native speakers of Mandarin. Results indicated that Mandarin tones were associated with significantly different VOT values for NL speakers, in which high-falling tone was associated with significantly shorter VOT values than mid-rising tone and falling-rising tone. Regarding speaker group, SE speakers showed significantly shorter VOT values than NL speakers across all tone levels. This may be related to their use of the pharyngoesophageal (PE) segment as an alternative sound source. SE speakers appear to take a shorter time to initiate PE segment vibration than NL speakers take to initiate vocal fold vibration.

  4. Proficiency in English sentence stress production by Cantonese speakers who speak English as a second language (ESL).

    PubMed

    Ng, Manwa L; Chen, Yang

    2011-12-01

    The present study examined English sentence stress produced by native Cantonese speakers who were speaking English as a second language (ESL). Cantonese ESL speakers' proficiency in English stress production as perceived by English-speaking listeners was also studied. Acoustical parameters associated with sentence stress including fundamental frequency (F0), vowel duration, and intensity were measured from the English sentences produced by 40 Cantonese ESL speakers. Data were compared with those obtained from 40 native speakers of American English. The speech samples were also judged by eight listeners who were native speakers of American English for placement, degree, and naturalness of stress. Results showed that Cantonese ESL speakers were able to use F0, vowel duration, and intensity to differentiate sentence stress patterns. Yet, both female and male Cantonese ESL speakers exhibited consistently higher F0 in stressed words than English speakers. Overall, Cantonese ESL speakers were found to be proficient in using duration and intensity to signal sentence stress, in a way comparable with that of English speakers. In addition, F0 and intensity were found to correlate closely with perceptual judgement, and the degree of stress with the naturalness of stress.

  5. Using Discursis to enhance the qualitative analysis of hospital pharmacist-patient interactions

    PubMed Central

    Barras, Michael A.; Angus, Daniel J.

    2018-01-01

    Introduction Pharmacist-patient communication during medication counselling has been successfully investigated using Communication Accommodation Theory (CAT). Communication researchers in other healthcare professions have utilised Discursis software as an adjunct to their manual qualitative analysis processes. Discursis provides a visual, chronological representation of communication exchanges and identifies patterns of interactant engagement. Aim The aim of this study was to describe how Discursis software was used to enhance previously conducted qualitative analysis of pharmacist-patient interactions (by visualising pharmacist-patient speech patterns, episodes of engagement, and identifying CAT strategies employed by pharmacists within these episodes). Methods Visual plots from 48 transcribed audio recordings of pharmacist-patient exchanges were generated by Discursis. Representative plots were selected to show moderate-high and low-level speaker engagement. Details of engagement were investigated for pharmacist application of CAT strategies (approximation, interpretability, discourse management, emotional expression, and interpersonal control). Results Discursis plots allowed for identification of distinct patterns occurring within pharmacist-patient exchanges. Moderate-high pharmacist-patient engagement was characterised by multiple off-diagonal squares, while alternating single-coloured squares depicted low engagement. Engagement episodes were associated with multiple CAT strategies such as discourse management (open-ended questions). Patterns reflecting pharmacist or patient speaker dominance were dependent on clinical setting. Discussion and conclusions Discursis analysis of pharmacist-patient interactions, a novel application of the technology in health communication, was found to be an effective visualisation tool to pinpoint episodes for CAT analysis. Discursis has numerous practical and theoretical applications for future health communication research and training. Researchers can use the software to support qualitative analysis where large data sets can be quickly reviewed to identify key areas for concentrated analysis. Because Discursis plots are easily generated from audio-recorded transcripts, they are well suited as teaching tools for both students and practitioners to assess and develop their communication skills. PMID:29787568

  6. Molecular and Neuroendocrine Approaches to Understanding Trade-offs: Food, Sex, Aggression, Stress, and Longevity-An Introduction to the Symposium.

    PubMed

    Schneider, Jill E; Deviche, Pierre

    2017-12-01

    Life history strategies are composed of multiple fitness components, each of which incurs costs and benefits. Consequently, organisms cannot maximize all fitness components simultaneously. This situation results in a dynamic array of trade-offs in which some fitness traits prevail at the expense of others, often depending on context. The identification of specific constraints and trade-offs has helped elucidate physiological mechanisms that underlie variation in behavioral and physiological life history strategies. There is general recognition that trade-offs are made at the individual and population level, but much remains to be learned concerning the molecular neuroendocrine mechanisms that underlie trade-offs. For example, we still do not know whether the mechanisms that underlie trade-offs at the individual level relate to trade-offs at the population level. To advance our understanding of trade-offs, we organized a group of speakers who study neuroendocrine mechanisms at the interface of traits that are not maximized simultaneously. Speakers were invited to represent research from a wide range of taxa including invertebrates (e.g., worms and insects), fish, nonavian reptiles, birds, and mammals. Three general themes emerged. First, the study of trade-offs requires that we investigate traditional endocrine mechanisms that include hormones, neuropeptides, and their receptors, and in addition, other chemical messengers not traditionally included in endocrinology. The latter group includes growth factors, metabolic intermediates, and molecules of the immune system. Second, the nomenclature and theory of neuroscience that has dominated the study of behavior is being re-evaluated in the face of evidence for the peripheral actions of so-called neuropeptides and neurotransmitters and the behavioral repercussions of these actions. Finally, environmental and ecological contexts continue to be critical in unmasking molecular mechanisms that are hidden when study animals are housed in enclosed spaces, with unlimited food, without competitors or conspecifics, and in constant ambient conditions. © The Author 2017. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology.

  7. Stop identity cue as a cue to language identity

    NASA Astrophysics Data System (ADS)

    Castonguay, Paula Lisa

    The purpose of the present study was to determine whether language membership could potentially be cued by the acoustic-phonetic detail of word-initial stops and retained all the way through the process of lexical access to aid in language identification. Of particular interest were language-specific differences in Canadian English (CE) and Canadian French (CF) word-initial stops. Experiment 1 consisted of an interlingual homophone production task. The purpose of this study was to examine how word-initial stop consonants differ in terms of acoustic properties in CE and CF interlingual homophones. The analyses from the bilingual speakers in Experiment 1 indicate that bilinguals do produce language-specific differences in CE and CF word-initial stops, and that closure duration, voice onset time, and burst spectral SD may provide cues to language identity in CE and CF stops. Experiment 2 consisted of a Phoneme and Language Categorization task. The purpose of this study was to examine how stop identity cues, such as VOT and closure duration, influence a listener to identify word-initial stop consonants as belonging to CE or CF. The RTs from the bilingual listeners in this study indicate that bilinguals do perceive language-specific differences in CE and CF word-initial stops, and that voice onset time may provide cues to phoneme and language membership in CE and CF stops. Experiment 3 consisted of a Phonological-Semantic priming task. The purpose of this study was to examine how subphonetic variations, such as changes in the VOT, affect lexical access. The results of Experiment 3 suggest that language-specific cues, such as VOT, affect the composition of the bilingual cohort and that the extent to which English and/or French words are activated depends on the language-specific cues present in a word. The findings of this study enhance our theoretical understanding of lexical structure and lexical access in bilingual speakers. In addition, this study provides further insight into cross-language effects at the subphonetic level.

  8. Salience Effects: L2 Sentence Production as a Window on L1 Speech Planning

    ERIC Educational Resources Information Center

    Antón-Méndez, Inés; Gerfen, Chip; Ramos, Miguel

    2016-01-01

    Salience influences grammatical structure during production in a language-dependent manner because different languages afford different options to satisfy preferences. During production, speakers may always try to satisfy all syntactic encoding preferences (e.g., salient entities to be mentioned early, themes to be assigned the syntactic function…

  9. Rapid Learning of Syllable Classes from a Perceptually Continuous Speech Stream

    ERIC Educational Resources Information Center

    Endress, Ansgar D.; Bonatti, Luca L.

    2007-01-01

    To learn a language, speakers must learn its words and rules from fluent speech; in particular, they must learn dependencies among linguistic classes. We show that when familiarized with a short artificial, subliminally bracketed stream, participants can learn relations about the structure of its words, which specify the classes of syllables…

  10. Speaker-dependent Multipitch Tracking Using Deep Neural Networks

    DTIC Science & Technology

    2015-01-01

    connections through time. Studies have shown that RNNs are good at modeling sequential data like handwriting [12] and speech [26]. We plan to explore RNNs in...Schmidhuber, and S. Fernández, “Unconstrained on-line handwriting recognition with recurrent neural networks,” in Proceedings of NIPS, 2008, pp. 577–584. [13

  11. A Usage-Based Approach to Preposition Placement in English as a Second Language

    ERIC Educational Resources Information Center

    Jach, Daniel

    2018-01-01

    This study examined the acquisition of preposition placement in English as a second language from a usage-based perspective. German and Chinese learners of English and English native speakers rated the acceptability of English oblique "wh" relative clauses in a magnitude estimation task. Results indicated that acceptability depended on…

  12. Code-Switching in Persian-English and Telugu-English Conversations: With a Focus on Light Verb Constructions

    ERIC Educational Resources Information Center

    Moradi, Hamzeh

    2014-01-01

    Depending on the demands of a particular communicative situation, bilingual or multilingual speakers ("bilingualism-multilingualism") will switch between language varieties. Code-switching is the practice of moving between variations of languages in different contexts. In an educational context, code-switching is defined as the practice…

  13. Applied Linguistics and the Use of Minority Languages in Education

    ERIC Educational Resources Information Center

    Cenoz, Jasone; Gorter, Durk

    2008-01-01

    Research on minority languages is ordinarily not well known by speakers of "big" languages but it has focused on several areas of Applied Linguistics and it is relevant to many areas. This current volume of "AILA Review" features five articles. Each of the articles emphasizes some aspects of research, depending on the recent…

  14. Encouraging Students to Engage with Native Speakers during Study Abroad

    ERIC Educational Resources Information Center

    Cadd, Marc

    2012-01-01

    Students, their parents, and educators trust that a study-abroad experience is the best way to increase linguistic proficiency. The professional literature, however, shows a much more complex picture. Gains in linguistic proficiency appear to depend on variables such as whether the students experience a homestay or dormitory, the length of time…

  15. Poorer Phonetic Perceivers Show Greater Benefit in Phonetic-Phonological Speech Learning

    ERIC Educational Resources Information Center

    Ingvalson, Erin M.; Barr, Allison M.; Wong, Patrick C. M.

    2013-01-01

    Purpose: Previous research has demonstrated that native English speakers can learn lexical tones in word context (pitch-to-word learning), to an extent. However, learning success depends on learners' pre-training sensitivity to pitch patterns. The aim of this study was to determine whether lexical pitch-pattern training given before lexical…

  16. Characteristics of Speaking Rate in the Dysarthria Associated with Amyotrophic Lateral Sclerosis.

    ERIC Educational Resources Information Center

    Turner, Greg S.; Weismer, Gary

    1993-01-01

    The ability to alter speaking rate was studied in nine adult subjects with amyotrophic lateral sclerosis and nine control subjects. Results suggest that the relationship between speaking rate, articulation rate, pause duration, and pause frequency remained largely intact for the dysarthric speakers. Data showed greater dependence on pausing by the…

  17. The Speaker Gender Gap at Critical Care Conferences.

    PubMed

    Mehta, Sangeeta; Rose, Louise; Cook, Deborah; Herridge, Margaret; Owais, Sawayra; Metaxa, Victoria

    2018-06-01

    To review women's participation as faculty at five critical care conferences over 7 years. Retrospective analysis of five scientific programs to identify the proportion of females and each speaker's profession based on conference conveners, program documents, or internet research. Three international (European Society of Intensive Care Medicine, International Symposium on Intensive Care and Emergency Medicine, Society of Critical Care Medicine) and two national (Critical Care Canada Forum, U.K. Intensive Care Society State of the Art Meeting) annual critical care conferences held between 2010 and 2016. Female faculty speakers. None. Male speakers outnumbered female speakers at all five conferences, in all 7 years. Overall, women represented 5-31% of speakers, and female physicians represented 5-26% of speakers. Nursing and allied health professional faculty represented 0-25% of speakers; in general, more than 50% of allied health professionals were women. Over the 7 years, Society of Critical Care Medicine had the highest representation of female (27% overall) and nursing/allied health professional (16-25%) speakers; notably, male physicians substantially outnumbered female physicians in all years (62-70% vs 10-19%, respectively). Women's representation on conference program committees ranged from 0% to 40%, with Society of Critical Care Medicine having the highest representation of women (26-40%). The female proportions of speakers, physician speakers, and program committee members increased significantly over time at the Society of Critical Care Medicine and U.K. Intensive Care Society State of the Art Meeting conferences (p < 0.05), but there was no temporal change at the other three conferences. There is a speaker gender gap at critical care conferences, with male faculty outnumbering female faculty. This gap is more marked among physician speakers than those speakers representing nursing and allied health professionals. Several organizational strategies can address this gender gap.

  18. Listeners' comprehension of uptalk in spontaneous speech.

    PubMed

    Tomlinson, John M; Fox Tree, Jean E

    2011-04-01

    Listeners' comprehension of phrase final rising pitch on declarative utterances, or uptalk, was examined to test the hypothesis that prolongations might differentiate conflicting functions of rising pitch. In Experiment 1 we found that listeners rated prolongations as indicating more speaker uncertainty, but that rising pitch was unrelated to ratings. In Experiment 2 we found that prolongations interacted with rising pitch when listeners monitored for words in the subsequent utterance. Words preceded by prolonged uptalk were monitored faster than words preceded by non-prolonged uptalk. In Experiment 3 we found that the interaction between rising pitch and prolongations depended on listeners' beliefs about speakers' mental states. Results support the theory that temporal and situational context are important in determining intonational meaning. Copyright © 2010 Elsevier B.V. All rights reserved.

  19. Reflecting on Native Speaker Privilege

    ERIC Educational Resources Information Center

    Berger, Kathleen

    2014-01-01

    The issues surrounding native speakers (NSs) and nonnative speakers (NNSs) as teachers (NESTs and NNESTs, respectively) in the field of teaching English to speakers of other languages (TESOL) are a current topic of interest. In many contexts, the native speaker of English is viewed as the model teacher, thus putting the NEST into a position of…

  20. English Speakers Attend More Strongly than Spanish Speakers to Manner of Motion when Classifying Novel Objects and Events

    ERIC Educational Resources Information Center

    Kersten, Alan W.; Meissner, Christian A.; Lechuga, Julia; Schwartz, Bennett L.; Albrechtsen, Justin S.; Iglesias, Adam

    2010-01-01

    Three experiments provide evidence that the conceptualization of moving objects and events is influenced by one's native language, consistent with linguistic relativity theory. Monolingual English speakers and bilingual Spanish/English speakers tested in an English-speaking context performed better than monolingual Spanish speakers and bilingual…

  1. The contribution of dynamic visual cues to audiovisual speech perception.

    PubMed

    Jaekl, Philip; Pesquita, Ana; Alsius, Agnes; Munhall, Kevin; Soto-Faraco, Salvador

    2015-08-01

    Seeing a speaker's facial gestures can significantly improve speech comprehension, especially in noisy environments. However, the nature of the visual information from the speaker's facial movements that is relevant for this enhancement is still unclear. Like auditory speech signals, visual speech signals unfold over time and contain both dynamic configural information and luminance-defined local motion cues; two information sources that are thought to engage anatomically and functionally separate visual systems. Whereas some past studies have highlighted the importance of local, luminance-defined motion cues in audiovisual speech perception, the contribution of dynamic configural information signalling changes in form over time has not yet been assessed. We therefore attempted to single out the contribution of dynamic configural information to audiovisual speech processing. To this end, we measured word identification performance in noise using unimodal auditory stimuli, and with audiovisual stimuli. In the audiovisual condition, speaking faces were presented as point-light displays achieved via motion capture of the original talker. Point-light displays could be isoluminant, to minimise the contribution of effective luminance-defined local motion information, or with added luminance contrast, allowing the combined effect of dynamic configural cues and local motion cues. Audiovisual enhancement was found in both the isoluminant and contrast-based luminance conditions compared to an auditory-only condition, demonstrating, for the first time, the specific contribution of dynamic configural cues to audiovisual speech improvement. These findings imply that globally processed changes in a speaker's facial shape contribute significantly towards the perception of articulatory gestures and the analysis of audiovisual speech. Copyright © 2015 Elsevier Ltd. All rights reserved.

  2. Hybrid Speaker Recognition Using Universal Acoustic Model

    NASA Astrophysics Data System (ADS)

    Nishimura, Jun; Kuroda, Tadahiro

    We propose a novel speaker recognition approach using a speaker-independent universal acoustic model (UAM) for sensornet applications. In sensornet applications such as “Business Microscope”, interactions among knowledge workers in an organization can be visualized by sensing face-to-face communication using wearable sensor nodes. In conventional studies, speakers are detected by comparing the energy of input speech signals among the nodes. However, there are often synchronization errors among the nodes which degrade the speaker recognition performance. By focusing on properties of the speaker's acoustic channel, the UAM can provide robustness against the synchronization error. The overall speaker recognition accuracy is improved by combining the UAM with the energy-based approach. For 0.1 s speech inputs and 4 subjects, a speaker recognition accuracy of 94% is achieved at synchronization errors of less than 100 ms.
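
    The UAM is not specified here in enough detail to reproduce, but the energy-based baseline it is combined with can be sketched: each wearable node computes short-time energy, and each frame is attributed to the node with the highest energy. The frame length, node count, and signals below are assumptions.

    ```python
    # Sketch of the energy-based baseline the UAM is combined with: each node
    # reports short-time energy, and speech is attributed to the node with the
    # highest energy. Frame length and node count are assumptions.
    import numpy as np

    def short_time_energy(x, frame_len=1600):  # 0.1 s at 16 kHz
        n_frames = len(x) // frame_len
        frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
        return (frames ** 2).mean(axis=1)

    rng = np.random.default_rng(2)
    # Stand-in signals from 4 nodes; node 2 is closest to the active talker.
    nodes = [rng.normal(scale=0.2 if i != 2 else 1.0, size=16000) for i in range(4)]

    energies = np.stack([short_time_energy(x) for x in nodes])  # (nodes, frames)
    active_node = energies.argmax(axis=0)  # per-frame speaker attribution
    print(active_node)  # mostly 2; synchronization errors would corrupt this
    ```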

  3. Analysis of wolves and sheep. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hogden, J.; Papcun, G.; Zlokarnik, I.

    1997-08-01

    In evaluating speaker verification systems, asymmetries have been observed in the ease with which people are able to break into other people's voice locks. People who are good at breaking into voice locks are called wolves, and people whose locks are easy to break into are called sheep. (Goats are people who have a difficult time opening their own voice locks.) Analyses of speaker verification algorithms could be used to understand wolf/sheep asymmetries. Using the notion of a "speaker space", it is demonstrated that such asymmetries could arise even though the similarity of voice 1 to voice 2 is the same as the inverse similarity. This partially explains the wolf/sheep asymmetries, although there may be other factors. The speaker space can be computed from interspeaker similarity data using multidimensional scaling, and such a speaker space can be used to give a good approximation of the interspeaker similarities. The derived speaker space can be used to predict which of the enrolled speakers are likely to be wolves and which are likely to be sheep. However, a speaker must first enroll in the speaker key system and then be compared to each of the other speakers; a good estimate of a person's speaker space position could be obtained using only a speech sample.
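
    A minimal sketch of deriving a speaker space from interspeaker similarity data with multidimensional scaling, as the report describes, follows. The similarity matrix is random stand-in data, and the similarity-to-dissimilarity mapping is an assumption, since the report does not specify one.

    ```python
    # Sketch of deriving a "speaker space" from interspeaker similarity data via
    # multidimensional scaling. The similarity matrix here is random stand-in
    # data; the similarity-to-dissimilarity mapping is an assumption.
    import numpy as np
    from sklearn.manifold import MDS

    rng = np.random.default_rng(3)
    n_speakers = 12
    sim = rng.uniform(0.1, 1.0, size=(n_speakers, n_speakers))
    sim = (sim + sim.T) / 2          # symmetrize
    np.fill_diagonal(sim, 1.0)       # each speaker is maximally similar to itself

    dissim = 1.0 - sim               # assumed mapping to dissimilarities
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    speaker_space = mds.fit_transform(dissim)  # (n_speakers, 2) coordinates

    # Speakers far from the centroid are candidate "wolf"/"sheep" outliers.
    dist_from_centroid = np.linalg.norm(speaker_space - speaker_space.mean(0), axis=1)
    print(dist_from_centroid.round(2))
    ```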

  4. Investigating Auditory Processing of Syntactic Gaps with L2 Speakers Using Pupillometry

    ERIC Educational Resources Information Center

    Fernandez, Leigh; Höhle, Barbara; Brock, Jon; Nickels, Lyndsey

    2018-01-01

    According to the Shallow Structure Hypothesis (SSH), second language (L2) speakers, unlike native speakers, build shallow syntactic representations during sentence processing. In order to test the SSH, this study investigated the processing of a syntactic movement in both native speakers of English and proficient late L2 speakers of English using…

  5. A Model of Mandarin Tone Categories--A Study of Perception and Production

    ERIC Educational Resources Information Center

    Yang, Bei

    2010-01-01

    The current study lays the groundwork for a model of Mandarin tones based on both native speakers' and non-native speakers' perception and production. It demonstrates that there is variability in non-native speakers' tone productions and that there are differences in the perceptual boundaries in native speakers and non-native speakers. There…

  6. Literacy Skill Differences between Adult Native English and Native Spanish Speakers

    ERIC Educational Resources Information Center

    Herman, Julia; Cote, Nicole Gilbert; Reilly, Lenore; Binder, Katherine S.

    2013-01-01

    The goal of this study was to compare the literacy skills of adult native English and native Spanish ABE speakers. Participants were 169 native English speakers and 124 native Spanish speakers recruited from five prior research projects. The results showed that the native Spanish speakers were less skilled on morphology and passage comprehension…

  7. Grammatical Planning Units during Real-Time Sentence Production in Speakers with Agrammatic Aphasia and Healthy Speakers

    ERIC Educational Resources Information Center

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K.

    2015-01-01

    Purpose: Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia…

  8. Development of panel loudspeaker system: design, evaluation and enhancement.

    PubMed

    Bai, M R; Huang, T

    2001-06-01

    Panel speakers are investigated in terms of structural vibration and acoustic radiation. A panel speaker primarily consists of a panel and an inertia exciter. In contrast to conventional speakers, flexural resonance is encouraged such that the panel vibrates as randomly as possible. Simulation tools are developed to facilitate system integration of panel speakers. In particular, electro-mechanical analogy, finite element analysis, and the fast Fourier transform are employed to predict panel vibration and the acoustic radiation. Design procedures are also summarized. In order to compare the panel speakers with conventional speakers, experimental investigations were undertaken to evaluate frequency response, directional response, sensitivity, efficiency, and harmonic distortion of both speakers. The results revealed that the panel speakers suffered from low sensitivity and efficiency. To alleviate the problem, a woofer using electronic compensation based on the H2 model-matching principle is utilized to supplement the bass response. As indicated in the results, significant improvement over the panel speaker alone was achieved by using the combined panel-woofer system.
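
    One of the evaluations listed above, harmonic distortion, can be estimated from the FFT of a speaker's response to a pure tone. The sketch below does this on a synthetic "measurement" with known harmonic content; the signal and the number of harmonics summed are assumptions.

    ```python
    # Sketch of a total harmonic distortion (THD) estimate from the FFT of a
    # speaker driven with a pure tone. The synthetic "measurement" below stands
    # in for recorded data, which this record does not provide.
    import numpy as np

    fs, f0, dur = 48_000, 1_000, 1.0
    t = np.arange(int(fs * dur)) / fs
    # Stand-in measured response: fundamental plus weak 2nd/3rd harmonics.
    response = (np.sin(2 * np.pi * f0 * t)
                + 0.02 * np.sin(2 * np.pi * 2 * f0 * t)
                + 0.01 * np.sin(2 * np.pi * 3 * f0 * t))

    spectrum = np.abs(np.fft.rfft(response)) / len(response)
    freqs = np.fft.rfftfreq(len(response), d=1 / fs)

    def amp_at(f):  # amplitude of the bin nearest frequency f
        return spectrum[np.argmin(np.abs(freqs - f))]

    fundamental = amp_at(f0)
    harmonics = [amp_at(k * f0) for k in range(2, 6)]
    thd = np.sqrt(sum(a ** 2 for a in harmonics)) / fundamental
    print(f"THD: {100 * thd:.2f}%")  # ~2.24% for the tone above
    ```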

  9. Word Durations in Non-Native English

    PubMed Central

    Baker, Rachel E.; Baese-Berk, Melissa; Bonnasse-Gahot, Laurent; Kim, Midam; Van Engen, Kristin J.; Bradlow, Ann R.

    2010-01-01

    In this study, we compare the effects of English lexical features on word duration for native and non-native English speakers and for non-native speakers with different L1s and a range of L2 experience. We also examine whether non-native word durations lead to judgments of a stronger foreign accent. We measured word durations in English paragraphs read by 12 American English (AE), 20 Korean, and 20 Chinese speakers. We also had AE listeners rate the 'accentedness' of these non-native speakers. AE speech had shorter durations, greater within-speaker word duration variance, greater reduction of function words, and less between-speaker variance than non-native speech. However, both AE and non-native speakers showed sensitivity to lexical predictability by reducing second mentions and high-frequency words. Non-native speakers with more native-like word durations, greater within-speaker word duration variance, and greater function word reduction were perceived as less accented. Overall, these findings identify word duration as an important and complex feature of foreign-accented English. PMID:21516172

  10. Experimental study on GMM-based speaker recognition

    NASA Astrophysics Data System (ADS)

    Ye, Wenxing; Wu, Dapeng; Nucci, Antonio

    2010-04-01

    Speaker recognition plays a very important role in the field of biometric security. In order to improve recognition performance, many pattern recognition techniques have been explored in the literature. Among these techniques, the Gaussian Mixture Model (GMM) has proved to be an effective statistical model for speaker recognition and is used in most state-of-the-art speaker recognition systems. The GMM is used to represent the 'voice print' of a speaker by modeling the spectral characteristics of the speaker's speech signals. In this paper, we implement a speaker recognition system which consists of preprocessing, Mel-Frequency Cepstrum Coefficients (MFCCs) based feature extraction, and GMM-based classification. We test our system with the TIDIGITS data set (325 speakers) and our own recordings of more than 200 speakers; our system achieves a 100% correct recognition rate. Moreover, we also test our system under the scenario in which training samples are from one language but test samples are from a different language; our system again achieves a 100% correct recognition rate, which indicates that our system is language independent.
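
    A minimal sketch of the pipeline this abstract describes, MFCC features plus one GMM per enrolled speaker scored by log-likelihood, follows. The file paths are hypothetical, and the mixture count and MFCC settings are common defaults rather than the paper's exact configuration.

    ```python
    # Sketch of the pipeline described above: MFCC front-end plus one GMM per
    # enrolled speaker, scored by log-likelihood. Settings are common defaults,
    # not necessarily those used in the paper.
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_features(path):
        y, sr = librosa.load(path, sr=16_000)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

    # Enrollment: fit a GMM "voice print" on each speaker's training audio.
    train_files = {"alice": "alice_train.wav", "bob": "bob_train.wav"}  # hypothetical paths
    models = {
        name: GaussianMixture(n_components=16, covariance_type="diag").fit(mfcc_features(f))
        for name, f in train_files.items()
    }

    def identify(path):
        """Return the enrolled speaker whose GMM best explains the test audio."""
        feats = mfcc_features(path)
        return max(models, key=lambda name: models[name].score(feats))

    print(identify("unknown.wav"))  # hypothetical test file
    ```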

  11. Bystander capability to activate speaker function for continuous dispatcher assisted CPR in case of suspected cardiac arrest.

    PubMed

    Steensberg, Alvilda T; Eriksen, Mette M; Andersen, Lars B; Hendriksen, Ole M; Larsen, Heinrich D; Laier, Gunnar H; Thougaard, Thomas

    2017-06-01

    The European Resuscitation Council Guidelines 2015 recommend that bystanders activate their mobile phone speaker function, if possible, in case of suspected cardiac arrest. This is to facilitate continuous dialogue with the dispatcher including (if required) cardiopulmonary resuscitation instructions. The aim of this study was to measure bystander capability to activate the speaker function in case of suspected cardiac arrest. Over 87 days, a systematic prospective registration of bystander capability to activate the speaker function, when cardiac arrest was suspected, was performed. For those asked, "can you activate your mobile phone's speaker function", audio recordings were examined and categorized into groups according to the bystander's capability to activate the speaker function on their own initiative, without instructions, or with instructions from the emergency medical dispatcher. Time delay was measured, in seconds, for the bystanders without pre-activated speaker function. 42.0% (58) were able to activate the speaker function without instructions, 2.9% (4) with instructions, 18.1% (25) on their own initiative and 37.0% (51) were unable to activate the speaker function. The median time to activate the speaker function was 19 s and 8 s, with and without instructions, respectively. Dispatcher-assisted cardiopulmonary resuscitation with activated speaker function, in cases of suspected cardiac arrest, allows for continuous dialogue between the emergency medical dispatcher and the bystander. In this study, we found a 63.0% success rate of activating the speaker function in such situations. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Hemispheric lateralization of linguistic prosody recognition in comparison to speech and speaker recognition.

    PubMed

    Kreitewolf, Jens; Friederici, Angela D; von Kriegstein, Katharina

    2014-11-15

    Hemispheric specialization for linguistic prosody is a controversial issue. While it is commonly assumed that linguistic prosody and emotional prosody are preferentially processed in the right hemisphere, neuropsychological work directly comparing processes of linguistic prosody and emotional prosody suggests a predominant role of the left hemisphere for linguistic prosody processing. Here, we used two functional magnetic resonance imaging (fMRI) experiments to clarify the role of left and right hemispheres in the neural processing of linguistic prosody. In the first experiment, we sought to confirm previous findings showing that linguistic prosody processing compared to other speech-related processes predominantly involves the right hemisphere. Unlike previous studies, we controlled for stimulus influences by employing a prosody and speech task using the same speech material. The second experiment was designed to investigate whether a left-hemispheric involvement in linguistic prosody processing is specific to contrasts between linguistic prosody and emotional prosody or whether it also occurs when linguistic prosody is contrasted against other non-linguistic processes (i.e., speaker recognition). Prosody and speaker tasks were performed on the same stimulus material. In both experiments, linguistic prosody processing was associated with activity in temporal, frontal, parietal and cerebellar regions. Activation in temporo-frontal regions showed differential lateralization depending on whether the control task required recognition of speech or speaker: recognition of linguistic prosody predominantly involved right temporo-frontal areas when it was contrasted against speech recognition; when contrasted against speaker recognition, recognition of linguistic prosody predominantly involved left temporo-frontal areas. The results show that linguistic prosody processing involves functions of both hemispheres and suggest that recognition of linguistic prosody is based on an inter-hemispheric mechanism which exploits both a right-hemispheric sensitivity to pitch information and a left-hemispheric dominance in speech processing. Copyright © 2014 Elsevier Inc. All rights reserved.

  13. "I May Be a Native Speaker but I'm Not Monolingual": Reimagining "All" Teachers' Linguistic Identities in TESOL

    ERIC Educational Resources Information Center

    Ellis, Elizabeth M.

    2016-01-01

    Teacher linguistic identity has so far mainly been researched in terms of whether a teacher identifies (or is identified by others) as a native speaker (NEST) or nonnative speaker (NNEST) (Moussu & Llurda, 2008; Reis, 2011). Native speakers are presumed to be monolingual, and nonnative speakers, although by definition bilingual, tend to be…

  14. Stromal-epithelial dynamics in response to fractionated radiotherapy

    NASA Astrophysics Data System (ADS)

    Rong, Panying

    The speech of individuals with velopharyngeal incompetency (VPI) is characterized by hypernasality, a speech quality related to excessive emission of acoustic energy through the nose, caused by failure of velopharyngeal closure. In an attempt to reduce hypernasality and, in turn, improve the quality of VPI-related hypernasal speech, this study is dedicated to developing an approach that uses speech-dependent articulatory adjustments to reduce hypernasality caused by excessive velopharyngeal opening (VPO). A preliminary study was done to derive such articulatory adjustments for hypernasal /i/ vowels based on the simulation of an articulatory model (Speech Processing and Synthesis Toolboxes, Childers (2000)). Nasal /i/ vowels both with and without articulatory adjustments were synthesized by the model. Spectral analysis found that nasal acoustic features were attenuated and oral formant structures were restored after articulatory adjustments. In addition, comparisons of perceptual ratings of nasality between the two types of nasal vowels showed that the articulatory adjustments generated by the model significantly reduced the perception of nasality for nasal /i/ vowels. Such articulatory adjustments for nasal /i/ have two patterns: 1) a consistent adjustment pattern, which corresponds to an expansion at the velopharynx, and 2) some speech-dependent fine-tuning adjustment patterns, including adjustments in the lip area and the upper pharynx. The long-term goal of this study is to apply this approach of articulatory adjustment as a therapeutic tool in clinical speech treatment to detect and correct the maladaptive articulatory behaviors developed spontaneously by speakers with VPI on an individual basis. This study constructed a speaker-adaptive articulatory model on the basis of the framework of Childers's vocal tract model to simulate articulatory adjustments aimed at compensating for the acoustic outcome caused by velopharyngeal opening and reducing nasality. To construct such a speaker-adaptive articulatory model, (1) an articulatory-acoustic-aerodynamic database was recorded using articulography and aerodynamic instruments to provide point-wise articulatory data to be fitted into the framework of Childers's standard vocal tract model; (2) the length and transverse dimension of the vocal tract were adjusted to fit the individual speaker by minimizing the acoustic discrepancy between the model simulation and the target derived from the acoustic signal in the database, using the simulated annealing algorithm; (3) the articulatory space of the model was adjusted to fit individual articulatory features by adapting the movement ranges of all articulators. With the speaker-adaptive articulatory model, the articulatory configurations of the oral and nasal vowels in the database were simulated and synthesized. Given the acoustic targets derived from the oral vowels in the database, speech-dependent articulatory adjustments were simulated to compensate for the acoustic outcome caused by VPO. The resultant articulatory configurations correspond to nasal vowels with articulatory adjustment, which were synthesized to serve as the perceptual stimuli for a listening task of nasality rating. The oral and nasal vowels synthesized based on the oral and nasal vowel targets in the database also served as perceptual stimuli. The results suggest both acoustic and perceptual effects of the model-generated articulatory adjustment on the nasal vowels /a/, /i/ and /u/. In terms of acoustics, the articulatory adjustment (1) restores the formant structures altered by nasal coupling, including shifted formant frequency, attenuated formant intensity and expanded formant bandwidth, and (2) attenuates the peaks and zeros caused by nasal resonances. Perceptually, the articulatory adjustment generated by the speaker-adaptive model significantly reduces the perceived nasality for all three vowels (/a/, /i/, /u/). The acoustic and perceptual effects of articulatory adjustment suggest achievement of the acoustic goal of compensating for the acoustic discrepancy caused by VPO and the auditory goal of reducing the perception of nasality. Such a finding is consistent with motor equivalence (Hughes and Abbs, 1976; Maeda, 1990), which enables inter-articulator coordination to compensate for the deviation from the acoustic/auditory goal caused by the shifted position of an articulator. The articulatory adjustment responsible for the acoustic and perceptual effects described above was decomposed into a set of empirical orthogonal modes (Story and Titze, 1998). Both gross articulatory patterns and fine-tuning adjustments were found in the principal orthogonal modes, which led to the acoustic compensation and reduction of nasality. For /a/ and /i/, a direct relationship was found among the acoustic features, nasality, and articulatory adjustment patterns. Specifically, the articulatory adjustments indicated by the principal orthogonal modes of the adjusted nasal /a/ and /i/ were directly correlated with the attenuation of the acoustic cues of nasality (i.e., shifting of F1 and F2 frequencies) and the reduction of nasality ratings. For /u/, such a direct relationship among the acoustic features, nasality and articulatory adjustment was less prominent, suggesting the possibility of additional acoustic correlates of nasality beyond F1 and F2. The findings of this study demonstrate the possibility of using articulatory adjustment to reduce the perception of nasality through model simulation. A speaker-adaptive articulatory model is able to simulate individual-based articulatory adjustment strategies that can be applied in clinical settings to serve as articulatory targets for correction of the maladaptive articulatory behaviors developed spontaneously by speakers with hypernasal speech. Such a speaker-adaptive articulatory model provides an intuitive way of articulatory learning and self-training for speakers with VPI to learn appropriate articulatory strategies through model-speaker interaction.
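
    The empirical orthogonal mode decomposition cited above (Story and Titze, 1998) is, in essence, a principal component analysis of mean-removed articulatory or area-function profiles. The sketch below illustrates the general technique on synthetic data; it is not the dissertation's code, and the array shapes and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical input: each row is one vocal tract area function
# (cross-sectional areas sampled at fixed points from glottis to lips).
area_functions = np.random.rand(200, 44)  # 200 frames, 44 sections

# Empirical orthogonal modes are the principal components of the
# mean-removed profiles, obtained here via an SVD of the data matrix.
mean_profile = area_functions.mean(axis=0)
centered = area_functions - mean_profile
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

modes = Vt                           # orthogonal modes (one per row)
coefficients = U * s                 # per-frame weight on each mode
variance_explained = s**2 / np.sum(s**2)

# Any profile is approximated by the mean plus a weighted sum of the
# leading modes; a few modes usually capture most of the variance,
# which is why gross patterns can be read off the first components.
k = 2
approx = mean_profile + coefficients[:, :k] @ modes[:k, :]
```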

  15. How Cognitive Load Influences Speakers' Choice of Referring Expressions.

    PubMed

    Vogels, Jorrig; Krahmer, Emiel; Maes, Alfons

    2015-08-01

    We report on two experiments investigating the effect of an increased cognitive load for speakers on the choice of referring expressions. Speakers produced story continuations to addressees, in which they referred to characters that were either salient or non-salient in the discourse. In Experiment 1, referents that were salient for the speaker were non-salient for the addressee, and vice versa. In Experiment 2, all discourse information was shared between speaker and addressee. Cognitive load was manipulated by the presence or absence of a secondary task for the speaker. The results show that speakers under load are more likely to produce pronouns, at least when referring to less salient referents. We take this finding as evidence that speakers under load have more difficulties taking discourse salience into account, resulting in the use of expressions that are more economical for themselves. © 2014 Cognitive Science Society, Inc.

  16. The irreversibility of sensitive period effects in language development: evidence from second language acquisition in international adoptees.

    PubMed

    Norrman, Gunnar; Bylund, Emanuel

    2016-05-01

    The question of a sensitive period in language acquisition has been subject to extensive research and debate for more than half a century. While it has been well established that the ability to learn new languages declines in early years, the extent to which this outcome depends on biological maturation in contrast to previously acquired knowledge remains disputed. In the present study, we addressed this question by examining phonetic discriminatory abilities in early second language (L2) speakers of Swedish, who had either maintained their first language (L1) (immigrants) or had lost it (international adoptees), using native speaker controls. Through this design, we sought to disentangle the effects of the maturational state of the learner on L2 development from the effects of L1 interference: if additional language development is indeed constrained by an interfering L1, then adoptees should outperform immigrant speakers. The results of an auditory lexical decision task, in which fine vowel distinctions in Swedish had been modified, showed, however, no difference between the L2 groups. Instead, both L2 groups scored significantly lower than the native speaker group. The three groups did not differ in their ability to discriminate non-modified words. These findings demonstrate that L1 loss is not a crucial condition for successfully acquiring an L2, which in turn is taken as support for a maturational constraints view on L2 acquisition. A video abstract of this article can be viewed at: https://youtu.be/1J9X50aePeU. © 2015 John Wiley & Sons Ltd.

  17. Cross-language differences in the brain network subserving intelligible speech.

    PubMed

    Ge, Jianqiao; Peng, Gang; Lyu, Bingjiang; Wang, Yi; Zhuo, Yan; Niu, Zhendong; Tan, Li Hai; Leff, Alexander P; Gao, Jia-Hong

    2015-03-10

    How is language processed in the brain by native speakers of different languages? Is there one brain system for all languages or are different languages subserved by different brain systems? The first view emphasizes commonality, whereas the second emphasizes specificity. We investigated the cortical dynamics involved in processing two very diverse languages: a tonal language (Chinese) and a nontonal language (English). We used functional MRI and dynamic causal modeling analysis to compute and compare brain network models exhaustively with all possible connections among nodes of language regions in temporal and frontal cortex and found that the information flow from the posterior to anterior portions of the temporal cortex was commonly shared by Chinese and English speakers during speech comprehension, whereas the inferior frontal gyrus received neural signals from the left posterior portion of the temporal cortex in English speakers and from the bilateral anterior portion of the temporal cortex in Chinese speakers. Our results revealed that, although speech processing is largely carried out in the common left hemisphere classical language areas (Broca's and Wernicke's areas) and anterior temporal cortex, speech comprehension across different language groups depends on how these brain regions interact with each other. Moreover, the right anterior temporal cortex, which is crucial for tone processing, is equally important as its left homolog, the left anterior temporal cortex, in modulating the cortical dynamics in tone language comprehension. The current study pinpoints the importance of the bilateral anterior temporal cortex in language comprehension that is downplayed or even ignored by popular contemporary models of speech comprehension.

  18. Cross-language differences in the brain network subserving intelligible speech

    PubMed Central

    Ge, Jianqiao; Peng, Gang; Lyu, Bingjiang; Wang, Yi; Zhuo, Yan; Niu, Zhendong; Tan, Li Hai; Leff, Alexander P.; Gao, Jia-Hong

    2015-01-01

    How is language processed in the brain by native speakers of different languages? Is there one brain system for all languages or are different languages subserved by different brain systems? The first view emphasizes commonality, whereas the second emphasizes specificity. We investigated the cortical dynamics involved in processing two very diverse languages: a tonal language (Chinese) and a nontonal language (English). We used functional MRI and dynamic causal modeling analysis to compute and compare brain network models exhaustively with all possible connections among nodes of language regions in temporal and frontal cortex and found that the information flow from the posterior to anterior portions of the temporal cortex was commonly shared by Chinese and English speakers during speech comprehension, whereas the inferior frontal gyrus received neural signals from the left posterior portion of the temporal cortex in English speakers and from the bilateral anterior portion of the temporal cortex in Chinese speakers. Our results revealed that, although speech processing is largely carried out in the common left hemisphere classical language areas (Broca’s and Wernicke’s areas) and anterior temporal cortex, speech comprehension across different language groups depends on how these brain regions interact with each other. Moreover, the right anterior temporal cortex, which is crucial for tone processing, is equally important as its left homolog, the left anterior temporal cortex, in modulating the cortical dynamics in tone language comprehension. The current study pinpoints the importance of the bilateral anterior temporal cortex in language comprehension that is downplayed or even ignored by popular contemporary models of speech comprehension. PMID:25713366

  19. The Role of the Auditory Brainstem in Processing Linguistically-Relevant Pitch Patterns

    ERIC Educational Resources Information Center

    Krishnan, Ananthanarayan; Gandour, Jackson T.

    2009-01-01

    Historically, the brainstem has been neglected as a part of the brain involved in language processing. We review recent evidence of language-dependent effects in pitch processing based on comparisons of native vs. nonnative speakers of a tonal language from electrophysiological recordings in the auditory brainstem. We argue that there is enhancing…

  20. The Role of Mother Tongue Literacy in Language Learning and Mathematical Learning: Is There a Multilingual Benefit for Both?

    ERIC Educational Resources Information Center

    Dahm, Rebecca; De Angelis, Gessica

    2018-01-01

    The present study examines the multilingual benefit in relation to language learning and mathematical learning. The objective is to assess whether speakers of three or more languages, depending on language profile and personal histories, show significant advantages in language learning and/or mathematical learning, and whether mother tongue…

  1. The Effects of Learning English as a Second Language on the Acquisition of a New Phonemic Contrast.

    ERIC Educational Resources Information Center

    Streeter, Lynn A.; Landauer, Thomas K.

    Very sharp discrimination functions for the timing of voice onset relative to stop release characterize perceptual boundaries between certain pairs of stop consonants for adult speakers of many languages. To explore how these discriminations depend on experience, their development was studied among Kikuyu children, whose native language contains…

  2. Time to English Reading Proficiency. Research Brief. RB 1201

    ERIC Educational Resources Information Center

    Shneyderman, Aleksandr; Froman, Terry

    2012-01-01

    The time it takes for an English Language Learner (ELL) to reach reading proficiency in English depends on the grade level of entry into the English for Speakers of Other Languages (ESOL) program and on the student's initial English proficiency level. The summary table below presents the average years to English proficiency across different grade…

  3. Wednesday's Meeting Really Is on Friday: A Meta-Analysis and Evaluation of Ambiguous Spatiotemporal Language

    ERIC Educational Resources Information Center

    Stickles, Elise; Lewis, Tasha N.

    2018-01-01

    Experimental work has shown that spatial experiences influence spatiotemporal metaphor use. In these studies, participants are asked a question that yields different responses depending on the metaphor participants use. It has been claimed that English speakers are equally likely to respond with either variant in the absence of priming. Related…

  4. Argumentation and Participation in the Primary Mathematics Classroom: Two Episodes and Related Theoretical Abductions

    ERIC Educational Resources Information Center

    Krummheuer, Gotz

    2007-01-01

    The main assumption of this article is that learning mathematics depends on the student's participation in processes of collective argumentation. On the empirical level, such processes will be analyzed with Toulmin's theory of argumentation and Goffman's idea of decomposition of the speaker's role. On the theoretical level, different statuses of…

  5. Publishing Sami Literature--From Christian Translations to Sami Publishing Houses

    ERIC Educational Resources Information Center

    Paltto, Kirsti

    2010-01-01

    Publishing in the Sami languages has always been difficult. The Sami are currently spread across four countries, Norway, Sweden, Finland, and Russia. There are nine different Sami languages, some of them with only a few speakers. The Sami publishing industry is entirely dependent on government funding as it does not have its own funds nor is there…

  6. The Meaning of English Words across Cultures, with a Focus on Cameroon and Hong Kong

    ERIC Educational Resources Information Center

    Bobda, Augustin Simo

    2009-01-01

    A word, even when considered monosemic, generally has a cluster of meanings, depending on the mental representation of the referent by the speaker/writer or listener/reader. The variation is even more noticeable across cultures. This paper investigates the different ways in which cultural knowledge helps in the interpretation of English lexical…

  7. Local and State Relations: Proceedings of the Junior College Conference (Ocean Springs, Mississippi, June 24-26, 1968).

    ERIC Educational Resources Information Center

    Roberts, Dayton Y., Ed.

    Several speakers addressed this conference of the Southeastern Regional Junior College Leadership Program and the Mississippi Junior College Commission. (1) The balance of state and local relations (J. L. Wattenbarger) depends on current trends: the change from local to state or federal finance, national standards required by population mobility,…

  8. Contemplating Regretted Messages: Learning-Oriented, Repair-Oriented, and Emotion-Focused Reflection

    ERIC Educational Resources Information Center

    Meyer, Janet R.

    2013-01-01

    Regretted messages provide speakers an opportunity to learn. Whether learning occurs should depend upon how the incident is processed. This study had two objectives: (a) to determine how the goal a message conflicts with and seriousness influence the emotion(s) evoked; and (b) to determine which variables predict adoption of learning-oriented,…

  9. The Different Time Course of Phonotactic Constraint Learning in Children and Adults: Evidence from Speech Errors

    ERIC Educational Resources Information Center

    Smalle, Eleonore H. M.; Muylle, Merel; Szmalec, Arnaud; Duyck, Wouter

    2017-01-01

    Speech errors typically respect the speaker's implicit knowledge of language-wide phonotactics (e.g., /t/ cannot be a syllable onset in the English language). Previous work demonstrated that adults can learn novel experimentally induced phonotactic constraints by producing syllable strings in which the allowable position of a phoneme depends on…

  10. The Timing of Island Effects in Nonnative Sentence Processing

    ERIC Educational Resources Information Center

    Felser, Claudia; Cunnings, Ian; Batterham, Claire; Clahsen, Harald

    2012-01-01

    Using the eye-movement monitoring technique in two reading comprehension experiments, this study investigated the timing of constraints on wh-dependencies (so-called island constraints) in first- and second-language (L1 and L2) sentence processing. The results show that both L1 and L2 speakers of English are sensitive to extraction islands during…

  11. An EMA/EPG Study of Vowel-to-Vowel Articulation across Velars in Southern British English

    ERIC Educational Resources Information Center

    Fletcher, Janet

    2004-01-01

    Recent studies have attested that the extent of transconsonantal vowel-to-vowel coarticulation is at least partly dependent on degree of prosodic accentuation, in languages like English. A further important factor is the mutual compatibility of consonant and vowel gestures associated with the segments in question. In this study two speakers of…

  12. Long-Term Experience with Chinese Language Shapes the Fusiform Asymmetry of English Reading

    PubMed Central

    Mei, Leilei; Xue, Gui; Lu, Zhong-Lin; Chen, Chuansheng; Wei, Miao; He, Qinghua; Dong, Qi

    2015-01-01

    Previous studies have suggested differential engagement of the bilateral fusiform gyrus in the processing of Chinese and English. The present study tested the possibility that long-term experience with Chinese language affects the fusiform laterality of English reading by comparing three samples: Chinese speakers, English speakers with Chinese experience, and English speakers without Chinese experience. We found that, when reading words in their respective native language, Chinese and English speakers without Chinese experience differed in functional laterality of the posterior fusiform region (right laterality for Chinese speakers, but left laterality for English speakers). More importantly, compared with English speakers without Chinese experience, English speakers with Chinese experience showed more recruitment of the right posterior fusiform cortex for English words and pseudowords, which is similar to how Chinese speakers processed Chinese. These results suggest that long-term experience with Chinese shapes the fusiform laterality of English reading and have important implications for our understanding of the cross-language influences in terms of neural organization and of the functions of different fusiform subregions in reading. PMID:25598049

  13. The Effects of Self-Disclosure on Male and Female Perceptions of Individuals Who Stutter.

    PubMed

    Byrd, Courtney T; McGill, Megann; Gkalitsiou, Zoi; Cappellini, Colleen

    2017-02-01

    The purpose of this study was to examine the influence of self-disclosure on observers' perceptions of persons who stutter. Participants (N = 173) were randomly assigned to view 2 of 4 possible videos (i.e., male self-disclosure, male no self-disclosure, female self-disclosure, and female no self-disclosure). After viewing both videos, participants completed a survey assessing their perceptions of the speakers. Controlling for observer and speaker gender, listeners were more likely to select speakers who self-disclosed their stuttering as more friendly, outgoing, and confident compared with speakers who did not self-disclose. Observers were more likely to select speakers who did not self-disclose as unfriendly and shy compared with speakers who used a self-disclosure statement. Controlling for self-disclosure and observer gender, observers were less likely to choose the female speaker as friendlier, outgoing, and confident compared with the male speaker. Observers also were more likely to select the female speaker as unfriendly, shy, unintelligent, and insecure compared with the male speaker and were more likely to report that they were more distracted when viewing the videos. Results lend support to the effectiveness of self-disclosure as a technique that persons who stutter can use to positively influence the perceptions of listeners.

  14. A Comparison of Coverbal Gesture Use in Oral Discourse Among Speakers With Fluent and Nonfluent Aphasia

    PubMed Central

    Law, Sam-Po; Chak, Gigi Wan-Chi

    2017-01-01

    Purpose: Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method: Multimedia data of discourse samples from these speakers were extracted from the Cantonese AphasiaBank. Gestures were independently annotated on their forms and functions to determine how gesturing rate and distribution of gestures differed across speaker groups. A multiple regression was conducted to determine the most predictive variable(s) for gesture-to-word ratio. Results: Although speakers with nonfluent aphasia gestured most frequently, the rate of gesture use in counterparts with fluent aphasia did not differ significantly from controls. Different patterns of gesture functions in the 3 speaker groups revealed that gesture plays a minor role in lexical retrieval whereas its role in enhancing communication dominates among the speakers with aphasia. The percentages of complete sentences and dysfluency strongly predicted the gesturing rate in aphasia. Conclusions: The current results supported the sketch model of language–gesture association. The relationship between gesture production and linguistic abilities and clinical implications for gesture-based language intervention for speakers with aphasia are also discussed. PMID:28609510

  15. Accounting for the listener: comparing the production of contrastive intonation in typically-developing speakers and speakers with autism.

    PubMed

    Kaland, Constantijn; Swerts, Marc; Krahmer, Emiel

    2013-09-01

    The present research investigates what drives the prosodic marking of contrastive information. For example, a typically developing speaker of a Germanic language like Dutch generally refers to a pink car as a "PINK car" (accented words in capitals) when a previously mentioned car was red. The main question addressed in this paper is whether contrastive intonation is produced with respect to the speaker's or (also) the listener's perspective on the preceding discourse. Furthermore, this research investigates the production of contrastive intonation by typically developing speakers and speakers with autism. The latter group is investigated because people with autism are argued to have difficulties accounting for another person's mental state and exhibit difficulties in the production and perception of accentuation and pitch range. To this end, utterances with contrastive intonation are elicited from both groups and analyzed in terms of function and form of prosody using production and perception measures. Contrary to expectations, typically developing speakers and speakers with autism produce functionally similar contrastive intonation as both groups account for both their own and their listener's perspective. However, typically developing speakers use a larger pitch range and are perceived as speaking more dynamically than speakers with autism, suggesting differences in their use of prosodic form.

  16. A Comparison of Coverbal Gesture Use in Oral Discourse Among Speakers With Fluent and Nonfluent Aphasia.

    PubMed

    Kong, Anthony Pak-Hin; Law, Sam-Po; Chak, Gigi Wan-Chi

    2017-07-12

    Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Multimedia data of discourse samples from these speakers were extracted from the Cantonese AphasiaBank. Gestures were independently annotated on their forms and functions to determine how gesturing rate and distribution of gestures differed across speaker groups. A multiple regression was conducted to determine the most predictive variable(s) for gesture-to-word ratio. Although speakers with nonfluent aphasia gestured most frequently, the rate of gesture use in counterparts with fluent aphasia did not differ significantly from controls. Different patterns of gesture functions in the 3 speaker groups revealed that gesture plays a minor role in lexical retrieval whereas its role in enhancing communication dominates among the speakers with aphasia. The percentages of complete sentences and dysfluency strongly predicted the gesturing rate in aphasia. The current results supported the sketch model of language-gesture association. The relationship between gesture production and linguistic abilities and clinical implications for gesture-based language intervention for speakers with aphasia are also discussed.

  17. Measurement of trained speech patterns in stuttering: interjudge and intrajudge agreement of experts by means of modified time-interval analysis.

    PubMed

    Alpermann, Anke; Huber, Walter; Natke, Ulrich; Willmes, Klaus

    2010-09-01

    Improved fluency after stuttering therapy is usually measured by the percentage of stuttered syllables. However, outcome studies rarely evaluate the use of trained speech patterns that speakers use to manage stuttering. This study investigated whether the modified time interval analysis can distinguish between trained speech patterns, fluent speech, and stuttered speech. Seventeen German experts on stuttering judged a speech sample on two occasions. Speakers of the sample were stuttering adults, who were not undergoing therapy, as well as participants in a fluency shaping and a stuttering modification therapy. Results showed satisfactory inter-judge and intra-judge agreement above 80%. Intervals with trained speech patterns were identified as consistently as stuttered and fluent intervals. We discuss limitations of the study, as well as implications of our findings for the development of training for identification of trained speech patterns and future outcome studies. The reader will be able to (a) explain different methods to measure the use of trained speech patterns, (b) evaluate whether German experts are able to discriminate intervals with trained speech patterns reliably from fluent and stuttered intervals and (c) describe how the measurement of trained speech patterns can contribute to outcome studies.
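
    The computation behind the agreement figures in (modified) time-interval analysis is straightforward: the sample is divided into fixed-length intervals, each judge assigns every interval one label (fluent, stuttered, or trained speech pattern), and interjudge agreement is the proportion of identically labeled intervals per judge pair. A minimal sketch with hypothetical labels, not the study's data:

```python
from itertools import combinations

# Hypothetical interval judgments: one label per fixed-length interval,
# per judge ('F' = fluent, 'S' = stuttered, 'T' = trained speech pattern).
judgments = {
    "judge_1": ["F", "S", "T", "T", "F", "S"],
    "judge_2": ["F", "S", "T", "F", "F", "S"],
    "judge_3": ["F", "S", "T", "T", "F", "F"],
}

def pairwise_agreement(a, b):
    """Proportion of intervals that two judges labeled identically."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Average agreement over all judge pairs (interjudge agreement).
pairs = list(combinations(judgments, 2))
scores = [pairwise_agreement(judgments[p], judgments[q]) for p, q in pairs]
print(sum(scores) / len(scores))

# Intrajudge agreement is the same computation applied to one judge's
# labels from two judging occasions.
```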

  18. The human genome: Some assembly required. Final report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    NONE

    1994-12-31

    The Human Genome Project promises to be one of the most rewarding endeavors in modern biology. The cost and the ethical and social implications, however, have made this project the source of considerable debate both in the scientific community and in the public at large. The 1994 Graduate Student Symposium addresses the scientific merits of the project, the technical issues involved in accomplishing the task, as well as the medical and social issues which stem from the wealth of knowledge which the Human Genome Project will help create. To this end, speakers were brought together who represent the diverse areas of expertise characteristic of this multidisciplinary project. The keynote speaker addresses the project's motivations and goals in the larger context of biological and medical sciences. The first two sessions address relevant technical issues: data collection with a focus on high-throughput sequencing methods, and data analysis with an emphasis on identification of coding sequences. The third session explores recent advances in the understanding of genetic diseases and possible routes to treatment. Finally, the last session addresses some of the ethical, social and legal issues which will undoubtedly arise from having a detailed knowledge of the human genome.

  19. Sentence durations and accentedness judgments

    NASA Astrophysics Data System (ADS)

    Bond, Z. S.; Stockmal, Verna; Markus, Dace

    2003-04-01

    Talkers in a second language can frequently be identified as speaking with a foreign accent. It is not clear to what degree a foreign accent represents specific deviations from a target language versus more general characteristics. We examined the identification of native and non-native talkers by listeners with various amounts of knowledge of the target language. Native and non-native speakers of Latvian provided materials. All the non-native talkers spoke Russian as their first language and were long-term residents of Latvia. A listening test, containing sentences excerpted from a short recorded passage, was presented to three groups of listeners: native speakers of Latvian, Russians for whom Latvian was a second language, and Americans with no knowledge of either of the two languages. The listeners were asked to judge whether each utterance was produced by a native or non-native talker. The Latvians identified the non-native talkers very accurately, 88%. The Russians were somewhat less accurate, 83%. The American listeners were least accurate, but still identified the non-native talkers at above-chance levels, 62%. Sentence durations correlated with the judgments provided by the American listeners but not with the judgments provided by native or L2 listeners.

  20. How Do Speakers Avoid Ambiguous Linguistic Expressions?

    ERIC Educational Resources Information Center

    Ferreira, V.S.; Slevc, L.R.; Rogers, E.S.

    2005-01-01

    Three experiments assessed how speakers avoid linguistically and nonlinguistically ambiguous expressions. Speakers described target objects (a flying mammal, bat) in contexts including foil objects that caused linguistic (a baseball bat) and nonlinguistic (a larger flying mammal) ambiguity. Speakers sometimes avoided linguistic-ambiguity, and they…

  1. Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers.

    PubMed

    Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

    2018-01-01

    The present study investigated the impact of Chinese dialects on the McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, an intra-language comparison of the McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration.

  2. Combining Behavioral and ERP Methodologies to Investigate the Differences Between McGurk Effects Demonstrated by Cantonese and Mandarin Speakers

    PubMed Central

    Zhang, Juan; Meng, Yaxuan; McBride, Catherine; Fan, Xitao; Yuan, Zhen

    2018-01-01

    The present study investigated the impact of Chinese dialects on the McGurk effect using behavioral and event-related potential (ERP) methodologies. Specifically, an intra-language comparison of the McGurk effect was conducted between Mandarin and Cantonese speakers. The behavioral results showed that Cantonese speakers exhibited a stronger McGurk effect in audiovisual speech perception compared to Mandarin speakers, although both groups performed equally in the auditory and visual conditions. ERP results revealed that Cantonese speakers were more sensitive to visual cues than Mandarin speakers, though this was not the case for the auditory cues. Taken together, the current findings suggest that the McGurk effect generated by Chinese speakers is mainly influenced by segmental phonology during audiovisual speech integration. PMID:29780312

  3. Left-lateralized N170 Effects of Visual Expertise in Reading: Evidence from Japanese Syllabic and Logographic Scripts

    PubMed Central

    Maurer, Urs; Zevin, Jason D.; McCandliss, Bruce D.

    2015-01-01

    The N170 component of the event-related potential (ERP) reflects experience-dependent neural changes in several forms of visual expertise, including expertise for visual words. Readers skilled in writing systems that link characters to phonemes (i.e., alphabetic writing) typically produce a left-lateralized N170 to visual word forms. This study examined the N170 in three Japanese scripts that link characters to larger phonological units. Participants were monolingual English speakers (EL1) and native Japanese speakers (JL1) who were also proficient in English. ERPs were collected using a 129-channel array, as participants performed a series of experiments viewing words or novel control stimuli in a repetition detection task. The N170 was strongly left-lateralized for all three Japanese scripts (including logographic Kanji characters) in JL1 participants, but bilateral in EL1 participants viewing these same stimuli. This demonstrates that left-lateralization of the N170 is dependent on specific reading expertise and is not limited to alphabetic scripts. Additional contrasts within the moraic Katakana script revealed equivalent N170 responses in JL1 speakers for familiar Katakana words and for Kanji words transcribed into novel Katakana words, suggesting that the N170 expertise effect is driven by script familiarity rather than familiarity with particular visual word forms. Finally, for English words and novel symbol string stimuli, both EL1 and JL1 subjects produced equivalent responses for the novel symbols, and more left-lateralized N170 responses for the English words, indicating that such effects are not limited to the first language. Taken together, these cross-linguistic results suggest that similar neural processes underlie visual expertise for print in very different writing systems. PMID:18370600

  4. Objective eye-gaze behaviour during face-to-face communication with proficient alaryngeal speakers: a preliminary study.

    PubMed

    Evitts, Paul; Gallop, Robert

    2011-01-01

    There is a large body of research demonstrating the impact of visual information on speaker intelligibility in both normal and disordered speaker populations. However, there is minimal information on which specific visual features listeners find salient during conversational discourse. This study investigated listeners' eye-gaze behaviour during face-to-face conversation with normal, laryngeal and proficient alaryngeal speakers. Sixty participants each took part in a 10-min conversation with one of four speakers (typical laryngeal, tracheoesophageal, oesophageal, electrolaryngeal; 15 participants randomly assigned to each mode of speech). All speakers were > 85% intelligible and were judged to be 'proficient' by two certified speech-language pathologists. Participants were fitted with a head-mounted eye-gaze tracking device (Mobile Eye, ASL) that calculated the region of interest and mean duration of eye-gaze. Self-reported gaze behaviour was also obtained following the conversation using a 10 cm visual analogue scale. While listening, participants viewed the lower facial region of the oesophageal speaker more than that of the normal or tracheoesophageal speakers. Results of non-hierarchical cluster analyses showed that while listening, the pattern of eye-gaze was predominantly directed at the lower face of the oesophageal and electrolaryngeal speakers and more evenly dispersed among the background, lower face, and eyes of the normal and tracheoesophageal speakers. Finally, results show a low correlation between self-reported eye-gaze behaviour and objective region-of-interest data. Overall, results suggest similar eye-gaze behaviour when healthy controls converse with normal and tracheoesophageal speakers, and significantly different eye-gaze patterns when they converse with an oesophageal speaker. Results are discussed in terms of existing eye-gaze data and their potential implications for auditory-visual speech perception. © 2011 Royal College of Speech & Language Therapists.

  5. Speech serial control in healthy speakers and speakers with hypokinetic or ataxic dysarthria: effects of sequence length and practice

    PubMed Central

    Reilly, Kevin J.; Spencer, Kristie A.

    2013-01-01

    The current study investigated the processes responsible for selection of sounds and syllables during production of speech sequences in 10 adults with hypokinetic dysarthria from Parkinson’s disease, five adults with ataxic dysarthria, and 14 healthy control speakers. Speech production data from a choice reaction time task were analyzed to evaluate the effects of sequence length and practice on speech sound sequencing. Speakers produced sequences that were between one and five syllables in length over five experimental runs of 60 trials each. In contrast to the healthy speakers, speakers with hypokinetic dysarthria demonstrated exaggerated sequence length effects for both inter-syllable intervals (ISIs) and speech error rates. Conversely, speakers with ataxic dysarthria failed to demonstrate a sequence length effect on ISIs and were also the only group that did not exhibit practice-related changes in ISIs and speech error rates over the five experimental runs. The exaggerated sequence length effects in the hypokinetic speakers with Parkinson’s disease are consistent with an impairment of action selection during speech sequence production. The absent length effects observed in the speakers with ataxic dysarthria is consistent with previous findings that indicate a limited capacity to buffer speech sequences in advance of their execution. In addition, the lack of practice effects in these speakers suggests that learning-related improvements in the production rate and accuracy of speech sequences involves processing by structures of the cerebellum. Together, the current findings inform models of serial control for speech in healthy speakers and support the notion that sequencing deficits contribute to speech symptoms in speakers with hypokinetic or ataxic dysarthria. In addition, these findings indicate that speech sequencing is differentially impaired in hypokinetic and ataxic dysarthria. PMID:24137121

  6. Discourse comprehension in L2: Making sense of what is not explicitly said.

    PubMed

    Foucart, Alice; Romero-Rivas, Carlos; Gort, Bernharda Lottie; Costa, Albert

    2016-12-01

    Using ERPs, we tested whether L2 speakers can integrate multiple sources of information (e.g., semantic, pragmatic information) during discourse comprehension. We presented native speakers and L2 speakers with three-sentence scenarios in which the final sentence was highly causally related, intermediately related, or causally unrelated to its context; its interpretation therefore required simple or complex inferences. Native speakers revealed a gradual N400-like effect, larger in the causally unrelated condition than in the highly related condition, and falling in-between in the intermediately related condition, replicating previous results. In the crucial intermediately related condition, L2 speakers behaved like native speakers, however, showing extra processing in a later time-window. Overall, the results show that, when reading, L2 speakers are able to process information from the local context and prior information (e.g., world knowledge) to build global coherence, suggesting that they process different sources of information to make inferences online during discourse comprehension, like native speakers. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Gender differences in identifying emotions from auditory and visual stimuli.

    PubMed

    Waaramaa, Teija

    2017-12-01

    The present study focused on gender differences in emotion identification from auditory and visual stimuli produced by two male and two female actors. Differences in emotion identification from nonsense samples, language samples and prolonged vowels were investigated. It was also studied whether auditory stimuli can convey the emotional content of speech without visual stimuli, and whether visual stimuli can convey the emotional content of speech without auditory stimuli. The aim was to get a better knowledge of vocal attributes and a more holistic understanding of the nonverbal communication of emotion. Females tended to be more accurate in emotion identification than males. Voice quality parameters played a role in emotion identification in both genders. The emotional content of the samples was best conveyed by nonsense sentences, better than by prolonged vowels or shared native language of the speakers and participants. Thus, vocal non-verbal communication tends to affect the interpretation of emotion even in the absence of language. The emotional stimuli were better recognized from visual stimuli than auditory stimuli by both genders. Visual information about speech may not be connected to the language; instead, it may be based on the human ability to understand the kinetic movements in speech production more readily than the characteristics of the acoustic cues.

  8. Development of a speaker discrimination test for cochlear implant users based on the Oldenburg Logatome corpus.

    PubMed

    Mühler, Roland; Ziese, Michael; Rostalski, Dorothea

    2009-01-01

    The purpose of the study was to develop a speaker discrimination test for cochlear implant (CI) users. The speech material was drawn from the Oldenburg Logatome (OLLO) corpus, which contains 150 different logatomes read by 40 German and 10 French native speakers. The prototype test battery included 120 logatome pairs spoken by 5 male and 5 female speakers, with balanced representation of the conditions 'same speaker' and 'different speaker'. Ten adult normal-hearing listeners and 12 adult postlingually deafened CI users were included in a study to evaluate the suitability of the test. The mean speaker discrimination score was 67.3% correct for the CI users and 92.2% correct for the normal-hearing listeners. A significant influence of voice gender and fundamental frequency difference on the speaker discrimination score was found in CI users as well as in normal-hearing listeners. Since the test results of the CI users were significantly above chance level and no ceiling effect was observed, we conclude that subsets of the OLLO corpus are very well suited to speaker discrimination experiments in CI users. Copyright 2008 S. Karger AG, Basel.
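
    The test construction described, logatome pairs balanced across 'same speaker' and 'different speaker' conditions drawn from 5 male and 5 female speakers, amounts to a simple pair-sampling procedure. The following sketch shows one way to generate such a balanced trial list; the sampling details are assumptions, not the authors' protocol:

```python
import random

random.seed(1)
speakers = [f"m{i}" for i in range(1, 6)] + [f"f{i}" for i in range(1, 6)]
logatomes = [f"logatome_{i:03d}" for i in range(150)]  # OLLO has 150 logatomes

def make_trials(n_pairs=120):
    """Build a trial list: half 'same speaker', half 'different speaker'."""
    trials = []
    for k in range(n_pairs):
        logatome = random.choice(logatomes)
        if k % 2 == 0:  # same-speaker condition
            s = random.choice(speakers)
            trials.append((logatome, s, s, "same"))
        else:           # different-speaker condition
            s1, s2 = random.sample(speakers, 2)
            trials.append((logatome, s1, s2, "different"))
    random.shuffle(trials)
    return trials

for trial in make_trials()[:3]:
    print(trial)
```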

  9. Speaker Clustering for a Mixture of Singing and Reading (Preprint)

    DTIC Science & Technology

    2012-03-01

    Speaker diarization [2, 3], which answers the question of "who spoke when?", is a combination of speaker segmentation and clustering. Although this paper focuses on speaker clustering, the techniques developed here can be applied to speaker diarization.
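
    A common baseline for the clustering step is agglomerative clustering of fixed-length utterance embeddings, cutting the dendrogram at a distance threshold so that each resulting cluster is hypothesized to be one speaker. A minimal sketch, assuming the embeddings already exist (the feature extractor, e.g., i-vectors or averaged spectral features, is out of scope) and with an illustrative threshold value:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Hypothetical input: one embedding vector per utterance; two synthetic
# "speakers" are simulated as two well-separated Gaussian clouds.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 16)),  # utterances, speaker A
    rng.normal(loc=1.0, scale=0.1, size=(10, 16)),  # utterances, speaker B
])

# Agglomerative clustering on cosine distances; the threshold trades off
# splitting one speaker apart vs. merging two speakers together.
dists = pdist(embeddings, metric="cosine")
tree = linkage(dists, method="average")
labels = fcluster(tree, t=0.5, criterion="distance")
print(labels)  # utterances sharing a label are hypothesized same-speaker
```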

  10. Multicultural issues in test interpretation.

    PubMed

    Langdon, Henriette W; Wiig, Elisabeth H

    2009-11-01

    Designing the ideal test or series of tests to assess individuals who speak languages other than English is difficult. This article first describes some of the roadblocks, one of which is the lack of identification criteria for language and learning disabilities in monolingual and bilingual populations in most countries of the non-English-speaking world. This lag exists, in part, because access to general education is often limited. The second section describes tests that have been developed in the United States, primarily for Spanish-speaking individuals, because they now represent the largest first-language majority in the United States (80% of English-language learners [ELLs] speak Spanish at home). We discuss tests developed for monolingual and bilingual English-Spanish speakers in the United States and divide this coverage into two parts: the first addresses assessment of students' first language (L1) and second language (L2), usually English, with different versions of the same test; the second describes assessment of L1 and L2 using the same version of the test, administered in the two languages. Examples of tests that fit a priori-determined criteria are briefly discussed throughout the article. Suggestions on how to develop tests for speakers of languages other than English are also provided. In conclusion, we maintain that there will never be a perfect test or set of tests to adequately assess the communication skills of a bilingual individual. This is not surprising because we have yet to develop an ideal test or set of tests that fits monolingual Anglo speakers perfectly. Tests are tools, and the speech-language pathologist needs to know how to use those tools most effectively and equitably. The goal of this article is to provide such guidance. Thieme Medical Publishers.

  11. Learning Words from Speakers with False Beliefs

    ERIC Educational Resources Information Center

    Papafragou, Anna; Fairchild, Sarah; Cohen, Matthew L.; Friedberg, Carlyn

    2017-01-01

    During communication, hearers try to infer the speaker's intentions to be able to understand what the speaker means. Nevertheless, whether (and how early) preschoolers track their interlocutors' mental states is still a matter of debate. Furthermore, there is disagreement about how children's ability to consult a speaker's belief in communicative…

  12. International Student Speaker Programs: "Someone from Another World."

    ERIC Educational Resources Information Center

    Wilson, Angene

    This study surveyed members of the Association of International Educators and community volunteers to find out how international student speaker programs actually work. An international student speaker program provides speakers (from the university foreign student population) for community organizations and schools. The results of the survey (49…

  13. Linguistic "Mudes" and the De-Ethnicization of Language Choice in Catalonia

    ERIC Educational Resources Information Center

    Pujolar, Joan; Gonzalez, Isaac

    2013-01-01

    Catalan speakers have traditionally constructed the Catalan language as the main emblem of their identity even as migration filled the country with substantial numbers of speakers of Castilian. Although Catalan speakers have been bilingual in Catalan and Castilian for generations, sociolinguistic research has shown how speakers' bilingual…

  14. Embodied Communication: Speakers' Gestures Affect Listeners' Actions

    ERIC Educational Resources Information Center

    Cook, Susan Wagner; Tanenhaus, Michael K.

    2009-01-01

    We explored how speakers and listeners use hand gestures as a source of perceptual-motor information during naturalistic communication. After solving the Tower of Hanoi task either with real objects or on a computer, speakers explained the task to listeners. Speakers' hand gestures, but not their speech, reflected properties of the particular…

  15. Speech Breathing in Speakers Who Use an Electrolarynx

    ERIC Educational Resources Information Center

    Bohnenkamp, Todd A.; Stowell, Talena; Hesse, Joy; Wright, Simon

    2010-01-01

    Speakers who use an electrolarynx following a total laryngectomy no longer require pulmonary support for speech. Subsequently, chest wall movements may be affected; however, chest wall movements in these speakers are not well defined. The purpose of this investigation was to evaluate speech breathing in speakers who use an electrolarynx during…

  16. A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks.

    PubMed

    Shahamiri, Seyed Reza; Salim, Siti Salwah Binti

    2014-09-01

    Automatic speech recognition (ASR) can be very helpful for speakers who suffer from dysarthria, a neurological disability that damages the control of motor speech articulators. Although a few attempts have been made to apply ASR technologies to sufferers of dysarthria, previous studies show that such ASR systems have not attained an adequate level of performance. In this study, a dysarthric multi-networks speech recognizer (DM-NSR) model is provided using a realization of the multi-views multi-learners approach called multi-nets artificial neural networks, which tolerates the variability of dysarthric speech. In particular, the DM-NSR model employs several ANNs (as learners) to approximate the likelihood of ASR vocabulary words and to deal with the complexity of dysarthric speech. The proposed DM-NSR approach was presented in both speaker-dependent and speaker-independent paradigms. In order to highlight the performance of the proposed model over legacy models, multi-views single-learner versions of the DM-NSR were also provided and their efficiencies were compared in detail. Moreover, a comparison between the prominent dysarthric ASR methods and the proposed one is provided. The results show that the DM-NSR improved the recognition rate by up to 24.67% and reduced the error rate by up to 8.63% over the reference model.
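
    The multi-nets arrangement, several ANNs each approximating the likelihood of one vocabulary word with recognition by the highest-scoring network, can be sketched as a one-network-per-word ensemble. The code below is a hedged illustration on synthetic features (a real system would use MFCC-like acoustic features), not the DM-NSR implementation:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
vocabulary = ["yes", "no", "stop", "go"]

# Hypothetical fixed-length acoustic feature vectors, one per utterance,
# with an integer word label per utterance.
X = rng.normal(size=(400, 20))
y = rng.integers(0, len(vocabulary), size=400)

# One binary ANN per vocabulary word (the "multi-nets" idea): each
# learner approximates the likelihood that an utterance is its word.
nets = {}
for idx, word in enumerate(vocabulary):
    net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
    net.fit(X, (y == idx).astype(int))
    nets[word] = net

def recognize(features):
    """Pick the word whose dedicated network scores the utterance highest."""
    scores = {w: nets[w].predict_proba(features.reshape(1, -1))[0, 1]
              for w in vocabulary}
    return max(scores, key=scores.get)

print(recognize(X[0]))
```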

  17. Auditory evoked fields to vocalization during passive listening and active generation in adults who stutter.

    PubMed

    Beal, Deryk S; Cheyne, Douglas O; Gracco, Vincent L; Quraan, Maher A; Taylor, Margot J; De Nil, Luc F

    2010-10-01

    We used magnetoencephalography to investigate auditory evoked responses to speech vocalizations and non-speech tones in adults who do and do not stutter. Neuromagnetic field patterns were recorded as participants listened to a 1 kHz tone, playback of their own productions of the vowel /i/ and vowel-initial words, and actively generated the vowel /i/ and vowel-initial words. Activation of the auditory cortex at approximately 50 and 100 ms was observed during all tasks. A reduction in the peak amplitudes of the M50 and M100 components was observed during the active generation versus passive listening tasks dependent on the stimuli. Adults who stutter did not differ in the amount of speech-induced auditory suppression relative to fluent speakers. Adults who stutter had shorter M100 latencies for the actively generated speaking tasks in the right hemisphere relative to the left hemisphere but the fluent speakers showed similar latencies across hemispheres. During passive listening tasks, adults who stutter had longer M50 and M100 latencies than fluent speakers. The results suggest that there are timing, rather than amplitude, differences in auditory processing during speech in adults who stutter and are discussed in relation to hypotheses of auditory-motor integration breakdown in stuttering. Copyright 2010 Elsevier Inc. All rights reserved.

  18. Top–Down Modulation on the Perception and Categorization of Identical Pitch Contours in Speech and Music

    PubMed Central

    Weidema, Joey L.; Roncaglia-Denissen, M. P.; Honing, Henkjan

    2016-01-01

    Whether pitch in language and music is governed by domain-specific or domain-general cognitive mechanisms is contentiously debated. The aim of the present study was to investigate whether mechanisms governing pitch contour perception operate differently when pitch information is interpreted as either speech or music. By modulating listening mode, this study aspired to demonstrate that pitch contour perception relies on domain-specific cognitive mechanisms, which are regulated by top–down influences from language and music. Three groups of participants (Mandarin speakers, Dutch speaking non-musicians, and Dutch musicians) were exposed to identical pitch contours, and tested on their ability to identify these contours in a language and musical context. Stimuli consisted of disyllabic words spoken in Mandarin, and melodic tonal analogs, embedded in a linguistic and melodic carrier phrase, respectively. Participants classified identical pitch contours as significantly different depending on listening mode. Top–down influences from language appeared to alter the perception of pitch contour in speakers of Mandarin. This was not the case for non-musician speakers of Dutch. Moreover, this effect was lacking in Dutch speaking musicians. The classification patterns of pitch contours in language and music seem to suggest that domain-specific categorization is modulated by top–down influences from language and music. PMID:27313552

  19. Speaker-independent factors affecting the perception of foreign accent in a second language

    PubMed Central

    Levi, Susannah V.; Winters, Stephen J.; Pisoni, David B.

    2012-01-01

    Previous research on foreign accent perception has largely focused on speaker-dependent factors such as age of learning and length of residence. Factors that are independent of a speaker's language learning history have also been shown to affect perception of second language speech. The present study examined the effects of two such factors, listening context and lexical frequency, on the perception of foreign-accented speech. Listeners rated foreign accent in two listening contexts: auditory-only, where listeners only heard the target stimuli, and auditory+orthography, where listeners were presented with both an auditory signal and an orthographic display of the target word. Results revealed that higher frequency words were consistently rated as less accented than lower frequency words. The effect of listening context emerged in two interactions: the auditory+orthography context reduced the effects of lexical frequency but increased the perceived differences between native and non-native speakers. Acoustic measurements revealed some production differences for words of different levels of lexical frequency, though these differences could not account for all of the observed interactions from the perceptual experiment. These results suggest that factors independent of the speakers' actual speech articulations can influence the perception of degree of foreign accent. PMID:17471745

  20. 2016 Microbial Stress Response GRC/GRS

    DTIC Science & Technology

    2016-09-13

    Holyoke College, South Hadley, MA. Chairs: Eduardo A. Groisman & Dianne K. Newman; Vice Chairs: Petra A. Levin & William W. Navarre. Program fragment: Contributors... by Discussion Leader; 9:10 am - 9:35 am, Martin Ackermann (ETH Zurich, Switzerland), "History-Dependence in Bacterial Stress Response – Scaling up from..." Registration-list fragment: Ackermann, Martin, ETH Zurich, Speaker, Registered; Andersson, Dan I, Uppsala

  1. STS-41 Voice Command System Flight Experiment Report

    NASA Technical Reports Server (NTRS)

    Salazar, George A.

    1981-01-01

    This report presents the results of the Voice Command System (VCS) flight experiment on the five-day STS-41 mission. Two mission specialists, Bill Shepherd and Bruce Melnick, used the speaker-dependent system to evaluate the operational effectiveness of using voice to control a spacecraft system. In addition, data were gathered to analyze the effects of microgravity on speech recognition performance.

  2. Processing Subject-Verb Agreement in a Second Language Depends on Proficiency

    ERIC Educational Resources Information Center

    Hoshino, Noriko; Dussias, Paola E.; Kroll, Judith F.

    2010-01-01

    Subject-verb agreement is a computation that is often difficult to execute perfectly in the first language (L1) and even more difficult to produce skillfully in a second language (L2). In this study, we examine the way in which bilingual speakers complete sentence fragments in a manner that reflects access to both grammatical and conceptual…

  3. Second Language Processing: When Are First and Second Languages Processed Similarly?

    ERIC Educational Resources Information Center

    Sabourin, Laura; Stowe, Laurie A.

    2008-01-01

    In this article we investigate the effects of first language (L1) on second language (L2) neural processing for two grammatical constructions (verbal domain dependency and grammatical gender), focusing on the event-related potential P600 effect, which has been found in both L1 and L2 processing. Native Dutch speakers showed a P600 effect for both…

  4. Processing of Tense Morphology and Filler-Gap Dependencies by Chinese Second Language Speakers of English

    ERIC Educational Resources Information Center

    Dong, Zhiyin Renee

    2014-01-01

    There is an ongoing debate in the field of Second Language Acquisition concerning whether a fundamental difference exists between the native language (L1) and adult second language (L2) online processing of syntax and morpho-syntax. The Shallow Structure Hypothesis (SSH) (Clahsen and Felser, 2006a, b) states that L2 online parsing is qualitatively…

  5. What Do You Teach...? The Role of Argument in Rhetorical Invention: An Integrated Skills Approach.

    ERIC Educational Resources Information Center

    Stelzner, Sara L.

    Speaking and writing should be taught together as they both are concerned with the communication model that includes a speaker, a listener, and a subject and the way these elements affect each other. In speaking, it is clear that invention is a public process depending on the listener's or receiver's active participation in the creation of…

  6. Timed and Untimed Grammaticality Judgments Measure Distinct Types of Knowledge: Evidence from Eye-Movement Patterns

    ERIC Educational Resources Information Center

    Godfroid, Aline; Loewen, Shawn; Jung, Sehoon; Park, Ji-Hyun; Gass, Susan; Ellis, Rod

    2015-01-01

    Grammaticality judgment tests (GJTs) have been used to elicit data reflecting second language (L2) speakers' knowledge of L2 grammar. However, the exact constructs measured by GJTs, whether primarily implicit or explicit knowledge, are disputed and have been argued to differ depending on test-related variables (i.e., time pressure and item…

  7. The speakers' bureau system: a form of peer selling.

    PubMed

    Reid, Lynette; Herder, Matthew

    2013-01-01

    In the speakers' bureau system, physicians are recruited and trained by pharmaceutical, biotechnology, and medical device companies to deliver information about products to other physicians, in exchange for a fee. Using publicly available disclosures, we assessed the thesis that speakers' bureau involvement is not a feature of academic medicine in Canada, by estimating the prevalence of participation in speakers' bureaus among Canadian faculty in one medical specialty, cardiology. We analyzed the relevant features of an actual contract made public by the physician addressee and applied the Canadian Medical Association (CMA) guidelines on physician-industry relations to participation in a speakers' bureau. We argue that speakers' bureau participation constitutes a form of peer selling that should be understood to contravene the prohibition on product endorsement in the CMA Code of Ethics. Academic medical institutions, in conjunction with regulatory colleges, should continue and strengthen their policies to address participation in speakers' bureaus.

  8. Simultaneous Talk--From the Perspective of Floor Management of English and Japanese Speakers.

    ERIC Educational Resources Information Center

    Hayashi, Reiko

    1988-01-01

    Investigates simultaneous talk in face-to-face conversation using the analytic framework of "floor" proposed by Edelsky (1981). Analysis of taped conversation among speakers of Japanese and among speakers of English shows that, while both groups use simultaneous talk, it is used more frequently by Japanese speakers. A reference list…

  9. Respiratory Control in Stuttering Speakers: Evidence from Respiratory High-Frequency Oscillations.

    ERIC Educational Resources Information Center

    Denny, Margaret; Smith, Anne

    2000-01-01

    This study examined whether stuttering speakers (N=10) differed from fluent speakers in relations between the neural control systems for speech and life support. It concluded that in some stuttering speakers the relations between respiratory controllers are atypical, but that high participation by the high frequency oscillation-producing circuitry…

  10. The Effects of Source Unreliability on Prior and Future Word Learning

    ERIC Educational Resources Information Center

    Faught, Gayle G.; Leslie, Alicia D.; Scofield, Jason

    2015-01-01

    Young children regularly learn words from interactions with other speakers, though not all speakers are reliable informants. Interestingly, children will revert to trusting a reliable speaker when a previously endorsed speaker proves unreliable. When later asked to identify the referent of a novel word, children who revert trust are less willing…

  11. The Semantic Basis of Do So.

    ERIC Educational Resources Information Center

    Binder, Richard

    The thesis of this paper is that the "do so" test described by Lakoff and Ross (1966) is a test of the speaker's belief system regarding the relationship of verbs to their surface subject, and that judgments of grammaticality concerning "do so" are based on the speaker's underlying semantic beliefs. ("Speaker" refers here to both speakers and…

  12. Speaker Reliability Guides Children's Inductive Inferences about Novel Properties

    ERIC Educational Resources Information Center

    Kim, Sunae; Kalish, Charles W.; Harris, Paul L.

    2012-01-01

    Prior work shows that children can make inductive inferences about objects based on their labels rather than their appearance (Gelman, 2003). A separate line of research shows that children's trust in a speaker's label is selective. Children accept labels from a reliable speaker over an unreliable speaker (e.g., Koenig & Harris, 2005). In the…

  13. Native-Speakerism and the Complexity of Personal Experience: A Duoethnographic Study

    ERIC Educational Resources Information Center

    Lowe, Robert J.; Kiczkowiak, Marek

    2016-01-01

    This paper presents a duoethnographic study into the effects of native-speakerism on the professional lives of two English language teachers, one "native", and one "non-native speaker" of English. The goal of the study was to build on and extend existing research on the topic of native-speakerism by investigating, through…

  14. Research Timeline: Second Language Communication Strategies

    ERIC Educational Resources Information Center

    Kennedy, Sara; Trofimovich, Pavel

    2016-01-01

    Speakers of a second language (L2), regardless of proficiency level, communicate for specific purposes. For example, an L2 speaker of English may wish to build rapport with a co-worker by chatting about the weather. The speaker will draw on various resources to accomplish her communicative purposes. For instance, the speaker may say "falling…

  15. Word Stress and Pronunciation Teaching in English as a Lingua Franca Contexts

    ERIC Educational Resources Information Center

    Lewis, Christine; Deterding, David

    2018-01-01

    Traditionally, pronunciation was taught by reference to native-speaker models. However, as speakers around the world increasingly interact in English as a lingua franca (ELF) contexts, there is less focus on native-speaker targets, and there is wide acceptance that achieving intelligibility is crucial while mimicking native-speaker pronunciation…

  16. Defining "Native Speaker" in Multilingual Settings: English as a Native Language in Asia

    ERIC Educational Resources Information Center

    Hansen Edwards, Jette G.

    2017-01-01

    The current study examines how and why speakers of English from multilingual contexts in Asia are identifying as native speakers of English. Eighteen participants from different contexts in Asia, including Singapore, Malaysia, India, Taiwan, and The Philippines, who self-identified as native speakers of English participated in hour-long interviews…

  17. Speaker Identity Supports Phonetic Category Learning

    ERIC Educational Resources Information Center

    Mani, Nivedita; Schneider, Signe

    2013-01-01

    Visual cues from the speaker's face, such as the discriminable mouth movements used to produce speech sounds, improve discrimination of these sounds by adults. The speaker's face, however, provides more information than just the mouth movements used to produce speech--it also provides a visual indexical cue of the identity of the speaker. The…

  18. The Interpretability Hypothesis: Evidence from Wh-Interrogatives in Second Language Acquisition

    ERIC Educational Resources Information Center

    Tsimpli, Ianthi Maria; Dimitrakopoulou, Maria

    2007-01-01

    The second language acquisition (SLA) literature reports numerous studies of proficient second language (L2) speakers who diverge significantly from native speakers despite the evidence offered by the L2 input. Recent SLA theories have attempted to account for native speaker/non-native speaker (NS/NNS) divergence by arguing for the dissociation…

  19. The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex, and age

    NASA Astrophysics Data System (ADS)

    Smith, David R. R.; Patterson, Roy D.

    2005-11-01

    Glottal-pulse rate (GPR) and vocal-tract length (VTL) are related to the size, sex, and age of the speaker but it is not clear how the two factors combine to influence our perception of speaker size, sex, and age. This paper describes experiments designed to measure the effect of the interaction of GPR and VTL upon judgements of speaker size, sex, and age. Vowels were scaled to represent people with a wide range of GPRs and VTLs, including many well beyond the normal range of the population, and listeners were asked to judge the size and sex/age of the speaker. The judgements of speaker size show that VTL has a strong influence upon perceived speaker size. The results for the sex and age categorization (man, woman, boy, or girl) show that, for vowels with GPR and VTL values in the normal range, judgements of speaker sex and age are influenced about equally by GPR and VTL. For vowels with abnormal combinations of low GPRs and short VTLs, the VTL information appears to decide the sex/age judgement.

  20. Comparison of singer's formant, speaker's ring, and LTA spectrum among classical singers and untrained normal speakers.

    PubMed

    Oliveira Barrichelo, V M; Heuer, R J; Dean, C M; Sataloff, R T

    2001-09-01

    Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.
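
    A minimal sketch of the LTAS computation this record relies on, assuming a mono WAV recording and numpy/scipy; the file name and the 2-4 kHz "speaker's ring" band edges are illustrative assumptions, not the study's exact settings.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import welch

    # Load and peak-normalize the all-voiced reading task (hypothetical file)
    sr, x = wavfile.read("reading_task.wav")
    x = x.astype(np.float64)
    peak = np.max(np.abs(x))
    if peak > 0:
        x /= peak

    # Long-term average spectrum: Welch-averaged power spectrum over the whole reading
    freqs, psd = welch(x, fs=sr, nperseg=4096, noverlap=2048)

    # Relative energy in the singer's formant / speaker's ring region (~2-4 kHz)
    band = (freqs >= 2000) & (freqs <= 4000)
    ring_db = 10 * np.log10(psd[band].sum() / psd.sum())
    print(f"Energy in 2-4 kHz band relative to total: {ring_db:.1f} dB")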

  1. Talker and accent variability effects on spoken word recognition

    NASA Astrophysics Data System (ADS)

    Nyang, Edna E.; Rogers, Catherine L.; Nishi, Kanae

    2003-04-01

    A number of studies have shown that words in a list are recognized less accurately in noise and with longer response latencies when they are spoken by multiple talkers, rather than a single talker. These results have been interpreted as support for an exemplar-based model of speech perception, in which it is assumed that detailed information regarding the speaker's voice is preserved in memory and used in recognition, rather than being eliminated via normalization. In the present study, the effects of varying both accent and talker are investigated using lists of words spoken by (a) a single native English speaker, (b) six native English speakers, (c) three native English speakers and three Japanese-accented English speakers. Twelve /hVd/ words were mixed with multi-speaker babble at three signal-to-noise ratios (+10, +5, and 0 dB) to create the word lists. Native English-speaking listeners' percent-correct recognition for words produced by native English speakers across the three talker conditions (single talker native, multi-talker native, and multi-talker mixed native and non-native) and three signal-to-noise ratios will be compared to determine whether sources of speaker variability other than voice alone add to the processing demands imposed by simple (i.e., single accent) speaker variability in spoken word recognition.
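
    The babble-mixing step lends itself to a short sketch: scale the noise so that the speech-to-noise power ratio hits the target SNR. A hedged Python sketch, with the array names (speech, babble) assumed rather than taken from the study's materials:

    import numpy as np

    def mix_at_snr(speech: np.ndarray, babble: np.ndarray, snr_db: float) -> np.ndarray:
        """Return speech plus babble scaled to the requested SNR in dB."""
        babble = babble[:len(speech)]              # trim noise to token length
        p_speech = np.mean(speech ** 2)
        p_babble = np.mean(babble ** 2)
        # Choose scale so that 10*log10(p_speech / (scale**2 * p_babble)) == snr_db
        scale = np.sqrt(p_speech / (p_babble * 10 ** (snr_db / 10)))
        return speech + scale * babble

    # e.g., one stimulus per condition:
    # stimuli = {snr: mix_at_snr(token, babble, snr) for snr in (10, 5, 0)}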

  2. Speaker and Observer Perceptions of Physical Tension during Stuttering.

    PubMed

    Tichenor, Seth; Leslie, Paula; Shaiman, Susan; Yaruss, J Scott

    2017-01-01

    Speech-language pathologists routinely assess physical tension during evaluation of those who stutter. If speakers experience tension that is not visible to clinicians, then judgments of severity may be inaccurate. This study addressed this potential discrepancy by comparing judgments of tension by people who stutter and expert clinicians to determine if clinicians could accurately identify the speakers' experience of physical tension. Ten adults who stutter were audio-video recorded in two speaking samples. Two board-certified specialists in fluency evaluated the samples using the Stuttering Severity Instrument-4 and a checklist adapted for this study. Speakers rated their tension using the same forms, and then discussed their experiences in a qualitative interview so that themes related to physical tension could be identified. The degree of tension reported by speakers was higher than that observed by specialists. Tension in parts of the body that were less visible to the observer (chest, abdomen, throat) was reported more by speakers than by specialists. The thematic analysis revealed that speakers' experience of tension changes over time and that these changes may be related to speakers' acceptance of stuttering. The lack of agreement between speaker and specialist perceptions of tension suggests that using self-reports is a necessary component for supporting the accurate diagnosis of tension in stuttering. © 2018 S. Karger AG, Basel.

  3. Non-English speakers attend gastroenterology clinic appointments at higher rates than English speakers in a vulnerable patient population

    PubMed Central

    Sewell, Justin L.; Kushel, Margot B.; Inadomi, John M.; Yee, Hal F.

    2009-01-01

    Goals: We sought to identify factors associated with gastroenterology clinic attendance in an urban safety net healthcare system. Background: Missed clinic appointments reduce the efficiency and availability of healthcare, but subspecialty clinic attendance among patients with established healthcare access has not been studied. Study: We performed an observational study using secondary data from administrative sources to study patients referred to, and scheduled for an appointment in, the adult gastroenterology clinic serving the safety net healthcare system of San Francisco, California. Our dependent variable was whether subjects attended or missed a scheduled appointment. Analysis included multivariable logistic regression and classification tree analysis. 1,833 patients were referred and scheduled for an appointment between 05/2005 and 08/2006. Prisoners were excluded. All patients had a primary care provider. Results: 683 patients (37.3%) missed their appointment; 1,150 (62.7%) attended. Language was highly associated with attendance in the logistic regression; non-English speakers were less likely than English speakers to miss an appointment (adjusted odds ratio 0.42 [0.28, 0.63] for Spanish, 0.56 [0.38, 0.82] for Asian language, p < 0.001). Other factors were also associated with attendance, but classification tree analysis identified language as the variable most strongly associated with attendance. Conclusions: In an urban safety net healthcare population, among patients with established healthcare access and a scheduled gastroenterology clinic appointment, not speaking English was most strongly associated with higher attendance rates. Patient-related factors associated with not speaking English likely influence subspecialty clinic attendance rates, and these factors may differ from those affecting general healthcare access. PMID:19169147
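
    For readers wanting to reproduce this style of analysis, a hypothetical sketch of a multivariable logistic regression with adjusted odds ratios, using statsmodels; the file, column names, and covariates are invented placeholders, not the study's data.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("gi_clinic_referrals.csv")    # hypothetical dataset

    # missed = 1 if the appointment was missed, 0 if attended;
    # English is set as the reference level for language
    model = smf.logit("missed ~ C(language, Treatment('English')) + age + C(sex)",
                      data=df).fit()

    odds_ratios = np.exp(model.params)             # adjusted odds ratios
    ci = np.exp(model.conf_int())                  # 95% CIs on the OR scale
    print(pd.concat([odds_ratios, ci], axis=1))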

  4. Education techniques for lifelong learning: giving a PowerPoint presentation: the art of communicating effectively.

    PubMed

    Collins, Jannette

    2004-01-01

    Effectiveness of an oral presentation depends on the ability of the speaker to communicate with the audience. An important part of this communication is focusing on two to five key points and emphasizing those points during the presentation. Every aspect of the presentation should be purposeful and directed at facilitating learners' achievement of the objectives. This necessitates that the speaker has carefully developed the objectives and built the presentation around attainment of the objectives. The best presentations are rehearsed, not so that the speaker memorizes exactly what he or she will say, but to facilitate the speaker's ability to interact with the audience and portray a relaxed, professional, and confident demeanor. Rehearsal also helps alleviate stage fright. The most useful method of controlling nervousness is to visualize success. When showing images, it is important to orient the audience with an adequate description, point out the relevant findings, and allow enough time for the audience to assimilate the information before moving on. This can be facilitated with appropriate use of a laser pointer, cursor, or use of builds and transitioning. A presentation should be designed to include as much audience participation as possible, no matter the size of the audience. Techniques to encourage audience participation include questioning, brainstorming, small-group activities, role-playing, case-based examples, and directed listening. It is first necessary to motivate and gain attention of the learner for learning to take place. This can be accomplished through appropriate use of humor, anecdotes, and quotations. Attention should be given to posture, body movement, eye contact, and voice when speaking, as how one appears to the audience will have an impact on their reaction to what is presented. Copyright RSNA, 2004

  5. Speech Prosody Across Stimulus Types for Individuals with Parkinson's Disease.

    PubMed

    K-Y Ma, Joan; Schneider, Christine B; Hoffmann, Rüdiger; Storch, Alexander

    2015-01-01

    Up to 89% of individuals with Parkinson's disease (PD) experience speech problems over the course of the disease. Speech prosody and intelligibility are two of the areas most affected in hypokinetic dysarthria. However, assessment of these areas can be problematic, as speech prosody and intelligibility may be affected by the type of speech materials employed. This study comparatively explored the effects of different types of speech stimulus on speech prosody and intelligibility in PD speakers. Speech prosody and intelligibility of two groups of individuals with varying degrees of dysarthria resulting from PD were compared to those of a group of control speakers using sentence reading, passage reading, and monologue. Acoustic analysis, including measures of fundamental frequency (F0), intensity, and speech rate, was used to form a prosodic profile for each individual. Speech intelligibility was measured for the speakers with dysarthria using direct magnitude estimation. A difference in F0 variability between the speakers with dysarthria and control speakers was observed only in the sentence reading task. A difference in average intensity level was observed between speakers with mild dysarthria and control speakers. Additionally, there were stimulus effects on both intelligibility and the prosodic profile: the prosodic profile of PD speakers differed from that of the control speakers in the more structured tasks, and lower intelligibility was found in the less structured task. This highlights the value of both structured and natural stimuli for evaluating speech production in PD speakers.
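
    A hedged sketch of the kind of prosodic profile described here (F0 variability, average intensity, speech rate), using librosa; the file name, pitch range, and syllable count are placeholders rather than the study's measures.

    import numpy as np
    import librosa

    y, sr = librosa.load("monologue.wav", sr=None)     # hypothetical recording

    # F0 contour and its variability in semitones around the median
    f0, voiced_flag, voiced_prob = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    f0 = f0[~np.isnan(f0)]
    f0_var_semitones = np.std(12 * np.log2(f0 / np.median(f0)))

    # Average intensity level from the RMS energy contour
    rms = librosa.feature.rms(y=y)[0]
    mean_level_db = float(np.mean(librosa.amplitude_to_db(rms)))

    # Speech rate, given a (placeholder) syllable count for the sample
    n_syllables = 120
    speech_rate = n_syllables / (len(y) / sr)          # syllables per second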

  6. Social dominance orientation, nonnative accents, and hiring recommendations.

    PubMed

    Hansen, Karolina; Dovidio, John F

    2016-10-01

    Discrimination against nonnative speakers is widespread and largely socially acceptable. Nonnative speakers are evaluated negatively because accent is a sign that they belong to an outgroup and because understanding their speech requires unusual effort from listeners. The present research investigated intergroup bias, based on stronger support for hierarchical relations between groups (social dominance orientation [SDO]), as a predictor of hiring recommendations of nonnative speakers. In an online experiment using an adaptation of the thin-slices methodology, 65 U.S. adults (54% women; 80% White; Mage = 35.91, range = 18-67) heard a recording of a job applicant speaking with an Asian (Mandarin Chinese) or a Latino (Spanish) accent. Participants indicated how likely they would be to recommend hiring the speaker, answered questions about the text, and indicated how difficult it was to understand the applicant. Independent of objective comprehension, participants high in SDO reported that it was more difficult to understand a Latino speaker than an Asian speaker. SDO predicted hiring recommendations of the speakers, but this relationship was mediated by the perception that nonnative speakers were difficult to understand. This effect was stronger for speakers from lower status groups (Latinos relative to Asians) and was not related to objective comprehension. These findings suggest a cycle of prejudice toward nonnative speakers: Not only do perceptions of difficulty in understanding cause prejudice toward them, but also prejudice toward low-status groups can lead to perceived difficulty in understanding members of these groups. (PsycINFO Database Record (c) 2016 APA, all rights reserved).

  7. Effects of Language Background on Gaze Behavior: A Crosslinguistic Comparison Between Korean and German Speakers

    PubMed Central

    Goller, Florian; Lee, Donghoon; Ansorge, Ulrich; Choi, Soonja

    2017-01-01

    Languages differ in how they categorize spatial relations: While German differentiates between containment (in) and support (auf) with distinct spatial words—(a) den Kuli IN die Kappe stecken ("put pen in cap"); (b) die Kappe AUF den Kuli stecken ("put cap on pen")—Korean uses a single spatial word (kkita), collapsing (a) and (b) into one semantic category, particularly when the spatial enclosure is tight-fit. Korean uses a different word (i.e., netha) for loose fits (e.g., apple in bowl). We tested whether these differences influence the attention of the speaker. In a crosslinguistic study, we compared native German speakers with native Korean speakers. Participants rated the similarity of two successive video clips of several scenes where two objects were joined or nested (either in a tight or loose manner). The rating data show that Korean speakers base their rating of similarity more on tight versus loose fit, whereas German speakers base their rating more on containment versus support (in vs. auf). Throughout the experiment, we also measured the participants’ eye movements. Korean speakers looked equally long at the moving Figure object and at the stationary Ground object, whereas German speakers were more biased to look at the Ground object. Additionally, Korean speakers also looked more at the region where the two objects touched than did German speakers. We discuss our data in the light of crosslinguistic semantics and the extent of their influence on spatial cognition and perception. PMID:29362644

  8. Factor analysis of auto-associative neural networks with application in speaker verification.

    PubMed

    Garimella, Sri; Hermansky, Hynek

    2013-04-01

    Auto-associative neural network (AANN) is a fully connected feed-forward neural network, trained to reconstruct its input at its output through a hidden compression layer that has fewer nodes than the dimensionality of the input. AANNs are used to model speakers in speaker verification, where a speaker-specific AANN model is obtained by adapting (or retraining) the universal background model (UBM) AANN, an AANN trained on multiple held-out speakers, using the corresponding speaker's data. When the amount of speaker data is limited, this adaptation procedure may lead to overfitting, as all the parameters of the UBM-AANN are adapted. In this paper, we introduce and develop the factor analysis theory of AANNs to alleviate this problem. We hypothesize that only the weight matrix connecting the last nonlinear hidden layer and the output layer is speaker-specific, and further restrict it to a common low-dimensional subspace during adaptation. The subspace is learned using large amounts of development data and is held fixed during adaptation. Thus, only the coordinates in the subspace, also known as the i-vector, need to be estimated from speaker-specific data. The update equations are derived for learning both the common low-dimensional subspace and the i-vectors corresponding to speakers in the subspace. The resulting i-vector representation is used as a feature for a probabilistic linear discriminant analysis model. The proposed system shows promising results on the NIST-08 speaker recognition evaluation (SRE) and yields a 23% relative improvement in equal error rate over the previously proposed weighted least squares-based subspace AANN system. Experiments on NIST-10 SRE confirm that these improvements are consistent and generalize across datasets.
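
    A minimal PyTorch sketch of the AANN architecture this record describes: a fully connected autoencoder with a narrow compression layer, trained to reconstruct its input. Layer sizes and the 39-dimensional feature assumption are illustrative, not the paper's configuration.

    import torch
    import torch.nn as nn

    feat_dim, hidden, bottleneck = 39, 512, 20     # e.g., 39-dim acoustic features

    aann = nn.Sequential(
        nn.Linear(feat_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, bottleneck), nn.Tanh(),  # compression layer
        nn.Linear(bottleneck, hidden), nn.Tanh(),  # last nonlinear hidden layer
        nn.Linear(hidden, feat_dim),               # output (reconstruction) layer
    )

    opt = torch.optim.Adam(aann.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    def train_step(batch: torch.Tensor) -> float:
        """One reconstruction-training step on a (batch, feat_dim) tensor."""
        opt.zero_grad()
        loss = loss_fn(aann(batch), batch)         # reconstruct the input
        loss.backward()
        opt.step()
        return loss.item()

    # In the paper's factor-analysis extension, only the final Linear layer's
    # weights would be adapted per speaker, constrained to a low-rank subspace.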

  9. Speaker and Accent Variation Are Handled Differently: Evidence in Native and Non-Native Listeners

    PubMed Central

    Kriengwatana, Buddhamas; Terry, Josephine; Chládková, Kateřina; Escudero, Paola

    2016-01-01

    Listeners are able to cope with between-speaker variability in speech that stems from anatomical sources (i.e. individual and sex differences in vocal tract size) and sociolinguistic sources (i.e. accents). We hypothesized that listeners adapt to these two types of variation differently because prior work indicates that adapting to speaker/sex variability may occur pre-lexically while adapting to accent variability may require learning from attention to explicit cues (i.e. feedback). In Experiment 1, we tested our hypothesis by training native Dutch listeners and Australian-English (AusE) listeners without any experience with Dutch or Flemish to discriminate between the Dutch vowels /I/ and /ε/ from a single speaker. We then tested their ability to classify /I/ and /ε/ vowels of a novel Dutch speaker (i.e. speaker or sex change only), or vowels of a novel Flemish speaker (i.e. speaker or sex change plus accent change). We found that both Dutch and AusE listeners could successfully categorize vowels if the change involved a speaker/sex change, but not if the change involved an accent change. When AusE listeners were given feedback on their categorization responses to the novel speaker in Experiment 2, they were able to successfully categorize vowels involving an accent change. These results suggest that adapting to accents may be a two-step process, whereby the first step involves adapting to speaker differences at a pre-lexical level, and the second step involves adapting to accent differences at a contextual level, where listeners have access to word meaning or are given feedback that allows them to appropriately adjust their perceptual category boundaries. PMID:27309889

  10. Integrating hidden Markov model and PRAAT: a toolbox for robust automatic speech transcription

    NASA Astrophysics Data System (ADS)

    Kabir, A.; Barker, J.; Giurgiu, M.

    2010-09-01

    An automatic time-aligned phone transcription toolbox for English speech corpora has been developed. The toolbox is particularly useful for generating robust automatic transcriptions and can produce phone-level transcriptions using speaker-independent as well as speaker-dependent models without manual intervention. The system is based on the standard Hidden Markov Model (HMM) approach and was successfully tested on a large audiovisual speech corpus, the GRID corpus. One of the most powerful features of the toolbox is its increased flexibility in speech processing: the speech community can import automatic transcriptions generated by the HMM Toolkit (HTK) into the popular transcription software PRAAT, and vice versa. The toolbox has been evaluated through statistical analysis on GRID data, which shows that the automatic transcription deviates from manual transcription by an average of 20 ms.
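
    The PRAAT side of this interoperability is a plain-text TextGrid file, so the export direction can be sketched directly; the interval data below are illustrative, and a real pipeline would first convert HTK label times (100 ns units) to seconds.

    def write_textgrid(intervals, path, tier_name="phones"):
        """Write contiguous (xmin_sec, xmax_sec, label) tuples as a one-tier TextGrid."""
        xmin, xmax = intervals[0][0], intervals[-1][1]
        lines = [
            'File type = "ooTextFile"', 'Object class = "TextGrid"', '',
            f'xmin = {xmin}', f'xmax = {xmax}',
            'tiers? <exists>', 'size = 1', 'item []:', '    item [1]:',
            '        class = "IntervalTier"', f'        name = "{tier_name}"',
            f'        xmin = {xmin}', f'        xmax = {xmax}',
            f'        intervals: size = {len(intervals)}',
        ]
        for i, (lo, hi, label) in enumerate(intervals, start=1):
            lines += [f'        intervals [{i}]:',
                      f'            xmin = {lo}',
                      f'            xmax = {hi}',
                      f'            text = "{label}"']
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")

    write_textgrid([(0.00, 0.12, "sil"), (0.12, 0.31, "b"), (0.31, 0.55, "ih")],
                   "utt1.TextGrid")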

  11. Prosody in the hands of the speaker

    PubMed Central

    Guellaï, Bahia; Langus, Alan; Nespor, Marina

    2014-01-01

    In everyday life, speech is accompanied by gestures. In the present study, two experiments tested the possibility that spontaneous gestures accompanying speech carry prosodic information. Experiment 1 showed that gestures provide prosodic information, as adults are able to perceive the congruency between low-pass filtered—thus unintelligible—speech and the gestures of the speaker. Experiment 2 shows that in the case of ambiguous sentences (i.e., sentences with two alternative meanings depending on their prosody) mismatched prosody and gestures lead participants to choose more often the meaning signaled by gestures. Our results demonstrate that the prosody that characterizes speech is not a modality specific phenomenon: it is also perceived in the spontaneous gestures that accompany speech. We draw the conclusion that spontaneous gestures and speech form a single communication system where the suprasegmental aspects of spoken language are mapped to the motor-programs responsible for the production of both speech sounds and hand gestures. PMID:25071666
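
    The low-pass filtering that renders speech unintelligible while sparing prosody is straightforward to sketch; the 400 Hz cutoff below is an assumption chosen to keep the F0 contour while removing most segmental detail, not the cutoff reported in the experiment.

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, sosfiltfilt

    sr, x = wavfile.read("sentence.wav")           # hypothetical 16-bit mono file
    sos = butter(8, 400, btype="low", fs=sr, output="sos")
    x_lp = sosfiltfilt(sos, x.astype(np.float64))  # zero-phase low-pass filter
    wavfile.write("sentence_lowpass.wav", sr,
                  np.clip(x_lp, -32768, 32767).astype(np.int16))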

  12. Value of Sample Return and High Precision Analyses: Need for A Resource of Compelling Stories, Metaphors and Examples for Public Speakers

    NASA Technical Reports Server (NTRS)

    Allton, J. H.

    2017-01-01

    There is widespread agreement among planetary scientists that much of what we know about the workings of the solar system comes from accurate, high precision measurements on returned samples. Precision is a function of the number of atoms the instrumentation is able to count. Accuracy depends on the calibration or standardization technique. For Genesis, the solar wind sample return mission, acquiring enough atoms to ensure precise SW measurements and then accurately quantifying those measurements were steps known to be non-trivial pre-flight. The difficulty of precise and accurate measurements on returned samples, and why they cannot be made remotely, is not communicated well to the public. In part, this is because "high precision" is abstract and error bars are not very exciting topics. This paper explores ideas for collecting and compiling compelling metaphors and colorful examples as a resource for planetary science public speakers.
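
    The counting-statistics point can be made concrete with a short example: for Poisson counting, relative precision scales as 1/sqrt(N), so each 100-fold increase in atoms counted buys one additional decimal digit of precision. (The numbers below are illustrative, not Genesis measurements.)

    import math

    for n_atoms in (1e2, 1e4, 1e6, 1e10):
        rel = 1 / math.sqrt(n_atoms)               # Poisson relative uncertainty
        print(f"N = {n_atoms:.0e}  ->  relative precision ~ {rel:.1e} ({100 * rel:.4f}%)")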

  13. Stimulus presentation order and the perception of lexical tones in Cantonese

    NASA Astrophysics Data System (ADS)

    Francis, Alexander L.; Ciocca, Valter

    2003-09-01

    Listeners' auditory discrimination of vowel sounds depends in part on the order in which stimuli are presented. Such presentation order effects have been argued to be language independent, and to result from psychophysical (not speech- or language-specific) factors such as the decay of memory traces over time or increased weighting of later-occurring stimuli. In the present study, native Cantonese speakers' discrimination of a linguistic tone continuum is shown to exhibit order of presentation effects similar to those shown for vowels in previous studies. When presented with two successive syllables differing in fundamental frequency by approximately 4 Hz, listeners were significantly more sensitive to this difference when the first syllable was higher in frequency than the second. However, American English-speaking listeners with no experience listening to Cantonese showed no such contrast effect when tested in the same manner using the same stimuli. Neither English nor Cantonese listeners showed any order of presentation effects in the discrimination of a nonspeech continuum in which tokens had the same fundamental frequencies as the Cantonese speech tokens but had a qualitatively non-speech-like timbre. These results suggest that tone presentation order effects, unlike vowel effects, may be language specific, possibly resulting from the need to compensate for utterance-related pitch declination when evaluating fundamental frequency for tone identification.

  14. Human Language Technology: Opportunities and Challenges

    DTIC Science & Technology

    2005-01-01

    because of the connections to and reliance on signal processing. Audio diarization critically includes indexing of speakers [12], since speaker ...to reduce inter-speaker variability in training. Standard techniques include vocal-tract length normalization, adaptation of acoustic models using maximum likelihood linear regression (MLLR), and speaker-adaptive training based on MLLR. The acoustic models are mixtures of Gaussians, typically with

  15. Evaluation of Speakers with Foreign-Accented Speech in Japan: The Effect of Accent Produced by English Native Speakers

    ERIC Educational Resources Information Center

    Tsurutani, Chiharu

    2012-01-01

    Foreign-accented speakers are generally regarded as less educated, less reliable and less interesting than native speakers and tend to be associated with cultural stereotypes of their country of origin. This discrimination against foreign accents has, however, been discussed mainly using accented English in English-speaking countries. This study…

  16. The Employability of Non-Native-Speaker Teachers of EFL: A UK Survey

    ERIC Educational Resources Information Center

    Clark, Elizabeth; Paran, Amos

    2007-01-01

    The native speaker still has a privileged position in English language teaching, representing both the model speaker and the ideal teacher. Non-native-speaker teachers of English are often perceived as having a lower status than their native-speaking counterparts, and have been shown to face discriminatory attitudes when applying for teaching…

  17. Generic Language and Speaker Confidence Guide Preschoolers' Inferences about Novel Animate Kinds

    ERIC Educational Resources Information Center

    Stock, Hayli R.; Graham, Susan A.; Chambers, Craig G.

    2009-01-01

    We investigated the influence of speaker certainty on 156 four-year-old children's sensitivity to generic and nongeneric statements. An inductive inference task was implemented, in which a speaker described a nonobvious property of a novel creature using either a generic or a nongeneric statement. The speaker appeared to be confident, neutral, or…

  18. Modern Greek Language: Acquisition of Morphology and Syntax by Non-Native Speakers

    ERIC Educational Resources Information Center

    Andreou, Georgia; Karapetsas, Anargyros; Galantomos, Ioannis

    2008-01-01

    This study investigated the performance of native and non-native speakers of the Modern Greek language on morphology and syntax tasks. Non-native speakers of Greek whose native language was English, which is a language with strict word order and simple morphology, made more errors and answered more slowly than native speakers on morphology but not…

  19. A Comparison of Coverbal Gesture Use in Oral Discourse among Speakers with Fluent and Nonfluent Aphasia

    ERIC Educational Resources Information Center

    Kong, Anthony Pak-Hin; Law, Sam-Po; Chak, Gigi Wan-Chi

    2017-01-01

    Purpose: Coverbal gesture use, which is affected by the presence and degree of aphasia, can be culturally specific. The purpose of this study was to compare gesture use among Cantonese-speaking individuals: 23 neurologically healthy speakers, 23 speakers with fluent aphasia, and 21 speakers with nonfluent aphasia. Method: Multimedia data of…

  20. Accent Attribution in Speakers with Foreign Accent Syndrome

    ERIC Educational Resources Information Center

    Verhoeven, Jo; De Pauw, Guy; Pettinato, Michele; Hirson, Allen; Van Borsel, John; Marien, Peter

    2013-01-01

    Purpose: The main aim of this experiment was to investigate the perception of Foreign Accent Syndrome in comparison to speakers with an authentic foreign accent. Method: Three groups of listeners attributed accents to conversational speech samples of 5 FAS speakers which were embedded amongst those of 5 speakers with a real foreign accent and 5…

  1. Race in Conflict with Heritage: "Black" Heritage Language Speaker of Japanese

    ERIC Educational Resources Information Center

    Doerr, Neriko Musha; Kumagai, Yuri

    2014-01-01

    "Heritage language speaker" is a relatively new term to denote minority language speakers who grew up in a household where the language was used or those who have a family, ancestral, or racial connection to the minority language. In research on heritage language speakers, overlap between these 2 definitions is often assumed--that is,…

  2. Early Language Experience Facilitates the Processing of Gender Agreement in Spanish Heritage Speakers

    ERIC Educational Resources Information Center

    Montrul, Silvina; Davidson, Justin; De La Fuente, Israel; Foote, Rebecca

    2014-01-01

    We examined how age of acquisition in Spanish heritage speakers and L2 learners interacts with implicitness vs. explicitness of tasks in gender processing of canonical and non-canonical ending nouns. Twenty-three Spanish native speakers, 29 heritage speakers, and 33 proficiency-matched L2 learners completed three on-line spoken word recognition…

  3. The Role of Interaction in Native Speaker Comprehension of Nonnative Speaker Speech.

    ERIC Educational Resources Information Center

    Polio, Charlene; Gass, Susan M.

    1998-01-01

    Because interaction gives language learners an opportunity to modify their speech upon a signal of noncomprehension, it should also have a positive effect on native speakers' (NS) comprehension of nonnative speakers (NNS). This study shows that interaction does help NSs comprehend NNSs, contrasting the claims of an earlier study that found no…

  4. The perception of syllable affiliation of singleton stops in repetitive speech.

    PubMed

    de Jong, Kenneth J; Lim, Byung-Jin; Nagao, Kyoko

    2004-01-01

    Stetson (1951) noted that repeating singleton coda consonants at fast speech rates causes them to be perceived as onset consonants affiliated with a following vowel. The current study documents the perception of rate-induced resyllabification, as well as the temporal properties that give rise to the perception of syllable affiliation. Stimuli were extracted from a previous study of repeated stop + vowel and vowel + stop syllables (de Jong, 2001a, 2001b). Forced-choice identification tasks show that slow repetitions are clearly distinguished. As speakers increase rate, they reach a point after which listeners disagree as to the affiliation of the stop. This pattern is found for voiced and voiceless consonants using different stimulus extraction techniques. Acoustic models of the identifications indicate that the sudden shift in syllabification occurs with the loss of an acoustic hiatus between successive syllables. Acoustic models of the fast-rate identifications indicate that various other qualities, such as consonant voicing, affect the probability that the consonants will be perceived as onsets. These results point to a model of syllabic affiliation in which specific juncture-marking aspects of the signal dominate parsing, and in their absence other differences provide additional, weaker cues to syllabic affiliation.

  5. Person authentication using brainwaves (EEG) and maximum a posteriori model adaptation.

    PubMed

    Marcel, Sébastien; Millán, José Del R

    2007-04-01

    In this paper, we investigate the use of brain activity for person authentication. Previous studies have shown that the brain-wave pattern of every individual is unique and that the electroencephalogram (EEG) can be used for biometric identification. EEG-based biometry is an emerging research topic, and we believe that it may open new research directions and applications in the future. However, very little work has been done in this area, and it has focused mainly on person identification rather than person authentication. Person authentication aims to accept or to reject a person claiming an identity, i.e., comparing biometric data to one template, while the goal of person identification is to match the biometric data against all the records in a database. We propose the use of a statistical framework based on Gaussian Mixture Models and Maximum A Posteriori model adaptation, successfully applied to speaker and face authentication, which can deal with only one training session. We perform intensive experimental simulations using several strict train/test protocols to show the potential of our method. We also show that some mental tasks are more appropriate for person authentication than others.
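
    A hedged sketch of the GMM plus MAP-adaptation framework named here, in the Reynolds relevance-MAP style (means-only adaptation); the relevance factor and mixture size are conventional placeholders, not the paper's settings.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def map_adapt_means(ubm: GaussianMixture, X: np.ndarray, r: float = 16.0) -> np.ndarray:
        """Adapt UBM component means toward enrollment data X of shape (n, d)."""
        resp = ubm.predict_proba(X)                # (n, K) responsibilities
        n_k = resp.sum(axis=0)                     # soft counts per component
        x_bar = (resp.T @ X) / np.maximum(n_k, 1e-10)[:, None]  # weighted data means
        alpha = (n_k / (n_k + r))[:, None]         # data-dependent adaptation weight
        return alpha * x_bar + (1 - alpha) * ubm.means_

    # Usage sketch:
    # ubm = GaussianMixture(n_components=64, covariance_type="diag").fit(background_X)
    # client_means = map_adapt_means(ubm, enrollment_X)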

  6. The cognitive neuroscience of person identification.

    PubMed

    Biederman, Irving; Shilowich, Bryan E; Herald, Sarah B; Margalit, Eshed; Maarek, Rafael; Meschke, Emily X; Hacker, Catrina M

    2018-02-14

    We compare and contrast five differences between person identification by voice and face. 1. There is little or no cost when a familiar face is to be recognized from an unrestricted set of possible faces, even at Rapid Serial Visual Presentation (RSVP) rates, but the accuracy of familiar voice recognition declines precipitously when the set of possible speakers is increased from one to a mere handful. 2. Whereas deficits in face recognition are typically perceptual in origin, those with normal perception of voices can manifest severe deficits in their identification. 3. Congenital prosopagnosics (CPros) and congenital phonagnosics (CPhons) are generally unable to imagine familiar faces and voices, respectively. Only in CPros, however, is this deficit a manifestation of a general inability to form visual images of any kind; CPhons report no deficit in imagining non-voice sounds. 4. The prevalence of CPhons, 3.2%, is somewhat higher than the reported prevalence of approximately 2.0% for CPros in the population. There is evidence that CPhon represents a distinct condition statistically and not just normal variation. 5. Face and voice recognition proficiency are uncorrelated rather than reflecting limitations of a general capacity for person individuation. Copyright © 2018 Elsevier Ltd. All rights reserved.

  7. Working with Speakers.

    ERIC Educational Resources Information Center

    Pestel, Ann

    1989-01-01

    The author discusses working with speakers from business and industry to present career information at the secondary level. Advice for speakers is presented, as well as tips for program coordinators. (CH)

  8. Variability and Intelligibility of Clarified Speech to Different Listener Groups

    NASA Astrophysics Data System (ADS)

    Silber, Ronnie F.

    Two studies examined the modifications that adult speakers make in speech to disadvantaged listeners. Previous research focusing on speech to deaf individuals and to young children has shown that adults clarify speech when addressing these two populations. Acoustic measurements suggest that the signal undergoes similar changes for both populations. Perceptual tests corroborate these results for the deaf population, but are nonsystematic in developmental studies. The differences in the findings for these populations and the nonsystematic results in the developmental literature may be due to methodological factors. The present experiments addressed these methodological questions. Studies of speech to hearing-impaired listeners have used read nonsense sentences, for which speakers received explicit clarification instructions and feedback, while in the child literature, excerpts of real-time conversations were used. Therefore, linguistic samples were not precisely matched. In this study, experiments used various linguistic materials. Experiment 1 used a children's story; experiment 2, nonsense sentences. Four mothers read both types of material in four ways: (1) in "normal" adult speech, (2) in "babytalk," (3) under the clarification instructions used in the hearing-impaired studies (instructed clear speech), and (4) in (spontaneous) clear speech without instruction. No extra practice or feedback was given. Sentences were presented to 40 normal-hearing college students with and without simultaneous masking noise. Results were tabulated separately for content and function words and analyzed using standard statistical tests. The major finding in the study was individual variation in speaker intelligibility. "Real world" speakers vary in their baseline intelligibility. The four speakers also showed unique patterns of intelligibility as a function of each independent variable. Results were as follows. Nonsense sentences were less intelligible than story sentences. Function words were equal to, or more intelligible than, content words. Babytalk functioned as a clear speech style in story sentences but not nonsense sentences. One of the two clear speech styles was clearer than normal speech in adult-directed clarification; however, which style was clearer depended on interactions among the variables. The individual patterns seemed to result from interactions among demand characteristics, baseline intelligibility, materials, and differences in articulatory flexibility.

  9. Grammatical Planning Units During Real-Time Sentence Production in Speakers With Agrammatic Aphasia and Healthy Speakers.

    PubMed

    Lee, Jiyeon; Yoshida, Masaya; Thompson, Cynthia K

    2015-08-01

    Grammatical encoding (GE) is impaired in agrammatic aphasia; however, the nature of such deficits remains unclear. We examined grammatical planning units during real-time sentence production in speakers with agrammatic aphasia and control speakers, testing two competing models of GE. We queried whether speakers with agrammatic aphasia produce sentences word by word without advanced planning or whether hierarchical syntactic structure (i.e., verb argument structure; VAS) is encoded as part of the advanced planning unit. Experiment 1 examined production of sentences with a predefined structure (i.e., "The A and the B are above the C") using eye tracking. Experiment 2 tested production of transitive and unaccusative sentences without a predefined sentence structure in a verb-priming study. In Experiment 1, both speakers with agrammatic aphasia and young and age-matched control speakers used word-by-word strategies, selecting the first lemma (noun A) only prior to speech onset. However, in Experiment 2, unlike controls, speakers with agrammatic aphasia preplanned transitive and unaccusative sentences, encoding VAS before speech onset. Speakers with agrammatic aphasia show incremental, word-by-word production for structurally simple sentences, requiring retrieval of multiple noun lemmas. However, when sentences involve functional (thematic to grammatical) structure building, advanced planning strategies (i.e., VAS encoding) are used. This early use of hierarchical syntactic information may provide a scaffold for impaired GE in agrammatism.

  10. Grammatical Encoding and Learning in Agrammatic Aphasia: Evidence from Structural Priming

    PubMed Central

    Cho-Reyes, Soojin; Mack, Jennifer E.; Thompson, Cynthia K.

    2017-01-01

    The present study addressed open questions about the nature of sentence production deficits in agrammatic aphasia. In two structural priming experiments, 13 aphasic and 13 age-matched control speakers repeated visually- and auditorily-presented prime sentences, and then used visually-presented word arrays to produce dative sentences. Experiment 1 examined whether agrammatic speakers form structural and thematic representations during sentence production, whereas Experiment 2 tested the lasting effects of structural priming in lags of two and four sentences. Results of Experiment 1 showed that, like unimpaired speakers, the aphasic speakers evinced intact structural priming effects, suggesting that they are able to generate such representations. Unimpaired speakers also evinced reliable thematic priming effects, whereas agrammatic speakers did so in some experimental conditions, suggesting that access to thematic representations may be intact. Results of Experiment 2 showed structural priming effects of comparable magnitude for aphasic and unimpaired speakers. In addition, both groups showed lasting structural priming effects in both lag conditions, consistent with implicit learning accounts. In both experiments, aphasic speakers with more severe language impairments exhibited larger priming effects, consistent with the “inverse preference” prediction of implicit learning accounts. The findings indicate that agrammatic speakers are sensitive to structural priming across levels of representation and that such effects are lasting, suggesting that structural priming may be beneficial for the treatment of sentence production deficits in agrammatism. PMID:28924328

  11. Brief Report: Relations between Prosodic Performance and Communication and Socialization Ratings in High Functioning Speakers with Autism Spectrum Disorders

    ERIC Educational Resources Information Center

    Paul, Rhea; Shriberg, Lawrence D.; McSweeny, Jane; Cicchetti, Domenic; Klin, Ami; Volkmar, Fred

    2005-01-01

    Shriberg "et al." [Shriberg, L. "et al." (2001). "Journal of Speech, Language and Hearing Research, 44," 1097-1115] described prosody-voice features of 30 high functioning speakers with autistic spectrum disorder (ASD) compared to age-matched control speakers. The present study reports additional information on the speakers with ASD, including…

  12. Investigating Holistic Measures of Speech Prosody

    ERIC Educational Resources Information Center

    Cunningham, Dana Aliel

    2012-01-01

    Speech prosody is a multi-faceted dimension of speech which can be measured and analyzed in a variety of ways. In this study, the speech prosody of Mandarin L1 speakers, English L2 speakers, and English L1 speakers was assessed by trained raters who listened to sound clips of the speakers responding to a graph prompt and reading a short passage.…

  13. Young Children's Sensitivity to Speaker Gender When Learning from Others

    ERIC Educational Resources Information Center

    Ma, Lili; Woolley, Jacqueline D.

    2013-01-01

    This research explores whether young children are sensitive to speaker gender when learning novel information from others. Four- and 6-year-olds ("N" = 144) chose between conflicting statements from a male versus a female speaker (Studies 1 and 3) or decided which speaker (male or female) they would ask (Study 2) when learning about the functions…

  14. Switches to English during French Service Encounters: Relationships with L2 French Speakers' Willingness to Communicate and Motivation

    ERIC Educational Resources Information Center

    McNaughton, Stephanie; McDonough, Kim

    2015-01-01

    This exploratory study investigated second language (L2) French speakers' service encounters in the multilingual setting of Montreal, specifically whether switches to English during French service encounters were related to L2 speakers' willingness to communicate or motivation. Over a two-week period, 17 French L2 speakers in Montreal submitted…

  15. A Respirometric Technique to Evaluate Velopharyngeal Function in Speakers with Cleft Palate, with and without Prostheses.

    ERIC Educational Resources Information Center

    Gilbert, Harvey R.; Ferrand, Carole T.

    1987-01-01

    Respirometric quotients (RQ), the ratio of oral air volume expended to total volume expended, were obtained from the productions of oral and nasal airflow of 10 speakers with cleft palate, with and without their prosthetic appliances, and 10 normal speakers. Cleft palate speakers without their appliances exhibited the lowest RQ values. (Author/DB)

  16. Using Stimulated Recall to Investigate Native Speaker Perceptions in Native-Nonnative Speaker Interaction

    ERIC Educational Resources Information Center

    Polio, Charlene; Gass, Susan; Chapin, Laura

    2006-01-01

    Implicit negative feedback has been shown to facilitate SLA, and the extent to which such feedback is given is related to a variety of task and interlocutor variables. The background of a native speaker (NS), in terms of amount of experience in interactions with nonnative speakers (NNSs), has been shown to affect the quantity of implicit negative…

  17. Compliment Responses: Comparing American Learners of Japanese, Native Japanese Speakers, and American Native English Speakers

    ERIC Educational Resources Information Center

    Tatsumi, Naofumi

    2012-01-01

    Previous research shows that American learners of Japanese (AJs) tend to differ from native Japanese speakers in their compliment responses (CRs). Yokota (1986) and Shimizu (2009) have reported that AJs tend to respond more negatively than native Japanese speakers. It has also been reported that AJs' CRs tend to lack the use of avoidance or…

  18. Factors Influencing Oral Corrective Feedback Provision in the Spanish Foreign Language Classroom: Investigating Instructor Native/Nonnative Speaker Status, SLA Education, & Teaching Experience

    ERIC Educational Resources Information Center

    Gurzynski-Weiss, Laura

    2010-01-01

    The role of interactional feedback has been a critical area of second language acquisition (SLA) research for decades and while findings suggest interactional feedback can facilitate SLA, the extent of its influence can vary depending on a number of factors, including the native language of those involved in communication. Although studies have…

  19. Intelligibility of clear speech: effect of instruction.

    PubMed

    Lam, Jennifer; Tjaden, Kris

    2013-10-01

    The authors investigated how clear speech instructions influence sentence intelligibility. Twelve speakers produced sentences in habitual, clear, hearing impaired, and overenunciate conditions. Stimuli were amplitude normalized and mixed with multitalker babble for orthographic transcription by 40 listeners. The main analysis investigated percentage-correct intelligibility scores as a function of the 4 conditions and speaker sex. Additional analyses included listener response variability, individual speaker trends, and an alternate intelligibility measure: proportion of content words correct. Relative to the habitual condition, the overenunciate condition was associated with the greatest intelligibility benefit, followed by the hearing impaired and clear conditions. Ten speakers followed this trend. The results indicated different patterns of clear speech benefit for male and female speakers. Greater listener variability was observed for speakers with inherently low habitual intelligibility compared to speakers with inherently high habitual intelligibility. Stable proportions of content words were observed across conditions. Clear speech instructions affected the magnitude of the intelligibility benefit. The instruction to overenunciate may be most effective in clear speech training programs. The findings may help explain the range of clear speech intelligibility benefit previously reported. Listener variability analyses suggested the importance of obtaining multiple listener judgments of intelligibility, especially for speakers with inherently low habitual intelligibility.

  20. Discrimination of speaker sex and size when glottal-pulse rate and vocal-tract length are controlled.

    PubMed

    Smith, David R R; Walters, Thomas C; Patterson, Roy D

    2007-12-01

    A recent study [Smith and Patterson, J. Acoust. Soc. Am. 118, 3177-3186 (2005)] demonstrated that both the glottal-pulse rate (GPR) and the vocal-tract length (VTL) of vowel sounds have a large effect on the perceived sex and age (or size) of a speaker. The vowels for all of the "different" speakers in that study were synthesized from recordings of the sustained vowels of one adult male speaker. This paper presents a follow-up study in which a range of vowels were synthesized from recordings of four different speakers--an adult man, an adult woman, a young boy, and a young girl--to determine whether the sex and age of the original speaker would have an effect upon listeners' judgments of whether a vowel was spoken by a man, woman, boy, or girl, after they were equated for GPR and VTL. The sustained vowels of the four speakers were scaled to produce the same combinations of GPR and VTL, which covered the entire range normally encountered in everyday life. The results show that listeners readily distinguish children from adults based on their sustained vowels but that they struggle to distinguish the sex of the speaker.
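    The scaling manipulation described above requires a vocoder because naive resampling ties GPR and VTL together. A minimal Python sketch of that coupling (our illustration, not the study's vocoder-based processing):

        import numpy as np

        def resample_scale(x, factor):
            """Play x back `factor` times faster via linear-interpolation
            resampling; F0 (GPR) and all formant frequencies (hence apparent
            VTL) scale by the SAME factor, so GPR and VTL cannot be set
            independently this way."""
            n_out = int(len(x) / factor)
            t_out = np.arange(n_out) * factor   # fractional sample positions
            return np.interp(t_out, np.arange(len(x)), x)

    A vocoder decomposes the signal into source (GPR) and filter (spectral envelope, related to VTL), so the two can be scaled separately.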

  1. On the same wavelength: predictable language enhances speaker-listener brain-to-brain synchrony in posterior superior temporal gyrus.

    PubMed

    Dikker, Suzanne; Silbert, Lauren J; Hasson, Uri; Zevin, Jason D

    2014-04-30

    Recent research has shown that the degree to which speakers and listeners exhibit similar brain activity patterns during human linguistic interaction is correlated with communicative success. Here, we used an intersubject correlation approach in fMRI to test the hypothesis that a listener's ability to predict a speaker's utterance increases such neural coupling between speakers and listeners. Nine subjects listened to recordings of a speaker describing visual scenes that varied in the degree to which they permitted specific linguistic predictions. In line with our hypothesis, the temporal profile of listeners' brain activity was significantly more synchronous with the speaker's brain activity for highly predictive contexts in left posterior superior temporal gyrus (pSTG), an area previously associated with predictive auditory language processing. In this region, predictability differentially affected the temporal profiles of brain responses in the speaker and listeners respectively, in turn affecting correlated activity between the two: whereas pSTG activation increased with predictability in the speaker, listeners' pSTG activity instead decreased for more predictable sentences. Listeners additionally showed stronger BOLD responses for predictive images before sentence onset, suggesting that highly predictable contexts lead comprehenders to preactivate predicted words.
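    The intersubject correlation logic described above can be made concrete with a short sketch: correlate a listener's regional time course with the speaker's at a range of temporal lags and inspect the peak. A minimal Python illustration (variable names and the lag range are our assumptions, not the authors' analysis code):

        import numpy as np

        def lagged_correlation(speaker_ts, listener_ts, max_lag=10):
            """Approximate Pearson r between two equal-length time series at
            integer lags (in samples, e.g., TRs); a positive lag means the
            listener's response trails the speaker's."""
            s = (speaker_ts - speaker_ts.mean()) / speaker_ts.std()
            l = (listener_ts - listener_ts.mean()) / listener_ts.std()
            rs = {}
            for lag in range(-max_lag, max_lag + 1):
                if lag > 0:
                    a, b = s[:-lag], l[lag:]
                elif lag < 0:
                    a, b = s[-lag:], l[:lag]
                else:
                    a, b = s, l
                rs[lag] = float(np.mean(a * b))
            return rs  # e.g., max(rs, key=rs.get) gives the delay of peak coupling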

  2. When speaker identity is unavoidable: Neural processing of speaker identity cues in natural speech.

    PubMed

    Tuninetti, Alba; Chládková, Kateřina; Peter, Varghese; Schiller, Niels O; Escudero, Paola

    2017-11-01

    Speech sound acoustic properties vary largely across speakers and accents. When perceiving speech, adult listeners normally disregard non-linguistic variation caused by speaker or accent differences, in order to comprehend the linguistic message, e.g. to correctly identify a speech sound or a word. Here we tested whether the process of normalizing speaker and accent differences, facilitating the recognition of linguistic information, is found at the level of neural processing, and whether it is modulated by the listeners' native language. In a multi-deviant oddball paradigm, native and nonnative speakers of Dutch were exposed to naturally-produced Dutch vowels varying in speaker, sex, accent, and phoneme identity. Unexpectedly, the analysis of mismatch negativity (MMN) amplitudes elicited by each type of change shows a large degree of early perceptual sensitivity to non-linguistic cues. This finding on perception of naturally-produced stimuli contrasts with previous studies examining the perception of synthetic stimuli wherein adult listeners automatically disregard acoustic cues to speaker identity. The present finding bears relevance to speech normalization theories, suggesting that at an unattended level of processing, listeners are indeed sensitive to changes in fundamental frequency in natural speech tokens. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. Patterns of lung volume use during an extemporaneous speech task in persons with Parkinson disease.

    PubMed

    Bunton, Kate

    2005-01-01

    This study examined patterns of lung volume use in speakers with Parkinson disease (PD) during an extemporaneous speaking task. The performance of a control group was also examined. Behaviors described are based on acoustic, kinematic and linguistic measures. Group differences were found in breath group duration, lung volume initiation, and lung volume termination measures. Speakers in the control group alternated between longer and shorter breath groups, with starting lung volumes higher for the longer breath groups and lower for the shorter ones. Speech production was terminated before reaching tidal end expiratory level (EEL). This pattern was also seen in 4 of 7 speakers with PD. The remaining 3 PD speakers initiated speech at low starting lung volumes and continued speaking below EEL. This subgroup of PD speakers ended breath groups at agrammatical boundaries, whereas control speakers ended at appropriate grammatical boundaries. As a result of participating in this exercise, the reader will (1) be able to describe the patterns of lung volume use in speakers with Parkinson disease and compare them with those employed by control speakers; and (2) obtain information about the influence of speaking task on speech breathing.

  4. When one person's mistake is another's standard usage: the effect of foreign accent on syntactic processing.

    PubMed

    Hanulíková, Adriana; van Alphen, Petra M; van Goch, Merel M; Weber, Andrea

    2012-04-01

    How do native listeners process grammatical errors that are frequent in non-native speech? We investigated whether the neural correlates of syntactic processing are modulated by speaker identity. ERPs to gender agreement errors in sentences spoken by a native speaker were compared with the same errors spoken by a non-native speaker. In line with previous research, gender violations in native speech resulted in a P600 effect (larger P600 for violations in comparison with correct sentences), but when the same violations were produced by the non-native speaker with a foreign accent, no P600 effect was observed. Control sentences with semantic violations elicited comparable N400 effects for both the native and the non-native speaker, confirming no general integration problem in foreign-accented speech. The results demonstrate that the P600 is modulated by speaker identity, extending our knowledge about the role of speaker's characteristics on neural correlates of speech processing.

  5. Factors affecting the perception of Korean-accented American English

    NASA Astrophysics Data System (ADS)

    Cho, Kwansun; Harris, John G.; Shrivastav, Rahul

    2005-09-01

    This experiment examines the relative contribution of two factors, intonation and articulation errors, on the perception of foreign accent in Korean-accented American English. Ten native speakers of Korean and ten native speakers of American English were asked to read ten English sentences. These sentences were then modified using high-quality speech resynthesis techniques [STRAIGHT Kawahara et al., Speech Commun. 27, 187-207 (1999)] to generate four sets of stimuli. In the first two sets of stimuli, the intonation patterns of the Korean speakers and American speakers were switched with one another. The articulatory errors for each speaker were not modified. In the final two sets, the sentences from the Korean and American speakers were resynthesized without any modifications. Fifteen listeners were asked to rate all the stimuli for the degree of foreign accent. Preliminary results show that, for native speakers of American English, articulation errors may play a greater role in the perception of foreign accent than errors in intonation patterns. [Work supported by KAIM.]

  6. Compound nouns in spoken language production by speakers with aphasia compared to neurologically healthy speakers: an exploratory study.

    PubMed

    Eiesland, Eli Anne; Lind, Marianne

    2012-03-01

    Compounds are words that are made up of at least two other words (lexemes), featuring lexical and syntactic characteristics and thus particularly interesting for the study of language processing. Most studies of compounds and language processing have been based on data from experimental single word production and comprehension tasks. To enhance the ecological validity of morphological processing research, data from other contexts, such as discourse production, need to be considered. This study investigates the production of nominal compounds in semi-spontaneous spoken texts by a group of speakers with fluent types of aphasia compared to a group of neurologically healthy speakers. The speakers with aphasia produce significantly fewer nominal compound types in their texts than the non-aphasic speakers, and the compounds they produce exhibit fewer different types of semantic relations than the compounds produced by the non-aphasic speakers. The results are discussed in relation to theories of language processing.

  7. Detecting Infections Rapidly and Easily for Candidemia Trial (DIRECT1): A Prospective, Multicenter Study of the T2Candida Panel

    PubMed Central

    Clancy, Cornelius J; Pappas, Peter; Vazquez, Jose; Judson, Marc A; Tobin, Ellis; Kontoyiannis, Dimitrios P; Thompson, George R; Reboli, Annette; Garey, Kevin W; Greenberg, Richard N; Ostrosky-Zeichner, Luis; Wu, Alan; Lyon, G Marshall; Apewokin, Senu; Nguyen, M Hong; Caliendo, Angela

    2017-01-01

    Abstract. Background: Blood cultures (BC) are the diagnostic gold standard for candidemia, but sensitivity is <50%. T2Candida (T2) is a novel, FDA-approved nanodiagnostic panel, which utilizes T2 magnetic resonance and a dedicated instrument to detect Candida within whole blood samples. Methods: Candidemic adults were identified at 14 centers by diagnostic BC (dBC). Follow-up blood samples were collected from all patients (pts) for testing by T2 and companion BC (cBC). T2 was run in batch at a central lab; results are reported qualitatively for three groups of spp. (Candida albicans/C. tropicalis (CA/CT), C. glabrata/C. krusei (CG/CK), or C. parapsilosis (CP)). T2 and cBC were defined as positive (+) if they detected a sp. identified in dBC. Results: 152 patients were enrolled (median age: 54 yrs (18–93); 54% (82) men). Candidemia risk factors included indwelling catheters (82%, 125), abdominal surgery (24%, 36), transplant (22%, 33), cancer (22%, 33), hemodialysis (17%, 26), and neutropenia (10%, 15). Mean times to Candida detection/spp. identification by dBC were 47/133 hours (2/5.5 d). dBC revealed CA (30%, 46), CG (29%, 45), CP (28%, 43), CT (11%, 17) and CK (3%, 4). Mean time to collection of T2/cBC was 62 hours (2.6 d). 74% (112) of patients received antifungal (AF) therapy prior to T2/cBC (mean: 55 hours (2.3 d)). Overall, T2 results were more likely than cBC to be + (P < 0.0001; see table below), a result driven by performance in AF-treated patients (P < 0.0001). T2 was more likely to be + among patients originally infected with CA (61% (28) vs. 20% (9); P = 0.001); there were trends toward higher positivity in patients infected with CT (59% (17) vs. 23% (4); P = 0.08) and CP (42% (18) vs. 28% (12); P = 0.26). T2 was + in 89% (32/36) of patients with + cBC. Conclusion: T2 was sensitive for diagnosing candidemia at the time of + cBC, and it was significantly more likely to be + than cBC among AF-treated patients. T2 is an important advance in the diagnosis of candidemia, which is likely to be particularly useful in patients receiving prophylactic, pre-emptive or empiric AF therapy.

    Test results, n (%):

    Pt group (n)     T2+        T2-        cBC+       cBC-        T2+/cBC+   T2+/cBC-   T2-/cBC+   T2-/cBC-
    All (152)        69 (45%)   83 (55%)   36 (24%)   116 (76%)   32 (21%)   37 (24%)   4 (3%)     79 (52%)
    Prior AF (112)   55 (49%)   57 (51%)   23 (20%)   89 (80%)    20 (18%)   35 (31%)   3 (3%)     54 (48%)
    No AF (40)       14 (35%)   26 (65%)   13 (32%)   27 (68%)    12 (30%)   2 (5%)     1 (2%)     25 (62%)

    Disclosure: D. P. Kontoyiannis, Pfizer: Research Contractor, Research support and Speaker honorarium; Astellas: Research Contractor, Research support and Speaker honorarium; Merck: Honorarium, Speaker honorarium; Cidara: Honorarium, Speaker honorarium; Amplyx: Honorarium, Speaker honorarium; F2G: Honorarium, Speaker honorarium. L. Ostrosky-Zeichner, Astellas: Consultant and Grant Investigator, Consulting fee and Research grant; Merck: Scientific Advisor and Speaker's Bureau, Consulting fee and Speaker honorarium; Pfizer: Grant Investigator and Speaker's Bureau, Grant recipient and Speaker honorarium; Scynexis: Grant Investigator and Scientific Advisor, Consulting fee and Grant recipient; Cidara: Grant Investigator and Scientific Advisor, Consulting fee and Research grant. S. Apewokin, T2 Biosystems: Investigator, Research support; Astellas: Scientific Advisor, Consulting fee.

  8. RTP Speakers Bureau

    EPA Pesticide Factsheets

    The Research Triangle Park Speakers Bureau page is a free resource that schools, universities, and community groups in the Raleigh-Durham-Chapel Hill, N.C. area can use to request speakers and find educational resources.

  9. Children's Understanding That Utterances Emanate from Minds: Using Speaker Belief To Aid Interpretation.

    ERIC Educational Resources Information Center

    Mitchell, Peter; Robinson, Elizabeth J.; Thompson, Doreen E.

    1999-01-01

    Three experiments examined 3- to 6-year-olds' ability to use a speaker's utterance based on false belief to identify which of several referents was intended. Found that many 4- to 5-year-olds performed correctly only when it was unnecessary to consider the speaker's belief. When the speaker gave an ambiguous utterance, many 3- to 6-year-olds…

  10. Speaker Introductions at Internal Medicine Grand Rounds: Forms of Address Reveal Gender Bias.

    PubMed

    Files, Julia A; Mayer, Anita P; Ko, Marcia G; Friedrich, Patricia; Jenkins, Marjorie; Bryan, Michael J; Vegunta, Suneela; Wittich, Christopher M; Lyle, Melissa A; Melikian, Ryan; Duston, Trevor; Chang, Yu-Hui H; Hayes, Sharonne N

    2017-05-01

    Gender bias has been identified as one of the drivers of gender disparity in academic medicine. Bias may be reinforced by gender subordinating language or differential use of formality in forms of address. Professional titles may influence the perceived expertise and authority of the referenced individual. The objective of this study is to examine how professional titles were used in same- and mixed-gender speaker introductions at Internal Medicine Grand Rounds (IMGR). A retrospective observational study of video-archived speaker introductions at consecutive IMGR was conducted at two different locations (Arizona, Minnesota) of an academic medical center. Introducers and speakers at IMGR were physician and scientist peers holding MD, PhD, or MD/PhD degrees. The primary outcome was whether or not a speaker's professional title was used during the first form of address during speaker introductions at IMGR. As secondary outcomes, we evaluated whether or not the speaker's professional title was used in any form of address during the introduction. Three hundred twenty-one forms of address were analyzed. Female introducers were more likely to use professional titles when introducing any speaker during the first form of address compared with male introducers (96.2% [102/106] vs. 65.6% [141/215]; p < 0.001). Female dyads utilized formal titles during the first form of address 97.8% (45/46) of the time compared with male dyads, who utilized a formal title 72.4% (110/152) of the time (p = 0.007). In mixed-gender dyads, where the introducer was female and the speaker male, formal titles were used 95.0% (57/60) of the time. Male introducers of female speakers utilized professional titles 49.2% (31/63) of the time (p < 0.001). In this study, women introduced by men at IMGR were less likely to be addressed by professional title than were men introduced by men. Differential formality in speaker introductions may amplify isolation, marginalization, and professional discomfiture expressed by women faculty in academic medicine.

  11. Infants' understanding of false labeling events: the referential roles of words and the speakers who use them.

    PubMed

    Koenig, Melissa A; Echols, Catharine H

    2003-04-01

    The four studies reported here examine whether 16-month-old infants' responses to true and false utterances interact with their knowledge of human agents. In Study 1, infants heard repeated instances either of true or false labeling of common objects; labels came from an active human speaker seated next to the infant. In Study 2, infants experienced the same stimuli and procedure; however, we replaced the human speaker of Study 1 with an audio speaker in the same location. In Study 3, labels came from a hidden audio speaker. In Study 4, a human speaker labeled the objects while facing away from them. In Study 1, infants looked significantly longer to the human agent when she falsely labeled than when she truthfully labeled the objects. Infants did not show a similar pattern of attention for the audio speaker of Study 2, the silent human of Study 3 or the facing-backward speaker of Study 4. In fact, infants who experienced truthful labeling looked significantly longer to the facing-backward labeler of Study 4 than to true labelers of the other three contexts. Additionally, infants were more likely to correct false labels when produced by the human labeler of Study 1 than in any of the other contexts. These findings suggest, first, that infants are developing a critical conception of other human speakers as truthful communicators, and second, that infants understand that human speakers may provide uniquely useful information when a word fails to match its referent. These findings are consistent with the view that infants can recognize differences in knowledge and that such differences can be based on differences in the availability of perceptual experience.

  12. Speech recognition: Acoustic-phonetic knowledge acquisition and representation

    NASA Astrophysics Data System (ADS)

    Zue, Victor W.

    1988-09-01

    The long-term research goal is to develop and implement speaker-independent continuous speech recognition systems. It is believed that the proper utilization of speech-specific knowledge is essential for such advanced systems. This research is thus directed toward the acquisition, quantification, and representation of acoustic-phonetic and lexical knowledge, and the application of this knowledge to speech recognition algorithms. In addition, we are exploring new speech recognition alternatives based on artificial intelligence and connectionist techniques. We developed a statistical model for predicting the acoustic realization of stop consonants in various positions in the syllable template. A unification-based grammatical formalism was developed for incorporating this model into the lexical access algorithm. We provided an information-theoretic justification for the hierarchical structure of the syllable template. We analyzed segmental duration for vowels and fricatives in continuous speech. Based on contextual information, we developed durational models for vowels and fricatives that account for over 70 percent of the variance, using data from multiple, unknown speakers. We rigorously evaluated the ability of human spectrogram readers to identify stop consonants spoken by many talkers and in a variety of phonetic contexts. Incorporating the declarative knowledge used by the readers, we developed a knowledge-based system for stop identification. We achieved system performance comparable to that of the readers.

  13. Classifying Human Voices by Using Hybrid SFX Time-Series Preprocessing and Ensemble Feature Selection

    PubMed Central

    Wong, Raymond

    2013-01-01

    Voice biometrics is a physiological characteristic: each individual person's voice is different. Due to this uniqueness, voice classification has found useful applications in classifying speakers' gender, mother tongue or ethnicity (accent), emotion states, identity verification, verbal command control, and so forth. In this paper, we adopt a new preprocessing method named Statistical Feature Extraction (SFX) for extracting important features in training a classification model, based on piecewise transformation treating an audio waveform as a time-series. Using SFX we can faithfully remodel statistical characteristics of the time-series; together with spectral analysis, a substantial number of features are extracted in combination. An ensemble is utilized in selecting only the influential features to be used in classification model induction. We focus on the comparison of effects of various popular data mining algorithms on multiple datasets. Our experiment consists of classification tests over four typical categories of human voice data, namely, Female and Male, Emotional Speech, Speaker Identification, and Language Recognition. The experiments yield encouraging results supporting the fact that heuristically choosing significant features from both time and frequency domains indeed produces better performance in voice classification than traditional signal processing techniques alone, like wavelets and LPC-to-CC. PMID:24288684
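    The preprocessing idea named above (treating the waveform as a time-series and summarizing it statistically) can be sketched briefly. The function below is our own minimal illustration of that idea, not the paper's SFX implementation; the particular feature choices are assumptions:

        import numpy as np

        def statistical_features(x, n_segments=10):
            """Split a waveform into segments, summarize each with simple
            statistics, and append a few global spectral descriptors."""
            feats = []
            for seg in np.array_split(np.asarray(x, dtype=float), n_segments):
                feats += [seg.mean(), seg.std(), seg.min(), seg.max(),
                          np.median(seg), np.mean(np.abs(np.diff(seg)))]
            mag = np.abs(np.fft.rfft(x))                  # magnitude spectrum
            freqs = np.fft.rfftfreq(len(x))               # normalized frequencies
            centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
            feats += [centroid, mag.max(), mag.mean()]
            return np.array(feats)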

  14. Pitch perception and production in congenital amusia: Evidence from Cantonese speakers.

    PubMed

    Liu, Fang; Chan, Alice H D; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C M

    2016-07-01

    This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production.

  15. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.

  16. Pitch perception and production in congenital amusia: Evidence from Cantonese speakers

    PubMed Central

    Liu, Fang; Chan, Alice H. D.; Ciocca, Valter; Roquet, Catherine; Peretz, Isabelle; Wong, Patrick C. M.

    2016-01-01

    This study investigated pitch perception and production in speech and music in individuals with congenital amusia (a disorder of musical pitch processing) who are native speakers of Cantonese, a tone language with a highly complex tonal system. Sixteen Cantonese-speaking congenital amusics and 16 controls performed a set of lexical tone perception, production, singing, and psychophysical pitch threshold tasks. Their tone production accuracy and singing proficiency were subsequently judged by independent listeners, and subjected to acoustic analyses. Relative to controls, amusics showed impaired discrimination of lexical tones in both speech and non-speech conditions. They also received lower ratings for singing proficiency, producing larger pitch interval deviations and making more pitch interval errors compared to controls. Demonstrating higher pitch direction identification thresholds than controls for both speech syllables and piano tones, amusics nevertheless produced native lexical tones with comparable pitch trajectories and intelligibility as controls. Significant correlations were found between pitch threshold and lexical tone perception, music perception and production, but not between lexical tone perception and production for amusics. These findings provide further evidence that congenital amusia is a domain-general language-independent pitch-processing deficit that is associated with severely impaired music perception and production, mildly impaired speech perception, and largely intact speech production. PMID:27475178

  17. Speech Processing to Improve the Perception of Speech in Background Noise for Children With Auditory Processing Disorder and Typically Developing Peers.

    PubMed

    Flanagan, Sheila; Zorilă, Tudor-Cătălin; Stylianou, Yannis; Moore, Brian C J

    2018-01-01

    Auditory processing disorder (APD) may be diagnosed when a child has listening difficulties but has normal audiometric thresholds. For adults with normal hearing and with mild-to-moderate hearing impairment, an algorithm called spectral shaping with dynamic range compression (SSDRC) has been shown to increase the intelligibility of speech when background noise is added after the processing. Here, we assessed the effect of such processing using 8 children with APD and 10 age-matched control children. The loudness of the processed and unprocessed sentences was matched using a loudness model. The task was to repeat back sentences produced by a female speaker when presented with either speech-shaped noise (SSN) or a male competing speaker (CS) at two signal-to-background ratios (SBRs). Speech identification was significantly better with SSDRC processing than without, for both groups. The benefit of SSDRC processing was greater for the SSN than for the CS background. For the SSN, scores were similar for the two groups at both SBRs. For the CS, the APD group performed significantly more poorly than the control group. The overall improvement produced by SSDRC processing could be useful for enhancing communication in a classroom where the teacher's voice is broadcast using a wireless system.
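    To make the dynamic-range-compression stage concrete, here is a deliberately simplified Python sketch. It is not the published SSDRC algorithm (which also applies frequency-domain spectral shaping); it only illustrates how an envelope-driven gain can boost low-energy speech segments. All parameter values are illustrative assumptions:

        import numpy as np

        def compress_dynamic_range(x, fs, win_ms=20.0, exponent=0.5, eps=1e-8):
            """Flatten the short-term energy envelope of speech x; an exponent
            below 1 amplifies quiet regions relative to loud ones."""
            win = int(fs * win_ms / 1000)
            power = np.convolve(x ** 2, np.ones(win) / win, mode="same")
            env = np.sqrt(power) + eps                 # short-term RMS envelope
            gain = env ** (exponent - 1.0)             # maps env to env**exponent
            y = x * gain
            # Renormalize to the original RMS so overall loudness is preserved.
            return y * np.sqrt(np.mean(x ** 2)) / (np.sqrt(np.mean(y ** 2)) + eps)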

  18. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    PubMed Central

    Anumanchipalli, Gopala K.; Dichter, Benjamin; Chaisanguanthum, Kris S.; Johnson, Keith; Chang, Edward F.

    2016-01-01

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial—especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics. PMID:27019106

  19. High-Resolution, Non-Invasive Imaging of Upper Vocal Tract Articulators Compatible with Human Brain Recordings

    DOE PAGES

    Bouchard, Kristofer E.; Conant, David F.; Anumanchipalli, Gopala K.; ...

    2016-03-28

    A complete neurobiological understanding of speech motor control requires determination of the relationship between simultaneously recorded neural activity and the kinematics of the lips, jaw, tongue, and larynx. Many speech articulators are internal to the vocal tract, and therefore simultaneously tracking the kinematics of all articulators is nontrivial, especially in the context of human electrophysiology recordings. Here, we describe a noninvasive, multi-modal imaging system to monitor vocal tract kinematics, demonstrate this system in six speakers during production of nine American English vowels, and provide new analysis of such data. Classification and regression analysis revealed considerable variability in the articulator-to-acoustic relationship across speakers. Non-negative matrix factorization extracted basis sets capturing vocal tract shapes allowing for higher vowel classification accuracy than traditional methods. Statistical speech synthesis generated speech from vocal tract measurements, and we demonstrate perceptual identification. We demonstrate the capacity to predict lip kinematics from ventral sensorimotor cortical activity. These results demonstrate a multi-modal system to non-invasively monitor articulator kinematics during speech production, describe novel analytic methods for relating kinematic data to speech acoustics, and provide the first decoding of speech kinematics from electrocorticography. These advances will be critical for understanding the cortical basis of speech production and the creation of vocal prosthetics.

  20. An acoustic comparison of two women's infant- and adult-directed speech

    NASA Astrophysics Data System (ADS)

    Andruski, Jean; Katz-Gershon, Shiri

    2003-04-01

    In addition to having prosodic characteristics that are attractive to infant listeners, infant-directed (ID) speech shares certain characteristics of adult-directed (AD) clear speech, such as increased acoustic distance between vowels, that might be expected to make ID speech easier for adults to perceive in noise than AD conversational speech. However, perceptual tests of two women's ID productions by Andruski and Bessega [J. Acoust. Soc. Am. 112, 2355] showed that this is not always the case. In a word identification task that compared ID speech with AD clear and conversational speech, one speaker's ID productions were less well-identified than AD clear speech, but better identified than AD conversational speech. For the second woman, ID speech was the least accurately identified of the three speech registers. For both speakers, hard words (infrequent words with many lexical neighbors) were also at an increased disadvantage relative to easy words (frequent words with few lexical neighbors) in speech registers that were less accurately perceived. This study will compare several acoustic properties of these women's productions, including pitch and formant-frequency characteristics. Results of the acoustic analyses will be examined with the original perceptual results to suggest reasons for differences in listeners' accuracy in identifying these two women's ID speech in noise.

  1. Crossmodal plasticity in the fusiform gyrus of late blind individuals during voice recognition.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-12-01

    Blind individuals are trained in identifying other people through voices. In congenitally blind adults the anterior fusiform gyrus has been shown to be active during voice recognition. Such crossmodal changes have been associated with a superiority of blind adults in voice perception. The key question of the present functional magnetic resonance imaging (fMRI) study was whether visual deprivation that occurs in adulthood is followed by similar adaptive changes of the voice identification system. Late blind individuals and matched sighted participants were tested in a priming paradigm, in which two voice stimuli were subsequently presented. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either coming from an old or a young person. Only in late blind but not in matched sighted controls, the activation in the anterior fusiform gyrus was modulated by voice identity: late blind volunteers showed an increase of the BOLD signal in response to person-incongruent compared with person-congruent trials. These results suggest that the fusiform gyrus adapts to input of a new modality even in the mature brain and thus demonstrate an adult type of crossmodal plasticity. Copyright © 2014 Elsevier Inc. All rights reserved.

  2. Brain systems mediating voice identity processing in blind humans.

    PubMed

    Hölig, Cordula; Föcker, Julia; Best, Anna; Röder, Brigitte; Büchel, Christian

    2014-09-01

    Blind people rely more on vocal cues when they recognize a person's identity than sighted people. Indeed, a number of studies have reported better voice recognition skills in blind than in sighted adults. The present functional magnetic resonance imaging study investigated changes in the functional organization of neural systems involved in voice identity processing following congenital blindness. A group of congenitally blind individuals and matched sighted control participants were tested in a priming paradigm, in which two voice stimuli (S1, S2) were subsequently presented. The prime (S1) and the target (S2) were either from the same speaker (person-congruent voices) or from two different speakers (person-incongruent voices). Participants had to classify the S2 as either an old or a young person. Person-incongruent voices (S2) compared with person-congruent voices elicited an increased activation in the right anterior fusiform gyrus in congenitally blind individuals but not in matched sighted control participants. In contrast, only matched sighted controls showed a higher activation in response to person-incongruent compared with person-congruent voices (S2) in the right posterior superior temporal sulcus. These results provide evidence for crossmodal plastic changes of the person identification system in the brain after visual deprivation. Copyright © 2014 Wiley Periodicals, Inc.

  3. Request a Speaker

    Science.gov Websites

    Northern Command Speakers Program: The U.S. Northern Command Speaker's Program works to increase face-to-face contact with our public to help build and sustain public understanding of our command missions and…

  4. Speakers of Different Languages Process the Visual World Differently

    PubMed Central

    Chabal, Sarah; Marian, Viorica

    2015-01-01

    Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct linguistic input, showing that language is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. PMID:26030171

  5. Learning foreign labels from a foreign speaker: the role of (limited) exposure to a second language.

    PubMed

    Akhtar, Nameera; Menjivar, Jennifer; Hoicka, Elena; Sabbagh, Mark A

    2012-11-01

    Three- and four-year-olds (N = 144) were introduced to novel labels by an English speaker and a foreign speaker (of Nordish, a made-up language), and were asked to endorse one of the speakers' labels. Monolingual English-speaking children were compared to bilingual children and English-speaking children who were regularly exposed to a language other than English. All children tended to endorse the English speaker's labels when asked 'What do you call this?', but when asked 'What do you call this in Nordish?', children with exposure to a second language were more likely to endorse the foreign label than monolingual and bilingual children. The findings suggest that, at this age, exposure to, but not necessarily immersion in, more than one language may promote the ability to learn foreign words from a foreign speaker.

  6. Surmounting the Tower of Babel: Monolingual and bilingual 2-year-olds' understanding of the nature of foreign language words.

    PubMed

    Byers-Heinlein, Krista; Chen, Ke Heng; Xu, Fei

    2014-03-01

    Languages function as independent and distinct conventional systems, and so each language uses different words to label the same objects. This study investigated whether 2-year-old children recognize that speakers of their native language and speakers of a foreign language do not share the same knowledge. Two groups of children unfamiliar with Mandarin were tested: monolingual English-learning children (n=24) and bilingual children learning English and another language (n=24). An English speaker taught children the novel label fep. On English mutual exclusivity trials, the speaker asked for the referent of a novel label (wug) in the presence of the fep and a novel object. Both monolingual and bilingual children disambiguated the reference of the novel word using a mutual exclusivity strategy, choosing the novel object rather than the fep. On similar trials with a Mandarin speaker, children were asked to find the referent of a novel Mandarin label kuò. Monolinguals again chose the novel object rather than the object with the English label fep, even though the Mandarin speaker had no access to conventional English words. Bilinguals did not respond systematically to the Mandarin speaker, suggesting that they had enhanced understanding of the Mandarin speaker's ignorance of English words. The results indicate that monolingual children initially expect words to be conventionally shared across all speakers-native and foreign. Early bilingual experience facilitates children's discovery of the nature of foreign language words. Copyright © 2013 Elsevier Inc. All rights reserved.

  7. Content-specific coordination of listeners' to speakers' EEG during communication.

    PubMed

    Kuhlen, Anna K; Allefeld, Carsten; Haynes, John-Dylan

    2012-01-01

    Cognitive neuroscience has recently begun to extend its focus from the isolated individual mind to two or more individuals coordinating with each other. In this study we uncover a coordination of neural activity between the ongoing electroencephalogram (EEG) of two people: a person speaking and a person listening. The EEG of one set of twelve participants ("speakers") was recorded while they were narrating short stories. The EEG of another set of twelve participants ("listeners") was recorded while watching audiovisual recordings of these stories. Specifically, listeners watched the superimposed videos of two speakers simultaneously and were instructed to attend either to one or the other speaker. This allowed us to isolate neural coordination due to processing the communicated content from the effects of sensory input. We find several neural signatures of communication: First, the EEG is more similar among listeners attending to the same speaker than among listeners attending to different speakers, indicating that listeners' EEG reflects content-specific information. Secondly, listeners' EEG activity correlates with the attended speakers' EEG, peaking at a time delay of about 12.5 s. This correlation takes place not only between homologous, but also between non-homologous brain areas in speakers and listeners. A semantic analysis of the stories suggests that listeners coordinate with speakers at the level of complex semantic representations, so-called "situation models". With this study we link a coordination of neural activity between individuals directly to verbally communicated information.

  8. The ICSI+ Multilingual Sentence Segmentation System

    DTIC Science & Technology

    2006-01-01

    …these steps the ASR output needs to be enriched with information additional to words, such as speaker diarization, sentence segmentation, or story… and the output of a speaker diarization is considered as well. We first detail extraction of the prosodic features, and then describe the classification… also takes into account the speaker turns estimated by the diarization system. In addition to the Max… model, speaker turn unigrams, trigram…

  9. Speaker Segmentation and Clustering Using Gender Information

    DTIC Science & Technology

    2006-02-01

    …gender information is used in the first stages of segmentation and in the clustering of opposite-gender files for speaker diarization of news broadcasts… (AFRL-HE-WP-TP-2006-0026, Air Force Research Laboratory; Brian M. Ore, General Dynamics; Proceedings, February 2006.)

  10. The 2016 NIST Speaker Recognition Evaluation

    DTIC Science & Technology

    2017-08-20

    The 2016 NIST Speaker Recognition Evaluation. Seyed Omid Sadjadi, Timothée Kheyrkhah, Audrey Tong, Craig Greenberg, Douglas Reynolds, Elliot… the most recent in an ongoing series of speaker recognition evaluations (SRE) to foster research in robust text-independent speaker recognition, as well as… an online evaluation platform, a fixed training data condition, more variability in test segment duration (uniformly distributed between 10 s and 60 s)…

  11. Magnetic Fluids Deliver Better Speaker Sound Quality

    NASA Technical Reports Server (NTRS)

    2015-01-01

    In the 1960s, Glenn Research Center developed a magnetized fluid to draw rocket fuel into spacecraft engines while in space. Sony has incorporated the technology into its line of slim speakers by using the fluid as a liquid stand-in for the speaker's dampers, which prevent the speaker from blowing out while adding stability. The fluid helps to deliver more volume and hi-fidelity sound while reducing distortion.

  12. Special Observance Planning Guide

    DTIC Science & Technology

    2015-11-01

    Finding the right speaker for an event can be a challenge. Many speakers are recommended based on word-of-mouth or through a group connected to… An unprepared, rambling speaker, or one who intentionally or unintentionally attacks a group or its members, can be extremely damaging to a program… Don't assume that an organizational senior leader is an adequate speaker based on position, rank, and/or affiliation with a reference group…

  13. Coronal View Ultrasound Imaging of Movement in Different Segments of the Tongue during Paced Recital: Findings from Four Normal Speakers and a Speaker with Partial Glossectomy

    ERIC Educational Resources Information Center

    Bressmann, Tim; Flowers, Heather; Wong, Willy; Irish, Jonathan C.

    2010-01-01

    The goal of this study was to quantitatively describe aspects of coronal tongue movement in different anatomical regions of the tongue. Four normal speakers and a speaker with partial glossectomy read four repetitions of a metronome-paced poem. Their tongue movement was recorded in four coronal planes using two-dimensional B-mode ultrasound…

  14. Making Math Real: Effective Qualities of Guest Speaker Presentations and the Impact of Speakers on Student Attitude and Achievement in the Algebra Classroom

    ERIC Educational Resources Information Center

    McKain, Danielle R.

    2012-01-01

    The term real world is often used in mathematics education, yet the definition of real-world problems and how to incorporate them in the classroom remains ambiguous. One way real-world connections can be made is through guest speakers. Guest speakers can offer different perspectives and share knowledge about various subject areas, yet the impact…

  15. When Pitch Accents Encode Speaker Commitment: Evidence from French Intonation.

    PubMed

    Michelas, Amandine; Portes, Cristel; Champagne-Lavau, Maud

    2016-06-01

    Recent studies on a variety of languages have shown that a speaker's commitment to the propositional content of his or her utterance can be encoded, among other strategies, by pitch accent types. Since prior research mainly relied on lexical-stress languages, our understanding of how speakers of a non-lexical-stress language encode speaker commitment is limited. This paper explores the contribution of the last pitch accent of an intonation phrase to conveying speaker commitment in French, a language that has stress at the phrasal level as well as a restricted set of pitch accents. In a production experiment, participants had to produce sentences in two pragmatic contexts: unbiased questions (the speaker had no particular belief with respect to the expected answer) and negatively biased questions (the speaker believed the proposition to be false). Results revealed that negatively biased questions consistently exhibited an additional unaccented F0 peak in the preaccentual syllable (an H+!H* pitch accent) while unbiased questions were often realized with a rising pattern across the accented syllable (an H* pitch accent). These results provide evidence that pitch accent types in French can signal the speaker's belief about the certainty of the expressed proposition. It also has implications for the phonological model of French intonation.

  16. Sociological effects on vocal aging: Age related F0 effects in two languages

    NASA Astrophysics Data System (ADS)

    Nagao, Kyoko

    2005-04-01

    Listeners can estimate the age of a speaker fairly accurately from their speech (Ptacek and Sander, 1966). It is generally considered that this perception is based on physiologically determined aspects of the speech. However, the degree to which it is due to conventional sociolinguistic aspects of speech is unknown. The current study examines the degree to which fundamental frequency (F0) changes due to advanced aging across two language groups of speakers. It also examines the degree to which the speakers associate these changes with aging in a voice disguising task. Thirty native speakers each of English and Japanese, taken from three age groups, read a target phrase embedded in a carrier sentence in their native language. Each speaker also read the sentence pretending to be 20-years younger or 20-years older than their own age. Preliminary analysis of eighteen Japanese speakers indicates that the mean and maximum F0 values increase when the speakers pretended to be younger than when they pretended to be older. Some previous studies on age perception, however, suggested that F0 has minor effects on listeners' age estimation. The acoustic results will also be discussed in conjunction with the results of the listeners' age estimation of the speakers.

  17. Challenging stereotypes and changing attitudes: Improving quality of care for people with hepatitis C through Positive Speakers programs.

    PubMed

    Brener, Loren; Wilson, Hannah; Rose, Grenville; Mackenzie, Althea; de Wit, John

    2013-01-01

    Positive Speakers programs consist of people who are trained to speak publicly about their illness. The focus of these programs, especially with stigmatised illnesses such as hepatitis C (HCV), is to inform others of the speakers' experiences, thereby humanising the illness and reducing ignorance associated with the disease. This qualitative research aimed to understand the perceived impact of Positive Speakers programs on changing audience members' attitudes towards people with HCV. Interviews were conducted with nine Positive Speakers and 16 of their audience members to assess the way in which these sessions were perceived by both speakers and the audience to challenge stereotypes and stigma associated with HCV and promote positive attitude change amongst the audience. Data were analysed using Intergroup Contact Theory to frame the analysis with a focus on whether the program met the optimal conditions to promote attitude change. Findings suggest that there are a number of vital components to this Positive Speakers program which ensures that the program meets the requirements for successful and equitable intergroup contact. This Positive Speakers program thereby helps to deconstruct stereotypes about people with HCV, while simultaneously increasing positive attitudes among audience members with the ultimate aim of improving quality of health care and treatment for people with HCV.

  18. Aeroacoustic Characterization of the NASA Ames Experimental Aero-Physics Branch 32- by 48-Inch Subsonic Wind Tunnel with a 24-Element Phased Microphone Array

    NASA Technical Reports Server (NTRS)

    Costanza, Bryan T.; Horne, William C.; Schery, S. D.; Babb, Alex T.

    2011-01-01

    The Aero-Physics Branch at NASA Ames Research Center utilizes a 32- by 48-inch subsonic wind tunnel for aerodynamics research. The feasibility of acquiring acoustic measurements with a phased microphone array was recently explored. Acoustic characterization of the wind tunnel was carried out with a floor-mounted 24-element array and two ceiling-mounted speakers. The minimum speaker level for accurate level measurement was evaluated for various tunnel speeds up to a Mach number of 0.15 and streamwise speaker locations. A variety of post-processing procedures, including conventional beamforming and deconvolutional processing such as TIDY, were used. The speaker measurements, with and without flow, were used to compare actual versus simulated in-flow speaker calibrations. Data for wind-off speaker sound and wind-on tunnel background noise were found valuable for predicting sound levels for which the speakers were detectable when the wind was on. Speaker sources were detectable 2 - 10 dB below the peak background noise level with conventional data processing. The effectiveness of background noise cross-spectral matrix subtraction was assessed and found to improve the detectability of test sound sources by approximately 10 dB over a wide frequency range.
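    The background-noise cross-spectral matrix (CSM) subtraction assessed above can be sketched alongside conventional beamforming. The Python fragment below is a generic textbook formulation under free-field monopole assumptions, not the NASA test's processing chain; the array geometry and variable names are illustrative:

        import numpy as np

        def beamform_map(csm_wind_on, csm_background, mic_xyz, grid_xyz,
                         freq, c=343.0):
            """Conventional (delay-and-sum) beamform power at each grid point
            after subtracting a wind-on background CSM from the data CSM."""
            csm = csm_wind_on - csm_background        # background subtraction
            k = 2.0 * np.pi * freq / c                # acoustic wavenumber
            powers = []
            for g in grid_xyz:
                r = np.linalg.norm(mic_xyz - g, axis=1)   # mic-to-point ranges
                v = np.exp(1j * k * r) / r                # monopole steering vector
                v /= np.linalg.norm(v)
                powers.append(np.real(np.conj(v) @ csm @ v))
            return np.array(powers)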

  19. The influence of language deprivation in early childhood on L2 processing: An ERP comparison of deaf native signers and deaf signers with a delayed language acquisition.

    PubMed

    Skotara, Nils; Salden, Uta; Kügow, Monique; Hänel-Faulhaber, Barbara; Röder, Brigitte

    2012-05-03

    To examine which language functions depend on early experience, the present study compared deaf native signers, deaf non-native signers and hearing German native speakers while processing German sentences. The participants watched simple written sentences while event-related potentials (ERPs) were recorded. At the end of each sentence they were asked to judge whether the sentence was correct or not. Two types of violations were introduced in the middle of the sentence: a semantically implausible noun or a violation of subject-verb number agreement. The results showed a similar ERP pattern after semantic violations (an N400 followed by a positivity) in all three groups. After syntactic violations, native German speakers and native signers of German Sign Language (DGS) with German as a second language (L2) showed a left anterior negativity (LAN) followed by a P600, whereas deaf participants with a delayed onset of first language (L1) acquisition showed no LAN but instead a negativity over the right hemisphere. The P600 of this group had a smaller amplitude and a different scalp distribution compared with that of German native speakers. The results of the present study suggest that language deprivation in early childhood alters the cerebral organization of syntactic language processing mechanisms for L2. Semantic language processing, by contrast, was unaffected.

  20. The Whorfian time warp: Representing duration through the language hourglass.

    PubMed

    Bylund, Emanuel; Athanasopoulos, Panos

    2017-07-01

    How do humans construct their mental representations of the passage of time? The universalist account claims that abstract concepts like time are universal across humans. In contrast, the linguistic relativity hypothesis holds that speakers of different languages represent duration differently. The precise impact of language on duration representation is, however, unknown. Here, we show that language can have a powerful role in transforming humans' psychophysical experience of time. Contrary to the universalist account, we found language-specific interference in a duration reproduction task, where stimulus duration conflicted with its physical growth. When reproducing duration, Swedish speakers were misled by stimulus length, and Spanish speakers were misled by stimulus size/quantity. These patterns conform to preferred expressions of duration magnitude in these languages (Swedish: long/short time; Spanish: much/small time). Critically, Spanish-Swedish bilinguals performing the task in both languages showed different interference depending on language context. Such shifting behavior within the same individual reveals hitherto undocumented levels of flexibility in time representation. Finally, contrary to the linguistic relativity hypothesis, language interference was confined to difficult discriminations (i.e., when stimuli varied only subtly in duration and growth), and was eliminated when linguistic cues were removed from the task. These results reveal the malleable nature of human time representation as part of a highly adaptive information processing system. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  1. Engaging spaces: Intimate electro-acoustic display in alternative performance venues

    NASA Astrophysics Data System (ADS)

    Bahn, Curtis; Moore, Stephan

    2004-05-01

    In past presentations to the ASA, we have described the design and construction of four generations of unique spherical speakers (multichannel, outward-radiating geodesic speaker arrays) and Sensor-Speaker-Arrays (SenSAs: combinations of various sensor devices with outward-radiating multichannel speaker arrays). This presentation will detail the ways in which arrays of these speakers have been employed in alternative performance venues, providing presence and intimacy in the performance of electro-acoustic chamber music and sound installation, while engaging natural and unique acoustical qualities of various locations. We will present documentation of the use of multichannel sonic diffusion arrays in small clubs, "black-box" theaters, planetariums, and art galleries.

  2. Speaker diarization system on the 2007 NIST rich transcription meeting recognition evaluation

    NASA Astrophysics Data System (ADS)

    Sun, Hanwu; Nwe, Tin Lay; Koh, Eugene Chin Wei; Bin, Ma; Li, Haizhou

    2007-09-01

    This paper presents a speaker diarization system developed at the Institute for Infocomm Research (I2R) for the NIST Rich Transcription 2007 (RT-07) evaluation task. We describe in detail our primary approaches for speaker diarization under the Multiple Distant Microphones (MDM) conditions in the conference room scenario. Our proposed system consists of six modules: 1) a normalized least-mean-square (NLMS) adaptive filter for speaker direction estimation via Time Difference of Arrival (TDOA); 2) an initial speaker clustering via a two-stage TDOA histogram distribution quantization approach; 3) multiple-microphone speaker data alignment via GCC-PHAT Time Delay Estimation (TDE) among all the distant microphone channel signals; 4) a speaker clustering algorithm based on a GMM modeling approach; 5) non-speech removal via a speech/non-speech verification mechanism; and 6) silence removal via a "Double-Layer Windowing" (DLW) method. We achieve an error rate of 31.02% on the 2006 Spring (RT-06s) MDM evaluation task and a competitive overall error rate of 15.32% on the NIST Rich Transcription 2007 (RT-07) MDM evaluation task.
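
    Module 3's time-delay estimation is the standard GCC-PHAT algorithm. The paper's own implementation is not reproduced in this abstract, so the following is a minimal self-contained sketch of a generic GCC-PHAT estimator; the function name and defaults are illustrative.

      import numpy as np

      def gcc_phat(sig, ref, fs, max_tau=None):
          # Time delay of `sig` relative to `ref` (in seconds) at the peak of
          # the phase-transform weighted cross-correlation.
          n = len(sig) + len(ref)              # zero-pad to avoid circular wrap
          SIG = np.fft.rfft(sig, n=n)
          REF = np.fft.rfft(ref, n=n)
          R = SIG * np.conj(REF)
          R /= np.abs(R) + 1e-12               # PHAT weighting: keep phase only
          cc = np.fft.irfft(R, n=n)
          max_shift = n // 2
          if max_tau is not None:
              max_shift = min(int(fs * max_tau), max_shift)
          cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
          return (np.argmax(np.abs(cc)) - max_shift) / fs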

  3. Intonation and gender perception: applications for transgender speakers.

    PubMed

    Hancock, Adrienne; Colton, Lindsey; Douglas, Fiacre

    2014-03-01

    Intonation is commonly addressed in voice and communication feminization therapy, yet empirical evidence of gender differences for intonation is scarce and rarely do studies examine how it relates to gender perception of transgender speakers. This study examined intonation of 12 males, 12 females, six female-to-male, and 14 male-to-female transgender speakers describing a Norman Rockwell image. Several intonation measures were compared between biological gender groups, between perceived gender groups, and between male-to-female (MTF) speakers who were perceived as male, female, or ambiguous gender. Speakers with a larger percentage of utterances with upward intonation and a larger utterance semitone range were perceived as female by listeners, despite no significant differences between the actual intonation of the four gender groups. MTF speakers who do not pass as female appear to use less upward and more downward intonations than female and passing MTF speakers. Intonation has potential for use in transgender communication therapy because it can influence perception to some degree. Copyright © 2014 The Voice Foundation. Published by Mosby, Inc. All rights reserved.

  4. The association between tobacco, alcohol, and drug use, stress, and depression among uninsured free clinic patients: U.S.-born English speakers, non-U.S.-born English speakers, and Spanish speakers.

    PubMed

    Kamimura, Akiko; Ashby, Jeanie; Tabler, Jennifer; Nourian, Maziar M; Trinh, Ha Ngoc; Chen, Jason; Reel, Justine J

    2017-01-01

    The abuse of substances is a significant public health issue. Perceived stress and depression have been found to be related to the abuse of substances. The purpose of this study is to examine the prevalence of substance use (i.e., alcohol problems, smoking, and drug use) and the association between substance use, perceived stress, and depression among free clinic patients. Patients completed a self-administered survey in 2015 (N = 504). The overall prevalence of substance use among free clinic patients was not high compared to the U.S. general population. U.S.-born English speakers reported a higher prevalence rate of tobacco smoking and drug use than did non-U.S.-born English speakers and Spanish speakers. Alcohol problems and smoking were significantly related to higher levels of perceived stress and depression. Substance use prevention and education should be included in general health education programs. U.S.-born English speakers would need additional attention. Mental health intervention would be essential to prevention and intervention.

  5. Iran: Regional Perspectives and U.S. Policy

    DTIC Science & Technology

    2010-01-13

    unwilling to publicly challenge Iran on the issue because of their economic dependence on or relationships with Iran...of Iran’s neighbors, like Iraq, and poses direct military threats to others, like Israel and Lebanon. It also directly challenges U.S. efforts to...reports that a former speaker of the Iranian parliament and then-aide to Iran’s Supreme Leader had referred to Bahrain as Iran’s 14th province sparked a

  6. Combining Multiple Knowledge Sources for Speech Recognition

    DTIC Science & Technology

    1988-09-15

    Thus, the first is the best adaptation sentence (to clarify the pronunciation, TASSEAJ for the acronym TASA!), the second sentence, when added...10 rapid adaptation sentences, and 15 spell-mode phrases. 6101 resource management SPEAKER-DEPENDENT DATABASE sentences were randomly...combining the smoothed phoneme models with the detailed context models. BYBLOS makes maximal use

  7. Storytelling as an age-dependent skill: oral recall of orally presented stories.

    PubMed

    Mergler, N L; Faust, M; Goldstein, M D

    During experiment 1, three taped prose passages read by college student, middle-aged, or old tellers were orally recalled by college students in an incidental memory paradigm. More story units were remembered as the age of the teller increased (r = +.642, p less than .05). Comparison of these results with prior research using written, as opposed to oral, presentation and recall of these stories showed no differences in specific story units remembered. Teller age predicted recall on the two "storied" passages. These passages elicited more favorable comments from listeners when read by older tellers. The third, descriptive passage was less favorably regarded by listeners hearing older tellers. During experiment 2, taped storied passages read by middle-aged tellers were falsely attributed to young, middle-aged, or old persons before the college students listened. Incidental recall did not show an age-of-teller effect in this case, but the listener's evaluation of the speaker exhibited age-dependent stereotypes. It was concluded that (1) physical qualities of older voices lead to more effective oral transmission; (2) one expects to receive certain types of oral information from older persons; and (3) a mismatch between physical vocal quality and age attribution affects evaluation of the speaker, not recall of the information.

  8. A Study on Metadiscoursive Interaction in the MA Theses of the Native Speakers of English and the Turkish Speakers of English

    ERIC Educational Resources Information Center

    Köroglu, Zehra; Tüm, Gülden

    2017-01-01

    This study has been conducted to evaluate the TM usage in the MA theses written by the native speakers (NSs) of English and the Turkish speakers (TSs) of English. The purpose is to compare the TM usage in the introduction, results and discussion, and conclusion sections by both groups' randomly selected MA theses in the field of ELT between the…

  9. Improving the Effectiveness of Speaker Verification Domain Adaptation With Inadequate In-Domain Data

    DTIC Science & Technology

    2017-08-20

    This paper addresses speaker verification domain adaptation with...contain speakers with low channel diversity. Existing domain adaptation methods are reviewed, and their shortcomings are discussed. We derive an

  10. Mortality inequality in two native population groups.

    PubMed

    Saarela, Jan; Finnäs, Fjalar

    2005-11-01

    A sample of people aged 40-67 years, taken from a longitudinal register compiled by Statistics Finland, is used to analyse mortality differences between Swedish speakers and Finnish speakers in Finland. Finnish speakers are known to have higher death rates than Swedish speakers. The purpose is to explore whether labour-market experience and partnership status, treated as proxies for measures of variation in health-related characteristics, are related to the mortality differential. Persons who are single, disability pensioners, and those having experienced unemployment are found to have substantially higher death rates than those with a partner and employed persons. Swedish speakers have a more favourable distribution on both variables, which thus notably helps to reduce the Finnish-Swedish mortality gradient. A conclusion from this study is that future analyses on the topic should focus on mechanisms that bring a greater proportion of Finnish speakers into the groups with poor health or supposed unhealthy behaviour.

  11. How Psychological Stress Affects Emotional Prosody.

    PubMed

    Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J

    2016-01-01

    We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity.

  12. In the eye of the beholder: eye contact increases resistance to persuasion.

    PubMed

    Chen, Frances S; Minson, Julia A; Schöne, Maren; Heinrichs, Markus

    2013-11-01

    Popular belief holds that eye contact increases the success of persuasive communication, and prior research suggests that speakers who direct their gaze more toward their listeners are perceived as more persuasive. In contrast, we demonstrate that more eye contact between the listener and speaker during persuasive communication predicts less attitude change in the direction advocated. In Study 1, participants freely watched videos of speakers expressing various views on controversial sociopolitical issues. Greater direct gaze at the speaker's eyes was associated with less attitude change in the direction advocated by the speaker. In Study 2, we instructed participants to look at either the eyes or the mouths of speakers presenting arguments counter to participants' own attitudes. Intentionally maintaining direct eye contact led to less persuasion than did gazing at the mouth. These findings suggest that efforts at increasing eye contact may be counterproductive across a variety of persuasion contexts.

  13. How Psychological Stress Affects Emotional Prosody

    PubMed Central

    Paulmann, Silke; Furnes, Desire; Bøkenes, Anne Ming; Cozzolino, Philip J.

    2016-01-01

    We explored how experimentally induced psychological stress affects the production and recognition of vocal emotions. In Study 1a, we demonstrate that sentences spoken by stressed speakers are judged by naïve listeners as sounding more stressed than sentences uttered by non-stressed speakers. In Study 1b, negative emotions produced by stressed speakers are generally less well recognized than the same emotions produced by non-stressed speakers. Multiple mediation analyses suggest this poorer recognition of negative stimuli was due to a mismatch between the variation of volume voiced by speakers and the range of volume expected by listeners. Together, this suggests that the stress level of the speaker affects judgments made by the receiver. In Study 2, we demonstrate that participants who were induced with a feeling of stress before carrying out an emotional prosody recognition task performed worse than non-stressed participants. Overall, findings suggest detrimental effects of induced stress on interpersonal sensitivity. PMID:27802287

  14. Don't Underestimate the Benefits of Being Misunderstood.

    PubMed

    Gibson, Edward; Tan, Caitlin; Futrell, Richard; Mahowald, Kyle; Konieczny, Lars; Hemforth, Barbara; Fedorenko, Evelina

    2017-06-01

    Being a nonnative speaker of a language poses challenges. Individuals often feel embarrassed by the errors they make when talking in their second language. However, here we report an advantage of being a nonnative speaker: Native speakers give foreign-accented speakers the benefit of the doubt when interpreting their utterances; as a result, apparently implausible utterances are more likely to be interpreted in a plausible way when delivered in a foreign than in a native accent. Across three replicated experiments, we demonstrated that native English speakers are more likely to interpret implausible utterances, such as "the mother gave the candle the daughter," as similar plausible utterances ("the mother gave the candle to the daughter") when the speaker has a foreign accent. This result follows from the general model of language interpretation in a noisy channel, under the hypothesis that listeners assume a higher error rate in foreign-accented than in nonaccented speech.
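
    The noisy-channel claim admits a small worked example. In the sketch below, all probabilities are illustrative assumptions, not the paper's parameters: if the heard (implausible) sentence differs from a plausible one by a single error, a listener who assumes a higher error rate for foreign-accented speech assigns more posterior probability to the plausible reading.

      def posterior_plausible(p_error, prior_plausible=0.95):
          # P(speaker meant the plausible sentence | heard the implausible one),
          # assuming the plausible reading requires exactly one error (e.g. a
          # dropped "to") and the literal reading requires none.
          p_literal = (1 - prior_plausible) * (1 - p_error)
          p_repair = prior_plausible * p_error
          return p_repair / (p_repair + p_literal)

      print(posterior_plausible(p_error=0.01))   # native accent  -> ~0.16
      print(posterior_plausible(p_error=0.10))   # foreign accent -> ~0.68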

  15. Rhythmic patterning in Malaysian and Singapore English.

    PubMed

    Tan, Rachel Siew Kuang; Low, Ee-Ling

    2014-06-01

    Previous work on the rhythm of Malaysian English has been based on impressionistic observations. This paper utilizes acoustic analysis to measure the rhythmic patterns of Malaysian English. Recordings of the read speech and spontaneous speech of 10 Malaysian English speakers were analyzed and compared with recordings of an equivalent sample of Singaporean English speakers. Analysis was done using two rhythmic indexes, the PVI and VarcoV. It was found that although the rhythm of the read speech of the Singaporean speakers was syllable-based, as described by previous studies, the rhythm of the Malaysian speakers was even more syllable-based. Analysis of syllables in specific utterances showed that Malaysian speakers did not reduce vowels as much as Singaporean speakers did. Results for the spontaneous speech confirmed the findings for the read speech; that is, the same rhythmic patterning was found in environments that normally trigger vowel reduction.
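
    Both rhythm indexes have standard published definitions: the normalized pairwise variability index (nPVI) averages the duration difference between successive vocalic intervals, normalized by their mean, and VarcoV is the rate-normalized standard deviation of vocalic interval durations. A short sketch, assuming the interval durations have already been segmented:

      import numpy as np

      def npvi(durations):
          # Normalized PVI: 100 * mean of |d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2);
          # higher values indicate less syllable-based (more stress-based) rhythm.
          d = np.asarray(durations, dtype=float)
          return 100 * np.mean(np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2))

      def varco_v(durations):
          # VarcoV: 100 * standard deviation / mean of vocalic interval durations.
          d = np.asarray(durations, dtype=float)
          return 100 * d.std() / d.mean()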

  16. Speakers of different languages process the visual world differently.

    PubMed

    Chabal, Sarah; Marian, Viorica

    2015-06-01

    Language and vision are highly interactive. Here we show that people activate language when they perceive the visual world, and that this language information impacts how speakers of different languages focus their attention. For example, when searching for an item (e.g., clock) in the same visual display, English and Spanish speakers look at different objects. Whereas English speakers searching for the clock also look at a cloud, Spanish speakers searching for the clock also look at a gift, because the Spanish names for gift (regalo) and clock (reloj) overlap phonologically. These different looking patterns emerge despite an absence of direct language input, showing that linguistic information is automatically activated by visual scene processing. We conclude that the varying linguistic information available to speakers of different languages affects visual perception, leading to differences in how the visual world is processed. (c) 2015 APA, all rights reserved.

  17. Processing ser and estar to locate objects and events: An ERP study with L2 speakers of Spanish.

    PubMed

    Dussias, Paola E; Contemori, Carla; Román, Patricia

    2014-01-01

    In Spanish locative constructions, a different form of the copula is selected in relation to the semantic properties of the grammatical subject: sentences that locate objects require estar while those that locate events require ser (both translated in English as 'to be'). In an ERP study, we examined whether second language (L2) speakers of Spanish are sensitive to the selectional restrictions that the different types of subjects impose on the choice of the two copulas. Twenty-four native speakers of Spanish and two groups of L2 Spanish speakers (24 beginners and 18 advanced speakers) were recruited to investigate the processing of 'object/event + estar/ser' permutations. Participants provided grammaticality judgments on correct (object + estar; event + ser) and incorrect (object + ser; event + estar) sentences while their brain activity was recorded. In line with previous studies (Leone-Fernández, Molinaro, Carreiras, & Barber, 2012; Sera, Gathje, & Pintado, 1999), the results of the grammaticality judgment for the native speakers showed that participants correctly accepted object + estar and event + ser constructions. In addition, while 'object + ser' constructions were considered grossly ungrammatical, 'event + estar' combinations were perceived as unacceptable to a lesser degree. For these same participants, ERP recording time-locked to the onset of the critical word 'en' showed a larger P600 for the ser predicates when the subject was an object than when it was an event (*La silla es en la cocina vs. La fiesta es en la cocina). This P600 effect is consistent with syntactic repair of the defining predicate when it does not fit with the adequate semantic properties of the subject. For estar predicates (La silla está en la cocina vs. *La fiesta está en la cocina), the findings showed a central-frontal negativity between 500-700 ms. Grammaticality judgment data for the L2 speakers of Spanish showed that beginners were significantly less accurate than native speakers in all conditions, while the advanced speakers only differed from the natives in the event + ser and event + estar conditions. For the ERPs, the beginning learners did not show any effects in the time-windows under analysis. The advanced speakers showed a pattern similar to that of native speakers: (1) a P600 response to the 'object + ser' violation, more central and frontally distributed, and (2) a central-frontal negativity between 500-700 ms for the 'event + estar' violation. Findings for the advanced speakers suggest that behavioral methods commonly used to assess grammatical knowledge in the L2 may be underestimating what L2 speakers have actually learned.

  18. Reasoning about knowledge: Children's evaluations of generality and verifiability.

    PubMed

    Koenig, Melissa A; Cole, Caitlin A; Meyer, Meredith; Ridge, Katherine E; Kushnir, Tamar; Gelman, Susan A

    2015-12-01

    In a series of experiments, we examined 3- to 8-year-old children's (N=223) and adults' (N=32) use of two properties of testimony to estimate a speaker's knowledge: generality and verifiability. Participants were presented with a "Generic speaker" who made a series of 4 general claims about "pangolins" (a novel animal kind), and a "Specific speaker" who made a series of 4 specific claims about "this pangolin" as an individual. To investigate the role of verifiability, we systematically varied whether the claim referred to a perceptually-obvious feature visible in a picture (e.g., "has a pointy nose") or a non-evident feature that was not visible (e.g., "sleeps in a hollow tree"). Three main findings emerged: (1) young children showed a pronounced reliance on verifiability that decreased with age. Three-year-old children were especially prone to credit knowledge to speakers who made verifiable claims, whereas 7- to 8-year-olds and adults credited knowledge to generic speakers regardless of whether the claims were verifiable; (2) children's attributions of knowledge to generic speakers were not detectable until age 5, and only when those claims were also verifiable; (3) children often generalized speakers' knowledge outside of the pangolin domain, indicating a belief that a person's knowledge about pangolins likely extends to new facts. Findings indicate that young children may be inclined to doubt speakers who make claims they cannot verify themselves, and that children show a developmentally increasing appreciation for speakers who make general claims. Copyright © 2015 Elsevier Inc. All rights reserved.

  19. Why We Serve - U.S. Department of Defense Official Website

    Science.gov Websites

    Military service as described by a soldier, sailor, airman, or Marine who lives it. The site includes guidance on how to host a speaker and profiles of the program's current speakers.

  20. Formant transitions in the fluent speech of Farsi-speaking people who stutter.

    PubMed

    Dehqan, Ali; Yadegari, Fariba; Blomgren, Michael; Scherer, Ronald C

    2016-06-01

    Second formant (F2) transitions can be used to infer attributes of articulatory transitions. This study compared formant transitions during fluent speech segments of Farsi (Persian) speaking people who stutter and normally fluent Farsi speakers. Ten Iranian males who stutter and 10 normally fluent Iranian males participated. Sixteen different "CVt" tokens were embedded within the phrase "Begu CVt an". Measures included overall F2 transition frequency extents, durations, and derived overall slopes, initial F2 transition slopes at 30 ms and 60 ms, and speaking rate. (1) Mean overall formant frequency extent was significantly greater in 14 of the 16 CVt tokens for the group of stuttering speakers. (2) Stuttering speakers exhibited significantly longer overall F2 transitions for all 16 tokens compared to the nonstuttering speakers. (3) The overall F2 slopes were similar between the two groups. (4) The stuttering speakers exhibited significantly greater initial F2 transition slopes (positive or negative) for five of the 16 tokens at 30 ms and six of the 16 tokens at 60 ms. (5) The stuttering group produced a slower syllable rate than the non-stuttering group. During perceptually fluent utterances, the stuttering speakers had greater F2 frequency extents during transitions, took longer to reach vowel steady state, exhibited some evidence of steeper slopes at the beginning of transitions, had overall similar F2 formant slopes, and had slower speaking rates compared to nonstuttering speakers. Findings support the notion of different speech motor timing strategies in stuttering speakers. Findings are likely to be independent of the language spoken. Educational objectives: This study compares aspects of F2 formant transitions between 10 stuttering and 10 nonstuttering speakers. Readers will be able to describe: (a) characteristics of formant frequency as a specific acoustic feature used to infer speech movements in stuttering and nonstuttering speakers, (b) two methods of measuring second formant (F2) transitions: the visual criteria method and fixed time criteria method, (c) characteristics of F2 transitions in the fluent speech of stuttering speakers and how those characteristics appear to differ from normally fluent speakers, and (d) possible cross-linguistic effects on acoustic analyses of stuttering. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Referential first mention in narratives by mildly mentally retarded adults.

    PubMed

    Kernan, K T; Sabsay, S

    1987-01-01

    Referential first mentions in narrative reports of a short film by 40 mildly mentally retarded adults and 20 nonretarded adults were compared. The mentally retarded sample included equal numbers of male and female, and black and white speakers. The mentally retarded speakers made significantly fewer first mentions and significantly more errors in the form of the first mentions than did nonretarded speakers. A pattern of better performance by black males than by other mentally retarded speakers was found. It is suggested that task difficulty and incomplete mastery of the use of definite and indefinite forms for encoding old and new information, rather than some global type of egocentrism, accounted for the poorer performance by mentally retarded speakers.

  2. Entropy Based Classifier Combination for Sentence Segmentation

    DTIC Science & Technology

    2007-01-01

    speaker diarization system to divide the audio data into hypothetical speakers [17...the prosodic feature also includes turn-based features which describe the position of a word in relation to diarization segmentation. The speaker ...robust speaker segmentation: the ICSI-SRI fall 2004 diarization system," in Proc. RT-04F Workshop, 2004. [18] "The rich transcription fall 2003," http://nist.gov/speech/tests/rt/rt2003/fall/docs/rt03-fall-eval-plan-v9.pdf.
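
    The title's technique, entropy-based classifier combination, is commonly realized by weighting each classifier's posterior distribution by the inverse of its entropy, so that confident (low-entropy) classifiers dominate the combined decision. The sketch below is one standard recipe under that reading, not necessarily the exact scheme of this truncated report.

      import numpy as np

      def entropy(p, eps=1e-12):
          # Shannon entropy of posterior rows that sum to 1.
          p = np.clip(p, eps, 1.0)
          return -(p * np.log(p)).sum(axis=-1)

      def combine_by_entropy(posteriors):
          # posteriors: (n_classifiers, n_samples, n_classes). Weight each
          # classifier per sample by inverse entropy, then average.
          posteriors = np.asarray(posteriors, dtype=float)
          w = 1.0 / (entropy(posteriors) + 1e-6)     # (n_classifiers, n_samples)
          w /= w.sum(axis=0, keepdims=True)
          return (w[..., None] * posteriors).sum(axis=0)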

  3. Somatotype and Body Composition of Normal and Dysphonic Adult Speakers.

    PubMed

    Franco, Débora; Fragoso, Isabel; Andrea, Mário; Teles, Júlia; Martins, Fernando

    2017-01-01

    Voice quality provides information about the anatomical characteristics of the speaker. The patterns of somatotype and body composition can provide essential knowledge to characterize the individuality of voice quality. The aim of this study was to verify if there were significant differences in somatotype and body composition between normal and dysphonic speakers. Cross-sectional study. Anthropometric measurements were taken of a sample of 72 adult participants (40 normal speakers and 32 dysphonic speakers) according to International Society for the Advancement of Kinanthropometry standards, which allowed the calculation of endomorphism, mesomorphism, ectomorphism components, body density, body mass index, fat mass, percentage fat, and fat-free mass. Perception and acoustic evaluations as well as nasoendoscopy were used to assign speakers into normal or dysphonic groups. There were no significant differences between normal and dysphonic speakers in the mean somatotype attitudinal distance and somatotype dispersion distance (in spite of marginally significant differences [P < 0.10] in somatotype attitudinal distance and somatotype dispersion distance between groups) and in the mean vector of the somatotype components. Furthermore, no significant differences were found between groups concerning the mean of percentage fat, fat mass, fat-free mass, body density, and body mass index after controlling by sex. The findings suggested no significant differences in the somatotype and body composition variables, between normal and dysphonic speakers. Copyright © 2017 The Voice Foundation. Published by Elsevier Inc. All rights reserved.

  4. Strength of German accent under altered auditory feedback

    PubMed Central

    HOWELL, PETER; DWORZYNSKI, KATHARINA

    2007-01-01

    Borden’s (1979, 1980) hypothesis that speakers with vulnerable speech systems rely more heavily on feedback monitoring than do speakers with less vulnerable systems was investigated. The second language (L2) of a speaker is vulnerable, in comparison with the native language, so alteration to feedback should have a detrimental effect on it, according to this hypothesis. Here, we specifically examined whether altered auditory feedback has an effect on accent strength when speakers speak L2. There were three stages in the experiment. First, 6 German speakers who were fluent in English (their L2) were recorded under six conditions—normal listening, amplified voice level, voice shifted in frequency, delayed auditory feedback, and slowed and accelerated speech rate conditions. Second, judges were trained to rate accent strength. Training was assessed by whether it was successful in separating German speakers speaking English from native English speakers, also speaking English. In the final stage, the judges ranked recordings of each speaker from the first stage as to increasing strength of German accent. The results show that accents were more pronounced under frequency-shifted and delayed auditory feedback conditions than under normal or amplified feedback conditions. Control tests were done to ensure that listeners were judging accent, rather than fluency changes caused by altered auditory feedback. The findings are discussed in terms of Borden’s hypothesis and other accounts about why altered auditory feedback disrupts speech control. PMID:11414137

  5. Political skill: explaining the effects of nonnative accent on managerial hiring and entrepreneurial investment decisions.

    PubMed

    Huang, Laura; Frideger, Marcia; Pearce, Jone L

    2013-11-01

    We propose and test a new theory explaining glass-ceiling bias against nonnative speakers as driven by perceptions that nonnative speakers have weak political skill. Although nonnative accent is a complex signal, its effects on assessments of the speakers' political skill are something that speakers can actively mitigate; this makes it an important bias to understand. In Study 1, White and Asian nonnative speakers using the same scripted responses as native speakers were found to be significantly less likely to be recommended for a middle-management position, and this bias was fully mediated by assessments of their political skill. The alternative explanations of race, communication skill, and collaborative skill were nonsignificant. In Study 2, entrepreneurial start-up pitches from national high-technology, new-venture funding competitions were shown to experienced executive MBA students. Nonnative speakers were found to have a significantly lower likelihood of receiving new-venture funding, and this was fully mediated by the coders' assessments of their political skill. The entrepreneurs' race, communication skill, and collaborative skill had no effect. We discuss the value of empirically testing various posited reasons for glass-ceiling biases, how the importance and ambiguity of political skill for executive success serve as an ostensibly meritocratic cover for nonnative speaker bias, and other theoretical and practical implications of this work. (c) 2013 APA, all rights reserved.

  6. Aphasia in Persian: Implications for cognitive models of lexical processing.

    PubMed

    Bakhtiar, Mehdi; Jafary, Reyhane; Weekes, Brendan S

    2017-09-01

    Current models of oral reading assume that different routes (sublexical, lexical, and semantic) mediate oral reading performance and that reliance on different routes during oral reading depends on the characteristics of print-to-sound mappings. Studies of single cases of acquired dyslexia in aphasia have contributed to the development of such models by revealing patterns of double dissociation in object naming and oral reading skill that follow brain damage in Indo-European and Sino-Tibetan languages. Print-to-sound mapping in Persian varies in transparency because orthography-to-phonology translation depends uniquely on the presence or absence of vowel letters in print. Here a hypothesis is tested that oral reading in Persian requires a semantic reading pathway that is independent of a direct non-semantic reading pathway, by investigating whether Persian speakers with aphasia show selective impairments to object naming and reading aloud. A sample of 21 Persian speakers with aphasia ranging in age from 18 to 77 (mean = 53, SD = 16.9) was asked to name the same set of 200 objects and to read aloud the printed names of these objects in different sessions. As an additional measure of sublexical reading, patients were asked to read aloud 30 non-word stimuli. Results showed that oral reading is significantly more preserved than object naming in Persian speakers with aphasia. However, more preserved object naming than oral reading was also observed in some cases. There was a moderate positive correlation between picture naming and oral reading success (p < .05). Mixed-effects logistic regression revealed that word frequency, age of acquisition and imageability predict success across both tasks and there is an interaction between these variables and orthographic transparency in oral reading. Furthermore, opaque words were read less accurately than transparent words. The results reveal different patterns of acquired dyslexia in some cases that closely resemble phonological, deep, and surface dyslexia in other scripts, reported here in Persian for the first time. © 2016 The British Psychological Society.

  7. The role of temporal speech cues in facilitating the fluency of adults who stutter.

    PubMed

    Park, Jin; Logan, Kenneth J

    2015-12-01

    Adults who stutter speak more fluently during choral speech contexts than they do during solo speech contexts. The underlying mechanisms for this effect remain unclear, however. In this study, we examined the extent to which the choral speech effect depended on presentation of intact temporal speech cues. We also examined whether speakers who stutter followed choral signals more closely than typical speakers did. Eight adults who stuttered and eight adults who did not stutter read 60 sentences aloud during a solo speaking condition and three choral speaking conditions (240 total sentences), two of which featured either temporally altered or indeterminate word duration patterns. Effects of these manipulations on speech fluency, rate, and temporal entrainment with the choral speech signal were assessed. Adults who stutter spoke more fluently in all choral speaking conditions than they did when speaking solo. They also spoke more slowly and exhibited closer temporal entrainment with the choral signal during the mid- to late-stages of sentence production than the adults who did not stutter. Both groups entrained more closely with unaltered choral signals than they did with altered choral signals. Findings suggest that adults who stutter make greater use of speech-related information in choral signals when talking than adults with typical fluency do. The presence of fluency facilitation during temporally altered choral speech and conversation babble, however, suggests that temporal/gestural cueing alone cannot account for fluency facilitation in speakers who stutter. Other potential fluency enhancing mechanisms are discussed. The reader will be able to (a) summarize competing views on stuttering as a speech timing disorder, (b) describe the extent to which adults who stutter depend on an accurate rendering of temporal information in order to benefit from choral speech, and (c) discuss possible explanations for fluency facilitation in the presence of inaccurate or indeterminate temporal cues. Copyright © 2015 Elsevier Inc. All rights reserved.

  8. Auditory Perceptual Abilities Are Associated with Specific Auditory Experience

    PubMed Central

    Zaltz, Yael; Globerson, Eitan; Amir, Noam

    2017-01-01

    The extent to which auditory experience can shape general auditory perceptual abilities is still under constant debate. Some studies show that specific auditory expertise may have a general effect on auditory perceptual abilities, while others show a more limited influence, exhibited only in a relatively narrow range associated with the area of expertise. The current study addresses this issue by examining experience-dependent enhancement in perceptual abilities in the auditory domain. Three experiments were performed. In the first experiment, 12 pop and rock musicians and 15 non-musicians were tested in frequency discrimination (DLF), intensity discrimination, spectrum discrimination (DLS), and time discrimination (DLT). Results showed significant superiority of the musician group only for the DLF and DLT tasks, illuminating enhanced perceptual skills in the key features of pop music, in which minuscule changes in amplitude and spectrum are not critical to performance. The next two experiments attempted to differentiate between generalization and specificity in the influence of auditory experience, by comparing subgroups of specialists. First, seven guitar players and eight percussionists were tested in the DLF and DLT tasks, in which musicians had been found superior. Results showed superior abilities on the DLF task for guitar players, though no difference between the groups in DLT, demonstrating some dependency of auditory learning on the specific area of expertise. Subsequently, a third experiment was conducted, testing a possible influence of vowel density in native language on auditory perceptual abilities. Ten native speakers of German (a language characterized by a dense vowel system of 14 vowels), and 10 native speakers of Hebrew (characterized by a sparse vowel system of five vowels), were tested in a formant discrimination task. This is the linguistic equivalent of a DLS task. Results showed that German speakers had superior formant discrimination, demonstrating highly specific effects for auditory linguistic experience as well. Overall, results suggest that auditory superiority is associated with the specific auditory exposure. PMID:29238318

  9. On how the brain decodes vocal cues about speaker confidence.

    PubMed

    Jiang, Xiaoming; Pell, Marc D

    2015-05-01

    In speech communication, listeners must accurately decode vocal cues that refer to the speaker's mental state, such as their confidence or 'feeling of knowing'. However, the time course and neural mechanisms associated with online inferences about speaker confidence are unclear. Here, we used event-related potentials (ERPs) to examine the temporal neural dynamics underlying a listener's ability to infer speaker confidence from vocal cues during speech processing. We recorded listeners' real-time brain responses while they evaluated statements wherein the speaker's tone of voice conveyed one of three levels of confidence (confident, close-to-confident, unconfident) or were spoken in a neutral manner. Neural responses time-locked to event onset show that the perceived level of speaker confidence could be differentiated at distinct time points during speech processing: unconfident expressions elicited a weaker P2 than all other expressions of confidence (or neutral-intending utterances), whereas close-to-confident expressions elicited a reduced negative response in the 330-500 msec and 550-740 msec time window. Neutral-intending expressions, which were also perceived as relatively confident, elicited a more delayed, larger sustained positivity than all other expressions in the 980-1270 msec window for this task. These findings provide the first piece of evidence of how quickly the brain responds to vocal cues signifying the extent of a speaker's confidence during online speech comprehension; first, a rough dissociation between unconfident and confident voices occurs as early as 200 msec after speech onset. At a later stage, further differentiation of the exact level of speaker confidence (i.e., close-to-confident, very confident) is evaluated via an inferential system to determine the speaker's meaning under current task settings. These findings extend three-stage models of how vocal emotion cues are processed in speech comprehension (e.g., Schirmer & Kotz, 2006) by revealing how a speaker's mental state (i.e., feeling of knowing) is simultaneously inferred from vocal expressions. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Designing, Modeling, Constructing, and Testing a Flat Panel Speaker and Sound Diffuser for a Simulator

    NASA Technical Reports Server (NTRS)

    Dillon, Christina

    2013-01-01

    The goal of this project was to design, model, build, and test a flat panel speaker and frame for a spherical dome structure being made into a simulator. The simulator will be a test bed for evaluating an immersive environment for human interfaces. This project focused on the loud speakers and a sound diffuser for the dome. The rest of the team worked on an Ambisonics 3D sound system, video projection system, and multi-direction treadmill to create the most realistic scene possible. The main programs utilized in this project were Pro-E and COMSOL. Pro-E was used for creating detailed figures for the fabrication of a frame that held a flat panel loud speaker. The loud speaker was made from a thin sheet of Plexiglas and 4 acoustic exciters. COMSOL, a multiphysics finite element analysis simulator, was used to model and evaluate all stages of the loud speaker, frame, and sound diffuser. Acoustical testing measurements were utilized to create polar plots from the working prototype, which were then compared to the COMSOL simulations to select the optimal design for the dome. The final goal of the project was to install the flat panel loud speaker design, in addition to a sound diffuser, onto the wall of the dome. After running tests in COMSOL on various speaker configurations, including a warped Plexiglas version, the optimal speaker design included a flat piece of Plexiglas with a rounded frame to match the curvature of the dome. Eight of these loud speakers will be mounted into an inch and a half of high performance acoustic insulation, or Thinsulate, that will cover the inside of the dome. The following technical paper discusses these projects and explains the engineering processes used, knowledge gained, and the projected future goals of this project.

  11. Perception of speaker size and sex of vowel sounds

    NASA Astrophysics Data System (ADS)

    Smith, David R. R.; Patterson, Roy D.

    2005-04-01

    Glottal-pulse rate (GPR) and vocal-tract length (VTL) are both related to speaker size and sex; however, it is unclear how they interact to determine our perception of speaker size and sex. Experiments were designed to measure the relative contribution of GPR and VTL to judgements of speaker size and sex. Vowels were scaled to represent people with different GPRs and VTLs, including many well beyond the normal population values. In a single-interval, two-response rating paradigm, listeners judged the size (using a 7-point scale) and sex/age of the speaker (man, woman, boy, or girl) of these scaled vowels. Results from the size-rating experiments show that VTL has a much greater influence upon judgements of speaker size than GPR. Results from the sex-categorization experiments show that judgements of speaker sex are influenced about equally by GPR and VTL for vowels with normal GPR and VTL values. For abnormal combinations of GPR and VTL, where low GPRs are combined with short VTLs, VTL has more influence than GPR in sex judgements. [Work supported by the UK MRC (G9901257) and the German Volkswagen Foundation (VWF 1/79 783).]

  12. Voice Handicap Index in Persian Speakers with Various Severities of Hearing Loss.

    PubMed

    Aghadoost, Ozra; Moradi, Negin; Dabirmoghaddam, Payman; Aghadoost, Alireza; Naderifar, Ehsan; Dehbokri, Siavash Mohammadi

    2016-01-01

    The purpose of this study was to assess and compare the total score and subscale scores of the Voice Handicap Index (VHI) in speakers with and without hearing loss. A further aim was to determine whether a correlation exists between severity of hearing loss and the total and subscale scores of the VHI. In this cross-sectional, descriptive analytical study, 100 participants, divided into two groups of participants with and without hearing loss, were studied. Background information was gathered by interview, and VHI questionnaires were filled in by all participants. For all variables, including the mean total score and VHI subscale scores, there was a considerable difference between speakers with and without hearing loss (p < 0.05). The correlation between severity of hearing loss and the total score and VHI subscale scores was significant. Speakers with hearing loss were found to have higher mean VHI scores than speakers with normal hearing. This indicates a high voice-related handicap in speakers with hearing loss. In addition, increased severity of hearing loss leads to more severe voice handicap. This finding emphasizes the need for a multilateral assessment and treatment of voice disorders in speakers with hearing loss. © 2017 S. Karger AG, Basel.

  13. Understanding speaker attitudes from prosody by adults with Parkinson's disease.

    PubMed

    Monetta, Laura; Cheang, Henry S; Pell, Marc D

    2008-09-01

    The ability to interpret vocal (prosodic) cues during social interactions can be disrupted by Parkinson's disease, with notable effects on how emotions are understood from speech. This study investigated whether PD patients who have emotional prosody deficits exhibit further difficulties decoding the attitude of a speaker from prosody. Vocally inflected but semantically nonsensical 'pseudo-utterances' were presented to listener groups with and without PD in two separate rating tasks. Task 1 required participants to rate how confident a speaker sounded from their voice, and Task 2 required listeners to rate how polite the speaker sounded for a comparable set of pseudo-utterances. The results showed that PD patients were significantly less able than HC participants to use prosodic cues to differentiate intended levels of speaker confidence in speech, although the patients could accurately detect the polite/impolite attitude of the speaker from prosody in most cases. Our data suggest that many PD patients fail to use vocal cues to effectively infer a speaker's emotions as well as certain attitudes in speech such as confidence, consistent with the idea that the basal ganglia play a role in the meaningful processing of prosodic sequences in spoken language (Pell & Leonard, 2003).

  14. Effect of tonal native language on voice fundamental frequency responses to pitch feedback perturbations during sustained vocalizations

    PubMed Central

    Liu, Hanjun; Wang, Emily Q.; Chen, Zhaocong; Liu, Peng; Larson, Charles R.; Huang, Dongfeng

    2010-01-01

    The purpose of this cross-language study was to examine whether the online control of voice fundamental frequency (F0) during vowel phonation is influenced by language experience. Native speakers of Cantonese and Mandarin, both tonal languages spoken in China, participated in the experiments. Subjects were asked to vocalize a vowel sound /u/ at their comfortable habitual F0, during which their voice pitch was unexpectedly shifted (±50, ±100, ±200, or ±500 cents, 200 ms duration) and fed back instantaneously to them over headphones. The results showed that Cantonese speakers produced significantly smaller responses than Mandarin speakers when the stimulus magnitude varied from 200 to 500 cents. Further, response magnitudes decreased along with the increase in stimulus magnitude in Cantonese speakers, which was not observed in Mandarin speakers. These findings suggest that online control of voice F0 during vocalization is sensitive to language experience. Further, systematic modulations of vocal responses across stimulus magnitude were observed in Cantonese speakers but not in Mandarin speakers, which indicates that this highly automatic feedback mechanism is sensitive to the specific tonal system of each language. PMID:21218905

  15. Inside-in, alternative paradigms for sound spatialization

    NASA Astrophysics Data System (ADS)

    Bahn, Curtis; Moore, Stephan

    2003-04-01

    Arrays of widely spaced mono-directional loudspeakers (P.A.-style stereo configurations or "outside-in" surround-sound systems) have long provided the dominant paradigms for electronic sound diffusion. So prevalent are these models that alternatives have largely been ignored and electronic sound, regardless of musical aesthetic, has come to be inseparably associated with single-channel speakers, or headphones. We recognize the value of these familiar paradigms, but believe that electronic sound can and should have many alternative, idiosyncratic voices. Through the design and construction of unique sound diffusion structures, one can reinvent the nature of electronic sound; when allied with new sensor technologies, these structures offer alternative modes of interaction with techniques of sonic computation. This paper describes several recent applications of spherical speakers (multichannel, outward-radiating geodesic speaker arrays) and Sensor-Speaker-Arrays (SenSAs: combinations of various sensor devices with outward-radiating multi-channel speaker arrays). This presentation introduces the development of four generations of spherical speakers, over a hundred individual speakers of various configurations, and their use in many different musical situations including live performance, recording, and sound installation. We describe the design and construction of these systems and, more generally, the new "voices" they give to electronic sound.

  16. The effect of voice quality and competing speakers in a passage comprehension task: performance in relation to cognitive functioning in children with normal hearing.

    PubMed

    von Lochow, Heike; Lyberg-Åhlander, Viveka; Sahlén, Birgitta; Kastberg, Tobias; Brännström, K Jonas

    2018-04-01

    This study explores the effect of voice quality and competing speaker/-s on children's performance in a passage comprehension task. Furthermore, it explores the interaction between passage comprehension and cognitive functioning. Forty-nine children (27 girls and 22 boys) with normal hearing (aged 7-12 years) participated. Passage comprehension was tested in six different listening conditions; a typical voice (non-dysphonic voice) in quiet, a typical voice with one competing speaker, a typical voice with four competing speakers, a dysphonic voice in quiet, a dysphonic voice with one competing speaker, and a dysphonic voice with four competing speakers. The children's working memory capacity and executive functioning were also assessed. The findings indicate no direct effect of voice quality on the children's performance, but a significant effect of background listening condition. Interaction effects were seen between voice quality, background listening condition, and executive functioning. The children's susceptibility to the effect of the dysphonic voice and the background listening conditions are related to the individual's executive functions. The findings have several implications for design of interventions in language learning environments such as classrooms.

  17. Four S's to Turn Your "Sex Talk" into a Super Program.

    ERIC Educational Resources Information Center

    Friedman, Jay

    1995-01-01

    Selection of campus speakers on sexuality is discussed, including assessment of speaker qualifications, the importance of teaching style and tone, choice of subject, program design for a meaningful event, and the sensitivity of both the speaker and the institution. (MSE)

  18. NREL: International Activities - Fourth Renewable Energy Industries Forum

    Science.gov Websites

    Speakers and presentations from the Fourth Renewable Energy Industries Forum (REIF), covering opportunities and challenges of utility and distributed projects and renewable energy integration.

  19. The effects of enactment on communicative competence in aphasic casual conversation: a functional linguistic perspective.

    PubMed

    Groenewold, Rimke; Armstrong, Elizabeth

    2018-05-14

    Previous research has shown that speakers with aphasia rely on enactment more often than non-brain-damaged language users. Several studies have been conducted to explain this observed increase, demonstrating that spoken language containing enactment is easier to produce and is more engaging to the conversation partner. This paper describes the effects of the occurrence of enactment in casual conversation involving individuals with aphasia on its level of conversational assertiveness. To evaluate whether and to what extent the occurrence of enactment in speech of individuals with aphasia contributes to its conversational assertiveness. Conversations between a speaker with aphasia and his wife (drawn from AphasiaBank) were analysed in several steps. First, the transcripts were divided into moves, and all moves were coded according to the systemic functional linguistics (SFL) framework. Next, all moves were labelled in terms of their level of conversational assertiveness, as defined in the previous literature. Finally, all enactments were identified and their level of conversational assertiveness was compared with that of non-enactments. Throughout their conversations, the non-brain-damaged speaker was more assertive than the speaker with aphasia. However, the speaker with aphasia produced more enactments than the non-brain-damaged speaker. The moves of the speaker with aphasia containing enactment were more assertive than those without enactment. The use of enactment in the conversations under study positively affected the level of conversational assertiveness of the speaker with aphasia, a competence that is important for speakers with aphasia because it contributes to their floor time, chances to be heard seriously and degree of control over the conversation topic. © 2018 The Authors International Journal of Language & Communication Disorders published by John Wiley & Sons Ltd on behalf of Royal College of Speech and Language Therapists.

  20. Facial biases on vocal perception and memory.

    PubMed

    Boltz, Marilyn G

    2017-06-01

    Does a speaker's face influence the way their voice is heard and later remembered? This question was addressed through two experiments where in each, participants listened to middle-aged voices accompanied by faces that were either age-appropriate, younger or older than the voice or, as a control, no face at all. In Experiment 1, participants evaluated each voice on various acoustical dimensions and speaker characteristics. The results showed that facial displays influenced perception such that the same voice was heard differently depending on the age of the accompanying face. Experiment 2 further revealed that facial displays led to memory distortions that were age-congruent in nature. These findings illustrate that faces can activate certain social categories and preconceived stereotypes that then influence vocal and person perception in a corresponding fashion. Processes of face/voice integration are very similar to those of music/film, indicating that the two areas can mutually inform one another and perhaps, more generally, reflect a centralized mechanism of cross-sensory integration. Copyright © 2017 Elsevier B.V. All rights reserved.

  1. Speaker normalization and adaptation using second-order connectionist networks.

    PubMed

    Watrous, R L

    1993-01-01

    A method for speaker normalization and adaptation using connectionist networks is developed. A speaker-specific linear transformation of observations of the speech signal is computed using second-order network units. Classification is accomplished by a multilayer feedforward network that operates on the normalized speech data. The network is adapted for a new talker by modifying the transformation parameters while leaving the classifier fixed. This is accomplished by backpropagating classification error through the classifier to the second-order transformation units. This method was evaluated for the classification of ten vowels for 76 speakers using the first two formant values of the Peterson-Barney data. The results suggest that rapid speaker adaptation resulting in high classification accuracy can be accomplished by this method.
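    As an illustration of the adaptation scheme just described, the sketch below (a hedged reconstruction, not the paper's code) freezes a trained classifier and backpropagates classification error into a per-speaker transform; a plain linear layer stands in for the paper's second-order units, and all names and dimensions are illustrative assumptions.

    ```python
    import torch
    import torch.nn as nn

    n_features, n_classes = 2, 10          # e.g., two formant values, ten vowels

    # Shared, speaker-independent classifier (assumed already trained).
    classifier = nn.Sequential(
        nn.Linear(n_features, 32), nn.Tanh(), nn.Linear(32, n_classes)
    )
    for p in classifier.parameters():      # freeze: errors only update the transform
        p.requires_grad = False

    # Speaker-specific normalization; a first-order linear layer is substituted
    # here for the paper's second-order units, purely for brevity.
    transform = nn.Linear(n_features, n_features)

    opt = torch.optim.SGD(transform.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    def adapt(frames, labels, steps=50):
        """Adapt the transform to a new talker from a few labeled frames."""
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(classifier(transform(frames)), labels)
            loss.backward()                # gradient flows through the frozen
            opt.step()                     # classifier into the transform only
    ```

    Because the classifier stays fixed, only a handful of transformation parameters must be re-estimated per talker, which is what makes rapid adaptation plausible.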

  2. Simple pendulum for blind students

    NASA Astrophysics Data System (ADS)

    Goncalves, A. M. B.; Cena, C. R.; Alves, D. C. B.; Errobidart, N. C. G.; Jardim, M. I. A.; Queiros, W. P.

    2017-09-01

    Faced with the need to teach physics to the visually impaired, in this paper we propose a way to demonstrate the relationship between distance and time in a pendulum experiment to blind students. The periodic oscillation of the pendulum is translated, by an Arduino and an ultrasonic sensor, into a periodic variation in the frequency of a tone emitted by a speaker. The main advantage of this proposal is that a blind student can follow the movement without needing to touch the pendulum.
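    To make the sonification concrete, here is a hypothetical sketch of the core mapping from ultrasonic distance readings to an audible tone frequency; the distance range and frequency band are assumptions for illustration, not values from the paper.

    ```python
    # Map a pendulum-to-sensor distance reading (cm) onto a tone frequency (Hz),
    # so the swing can be heard: pitch rises and falls with the bob's position.
    def distance_to_frequency(d_cm, d_min=5.0, d_max=50.0,
                              f_min=200.0, f_max=2000.0):
        d = min(max(d_cm, d_min), d_max)   # clamp noisy sensor readings
        return f_min + (f_max - f_min) * (d - d_min) / (d_max - d_min)

    # One mock oscillation: the emitted pitch varies with the pendulum's period.
    for d in [10, 20, 30, 40, 30, 20, 10]:
        print(round(distance_to_frequency(d)), "Hz")
    ```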

  3. The Storage and Composition of Inflected Forms in Adult-Learned Second Language: A Study of the Influence of Length of Residence, Age of Arrival, Sex, and Other Factors

    ERIC Educational Resources Information Center

    Babcock, Laura; Stowe, John C.; Maloof, Christopher J.; Brovetto, Claudia; Ullman, Michael T.

    2012-01-01

    It remains unclear whether adult-learned second language (L2) depends on similar or different neurocognitive mechanisms as those involved in first language (L1). We examined whether English past tense forms are computed similarly or differently by L1 and L2 English speakers, and what factors might affect this: regularity (regular vs. irregular…

  4. EFL Teachers' Responses to L2 Writing.

    ERIC Educational Resources Information Center

    Chang, Yuh-Fang

    This study investigated differences in the product and process of evaluating second language compositions by Taiwanese speakers of English. It examined whether such factors as language background (native English speaker versus native Chinese speaker), academic discipline, and educational background affected raters' scoring outcomes; whether rating…

  5. Russian Emotion Vocabulary in American Learners' Narratives

    ERIC Educational Resources Information Center

    Pavlenko, Aneta; Driagina, Viktoria

    2007-01-01

    This study compared the uses of emotion vocabulary in narratives elicited from monolingual speakers of Russian and English and advanced American learners of Russian. Monolingual speakers differed significantly in the distribution of emotion terms across morphosyntactic categories: English speakers favored an adjectival pattern of emotion…

  6. Motion cues that make an impression: Predicting perceived personality by minimal motion information.

    PubMed

    Koppensteiner, Markus

    2013-11-01

    The current study presents a methodology to analyze first impressions on the basis of minimal motion information. In order to test the applicability of the approach, brief silent video clips of 40 speakers were presented to independent observers (i.e., observers who did not know the speakers), who rated them on measures of the Big Five personality traits. The body movements of the speakers were then captured by placing landmarks on the speakers' forehead, one shoulder, and the hands. Analysis revealed that observers ascribe extraversion to variations in the speakers' overall activity, emotional stability to the movements' relative velocity, and openness to variation in motion direction. Although ratings of openness and conscientiousness were related to biographical data of the speakers (i.e., measures of career progress), measures of body motion failed to provide similar results. In conclusion, analysis of motion behavior might be done on the basis of a small set of landmarks that seem to capture important parts of relevant nonverbal information.
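    The sketch below illustrates the kind of landmark-based motion measures such a study relates to personality ratings; the specific operationalizations of "relative velocity" and "direction variation" here are hypothetical stand-ins, not the authors' definitions.

    ```python
    import numpy as np

    def motion_features(traj):
        """traj: landmark coordinates, shape (n_frames, n_landmarks, 2)."""
        step = np.diff(traj, axis=0)                 # frame-to-frame displacement
        speed = np.linalg.norm(step, axis=2)         # per-landmark speed
        overall_activity = speed.sum()               # total amount of movement
        relative_velocity = speed.mean() / (speed.std() + 1e-9)  # hypothetical
        direction = np.arctan2(step[..., 1], step[..., 0])
        direction_variation = direction.std()        # spread of motion directions
        return overall_activity, relative_velocity, direction_variation

    # Mock clip: 100 frames, 4 landmarks (forehead, shoulder, two hands).
    print(motion_features(np.random.rand(100, 4, 2)))
    ```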

  7. Motion cues that make an impression

    PubMed Central

    Koppensteiner, Markus

    2013-01-01

    The current study presents a methodology to analyze first impressions on the basis of minimal motion information. In order to test the applicability of the approach, brief silent video clips of 40 speakers were presented to independent observers (i.e., observers who did not know the speakers), who rated them on measures of the Big Five personality traits. The body movements of the speakers were then captured by placing landmarks on the speakers' forehead, one shoulder, and the hands. Analysis revealed that observers ascribe extraversion to variations in the speakers' overall activity, emotional stability to the movements' relative velocity, and openness to variation in motion direction. Although ratings of openness and conscientiousness were related to biographical data of the speakers (i.e., measures of career progress), measures of body motion failed to provide similar results. In conclusion, analysis of motion behavior might be done on the basis of a small set of landmarks that seem to capture important parts of relevant nonverbal information. PMID:24223432

  8. Formant trajectory characteristics in speakers with dysarthria and homogeneous speech intelligibility scores: Further data

    NASA Astrophysics Data System (ADS)

    Kim, Yunjung; Weismer, Gary; Kent, Ray D.

    2005-09-01

    In previous work [J. Acoust. Soc. Am. 117, 2605 (2005)], we reported on formant trajectory characteristics of a relatively large number of speakers with dysarthria and near-normal speech intelligibility. The purpose of that analysis was to begin a documentation of the variability, within relatively homogeneous speech-severity groups, of acoustic measures commonly used to predict across-speaker variation in speech intelligibility. In that study we found that even with near-normal speech intelligibility (90%-100%), many speakers had reduced formant slopes for some words and distributional characteristics of acoustic measures that differed from the values obtained from normal speakers. In the current report we extend those findings to a group of speakers with dysarthria with somewhat poorer speech intelligibility than the original group. Results are discussed in terms of the utility of certain acoustic measures as indices of speech intelligibility, and as explanatory data for theories of dysarthria. [Work supported by NIH Award R01 DC00319.]

  9. Speaker Invariance for Phonetic Information: an fMRI Investigation

    PubMed Central

    Salvata, Caden; Blumstein, Sheila E.; Myers, Emily B.

    2012-01-01

    The current study explored how listeners map the variable acoustic input onto a common sound structure representation while retaining the phonetic detail needed to distinguish among talkers' identities. An adaptation paradigm was used to examine areas that showed an equal neural response (equal release from adaptation) to phonetic change whether spoken by the same speaker or by two different speakers, and insensitivity (failure to show release from adaptation) when the same phonetic input was spoken by a different speaker. Neural areas which showed speaker invariance were located in the anterior portion of the middle superior temporal gyrus bilaterally. These findings provide support for the view that speaker normalization processes allow for the translation of a variable speech input to a common abstract sound structure. That this process appears to occur early in the processing stream, recruiting temporal structures, suggests that this mapping takes place prelexically, before sound structure input is mapped on to lexical representations. PMID:23264714

  10. Cortical encoding and neurophysiological tracking of intensity and pitch cues signaling English stress patterns in native and nonnative speakers.

    PubMed

    Chung, Wei-Lun; Bidelman, Gavin M

    2016-01-01

    We examined cross-language differences in neural encoding and tracking of intensity and pitch cues signaling English stress patterns. Auditory mismatch negativities (MMNs) were recorded in English and Mandarin listeners in response to contrastive English pseudowords whose primary stress occurred either on the first or second syllable (i.e., "nocTICity" vs. "NOCticity"). The contrastive syllable stress elicited two consecutive MMNs in both language groups, but English speakers demonstrated larger responses to stress patterns than Mandarin speakers. Correlations between the amplitude of ERPs and continuous changes in the running intensity and pitch of speech assessed how well each language group's brain activity tracked these salient acoustic features of lexical stress. We found that English speakers' neural responses tracked intensity changes in speech more closely than Mandarin speakers' responses did (higher brain-acoustic correlation). Findings demonstrate more robust and precise processing of English stress (intensity) patterns in early auditory cortical responses of native relative to nonnative speakers. Copyright © 2016 Elsevier Inc. All rights reserved.

  11. Long short-term memory for speaker generalization in supervised speech separation

    PubMed Central

    Chen, Jitong; Wang, DeLiang

    2017-01-01

    Speech separation can be formulated as learning to estimate a time-frequency mask from acoustic features extracted from noisy speech. For supervised speech separation, generalization to unseen noises and unseen speakers is a critical issue. Although deep neural networks (DNNs) have been successful in noise-independent speech separation, DNNs are limited in modeling a large number of speakers. To improve speaker generalization, a separation model based on long short-term memory (LSTM) is proposed, which naturally accounts for the temporal dynamics of speech. Systematic evaluation shows that the proposed model substantially outperforms a DNN-based model on unseen speakers and unseen noises in terms of objective speech intelligibility. Analyzing LSTM internal representations reveals that LSTM captures long-term speech contexts. The LSTM model is also more advantageous for low-latency speech separation: even without future frames, it performs better than the DNN model with future frames. The proposed model represents an effective approach for speaker- and noise-independent speech separation. PMID:28679261
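    A minimal sketch (assuming PyTorch) of mask-based separation with an LSTM, in the spirit of the approach described above: the network maps acoustic features of noisy speech to a time-frequency ratio mask that is applied to the noisy spectrogram. All dimensions and layer sizes are illustrative assumptions.

    ```python
    import torch
    import torch.nn as nn

    class LSTMSeparator(nn.Module):
        def __init__(self, n_feats=64, n_freq=161, hidden=256):
            super().__init__()
            self.lstm = nn.LSTM(n_feats, hidden, num_layers=2, batch_first=True)
            self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

        def forward(self, feats, noisy_spec):
            h, _ = self.lstm(feats)            # temporal context across frames
            m = self.mask(h)                   # ratio mask in [0, 1]
            return m * noisy_spec              # estimated clean spectrogram

    # Training would minimize, e.g., MSE between masked and clean spectrograms.
    model = LSTMSeparator()
    feats = torch.randn(8, 100, 64)            # (batch, frames, features)
    noisy = torch.rand(8, 100, 161)            # (batch, frames, freq bins)
    est = model(feats, noisy)
    ```

    Because the recurrence runs forward in time, such a model can operate frame by frame without future context, which is consistent with the low-latency advantage reported above.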

  12. Applying Rasch model analysis in the development of the Cantonese Tone Identification Test (CANTIT).

    PubMed

    Lee, Kathy Y S; Lam, Joffee H S; Chan, Kit T Y; van Hasselt, Charles Andrew; Tong, Michael C F

    2017-01-01

    Rasch analysis was applied to evaluate the internal structure of a lexical tone perception test, the Cantonese Tone Identification Test (CANTIT). A 75-item pool (CANTIT-75) with pictures and sound tracks was developed; respondents were required to make a four-alternative forced choice on each item. A short version of 30 items (CANTIT-30) was developed based on fit statistics, difficulty estimates, and content evaluation. Internal structure was evaluated by fit statistics and Rasch Factor Analysis (RFA). 200 children with normal hearing and 141 children with hearing impairment were recruited. For CANTIT-75, all infit and 97% of outfit values were < 2.0. RFA revealed that 40.1% of total variance was explained by the Rasch measure; the first residual component explained 2.5% of total variance, with an eigenvalue of 3.1. For CANTIT-30, all infit and outfit values were < 2.0; the Rasch measure explained 38.8% of total variance, and the first residual component explained 3.9% of total variance, with an eigenvalue of 1.9. The Rasch model provides excellent guidance for the development of short forms. Both CANTIT-75 and CANTIT-30 possess a satisfactory internal structure as construct validity evidence in measuring the lexical tone identification ability of Cantonese speakers.
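    For readers unfamiliar with the model, here is a minimal sketch of the dichotomous Rasch model underlying such an analysis: the probability of a correct response depends only on the difference between person ability (theta) and item difficulty (b). The example values are illustrative, not CANTIT estimates.

    ```python
    import math

    def rasch_p(theta, b):
        """P(correct | person ability theta, item difficulty b)."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    # An able child on an easy item vs. a hard item:
    print(rasch_p(1.0, -1.0))   # ~0.88
    print(rasch_p(1.0,  2.0))   # ~0.27
    ```

    Fit statistics such as infit and outfit then quantify how far each item's observed responses deviate from this model's expectations, which is what drove the item selection for CANTIT-30.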

  13. Does language shape thought? Mandarin and English speakers' conceptions of time.

    PubMed

    Boroditsky, L

    2001-08-01

    Does the language you speak affect how you think about the world? This question is taken up in three experiments. English and Mandarin talk about time differently--English predominantly talks about time as if it were horizontal, while Mandarin also commonly describes time as vertical. This difference between the two languages is reflected in the way their speakers think about time. In one study, Mandarin speakers tended to think about time vertically even when they were thinking for English (Mandarin speakers were faster to confirm that March comes earlier than April if they had just seen a vertical array of objects than if they had just seen a horizontal array, and the reverse was true for English speakers). Another study showed that the extent to which Mandarin-English bilinguals think about time vertically is related to how old they were when they first began to learn English. In another experiment native English speakers were taught to talk about time using vertical spatial terms in a way similar to Mandarin. On a subsequent test, this group of English speakers showed the same bias to think about time vertically as was observed with Mandarin speakers. It is concluded that (1) language is a powerful tool in shaping thought about abstract domains and (2) one's native language plays an important role in shaping habitual thought (e.g., how one tends to think about time) but does not entirely determine one's thinking in the strong Whorfian sense. Copyright 2001 Academic Press.

  14. An oscillator model of the timing of turn-taking.

    PubMed

    Wilson, Margaret; Wilson, Thomas P

    2005-12-01

    When humans talk without conventionalized arrangements, they engage in conversation--that is, a continuous and largely nonsimultaneous exchange in which speakers take turns. Turn-taking is ubiquitous in conversation and is the normal case against which alternatives, such as interruptions, are treated as violations that warrant repair. Furthermore, turn-taking involves highly coordinated timing, including a cyclic rise and fall in the probability of initiating speech during brief silences, and involves the notable rarity, especially in two-party conversations, of two speakers' breaking a silence at once. These phenomena, reported by conversation analysts, have been neglected by cognitive psychologists, and to date there has been no adequate cognitive explanation. Here, we propose that, during conversation, endogenous oscillators in the brains of the speaker and the listeners become mutually entrained, on the basis of the speaker's rate of syllable production. This entrained cyclic pattern governs the potential for initiating speech at any given instant for the speaker and also for the listeners (as potential next speakers). Furthermore, the readiness functions of the listeners are counterphased with that of the speaker, minimizing the likelihood of simultaneous starts by a listener and the previous speaker. This mutual entrainment continues for a brief period when the speech stream ceases, accounting for the cyclic property of silences. This model not only captures the timing phenomena observed in the literature on conversation analysis, but also converges with findings from the literatures on phoneme timing, syllable organization, and interpersonal coordination.
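    A toy sketch of the entrainment idea follows: speaker and listener readiness to initiate speech oscillate at the syllable rate, with the listener's cycle counterphased against the speaker's, making simultaneous starts unlikely. The syllable rate and the sinusoidal functional form are assumptions for illustration, not the authors' specification.

    ```python
    import math

    SYLLABLE_RATE = 5.0  # Hz; a typical conversational syllable rate (assumed)

    def readiness(t, phase=0.0):
        """Cyclic readiness (0..1) to initiate speech at time t (seconds)."""
        return 0.5 * (1.0 + math.cos(2 * math.pi * SYLLABLE_RATE * t + phase))

    speaker  = lambda t: readiness(t)                 # in phase with own syllables
    listener = lambda t: readiness(t, phase=math.pi)  # counterphased with speaker

    for t in [0.0, 0.05, 0.1, 0.15, 0.2]:
        print(f"t={t:.2f}s  speaker={speaker(t):.2f}  listener={listener(t):.2f}")
    ```

    When one readiness function peaks the other is near its trough, which is the mechanism proposed for the rarity of two speakers breaking a silence at once.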

  15. Mothers' attitudes toward adolescent confidential services: development and validation of scales for use in English- and Spanish-speaking populations.

    PubMed

    Tebb, Kathleen P; Pollack, Lance M; Millstein, Shana; Otero-Sabogal, Regina; Wibbelsman, Charles J

    2014-09-01

    To explore parental beliefs and attitudes about confidential services for their teenagers, and to develop an instrument to assess these beliefs and attitudes that could be used among English and Spanish speakers. The long-term goal is to use this research to better understand and evaluate interventions to improve parental knowledge and attitudes toward their adolescent's access and utilization of comprehensive confidential health services. The instrument was developed using an extensive literature review and theoretical framework, followed by qualitative data from focus groups and in-depth interviews. It was then pilot tested with a random sample of English- and Spanish-speaking parents and further revised. The final instrument was administered to a random sample of 1,000 mothers, and its psychometric properties were assessed for Spanish and English speakers. The instrument consisted of 12 scales. Most Cronbach alphas were >.70 for Spanish and English speakers. Fewer items "loaded" for Spanish speakers on the Responsibility and Communication scales, and the Parental Control of Health Information scale failed for Spanish speakers. The Parental Attitudes of Adolescent Confidential Health Services Questionnaire (PAACS-Q) contains 12 scales and is a valid and reliable instrument for assessing parental knowledge of, and attitudes toward, confidential health services for adolescents among English speakers; all but one scale was applicable for Spanish speakers. More research is needed to understand key constructs with Spanish speakers. Copyright © 2014 Society for Adolescent Health and Medicine. Published by Elsevier Inc. All rights reserved.
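    As a reminder of the reliability statistic reported above, here is a minimal sketch of Cronbach's alpha for one scale, computed from a respondents-by-items score matrix; the mock responses are invented for illustration.

    ```python
    import numpy as np

    def cronbach_alpha(scores):
        """scores: array of shape (n_respondents, n_items)."""
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]
        item_vars = scores.var(axis=0, ddof=1).sum()   # sum of item variances
        total_var = scores.sum(axis=1).var(ddof=1)     # variance of scale totals
        return (k / (k - 1)) * (1.0 - item_vars / total_var)

    # Mock 5-point responses from four mothers on a three-item scale:
    print(round(cronbach_alpha([[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]), 2))
    ```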

  16. What makes a voice masculine: physiological and acoustical correlates of women's ratings of men's vocal masculinity.

    PubMed

    Cartei, Valentina; Bond, Rod; Reby, David

    2014-09-01

    Men's voices contain acoustic cues to body size and hormonal status, which have been found to affect women's ratings of speaker size, masculinity and attractiveness. However, the extent to which these voice parameters mediate the relationship between speakers' fitness-related features and listeners' judgments of their masculinity has not yet been investigated. We audio-recorded 37 adult heterosexual males performing a range of speech tasks and asked 20 adult heterosexual female listeners to rate speakers' masculinity on the basis of their voices only. We then used a two-level (speaker within listener) path analysis to examine the relationships between the physiological (testosterone, height), acoustic (fundamental frequency or F0, and resonances or ΔF) and perceptual dimensions (listeners' ratings) of speakers' masculinity. Overall, results revealed that male speakers who were taller and had higher salivary testosterone levels also had lower F0 and ΔF, and were in turn rated as more masculine. The relationship between testosterone and perceived masculinity was essentially mediated by F0, while that of height and perceived masculinity was partially mediated by both F0 and ΔF. These observations confirm that women listeners attend to sexually dimorphic voice cues to assess the masculinity of unseen male speakers. In turn, variation in these voice features correlates with speakers' variation in stature and hormonal status, highlighting the interdependence of these physiological, acoustic and perceptual dimensions. Copyright © 2014. Published by Elsevier Inc.

  17. The artful dodger: answering the wrong question the right way.

    PubMed

    Rogers, Todd; Norton, Michael I

    2011-06-01

    What happens when speakers try to "dodge" a question they would rather not answer by answering a different question? In 4 studies, we show that listeners can fail to detect dodges when speakers answer similar, but objectively incorrect, questions (the "artful dodge"), a detection failure that goes hand in hand with a failure to rate dodgers more negatively. We propose that dodges go undetected because listeners' attention is not usually directed toward a goal of dodge detection (i.e., Is this person answering the question?) but rather toward a goal of social evaluation (i.e., Do I like this person?). Listeners were not blind to all dodge attempts, however. Dodge detection increased when listeners' attention was diverted from social goals toward determining the relevance of the speaker's answers (Study 1), when speakers answered a question egregiously dissimilar to the one asked (Study 2), and when listeners' attention was directed to the question asked by keeping it visible during speakers' answers (Study 4). We also examined the interpersonal consequences of dodge attempts: when listeners were guided to detect dodges, they rated speakers more negatively (Study 2), and listeners rated speakers who answered a similar question in a fluent manner more positively than speakers who answered the actual question but disfluently (Study 3). These results add to the literatures on both Gricean conversational norms and goal-directed attention. We discuss the practical implications of our findings in the contexts of interpersonal communication and public debates.

  18. Content-specific coordination of listeners' to speakers' EEG during communication

    PubMed Central

    Kuhlen, Anna K.; Allefeld, Carsten; Haynes, John-Dylan

    2012-01-01

    Cognitive neuroscience has recently begun to extend its focus from the isolated individual mind to two or more individuals coordinating with each other. In this study we uncover a coordination of neural activity between the ongoing electroencephalogram (EEG) of two people—a person speaking and a person listening. The EEG of one set of twelve participants (“speakers”) was recorded while they were narrating short stories. The EEG of another set of twelve participants (“listeners”) was recorded while watching audiovisual recordings of these stories. Specifically, listeners watched the superimposed videos of two speakers simultaneously and were instructed to attend either to one or the other speaker. This allowed us to isolate neural coordination due to processing the communicated content from the effects of sensory input. We find several neural signatures of communication: First, the EEG is more similar among listeners attending to the same speaker than among listeners attending to different speakers, indicating that listeners' EEG reflects content-specific information. Secondly, listeners' EEG activity correlates with the attended speakers' EEG, peaking at a time delay of about 12.5 s. This correlation takes place not only between homologous, but also between non-homologous brain areas in speakers and listeners. A semantic analysis of the stories suggests that listeners coordinate with speakers at the level of complex semantic representations, so-called “situation models”. With this study we link a coordination of neural activity between individuals directly to verbally communicated information. PMID:23060770
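    The lagged-coupling analysis central to this finding can be sketched as follows: correlate a listener's signal with the speaker's signal shifted by varying delays and locate the peak (the study reports a peak near 12.5 s). The sampling rate and the mock signals below are illustrative assumptions, not the study's data or pipeline.

    ```python
    import numpy as np

    def lagged_correlation(speaker_eeg, listener_eeg, max_lag, fs):
        """Return (lags_in_s, r): listener vs. speaker signal delayed by each lag."""
        lags = np.arange(0, int(max_lag * fs))
        r = np.array([
            np.corrcoef(speaker_eeg[:len(speaker_eeg) - lag],
                        listener_eeg[lag:])[0, 1]
            for lag in lags
        ])
        return lags / fs, r

    fs = 100                                     # Hz (assumed)
    t = np.arange(0, 60, 1 / fs)
    speaker = np.sin(2 * np.pi * 0.1 * t) + 0.5 * np.random.randn(len(t))
    listener = np.roll(speaker, int(12.5 * fs)) # mock coupling delayed by 12.5 s
    lags_s, r = lagged_correlation(speaker, listener, max_lag=20, fs=fs)
    print(f"peak coupling at lag {lags_s[np.argmax(r)]:.1f} s")
    ```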

  19. Yes, You Can? A Speaker’s Potency to Act upon His Words Orchestrates Early Neural Responses to Message-Level Meaning

    PubMed Central

    Bornkessel-Schlesewsky, Ina; Krauspenhaar, Sylvia; Schlesewsky, Matthias

    2013-01-01

    Evidence is accruing that, in comprehending language, the human brain rapidly integrates a wealth of information sources, including the reader's or hearer's knowledge about the world and even his/her current mood. However, little is known to date about how language processing in the brain is affected by the hearer's knowledge about the speaker. Here, we investigated the impact of social attributions to the speaker by measuring event-related brain potentials while participants watched videos of three speakers uttering true or false statements pertaining to politics or general knowledge: a top political decision maker (the German Federal Minister of Finance at the time of the experiment), a well-known media personality, and an unidentifiable control speaker. False versus true statements engendered an N400 followed by a late positivity, with the N400 (150–450 ms) constituting the earliest observable response to message-level meaning. Crucially, however, the N400 was modulated by the combination of speaker and message: for false versus true political statements, an N400 effect was observable only for the politician, but not for either of the other two speakers; for false versus true general knowledge statements, an N400 was engendered by all three speakers. We interpret this result as demonstrating that the neurophysiological response to message-level meaning is immediately influenced by the social status of the speaker and by whether he/she has the power to bring about the state of affairs described. PMID:23894425

  20. It doesn't matter what you say: FMRI correlates of voice learning and recognition independent of speech content.

    PubMed

    Zäske, Romi; Awwad Shiekh Hasan, Bashar; Belin, Pascal

    2017-09-01

    Listeners can recognize newly learned voices from previously unheard utterances, suggesting the acquisition of high-level speech-invariant voice representations during learning. Using functional magnetic resonance imaging (fMRI) we investigated the anatomical basis underlying the acquisition of voice representations for unfamiliar speakers independent of speech, and their subsequent recognition among novel voices. Specifically, listeners studied voices of unfamiliar speakers uttering short sentences and subsequently classified studied and novel voices as "old" or "new" in a recognition test. To investigate "pure" voice learning, i.e., independent of sentence meaning, we presented German sentence stimuli to non-German speaking listeners. To disentangle stimulus-invariant and stimulus-dependent learning, during the test phase we contrasted a "same sentence" condition in which listeners heard speakers repeating the sentences from the preceding study phase, with a "different sentence" condition. Voice recognition performance was above chance in both conditions although, as expected, performance was higher for same than for different sentences. During study phases activity in the left inferior frontal gyrus (IFG) was related to subsequent voice recognition performance and same versus different sentence condition, suggesting an involvement of the left IFG in the interactive processing of speaker and speech information during learning. Importantly, at test reduced activation for voices correctly classified as "old" compared to "new" emerged in a network of brain areas including temporal voice areas (TVAs) of the right posterior superior temporal gyrus (pSTG), as well as the right inferior/middle frontal gyrus (IFG/MFG), the right medial frontal gyrus, and the left caudate. This effect of voice novelty did not interact with sentence condition, suggesting a role of temporal voice-selective areas and extra-temporal areas in the explicit recognition of learned voice identity, independent of speech content. Copyright © 2017 Elsevier Ltd. All rights reserved.
